API¶
This is the supported integrator contract for AbstractVoice.
Start with README.md and docs/getting-started.md for setup. Use
docs/faq.md for cache/history reset and troubleshooting, docs/repl_guide.md
for the interactive REPL, and docs/architecture.md for implementation details.
Implementation map:

- `abstractvoice/voice_manager.py` → `abstractvoice/vm/manager.py` (constructor + wiring)
- `abstractvoice/vm/tts_mixin.py` (TTS + cloning methods)
- `abstractvoice/vm/stt_mixin.py` (STT + listening methods)
- `abstractvoice/vm/core.py` (voice-mode behavior during playback)
Primary entry point¶
- `abstractvoice.VoiceManager`
- `abstractvoice.VoiceProfile` (data type; used by the voice-profile APIs)

```python
from abstractvoice import VoiceManager

vm = VoiceManager(language="en", remote_api_key="sk-...", allow_downloads=True)
```
Constructor (most-used knobs)¶
The source of truth is `abstractvoice/vm/manager.py`:

```python
VoiceManager(
    language: str = "en",
    tts_model: str | None = None,
    whisper_model: str = "base",
    debug_mode: bool = False,
    tts_engine: str = "openai",
    stt_engine: str = "openai",
    allow_downloads: bool = True,
    cloned_tts_streaming: bool = True,
    cloning_engine: str = "f5_tts",
    tts_delivery_mode: str | None = None,  # buffered|streamed (override)
    stt_model: str | None = None,
    remote_base_url: str | None = None,
    remote_api_key: str | None = None,
    remote_timeout_s: float | None = None,
)
```
Notes:

- `VoiceManager()` and `auto` are remote-first. Hosted OpenAI audio requires `OPENAI_API_KEY` or `remote_api_key=...`.
- `allow_downloads` gates implicit local model downloads in adapters. The REPL sets it to `False` (offline-first).
- `whisper_model` controls the faster-whisper model size used by the local `listen()` / `transcribe_*()` paths.
- `tts_engine` supports:
  - `openai` (default; remote OpenAI `/v1/audio/speech`; requires `OPENAI_API_KEY`)
  - `auto` (deterministic default: resolves to `openai`)
  - `piper` (local TTS; requires `abstractvoice[local]` or `abstractvoice[piper]`)
  - `openai-compatible` (remote compatible `/v1/audio/speech`; configure `remote_base_url` or `ABSTRACTVOICE_REMOTE_BASE_URL`)
  - `audiodit` (LongCat-AudioDiT; requires `abstractvoice[audiodit]`; upstream focuses on EN/ZH; direct/base TTS has a known quality caveat in 0.8.1)
  - `omnivoice` (OmniVoice; requires `abstractvoice[omnivoice]`; upstream supports 600+ languages)
- `stt_engine` supports `openai` | `auto` | `faster_whisper` | `openai-compatible`. `auto` resolves to `openai`. The local faster-whisper path requires `abstractvoice[local]` or `abstractvoice[stt]`. Missing credentials or missing explicit local dependencies raise actionable errors; the legacy OpenAI Whisper fallback was removed.
- `tts_model` is reserved/back-compat for local Piper (selection is language-driven today); for remote TTS it maps to the request model.
- For remote STT, `stt_model` maps to the transcription model.
- Remote configuration can be passed in the constructor or via env vars:
  - OpenAI: `OPENAI_API_KEY`, optional `ABSTRACTVOICE_OPENAI_TTS_MODEL`, `ABSTRACTVOICE_OPENAI_STT_MODEL`
  - Compatible endpoints: `ABSTRACTVOICE_REMOTE_BASE_URL`, optional `ABSTRACTVOICE_REMOTE_API_KEY`, `ABSTRACTVOICE_REMOTE_TTS_MODEL`, `ABSTRACTVOICE_REMOTE_STT_MODEL`
  - OpenAI-compatible profile discovery: `GET /audio/voices` is tried by default for compatible endpoints; override with `ABSTRACTVOICE_REMOTE_VOICE_PROFILE_PATH` or `ABSTRACTVOICE_REMOTE_VOICE_PROFILE_PATHS`. Static voice/profile ids can also be supplied with `ABSTRACTVOICE_REMOTE_TTS_VOICES`.
- `tts_delivery_mode` is an optional override that applies consistently to both base TTS and cloned voices:
  - `buffered`: synthesize the full audio first (one payload)
  - `streamed`: deliver audio in chunks when available (lower time-to-first-audio)

Supported language codes for the Piper mapping: `en`, `fr`, `de`, `es`, `ru`, `zh` (see `abstractvoice/config/voice_catalog.py` and `abstractvoice/adapters/tts_piper.py`).
For non-Piper engines (e.g. OmniVoice or remote OpenAI-compatible engines), `language` is treated as a pass-through hint and the engine decides what it supports.
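For a fully local, offline-first setup, a minimal sketch (assumes the local extras are installed and the models were prefetched; see "Explicit downloads (offline-first)" below):

```python
from abstractvoice import VoiceManager

# Offline-first local stack: Piper TTS + faster-whisper STT.
vm = VoiceManager(
    language="en",
    tts_engine="piper",            # requires abstractvoice[local] or abstractvoice[piper]
    stt_engine="faster_whisper",   # requires abstractvoice[local] or abstractvoice[stt]
    whisper_model="small",
    allow_downloads=False,         # fail fast instead of downloading missing models
    tts_delivery_mode="streamed",  # lower time-to-first-audio
)
```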
TTS (text → audio)¶
- `speak(text: str, speed: float = 1.0, callback=None, voice: str | None = None, *, sanitize_syntax: bool = True) -> bool`
  - Plays audio locally (non-blocking playback; synthesis time depends on backend).
  - If `voice` is provided, it is treated as a cloned `voice_id` (requires `abstractvoice[cloning]`).
  - By default, common Markdown syntax is stripped from spoken output (headers + emphasis). Set `sanitize_syntax=False` to speak raw text.
- `set_speed(speed: float) -> bool`, `get_speed() -> float`
  - Adjusts the default speaking speed used by `speak_to_*()` and the REPL.
- `set_tts_quality_preset(preset: str) -> bool`, `get_tts_quality_preset() -> str | None`
  - Engine-agnostic speed/quality knob (`low`|`standard`|`high`). Back-compat aliases: `fast` → `low`, `balanced` → `standard`.
  - Engines that don't support quality tuning may return `False`/`None` (Piper is typically a no-op).
  - For AudioDiT this primarily maps to diffusion `steps` (and a small guidance-strength tweak).
- `get_profiles(*, kind: str = "tts") -> list[VoiceProfile]`, `set_profile(profile_id: str, *, kind: str = "tts") -> bool`, `get_active_profile(*, kind: str = "tts") -> VoiceProfile | None`
  - Cross-engine voice profile abstraction (preset packs).
  - Profiles are engine-local: you select `tts_engine` first, then apply a profile id for that engine.
  - Engines without profiles return an empty list / `False` / `None`.
  - Concurrency note: profile selection mutates engine state. For servers, prefer one `VoiceManager` per session (or guard profile changes with a lock).
  - Remote OpenAI note: hosted built-in voices are always exposed as profiles (for example `vm.set_profile("alloy")`), and the adapter also tries OpenAI voice discovery for account/org-specific voices such as `voice_...`. `tts_engine="openai"` defaults to `https://api.openai.com/v1` and reads `OPENAI_API_KEY`.
  - Remote compatible note: compatible endpoints may expose `GET /v1/audio/voices` (adapter path: `GET /audio/voices`) returning `profiles`, `voices`, `cloned_voices`, or OpenAI-style `data`. Returned ids are exposed as `VoiceProfile`s and used as the request `voice` for `/audio/speech`.
  - The `voice=` argument on `speak_to_bytes(...)` remains the cloned-voice handle path for backward compatibility; select base-provider voices with `set_profile(...)`.
  - OmniVoice notes:
    - Some profiles may enable persistent prompt caching (a tokenized `voice_clone_prompt`). The first `set_profile(...)` can pay a one-time build cost; later synthesis reuses cached tokens for stable voice identity. Prompt-conditioned synthesis can be heavier than pure voice design; use `/tts quality low|standard|high` (or `VoiceManager.set_tts_quality_preset(...)`) to tune the trade-off.
    - On macOS / Apple Silicon, OmniVoice uses MPS (Metal) by default when `device="auto"`.
- `pause_speaking() -> bool`, `resume_speaking() -> bool`, `stop_speaking() -> bool`
  - Playback control.
- `is_speaking() -> bool`, `is_paused() -> bool`
  - Playback state helpers.
- `set_tts_delivery_mode(mode: str | None) -> bool`, `get_tts_delivery_mode() -> str`, `get_tts_delivery_modes() -> dict`
  - Toggle buffered vs streamed delivery (applies to both base TTS and cloned voices).
  - Behavior note: streamed delivery is implemented as a pipeline: text is chunked into short segments (sentence-first), then each segment is synthesized and enqueued as soon as possible. Engines that can stream audio natively may further reduce TTFB by yielding multiple audio chunks per segment.
- `speak_to_bytes(text: str, format: str = "wav", voice: str | None = None, *, sanitize_syntax: bool = True) -> bytes`
  - Headless/server-friendly: returns encoded audio bytes.
- `speak_to_audio_chunks(text: str, *, voice: str | None = None, sanitize_syntax: bool = True) -> Iterator[tuple[np.ndarray, int]]`
  - Headless/server-friendly: yields `(audio_chunk, sample_rate)` tuples for incremental delivery.
- `open_tts_text_stream(*, voice: str | None = None, callback=None, sanitize_syntax: bool = True, max_chars: int | None = None, min_chars: int | None = None) -> TextToSpeechStream`
  - Push-based streaming bridge for LLM streaming → TTS streaming pipelining (see the sketch after this list).
  - The returned object supports `.push(delta)`, `.close()`, `.cancel()`, `.join(timeout=...)`.
- `speak_to_file(text: str, output_path: str, format: str | None = None, voice: str | None = None, *, sanitize_syntax: bool = True) -> str`
  - Writes an audio file and returns the path.
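A minimal sketch of headless synthesis plus the push-based streaming bridge; the hard-coded deltas below stand in for a real LLM token stream:

```python
from abstractvoice import VoiceManager

vm = VoiceManager(remote_api_key="sk-...")

# One-shot, headless synthesis.
wav_bytes = vm.speak_to_bytes("Hello from AbstractVoice.", format="wav")

# Push-based bridge: feed deltas as they arrive; audio can start
# before the full text is known.
stream = vm.open_tts_text_stream()
for delta in ["Hello ", "from a ", "streaming LLM."]:  # stand-in for real deltas
    stream.push(delta)
stream.close()           # signal end of text
stream.join(timeout=60)  # wait for synthesis/playback to drain
```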
Language & voice selection (Piper path)¶
- `set_language(language: str) -> bool`
  - Switches the active language.
  - For explicit Piper, validation uses the curated Piper mapping in `abstractvoice/config/voice_catalog.py`.
  - For non-Piper engines such as OmniVoice, the language code is passed through to the adapter and the engine decides what it supports.
  - If microphone listening is active, the recognizer is recreated on the next `listen(...)` call so STT receives the updated language.
- `get_language() -> str`, `get_language_name(language_code: str | None = None) -> str`
- `get_supported_languages() -> list[str]`
- `list_available_models(language: str | None = None) -> dict`
  - Lists voice/model catalog entries for CLI/web display (see `abstractvoice/vm/tts_mixin.py`).
  - Piper returns local voice cache status by language.
  - OpenAI/OpenAI-compatible TTS returns remote voice profiles plus configured/discovered TTS model ids when the active adapter supports model listing.
  - Back-compat alias: `list_voices()`.
- `set_voice(language: str, voice_id: str) -> bool`
  - Backward-compatible method; Piper voice selection is currently best-effort.
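A short selection sketch for the Piper path (assumes the relevant voices were prefetched, since downloads are disabled here):

```python
from abstractvoice import VoiceManager

vm = VoiceManager(tts_engine="piper", allow_downloads=False)

print(vm.get_supported_languages())    # curated Piper mapping, e.g. ["en", "fr", ...]
vm.set_language("fr")                  # validated against the Piper mapping
print(vm.get_language_name())          # human-readable name of the active language
print(vm.list_available_models("fr"))  # local voice cache status by language
```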
STT (audio → text)¶
- `transcribe_file(audio_path: str, language: str | None = None) -> str`
  - Transcribes audio from a file.
- `transcribe_from_bytes(audio_bytes: bytes, language: str | None = None) -> str`
  - Transcribes audio sent over the network.
STT configuration¶
- `set_whisper(model_name: str) -> None | bool`
  - Updates the faster-whisper model size used for subsequent operations.
- `get_whisper() -> str`
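A round-trip sketch for local STT (assumes `hello.wav` exists and `abstractvoice[local]` or `abstractvoice[stt]` is installed):

```python
from abstractvoice import VoiceManager

vm = VoiceManager(stt_engine="faster_whisper", whisper_model="base")

text = vm.transcribe_file("hello.wav", language="en")

with open("hello.wav", "rb") as f:  # e.g. bytes received over the network
    text = vm.transcribe_from_bytes(f.read(), language="en")

vm.set_whisper("small")  # switch to a larger model for subsequent calls
print(vm.get_whisper())  # "small"
```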
Microphone capture (local assistant mode)¶
- `listen(on_transcription, on_stop=None) -> bool`
  - Starts microphone capture + VAD + STT in-process (`abstractvoice/recognition.py`).
  - Stop phrase(s): `"ok stop"`, `"okay stop"`, and (conservatively) `"stop"`; see `abstractvoice/recognition.py` and `abstractvoice/stop_phrase.py`.
- `stop_listening() -> bool`
  - Stops microphone capture.
- `pause_listening() -> bool`, `resume_listening() -> bool`
  - Pauses/resumes audio processing while keeping the listening thread alive.
- `is_listening() -> bool`
  - Whether the background recognizer thread is running.
- `cleanup() -> bool`
  - Best-effort cleanup for long-lived apps (stop listening, stop speaking, release audio resources).
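A hands-free loop sketch, assuming `on_transcription` receives the transcribed text and `on_stop` takes no arguments (check `abstractvoice/recognition.py` for the exact callback signatures):

```python
import time

from abstractvoice import VoiceManager

vm = VoiceManager(tts_engine="piper", stt_engine="faster_whisper")
vm.set_voice_mode("stop")  # "ok stop" cuts playback; see voice modes below

def on_transcription(text):
    print("heard:", text)
    vm.speak(f"You said: {text}")

vm.listen(on_transcription, on_stop=lambda: print("listening stopped"))
try:
    while vm.is_listening():
        time.sleep(0.2)
finally:
    vm.cleanup()  # stop listening/speaking, release audio resources
```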
Advanced tuning (best-effort)¶
- `change_vad_aggressiveness(aggressiveness: int) -> bool`
  - For advanced mic/VAD tuning; see `abstractvoice/recognition.py`.
Voice modes (behavior while speaking)¶
Voice modes control what the microphone loop does while TTS is playing. Set via:
`set_voice_mode(mode: str) -> bool` where `mode ∈ {"full", "wait", "stop", "ptt"}`

Mode semantics (implemented in `abstractvoice/vm/core.py`):

- `full`: keep listening and allow barge-in (interrupt TTS on detected speech). Best with AEC or a headset; open speakers can cause self-interruption (mitigations exist; see echo gating in `abstractvoice/recognition.py`).
- `wait`: pause microphone processing while speaking. No barge-in and no stop-phrase detection during TTS. Good for strict turn-taking.
- `stop`: keep listening, but suppress normal transcriptions during TTS and disable "interrupt on any speech"; a rolling stop-phrase detector stays active so users can say "ok stop" to cut playback.
- `ptt`: push-to-talk profile (thresholds tuned for short utterances). During TTS it behaves like `stop` mode; the integrator controls when to start/stop capture.

The REPL defaults to mic input off and recommends `--voice-mode stop` for hands-free usage; see docs/repl_guide.md.
Acoustic echo cancellation (optional)¶
- `enable_aec(enabled: bool = True, stream_delay_ms: int = 0) -> bool`
  - Opt-in AEC support for true barge-in (requires `abstractvoice[aec]`).
  - Playback audio chunks are fed to the recognizer via `abstractvoice/vm/core.py` → `VoiceRecognizer.feed_far_end_audio()` in `abstractvoice/recognition.py`.
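A minimal opt-in sketch; the `stream_delay_ms` value below is illustrative and device/stack dependent, not a recommendation:

```python
from abstractvoice import VoiceManager

vm = VoiceManager(tts_engine="piper", stt_engine="faster_whisper")
vm.enable_aec(True, stream_delay_ms=40)  # requires abstractvoice[aec]; tune empirically
vm.set_voice_mode("full")                # barge-in: interrupt TTS on detected speech
```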
Voice cloning (optional; heavy)¶
Requires installing at least one cloning backend extra (and explicit artifact downloads; see docs/installation.md):
- `abstractvoice[cloning]` → `f5_tts`
- `abstractvoice[chroma]` → `chroma`
- `abstractvoice[audiodit]` → `audiodit`
- `abstractvoice[omnivoice]` → `omnivoice`
Remote clone-compatible endpoints can also be used without local cloning model weights by selecting `cloning_engine="openai-compatible"` (or `engine="openai-compatible"` per call). Configure `remote_base_url` or `ABSTRACTVOICE_REMOTE_BASE_URL`; the default clone endpoint is `POST /voice/clone` and must return a remote voice id (`voice_id` or `id`). The local clone store keeps a handle and routes later `speak_to_bytes(..., voice=<local_id>)` calls to remote `/audio/speech` with that remote voice id.

`cloning_engine="openai"` targets OpenAI's hosted API by default. Custom voice creation is provider/org gated and requires explicit consent configuration such as `ABSTRACTVOICE_OPENAI_VOICE_CONSENT_ID`; otherwise the adapter raises an actionable error instead of silently pretending cloning is standardized.
Core cloning calls:
- `clone_voice(reference_audio_path: str, name: str | None = None, *, reference_text: str | None = None, engine: str | None = None) -> str`
- `clone_voice_from_wav_bytes(wav_bytes: bytes, name: str | None = None, *, reference_text: str | None = None, engine: str | None = None) -> str`
- `speak(..., voice="<voice_id>")` / `speak_to_bytes(..., voice="<voice_id>")` / `speak_to_file(..., voice="<voice_id>")`
- `list_cloned_voices()`, `get_cloned_voice(voice_id: str) -> dict`
Clone management helpers:
- `set_cloned_voice_reference_text(voice_id: str, reference_text: str) -> bool`
- `rename_cloned_voice(voice_id: str, new_name: str) -> bool`
- `delete_cloned_voice(voice_id: str) -> bool`
- `export_voice(voice_id: str, path: str) -> str`, `import_voice(path: str) -> str`
- `set_cloned_tts_quality(preset: str) -> bool` (`low`|`standard`|`high`; aliases: `fast`, `balanced`)
- `get_cloning_runtime_info() -> dict`
- `unload_cloning_engines(*, keep_engine: str | None = None) -> int` (best-effort memory relief)
- `unload_piper_voice() -> bool` (best-effort memory relief)
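A clone-and-speak sketch (assumes `abstractvoice[cloning]` is installed and `reference.wav` exists; file names are arbitrary):

```python
from abstractvoice import VoiceManager

vm = VoiceManager(cloning_engine="f5_tts")

voice_id = vm.clone_voice(
    "reference.wav",
    name="my_voice",
    reference_text="Exact transcript of the reference audio.",
)
vm.speak_to_file("Hello in the cloned voice.", "cloned.wav", voice=voice_id)

print(vm.list_cloned_voices())
vm.export_voice(voice_id, "my_voice_export")  # portable handle for import_voice(...)
```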
For the user-facing workflow and commands, see docs/repl_guide.md.
Engine caveats that affect release choice are tracked in docs/known-issues.md.
Metrics (optional)¶
- `pop_last_tts_metrics() -> dict | None`
  - Best-effort last-utterance stats used by the REPL verbose mode.
Callbacks & hooks¶
- Per-utterance callback: `speak(..., callback=...)` (invoked after playback drains).
- TTS lifecycle callbacks: `vm.tts_engine.on_playback_start` / `vm.tts_engine.on_playback_end` (synthesis/queue lifecycle).
- Audio lifecycle callbacks (actual output): `vm.on_audio_start` / `vm.on_audio_end` / `vm.on_audio_pause` / `vm.on_audio_resume` (wired in `abstractvoice/vm/core.py`).
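A wiring sketch, assuming zero-argument callables; check the wiring in `abstractvoice/vm/core.py` for the exact hook signatures:

```python
from abstractvoice import VoiceManager

vm = VoiceManager()

# Audio lifecycle hooks fire on actual output, not synthesis.
vm.on_audio_start = lambda: print("audio started")
vm.on_audio_end = lambda: print("audio finished")

# Per-utterance callback runs after playback drains.
vm.speak("Hello.", callback=lambda: print("utterance done"))
```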
Explicit downloads (offline-first)¶
For offline deployments, prefetch explicitly (cross-platform):
```bash
python -m abstractvoice download --stt small
python -m abstractvoice download --piper en
python -m abstractvoice download --openf5    # optional; requires abstractvoice[cloning]
python -m abstractvoice download --chroma    # optional; requires abstractvoice[chroma] (GPU-heavy)
python -m abstractvoice download --audiodit  # optional; requires abstractvoice[audiodit]
python -m abstractvoice download --omnivoice # optional; requires abstractvoice[omnivoice]
```
Or use the convenience entrypoint:
```bash
abstractvoice-prefetch --stt small
abstractvoice-prefetch --piper en
abstractvoice-prefetch --openf5    # optional; requires abstractvoice[cloning]
abstractvoice-prefetch --chroma    # optional; requires abstractvoice[chroma] (GPU-heavy)
abstractvoice-prefetch --audiodit  # optional; requires abstractvoice[audiodit]
abstractvoice-prefetch --omnivoice # optional; requires abstractvoice[omnivoice]
```
Notes:
- `--chroma` artifacts may require Hugging Face access to download.
See also: docs/installation.md, docs/model-management.md, and docs/voices-and-licenses.md.
Performance note: prefetch vs preload (important for servers)¶
- Prefetch (download to disk): `python -m abstractvoice download ...` / `abstractvoice-prefetch ...`
- Preload (load into memory): create a long-lived `VoiceManager` (or adapter) and reuse it.

If you construct a new `VoiceManager` for every request, heavy engines (AudioDiT/OmniVoice) will pay a large one-time cost repeatedly (imports + weight load + accelerator kernel compilation).
Recommended pattern (server/process startup):
```python
from abstractvoice import VoiceManager

# Load once, reuse for all requests.
vm = VoiceManager(
    language="en",
    tts_engine="omnivoice",
    stt_engine="openai",
    remote_api_key="sk-...",
    allow_downloads=False,
)
```
Integrations (AbstractFramework ecosystem)¶
AbstractVoice is designed to work standalone, and also integrate cleanly into the AbstractFramework ecosystem (AbstractCore + AbstractRuntime). Overview and links: README.md.
Boundary note:
- AbstractVoice owns the in-process voice backend (VoiceManager, adapters, model/cache policy).
- AbstractCore owns agent orchestration, provider routing, capability selection, and OpenAI-compatible HTTP endpoints.
- When both are installed, AbstractCore can expose AbstractVoice-backed audio endpoints such as POST /v1/audio/speech and POST /v1/audio/transcriptions.
AbstractCore capability plugin (auto-discovery)¶
AbstractVoice exposes an AbstractCore capability plugin entry point:
- Entry point declaration: `pyproject.toml` → `[project.entry-points."abstractcore.capabilities_plugins"]`
- Implementation: `abstractvoice/integrations/abstractcore_plugin.py`
The plugin registers:
- a voice backend (`backend_id="abstractvoice:default"`) for TTS+STT
- an audio backend (`backend_id="abstractvoice:stt"`) for STT-only
Audio outputs can optionally be stored into an AbstractRuntime-like artifact_store via the duck-typed adapter in abstractvoice/artifacts.py.
The voice backend also exposes thin catalog discovery methods for Core/Gateway
integration code:
- `list_profiles(kind="tts") -> list[dict]`
- `list_tts_models() -> list[str]`
- `voice_catalog() -> {kind, engine_id, active_profile, active_model, profiles, tts_models, catalog}`
These methods delegate to the active VoiceManager and keep voice/profile/model
semantics in AbstractVoice. AbstractCore still owns HTTP routing, auth, and
browser/security policy.
Plugin configuration (owner config dict, best-effort). In AbstractCore
integrations, the env/default path uses OpenAI remote TTS/STT
(OPENAI_API_KEY, optional ABSTRACTVOICE_OPENAI_* overrides) unless owner
config or ABSTRACTVOICE_TTS_ENGINE / ABSTRACTVOICE_STT_ENGINE selects a
different engine:
- voice_language: default language (e.g. "en")
- voice_allow_downloads: allow on-demand downloads (bool)
- voice_tts_engine: base TTS engine ("auto"|"piper"|"openai"|"openai-compatible"|"audiodit"|"omnivoice")
- voice_stt_engine: STT engine ("auto"|"faster_whisper"|"openai"|"openai-compatible")
- voice_tts_model: model id for remote TTS engines
- voice_stt_model: model id for remote STT engines
- voice_remote_base_url: base URL for OpenAI-compatible remote audio endpoints
- voice_remote_api_key: optional bearer key for remote audio endpoints
- voice_remote_timeout_s: request timeout for remote audio endpoints
- voice_whisper_model: faster-whisper model size (e.g. "base", "small")
- voice_cloning_engine: default cloning backend ("f5_tts"|"chroma"|"audiodit"|"omnivoice"|"openai"|"openai-compatible")
- voice_cloned_tts_streaming: stream cloned-voice chunks for faster time-to-first-audio (bool). Used when voice_tts_delivery_mode is unset.
- voice_tts_delivery_mode: unified audio delivery mode for base + cloned voices ("buffered"|"streamed"). Takes precedence over voice_cloned_tts_streaming.
- voice_tts_streaming: bool alias for voice_tts_delivery_mode (true → "streamed", false → "buffered").
- voice_debug_mode: enable debug prints (bool)
Boolean owner config/env values accept common strings such as true, false,
on, off, 1, and 0; string values like "false" are not treated as
truthy.
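An illustrative owner config dict using the keys above (how the dict reaches the plugin is AbstractCore's side of the contract; the base URL is a placeholder):

```python
voice_config = {
    "voice_language": "en",
    "voice_tts_engine": "openai-compatible",
    "voice_stt_engine": "faster_whisper",
    "voice_remote_base_url": "http://localhost:8080/v1",  # placeholder upstream endpoint
    "voice_remote_api_key": "sk-...",
    "voice_whisper_model": "small",
    "voice_tts_delivery_mode": "streamed",  # takes precedence over voice_cloned_tts_streaming
    "voice_allow_downloads": False,
}
```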
Performance note:
- The capability plugin caches VoiceManager instances in-process (keyed by the config above) so engines are not reloaded per request.
TTS metrics:
- After synthesis, the plugin stores best-effort stats in artifact metadata under abstractvoice_tts (when artifact_store is used).
AbstractCore OpenAI-compatible audio endpoints¶
The production OpenAI-compatible HTTP server lives in AbstractCore. AbstractVoice
also ships a local FastAPI web example (abstractvoice web) for package-level
smoke testing, but the supported API server path is AbstractCore Server.
With abstractcore[server] and abstractvoice installed in the same
environment, AbstractCore delegates its audio endpoints to the discovered
capability plugin:
- `POST /v1/audio/speech` -> `core.voice.tts(...)` -> `VoiceManager.speak_to_bytes(...)`
- `POST /v1/audio/transcriptions` -> `core.audio.transcribe(...)` -> `VoiceManager.transcribe_*()`
Example:
```bash
OPENAI_API_KEY=... python -m abstractcore.server.app

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello.","format":"wav"}' \
  --output hello.wav

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@hello.wav" \
  -F "language=en"
```
If AbstractCore Server is configured with ABSTRACTCORE_SERVER_API_KEY, include
the standard Authorization: Bearer <key> header. If the plugin is unavailable,
AbstractCore returns 501 with install/config guidance instead of silently
falling back.
For openai-compatible plugin configuration, do not set voice_remote_base_url
to the same AbstractCore Server instance that is currently routing the
/v1/audio/* request. That configuration recurses through the plugin path; use
an upstream compatible provider/gateway, or select local engines explicitly.
The local abstractvoice web example exposes these smoke-test routes. They are
example routes, not a replacement for AbstractCore Server, and they do not
inherit AbstractCore/Gateway authentication or browser-origin policy:
```bash
abstractvoice web --tts-engine openai --stt-engine openai
abstractvoice web --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1
```

- `GET /api/status` -> lightweight server/config status
- `GET /api/voices` -> `VoiceManager.get_profiles()`, `list_available_models()`, `list_cloned_voices()`
- `GET /v1/audio/voices` -> compatible extension for remote profile/voice discovery (`VoiceManager.get_profiles()` + `list_cloned_voices()`)
- `POST /api/voices/select` -> select base TTS, a cloned voice, or a TTS profile; optional local `role="assistant"|"user"` stores browser-example defaults; optional `preload=true` warms a cloned voice by calling a tiny `VoiceManager.speak_to_bytes(...)`
- `POST /api/voices/clone` -> example-only multipart upload for browser voice cloning; stores the uploaded/recorded reference with `VoiceManager.clone_voice(...)` and validates by synthesizing a short sample by default (`validate=false` skips validation)
- `POST /v1/voice/clone` -> compatible extension for remote clone creation; returns `voice_id`/`id` for later `/v1/audio/speech` `voice`
- `POST /api/tts` -> `VoiceManager.speak_to_bytes(...)`; accepts `input`/`text`, `voice`, `role`, `language`, `speed`, `format`/`response_format`, and `sanitize_syntax`
- `POST /api/stt/transcriptions` -> `VoiceManager.transcribe_file(...)`
- `POST /api/stt/transcribe` -> compatibility alias for `/api/stt/transcriptions`
- `GET /api/llm/models` -> example-only model listing for an OpenAI-compatible local provider such as Ollama
- `POST /api/chat` -> example-only non-streaming chat completion proxy; the browser owns history and sends the full short message list
- `POST /v1/audio/speech` and `POST /v1/audio/transcriptions` -> local aliases for quick AbstractCore-compatible smoke tests. In the web example, `voice` may be either a cloned voice id/name or an active-engine profile id.
Local web example payload sketches:
```bash
# Browser-example role default; still resolves to a cloned voice_id.
curl -X POST http://127.0.0.1:5000/api/voices/select \
  -H "Content-Type: application/json" \
  -d '{"role":"assistant","kind":"clone","voice":"my_voice","preload":true}'

# Browser-example cloned voice creation.
curl -X POST http://127.0.0.1:5000/api/voices/clone \
  -F "name=my_voice" \
  -F "engine=f5_tts" \
  -F "reference_text=Exact transcript of the reference audio." \
  -F "file=@reference.wav"

# TTS still maps to VoiceManager.speak_to_bytes(...).
curl -X POST http://127.0.0.1:5000/api/tts \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello.","role":"assistant","response_format":"wav"}' \
  --output hello.wav

# Compatible extension: discover profiles/cloned voices from another
# AbstractVoice client configured with remote_base_url=http://127.0.0.1:5000/v1.
curl http://127.0.0.1:5000/v1/audio/voices

# Compatible extension: create a remote cloned voice handle.
curl -X POST http://127.0.0.1:5000/v1/voice/clone \
  -F "name=my_remote_voice" \
  -F "reference_text=Exact transcript of the reference audio." \
  -F "file=@reference.wav"

# Example dialogue call via a local OpenAI-compatible provider.
curl -X POST http://127.0.0.1:5000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"provider":"ollama","model":"gemma3:1b","messages":[{"role":"user","content":"Say hi in one sentence."}]}'
```
The local `role` field is only a convenience for the example UI. AbstractCore
compatibility remains the capability/plugin contract: clients may pass a
cloned-voice id via `voice` to TTS, and AbstractCore routes it to
`VoiceManager.speak_to_bytes(..., voice=...)`.
AbstractCore tool helpers (manual wiring)¶
If you prefer to wire tools explicitly, `abstractvoice/integrations/abstractcore.py` provides:

- `make_voice_tools(voice_manager, store) -> list[callable]`
  - Requires `abstractcore` at runtime (it imports `abstractcore.tool`).
  - `store` can be a MediaStore-like object, or an AbstractRuntime-like ArtifactStore (adapted via `RuntimeArtifactStoreAdapter` in `abstractvoice/artifacts.py`).
Tools exposed by `make_voice_tools(...)` (current):

- `voice_tts(text, voice=None, format="wav", run_id=None) -> artifact_ref`
- `voice_profile_list(kind="tts") -> {profiles, active_profile}`
- `voice_profile_set(profile_id, kind="tts") -> {ok, active_profile}`
- `audio_transcribe(audio_artifact|audio_b64, ...) -> {text, transcript_artifact}`
Minimal sketch:
```python
from abstractvoice import VoiceManager
from abstractvoice.integrations.abstractcore import make_voice_tools

vm = VoiceManager(remote_api_key="sk-...")
# artifact_store: a MediaStore-like or AbstractRuntime-like store (see above).
tools = make_voice_tools(voice_manager=vm, store=artifact_store)
```
Example (engine-agnostic profile selection):
```python
vm = VoiceManager(tts_engine="omnivoice", allow_downloads=False)
vm.set_profile("female_01", kind="tts")
wav_bytes = vm.speak_to_bytes("Hello.", format="wav")
```
TTS metrics (library-level):

- `VoiceManager.speak_to_bytes(...)` / `VoiceManager.speak_to_file(...)` record best-effort stats for the last synthesis.
- Call `vm.pop_last_tts_metrics()` to retrieve and clear them (a dict with fields like `engine`, `synth_s`, `audio_s`, `rtf`, `sample_rate`).
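A retrieval sketch (stats are best-effort and engine-dependent, so the result may be `None` or missing fields):

```python
from abstractvoice import VoiceManager

vm = VoiceManager()
vm.speak_to_bytes("Quick metrics check.", format="wav")

metrics = vm.pop_last_tts_metrics()  # returns and clears the last stats
if metrics:
    print(metrics.get("engine"), metrics.get("rtf"), metrics.get("synth_s"))
```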
Non-contract surface (may change without notice)¶
- CLI behavior (`abstractvoice/examples/*`)
- Internal adapter details and model catalogs beyond the documented defaults