FAQ¶
Setup¶
What Python versions are supported?¶
AbstractVoice supports Python >=3.9. The lightweight base install and the
remote/OpenAI-compatible path are supported on Python 3.9. Local
Piper/faster-whisper extras and the web example are also supported on
Python 3.9. OpenF5/F5-TTS, Chroma, and OmniVoice require Python 3.10+ because
their upstream runtimes do; AEC requires Python 3.11+.
Do I need system speech dependencies?¶
The bare package install does not require local speech engines. Install
abstractvoice[local] for the full local Piper/faster-whisper/cloning path.
Piper runs
through ONNX Runtime, so it does not require system packages such as
espeak-ng.
Microphone and speaker I/O use sounddevice / PortAudio and are installed by
abstractvoice[local] or abstractvoice[audio-io]. On some Linux systems you
may need PortAudio packages from the OS package manager. See
docs/installation.md for platform notes.
I installed the package but abstractvoice is not found. What should I check?¶
Make sure you install into the Python environment your shell actually invokes:
python -m pip install abstractvoice
python -m abstractvoice cli --verbose
From a source checkout, python -m abstractvoice cli --verbose is the most
reliable way to start without depending on console-script path setup.
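To confirm which interpreter your shell resolves to and whether abstractvoice is importable from that environment, a small stdlib-only check can help (it is safe to run even when the package is missing):

```python
import importlib.util
import sys

# Print the interpreter your shell's `python` resolves to, and whether
# abstractvoice is importable from that environment.
print("interpreter:", sys.executable)
spec = importlib.util.find_spec("abstractvoice")
print("abstractvoice:", spec.origin if spec else "not found in this environment")
```

If the second line prints "not found", your pip and your python point at different environments.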
Local State, History, And Caches¶
How do I clear the current REPL conversation history?¶
Inside the REPL:
/clear
/clear resets the in-memory LLM message list back to the system prompt. That
is the history sent to the configured OpenAI-compatible chat endpoint.
Related commands:
/history
/history 50 --all
/history 10 --full
/reset
/reset clears the chat history and also resets the active voice selection back
to the default TTS path.
Does the REPL automatically persist LLM conversation history?¶
No. The LLM message history is in memory unless you explicitly save it:
/save my-session
/load my-session
/save my-session writes my-session.mem in the current working directory.
Delete that .mem file if you no longer want the saved conversation.
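The cleanup step can be scripted with pathlib; this sketch assumes the file lives in the current working directory, as /save writes it there:

```python
from pathlib import Path

# Remove a saved REPL session file from the current working directory.
session = Path("my-session.mem")
if session.exists():
    session.unlink()
    print("deleted", session)
else:
    print("nothing to delete")
```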
Why do my previous typed commands still appear with the up arrow?¶
That is terminal command history, not LLM chat history. The REPL stores it as a small readline history file so the up/down arrows work across sessions.
Find the app-data directory with:
python - <<'PY'
import appdirs
print(appdirs.user_data_dir("abstractvoice"))
PY
Then delete repl_history inside that directory.
Common locations:
- macOS: ~/Library/Application Support/abstractvoice/repl_history
- Linux: ~/.local/share/abstractvoice/repl_history
- Windows: %LOCALAPPDATA%\abstractvoice\abstractvoice\repl_history, or the path printed by appdirs
Where are models, cloned voices, and prompt caches stored?¶
Default local state:
- Piper voices: ~/.piper/models
- faster-whisper models: Hugging Face cache, usually ~/.cache/huggingface
- OpenF5 artifacts: ~/.cache/abstractvoice/openf5
- Chroma artifacts and prompt cache: ~/.cache/abstractvoice/chroma
- Cloned voices: appdirs.user_data_dir("abstractvoice")/cloned_voices
- OmniVoice persistent prompt profiles: appdirs.user_data_dir("abstractvoice")/omnivoice_prompt_cache
- REPL terminal command history: appdirs.user_data_dir("abstractvoice")/repl_history
- Saved REPL memories: wherever you ran /save, with a .mem extension
How do I reset local state completely?¶
Stop any running AbstractVoice process, then remove only the state you want to purge.
For a full local reset on macOS/Linux:
rm -rf ~/.piper/models
rm -rf ~/.cache/abstractvoice/openf5
rm -rf ~/.cache/abstractvoice/chroma
python - <<'PY'
import shutil
from pathlib import Path
import appdirs
root = Path(appdirs.user_data_dir("abstractvoice"))
for name in ("cloned_voices", "omnivoice_prompt_cache", "repl_history"):
    path = root / name
    if path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()
PY
Be careful with the Hugging Face cache. It is shared by many AI tools. Prefer
deleting specific model folders under ~/.cache/huggingface/hub instead of
removing the whole cache unless you really want to reset everything.
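To see what would be affected before deleting anything, you can list the per-model folders in the shared hub cache; the path below is the usual default, not an AbstractVoice-specific location:

```python
from pathlib import Path

# List the per-model folders under the shared Hugging Face hub cache so you
# can delete only what belongs to AbstractVoice's engines.
hub = Path.home() / ".cache" / "huggingface" / "hub"
if hub.is_dir():
    for entry in sorted(hub.iterdir()):
        print(entry.name)
else:
    print("no Hugging Face hub cache at", hub)
```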
Offline-First Downloads¶
Why does the REPL not download models automatically?¶
The REPL starts VoiceManager(..., allow_downloads=False). Interactive sessions
should not surprise you with multi-GB downloads. Prefetch what you need:
abstractvoice-prefetch --piper en
abstractvoice-prefetch --stt small
The library default is remote-first: VoiceManager() uses OpenAI audio and
reads OPENAI_API_KEY. Local adapters still respect allow_downloads=True
when you explicitly select them.
What should I prefetch first?¶
Most users should start with:
python -m abstractvoice download --piper en
python -m abstractvoice download --stt small
Optional heavy engines:
python -m abstractvoice download --openf5
python -m abstractvoice download --chroma
python -m abstractvoice download --audiodit
python -m abstractvoice download --omnivoice
Install the matching extra before prefetching optional engines, for example
pip install "abstractvoice[omnivoice]".
REPL Usage¶
Can I use the REPL without an LLM server?¶
Yes. Use /speak <text> to test the configured TTS engine directly:
/speak hello from AbstractVoice
Normal chat messages call the configured OpenAI-compatible LLM endpoint. The
default provider preset is Ollama at http://localhost:11434.
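Before starting a chat session, you can verify the default Ollama endpoint is reachable with a stdlib-only probe (this only checks connectivity, not that a model is loaded):

```python
from urllib.error import URLError
from urllib.request import urlopen

# Quick reachability check for the default Ollama endpoint the REPL assumes.
try:
    with urlopen("http://localhost:11434", timeout=2) as resp:
        print("Ollama reachable, HTTP", resp.status)
except (URLError, OSError) as exc:
    print("Ollama not reachable:", exc)
```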
How do I enable microphone input?¶
Microphone input is off by default. Start with:
OPENAI_API_KEY=... abstractvoice --voice-mode stop
Or enable it inside the REPL:
/voice stop
stop mode is the recommended hands-free mode on speakers. During TTS it keeps
a stop-phrase detector active, so you can say "ok stop" to cut playback.
What are the voice modes?¶
- off: no microphone input.
- stop: recommended on speakers; normal transcriptions pause during TTS, but "ok stop" can interrupt playback.
- wait: strict turn-taking; microphone processing pauses during TTS.
- full: barge-in on speech; best with a headset or acoustic echo cancellation.
- ptt: push-to-talk profile for short intentional captures.
How do I hide or inspect model reasoning from local LLMs?¶
The REPL removes <think>...</think> blocks before printing, storing, or
speaking an assistant response. This keeps spoken output clean.
Use /history --full to inspect the exact message list that remains in memory.
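The filtering described above can be sketched with a regex; this is an illustration of the behavior, not the REPL's exact implementation:

```python
import re

# Drop <think>...</think> blocks (including multi-line ones) before
# printing, storing, or speaking an assistant response.
def strip_think(text: str) -> str:
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>plan the answer</think>Hello there."))  # Hello there.
```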
Library And Server Integration¶
Can I use AbstractVoice without the REPL?¶
Yes. The main API is VoiceManager:
from abstractvoice import VoiceManager
vm = VoiceManager()
wav = vm.speak_to_bytes("Hello.", format="wav")
text = vm.transcribe_file("hello.wav")
VoiceManager() reads OPENAI_API_KEY by default. For local/offline use,
install abstractvoice[local] and create
VoiceManager(tts_engine="piper", stt_engine="faster_whisper").
For long-lived apps and servers, create one VoiceManager per configuration and
reuse it. Heavy engines are expensive to load repeatedly.
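One way to enforce that reuse is to memoize managers by configuration; the sketch below uses a stand-in class so it runs anywhere, but in a real app the factory would construct VoiceManager instead:

```python
from functools import lru_cache

class FakeManager:
    """Stand-in for VoiceManager; a real app would build
    VoiceManager(tts_engine=..., stt_engine=...) here instead."""
    def __init__(self, tts_engine: str, stt_engine: str):
        self.config = (tts_engine, stt_engine)

@lru_cache(maxsize=None)
def get_manager(tts_engine: str, stt_engine: str) -> FakeManager:
    # Construct once per configuration; later calls return the cached instance.
    return FakeManager(tts_engine, stt_engine)

a = get_manager("piper", "faster_whisper")
b = get_manager("piper", "faster_whisper")
print(a is b)  # True: the same configuration reuses the same instance
```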
Does AbstractVoice ship an HTTP server?¶
AbstractVoice ships a small local FastAPI web example:
pip install "abstractvoice[web]"
OPENAI_API_KEY=... abstractvoice web --port 5000
Open http://127.0.0.1:5000 and use it to test discussion playback,
assistant/user voice selection, TTS, audio-file transcription, and a tiny
example dialogue panel backed by an OpenAI-compatible local provider such as
Ollama. It lazy-loads VoiceManager on the first audio request.
If selecting a cloned voice appears slow, that is usually the cloning runtime loading weights and preparing prompt/runtime caches. The web example shows a busy overlay during selection/preload and synthesis so the browser does not look stuck.
For production HTTP APIs, use AbstractCore Server. AbstractVoice is still the voice I/O library; AbstractCore owns the OpenAI-compatible server surface in the AbstractFramework ecosystem.
Install both packages in the same environment:
pip install "abstractcore[server]" abstractvoice
OPENAI_API_KEY=... python -m abstractcore.server.app
AbstractCore discovers the AbstractVoice capability plugin and can expose OpenAI-compatible audio endpoints such as:
POST /v1/audio/speech
POST /v1/audio/transcriptions
Direct Python integration remains fully supported through VoiceManager.
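A request to the speech endpoint above can be sketched with stdlib urllib; the port, model, voice, and API key are placeholders, not AbstractCore defaults, and the payload follows the OpenAI audio API shape:

```python
import json
from urllib.request import Request

# Hypothetical speech request; adjust the URL to wherever your server listens.
payload = {"model": "tts-1", "input": "Hello.", "voice": "alloy"}
req = Request(
    "http://127.0.0.1:8000/v1/audio/speech",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer YOUR_KEY"},
    method="POST",
)
print(req.get_method(), req.full_url)
```

Send it with urllib.request.urlopen(req) once the server is running.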
Languages And Engines¶
How do I switch language?¶
Use VoiceManager(language="fr") or:
vm.set_language("fr")
In the REPL:
/language fr
For Piper, each language needs a cached voice:
python -m abstractvoice download --piper fr
Which TTS engine should I use?¶
- Remote OpenAI-compatible engines are the lightest path for server and plugin deployments, and hosted OpenAI is the default VoiceManager() path.
- Piper is the recommended reliable path for local TTS; install abstractvoice[local] or abstractvoice[piper].
- faster-whisper is the local STT path; install abstractvoice[local] or abstractvoice[stt].
- OpenF5, Chroma, AudioDiT, and OmniVoice are optional heavier engines for cloning, research, or richer voice experiments.
- AudioDiT is best treated as an EN/ZH-focused experimental TTS/cloning engine in this integration.
- OmniVoice is the main optional path for omnilingual speech and voice design, but stable reusable voice profiles still need more validation.
For current caveats, see docs/known-issues.md.
OmniVoice fails to import with "operator torchvision::nms does not exist". What now?¶
OmniVoice uses the torch/torchaudio/torchvision stack. If another package
installed an incompatible torchvision, reinstall the matching build. For the
torch 2.8 family this is typically:
python -m pip install --upgrade --force-reinstall "torchvision==0.23.*"
If you do not need torchvision, uninstalling it can also avoid the import path:
python -m pip uninstall torchvision
Voice Cloning¶
Is voice cloning included in the base install?¶
No. Voice cloning is optional:
pip install "abstractvoice[cloning]" # OpenF5
pip install "abstractvoice[chroma]" # Chroma, GPU-heavy
pip install "abstractvoice[audiodit]" # AudioDiT
pip install "abstractvoice[omnivoice]" # OmniVoice
Artifacts are still downloaded explicitly with python -m abstractvoice download ...
or abstractvoice-prefetch ....
How should I prepare reference audio?¶
Good reference audio matters more than most generation knobs:
- Use one speaker.
- Avoid music, effects, and background noise.
- Start with 4-10 seconds of clean speech.
- Trim leading and trailing silence.
- Provide an exact transcript when the engine accepts
reference_text.
Engine notes:
- OpenF5 normalizes to 24 kHz mono and clips references to 15 seconds.
- AudioDiT normalizes to 24 kHz and clips prompt audio to 15 seconds.
- OmniVoice currently supports one reference audio file for a clone prompt.
- Chroma normalizes prompt audio to 24 kHz mono and clips to 30 seconds.
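A quick sanity check on a reference clip before cloning can catch the most common problems; the 4-10 second thresholds below mirror the guidance above and are illustrative, not engine-enforced:

```python
import wave

# Check a WAV reference clip: mono, and roughly in the recommended
# 4-10 second range. Returns (channels, duration_seconds).
def check_reference(path: str) -> tuple:
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        duration = w.getnframes() / w.getframerate()
    if channels != 1:
        print("warning: reference is not mono")
    if not 4.0 <= duration <= 10.0:
        print(f"warning: duration {duration:.1f}s is outside the 4-10 s sweet spot")
    return channels, duration
```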
Can I move cloned voices between machines?¶
Yes. Use the REPL export/import commands:
/clone_export <id-or-name> <path>
/clone_import <path>
The target machine still needs the same cloning engine installed and its weights prefetched.
Can OmniVoice make a stable portable voice preset?¶
There are two levels of stability:
- Designed voices use instruct, seed, temperatures, and generation settings. This is useful, but exact identity can vary across hardware and model builds.
- Prompt-conditioned voices use reference audio or cached prompt tokens. This is the stronger route for reusable voices.
For practical persistence, prefer a profile that builds a persistent prompt cache, or create/export an OmniVoice clone from reference audio.
Known Issues And Bug Tracking¶
Where are known bugs tracked?¶
GitHub Issues should be the canonical tracker for active bugs. Use labels such
as bug, known-issue, engine:audiodit, engine:omnivoice, and
release:0.8.1.
docs/known-issues.md is the curated release-facing mirror: it lists the known
issues users should see before choosing an engine, plus the current workaround.
When a bug is fixed, close the GitHub issue, move the note to CHANGELOG.md,
and remove it from the active known-issues list.
Are there known engine caveats in this release?¶
Yes:
- AudioDiT direct/base TTS can sound distorted in this release. AudioDiT cloning is still the better-validated AudioDiT path.
- OmniVoice stable reusable profiles are still being curated. For stronger persistence, use cached prompt profiles or exported cloned voices.
See docs/known-issues.md for the current list.
Licensing¶
Is AbstractVoice MIT licensed? What about voices and model weights?¶
The library code is MIT licensed. Model weights and voice files are separate assets with their own licenses and distribution terms.
Read docs/voices-and-licenses.md before redistributing models, voices, or
generated voice assets.