Getting started

Start here after README.md to confirm that the default remote path works, then optionally switch to local/offline engines.

Use docs/api.md for the supported integrator contract, docs/architecture.md for the implementation map, and docs/faq.md for cache/history reset and common troubleshooting.

Requirements

  • Python >=3.9 (see pyproject.toml)
  • OPENAI_API_KEY for the default VoiceManager() / REPL path
  • For microphone input: OS-level microphone permissions for your terminal/IDE

Install

pip install abstractvoice
export OPENAI_API_KEY=...

The plain install is lightweight and remote/plugin-oriented. For fully local inference, listening, and cloning engines, install the abstractvoice[local] extra and select local engines explicitly. Optional extras are documented in docs/installation.md.
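Before the smoke test below, you can sanity-check the environment with a small stdlib-only sketch (check_setup is illustrative, not part of AbstractVoice):

```python
import importlib.util
import os

def check_setup():
    """Return a list of setup problems for the default remote path (empty if OK)."""
    problems = []
    if importlib.util.find_spec("abstractvoice") is None:
        problems.append("abstractvoice is not installed: pip install abstractvoice")
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (required for the default remote engines)")
    return problems

for problem in check_setup():
    print(problem)
```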

60-second smoke test (no mic required)

Start the REPL:

OPENAI_API_KEY=... abstractvoice --verbose

From a source checkout (without installing the console script), use:

OPENAI_API_KEY=... python -m abstractvoice cli --verbose

In the REPL, run:

  • /speak hello (tests TTS without calling an LLM)

For local/offline TTS instead:

pip install "abstractvoice[local]"
abstractvoice-prefetch --piper en
abstractvoice --tts-engine piper --stt-engine faster_whisper --verbose

Optional Browser Example

The local web UI is a small FastAPI example built around VoiceManager: discussion read-through with separate assistant/user voices, text-to-WAV synthesis, audio-file transcription, and a tiny optional LLM dialogue panel for OpenAI-compatible local providers such as Ollama. It is not the production server surface; use AbstractCore Server for production OpenAI-compatible HTTP endpoints.

pip install "abstractvoice[web]"
abstractvoice web --port 5000 --tts-engine openai-compatible --stt-engine openai-compatible --remote-base-url http://localhost:8000/v1

For a local web lab with Piper/faster-whisper instead:

pip install "abstractvoice[web,local]"
abstractvoice web --port 5000 --tts-engine piper --stt-engine faster_whisper

If you want the browser UI with a smaller optional engine set, compose extras directly, for example abstractvoice[web,omnivoice].

Then open http://127.0.0.1:5000.

The web example is offline-first by default. Prefetch models first, or start it with --allow-downloads when you explicitly want web requests to download missing models. Selecting a cloned voice can take a while on first use because the cloning backend loads weights and builds prompt/runtime caches; the browser UI shows a busy overlay while that work is happening.

The browser voice-cloning action validates a new clone by synthesizing a short sample before reporting success. If an optional engine cannot load, the stored clone is removed and the backend error is shown instead of leaving a broken voice in the selector.
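That validate-then-remove behavior can be sketched engine-agnostically; store and synthesize here are stand-ins for the real voice store and cloning backend, not AbstractVoice APIs:

```python
def register_clone(store, name, clone_data, synthesize):
    """Validate-then-commit: keep a new clone only if a short sample synthesizes.
    `store` (dict) and `synthesize` (callable) are illustrative stand-ins."""
    store[name] = clone_data
    try:
        synthesize("Short validation sample.", voice=name)
    except Exception:
        # Remove the broken voice instead of leaving it in the selector,
        # then surface the backend error to the caller.
        store.pop(name, None)
        raise
    return store[name]
```

The backend error still propagates to the caller, but the voice store never ends up holding a clone that cannot synthesize.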

For the dialogue panel, start a compatible local LLM server separately (for example Ollama on http://localhost:11434), choose a model in the page, then use Ask Assistant. The browser owns the short chat history; the example server only forwards one /v1/chat/completions request.
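That forwarding contract can be sketched with the standard library; the base URL and model name below are assumptions for a local Ollama, and build_chat_request is illustrative, not part of the example server:

```python
import json
import urllib.request

def build_chat_request(base_url, model, history, user_text):
    """Build the single /v1/chat/completions request the example server forwards.
    The caller keeps `history`, mirroring how the browser owns the chat log."""
    messages = history + [{"role": "user", "content": user_text}]
    body = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, messages

# Assumed local endpoint and model; urllib.request.urlopen(req) would return
# the completion once the server (for example Ollama) is running.
req, messages = build_chat_request("http://localhost:11434", "llama3.2", [], "Hello!")
```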

Minimal library usage

from abstractvoice import VoiceManager

vm = VoiceManager()
vm.speak("Hello from AbstractVoice.")

This uses OpenAI remote audio and reads OPENAI_API_KEY. For local inference:

from abstractvoice import VoiceManager

vm = VoiceManager(tts_engine="piper", stt_engine="faster_whisper")
vm.speak("Hello from the local stack.")

The public entry point is abstractvoice.VoiceManager (abstractvoice/voice_manager.py).

If you’re using AbstractVoice inside the AbstractFramework ecosystem, the intended architecture is:

  • AbstractCore runs agents and exposes OpenAI-compatible endpoints.
  • AbstractVoice is installed alongside it and provides TTS/STT as a capability backend plugin.

Pointers:

  • docs/api.md → “Integrations (AbstractFramework ecosystem)”
  • Capability plugin implementation: abstractvoice/integrations/abstractcore_plugin.py

If you’re integrating into the AbstractFramework ecosystem (AbstractCore / AbstractRuntime), see:

  • README.md (ecosystem overview)
  • docs/api.md (Integrations section; code pointers)

Minimal AbstractCore Server smoke test:

pip install "abstractcore[server]" abstractvoice
OPENAI_API_KEY=... python -m abstractcore.server.app

# TTS through AbstractCore + AbstractVoice
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from AbstractVoice through AbstractCore.","format":"wav"}' \
  --output hello.wav

# STT through AbstractCore + AbstractVoice
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@hello.wav" \
  -F "language=en"

If the server is configured with ABSTRACTCORE_SERVER_API_KEY, include Authorization: Bearer <key> in those requests.
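The same calls can be made from Python with only the standard library. This sketch just builds the TTS request (endpoint and payload match the curl example above; build_speech_request is illustrative):

```python
import json
import urllib.request

def build_speech_request(base_url, text, api_key=None, fmt="wav"):
    """Build a POST to /v1/audio/speech; add Bearer auth only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"input": text, "format": fmt}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech", data=body, headers=headers, method="POST"
    )

req = build_speech_request(
    "http://localhost:8000", "Hello from AbstractVoice through AbstractCore."
)
# urllib.request.urlopen(req).read() would return the WAV bytes once the
# AbstractCore server is running; write them to hello.wav as in the curl example.
```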

Enable microphone input (voice modes)

By default, the REPL does not start microphone capture. Enable it explicitly:

abstractvoice --voice-mode stop

From a source checkout:

python -m abstractvoice cli --voice-mode stop

Recommended modes (implemented in abstractvoice/vm/core.py and abstractvoice/recognition.py):

  • stop (recommended on speakers): keeps listening; during TTS it suppresses normal transcriptions but still lets you say “ok stop” to cut playback.
  • wait (strict turn-taking): pauses mic processing while speaking.
  • full (barge-in by speech): best with AEC or a headset; speakers can self-interrupt.
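The gating behind the stop mode can be sketched abstractly (the function and callback names here are illustrative, not the abstractvoice/vm/core.py API):

```python
def handle_transcript(text, tts_playing, stop_playback, handle_utterance):
    """Sketch of the 'stop' voice mode: while TTS is playing, suppress normal
    transcriptions but let the 'ok stop' phrase cut playback."""
    if tts_playing:
        if "ok stop" in text.lower():
            stop_playback()
        return None  # drop everything else heard during playback
    return handle_utterance(text)
```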

See docs/repl_guide.md for commands and docs/adr/0002_barge_in_interruption.md for rationale.

The REPL runs with allow_downloads=False, so prefetch models explicitly:

abstractvoice-prefetch --stt small
abstractvoice-prefetch --piper en

For cloning engines (optional / large), see docs/installation.md and docs/voices-and-licenses.md.

Current engine caveats, including AudioDiT direct TTS quality and OmniVoice profile stability, are tracked in docs/known-issues.md.

Clear local history or caches

Inside the REPL, /clear resets the LLM message history sent to the provider. /reset also resets active voice state. Saved memories only exist when you run /save <name>.

Terminal command history, cloned voices, and model caches live in separate local directories. The exact reset commands are in docs/faq.md.

Contributing / local development

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
python -m pytest -q

See also: CONTRIBUTING.md, SECURITY.md, and internal notes in docs/development.md.