Dependencies (direct) + models/weights¶
This document lists direct dependencies declared in pyproject.toml, plus the models/weights AbstractVoice may download.
Why direct-only?
- Direct dependencies (what we declare) are stable and reviewable.
- Transitive dependencies (what pip resolves) vary by platform and change over time; we call out only a few notable runtimes where it helps debugging.
If you need licensing guidance for model weights and voices, also read docs/voices-and-licenses.md.
Lightweight base dependencies (installed by default)¶
These are always installed with pip install abstractvoice.
- numpy
- Why: array operations and audio sample handling.
- Where: widely used across
abstractvoice/*(audio + cloning glue). - Repo:
https://github.com/numpy/numpy -
License:
https://github.com/numpy/numpy/blob/main/LICENSE.txt -
requests
- Why: HTTP calls (mainly integration / utility flows).
- Where: various network helpers; also used by some upstream libs.
- Repo:
https://github.com/psf/requests -
License:
https://github.com/psf/requests/blob/main/LICENSE -
appdirs
- Why: OS-appropriate cache/data directory locations.
- Where: used in cache/storage helpers.
- Repo:
https://github.com/ActiveState/appdirs - License:
https://github.com/ActiveState/appdirs/blob/master/LICENSE.txt
Optional extras (opt-in)¶
abstractvoice[local] — full local stack¶
This is the full local/offline handle. It includes local Piper TTS, faster-whisper STT, microphone/playback/VAD dependencies, AEC where supported, and the optional local cloning/TTS engine dependencies that are resolver-safe on the current Python version. Use granular extras below for smaller installs.
- piper-tts
- Why: local neural TTS backend.
- Where:
abstractvoice/adapters/tts_piper.py - Repo:
https://github.com/rhasspy/piper -
License:
https://github.com/rhasspy/piper/blob/master/LICENSE -
faster-whisper
- Why: local STT backend (fast Whisper inference).
- Where:
abstractvoice/adapters/stt_faster_whisper.py - Repo:
https://github.com/SYSTRAN/faster-whisper -
License:
https://github.com/SYSTRAN/faster-whisper/blob/master/LICENSE -
sounddevice
- Why: microphone/speaker I/O (PortAudio bindings).
- Where:
abstractvoice/recognition.py,abstractvoice/tts/tts_engine.py,abstractvoice/audio/* - Repo:
https://github.com/spatialaudio/python-sounddevice -
License:
https://github.com/spatialaudio/python-sounddevice/blob/master/LICENSE -
soundfile
- Why: WAV/FLAC/OGG read/write (libsndfile bindings).
- Where: cloning store normalization, tests, file utilities.
- Repo:
https://github.com/bastibe/python-soundfile -
License:
https://github.com/bastibe/python-soundfile/blob/master/LICENSE -
webrtcvad
- Why: VAD (voice activity detection) for
listen()and voice modes. - Where:
abstractvoice/vad/voice_detector.py,abstractvoice/recognition.py - Repo:
https://github.com/wiseman/py-webrtcvad - License:
https://github.com/wiseman/py-webrtcvad/blob/master/LICENSE.txt
abstractvoice[cloning], [chroma], [audiodit], [omnivoice] — model prefetch support¶
- huggingface_hub
- Why: download and cache optional engine model weights/artifacts.
- Where:
abstractvoice/cloning/engine_f5.py,abstractvoice/cloning/engine_chroma.py,abstractvoice/audiodit/runtime.py,abstractvoice/omnivoice/runtime.py - Repo:
https://github.com/huggingface/huggingface_hub - License:
https://github.com/huggingface/huggingface_hub/blob/main/LICENSE
abstractvoice[audiodit] — LongCat-AudioDiT (TTS + prompt-audio cloning)¶
Supported on Python 3.9+. Python 3.9 uses the latest compatible Transformers 4.x line; Python 3.10+ uses the Transformers 5.x line.
Python packages:
- torch
- Why: tensor runtime for AudioDiT inference.
- Where:
abstractvoice/audiodit/*,abstractvoice/cloning/engine_audiodit.py - Repo:
https://github.com/pytorch/pytorch -
License:
https://github.com/pytorch/pytorch/blob/main/LICENSE -
transformers
- Why: HF
PreTrainedModel/tokenizer APIs used by AudioDiT + text encoder. - Where:
abstractvoice/audiodit/*,abstractvoice/audiodit/runtime.py - Repo:
https://github.com/huggingface/transformers -
License:
https://github.com/huggingface/transformers/blob/main/LICENSE -
einops
- Why: model block reshaping utilities (used by AudioDiT modules).
- Where:
abstractvoice/audiodit/modeling_audiodit.py - Repo:
https://github.com/arogozhnikov/einops -
License:
https://github.com/arogozhnikov/einops/blob/master/LICENSE -
sentencepiece
- Why: tokenizer runtime (used by UMT5 tokenizer).
- Where: pulled by Transformers tokenizers depending on model.
- Repo:
https://github.com/google/sentencepiece -
License:
https://github.com/google/sentencepiece/blob/master/LICENSE -
safetensors
- Why: weight format used by HF models.
- Where: model load via Transformers.
- Repo:
https://github.com/huggingface/safetensors - License:
https://github.com/huggingface/safetensors/blob/main/LICENSE
Models/weights (Hugging Face):
- LongCat-AudioDiT-1B (weights + model license statement)
- Model:
https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B - License file:
https://huggingface.co/meituan-longcat/LongCat-AudioDiT-1B/blob/main/LICENSE - Cache:
~/.cache/huggingface(default) -
Prefetch:
python -m abstractvoice download --audioditorabstractvoice-prefetch --audiodit -
UMT5 text encoder (default:
google/umt5-base) - Model:
https://huggingface.co/google/umt5-base - License:
https://huggingface.co/google/umt5-base(model card metadata; license isapache-2.0) - Cache:
~/.cache/huggingface(default)
Vendored code:
- LongCat-AudioDiT upstream repository (MIT)
- Repo:
https://github.com/meituan-longcat/LongCat-AudioDiT - License:
https://github.com/meituan-longcat/LongCat-AudioDiT/blob/main/LICENSE - What we ship: a HuggingFace-compatible derived implementation under
abstractvoice/audiodit/*to avoidtrust_remote_code.
abstractvoice[omnivoice] — OmniVoice (omnilingual TTS + cloning / design)¶
Requires Python 3.10+ because the upstream omnivoice package declares that
Python floor.
Python packages:
- omnivoice
- Why: OmniVoice engine runtime + model wrapper.
- Where:
abstractvoice/omnivoice/*,abstractvoice/adapters/tts_omnivoice.py,abstractvoice/cloning/engine_omnivoice.py - Repo:
https://github.com/k2-fsa/OmniVoice -
License:
https://github.com/k2-fsa/OmniVoice/blob/master/LICENSE -
torch / torchaudio / torchvision
- Why: tensor runtime for OmniVoice inference.
- Repo:
https://github.com/pytorch/pytorch -
License:
https://github.com/pytorch/pytorch/blob/main/LICENSE -
transformers
- Repo:
https://github.com/huggingface/transformers -
License:
https://github.com/huggingface/transformers/blob/main/LICENSE -
accelerate
- Repo:
https://github.com/huggingface/accelerate - License:
https://github.com/huggingface/accelerate/blob/main/LICENSE
Models/weights (Hugging Face):
- OmniVoice (weights + model license statement)
- Model:
https://huggingface.co/k2-fsa/OmniVoice - Cache:
~/.cache/huggingface(default) - Prefetch:
python -m abstractvoice download --omnivoiceorabstractvoice-prefetch --omnivoice
abstractvoice[chroma] — Chroma-4B cloning (GPU-heavy)¶
Requires Python 3.10+ in AbstractVoice packaging because Chroma depends on Transformers 5.x.
- torch / torchaudio / torchvision
- Repo:
https://github.com/pytorch/pytorch -
License:
https://github.com/pytorch/pytorch/blob/main/LICENSE -
transformers
- Repo:
https://github.com/huggingface/transformers -
License:
https://github.com/huggingface/transformers/blob/main/LICENSE -
accelerate
- Repo:
https://github.com/huggingface/accelerate -
License:
https://github.com/huggingface/accelerate/blob/main/LICENSE -
av (PyAV)
- Repo:
https://github.com/PyAV-Org/PyAV -
License:
https://github.com/PyAV-Org/PyAV/blob/main/LICENSE.txt -
librosa / audioread / pillow / safetensors
- librosa repo/license:
https://github.com/librosa/librosa/https://github.com/librosa/librosa/blob/main/LICENSE.md - audioread repo/license:
https://github.com/beetbox/audioread/https://github.com/beetbox/audioread/blob/master/LICENSE - pillow repo/license:
https://github.com/python-pillow/Pillow/https://github.com/python-pillow/Pillow/blob/main/LICENSE - safetensors repo/license:
https://github.com/huggingface/safetensors/https://github.com/huggingface/safetensors/blob/main/LICENSE
Model:
- Chroma-4B
- Model:
https://huggingface.co/FlashLabs/Chroma-4B - License: see model card + repo files.
abstractvoice[cloning] — OpenF5 / F5-TTS cloning¶
Requires Python 3.10+ because current upstream f5-tts packages import
Python 3.10-only annotations during inference. On Python 3.9, use
abstractvoice[audiodit] with the audiodit cloning engine for tested
prompt-audio cloning.
- f5-tts
- Why: cloning engine (
f5_tts) backend. - Where:
abstractvoice/cloning/engine_f5.py - Repo:
https://github.com/SWivid/F5-TTS - License:
https://github.com/SWivid/F5-TTS/blob/main/LICENSE
abstractvoice[audio-fx] — Audio effects¶
- librosa
- Why: optional “speed change without pitch change”.
- Where: used by the audio FX path; surfaced via
docs/installation.md. - Repo:
https://github.com/librosa/librosa - License:
https://github.com/librosa/librosa/blob/main/LICENSE.md
abstractvoice[aec] — Acoustic echo cancellation¶
Requires Python 3.11+ because aec-audio-processing declares that Python
floor.
- aec-audio-processing
- Why: optional AEC for “full voice” barge-in on speakers.
- Where:
abstractvoice/aec/* - Repo:
https://github.com/shichaog/AEC-Audio-Processing - License:
https://github.com/shichaog/AEC-Audio-Processing/blob/main/LICENSE
abstractvoice[web] — Local web example¶
The base web extra installs only the browser server stack. Compose it with
abstractvoice[local] for the full local lab, or with granular engine extras
such as abstractvoice[omnivoice] for smaller installs.
- FastAPI
- Why: local browser example routes for TTS, transcription, assistant/user voice selection, voice cloning upload forms, and discussion playback (
abstractvoice web). - Where:
abstractvoice/examples/web_ui.py - Repo:
https://github.com/fastapi/fastapi -
License:
https://github.com/fastapi/fastapi/blob/master/LICENSE -
Uvicorn
- Why: local ASGI server for
abstractvoice web. - Where:
abstractvoice/examples/web_ui.py - Repo:
https://github.com/encode/uvicorn -
License:
https://github.com/encode/uvicorn/blob/master/LICENSE.md -
python-multipart
- Why: parse browser/OpenAI-compatible audio upload forms for transcription and voice cloning.
- Where:
abstractvoice/examples/web_ui.py - Repo:
https://github.com/Kludex/python-multipart - License:
https://github.com/Kludex/python-multipart/blob/master/LICENSE.txt
abstractvoice[stt] — Faster-Whisper STT¶
- faster-whisper
- Why: local STT adapter path.
- Where:
abstractvoice/adapters/stt_faster_whisper.py - Repo:
https://github.com/SYSTRAN/faster-whisper - License:
https://github.com/SYSTRAN/faster-whisper/blob/master/LICENSE
Notable transitive runtimes (debugging-oriented)¶
- CTranslate2 (used by
faster-whisper) - Repo:
https://github.com/OpenNMT/CTranslate2 -
License:
https://github.com/OpenNMT/CTranslate2/blob/master/LICENSE -
ONNX Runtime (used by Piper)
- Repo:
https://github.com/microsoft/onnxruntime - License:
https://github.com/microsoft/onnxruntime/blob/main/LICENSE
Development dependencies¶
These are used for local development only (not required at runtime).
- pytest:
https://github.com/pytest-dev/pytest—https://github.com/pytest-dev/pytest/blob/main/LICENSE - black:
https://github.com/psf/black—https://github.com/psf/black/blob/main/LICENSE - flake8:
https://github.com/PyCQA/flake8—https://github.com/PyCQA/flake8/blob/main/LICENSE