Getting Started
This guide helps you generate your first image using AbstractVision with the built-in backends:
- OpenAI-compatible HTTP: call a local/remote server that exposes OpenAI-shaped image endpoints
- Diffusers (local Python): Stable Diffusion / Qwen Image / FLUX 2 / GLM-Image (and other Diffusers pipelines)
- stable-diffusion.cpp (local GGUF): GGUF diffusion models via `sd-cli` (recommended for GPU backends like Metal/CUDA) or via pip-installable python bindings (often CPU-only fallback)
- Playground (web, optional): self-contained AbstractVision UI/API for local model loading and jobs (`/v1/vision/*`)
See also:
- Docs index: docs/README.md
- FAQ: docs/faq.md
- API reference: docs/api.md
- Architecture: docs/architecture.md
- Backends: docs/reference/backends.md
- Configuration (CLI/REPL env vars): docs/reference/configuration.md
- Capability registry: docs/reference/capabilities-registry.md
- Artifacts: docs/reference/artifacts.md
- AbstractCore integration: docs/reference/abstractcore-integration.md
0) Install
From PyPI:
pip install abstractvision
AbstractVision’s base install is lightweight. It includes the shared API, capability registry, artifact helpers, CLI, AbstractCore plugin entry point, and stdlib OpenAI-compatible HTTP backend. Local inference runtimes are explicit extras: install abstractvision[diffusers] for Torch/Diffusers, abstractvision[sdcpp] for the stable-diffusion.cpp python binding fallback, or abstractvision[local] for both.
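For example, the extras described above map directly onto pip installs:

pip install abstractvision                  # base: stdlib OpenAI-compatible HTTP backend only
pip install "abstractvision[diffusers]"     # add Torch/Diffusers for local Diffusers generation
pip install "abstractvision[sdcpp]"         # add the stable-diffusion.cpp python binding fallback
pip install "abstractvision[local]"         # both local runtimes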
If you see “missing pipeline class” errors for newer model families, install the diffusers-dev extra (or compatibility alias huggingface-dev) to get compatible dependencies, then install Diffusers from source (main).
For that newer-pipeline workflow from a repo checkout, install the diffusers-dev extra (compatible deps; does not include Diffusers main):
pip install -e ".[diffusers-dev]"
If you're installing AbstractVision from PyPI, you can install the extra directly:
pip install -U "abstractvision[diffusers-dev]"
Or install Diffusers from source directly:
pip install -U "git+https://github.com/huggingface/diffusers@main"
Sanity check:
python -c "import diffusers; print(diffusers.__version__)"
python -c "import diffusers; print('GlmImagePipeline', hasattr(diffusers, 'GlmImagePipeline')); print('Flux2KleinPipeline', hasattr(diffusers, 'Flux2KleinPipeline'))"
Offline alternative (if you already have a local Diffusers checkout):
pip install -U -e /path/to/diffusers
Or, from a repo checkout (run in the repo root):
pip install -e .
For contributor tooling from a repo checkout, use:
pip install -e ".[dev]"
For local Diffusers generation, install abstractvision[diffusers] before selecting the diffusers backend. Use diffusers-dev only when you need newer Diffusers-compatible dependency pins, and use sdcpp only when you want the optional stable-diffusion.cpp python binding fallback.
Optional extras:
| Extra | Use |
|---|---|
| `openai` | Empty official OpenAI provider intent marker; the HTTP backend is stdlib-only today. |
| `openai-compatible` | Empty local/remote OpenAI-shaped endpoint intent marker; the HTTP backend is stdlib-only today. |
| `diffusers` | Installs Torch/Diffusers and related packages for local Diffusers generation. |
| `sdcpp` | Installs stable-diffusion-cpp-python for the stable-diffusion.cpp pip binding fallback. |
| `huggingface` | Compatibility alias for the historical Diffusers backend dependency set. |
| `local` | Convenience extra for both local backend dependency sets, including sdcpp. |
| `all` | All runtime backend dependencies, without contributor tooling. |
| `abstractcore` | Empty compatibility marker; install AbstractCore in the host application environment. |
Contributor-only extras:
| Extra | Use |
|---|---|
| `diffusers-dev` / `huggingface-dev` | Looser dependency pins for newer/unreleased Diffusers pipelines. Install Diffusers main separately when a pipeline is not in the latest release. |
| `test` | Local test dependencies. |
| `docs` | Documentation build tooling. |
| `dev` | Full contributor workflow: tests, docs, packaging, formatting, release checks, and pre-commit. Do not use this as an application runtime profile. |
Optional (recommended): pre-download heavyweight model sets so the first run doesn’t trigger surprise multi-GB downloads:
python scripts/download_model_sets.py --list
python scripts/download_model_sets.py --plan --set sd15_diffusers
python scripts/download_model_sets.py --plan --set flux2_klein_4b_gguf
python scripts/download_model_sets.py --set sd15_diffusers
0.1 Hardware quickstart (macOS Metal vs NVIDIA CUDA vs CPU)
AbstractVision can run “locally” via two main routes:
- Diffusers backend: uses Torch device selection (`cuda`/`mps`/`cpu`).
- stable-diffusion.cpp backend (`sdcpp`): runs GGUF diffusion models using:
  - `sd-cli` (recommended when you want GPU backends like Metal or CUDA)
  - or `stable-diffusion-cpp-python` (convenient, but often CPU-only, especially on macOS)
macOS (Apple Silicon, Metal)
- Diffusers: start with Stable Diffusion 1.5, then move up:
  - `/backend diffusers runwayml/stable-diffusion-v1-5 mps float16`
  - `/backend diffusers black-forest-labs/FLUX.2-klein-4B mps float16` (requires Diffusers `main` today)
- GGUF (`sdcpp`): install `sd-cli` from the stable-diffusion.cpp releases and use CLI mode for Metal speed (a setup sketch follows below):
  - Download: https://github.com/leejet/stable-diffusion.cpp/releases
  - Pick the Darwin arm64 zip (example asset name: `sd-…-bin-Darwin-macOS-…-arm64.zip`)
  - If macOS blocks execution, clear quarantine: `xattr -dr com.apple.quarantine /path/to/sd-cli`
  - In the REPL, pass the full path as the last arg to `/backend sdcpp …` (see section 6).
If you see Using CPU backend in logs, you’re on CPU (it will work, but can be extremely slow for large models).
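A minimal sketch of that sd-cli setup, assuming you unpack the release zip into ~/bin/sdcpp (the archive name and paths below are placeholders, not real asset names):

unzip ~/Downloads/sd-*-bin-Darwin-macOS-*-arm64.zip -d ~/bin/sdcpp   # placeholder archive name
xattr -dr com.apple.quarantine ~/bin/sdcpp/sd-cli                    # clear the Gatekeeper quarantine flag
chmod +x ~/bin/sdcpp/sd-cli                                          # ensure the binary is executable
~/bin/sdcpp/sd-cli --help                                            # quick check that the binary runs at all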
NVIDIA (CUDA)
- Install a CUDA-enabled PyTorch wheel first (see https://pytorch.org/get-started/locally/).
- Use Diffusers with `cuda` + `float16`: `/backend diffusers runwayml/stable-diffusion-v1-5 cuda float16`
- For GGUF (`sdcpp`) on NVIDIA, use an `sd-cli` build compiled with CUDA (stable-diffusion.cpp releases provide multiple assets depending on the tag).
CPU-only
- Expect slow inference. Prefer smaller models and lower resolutions/steps.
- `sdcpp` via python bindings is the simplest “no external binary” option, but it will use whatever backend the wheel was compiled with (often CPU).
Recommended default models (VRAM guide)
If you run locally (Diffusers backend) and want a reliable starting point, here are practical model picks from the packaged capability registry (src/abstractvision/assets/vision_model_capabilities.json).
Notes:
- VRAM needs vary with resolution, dtype, and pipeline implementation. Treat this as a starting point.
- Some models are gated on Hugging Face and require accepting terms + setting HF_TOKEN.
- If you want a non-gated modern image model, try black-forest-labs/FLUX.2-klein-4B (but it currently requires installing Diffusers from source; see the FLUX section below).
| GPU VRAM | Recommended model id | Why | Install / quickstart |
|---|---|---|---|
| ≤ 16 GB | `runwayml/stable-diffusion-v1-5` | Small, stable, and widely compatible (Windows/Linux CUDA, macOS MPS) | `pip install "abstractvision[diffusers]"`, then run the REPL using the snippet below |
| 24-32 GB | `black-forest-labs/FLUX.2-klein-4B` | Newer non-gated model, much smaller than FLUX.2-dev | Install Diffusers main, then use the FLUX.2 klein section below |
| 32 GB | `stabilityai/stable-diffusion-3.5-large-turbo` | High-quality still images with low step counts (gated) | Accept model terms on HF, set HF_TOKEN, then use the SD3.5 section below |
| 64 GB | `Qwen/Qwen-Image-2512` | Strong prompt following and text rendering (large model) | Same as the Diffusers setup; if pipeline import fails, use Diffusers main (see the install section above) |
| 128 GB | `black-forest-labs/FLUX.2-dev` | Very high quality (very large; non-commercial license; gated) | Accept model terms on HF, set HF_TOKEN, then use the FLUX section below |
macOS Metal (Apple Silicon) quick picks:
- If you want local quantized FLUX.2 on Metal: prefer stable-diffusion.cpp (GGUF) via the `sdcpp` backend (see section 6).
- If you want a fast local FLUX.2 for iteration: `black-forest-labs/FLUX.2-klein-4B` (or GGUF equivalents) is usually the most practical starting point.
- If you want strong prompt following + text rendering: `Qwen/Qwen-Image-2512` (Diffusers on `mps`, start with `float16`).
Recommended default (local, cross-platform) — Stable Diffusion 1.5:
pip install "abstractvision[diffusers]"
huggingface-cli download runwayml/stable-diffusion-v1-5
export ABSTRACTVISION_BACKEND=diffusers
export ABSTRACTVISION_MODEL_ID=runwayml/stable-diffusion-v1-5
export ABSTRACTVISION_DIFFUSERS_DEVICE=auto
abstractvision repl
Then type a prompt (plain text runs /t2i), or use /t2i "..." --open.
Jump to detailed recipes:
- Stable Diffusion 1.5: section 1) First local image (Diffusers)
- FLUX.2-klein-4B: section 2) Next small model (FLUX.2-klein-4B)
- OpenAI-compatible HTTP: section 2.1) OpenAI-compatible HTTP
- Qwen Image: section 3) Qwen Image (Diffusers)
- FLUX 2 details: section 4) FLUX 2 (Diffusers)
- SD3.5: section 5) Stable Diffusion 3.5 (Diffusers, gated)
1) First local image (Diffusers)
The REPL is cache-only by default, so it will not download model weights. Download the model separately first:
huggingface-cli download runwayml/stable-diffusion-v1-5
# Required for this local Diffusers recipe.
export ABSTRACTVISION_BACKEND=diffusers
export ABSTRACTVISION_MODEL_ID=runwayml/stable-diffusion-v1-5
export ABSTRACTVISION_DIFFUSERS_DEVICE=auto
# auto prefers cuda, then mps, then cpu. You can also set cuda/mps/cpu explicitly.
# Optional: override dtype (auto defaults to float16 on MPS for broad compatibility).
# - `float16` is usually the best speed/compatibility tradeoff on Apple Silicon
# - `bfloat16` can work for some models, but can trigger dtype-mismatch errors in some pipelines
# - `float32` is the most stable, but can require much more memory
# export ABSTRACTVISION_DIFFUSERS_TORCH_DTYPE=bfloat16
# export ABSTRACTVISION_DIFFUSERS_TORCH_DTYPE=float16
# export ABSTRACTVISION_DIFFUSERS_TORCH_DTYPE=float32
Quick sanity check (device):
python -c "import torch; print('mps', torch.backends.mps.is_available(), 'cuda', torch.cuda.is_available())"
If you have an NVIDIA GPU but cuda is False, you likely installed a CPU-only PyTorch build. Follow the PyTorch install guide to install a CUDA-enabled wheel, then re-run the sanity check: https://pytorch.org/get-started/locally/.
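For example, a CUDA 12.1 wheel install looks like this (illustrative only; take the exact index URL and CUDA version from the PyTorch selector):

pip install torch --index-url https://download.pytorch.org/whl/cu121
python -c "import torch; print('cuda', torch.cuda.is_available())"   # should now print True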
Start the REPL:
abstractvision repl
With ABSTRACTVISION_BACKEND=diffusers and ABSTRACTVISION_MODEL_ID set above, the REPL uses runwayml/stable-diffusion-v1-5:
/set guidance_scale 7
/set seed 42
/t2i "a cinematic photo of a red fox in snow" --width 512 --height 512 --steps 10 --open
Adjust the defaults with /set … values, or pass flags per request:
/t2i "a watercolor painting of a lighthouse" --width 512 --height 512 --steps 20 --seed 123 --guidance-scale 6.5 --open
2) Next small model (FLUX.2-klein-4B)
After Stable Diffusion 1.5 works, black-forest-labs/FLUX.2-klein-4B is the next recommended local test. It is
non-gated and much smaller than FLUX.2-dev, but it currently needs Diffusers from source because released Diffusers
may not include Flux2KleinPipeline.
pip install -U "abstractvision[diffusers-dev]"
pip install -U "git+https://github.com/huggingface/diffusers@main"
Quick REPL test:
/backend diffusers black-forest-labs/FLUX.2-klein-4B mps float16
/t2i "a product photo of a matte black espresso machine" --width 1024 --height 1024 --steps 4 --guidance-scale 1.0 --open
Use cuda float16 on NVIDIA, or auto if you want AbstractVision/Torch to pick the device.
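For example, the same test on an NVIDIA machine:

/backend diffusers black-forest-labs/FLUX.2-klein-4B cuda float16
/t2i "a product photo of a matte black espresso machine" --width 1024 --height 1024 --steps 4 --guidance-scale 1.0 --open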
2.1) OpenAI-compatible HTTP
Use this path if you already have a server that exposes OpenAI-shaped image endpoints (e.g. a local model server).
For unknown or local OpenAI-compatible servers, AbstractVision forwards local extension fields such as steps, seed, guidance_scale, width, and height. For the real OpenAI API and known GPT image models, it suppresses unsupported local-only fields and sends the narrower OpenAI request shape.
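As an illustrative sketch of what “OpenAI-shaped with extension fields” means (the endpoint path and payload here are assumptions for a generic local server, not a verified trace of AbstractVision’s requests):

curl -s http://localhost:1234/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a watercolor painting of a lighthouse", "width": 512, "height": 512, "steps": 10, "seed": 42, "guidance_scale": 6.5}'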
List provider-advertised models explicitly:
abstractvision provider-models --openai --task text_to_image
abstractvision provider-models --base-url http://localhost:1234/v1 --task text_to_image
One-shot (stores output via LocalAssetStore and prints an artifact ref + file path):
abstractvision t2i --base-url http://localhost:1234/v1 "a watercolor painting of a lighthouse" --width 512 --height 512 --steps 10 --open
Interactive REPL:
abstractvision repl
/backend openai http://localhost:1234/v1
/t2i "a watercolor painting of a lighthouse" --width 512 --height 512 --steps 10 --open
If your server also supports video endpoints, configure them via ABSTRACTVISION_TEXT_TO_VIDEO_PATH / ABSTRACTVISION_IMAGE_TO_VIDEO_PATH (see docs/reference/configuration.md).
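For example (the path values below are placeholders; use whatever routes your server actually exposes):

export ABSTRACTVISION_TEXT_TO_VIDEO_PATH=/v1/videos/generations   # placeholder path
export ABSTRACTVISION_IMAGE_TO_VIDEO_PATH=/v1/videos/edits        # placeholder path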
3) Qwen Image (Diffusers)
Qwen Image models in the registry:
- `Qwen/Qwen-Image` (older)
- `Qwen/Qwen-Image-2512` (newer)
Use the same Diffusers flow:
/backend diffusers Qwen/Qwen-Image-2512 mps float16
/t2i "a poster with the word 'ABSTRACT' rendered perfectly in bold typography" --width 512 --height 512 --steps 10 --guidance-scale 2.5 --open
Notes:
- Qwen Image models are large.
- For best results, prefer the model card’s recommended sizes (e.g. 1328x1328 for 1:1). For quick tests, 512x512 is fine.
- On Apple Silicon (MPS), start with fp16 (default; best compatibility):
  - ABSTRACTVISION_DIFFUSERS_TORCH_DTYPE=float16 (or in the REPL: /backend diffusers Qwen/Qwen-Image-2512 mps float16)
- If you get NaNs/black images, try fp32 (this can require very large peak memory during load):
  - ABSTRACTVISION_DIFFUSERS_TORCH_DTYPE=float32 (or in the REPL: /backend diffusers Qwen/Qwen-Image-2512 mps float32)
- On Apple Silicon (MPS), AbstractVision upcasts the VAE to fp32 when using fp16 to avoid common “black image” issues.
- Automatic fp32 retry on all-black output is enabled by default on MPS (can increase peak memory):
  - disable with ABSTRACTVISION_DIFFUSERS_AUTO_RETRY_FP32=0
- In AbstractVision, --guidance-scale is mapped to Qwen’s true_cfg_scale when using Qwen pipelines (CFG). If you set --guidance-scale but don’t provide a negative_prompt, AbstractVision passes a placeholder negative prompt (" ") so CFG is actually enabled.
Tip: keep guidance_scale relatively low for some modern DiT models.
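A sketch of supplying your own negative prompt instead of the placeholder, assuming extra flags are forwarded as pipeline kwargs as described in section 6.4 (the --negative-prompt flag name is an assumption, mapped to the pipeline’s negative_prompt argument):

/backend diffusers Qwen/Qwen-Image-2512 mps float16
/t2i "a poster with the word 'ABSTRACT' rendered perfectly in bold typography" --steps 20 --guidance-scale 4 --negative-prompt "blurry, low quality, watermark" --open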
3.1) LoRA + Rapid-AIO (Diffusers)
AbstractVision can apply LoRA adapters (Diffusers adapter system) and optionally swap in a distilled “Rapid-AIO” transformer for faster Qwen Image Edit inference.
These features follow the Diffusers download setting. The REPL is cache-only by default, so pre-download adapters or Rapid-AIO weights separately before using repo ids here. If you intentionally want runtime downloads, set:
export ABSTRACTVISION_DIFFUSERS_ALLOW_DOWNLOAD=1
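For example, to pre-download the model, LoRA, and Rapid-AIO repos used below into your HF cache:

huggingface-cli download Qwen/Qwen-Image-Edit-2511
huggingface-cli download lightx2v/Qwen-Image-Edit-2511-Lightning
huggingface-cli download linoyts/Qwen-Image-Edit-Rapid-AIO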
LoRA example (REPL; note: loras_json is forwarded via request.extra):
/backend diffusers Qwen/Qwen-Image-Edit-2511 mps float16
/t2i "a cinematic photo of a red fox in snow" --steps 8 --guidance-scale 1 --loras_json '[{"source":"lightx2v/Qwen-Image-Edit-2511-Lightning","scale":1.0}]' --open
Rapid-AIO example (distilled transformer override; Qwen Image Edit):
/backend diffusers Qwen/Qwen-Image-Edit-2511 mps float16
/t2i "a cinematic photo of a red fox in snow" --steps 4 --guidance-scale 1 --rapid_aio_repo linoyts/Qwen-Image-Edit-Rapid-AIO --open
4) FLUX 2 (Diffusers)
FLUX 2 models in the registry:
- `black-forest-labs/FLUX.2-klein-4B` (Apache-2.0, not gated)
- `black-forest-labs/FLUX.2-klein-9B` (non-commercial license, gated on Hugging Face)
- `black-forest-labs/FLUX.2-dev` (non-commercial license, gated on Hugging Face)
Sanity check:
python -c "import diffusers; print(diffusers.__version__)"
Notes:
- FLUX.2-dev uses Diffusers Flux2Pipeline and works on released Diffusers (0.36+).
- FLUX.2-klein-4B and FLUX.2-klein-9B use Flux2KleinPipeline, which is not available in the released Diffusers (0.36.0). It currently requires installing Diffusers from source (with the diffusers-dev extra for compatible dependency pins):
  - pip install -U "abstractvision[diffusers-dev]"
  - pip install -U "git+https://github.com/huggingface/diffusers@main"
Recommended first FLUX example (FLUX.2-klein-4B, not gated):
/backend diffusers black-forest-labs/FLUX.2-klein-4B mps float16
/t2i "a product photo of a matte black espresso machine" --width 1024 --height 1024 --steps 4 --guidance-scale 1.0 --seed 0 --open
Example (FLUX.2-klein-9B, gated; requires Diffusers main and HF access):
/backend diffusers black-forest-labs/FLUX.2-klein-9B mps float16
/t2i "a minimalist product photo of a matte black espresso machine, studio lighting" --width 1024 --height 1024 --steps 4 --guidance-scale 1.0 --seed 0 --open
Example (FLUX.2-dev, gated; you must pre-download it into your HF cache first):
/backend diffusers black-forest-labs/FLUX.2-dev mps
/t2i "a minimalist product photo of a matte black espresso machine, studio lighting" --width 1024 --height 1024 --steps 4 --guidance-scale 1.0 --seed 0 --open
If you use gated models (like FLUX.2-dev), you typically must accept the model’s terms on Hugging Face and set HF_TOKEN in your environment.
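A typical pre-download flow for the gated FLUX.2-dev weights (after accepting the model terms in your browser):

export HF_TOKEN=...   # your Hugging Face access token
huggingface-cli download black-forest-labs/FLUX.2-dev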
5) Stable Diffusion 3.5 (Diffusers, gated)
SD3.5 models (all gated on Hugging Face):
- `stabilityai/stable-diffusion-3.5-large-turbo`
- `stabilityai/stable-diffusion-3.5-large`
- `stabilityai/stable-diffusion-3.5-medium`
1) Accept the model terms on Hugging Face (in your browser).
2) Export a token:
export HF_TOKEN=... # your Hugging Face access token
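3) Pre-download the gated weights (the REPL is cache-only by default):

huggingface-cli download stabilityai/stable-diffusion-3.5-large-turbo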
Then in the REPL:
/backend diffusers stabilityai/stable-diffusion-3.5-large-turbo mps
/t2i "a modern product photo of a watch, studio lighting" --width 1024 --height 1024 --steps 6 --guidance-scale 4 --seed 42 --open
Turbo models are usually best with low step counts (e.g. ~4–8).
6) GGUF diffusion models (stable-diffusion.cpp)
If you downloaded a GGUF diffusion model (like Qwen Image GGUF or FLUX.2 GGUF), Diffusers cannot load it. Use the stable-diffusion.cpp backend instead (either via pip-installed python bindings or sd-cli).
6.1 Install stable-diffusion.cpp runtime
The base pip install abstractvision path does not install local inference runtimes. Use one of these explicit stable-diffusion.cpp runtime choices:
pip install "abstractvision[sdcpp]"
This pip binding path is convenient, but it may require a native build or run CPU-only depending on how the wheel was built.
Alternative (external executable):
- Download `sd-cli` from: https://github.com/leejet/stable-diffusion.cpp/releases
- Ensure `sd-cli` is in your `PATH` (or pass a full path as the last arg to `/backend sdcpp …`).
On macOS (Apple Silicon), sd-cli is the recommended path to get Metal acceleration. If you see Using CPU backend,
install sd-cli and re-run in CLI mode.
6.2 Single-file Stable Diffusion model
This is the lowest-friction sdcpp shape: one model file plus an optional sd-cli path. Use it for Stable Diffusion
1.x/2.x/SDXL checkpoints or GGUF conversions that stable-diffusion.cpp can load as --model.
abstractvision repl
/backend sdcpp /path/to/sd-v1-5.gguf /path/to/sd-cli
/t2i "a watercolor painting of a lighthouse" --width 512 --height 512 --steps 10 --open
If sd-cli is already in your PATH, you can omit the final /path/to/sd-cli argument. If it is not available,
AbstractVision falls back to stable-diffusion-cpp-python when that package is installed, for example through pip install "abstractvision[sdcpp]".
6.3 Download the required Qwen Image VAE
curl -L -o ./qwen_image_vae.safetensors \
https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors
6.4 Run Qwen Image with sdcpp component mode
abstractvision repl
Then:
/backend sdcpp /path/to/qwen-image-2512-Q4_K_M.gguf ./qwen_image_vae.safetensors /path/to/Qwen2.5-VL-7B-Instruct-*.gguf /path/to/sd-cli
/set width 1024
/set height 1024
/t2i "a cinematic photo of a red fox in snow" --sampling-method euler --offload-to-cpu --diffusion-fa --flow-shift 3 --open
Any extra --flag you pass (like --sampling-method euler) is forwarded to the backend as extra.
- CLI mode: forwarded to sd-cli
- Python bindings mode: keys are mapped to python binding kwargs when supported; unsupported keys are ignored (see ../src/abstractvision/backends/stable_diffusion_cpp.py)
- Diffusers backend: only forwards kwargs that the pipeline __call__ accepts; unknown keys are ignored (see ../src/abstractvision/backends/huggingface_diffusers.py)
6.5 FLUX.2-klein-4B (GGUF) example
Stable-diffusion.cpp supports FLUX.2-klein-4B GGUF when you provide:
- a GGUF diffusion model (e.g. `flux-2-klein-4b-Q8_0.gguf`)
- the FLUX.2 VAE (safetensors)
- an LLM text encoder (GGUF), e.g. `Qwen3-4B-Q4_K_M.gguf`
You can download the matching set with:
python scripts/download_model_sets.py --set flux2_klein_4b_gguf
Example (REPL):
/backend sdcpp /path/to/flux-2-klein-4b-Q8_0.gguf /path/to/flux2_ae.safetensors /path/to/Qwen3-4B-Q4_K_M.gguf /path/to/sd-cli
/t2i "a product photo of a matte black espresso machine" --steps 4 --guidance-scale 1.0 --sampling-method euler --diffusion-fa --offload-to-cpu --open
FLUX.2-dev and Qwen Image GGUF are still documented here as heavier follow-ups, but try the single-file Stable Diffusion path or klein-4B first when you are testing a fresh machine.
7) Web UI testing (optional): Playground
This repo includes a self-contained web UI and local API server. It is owned by AbstractVision and does not require AbstractCore. Treat it as a local/dev testing surface; use AbstractCore/Gateway for production routing, authentication, and browser-origin policy.
7.1 Start the playground
abstractvision playground --port 8091
Open:
http://127.0.0.1:8091/vision_playground.html
In the UI:
- The API Base URL defaults to the same process that serves the page
- Select a cached model and load it
- Generate (T2I) or upload an input image (I2I) and run edits
For the endpoint list, see playground/README.md.