# AbstractVision architecture
AbstractVision is a model-agnostic Python layer that standardizes generative vision outputs behind a small API: text→image, image→image (and optionally video when a backend supports it).
This document describes the current code in this repo and links to the supporting reference docs.
See also:

- Docs index: docs/README.md
- Getting started: docs/getting-started.md
- API reference: docs/api.md
- FAQ: docs/faq.md
- Backends: docs/reference/backends.md
- Capability registry: docs/reference/capabilities-registry.md
- Artifacts: docs/reference/artifacts.md
- AbstractCore integration: docs/reference/abstractcore-integration.md
## AbstractFramework ecosystem (positioning)
AbstractVision is one component in the AbstractFramework ecosystem:
- AbstractFramework (project hub): https://github.com/lpalbou/AbstractFramework
- AbstractCore (orchestration + tool calling): https://github.com/lpalbou/abstractcore
- AbstractRuntime (runtime services, including artifact storage): https://github.com/lpalbou/abstractruntime
Where AbstractVision fits:
- AbstractVision focuses on producing images/videos (generators).
- AbstractCore focuses on orchestration, tool calling, and higher-level workflows (it can discover AbstractVision via the plugin entry point in pyproject.toml and src/abstractvision/integrations/abstractcore_plugin.py).
- AbstractRuntime provides runtime services and an artifact store interface; RuntimeArtifactStoreAdapter bridges AbstractVision to an AbstractRuntime-style artifact store (src/abstractvision/artifacts.py).
## Scope (and non-goals)
AbstractVision focuses on producing images/videos.
It is not the owner of “LLM image/video input attachments” (multimodal inputs to LLMs); those concerns live in higher-level layers (e.g., AbstractCore).
## Key components (with evidence pointers)
- Orchestrator: `VisionManager`
  - Delegates execution to a backend.
  - Optionally gates requests using the capability registry when `model_id` is set.
  - Optionally stores outputs and returns artifact refs when `store` is set.
- Backend contract: `VisionBackend`
  - Implementations live in `../src/abstractvision/backends/`.
- Capability registry: `VisionModelCapabilitiesRegistry`
  - Loads packaged data: `vision_model_capabilities.json`.
- Artifact outputs: `MediaStore`, `LocalAssetStore`, `RuntimeArtifactStoreAdapter`
  - Artifact ref helper: `is_artifact_ref()` (see `../src/abstractvision/artifacts.py`).
- CLI/REPL: `abstractvision` entry point (`../src/abstractvision/cli.py`)
  - Lets you inspect the registry and manually test generation backends.
- AbstractCore integration:
  - Capability plugin: `../src/abstractvision/integrations/abstractcore_plugin.py` (registered in `pyproject.toml`)
  - Tool helpers: `../src/abstractvision/integrations/abstractcore.py`
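The backend contract above can be pictured as a small abstract base class. This is an illustrative sketch only: the real `VisionBackend` and `GeneratedAsset` live in `../src/abstractvision/backends/base_backend.py` and `../src/abstractvision/types.py`, and their exact fields and method signatures may differ.

```python
# Hedged sketch of the backend contract; field and method names are
# assumptions based on this document, not the repo's exact API.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GeneratedAsset:
    """Raw output: bytes plus a MIME type (mirrors the flow diagram below)."""
    data: bytes
    mime_type: str


class VisionBackend(ABC):
    """Minimal shape of the backend contract described above."""

    @abstractmethod
    def text_to_image(self, request) -> GeneratedAsset: ...

    @abstractmethod
    def image_to_image(self, request) -> GeneratedAsset: ...

    def get_capabilities(self):
        """Optional best-effort capability report; None means 'unknown'."""
        return None


class EchoBackend(VisionBackend):
    """Trivial stand-in backend, useful for exercising the contract in tests."""

    def text_to_image(self, request) -> GeneratedAsset:
        return GeneratedAsset(data=str(request).encode(), mime_type="image/png")

    def image_to_image(self, request) -> GeneratedAsset:
        return GeneratedAsset(data=b"edited", mime_type="image/png")
```

A stand-in like `EchoBackend` is also how a new backend would start: satisfy the abstract methods first, then override `get_capabilities()` if the backend can report what it supports.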
## High-level flow (library mode)
```mermaid
flowchart LR
    Caller["Caller<br/>(Python / CLI)"] --> VM[VisionManager]
    VM -->|request dataclass| BE[VisionBackend]
    BE -->|GeneratedAsset| VM
    VM -->|store set| Store["MediaStore<br/>(LocalAssetStore / Runtime adapter)"]
    Store --> Ref[Artifact ref dict]
    VM -->|store not set| Asset["GeneratedAsset<br/>(bytes + mime)"]
```
Notes (anchored in code):

- `VisionManager` creates request dataclasses like `ImageGenerationRequest` / `ImageEditRequest` (`../src/abstractvision/types.py`).
- When `store` is set, `VisionManager._maybe_store()` calls `store.store_bytes(...)` and returns an artifact ref dict (`../src/abstractvision/vision_manager.py`, `../src/abstractvision/artifacts.py`).
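The store-or-return branch in the notes above can be sketched as a standalone function. The ref-dict shape and the `store_bytes(...)` signature here are assumptions for illustration; the authoritative logic is `VisionManager._maybe_store()` in the repo.

```python
# Hedged sketch of the "store set vs. not set" decision described above.
def maybe_store(asset_bytes: bytes, mime: str, store=None):
    """If a store is configured, persist the bytes and return an artifact
    ref dict; otherwise return the raw (bytes, mime) pair."""
    if store is not None:
        artifact_id = store.store_bytes(asset_bytes, mime)
        # The exact shape of the ref dict is an assumption for illustration.
        return {"artifact_id": artifact_id, "mime_type": mime}
    return (asset_bytes, mime)


class InMemoryStore:
    """Toy MediaStore-like object for the example, not the repo's class."""

    def __init__(self):
        self.blobs = {}

    def store_bytes(self, data: bytes, mime: str) -> str:
        key = f"asset-{len(self.blobs)}"
        self.blobs[key] = (data, mime)
        return key
```

The design point this illustrates: callers that pass a store get a small, serializable reference instead of raw bytes, which is what higher-level layers (e.g. AbstractRuntime's artifact store) want to move around.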
## Capability gating (model-level) vs runtime gating (backend-level)
AbstractVision separates two kinds of "can I do this?" checks:

1) Model-level gating (optional): "Does model X support task Y?"
   - Implemented by `VisionModelCapabilitiesRegistry.require_support(...)` (`../src/abstractvision/model_capabilities.py`)
   - Used by `VisionManager._require_model_support(...)` when `VisionManager.model_id` is set (`../src/abstractvision/vision_manager.py`)
2) Backend-level gating (best-effort): "Does this configured backend support task Y / mask edits?"
   - Backends may implement `get_capabilities()` returning `VisionBackendCapabilities` (`../src/abstractvision/types.py`)
   - Enforced by `VisionManager._require_backend_support(...)` and mask checks in `VisionManager.edit_image(...)` (`../src/abstractvision/vision_manager.py`)
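The two gates can be sketched side by side. The registry shape, function names, and exception type below are simplified assumptions; see `model_capabilities.py` and `vision_manager.py` for the real checks.

```python
# Hedged illustration of the two gating layers described above.
class CapabilityNotSupportedError(Exception):
    """Stand-in for the error the real checks raise."""


def require_model_support(registry: dict, model_id: str, task: str) -> None:
    """Model-level gate: consult a capabilities table keyed by model id."""
    tasks = registry.get(model_id, set())
    if task not in tasks:
        raise CapabilityNotSupportedError(f"{model_id} does not support {task}")


def require_backend_support(backend_caps, task: str) -> None:
    """Backend-level gate: best-effort, skipped when caps are unknown."""
    if backend_caps is None:
        return  # backend reported nothing; proceed optimistically
    if task not in backend_caps:
        raise CapabilityNotSupportedError(f"backend does not support {task}")
```

Note the asymmetry this captures: the model-level gate only runs when a `model_id` is configured, and the backend-level gate is deliberately permissive when a backend does not report capabilities.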
## Backend reality (what runs today)
The public API includes `text_to_video`, `image_to_video`, and `multi_view_image`, but backend support is currently limited:

- Built-in backends implement images (`text_to_image`, `image_to_image`):
  - OpenAI-compatible HTTP backend (`../src/abstractvision/backends/openai_compatible.py`)
  - Diffusers backend (`../src/abstractvision/backends/huggingface_diffusers.py`)
  - stable-diffusion.cpp backend (`../src/abstractvision/backends/stable_diffusion_cpp.py`)
- Video is supported only by the OpenAI-compatible backend, and only when `text_to_video_path` / `image_to_video_path` are configured (`../src/abstractvision/backends/openai_compatible.py`).
- No built-in backend implements `multi_view_image` yet (they raise `CapabilityNotSupportedError` in `generate_angles(...)`).
For a detailed support matrix and configuration options, see docs/reference/backends.md.
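The "video only when a path is configured" rule above can be sketched as follows. The class is a toy stand-in, not the real backend; only the `text_to_video_path` option name comes from this document.

```python
# Hedged sketch: video endpoints are opt-in via an explicitly configured path.
class CapabilityNotSupportedError(Exception):
    """Stand-in for the error the real backends raise."""


class OpenAICompatibleLikeBackend:
    """Toy model of the configuration-gated video support described above."""

    def __init__(self, text_to_video_path=None):
        self.text_to_video_path = text_to_video_path

    def text_to_video(self, prompt: str) -> str:
        if not self.text_to_video_path:
            raise CapabilityNotSupportedError(
                "text_to_video requires text_to_video_path to be configured"
            )
        # Placeholder for the real HTTP call to the configured endpoint.
        return f"POST {self.text_to_video_path}"
```

The effect is that a plain image-only deployment fails fast with a clear capability error rather than sending a request to an endpoint that does not exist.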
## AbstractCore plugin flow (framework integration)
AbstractVision can be discovered by AbstractCore via an entry point:
`[project.entry-points."abstractcore.capabilities_plugins"]` in `../pyproject.toml`.
```mermaid
flowchart LR
    AC[AbstractCore] -->|loads entry point| Plugin["AbstractVision plugin<br/>register(...)"]
    Plugin --> Cap["VisionCapability<br/>(t2i/i2i/t2v/i2v)"]
    Cap --> VM[VisionManager]
    VM --> BE{Configured backend}
    BE --> HTTP[OpenAI-compatible HTTP<br/>OpenAI or local /v1 server]
    BE --> HF[Local Diffusers]
    BE --> SDCPP[Local stable-diffusion.cpp]
```
Current plugin behavior (evidence in `../src/abstractvision/integrations/abstractcore_plugin.py`):

- Default: OpenAI HTTP with backend id `abstractvision:openai`; the legacy backend id `abstractvision:openai-compatible` remains registered and preserves compatible-endpoint defaults when selected directly.
- Compatible endpoints should set `ABSTRACTVISION_BACKEND=openai-compatible` plus `ABSTRACTVISION_BASE_URL`; legacy base-url-only configs still resolve as compatible endpoints.
- Local Diffusers and stable-diffusion.cpp are supported when `vision_backend` / `ABSTRACTVISION_BACKEND` selects `diffusers` or `sdcpp`.
- Configuration is read from `owner.config` keys like `vision_base_url`, `vision_model_id`, and `vision_backend`, plus backend-specific keys, then falls back to `ABSTRACTVISION_*` and standard OpenAI env vars where relevant.
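The layered lookup above can be sketched as a small resolver: an `owner.config` key wins, then the corresponding `ABSTRACTVISION_*` environment variable, then a default. The real plugin's precedence rules may be more nuanced; this only illustrates the fallback idea.

```python
# Hedged sketch of config-then-environment fallback resolution.
import os


def resolve(config: dict, key: str, env_var: str, default=None):
    """owner.config wins, then the environment, then the default."""
    if key in config:
        return config[key]
    return os.environ.get(env_var, default)
```

For example, `resolve({"vision_backend": "diffusers"}, "vision_backend", "ABSTRACTVISION_BACKEND")` picks the config value even when the environment variable is also set, matching the "config first" ordering described above.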
## Extending AbstractVision (practical steps)
- Add a new backend:
  1) Implement `VisionBackend` (`../src/abstractvision/backends/base_backend.py`)
  2) Add capability reporting via `get_capabilities()` when you can (optional)
  3) Add tests under `../tests/`
- Update the registry:
  1) Edit `../src/abstractvision/assets/vision_model_capabilities.json`
  2) Validate by running the test suite (the validator is wired into the registry loader)
  3) Use `abstractvision show-model <id>` to sanity-check task/param printing (`../src/abstractvision/cli.py`)
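For orientation, a registry entry might look roughly like the JSON below. The field names (`models`, `tasks`, `params`) and the model id are illustrative assumptions; the packaged `vision_model_capabilities.json` and its validator define the actual schema, so check an existing entry before adding your own.

```json
{
  "models": {
    "example/model-id": {
      "tasks": ["text_to_image", "image_to_image"],
      "params": {
        "size": ["512x512", "1024x1024"]
      }
    }
  }
}
```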