
# AbstractVision architecture

AbstractVision is a model-agnostic Python layer that standardizes generative vision outputs behind a small API: text→image, image→image (and optionally video when a backend supports it).
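To make the "small API" idea concrete, here is a minimal sketch of the backend-facing surface as a structural protocol. The names `GeneratedAsset` and `VisionBackend` come from this document, but the method names, signatures, and fields shown are illustrative assumptions, not the library's actual interface:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GeneratedAsset:
    """Raw output of a generation call (name from the docs; fields assumed)."""
    data: bytes
    mime_type: str


class VisionBackend(Protocol):
    """Assumed shape of a backend behind the standardized API."""

    def text_to_image(self, prompt: str, **options) -> GeneratedAsset: ...

    def image_to_image(self, image: bytes, prompt: str, **options) -> GeneratedAsset: ...


class EchoBackend:
    """Toy backend used only to show the call shape."""

    def text_to_image(self, prompt: str, **options) -> GeneratedAsset:
        return GeneratedAsset(data=prompt.encode(), mime_type="image/png")

    def image_to_image(self, image: bytes, prompt: str, **options) -> GeneratedAsset:
        return GeneratedAsset(data=image, mime_type="image/png")


asset = EchoBackend().text_to_image("a red cube")
```

Because callers only depend on this protocol, backends (HTTP, Diffusers, stable-diffusion.cpp) can be swapped without changing caller code.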

This document describes the current code in this repo and links to the supporting reference docs.

See also:

- Docs index: `docs/README.md`
- Getting started: `docs/getting-started.md`
- API reference: `docs/api.md`
- FAQ: `docs/faq.md`
- Backends: `docs/reference/backends.md`
- Capability registry: `docs/reference/capabilities-registry.md`
- Artifacts: `docs/reference/artifacts.md`
- AbstractCore integration: `docs/reference/abstractcore-integration.md`

## AbstractFramework ecosystem (positioning)

AbstractVision is one component in the AbstractFramework ecosystem:

Where AbstractVision fits:

- AbstractVision focuses on producing images/videos (generators).
- AbstractCore focuses on orchestration, tool calling, and higher-level workflows; it can discover AbstractVision via the plugin entry point in `pyproject.toml` and `src/abstractvision/integrations/abstractcore_plugin.py`.
- AbstractRuntime provides runtime services and an artifact store interface; `RuntimeArtifactStoreAdapter` bridges AbstractVision to an AbstractRuntime-style artifact store (`src/abstractvision/artifacts.py`).
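The bridging role of `RuntimeArtifactStoreAdapter` can be sketched as a thin delegation layer. The adapter class name and the `store_bytes(...)` hook appear in this document; everything else below (the runtime-store shape, method names, return fields) is an assumption for illustration:

```python
from typing import Any, Dict, Protocol


class RuntimeStore(Protocol):
    """Stand-in for an AbstractRuntime-style artifact store (assumed shape)."""

    def put(self, data: bytes, metadata: Dict[str, Any]) -> str: ...


class RuntimeArtifactStoreAdapter:
    """Illustrative bridge: exposes the store_bytes() hook the docs describe,
    delegating persistence to a runtime store. Internals are assumed."""

    def __init__(self, runtime_store: RuntimeStore) -> None:
        self._store = runtime_store

    def store_bytes(self, data: bytes, mime_type: str) -> Dict[str, Any]:
        artifact_id = self._store.put(data, {"mime_type": mime_type})
        return {"artifact_id": artifact_id, "mime_type": mime_type}


class InMemoryRuntimeStore:
    """Toy runtime store for the example."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes, metadata: Dict[str, Any]) -> str:
        key = f"artifact-{len(self.blobs)}"
        self.blobs[key] = data
        return key


adapter = RuntimeArtifactStoreAdapter(InMemoryRuntimeStore())
ref = adapter.store_bytes(b"\x89PNG...", "image/png")
```

The point of the adapter is that AbstractVision never needs to know which runtime store it is talking to, only that stored bytes come back as a reference.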

## Scope (and non-goals)

AbstractVision focuses on producing images/videos.

It is not the owner of “LLM image/video input attachments” (multimodal inputs to LLMs); those concerns live in higher-level layers (e.g., AbstractCore).

## Key components (with evidence pointers)

### High-level flow (library mode)

```mermaid
flowchart LR
  Caller["Caller<br/>(Python / CLI)"] --> VM[VisionManager]
  VM -->|request dataclass| BE[VisionBackend]
  BE -->|GeneratedAsset| VM
  VM -->|store set| Store["MediaStore<br/>(LocalAssetStore / Runtime adapter)"]
  Store --> Ref[Artifact ref dict]
  VM -->|store not set| Asset["GeneratedAsset<br/>(bytes + mime)"]
```

Notes (anchored in code):

- `VisionManager` creates request dataclasses like `ImageGenerationRequest` / `ImageEditRequest` (`../src/abstractvision/types.py`).
- When `store` is set, `VisionManager._maybe_store()` calls `store.store_bytes(...)` and returns an artifact ref dict (`../src/abstractvision/vision_manager.py`, `../src/abstractvision/artifacts.py`).
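The store-or-return branch in the flow above can be modeled in a few lines. This is a simplified sketch, not the real `VisionManager._maybe_store()`; the store class, signatures, and ref-dict keys here are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class GeneratedAsset:
    data: bytes
    mime_type: str


class LocalAssetStore:
    """Toy stand-in for a MediaStore exposing the store_bytes() hook."""

    def __init__(self) -> None:
        self._assets: Dict[str, bytes] = {}

    def store_bytes(self, data: bytes, mime_type: str) -> Dict[str, Any]:
        key = f"asset-{len(self._assets)}"
        self._assets[key] = data
        return {"id": key, "mime_type": mime_type}


def maybe_store(
    asset: GeneratedAsset, store: Optional[LocalAssetStore]
) -> Union[Dict[str, Any], GeneratedAsset]:
    # With a store configured: persist and hand back a lightweight ref dict.
    if store is not None:
        return store.store_bytes(asset.data, asset.mime_type)
    # Without one: return the raw bytes + mime asset directly.
    return asset


asset = GeneratedAsset(b"...", "image/png")
ref = maybe_store(asset, LocalAssetStore())  # dict ref
raw = maybe_store(asset, None)               # GeneratedAsset passthrough
```

The design keeps callers uniform: they always get either a small reference dict (cheap to pass around workflows) or the asset itself, depending on whether persistence is configured.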

## Capability gating (model-level) vs runtime gating (backend-level)

AbstractVision separates two kinds of “can I do this?” checks:

1) Model-level gating (optional): “Does model X support task Y?”
   - Implemented by `VisionModelCapabilitiesRegistry.require_support(...)` (`../src/abstractvision/model_capabilities.py`)
   - Used by `VisionManager._require_model_support(...)` when `VisionManager.model_id` is set (`../src/abstractvision/vision_manager.py`)

2) Backend-level gating (best-effort): “Does this configured backend support task Y / mask edits?”
   - Backends may implement `get_capabilities()` returning `VisionBackendCapabilities` (`../src/abstractvision/types.py`)
   - Enforced by `VisionManager._require_backend_support(...)` and mask checks in `VisionManager.edit_image(...)` (`../src/abstractvision/vision_manager.py`)
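The two gating layers can be sketched side by side. The class names loosely mirror the ones above, but all fields, signatures, task names, and the exception type are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class UnsupportedTask(Exception):
    """Illustrative error type; the library's actual exception may differ."""


@dataclass
class CapabilitiesRegistry:
    """Model-level gating: static knowledge about what a model can do."""
    supported: Dict[str, Set[str]] = field(default_factory=dict)

    def require_support(self, model_id: str, task: str) -> None:
        if task not in self.supported.get(model_id, set()):
            raise UnsupportedTask(f"{model_id} does not support {task}")


@dataclass
class BackendCapabilities:
    """Backend-level gating: what this configured backend actually implements."""
    tasks: Set[str]
    supports_mask_edits: bool = False


def require_backend_support(caps: BackendCapabilities, task: str) -> None:
    if task not in caps.tasks:
        raise UnsupportedTask(f"backend does not support {task}")


# A model may support a task in principle (registry) while the configured
# backend does not implement it (runtime check) -- both gates must pass.
registry = CapabilitiesRegistry({"sdxl": {"text_to_image", "image_to_image"}})
registry.require_support("sdxl", "text_to_image")
caps = BackendCapabilities(tasks={"text_to_image"})
require_backend_support(caps, "text_to_image")
```

Separating the two checks matters because the registry answers a portable question about the model, while the backend check answers a deployment question about this process's configuration.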

## Backend reality (what runs today)

The public API includes `text_to_video`, `image_to_video`, and `multi_view_image`, but backend support for these tasks is currently limited.

For a detailed support matrix and configuration options, see docs/reference/backends.md.

## AbstractCore plugin flow (framework integration)

AbstractVision can be discovered by AbstractCore via an entry point: `[project.entry-points."abstractcore.capabilities_plugins"]` in `../pyproject.toml`.
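The declaration takes roughly this shape in `pyproject.toml`. The entry-point group and the plugin module path are taken from this document; the entry-point name and the `register` attribute path are illustrative guesses:

```toml
[project.entry-points."abstractcore.capabilities_plugins"]
# Name and object path shown here are illustrative, not copied from the repo.
abstractvision = "abstractvision.integrations.abstractcore_plugin:register"
```

AbstractCore can then enumerate this group at startup and call the registered function to wire up the `VisionCapability`.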

```mermaid
flowchart LR
  AC[AbstractCore] -->|loads entry point| Plugin["AbstractVision plugin<br/>register(...)"]
  Plugin --> Cap["VisionCapability<br/>(t2i/i2i/t2v/i2v)"]
  Cap --> VM[VisionManager]
  VM --> BE{Configured backend}
  BE --> HTTP["OpenAI-compatible HTTP<br/>OpenAI or local /v1 server"]
  BE --> HF[Local Diffusers]
  BE --> SDCPP[Local stable-diffusion.cpp]
```

Current plugin behavior (evidence in `../src/abstractvision/integrations/abstractcore_plugin.py`):

- Default: OpenAI HTTP with backend id `abstractvision:openai`; the legacy backend id `abstractvision:openai-compatible` remains registered and preserves compatible-endpoint defaults when selected directly.
- Compatible endpoints should set `ABSTRACTVISION_BACKEND=openai-compatible` plus `ABSTRACTVISION_BASE_URL`; legacy base-url-only configs still resolve as compatible endpoints.
- Local Diffusers and stable-diffusion.cpp are supported when `vision_backend` / `ABSTRACTVISION_BACKEND` selects `diffusers` or `sdcpp`.
- Configuration is read from `owner.config` keys like `vision_base_url`, `vision_model_id`, `vision_backend`, and backend-specific keys, then falls back to `ABSTRACTVISION_*` and standard OpenAI env vars where relevant.
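For example, pointing the plugin at a local OpenAI-compatible endpoint could look like this. The variable names are the ones listed above; the URL is a placeholder for your own server:

```shell
# Select the compatible-endpoint backend explicitly (recommended over
# relying on legacy base-url-only resolution).
export ABSTRACTVISION_BACKEND=openai-compatible

# Point at a local /v1 server; the URL below is a placeholder.
export ABSTRACTVISION_BASE_URL=http://localhost:8000/v1
```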

## Extending AbstractVision (practical steps)