Why I Built This
Content libraries grow fast. Manually tagging thousands of video, image, and audio files is slow, error-prone, and doesn't scale — every file needs a human to describe it before it becomes searchable. I built ADAIS to automate that entirely.
ADAIS is a full-stack system that accepts any file, detects its type automatically, routes it through an AI provider of your choice, and returns structured, searchable metadata — labels, transcripts, entities, sentiment, dominant colours, and more. No manual tagging required.
All code is available at github.com/foobearer/ai-content-pipeline. Clone it and follow along or build from scratch with this guide.
Architecture Overview
ADAIS has two parts: a FastAPI backend that handles file ingestion, provider routing, and analysis, and a React + Vite frontend that provides the upload UI and renders results. The key architectural decision was the Provider Pattern — an abstract interface that all three AI providers implement.
React + Vite + Tailwind (port 5173)
        │
        │  POST /analyse/auto (multipart/form-data)
        ▼
FastAPI Backend (port 8000)
┌─────────────────────────────────┐
│  Route → FileHandler → Provider │
│                                 │
│   ┌─────────────────────────┐   │
│   │    Provider Factory     │   │
│   └────┬──────────┬─────┬───┘   │
└────────┼──────────┼─────┼───────┘
         │          │     │
      OpenAI     Google  HuggingFace
      GPT-4o     Vision  BLIP+BART
      Whisper    Speech  Whisper-base
adais/
├── backend/
│   └── src/
│       ├── main.py                  # FastAPI app + all routes
│       ├── config.py                # Settings from .env
│       ├── providers/
│       │   ├── base.py              # Abstract interface
│       │   ├── __init__.py          # Provider factory
│       │   ├── openai_provider.py
│       │   ├── google_provider.py
│       │   └── huggingface_provider.py
│       ├── models/
│       │   └── schemas.py           # Pydantic models
│       └── utils/
│           └── file_handler.py      # Upload validation
└── frontend/
    └── src/
        ├── App.tsx
        ├── components/              # One file per component
        ├── hooks/useAnalysis.ts     # All state logic
        ├── types/api.ts             # TypeScript types
        └── utils/api.ts             # fetch wrapper
Backend: FastAPI + the Provider Pattern
FastAPI is an excellent choice for AI pipelines — it's async by default, auto-generates Swagger docs from type hints, and Pydantic handles all validation. Start with the data models — everything else is built around them.
Define your schemas first
All request and response shapes live in schemas.py — the single source of truth. The frontend TypeScript types mirror these exactly.
from enum import Enum

from pydantic import BaseModel, Field


class Provider(str, Enum):
    OPENAI = "openai"
    GOOGLE = "google"
    HUGGINGFACE = "huggingface"


class Label(BaseModel):
    name: str
    confidence: float = Field(..., ge=0.0, le=1.0)  # values outside 0.0-1.0 are rejected


class ImageAnalysis(BaseModel):
    labels: list[Label] = []
    extracted_text: str | None = None
    dominant_colours: list[str] = []
    description: str | None = None
    tags: list[str] = []


# ProviderInfo, VideoAnalysis, and TextAnalysis also live in schemas.py (omitted here for brevity).
class AnalysisResult(BaseModel):
    job_id: str
    content_type: str
    provider: ProviderInfo
    image_analysis: ImageAnalysis | None = None
    video_analysis: VideoAnalysis | None = None
    text_analysis: TextAnalysis | None = None
Using str | None (Python 3.10+ union syntax) instead of Optional[str] is cleaner and now standard. Pydantic v2 supports both.
The abstract base class
This is the heart of the architecture. All three providers implement the same interface — routes never need to know which provider they're calling.
from abc import ABC, abstractmethod
from pathlib import Path

from src.models.schemas import ImageAnalysis, TextAnalysis, VideoAnalysis


class BaseProvider(ABC):
    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    async def analyse_image(self, path: Path) -> ImageAnalysis: ...

    @abstractmethod
    async def analyse_video(self, path: Path) -> VideoAnalysis: ...

    @abstractmethod
    async def analyse_text(self, text: str) -> TextAnalysis: ...
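To see what the interface buys you, here is a minimal, self-contained sketch. It is not ADAIS code: it swaps the Pydantic models for a plain dataclass, trims the base class to one method, and uses a hypothetical EchoProvider, BrokenProvider, and describe helper so that it runs with the standard library alone.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TextAnalysis:
    """Stand-in for the Pydantic TextAnalysis model (the field is an assumption)."""
    summary: str


class BaseProvider(ABC):
    """Trimmed to a single analysis method to keep the sketch short."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    async def analyse_text(self, text: str) -> TextAnalysis: ...


class EchoProvider(BaseProvider):
    """Hypothetical provider: 'summarises' by returning the first sentence."""

    @property
    def name(self) -> str:
        return "echo"

    async def analyse_text(self, text: str) -> TextAnalysis:
        return TextAnalysis(summary=text.split(".")[0].strip())


class BrokenProvider(BaseProvider):
    """Forgets analyse_text, so Python refuses to instantiate it."""

    @property
    def name(self) -> str:
        return "broken"


async def describe(provider: BaseProvider, text: str) -> str:
    # Caller code depends only on the interface, never on a concrete class.
    result = await provider.analyse_text(text)
    return f"{provider.name}: {result.summary}"


summary_line = asyncio.run(describe(EchoProvider(), "Cats sleep a lot. Dogs bark."))

try:
    BrokenProvider()
    broken_error = None
except TypeError as exc:
    broken_error = str(exc)  # names the missing abstract method
```

Here summary_line is "echo: Cats sleep a lot", and BrokenProvider() fails at construction time rather than on the first request that reaches the missing method.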
The provider factory
The factory is the only place that knows which class maps to which provider. Lazy imports mean missing optional dependencies don't crash the app on startup.
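from src.models.schemas import Provider
from src.providers.base import BaseProvider


def get_provider(provider: Provider) -> BaseProvider:
    # Each import sits inside its branch, so an SDK is only needed when that provider is selected.
    if provider == Provider.OPENAI:
        from src.providers.openai_provider import OpenAIProvider
        return OpenAIProvider()
    if provider == Provider.GOOGLE:
        from src.providers.google_provider import GoogleProvider
        return GoogleProvider()
    if provider == Provider.HUGGINGFACE:
        from src.providers.huggingface_provider import HuggingFaceProvider
        return HuggingFaceProvider()
    # Fail loudly instead of silently returning None.
    raise ValueError(f"Unsupported provider: {provider!r}")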
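The same lazy-import idea can also be written as a lookup table. The sketch below is one possible variant, not the ADAIS factory: the registry, its keys, and load_provider are hypothetical, and standard-library classes stand in for the real provider modules so the example runs anywhere.

```python
import importlib

# Hypothetical registry: provider key -> (module path, class name).
# "json.JSONDecoder" is only a stand-in for a real provider class.
_REGISTRY = {
    "decoder": ("json", "JSONDecoder"),
    "missing": ("not_an_installed_package", "SomeProvider"),
}


def load_provider(key: str):
    """Import the provider's module only when that provider is requested."""
    try:
        module_path, class_name = _REGISTRY[key]
    except KeyError:
        raise ValueError(f"Unknown provider: {key!r}") from None
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # The app is already running; only this request fails, with a clear reason.
        raise RuntimeError(
            f"Provider {key!r} needs the optional package {module_path!r}"
        ) from exc
    return getattr(module, class_name)()
```

Calling load_provider("decoder") returns an instance, while load_provider("missing") raises a RuntimeError that names the absent package. Importing the module that defines load_provider never touches either dependency.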
The main route
With the provider pattern in place, each route is a thin wrapper — validate, get provider, call method, return result.
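from fastapi import File, Form, UploadFile


@app.post("/analyse/auto", response_model=AnalysisResult)
async def analyse_auto(
    file: UploadFile = File(...),
    provider: Provider = Form(Provider.HUGGINGFACE),
):
    path, content_type = await save_upload(file)
    ai = get_provider(provider)
    analysis: dict = {}
    try:
        if content_type == ContentType.IMAGE:
            analysis["image_analysis"] = await ai.analyse_image(path)
        elif content_type == ContentType.VIDEO:
            analysis["video_analysis"] = await ai.analyse_video(path)
        else:
            text = await extract_text(path)
            analysis["text_analysis"] = await ai.analyse_text(text)
    finally:
        cleanup(path)  # delete the temp file even if analysis raises
    # response_model expects the AnalysisResult envelope, not the bare analysis object.
    # build_result is a hypothetical helper that fills in job_id, content_type, and provider.
    return build_result(ai, content_type, **analysis)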
Implementing the Three AI Providers
Each provider implements the same three methods. Here's how each works internally.
| Feature | OpenAI | Google Cloud | HuggingFace |
|---|---|---|---|
| Image analysis | GPT-4o Vision | Vision API | BLIP + ViT |
| Video transcript | Whisper-1 | Speech-to-Text | Whisper-base (local) |
| Text summary | GPT-4o-mini | Not supported | BART-large-CNN |
| Named entities | GPT-4o-mini | Natural Language API | BERT-NER |
| API key required | Yes | Yes | No — runs locally |
| Cost | Pay per use | Pay per use | Free |
OpenAI — structured JSON from GPT-4o
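The key technique is response_format={"type": "json_object"} (JSON mode), which makes GPT-4o return syntactically valid JSON. Two caveats apply: the prompt must mention JSON, and a reply cut off by the token limit can still be incomplete. JSON mode also does not enforce a schema, so pair it with a system prompt that names the expected keys and let Pydantic validate the parsed result.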
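import base64
import json
import mimetypes
from pathlib import Path


# Method of OpenAIProvider
async def analyse_image(self, path: Path) -> ImageAnalysis:
    image_b64 = base64.standard_b64encode(path.read_bytes()).decode()
    mime = mimetypes.guess_type(path.name)[0] or "image/jpeg"  # avoid labelling PNGs as JPEG
    resp = await self._client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # JSON mode: syntactically valid JSON
        messages=[
            {"role": "system", "content": "Return JSON with keys: labels, "
             "objects, extracted_text, description, tags, is_safe"},
            {"role": "user", "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{image_b64}"},
            }]},
        ],
    )
    raw = json.loads(resp.choices[0].message.content)
    # With Pydantic's default config, keys the model does not declare (objects, is_safe) are ignored.
    return ImageAnalysis(**raw)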
HuggingFace — lazy-loading local models
Loading a transformer model takes 5–10 seconds. Cache each pipeline after its first load so later calls reuse it instead of reloading, and use run_in_executor so the load does not block FastAPI's async event loop.
import asyncio


class HuggingFaceProvider(BaseProvider):
    def __init__(self):
        self._pipelines: dict = {}  # cache: load each model only once

    async def _load_pipeline(self, task: str, model: str):
        if task not in self._pipelines:
            from transformers import pipeline
            # run_in_executor runs the blocking load in a worker thread,
            # so the async event loop stays free for other requests
            loop = asyncio.get_running_loop()
            self._pipelines[task] = await loop.run_in_executor(
                None, lambda: pipeline(task, model=model)
            )
        return self._pipelines[task]
Never call blocking code directly in an async function. Always wrap with run_in_executor so FastAPI can continue serving other requests while a model loads.
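One subtlety with a check-then-load cache: the await between the check and the assignment gives other requests a chance to run, so two first-time requests for the same model can both start a load. A per-key lock is one way to prevent that. The sketch below is an illustration only: LazyCache and slow_loader are hypothetical names, and time.sleep stands in for the real model load.

```python
import asyncio
import time


class LazyCache:
    """Caches the result of a slow, blocking loader, keyed by name."""

    def __init__(self):
        self._items: dict[str, object] = {}
        self._locks: dict[str, asyncio.Lock] = {}
        self.load_count = 0  # demo only: how many real loads happened

    async def get(self, key: str, loader):
        if key in self._items:  # fast path: already loaded
            return self._items[key]
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            if key not in self._items:  # re-check: another task may have finished the load
                loop = asyncio.get_running_loop()
                self._items[key] = await loop.run_in_executor(None, loader)
                self.load_count += 1
        return self._items[key]


def slow_loader() -> str:
    time.sleep(0.1)  # stands in for pipeline(task, model=model)
    return "model"


async def demo():
    cache = LazyCache()
    # Three "requests" arrive before the first load has finished.
    results = await asyncio.gather(*(cache.get("caption", slow_loader) for _ in range(3)))
    return results, cache.load_count


results, load_count = asyncio.run(demo())
```

All three callers receive the same object and load_count ends at 1, because the second and third callers wait on the lock and then hit the cache.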
Frontend: React + Vite + Tailwind
The frontend follows a clear separation: one custom hook owns all state and API logic, components are purely presentational. Here's the component tree:
App.tsx
├── ProviderSelector     # three provider cards
├── DropZone             # drag-and-drop file input
├── AnalyseButton        # submit + loading state
└── ResultsPanel         # routes by content type
    ├── MetaCard         # job ID, provider, time
    ├── ImageResults     # labels, OCR, colours
    ├── VideoResults     # transcript + segments
    └── TextResults      # summary, sentiment, entities
The useAnalysis hook — all state in one place
Every piece of state lives in a single custom hook. App.tsx just calls useAnalysis() and passes values to components — zero business logic in the component tree.
import { useCallback, useState } from 'react'
// Import paths assume the frontend/src layout shown earlier
import { analyseFile } from '../utils/api'
import type { AnalysisResult, Provider } from '../types/api'

type Status = 'idle' | 'analysing' | 'done' | 'error'

export function useAnalysis() {
  const [status, setStatus] = useState<Status>('idle')
  const [result, setResult] = useState<AnalysisResult | null>(null)
  const [file, setFile] = useState<File | null>(null)
  const [error, setError] = useState<string | null>(null)

  const analyse = useCallback(async (provider: Provider) => {
    if (!file) return
    setError(null) // clear any error left over from a previous run
    setStatus('analysing')
    try {
      const data = await analyseFile(file, provider)
      setResult(data)
      setStatus('done')
    } catch (e) {
      setError(e instanceof Error ? e.message : 'Failed')
      setStatus('error')
    }
  }, [file])

  return { status, result, error, file, setFile, analyse }
}
Vite proxy — no CORS headaches in development
Configure Vite to proxy API calls to FastAPI. Your frontend calls /analyse/auto and Vite silently forwards it to localhost:8000.
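import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react' // or @vitejs/plugin-react-swc, depending on your template

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/analyse': 'http://localhost:8000',
      '/health': 'http://localhost:8000',
      '/providers': 'http://localhost:8000',
    },
  },
})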
TypeScript Types from Pydantic Schemas
Keep your backend Pydantic schemas and frontend TypeScript types in sync. When the API shape changes, TypeScript immediately tells you every component that needs fixing.
// Mirrors backend/src/models/schemas.py exactly
export type Provider = 'openai' | 'google' | 'huggingface'

export interface Label {
  name: string
  confidence: number // 0.0 – 1.0
}

export interface AnalysisResult {
  job_id: string
  content_type: ContentType
  provider: ProviderInfo
  image_analysis: ImageAnalysis | null
  video_analysis: VideoAnalysis | null
  text_analysis: TextAnalysis | null
}
For larger projects, use openapi-typescript to auto-generate types from FastAPI's OpenAPI spec at /openapi.json.
Running It Locally
You only need Python 3.10+ and Node 18+. Open two terminal windows and follow these steps.
Clone the repo
Get the code from GitHub and navigate into the project.
Start the backend
Create a virtual environment, install dependencies, copy .env.example to .env, start FastAPI.
Start the frontend
Install Node dependencies and start the Vite dev server in a second terminal.
Open the app
UI at localhost:5173 · Auto-generated API docs at localhost:8000/docs
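# Terminal 1: backend
git clone https://github.com/foobearer/ai-content-pipeline.git
cd ai-content-pipeline/backend
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
uvicorn src.main:app --reload
# Terminal 2: frontend (run from ai-content-pipeline/backend, or cd straight to ai-content-pipeline/frontend)
cd ../frontend
npm install
npm run dev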
Using the HuggingFace provider for the first time downloads approximately 2 GB of model weights. This is a one-time download: the weights are cached locally, so later runs skip the download and only pay the few seconds it takes to load each model into memory.
Lessons Learned
Use the provider pattern from day one
A common mistake is coupling your code directly to one AI provider. The AI landscape moves fast: a model that is best today may not be in six months, and you might need to swap for cost, performance, or compliance reasons. The abstract base class pattern handles this. To add a new provider, implement the interface (the three analysis methods plus the name property) and register it in the Provider enum and the factory; the routes do not change. This was the single most valuable architectural decision in this project.
Validate file types by content, not extension
A user can rename virus.exe to photo.jpg. Always read the file's magic bytes — the first few bytes that identify the format — rather than trusting the extension. The file_handler.py utility does this before any file touches an AI model.
run_in_executor is non-negotiable for blocking calls
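FastAPI is async, so calling a blocking function directly inside an async route (model loading, synchronous SDK calls) stalls the event loop, and every other request on that worker waits until it returns. Every blocking operation in ADAIS runs in a thread pool via loop.run_in_executor(None, ...), where loop comes from asyncio.get_running_loop(), the preferred way to get the loop inside a coroutine (rather than asyncio.get_event_loop()).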
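The effect is easy to observe in isolation. In this sketch (hypothetical names, with time.sleep standing in for a model load), a background ticker plays the role of other requests: it only advances while the event loop is free.

```python
import asyncio
import time


def blocking_work() -> str:
    time.sleep(0.3)  # stands in for model loading or a synchronous SDK call
    return "done"


async def count_ticks(counter: list[int]) -> None:
    # Plays the role of other requests: makes progress only while the loop is free.
    while True:
        await asyncio.sleep(0.01)
        counter[0] += 1


async def run(offload: bool) -> int:
    counter = [0]
    ticker = asyncio.create_task(count_ticks(counter))
    if offload:
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_work)  # loop keeps running meanwhile
    else:
        blocking_work()  # runs on the event loop thread; nothing else gets a turn
    ticker.cancel()
    await asyncio.gather(ticker, return_exceptions=True)
    return counter[0]


ticks_blocked = asyncio.run(run(offload=False))
ticks_offloaded = asyncio.run(run(offload=True))
```

With the direct call the ticker never gets to run, so ticks_blocked is 0. With run_in_executor the ticker keeps counting for the whole 0.3 seconds, which is what lets a server keep answering other requests while a model loads.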
Keep TypeScript types in sync with Pydantic
For a project this size, manually mirroring the types is fast and readable. The discipline of updating both files when a schema changes is worth it — TypeScript's compiler immediately surfaces every broken component.
The full project is on GitHub at github.com/foobearer/ai-content-pipeline — clone it, run it, and use it as a starting point for your own data analysis and indexing projects. Questions or want to extend it with a new provider? Open an issue or reach out directly.