Live Content Pipeline

Orchestrator

A content pipeline that turns YouTube videos into published articles. AI-assisted, editorially controlled, running on 10 microservices — all on Cloudflare Workers.

The Orchestrator is the production backbone for the Channel Sites network. It discovers YouTube videos, matches them to scripts, selects images from approved sources, generates SEO-optimized articles with AI, suggests internal links, and publishes to the sites — all through durable Workflows with human-in-the-loop checkpoints.

How It's Built

Architecture and implementation.

YouTube discovery + script matching

Fetches all channel videos via the YouTube Data API and stores metadata in D1. Scripts are matched to videos via fuzzy Jaccard similarity (≥0.9) against Trello cards. Handles .docx, .pdf, and Google Docs script sources.
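
The fuzzy match can be sketched as token-set Jaccard similarity with the 0.9 threshold named above. The names (normalize, jaccard, matchCard) and the tokenization rules are illustrative assumptions, not the real service code:

```typescript
type TrelloCard = { id: string; name: string };

// Lowercase, strip punctuation, split into a token set.
function normalize(title: string): Set<string> {
  return new Set(
    title.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean)
  );
}

// |A ∩ B| / |A ∪ B|
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  for (const t of a) if (b.has(t)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Returns the best-scoring card at or above the 0.9 threshold, or null.
function matchCard(videoTitle: string, cards: TrelloCard[]): TrelloCard | null {
  const v = normalize(videoTitle);
  let best: TrelloCard | null = null;
  let bestScore = 0.9; // threshold from the pipeline
  for (const card of cards) {
    const score = jaccard(v, normalize(card.name));
    if (score >= bestScore) {
      best = card;
      bestScore = score;
    }
  }
  return best;
}
```

Set-based Jaccard tolerates word reordering and punctuation differences between a video title and its Trello card, which is why a high threshold like 0.9 still catches near-identical pairs.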

AI image selection with editorial allowlist

Claude analyzes scripts to extract image search queries. Sources: Wikimedia, UN Photos, NATO, DVIDS, government Flickr. Each candidate scored for relevance, authenticity, dimensions (≥1200px), and licensing.
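
The editorial gate on candidates can be sketched as an allowlist plus hard constraints plus a weighted score. The field names, source identifiers, and weights here are made-up examples; only the allowlisted sources, the ≥1200px floor, and the scoring criteria come from the description above:

```typescript
// Hypothetical source identifiers for the approved-source allowlist.
const ALLOWED_SOURCES = new Set(["wikimedia", "un-photos", "nato", "dvids", "gov-flickr"]);

interface ImageCandidate {
  url: string;
  source: string;       // must be on the allowlist
  width: number;        // pixels; minimum 1200
  relevance: number;    // 0..1, from the AI scorer
  authenticity: number; // 0..1, from the AI scorer
  licenseOk: boolean;
}

// Hard constraints: provenance, dimensions, licensing.
function passesGate(c: ImageCandidate): boolean {
  return ALLOWED_SOURCES.has(c.source) && c.width >= 1200 && c.licenseOk;
}

// Rank survivors by a combined score; the 60/40 weighting is an assumption.
function rankCandidates(candidates: ImageCandidate[]): ImageCandidate[] {
  const score = (c: ImageCandidate) => c.relevance * 0.6 + c.authenticity * 0.4;
  return candidates.filter(passesGate).sort((a, b) => score(b) - score(a));
}
```

Keeping provenance and dimensions as hard filters, rather than folding them into the score, guarantees no off-allowlist or undersized image can win on relevance alone.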

SEO article generation pipeline

Multi-step AI pipeline: cleanup → draft from source script (no hallucinated facts) → enrich with key takeaways and FAQ → optimize metadata → assemble final markdown.
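
The step sequence above can be sketched as a typed stage chain. In the real pipeline each stage is an AI call; these bodies are stand-in transforms, and the Article shape is an assumption:

```typescript
interface Article {
  script: string;
  body: string;
  takeaways: string[];
  faq: string[];
  meta: { title: string; description: string };
}

type Stage = (a: Article) => Article;

const cleanup: Stage = (a) => ({ ...a, script: a.script.trim() });
// Draft strictly from the source script: no facts beyond the input.
const draft: Stage = (a) => ({ ...a, body: a.script });
const enrich: Stage = (a) => ({ ...a, takeaways: ["..."], faq: ["..."] }); // placeholder content
const optimizeMeta: Stage = (a) => ({
  ...a,
  meta: { title: a.meta.title, description: a.body.slice(0, 160) },
});
const assemble: Stage = (a) => ({ ...a, body: `# ${a.meta.title}\n\n${a.body}` });

const pipeline: Stage[] = [cleanup, draft, enrich, optimizeMeta, assemble];

function run(a: Article): Article {
  return pipeline.reduce((acc, stage) => stage(acc), a);
}
```

Modeling each step as a pure Article → Article transform is what lets a durable Workflow checkpoint between stages and retry any one of them in isolation.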

Link suggestions via Vectorize

All published articles indexed as embeddings. New articles query for semantically similar content, generating internal and cross-site link suggestions stored in D1 for editorial review.
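
A minimal stand-in for the similarity lookup: cosine similarity over stored article embeddings, returning the top-k candidates above a floor. The real service would query the Vectorize index rather than scan in memory, and the 0.75 cutoff is an assumed value:

```typescript
interface IndexedArticle { slug: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Top-k semantically similar published articles for a new article's embedding.
function suggestLinks(
  newEmbedding: number[],
  index: IndexedArticle[],
  topK = 5,
  floor = 0.75 // assumed similarity cutoff
): { slug: string; score: number }[] {
  return index
    .map((a) => ({ slug: a.slug, score: cosine(newEmbedding, a.embedding) }))
    .filter((s) => s.score >= floor)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The floor matters as much as topK: without it, a new article with no real neighbors would still surface k weak suggestions for editors to reject.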

Human-in-the-loop publishing

Low-confidence images get flagged on Trello for manual approval with a 72-hour polling window. Final articles committed to GitHub as markdown, triggering CI/CD.
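
The approval-window decision can be sketched as a small state check: given the review card's state and age, pick the next workflow action. The 72-hour constant mirrors the description above; the state and action names are assumptions:

```typescript
const APPROVAL_WINDOW_MS = 72 * 60 * 60 * 1000; // 72-hour polling window

type CardState = "approved" | "rejected" | "pending";
type Action = "publish" | "discard" | "poll" | "expire";

function nextAction(cardState: CardState, createdAt: number, now: number): Action {
  if (cardState === "approved") return "publish"; // editor approved the image
  if (cardState === "rejected") return "discard"; // editor rejected it
  // Still pending: keep polling until the window closes, then give up.
  return now - createdAt < APPROVAL_WINDOW_MS ? "poll" : "expire";
}
```

Because the decision is a pure function of (state, createdAt, now), a durable Workflow can sleep between polls and re-evaluate it on every wake without extra bookkeeping.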

Idempotency at every stage

Post existence check before generation. Every pipeline execution recorded in workflow_runs. Per-phase timing, decision data, and AI outputs logged with correlation IDs.
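
The existence-check-plus-run-log pattern can be sketched with D1 swapped for in-memory structures so the shape is runnable; in the real service both checks are SQL lookups, and the names here are illustrative:

```typescript
interface Store {
  posts: Set<string>; // video IDs that already have a published post
  runs: { videoId: string; phase: string; startedAt: number }[]; // stand-in for workflow_runs
}

// Generate at most once per video: a re-run against an existing post is a no-op.
function generateOnce(store: Store, videoId: string, generate: () => void): boolean {
  if (store.posts.has(videoId)) return false; // post exists: skip generation
  store.runs.push({ videoId, phase: "generate", startedAt: Date.now() }); // audit record
  generate();
  store.posts.add(videoId);
  return true;
}
```

The guard makes the hourly cron safe to replay: a crashed or duplicated run re-checks existence instead of producing a second article for the same video.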

Architecture Map

Request flow and service topology

orchestrator-sync (hourly cron trigger)
    → orchestrator-youtube   (video discovery)
    → orchestrator-trello    (script card matching)
    → orchestrator-store     (D1/R2 persistence)
    → orchestrator-script    (script extraction + normalization)
    → orchestrator-match     (video-to-card matching)
    → orchestrator-workflow  (durable orchestration)
          → orchestrator-image  (image discovery + scoring)
          → orchestrator-seo    (article generation pipeline)
          → orchestrator-linker (semantic link suggestions)

Primitives Used

Every Cloudflare binding in this project.

Workers: 10 microservice workers with focused responsibilities
Workflows: Durable multi-step orchestration with retry and state persistence
D1: Videos, posts, Trello cards, workflow runs, SEO pages, link placements, channel config
R2: Normalized scripts, AI output artifacts, image metadata, audit traces
AI Gateway: Rate-limited, authenticated access to Claude for content and image scoring
Workers AI: Text embeddings for Vectorize and alternative LLM inference
Vectorize: Article embeddings for semantic link suggestion matching
Service Bindings: Zero-latency inter-worker communication across all 10 services
Cron Triggers: Hourly sync runner

What Makes This Interesting

The architectural angle worth paying attention to.

The Orchestrator is a content pipeline built like a production system: idempotency at every stage, durable execution with Workflows, human-in-the-loop approval via Trello with async polling, fact-preservation constraints on AI generation, source allowlisting for image provenance, full audit trails with timing and decision data. The microservice decomposition via Workers + service bindings means each stage scales, deploys, and fails independently. Total infrastructure: 10 Workers, a D1 database, two R2 buckets, and a Vectorize index.