Live Content Pipeline

Orchestrator

A content pipeline that turns YouTube videos into published articles. AI-assisted, editorially controlled, running on 10 microservices — all on Cloudflare Workers.

The Orchestrator is the production backbone for the Channel Sites network. It discovers YouTube videos, matches them to scripts, selects images from approved sources, generates SEO-optimized articles with AI, suggests internal links, and publishes to the sites — all through durable Workflows with human-in-the-loop checkpoints.

How It's Built

Architecture and implementation.

YouTube discovery + script matching

Fetches all channel videos via the YouTube Data API and stores metadata in D1. Scripts are matched to videos via fuzzy Jaccard similarity (≥0.9) against Trello cards. Handles .docx, .pdf, and Google Docs script sources.
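
The fuzzy match can be sketched as token-set Jaccard similarity with the 0.9 threshold named above. The names (normalize, jaccard, matchCard) and the tokenization rules are illustrative assumptions, not the real service code:

```typescript
type TrelloCard = { id: string; name: string };

// Lowercase, strip punctuation, split into a token set.
function normalize(title: string): Set<string> {
  return new Set(
    title.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean)
  );
}

// |A ∩ B| / |A ∪ B|
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  for (const t of a) if (b.has(t)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Returns the best-scoring card at or above the 0.9 threshold, or null.
function matchCard(videoTitle: string, cards: TrelloCard[]): TrelloCard | null {
  const v = normalize(videoTitle);
  let best: TrelloCard | null = null;
  let bestScore = 0.9; // threshold from the pipeline
  for (const card of cards) {
    const score = jaccard(v, normalize(card.name));
    if (score >= bestScore) {
      best = card;
      bestScore = score;
    }
  }
  return best;
}
```

Set-based Jaccard tolerates word reordering and punctuation differences between a video title and its Trello card, which is why a high threshold like 0.9 still catches near-identical pairs.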

AI image selection with editorial allowlist

Claude analyzes scripts to extract image search queries. Sources: Wikimedia, UN Photos, NATO, DVIDS, government Flickr. Each candidate scored for relevance, authenticity, dimensions (≥1200px), and licensing.
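
The editorial gate on candidates can be sketched as an allowlist plus hard constraints plus a weighted score. The field names, source identifiers, and weights here are made-up examples; only the allowlisted sources, the ≥1200px floor, and the scoring criteria come from the description above:

```typescript
// Hypothetical source identifiers for the approved-source allowlist.
const ALLOWED_SOURCES = new Set(["wikimedia", "un-photos", "nato", "dvids", "gov-flickr"]);

interface ImageCandidate {
  url: string;
  source: string;       // must be on the allowlist
  width: number;        // pixels; minimum 1200
  relevance: number;    // 0..1, from the AI scorer
  authenticity: number; // 0..1, from the AI scorer
  licenseOk: boolean;
}

// Hard constraints: provenance, dimensions, licensing.
function passesGate(c: ImageCandidate): boolean {
  return ALLOWED_SOURCES.has(c.source) && c.width >= 1200 && c.licenseOk;
}

// Rank survivors by a combined score; the 60/40 weighting is an assumption.
function rankCandidates(candidates: ImageCandidate[]): ImageCandidate[] {
  const score = (c: ImageCandidate) => c.relevance * 0.6 + c.authenticity * 0.4;
  return candidates.filter(passesGate).sort((a, b) => score(b) - score(a));
}
```

Keeping provenance and dimensions as hard filters, rather than folding them into the score, guarantees no off-allowlist or undersized image can win on relevance alone.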

SEO article generation pipeline

Multi-step AI pipeline: cleanup → draft from source script (no hallucinated facts) → enrich with key takeaways and FAQ → optimize metadata → assemble final markdown.
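
The step sequence above can be sketched as a typed stage chain. In the real pipeline each stage is an AI call; these bodies are stand-in transforms, and the Article shape is an assumption:

```typescript
interface Article {
  script: string;
  body: string;
  takeaways: string[];
  faq: string[];
  meta: { title: string; description: string };
}

type Stage = (a: Article) => Article;

const cleanup: Stage = (a) => ({ ...a, script: a.script.trim() });
// Draft strictly from the source script: no facts beyond the input.
const draft: Stage = (a) => ({ ...a, body: a.script });
const enrich: Stage = (a) => ({ ...a, takeaways: ["..."], faq: ["..."] }); // placeholder content
const optimizeMeta: Stage = (a) => ({
  ...a,
  meta: { title: a.meta.title, description: a.body.slice(0, 160) },
});
const assemble: Stage = (a) => ({ ...a, body: `# ${a.meta.title}\n\n${a.body}` });

const pipeline: Stage[] = [cleanup, draft, enrich, optimizeMeta, assemble];

function run(a: Article): Article {
  return pipeline.reduce((acc, stage) => stage(acc), a);
}
```

Modeling each step as a pure Article → Article transform is what lets a durable Workflow checkpoint between stages and retry any one of them in isolation.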

Link suggestions via Vectorize

All published articles indexed as embeddings. New articles query for semantically similar content, generating internal and cross-site link suggestions stored in D1 for editorial review.
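
A minimal stand-in for the similarity lookup: cosine similarity over stored article embeddings, returning the top-k candidates above a floor. The real service would query the Vectorize index rather than scan in memory, and the 0.75 cutoff is an assumed value:

```typescript
interface IndexedArticle { slug: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Top-k semantically similar published articles for a new article's embedding.
function suggestLinks(
  newEmbedding: number[],
  index: IndexedArticle[],
  topK = 5,
  floor = 0.75 // assumed similarity cutoff
): { slug: string; score: number }[] {
  return index
    .map((a) => ({ slug: a.slug, score: cosine(newEmbedding, a.embedding) }))
    .filter((s) => s.score >= floor)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The floor matters as much as topK: without it, a new article with no real neighbors would still surface k weak suggestions for editors to reject.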

Human-in-the-loop publishing

Low-confidence images get flagged on Trello for manual approval with a 72-hour polling window. Final articles committed to GitHub as markdown, triggering CI/CD.
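
The approval-window decision can be sketched as a small state check: given the review card's state and age, pick the next workflow action. The 72-hour constant mirrors the description above; the state and action names are assumptions:

```typescript
const APPROVAL_WINDOW_MS = 72 * 60 * 60 * 1000; // 72-hour polling window

type CardState = "approved" | "rejected" | "pending";
type Action = "publish" | "discard" | "poll" | "expire";

function nextAction(cardState: CardState, createdAt: number, now: number): Action {
  if (cardState === "approved") return "publish"; // editor approved the image
  if (cardState === "rejected") return "discard"; // editor rejected it
  // Still pending: keep polling until the window closes, then give up.
  return now - createdAt < APPROVAL_WINDOW_MS ? "poll" : "expire";
}
```

Because the decision is a pure function of (state, createdAt, now), a durable Workflow can sleep between polls and re-evaluate it on every wake without extra bookkeeping.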

Idempotency at every stage

Post existence check before generation. Every pipeline execution recorded in workflow_runs. Per-phase timing, decision data, and AI outputs logged with correlation IDs.
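
The existence-check-plus-run-log pattern can be sketched with D1 swapped for in-memory structures so the shape is runnable; in the real service both checks are SQL lookups, and the names here are illustrative:

```typescript
interface Store {
  posts: Set<string>; // video IDs that already have a published post
  runs: { videoId: string; phase: string; startedAt: number }[]; // stand-in for workflow_runs
}

// Generate at most once per video: a re-run against an existing post is a no-op.
function generateOnce(store: Store, videoId: string, generate: () => void): boolean {
  if (store.posts.has(videoId)) return false; // post exists: skip generation
  store.runs.push({ videoId, phase: "generate", startedAt: Date.now() }); // audit record
  generate();
  store.posts.add(videoId);
  return true;
}
```

The guard makes the hourly cron safe to replay: a crashed or duplicated run re-checks existence instead of producing a second article for the same video.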

Architecture Map

Request flow and service topology

orchestrator-sync (hourly cron trigger)
    → orchestrator-youtube   (video discovery)
    → orchestrator-trello    (script card matching)
    → orchestrator-store     (D1/R2 persistence)
    → orchestrator-script    (script extraction + normalization)
    → orchestrator-match     (video-to-card matching)
    → orchestrator-workflow  (durable orchestration)
          → orchestrator-image  (image discovery + scoring)
          → orchestrator-seo    (article generation pipeline)
          → orchestrator-linker (semantic link suggestions)

Primitives Used

Every Cloudflare binding in this project.

Workers: 10 microservice workers with focused responsibilities
Workflows: Durable multi-step orchestration with retry and state persistence
D1: Videos, posts, Trello cards, workflow runs, SEO pages, link placements, channel config
R2: Normalized scripts, AI output artifacts, image metadata, audit traces
AI Gateway: Rate-limited, authenticated access to Claude for content and image scoring
Workers AI: Text embeddings for Vectorize and alternative LLM inference
Vectorize: Article embeddings for semantic link suggestion matching
Service Bindings: Zero-latency inter-worker communication across all 10 services
Cron Triggers: Hourly sync runner

What Makes This Interesting

The architectural angle worth paying attention to.

The Orchestrator is a content pipeline built like a production system: idempotency at every stage, durable execution with Workflows, human-in-the-loop approval via Trello with async polling, fact-preservation constraints on AI generation, source allowlisting for image provenance, full audit trails with timing and decision data. The microservice decomposition via Workers + service bindings means each stage scales, deploys, and fails independently. Total infrastructure: 10 Workers, a D1 database, two R2 buckets, and a Vectorize index.