Live Observability Infrastructure

Analytics Worker

Observability infrastructure for the whole network. Uptime monitoring, incident management, analytics proxying, and runbook execution — one Worker, zero servers.

The Analytics Worker is the operational backbone. It monitors every site in the network, alerts on downtime, proxies analytics past ad blockers, and exposes observability APIs that Golem and dashboards consume. It's the thing that wakes me up when something's broken.

How It's Built

Architecture and implementation.

PostHog analytics proxy

Ad blockers kill analytics. Every channel site gets a ph.<domain> subdomain, and the Worker proxies those requests to PostHog's EU servers: static assets to the CDN, API calls to the ingestion endpoint. The browser only ever sees a first-party subdomain.
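The routing decision can be sketched as a pure function. The two EU hostnames here are assumptions about the upstream targets; swap in whatever the deployment actually points at.

```typescript
// Decide which PostHog EU upstream a proxied request should hit.
// Host names are assumptions about PostHog's EU cloud, not taken
// from this project's config.
const ASSET_HOST = "eu-assets.i.posthog.com"; // static JS bundles (CDN)
const INGEST_HOST = "eu.i.posthog.com";       // event ingestion API

function upstreamFor(pathname: string): string {
  // Static assets (e.g. /static/array.js) come from the CDN host;
  // everything else (capture, decide, etc.) goes to ingestion.
  const host = pathname.startsWith("/static/") ? ASSET_HOST : INGEST_HOST;
  return `https://${host}${pathname}`;
}
```

The Worker then fetches the upstream URL and streams the response back, so the page never talks to a third-party domain directly.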

Uptime monitoring with alerting

A cron trigger fires every 5 minutes and checks all targets: HTTP status, response body validity, latency (flagging anomalies above 3.5s), and form verification. State transitions trigger Discord alerts and Golem webhooks.
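The core of a cycle like this is classifying each probe and alerting only on transitions. A minimal sketch, with illustrative state names rather than the Worker's exact ones:

```typescript
// Classify one probe result; thresholds mirror the 3.5 s latency
// anomaly flag described above.
type TargetState = "up" | "degraded" | "down";

const LATENCY_ANOMALY_MS = 3500;

function classify(status: number, latencyMs: number, bodyOk: boolean): TargetState {
  if (status >= 400 || !bodyOk) return "down";
  if (latencyMs > LATENCY_ANOMALY_MS) return "degraded";
  return "up";
}

// Only state transitions fire Discord alerts / Golem webhooks;
// a target that stays down does not re-alert every cycle.
function shouldAlert(prev: TargetState, next: TargetState): boolean {
  return prev !== next;
}
```

The previous state lives in KV, so each cron run compares fresh probe results against the stored state before deciding to notify.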

Job heartbeat supervision

Background jobs POST heartbeats to /ops/jobs/heartbeat. Each job has a configurable SLA: if it hasn't checked in within the allowed window, an incident is created. This catches the silent failures that cron monitoring misses.
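The staleness check itself is simple arithmetic over the stored timestamps. A sketch, with hypothetical field names:

```typescript
// One record per supervised job; the Worker stores the last
// heartbeat timestamp in KV on each POST /ops/jobs/heartbeat.
interface JobRecord {
  id: string;
  lastHeartbeat: number; // epoch ms of the last check-in
  slaMs: number;         // allowed silence before an incident opens
}

function isStale(job: JobRecord, now: number): boolean {
  return now - job.lastHeartbeat > job.slaMs;
}

// Run during the monitoring cycle: any stale job gets an incident.
function staleJobs(jobs: JobRecord[], now: number): string[] {
  return jobs.filter((j) => isStale(j, now)).map((j) => j.id);
}
```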

Observability REST API

Authenticated endpoints: GET /ops/summary, /ops/incidents, /ops/anomalies, /ops/target/:id. POST /ops/targets, /ops/runbook/:id/execute, /ops/restart/:target. Consumed by Golem and dashboards.
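Dispatch over those endpoints can be sketched as a small method-plus-pattern table; the handler names here are made up for illustration:

```typescript
// Route table mirroring the /ops/* endpoints listed above.
// Capture group picks up the :id / :target path parameter.
const ROUTES: [method: string, pattern: RegExp, handler: string][] = [
  ["GET",  /^\/ops\/summary$/,                   "summary"],
  ["GET",  /^\/ops\/incidents$/,                 "incidents"],
  ["GET",  /^\/ops\/anomalies$/,                 "anomalies"],
  ["GET",  /^\/ops\/target\/([^/]+)$/,           "target"],
  ["POST", /^\/ops\/targets$/,                   "createTarget"],
  ["POST", /^\/ops\/runbook\/([^/]+)\/execute$/, "runbook"],
  ["POST", /^\/ops\/restart\/([^/]+)$/,          "restart"],
];

function match(method: string, path: string): { handler: string; param?: string } | null {
  for (const [m, pattern, handler] of ROUTES) {
    if (m !== method) continue;
    const hit = pattern.exec(path);
    if (hit) return { handler, param: hit[1] };
  }
  return null;
}
```

Auth happens before dispatch; anything that misses the table (or the auth check) gets a 404/401 rather than touching KV.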

Runbook execution (approval-gated)

Runbook execution creates an incident and sends a webhook to Golem for approval. Only allowlisted runbook IDs are accepted. Actual execution flows through Golem's policy engine, with Discord confirmation required.
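The gate itself reduces to an allowlist check that produces a pending incident rather than executing anything. A sketch, with illustrative runbook IDs:

```typescript
// Approval gate: only allowlisted runbook IDs get as far as creating
// an incident and pinging Golem. Nothing executes directly here.
const ALLOWED_RUNBOOKS = new Set(["restart-worker", "purge-cache"]); // illustrative IDs

type GateResult =
  | { ok: true; incident: { runbookId: string; status: "pending-approval" } }
  | { ok: false; error: string };

function gateRunbook(id: string): GateResult {
  if (!ALLOWED_RUNBOOKS.has(id)) {
    return { ok: false, error: `runbook ${id} not allowlisted` };
  }
  // Real flow: persist the incident to KV and webhook Golem, which
  // requires Discord confirmation via its policy engine before acting.
  return { ok: true, incident: { runbookId: id, status: "pending-approval" } };
}
```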

Frontend crash logging

A POST /_log endpoint accepts crash reports from frontend apps, authenticated via a shared token. Reports are logged to KV, with plans to forward critical crashes to the incident system and Discord.
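The token check and accept/reject logic might look like this; the header name and status codes are assumptions, and a plain Map stands in for the Workers Headers object so the sketch runs anywhere:

```typescript
// Shared-token auth for the /_log crash endpoint.
function isAuthorized(headers: Map<string, string>, sharedToken: string): boolean {
  return headers.get("x-log-token") === sharedToken;
}

function acceptCrashReport(
  headers: Map<string, string>,
  body: string,
  sharedToken: string
): { status: number } {
  if (!isAuthorized(headers, sharedToken)) return { status: 401 };
  if (body.length === 0) return { status: 400 };
  // Real Worker: write the report to KV, e.g. keyed by timestamp.
  return { status: 202 };
}
```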

Architecture Map

Request flow and service topology

Browser analytics → ph.<domain> → Worker → PostHog EU API / CDN

Cron (every 5 min) → Monitor all targets
                       ├── HTTP checks (status, body, latency)
                       ├── State transitions → KV
                       ├── Incidents → KV + Discord notification
                       └── Webhook → Golem gateway

Background jobs → POST /ops/jobs/heartbeat → KV timestamp
                   ↓ (if stale past SLA)
                   Incident created

Golem / Dashboard → GET /ops/* → Health summaries, incidents, anomalies
                  → POST /ops/runbook/:id/execute → Incident + Golem webhook

Frontend → POST /_log → KV crash log

Primitives Used

Every Cloudflare binding in this project.

Workers: Single worker handling all proxy, monitoring, and API functions
KV: Monitoring state (target status, incidents), job heartbeat timestamps, crash logs
Cron Triggers: 5-minute monitoring cycle
Zone Routes: 13 domain-specific route bindings for the analytics proxy (ph.<domain>)
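Those bindings roughly correspond to a wrangler.toml shape like this; the names and IDs are placeholders, and the single example route stands in for the 13 real ones:

```toml
# Illustrative config shape only, not this project's actual file.
name = "analytics-worker"
main = "src/index.ts"

[triggers]
crons = ["*/5 * * * *"]   # 5-minute monitoring cycle

kv_namespaces = [
  { binding = "OPS_KV", id = "<kv-namespace-id>" }
]

routes = [
  { pattern = "ph.example.com/*", zone_name = "example.com" }
  # ...one route per channel domain
]
```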

What Makes This Interesting

The architectural angle worth paying attention to.

This is a complete observability platform — monitoring, alerting, incident management, job supervision, runbook execution, analytics proxying — running as a single Cloudflare Worker with KV for state. No Prometheus. No Grafana. No PagerDuty. No dedicated monitoring infrastructure. The cron trigger fires every 5 minutes, checks everything, updates KV, sends alerts if states changed, and goes back to sleep.