Live AI Operations Agent

Golem

AI operations agent with policy-gated actions, durable memory, and real infrastructure side effects.

Golem runs through Discord and helps with operational tasks across my systems. It includes confirmation workflows, rate limits, circuit breakers, and audit logs so actions are controlled and reviewable.


Architecture

How it's wired.

How It's Built

Implementation notes.

Ack-fast webhook pattern

The Discord webhook handler returns HTTP 200 immediately, and processing continues asynchronously in a Session Durable Object. Acknowledging first avoids Discord timeouts and the storms of duplicate deliveries that retries would otherwise cause.
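A minimal sketch of the ack-fast shape, under stated assumptions. `waitUntil()` mirrors the real method on the Workers `ExecutionContext`. `DiscordEvent`, `SessionHandle`, `process()` and `ackFast()` are hypothetical stand-ins for the real gateway and the Durable Object stub, declared locally so the example runs on its own.

```typescript
// Minimal stand-ins for Workers runtime types. Assumption: the real gateway
// receives an ExecutionContext and forwards events to a Durable Object stub.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface DiscordEvent {
  id: string;        // Discord message/interaction id
  channelId: string; // used to pick the session Durable Object
  content: string;
}

// Hypothetical session entry point (a DO stub method in the real system).
interface SessionHandle {
  process(event: DiscordEvent): Promise<void>;
}

interface AckResponse {
  status: number;
  body: string;
}

// Acknowledge first, work later: the slow path (model calls, tools) is handed
// to the session and kept alive with waitUntil instead of being awaited.
function ackFast(event: DiscordEvent, ctx: WaitUntilContext, session: SessionHandle): AckResponse {
  const work = session.process(event).catch((err: unknown) => {
    // The 200 has already been sent, so failures are only logged here.
    console.error(`session processing failed for ${event.id}`, err);
  });
  ctx.waitUntil(work);
  return { status: 200, body: "" };
}
```

Because the response never waits on the model, its latency is independent of how long the agent loop takes.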

Session coordination with Durable Objects

Each Discord chat maps to one Durable Object (DO) instance. That instance handles deduplication (with a 24-hour replay window), session state, and coordination of pending actions.
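The replay window can be sketched as a map from event id to first-seen time. This is an in-memory illustration only. A real DO would presumably persist the map through its storage API so the window survives eviction, and would be addressed per chat, for example with `idFromName(channelId)`. `ReplayGuard` and `accept()` are hypothetical names.

```typescript
const REPLAY_WINDOW_MS = 24 * 60 * 60 * 1000; // 24-hour replay window

// In-memory sketch of per-session deduplication (hypothetical class).
class ReplayGuard {
  private readonly seen = new Map<string, number>(); // event id -> first-seen time (ms)

  // Returns true the first time an id is seen within the window, false for replays.
  accept(eventId: string, now: number): boolean {
    this.prune(now);
    if (this.seen.has(eventId)) {
      return false;
    }
    this.seen.set(eventId, now);
    return true;
  }

  get size(): number {
    return this.seen.size;
  }

  // Drop ids older than the window so memory stays bounded.
  private prune(now: number): void {
    this.seen.forEach((firstSeen, id) => {
      if (now - firstSeen >= REPLAY_WINDOW_MS) {
        this.seen.delete(id);
      }
    });
  }
}
```

Since only one DO instance exists per chat, checks like this run serially and two deliveries of the same event cannot race past the guard together.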

Agent execution with Workflows

GolemAgentWorkflow runs the main loop: load context and history, call Claude or parse an explicit command, check policy, then either execute the action or request confirmation. Each run is capped at 10 turns.
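A synchronous sketch of the loop's control flow, assuming hypothetical `nextStep`, `checkPolicy` and `runTool` dependencies. In the real Workflow these calls would be asynchronous and would likely each sit inside a checkpointed `step.do()` call. The sketch only shows the ordering: policy is decided before a tool runs, a confirmation request ends the run, and the run stops at 10 turns.

```typescript
type ModelStep =
  | { kind: "reply"; text: string }
  | { kind: "tool"; capability: string; args: Record<string, unknown> };

type PolicyDecision = "allow" | "confirm" | "deny";

// Hypothetical dependencies (Claude call or command parser, policy lookup, skill call).
interface AgentDeps {
  nextStep(transcript: string[]): ModelStep;
  checkPolicy(capability: string): PolicyDecision;
  runTool(capability: string, args: Record<string, unknown>): string;
}

type Outcome =
  | { status: "replied"; text: string; turns: number }
  | { status: "awaiting_confirmation" | "denied"; capability: string; turns: number }
  | { status: "turn_limit"; turns: number };

const MAX_TURNS = 10;

function runAgentLoop(history: string[], deps: AgentDeps): Outcome {
  const transcript = history.slice(); // context + prior messages
  for (let turn = 1; turn <= MAX_TURNS; turn++) {
    const step = deps.nextStep(transcript);
    if (step.kind === "reply") {
      return { status: "replied", text: step.text, turns: turn };
    }
    // Policy is checked before the tool runs, never after.
    const decision = deps.checkPolicy(step.capability);
    if (decision === "deny") {
      return { status: "denied", capability: step.capability, turns: turn };
    }
    if (decision === "confirm") {
      // Stop here; the caller would record a pending action for the user to approve.
      return { status: "awaiting_confirmation", capability: step.capability, turns: turn };
    }
    const result = deps.runTool(step.capability, step.args);
    transcript.push(`tool ${step.capability}: ${result}`);
  }
  return { status: "turn_limit", turns: MAX_TURNS };
}
```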

Policy engine

Each capability has a policy record in D1 that sets its confirmation requirement, rate limits, cooldowns, circuit breaker, and mutation-freeze behavior. Policies are enforced before any tool runs.
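One plausible way to evaluate such a record is sketched below. The field names (`maxCallsPerHour`, `cooldownMs`, `breakerThreshold`, `mutating`) and the per-hour rate window are assumptions, not the real D1 schema. The design choice shown is that hard stops (freeze, breaker, cooldown, rate limit) are checked before confirmation, so a user's approval can never override them.

```typescript
// Hypothetical shape of a per-capability policy record.
interface CapabilityPolicy {
  requiresConfirmation: boolean;
  maxCallsPerHour: number;
  cooldownMs: number;
  breakerThreshold: number; // consecutive failures that open the breaker
  mutating: boolean;        // subject to mutation freezes
}

// Hypothetical runtime state derived from tool-run history.
interface CapabilityState {
  recentCalls: number[]; // epoch ms of recent invocations
  consecutiveFailures: number;
}

type Decision = { action: "allow" } | { action: "confirm" } | { action: "deny"; reason: string };

const HOUR_MS = 60 * 60 * 1000;

function evaluatePolicy(
  policy: CapabilityPolicy,
  state: CapabilityState,
  now: number,
  freezeActive: boolean,
): Decision {
  // Hard stops first: no confirmation can override these.
  if (policy.mutating && freezeActive) {
    return { action: "deny", reason: "mutation freeze" };
  }
  if (state.consecutiveFailures >= policy.breakerThreshold) {
    return { action: "deny", reason: "circuit breaker open" };
  }
  const lastCall = state.recentCalls.length > 0 ? Math.max(...state.recentCalls) : null;
  if (lastCall !== null && now - lastCall < policy.cooldownMs) {
    return { action: "deny", reason: "cooldown" };
  }
  const callsLastHour = state.recentCalls.filter((t) => now - t < HOUR_MS).length;
  if (callsLastHour >= policy.maxCallsPerHour) {
    return { action: "deny", reason: "rate limit" };
  }
  return policy.requiresConfirmation ? { action: "confirm" } : { action: "allow" };
}
```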

Isolated skill workers

Each capability runs in its own Worker via service bindings (web-fetch, email, hosting, server-admin, observability). A skill can fail or redeploy without taking down the gateway.
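The isolation property can be sketched as a gateway-side wrapper that turns any skill failure into a result value. `SkillBinding` and `invoke()` are hypothetical stand-ins; a real service binding exposes `fetch()` or RPC methods instead. The skill names are the five listed above.

```typescript
// Stand-in for a service binding (assumption: real bindings use fetch() or RPC).
interface SkillBinding {
  invoke(action: string, args: Record<string, unknown>): Promise<unknown>;
}

type SkillName = "web-fetch" | "email" | "hosting" | "server-admin" | "observability";

type SkillResult = { ok: true; value: unknown } | { ok: false; error: string };

// A failing, missing, or redeploying skill yields an error result; it never
// throws into the gateway's own request handling.
async function callSkill(
  bindings: Partial<Record<SkillName, SkillBinding>>,
  skill: SkillName,
  action: string,
  args: Record<string, unknown>,
): Promise<SkillResult> {
  const binding = bindings[skill];
  if (!binding) {
    return { ok: false, error: `no binding for ${skill}` };
  }
  try {
    return { ok: true, value: await binding.invoke(action, args) };
  } catch (err: unknown) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```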

Durable memory

Facts are stored in D1 with confidence scores and source attribution. Memory is injected into Claude's context on each turn, and redaction rules block retrieval of sensitive topics.
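A sketch of the injection step, assuming facts have already been loaded from D1. The `topic` field, the redacted topic names, the 0.6 confidence floor and the 20-fact limit are all hypothetical choices for illustration. Redaction is applied before ranking, so a blocked fact cannot reach the prompt no matter how confident it is.

```typescript
interface MemoryFact {
  fact: string;
  confidence: number; // 0..1
  source: string;     // where the fact came from (message, tool run, ...)
  topic: string;      // hypothetical field used by redaction rules
}

// Hypothetical redaction rule set.
const REDACTED_TOPICS = new Set(["credentials", "personal-finance"]);

function buildMemoryContext(facts: MemoryFact[], minConfidence = 0.6, limit = 20): string {
  const lines = facts
    .filter((f) => !REDACTED_TOPICS.has(f.topic)) // redaction first
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence) // most confident first
    .slice(0, limit)
    .map((f) => `- ${f.fact} (confidence ${f.confidence.toFixed(2)}, source: ${f.source})`);
  return lines.length > 0 ? `Known facts:\n${lines.join("\n")}` : "";
}
```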

Primitives Used

Cloudflare primitives in this project.

Workers: 6 Workers in total, the gateway plus 5 skill workers

Durable Objects: session coordination, deduplication, and the pending-action state machine

Workflows: agent execution orchestration with checkpoints and retries

D1: sessions, messages, policies, pending actions, tool runs, memories, and audit logs

R2: soul files defining the agent's personality and instructions

Vectorize: semantic memory retrieval (in progress)

Service Bindings: Worker-to-Worker calls that do not go through a public URL, used for skill isolation

Cron Triggers: pending-action expiry every 5 minutes
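The pending-action state machine and the 5-minute expiry sweep might look like the sketch below. The state names, `expiresAt` field and function names are hypothetical. On Cloudflare, the schedule would typically be declared in the Wrangler config as the cron expression `*/5 * * * *`, and the sweep would run in the Worker's `scheduled()` handler. One detail worth keeping: a confirmation that arrives after expiry but before the next sweep is also treated as expired.

```typescript
type PendingStatus = "pending" | "confirmed" | "cancelled" | "expired" | "executed";

interface PendingAction {
  id: string;
  capability: string;
  status: PendingStatus;
  expiresAt: number; // epoch ms
}

// Allowed transitions; anything else is rejected.
const TRANSITIONS: Record<PendingStatus, PendingStatus[]> = {
  pending: ["confirmed", "cancelled", "expired"],
  confirmed: ["executed"],
  cancelled: [],
  expired: [],
  executed: [],
};

function transition(action: PendingAction, next: PendingStatus): PendingAction {
  if (!TRANSITIONS[action.status].includes(next)) {
    throw new Error(`illegal transition ${action.status} -> ${next} for ${action.id}`);
  }
  return { ...action, status: next };
}

// A late confirmation must not revive an action the sweep has not reached yet.
function confirmAction(action: PendingAction, now: number): PendingAction {
  if (action.status === "pending" && action.expiresAt <= now) {
    return transition(action, "expired");
  }
  return transition(action, "confirmed");
}

// What the cron-triggered sweep might do on each run.
function expireStale(actions: PendingAction[], now: number): PendingAction[] {
  return actions.map((a) =>
    a.status === "pending" && a.expiresAt <= now ? transition(a, "expired") : a,
  );
}
```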

Why This Design

Why I built it this way.

"Golem is where AI tooling meets operations safely. The important part is not just tool calling; it is the safety envelope around tool calling: policy checks, confirmation gates, replay protection, and durable logs."
