// Multi-Agent System
Boxman 3000, a five-agent operations council managing four parallel businesses
A multi-agent AI system that runs strategy, ad analytics, content, finance, and engineering across four parallel businesses, with confirm-gate boundaries on every write.
The problem
Running four independent businesses as one person is a coordination problem before it is a workload problem. Strategy decisions need ad data, ad changes need content alignment, content needs finance signal, and finance needs engineering reality. Every domain depends on the others, yet the work cannot collapse into a single overloaded generalist. A single chat assistant cannot hold all of that context without losing it between turns. A queue of cron jobs cannot reason across domains.
The architecture
A council of specialized agents, each scoped to one operational domain (strategy, ad analytics, content, finance, engineering). Agents share a single source of truth through a shared-context layer rather than passing state through prompts. Every destructive action (publishing, spending, sending, deleting) is gated by a structured confirm step that names the action, scope, and target before any write.
- Each agent has a tightly scoped system prompt and a tightly scoped tool set. Strategy cannot publish. Content cannot move money. Engineering cannot send email.
- Shared-context layer is a flat-file plus REST surface that every agent reads from and writes to, so coordination is durable across sessions.
- Confirm-gate pattern: write tools return an action_id proposal. Nothing happens until the human confirms with that id. The agent does not retry a confirm.
- Long-running orchestration runs under PM2 with the same operational discipline as any other production service: logs, restarts, alerts.
- New agent capabilities ship as MCP tool surfaces, not as bespoke agent code, so the same toolset can be invoked from any LLM client.
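The scoped tool sets and the confirm gate described above can be sketched in a few lines. This is a minimal illustration, not the production code: ConfirmGate, Proposal, ALLOWED_TOOLS, and the in-memory pending table are all hypothetical names chosen for the example, and the tool scoping shown covers only two of the five agents.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Illustrative scoping only (assumption): the real per-agent tool sets
# are not listed in the text.
ALLOWED_TOOLS = {
    "strategy": frozenset(),            # strategy cannot publish
    "content": frozenset({"publish"}),  # content cannot move money
}


@dataclass(frozen=True)
class Proposal:
    """What a write tool returns instead of performing the write."""
    action_id: str
    action: str  # e.g. "publish"
    scope: str   # e.g. which business
    target: str  # e.g. which item


class ConfirmGate:
    """Holds proposed writes until a human confirms them by action_id."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[Proposal, Callable[[], str]]] = {}

    def propose(self, agent: str, action: str, scope: str, target: str,
                execute: Callable[[], str]) -> Proposal:
        # Reject tools outside the agent's scope before anything is queued.
        if action not in ALLOWED_TOOLS.get(agent, frozenset()):
            raise PermissionError(f"{agent} has no '{action}' tool")
        proposal = Proposal(uuid.uuid4().hex, action, scope, target)
        self._pending[proposal.action_id] = (proposal, execute)
        return proposal

    def confirm(self, action_id: str) -> str:
        # pop() makes each id single-use, so a confirm cannot be replayed.
        entry = self._pending.pop(action_id, None)
        if entry is None:
            raise KeyError(f"unknown or already used action_id: {action_id}")
        _proposal, execute = entry
        return execute()
```

In this sketch the agent only ever calls propose(); the human-facing client is the only caller of confirm(). Keeping the write itself behind a stored callable means the proposal can be shown to the human with its action, scope, and target before anything runs.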
What it does in production
- Operates daily across the four businesses without manual orchestration, with the human in the loop only on writes.
- Replaces a class of decisions that previously required context-switching between four separate dashboards.
- Has been running continuously for months under the same architecture, with new domains added by appending agents rather than rewriting the system.
What I would do differently
The shared-context layer started as a flat-file pattern and outgrew it. The next iteration would move to a typed, Postgres-backed context store with explicit schemas per domain, which would catch a class of cross-agent shape mismatches at the boundary instead of inside an agent.
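A rough sketch of what "catching shape mismatches at the boundary" could look like is below. It assumes an in-memory list as a stand-in for Postgres, and the domain names, field names, SCHEMAS table, and ContextStore class are hypothetical examples, not the actual schemas.

```python
from typing import Any, Dict, List

# Hypothetical per-domain schemas (assumption): field name -> required type.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "ad_analytics": {"business": str, "campaign": str, "spend_usd": float},
    "content": {"business": str, "slug": str, "status": str},
}


def validate(domain: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Check a record against its domain schema before it is stored."""
    schema = SCHEMAS.get(domain)
    if schema is None:
        raise ValueError(f"no schema registered for domain: {domain}")
    missing = set(schema) - set(payload)
    extra = set(payload) - set(schema)
    if missing or extra:
        raise ValueError(
            f"{domain}: missing={sorted(missing)} extra={sorted(extra)}"
        )
    for name, expected in schema.items():
        if not isinstance(payload[name], expected):
            raise TypeError(f"{domain}.{name} must be {expected.__name__}")
    return dict(payload)


class ContextStore:
    """In-memory stand-in for the typed store; rejects bad shapes on write."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[Dict[str, Any]]] = {}

    def write(self, domain: str, payload: Dict[str, Any]) -> None:
        # Validation happens here, at the boundary, not inside a reader agent.
        record = validate(domain, payload)
        self._rows.setdefault(domain, []).append(record)

    def read(self, domain: str) -> List[Dict[str, Any]]:
        return list(self._rows.get(domain, []))
```

With this shape, an agent that writes a record using the wrong field name fails at write time with a message naming the missing and extra fields, rather than causing a confusing failure later in whichever agent reads it.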