How We Orchestrate 6 AI Agents to Run an Online Business

The Architecture

Most "multi-agent" demos show agents chatting with each other in a loop until they produce an answer. That is not what we built.

Our agents are more like employees at a small company. They each have a role, a task queue, a budget, and a chain of command. They don't talk to each other directly — they coordinate through a shared task system called Paperclip.

What Is Paperclip?

Paperclip is the control plane. It's a task management system designed specifically for AI agent coordination. Think of it like Jira, except instead of humans picking up tickets, agents wake up on a schedule, check their inbox, and execute.

Each agent has:

A role and title (e.g., "Content Writer — Aviation Content Writer")
Capabilities description (what it knows how to do)
A heartbeat schedule (currently: hourly)
A budget (monthly spend limit)
A chain of command (who it reports to)
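
As a rough illustration, the agent record above could be modeled like this. This is a sketch only: the class name AgentConfig, its field names, and the budget figure are hypothetical placeholders, not Paperclip's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    """Hypothetical agent record; every field name here is an assumption."""
    role: str                         # e.g. "Content Writer"
    title: str                        # e.g. "Aviation Content Writer"
    capabilities: str                 # free-text description of what the agent can do
    heartbeat_minutes: int = 60       # hourly heartbeat
    monthly_budget_usd: float = 0.0   # monthly spend limit (currency is assumed)
    reports_to: Optional[str] = None  # manager's role; None at the top of the chain


def chain_of_command(agents: dict[str, AgentConfig], role: str) -> list[str]:
    """Follow reports_to links upward and return the roles from `role` to the top."""
    chain: list[str] = []
    current: Optional[str] = role
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = agents[current].reports_to
    return chain


agents = {
    "CEO": AgentConfig("CEO", "CEO", "Coordinates the pipeline"),
    "Content Writer": AgentConfig(
        "Content Writer",
        "Aviation Content Writer",
        "Drafts articles",
        monthly_budget_usd=50.0,  # placeholder figure, not a real budget
        reports_to="CEO",
    ),
}
print(chain_of_command(agents, "Content Writer"))  # ['Content Writer', 'CEO']
```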

The Heartbeat Model

Every agent runs in heartbeats — short windows of execution (typically 5-20 minutes). When a heartbeat fires:

Check identity — confirm which agent I am
Check assignments — what tasks are in my queue with status todo or in_progress?
Checkout the task — acquire a lock so no other agent picks it up simultaneously
Read context — understand the issue, parent tasks, comments from other agents
Do the work — use tools (file system, browser, APIs) to complete the task
Update status — mark done, blocked, or add a comment if still in progress
Exit — the heartbeat ends
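
The steps above can be sketched as a single function. This is a minimal in-memory illustration, not the real runtime: Task, run_heartbeat, and do_work are hypothetical names, the lock is a plain field rather than an API call, and handling one task per heartbeat is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    id: str
    assignee: str
    status: str = "todo"              # todo | in_progress | done | blocked
    locked_by: Optional[str] = None   # agent currently holding the checkout
    comments: list[str] = field(default_factory=list)


def run_heartbeat(
    agent: str, tasks: list[Task], do_work: Callable[[Task], str]
) -> Optional[str]:
    """Run one heartbeat for `agent`; return the handled task id, or None."""
    for task in tasks:
        # Check assignments: only this agent's todo / in_progress tasks qualify.
        if task.assignee != agent or task.status not in ("todo", "in_progress"):
            continue
        # Check out the task: skip it if another agent already holds the lock.
        if task.locked_by not in (None, agent):
            continue
        task.locked_by = agent
        task.status = "in_progress"
        # Do the work; do_work returns "done", "blocked", or "in_progress".
        outcome = do_work(task)
        task.status = outcome
        if outcome == "in_progress":
            task.comments.append(f"{agent}: still in progress")
        else:
            task.locked_by = None  # release the lock once the task is settled
        return task.id  # exit: the heartbeat ends after one task (assumption)
    return None


board = [Task("T-1", "Content Writer"), Task("T-2", "Founding Engineer")]
handled = run_heartbeat("Content Writer", board, lambda task: "done")
print(handled, board[0].status)  # T-1 done
```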

The checkout mechanism is critical. It's a distributed lock: if two agents try to check out the same task simultaneously, only one wins and the other gets a 409 Conflict error. This prevents duplicate work without requiring agents to coordinate in real time.
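
To make the "only one wins" behavior concrete, here is a small in-process sketch. CheckoutStore and its methods are hypothetical stand-ins, not Paperclip's API, and treating a repeat checkout by the current owner as a success is an assumption. A real distributed version would need an atomic operation in shared storage rather than a thread mutex.

```python
import threading


class CheckoutStore:
    """In-memory stand-in for a task lock table (hypothetical)."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._mutex = threading.Lock()

    def checkout(self, task_id: str, agent: str) -> int:
        """Return 200 if `agent` holds the task afterwards, 409 if another agent does."""
        with self._mutex:  # makes the check-and-set atomic
            owner = self._owners.setdefault(task_id, agent)
            return 200 if owner == agent else 409

    def release(self, task_id: str, agent: str) -> None:
        """Drop the lock, but only if `agent` is the current owner."""
        with self._mutex:
            if self._owners.get(task_id) == agent:
                del self._owners[task_id]


store = CheckoutStore()
results: list[int] = []


def attempt(agent: str) -> None:
    results.append(store.checkout("T-1", agent))


threads = [
    threading.Thread(target=attempt, args=(name,))
    for name in ("Content Writer", "Founding Engineer")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sorted(results))  # [200, 409]: exactly one agent wins the checkout
```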

Task Communication

Agents communicate through issue comments. If the Content Writer needs something from the CEO, it leaves a comment on the issue and either marks the issue blocked (the CEO will see it on its next heartbeat) or @-mentions the CEO to trigger an immediate wake.

@-mentions are used sparingly — they consume budget and create noise. Most coordination happens naturally through status changes.
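
One way to picture the difference between a plain comment and an @-mention is a small parser that decides which agents to wake. This is a sketch under assumptions: the handle format (lowercase, hyphenated, such as "ceo" or "content-writer") and the function name agents_to_wake are hypothetical.

```python
import re

# Hypothetical handle format: "@" followed by lowercase letters, digits, hyphens.
MENTION = re.compile(r"@([a-z0-9][a-z0-9-]*)")


def agents_to_wake(comment: str, known_agents: set[str]) -> list[str]:
    """Return the known agent handles mentioned in a comment, in first-mention order."""
    woken: list[str] = []
    for handle in MENTION.findall(comment.lower()):
        if handle in known_agents and handle not in woken:
            woken.append(handle)
    return woken


known = {"ceo", "content-writer", "founding-engineer"}
print(agents_to_wake("Need affiliate links. @CEO can you prioritize?", known))  # ['ceo']
print(agents_to_wake("Marking blocked until links arrive.", known))  # []
```

A comment with no mention wakes nobody, which matches the default path: the status change is picked up on the next scheduled heartbeat.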

The CEO Agent

The CEO is different from other agents. Its primary responsibilities are:

Monitoring the dashboard (task counts, blockers, agent activity)
Promoting backlog tasks to todo when agents become idle
Creating new subtasks when work needs to be broken down
Escalating blockers to the board (human owner)

The CEO doesn't do deep individual-contributor work — it does light coordination and keeps the pipeline flowing. In practice, it wakes up every hour, checks which agents are idle, and assigns them work.
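
The "check which agents are idle, and assign them work" pass might look like the sketch below. The definition of idle (no todo or in_progress tasks), the rule of one promotion per idle agent, and the assumption that backlog tasks already carry an assignee are all illustrative choices, and promote_backlog is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    assignee: str
    status: str  # backlog | todo | in_progress | done | blocked


def promote_backlog(tasks: list[Task]) -> list[str]:
    """Promote one backlog task to todo for each agent with no active work."""
    # An agent counts as busy if it has anything in todo or in_progress.
    busy = {t.assignee for t in tasks if t.status in ("todo", "in_progress")}
    promoted: list[str] = []
    for task in tasks:
        if task.status == "backlog" and task.assignee not in busy:
            task.status = "todo"
            busy.add(task.assignee)  # at most one promotion per idle agent
            promoted.append(task.id)
    return promoted


tasks = [
    Task("T-1", "Content Writer", "in_progress"),
    Task("T-2", "Content Writer", "backlog"),
    Task("T-3", "Data Engineer", "backlog"),
    Task("T-4", "Data Engineer", "backlog"),
]
print(promote_backlog(tasks))  # ['T-3']
```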

Tool Access

Each agent has access to the same base toolkit:

File system (read/write/edit)
Bash (run commands, scripts)
Web search and fetch
Browser automation (Playwright)
Paperclip API (task management)

Specialized agents have additional access patterns. The Founding Engineer focuses on third-party API integrations. The Data Engineer focuses on data pipelines and monitoring scripts. The Growth Engineer focuses on SEO tooling and analytics.

What Works Well

Parallelism is real. At peak, we had 5 agents running tasks simultaneously across different domains. The Content Writer was drafting articles, the Founding Engineer was building Redbubble automation, and the Product Engineer was setting up Payhip listings. None of them needed to wait for the others.

The checkout lock prevents conflicts. We haven't had a single duplicate-work incident. Agents correctly identify when a task is already checked out and move on.

Vertical specialization reduces hallucination. The Content Writer agent's context is full of writing-relevant work. It doesn't drift into infrastructure tasks. Role boundaries naturally constrain agent behavior.

What Doesn't Work Yet

Cross-agent dependencies are clunky. Suppose the Content Writer needs real affiliate links before it can finalize an article, those links depend on the Founding Engineer completing an integration, and that integration depends on the board providing API credentials. A dependency chain like this creates a lot of blocked tasks and context switching.

Agents can't escalate urgency well. If a critical blocker appears at 2 a.m., nothing actively notifies the board. The system is passive: the blocker sits until the next scheduled run, and a human only sees it when they next look. We need a notification layer.

Budget visibility per task is limited. We know total monthly spend per agent but not per-task cost. This makes it hard to know which tasks are "worth it" and which are burning budget on low-value work.
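
A per-task view could be as simple as attributing each heartbeat's spend to the task it worked on. The sketch below is hypothetical: CostLedger is an invented name, the amounts are placeholder figures in cents, and it assumes the runtime can report a cost for each heartbeat.

```python
from collections import defaultdict


class CostLedger:
    """Hypothetical ledger that attributes each heartbeat's spend to a task."""

    def __init__(self) -> None:
        self._by_task: defaultdict[str, int] = defaultdict(int)
        self._by_agent: defaultdict[str, int] = defaultdict(int)

    def record(self, agent: str, task_id: str, cost_cents: int) -> None:
        """Add one heartbeat's cost (in cents) to both the task and agent totals."""
        self._by_task[task_id] += cost_cents
        self._by_agent[agent] += cost_cents

    def task_cost(self, task_id: str) -> int:
        return self._by_task.get(task_id, 0)

    def agent_cost(self, agent: str) -> int:
        return self._by_agent.get(agent, 0)


ledger = CostLedger()
ledger.record("Content Writer", "T-1", 40)  # placeholder amounts
ledger.record("Content Writer", "T-1", 25)
ledger.record("Content Writer", "T-2", 10)
print(ledger.task_cost("T-1"))              # 65
print(ledger.agent_cost("Content Writer"))  # 75
```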

The Stack

Agent runtime: Claude Sonnet 4.6 (claude-sonnet-4-6)
Orchestration: Paperclip (agent coordination platform)
Adapters: claude_local (agents run on local machines via Claude Code)
Infrastructure: Next.js + Vercel + Supabase
Content storage: MDX files in the repository
Product hosting: Payhip, Gumroad, Redbubble (in progress), Etsy (blocked)

The agents run locally on macOS hardware. Heartbeats are triggered by the Paperclip scheduler or by event-based wakes (task assignments, @-mentions).

What's Next

We're exploring a few improvements:

A proper dependency graph — tasks that declare upstream dependencies so they don't block on polling
Budget-per-task tracking — know the cost of what you're producing
A human notification bridge — critical blockers should page the board, not just wait
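
For the dependency graph idea, a minimal sketch: each task declares its upstream tasks, and a task becomes ready once all of them are done. The task ids and the function name ready_tasks are hypothetical, chosen to mirror the affiliate-link chain described earlier.

```python
def ready_tasks(deps: dict[str, set[str]], done: set[str]) -> list[str]:
    """Return unfinished tasks whose upstream dependencies are all done."""
    return sorted(
        task for task, upstream in deps.items()
        if task not in done and upstream <= done  # subset test: all upstreams done
    )


# Hypothetical task ids; each maps to the set of tasks it depends on.
deps = {
    "provide-api-credentials": set(),
    "build-affiliate-integration": {"provide-api-credentials"},
    "finalize-article": {"build-affiliate-integration"},
}
print(ready_tasks(deps, set()))                        # ['provide-api-credentials']
print(ready_tasks(deps, {"provide-api-credentials"}))  # ['build-affiliate-integration']
```

With declared dependencies, a scheduler could wake the downstream agent when the upstream task completes, rather than leaving the task marked blocked until someone re-checks it.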

This is all early-stage. The architecture works for where we are. When we have real revenue, we'll invest in the infrastructure layer.