
Chain of Agents: How Single Models Can Replace Expensive Multi-Agent Systems

Published: Aug 25, 2025
Toronto, Canada

You’ve probably experienced the problem: multi-agent systems are powerful but expensive. Each agent call costs tokens, each coordination step adds latency, and debugging agent-to-agent communication feels like herding cats. What if you could get the same results with a single model that thinks like multiple agents?

That’s exactly what Chain-of-Agents (CoA) delivers. By training single models to internally simulate multi-agent collaboration, researchers have achieved an 84.6% reduction in inference costs while actually improving performance on key benchmarks.


The Core Innovation: Teaching Models to Role-Play

Traditional multi-agent systems work like a team meeting—different specialized agents pass messages back and forth, each contributing their expertise. It’s effective but wasteful. Most of the tokens go to coordination overhead rather than solving your actual problem.

Chain-of-Agents takes a different approach. Instead of running separate agents, you train a single model to internally coordinate different ‘roles’ and tools. The model learns to:

  • Switch between specialized personas (planner, coder, debugger)
  • Activate the right tools at the right time
  • Maintain coherent state without inter-agent chatter
  • Reflect on its own outputs to self-correct

Think of it as the difference between hiring a team versus hiring one exceptionally versatile expert who can wear multiple hats.
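
To make this concrete, here is a minimal sketch of what single-model role simulation can look like at inference time. The role tags and prompt wording below are illustrative assumptions, not the exact trace format used in the CoA paper.

    # Hypothetical role-switching prompt for one model (not the paper's exact format).
    ROLE_SYSTEM_PROMPT = """You are a single assistant that internally plays several roles.
    At each step, emit one block tagged with the active role:
    <planner> break the task into sub-goals </planner>
    <coder> write or modify code for the current sub-goal </coder>
    <debugger> inspect results and self-correct </debugger>
    Switch roles as needed and keep shared state in your own reasoning;
    you are the whole team, so never wait on messages from other agents."""

    def build_messages(task: str) -> list[dict]:
        """Assemble one chat request that covers planner, coder, and debugger roles."""
        return [
            {"role": "system", "content": ROLE_SYSTEM_PROMPT},
            {"role": "user", "content": task},
        ]

Passing the result of build_messages(...) to any chat-completion client yields a single response that interleaves the roles, instead of several separate agent calls.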

How They Built It: Distillation Meets Reinforcement Learning

The training process is clever. First, they distill successful multi-agent system runs into Chain-of-Agents formatted traces. These traces capture the reasoning patterns of multi-agent collaboration without the communication overhead.

The training happens in two stages:

Stage 1: Supervised Fine-Tuning (SFT)
The model learns from reformatted ReAct-style data covering both short and long reasoning chains; a sketch of what one such trajectory might look like follows the list below. Progressive filtering ensures only high-quality trajectories make it through. From these traces, the model learns to:

  • Plan before acting
  • Call tools efficiently
  • Reflect on observations
  • Maintain coherent reasoning across steps
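
As an illustration, a distilled trajectory might be stored along these lines. The field names and role tags are my own placeholders, not the paper's released schema.

    # Hypothetical example of a multi-agent run distilled into one ReAct-style trajectory.
    sft_example = {
        "prompt": "How many moons does Mars have, and what are they called?",
        "trajectory": [
            {"role": "planner", "content": "Plan: search for Mars's moons, then verify the names."},
            {"role": "tool_call", "content": "web_search(query='moons of Mars')"},
            {"role": "observation", "content": "Mars has two moons: Phobos and Deimos."},
            {"role": "reflection", "content": "The observation answers the question; no further tools needed."},
            {"role": "answer", "content": "Mars has two moons: Phobos and Deimos."},
        ],
    }

    def to_training_text(example: dict) -> str:
        """Flatten the trajectory into one string so a single model learns the full
        plan -> tool call -> observation -> reflection -> answer loop end to end."""
        steps = "\n".join(f"<{s['role']}>{s['content']}</{s['role']}>" for s in example["trajectory"])
        return f"Task: {example['prompt']}\n{steps}"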

Stage 2: Agentic Reinforcement Learning
Here’s where it gets interesting. The model performs tool-aware rollouts on new tasks, receiving rewards based on:

  • Task correctness (via LLM-as-Judge for web tasks)
  • Exact match for QA tasks
  • Test case success for code/math

This RL stage is crucial—it teaches the model to coordinate tools and reasoning robustly, not just mimic training data.
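
A rough sketch of that reward routing is below; llm_judge and the test callables are placeholders for whatever judge model and test harness you use, not a specific library API.

    # Sketch of the reward signals listed above (QA exact match, code/math tests, LLM judge).
    def reward(task_type: str, prediction: str, reference: str | None = None,
               tests: list | None = None, llm_judge=None) -> float:
        if task_type == "qa":
            # Exact match against the gold answer.
            return 1.0 if prediction.strip().lower() == (reference or "").strip().lower() else 0.0
        if task_type in ("code", "math"):
            # Fraction of test cases (callables applied to the prediction) that pass.
            results = [bool(t(prediction)) for t in (tests or [])]
            return sum(results) / len(results) if results else 0.0
        if task_type == "web":
            # Delegate to an LLM-as-Judge that returns a score in [0, 1].
            return float(llm_judge(prediction, reference)) if llm_judge else 0.0
        raise ValueError(f"unknown task type: {task_type}")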

The Numbers That Matter

With Qwen-2.5-32B as the backbone, the results speak for themselves:

General Agent Tasks:

  • GAIA: 55.3% (new SOTA for pass@1)
  • BrowseComp: 11.1%
  • Humanity’s Last Exam (HLE): 18.0%
  • WebWalker: 63.0%

Code and Math:

  • AIME 2025: 59.8%
  • MATH-500: 94.6%
  • OlympiadBench: 72.1%
  • LiveCodeBench v5: 47.9%

But here’s the kicker—these results come with massive efficiency gains:

  • 84.6% reduction in token costs compared to multi-agent systems
  • Fewer tool calls needed
  • Single model inference instead of multiple agent calls

Why This Changes Everything

For Your Development Workflow

Instead of orchestrating complex multi-agent pipelines, you can now:

  • Deploy a single model that handles multiple roles
  • Reduce latency from inter-agent communication
  • Simplify debugging (one model, one log stream)
  • Scale without multiplying costs

For Your Production Systems

The efficiency gains translate directly to your bottom line:

  • Lower API costs (fewer tokens, fewer calls)
  • Faster response times (no agent coordination overhead)
  • More predictable behavior (single model consistency)
  • Easier monitoring and optimization

For Test-Time Scaling

Best-of-3 and pass@3 sampling show dramatic improvements:

  • GAIA: 69.9% with best-of-3
  • Humanity’s Last Exam (HLE): 33.2% with pass@3

This means you can trade a few extra inference-time samples for significant accuracy gains when needed.
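
If you want to reproduce this kind of test-time scaling yourself, the mechanics are simple. In the sketch below, generate and score are placeholders for your model call and your answer checker or reranker.

    import math

    def best_of_n(task: str, generate, score, n: int = 3) -> str:
        """Sample n candidate answers and keep the one the scorer ranks highest."""
        candidates = [generate(task) for _ in range(n)]
        return max(candidates, key=score)

    def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
        """Standard unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k)."""
        if num_samples - num_correct < k:
            return 1.0
        return 1.0 - math.comb(num_samples - num_correct, k) / math.comb(num_samples, k)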

What This Means for You

If you’re building with LLMs today, Chain-of-Agents offers a clear path forward:

  1. Start with single-agent architectures. Before reaching for complex multi-agent frameworks, consider whether a CoA-style model could handle your use case.

  2. Focus on role prompting. Even without training your own model, you can apply CoA principles by crafting prompts that explicitly invoke different roles and reasoning modes.

  3. Measure token efficiency. Track not just accuracy but tokens-per-task (a minimal logging sketch follows this list). You might find that simpler architectures with better prompting outperform complex agent systems.

  4. Watch for CoA-trained models. As more models adopt this training approach, you’ll have access to drop-in replacements for multi-agent systems.
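
For item 3, a minimal tokens-per-task logger is enough to start. This assumes an OpenAI-style response object exposing a usage field; adapt the attribute names to your provider.

    from collections import defaultdict

    token_log = defaultdict(list)

    def record_usage(task_name: str, response) -> None:
        """Log prompt + completion tokens for one run so architectures can be compared."""
        usage = response.usage  # OpenAI-style usage object (assumption)
        token_log[task_name].append(usage.prompt_tokens + usage.completion_tokens)

    def tokens_per_task(task_name: str) -> float:
        """Average total tokens spent per run of a task."""
        runs = token_log[task_name]
        return sum(runs) / len(runs) if runs else 0.0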

The Practical Takeaway

Chain-of-Agents isn’t just a research curiosity—it’s a blueprint for how production AI systems will evolve. The era of expensive, chatty multi-agent systems is ending. The future belongs to single models that can think like entire teams.

You don’t need to wait for the future. Start experimenting with role-based prompting today. Measure your token usage. Question whether that complex agent pipeline really needs to be complex.

The best architecture isn’t always the most sophisticated one. Sometimes it’s the one that does more with less.



Let an Agentic AI Expert Review Your Code

I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.