Agent Teams: How AI Agents Are Evolving Organizations

Coding Benchmark Snapshot

Terminal-Bench 2.0 · SWE-Bench Verified · SWE-Bench Pro (Public)

Model	Terminal-Bench 2.0	SWE-Bench Verified	SWE-Bench Pro (Public)
Gemini 3.1 Pro (High)	68.5%	80.6%	54.2%
Gemini 3 Pro (High)	56.9%	76.2%	43.3%
Sonnet 4.6 (Max)	59.1%	79.6%	–
Opus 4.6 (Max)	65.4%	80.8%	–
GPT-5.2	54.0%	80.0%	55.6%
GPT-5.2 (xhigh)	62.2%	–	–
GPT-5.3-Codex	64.7%	–	–
GPT-5.3-Codex (xhigh)	77.3%	–	56.8%

Source: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro

We are now living in the agentic age of organizational growth

AI agents, or just agents, are helping

Last year, Microsoft published the Work Trend Index Annual Report in which they defined ‘hybrid’ teams of humans + agents

I read that paper but did not think much of it until I started to reflect on the progress of multi-agent systems like Claude Code and OpenClaw

Here is the latest attempt by Anthropic as of Feb 5th

Claude Code now supports agent teams (in research preview)

Instead of a single agent working through a task sequentially, a lead agent can delegate to multiple teammates that work in parallel to research, debug, and build while coordinating with each other.

Try it out today by… pic.twitter.com/vi7lUJDOTi
— Lydia Hallie ✨ (@lydiahallie) February 5, 2026

According to Anthropic’s agent teams docs:

Coordinate multiple Claude Code instances working together as a team, with shared tasks, inter-agent messaging, and centralized management.

So what happened? We went from using one AI agent at a time to using multi-agent systems

This is possible thanks to agentic orchestration

Agents coordinating other agents?

This is one of the unlocks of OpenClaw

OpenClaw unlocks a unique improvement over traditional agent runtimes by adding a gateway.

In OpenClaw, heartbeats are Markdown files and cron jobs are JSON files.

OpenAI Codex uses Git worktrees in the Codex macOS app, but the idea of cron-inspired automation is similar to what I covered in agentic automation.

Note, OpenClaw by itself sets up one AI agent to run for you

We need a better agentic harness or orchestrator like antfarm by Ryan Carson

If you’re using @openclaw this will be a big unlock.

Antfarm is a batteries-included agent team that operates reliably and deterministically.

Works with OpenClaw using just crons, YAML and SQLite.

It auto-runs Ralph loops after creating atomic user stories.

I open sourced it… https://t.co/g6a1N0jwel
— Ryan Carson (@ryancarson) February 9, 2026

When you spin up your first antfarm, you get 6 agents

Now we are getting closer to a team!

Let’s discuss the architecture of this new agentic team inspired by antfarm

antfarm combines crons (cron jobs) with YAML and SQLite.

The code execution step as part of each workflow is built around a concept known as the ‘Ralph Wiggum Loop,’ after the simpleton character from ‘The Simpsons’

‘Ralph’ functions as an autonomous coding loop and shares the same workflow as a human developer

That means Ralph consults a list of tasks to be completed, implements them, runs tests, commits the code, marks the task as completed, logs what it learned from the process, and then selects the next task from the list.

Ralph picks one story then executes it

export type Story = {
  id: string;
  runId: string;
  storyIndex: number;
  storyId: string;
  title: string;
  description: string;
  acceptanceCriteria: string[];
  status: "pending" | "running" | "done" | "failed";
  output?: string;
  retryCount: number;
  maxRetries: number;
};

(This example is complete, it can be run "as is")

https://github.com/snarktank/antfarm/blob/bbd6debda1f6b76ca178db923d2478357d810baf/src/installer/types.ts#L61C1-L73C3
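The loop described above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not antfarm's actual implementation: `MiniStory` is a trimmed-down, hypothetical version of the `Story` type, and `Executor` and `runRalphLoop` are made-up names. The executor stands in for the implement, test, and commit steps.

```typescript
// Simplified, hypothetical story shape (a subset of the Story type above).
type MiniStory = {
  storyId: string;
  status: "pending" | "running" | "done" | "failed";
  retryCount: number;
  maxRetries: number;
};

// Hypothetical executor: implements the story, runs tests, commits.
// Returns true when the tests pass.
type Executor = (story: MiniStory) => boolean;

// Pick one pending story at a time, execute it, record the outcome,
// and move on until nothing is pending.
function runRalphLoop(stories: MiniStory[], execute: Executor): string[] {
  const progressLog: string[] = [];
  for (;;) {
    const story = stories.find((s) => s.status === "pending");
    if (!story) break; // no work left
    story.status = "running";
    if (execute(story)) {
      story.status = "done";
      progressLog.push(`${story.storyId}: done`);
    } else if (story.retryCount < story.maxRetries) {
      story.retryCount += 1;
      story.status = "pending"; // picked up again on a later iteration
      progressLog.push(`${story.storyId}: retry ${story.retryCount}`);
    } else {
      story.status = "failed";
      progressLog.push(`${story.storyId}: failed`);
    }
  }
  return progressLog;
}
```

The returned log plays the role of the "what it learned" notes: it is the only thing that survives from one iteration to the next.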

How does it work?

A human says ‘OpenClaw Team, I want to build this new feature.’ Then a specialized agent ‘interviews’ the human about specifics the AI team needs to know. Then, a different agent turns those instructions into a prioritized list of tasks. Different agents complete the actual coding, while additional category experts check and verify their work.
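To make that handoff concrete, here is a rough sketch of the same flow as a fixed pipeline. The role names mirror the paragraph above, but `Stage`, `runPipeline`, and the stand-in agents are hypothetical; a real agent would call a model rather than wrap a string.

```typescript
// Hypothetical stage: each specialized agent takes the previous stage's
// artifact and returns a new one.
type Stage = { role: string; run: (input: string) => string };

// Run the stages in a fixed order so every request takes the same path.
function runPipeline(
  request: string,
  stages: Stage[],
): { artifact: string; trace: string[] } {
  const trace: string[] = [];
  let artifact = request;
  for (const stage of stages) {
    artifact = stage.run(artifact);
    trace.push(stage.role); // record who touched the artifact, in order
  }
  return { artifact, trace };
}

// Stand-in agents that only label their output.
const featurePipeline: Stage[] = [
  { role: "interviewer", run: (req) => `spec(${req})` },
  { role: "planner", run: (spec) => `tasks(${spec})` },
  { role: "developer", run: (tasks) => `code(${tasks})` },
  { role: "verifier", run: (code) => `verified(${code})` },
];
```

Calling `runPipeline("dark mode", featurePipeline)` returns the artifact `verified(code(tasks(spec(dark mode))))`, which shows each role wrapping the work of the one before it.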

Spec’ing is done via spec-driven development (SDD)

Maybe we should treat agentic teams as humans + RPG (role playing game) agents

According to https://www.antfarm.cool

Built on the Ralph loop

Each agent runs in a fresh session with clean context. Memory persists through git history and progress files — the same autonomous loop pattern from Ralph, scaled to multi-agent workflows.
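One way to picture "fresh session, persistent memory" is a function that builds a brand-new prompt from persisted notes. This is only a sketch: `ProgressEntry` and `buildFreshContext` are hypothetical, and an in-memory array stands in for the progress files and git history that the quote mentions.

```typescript
// Hypothetical progress entry, standing in for one line in a progress file.
type ProgressEntry = { storyId: string; learned: string };

// Build the prompt for a new session: no chat history is carried over,
// only the current task plus notes persisted by earlier sessions.
function buildFreshContext(
  task: string,
  progress: ProgressEntry[],
  maxNotes = 3,
): string {
  // Keep only the most recent notes so the context stays small.
  const recent = maxNotes > 0 ? progress.slice(-maxNotes) : [];
  const notes = recent.map((p) => `- [${p.storyId}] ${p.learned}`);
  return [`Task: ${task}`, "Notes from earlier sessions:", ...notes].join("\n");
}
```

The design choice worth noticing is that the session itself holds no state. Anything an agent needs to remember has to be written down where the next session can read it.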

The Microsoft team recognized that it is possible to get overwhelmed by too many AI agents

High-Level Architecture

Agent Lifecycle (Ralph Loop Model)

Here is how antfarm works

Multi-Agent Story Isolation (feature-dev example)

Custom Workflow Definition (YAML → Runtime)

End-to-End Example Run

Why Antfarm Works

Instead of:

‘One big AI agent doing everything’

It becomes:

‘A deterministic assembly line of specialized agents’

Let’s evaluate how Anthropic’s agent teams work

One way to understand it is by contrasting against subagents

Claude Agent Teams, Taught Incrementally

Step 1: One agent, one thread

Start simple: one agent handles everything in sequence.

Step 2: Add a lead and specialists

The lead delegates. Teammates run in parallel.
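A minimal sketch of that delegation, assuming each teammate is just an async function. `Teammate`, `Assignment`, and `delegate` are hypothetical names, not part of Claude Code.

```typescript
// Hypothetical teammate: an async worker with a specialty.
type Teammate = { role: string; work: (subtask: string) => Promise<string> };

// One unit of delegated work.
type Assignment = { teammate: Teammate; subtask: string };

// The lead starts every teammate before awaiting any result, so the
// subtasks overlap instead of running one after another.
async function delegate(assignments: Assignment[]): Promise<string[]> {
  return Promise.all(assignments.map((a) => a.teammate.work(a.subtask)));
}
```

`Promise.all` returns results in assignment order, no matter which teammate finishes first, so the lead can match each result back to its subtask.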

Step 3: Share tasks and messages

Agent teams add a shared task list and direct teammate messaging.
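Here is a rough in-memory model of those two pieces. `TeamBoard` and its methods are hypothetical and only illustrate the shape of the idea; the real feature keeps this state outside a single process.

```typescript
// Hypothetical stand-ins for the shared task list and teammate mailboxes.
type Task = { id: string; title: string; owner?: string };
type Message = { from: string; to: string; body: string };

class TeamBoard {
  private tasks: Task[] = [];
  private inboxes = new Map<string, Message[]>();

  addTask(id: string, title: string): void {
    this.tasks.push({ id, title });
  }

  // Every teammate reads the same list.
  listTasks(): readonly Task[] {
    return this.tasks;
  }

  // Direct teammate-to-teammate message, no round trip through the lead.
  send(from: string, to: string, body: string): void {
    const inbox = this.inboxes.get(to) ?? [];
    inbox.push({ from, to, body });
    this.inboxes.set(to, inbox);
  }

  // Reading drains the inbox so a message is delivered once.
  read(member: string): Message[] {
    const inbox = this.inboxes.get(member) ?? [];
    this.inboxes.delete(member);
    return inbox;
  }
}
```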

Step 4: Run the claim-and-complete loop

Each teammate claims unblocked work, executes, and updates status.
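The claim step is the interesting part, so here is a small sketch of it. `TeamTask`, `claimNext`, and `complete` are hypothetical names, and the dependency field `blockedBy` is my own assumption about how "unblocked" could be modeled.

```typescript
type TeamTask = {
  id: string;
  blockedBy: string[]; // ids that must be completed first
  status: "open" | "claimed" | "completed";
  owner?: string;
};

// Claim the first open task whose dependencies are all completed.
// Returns undefined when nothing is claimable right now.
function claimNext(tasks: TeamTask[], teammate: string): TeamTask | undefined {
  const done = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  const next = tasks.find(
    (t) => t.status === "open" && t.blockedBy.every((id) => done.has(id)),
  );
  if (next) {
    next.status = "claimed";
    next.owner = teammate;
  }
  return next;
}

// Mark a claimed task as finished, which may unblock others.
function complete(task: TeamTask): void {
  task.status = "completed";
}
```

This sketch runs in a single thread. With several processes sharing the list, the claim would also need to be atomic (for example through a lock) so two teammates cannot take the same task.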

Step 5: Add quality gates and cleanup

Hooks enforce standards before tasks close; lead shuts down the team.
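A quality gate can be as simple as a list of checks that must all pass before a task closes. The sketch below assumes that model; `Gate`, `tryCloseTask`, and the two example checks are hypothetical and do not reflect the actual hook interface.

```typescript
// Hypothetical gate: returns an error message, or null when the check passes.
type Gate = (taskOutput: string) => string | null;

// A task may close only when every gate passes. Otherwise the failures
// are returned so the teammate can keep working on it.
function tryCloseTask(
  taskOutput: string,
  gates: Gate[],
): { closed: boolean; failures: string[] } {
  const failures = gates
    .map((gate) => gate(taskOutput))
    .filter((msg): msg is string => msg !== null);
  return { closed: failures.length === 0, failures };
}

// Two toy checks that only inspect the text of the output.
const hasTests: Gate = (out) => (out.includes("test(") ? null : "no tests found");
const noTodo: Gate = (out) => (out.includes("TODO") ? "unresolved TODO" : null);
```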

This is the shift: from one big agent to a coordinated team with explicit roles, shared state, and controlled handoffs.

Step 6: Gate the feature explicitly

Agent teams stay opt-in behind an environment flag.
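An opt-in check might look like the sketch below. The flag name `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS` is my assumption of what the docs use, so confirm it against the current Claude Code documentation. The helper `agentTeamsEnabled` is hypothetical.

```typescript
// Assumed flag name; confirm against the current Claude Code documentation.
const AGENT_TEAMS_FLAG = "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS";

// Treat the feature as off unless the flag is explicitly set to "1".
function agentTeamsEnabled(env: Record<string, string | undefined>): boolean {
  return env[AGENT_TEAMS_FLAG] === "1";
}
```

In Node you would pass `process.env` as the argument. Taking the environment as a parameter keeps the check easy to test.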

Step 7: Use local team state files

Coordination is driven by local frontmatter state, not hidden magic.
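To show what "frontmatter state" can mean in practice, here is a tiny reader for a file that starts with a `---` delimited block of `key: value` lines. It is a sketch, not a YAML parser, and any field names you feed it (such as `status` or `owner`) are illustrative rather than the real schema.

```typescript
// Parse a minimal "key: value" frontmatter block delimited by --- lines.
// Nested values and lists are out of scope for this sketch.
function parseFrontmatter(text: string): {
  state: Record<string, string>;
  body: string;
} {
  const lines = text.split("\n");
  const state: Record<string, string> = {};
  // No opening delimiter: the whole file is body.
  if (lines[0]?.trim() !== "---") return { state, body: text };
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  // No closing delimiter: treat the file as plain body.
  if (end === -1) return { state, body: text };
  for (const line of lines.slice(1, end)) {
    const sep = line.indexOf(":");
    if (sep > 0) state[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return { state, body: lines.slice(end + 1).join("\n") };
}
```

Because the state lives in plain files, you can inspect or fix a stuck team with a text editor.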

Step 8: Notify the lead through tmux

Idle teammates can report back through a coordinator session.
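One plausible way to do this is `tmux send-keys`, which types text into a target pane. The sketch below only builds the argument list; the helper name `buildNotifyArgs`, the target name, and the message format are all hypothetical.

```typescript
// Build the argv for `tmux send-keys`, which types text into a target pane.
// The message format is an illustration, not what Claude Code sends.
function buildNotifyArgs(
  leadTarget: string,
  teammate: string,
  note: string,
): string[] {
  const message = `[${teammate}] idle: ${note}`;
  // -t selects the target; the trailing "Enter" key submits the line.
  return ["send-keys", "-t", leadTarget, message, "Enter"];
}
```

In Node the array could be handed to `spawn("tmux", args)` from `child_process`. Passing arguments as an array avoids shell quoting problems when the note contains spaces or quotes.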

Step 9: Launch swarm from interactive command

The command gathers constraints first, then writes shared tasks.
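A sketch of that two-phase shape, with constraints validated before any task is written. `Constraints` and `planSwarm` are hypothetical, and the round-robin assignment is my own simplification.

```typescript
// Hypothetical constraints gathered from the user before planning.
type Constraints = { goal: string; teammates: number; areas: string[] };

// Refuse to plan until the constraints are complete, then emit one task
// per area, assigned to teammates in round-robin order.
function planSwarm(c: Constraints): string[] {
  if (c.teammates < 1 || c.areas.length === 0) {
    throw new Error("constraints incomplete: need teammates and areas");
  }
  return c.areas.map(
    (area, i) => `T${i + 1} [teammate ${(i % c.teammates) + 1}] ${c.goal}: ${area}`,
  );
}
```

Gathering constraints first means the task list is generated once, from complete information, instead of being patched while teammates are already working.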

This is the practical model: explicit flag, explicit files, explicit hooks, explicit task generation.
