In previous posts, we established a clear foundation for agentic work. First, that code itself is the ultimate tool, far superior to complex, abstract protocols. Second, that we can make code reliable by creating agent-friendly CLI tools that act as predictable, sandboxed components.
This is the correct foundation. But building these tools one by one is thinking too small. It’s still a world where we build for agents. The next leap is to build infrastructure that lets agents build for themselves.
The goal isn’t just to ship tools. It’s to ship tool-builders.
From Using Tools to Creating Them
I once wrote that ‘agents started hiring agents. Tools started building tools.’ This is not a distant vision; it’s the immediate, strategic frontier.
Imagine a primitive tool in an agent’s arsenal called create_tool. It takes a simple prompt—‘build a tool that finds a prospect’s email address using our CRM API and Hunter.io’—and generates a new, robust, and shareable command-line tool.
This meta-tool is the seed of an ecosystem. An agent doesn’t just use a tool; it identifies a missing capability and creates it for the entire network. The primitives become simple and powerful:
- create_tool: Generate a new CLI tool from a description.
- edit_tool: Modify an existing tool to fix a bug or add a feature.
- list_tools: Discover tools created by other agents.
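To make the three primitives concrete, here is a minimal sketch in Python. The registry, tool names, and the stub that stands in for the LLM generation step are all hypothetical illustrations, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class ToolRegistry:
    """In-memory stand-in for a shared, network-wide tool store."""
    tools: dict = field(default_factory=dict)

    def create_tool(self, name: str, description: str) -> str:
        # In a real system, an LLM call would turn `description` into a
        # working CLI script; here a stub simply records the intent.
        source = f"#!/usr/bin/env python3\n# {description}\n"
        self.tools[name] = {"description": description, "source": source}
        return name

    def edit_tool(self, name: str, patch_note: str) -> None:
        # Append a change note; a real agent would regenerate the source.
        self.tools[name]["source"] += f"# edit: {patch_note}\n"

    def list_tools(self) -> list[str]:
        # Discovery: every agent sees tools created by every other agent.
        return sorted(self.tools)

registry = ToolRegistry()
registry.create_tool("find_email_address",
                     "Find a prospect's email via CRM + Hunter.io")
registry.edit_tool("find_email_address", "handle missing CRM records")
print(registry.list_tools())
```

The point of the sketch is the shape of the interface: creation, modification, and discovery are the only three verbs an agent needs to grow the ecosystem.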
This is how you scale intelligence. You don’t build a thousand tools for a thousand agents. You give a thousand agents the ability to build, share, and improve a million tools.
But this autonomy requires wisdom. Experience building agent systems shows that agents shouldn’t create tools from every user prompt—there needs to be a recurring pattern. The most effective approach follows the ‘rule of three’: when an agent notices the same failure or limitation three times, it’s a signal to create a better tool.
For example, if an agent repeatedly struggles to extract email addresses from different formats, that’s not three isolated problems—it’s one pattern demanding a solution. This pattern recognition is what separates reactive tool creation from strategic tool building.
Vincent Quigley, Staff Software Engineer at Sanity, captured this problem perfectly: ‘AI doesn’t learn from mistakes. You fix the same misunderstandings repeatedly. Your solution: better documentation and more explicit instructions’ [1]. But documentation isn’t enough. The real solution is transforming those repeated patterns into tools—permanent, testable artifacts that encode the learning. Where humans update their mental models, agents must update their toolsets.
Learning from Conversation History
Creating effective tools requires learning from past interactions. When you instruct agents to build tools from historical patterns, they need access to conversation logs. Claude Code demonstrates this naturally by storing conversation transcripts as JSONL files under ~/.claude/projects.
These logs become a valuable dataset. An agent can analyze hundreds of past conversations to identify:
- Recurring tasks that waste time
- Common failure patterns
- Frequently combined operations that could become a single tool
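The log analysis above can be sketched as a simple pattern counter. This is a minimal illustration, not the real transcript schema: it assumes one JSON object per line with a `content` text field, and uses synthetic logs in a temp directory so it runs standalone:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

def count_error_patterns(log_dir: Path) -> Counter:
    """Tally recurring error signatures across JSONL conversation logs.

    Assumes one JSON object per line with a 'content' text field;
    the schema of real transcripts may differ.
    """
    counts = Counter()
    for log_file in log_dir.glob("*.jsonl"):
        for line in log_file.read_text().splitlines():
            entry = json.loads(line)
            text = entry.get("content", "")
            if "error" in text.lower():
                # Crude signature: first five words of the error message.
                sig = " ".join(text.lower().split()[:5])
                counts[sig] += 1
    return counts

# Synthetic logs standing in for real conversation history.
tmp = Path(tempfile.mkdtemp())
(tmp / "session1.jsonl").write_text("\n".join(
    json.dumps({"content": c}) for c in [
        "Error: could not parse email from HTML",
        "Task completed",
    ]
))
(tmp / "session2.jsonl").write_text(
    json.dumps({"content": "Error: could not parse email from CSV"})
)
# The HTML and CSV failures collapse into a single recurring pattern.
print(count_error_patterns(tmp).most_common(1))
```

A real implementation would need a smarter signature than "first five words," but the principle holds: the logs are data, and recurring signatures are the signal.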
This addresses what Vincent Quigley describes as ‘running multiple agents like a small team with daily amnesia’ [1]. Each conversation starts fresh, each agent begins at zero. But the tools they create and share become the persistent memory of the system. The conversation history isn’t just a record—it’s the antidote to amnesia, training data for building tools that ensure tomorrow’s agents don’t repeat today’s mistakes.
The Rule of Three: A Protocol for Tool Creation
Creating tools requires discipline. Not every repeated task deserves a tool, and not every failure needs automation. The most effective pattern we’ve discovered is the ‘rule of three’: when any recurring problem occurs three or more times, it’s time to create a tool.
Here’s the protocol agents should follow:
Pattern Detection: Before creating a tool, scan conversation history for at least three occurrences of:
- The same error or failure mode
- Similar manual workarounds being applied
- Repeated sequences of commands
- Common user frustrations requiring clarification
Pattern Validation: The three occurrences must share the same core problem, occur in comparable contexts, and be resolved using similar approaches. Three different formats of the same problem (extracting emails from JSON, CSV, and HTML) count as one pattern, not three.
The Simple Rule: If a pattern appears three or more times, create a tool. No complex calculations needed. Three occurrences is the signal that this isn’t a one-off problem—it’s a systematic issue that deserves a systematic solution.
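The rule itself is deliberately trivial to encode. A sketch, assuming pattern occurrences have already been grouped into a counter (the pattern names here are illustrative):

```python
from collections import Counter

RULE_OF_THREE = 3  # threshold from the protocol above

def tools_to_create(pattern_counts: Counter) -> list[str]:
    """Return the patterns that have crossed the rule-of-three threshold."""
    return [p for p, n in pattern_counts.most_common() if n >= RULE_OF_THREE]

observed = Counter({
    "extract email from document": 3,  # JSON, CSV, HTML variants: one pattern
    "rate-limited by CRM API": 2,      # not yet a pattern
    "timezone conversion mistake": 1,
})
print(tools_to_create(observed))  # only the 3x pattern qualifies
```

All the hard work lives in the validation step that decides which occurrences belong to the same pattern; the threshold check itself is one line.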
This discipline prevents premature toolification (creating tools after a single request) and tool sprawl (creating near-duplicate tools instead of extending existing ones). Each tool should solve the pattern, not just the specific instance.
The Network is the Moat
When agents can build, the network effects become explosive. A sales agent might create a find_email_address tool for its own use. But a marketing agent can then discover and compose it into a larger workflow for lead generation. A recruiting agent might adapt it to find candidates.
Each new tool doesn’t just add one capability; it multiplies the potential of every agent in the network. The value of the system shifts from the quality of its individual tools to the creative velocity of the entire network.
This is where incentives become critical. Whether through tokens or other mechanisms, the network must reward contribution. Agents who create valuable, widely-used tools should be incentivized, fueling a flywheel of innovation. The network begins to pay for its own growth.
Tool Metadata: The Memory Architecture
When agents create tools following the rule of three, they must embed the pattern’s history into the tool itself. This metadata becomes the institutional memory that survives beyond any single conversation:
```yaml
tool_metadata:
  pattern_detected: "Description of recurring pattern"
  occurrences_analyzed: [conversation_1, conversation_2, conversation_3]
  problem_solved: "What manual work this eliminates"
  time_saved_per_use: "Estimated minutes"
  usage_examples: "Real examples from the pattern"
```
This isn’t bureaucracy—it’s how we defeat the ‘daily amnesia.’ Each tool carries its origin story, its purpose, and proof of its value. When another agent discovers this tool, they understand not just what it does, but why it exists and when to use it.
The success metrics are clear: a tool should be reused at least 3x within its first week, reduce similar manual interventions by >80%, and achieve cross-agent adoption—used by agents who didn’t create it. Tools that don’t meet these thresholds become candidates for deprecation or merger with more successful tools.
Ship the Tool Factory
We are on the cusp of truly dynamic agentic systems. But success won’t be defined by the perfection of our hand-crafted tools. It will be defined by the creative autonomy we grant our agents.
Your moat is not your feature set. It’s the vibrancy of your network. An ecosystem of 10,000 agents building and sharing rudimentary tools will always out-compete a silo of 100 agents using a handful of perfect ones.
The directive is clear: Stop building tools for agents. Build the infrastructure that lets agents build tools for each other—but with the wisdom to know when a tool is truly needed.
The rule of three isn’t just a heuristic; it’s a philosophy. It says that real patterns deserve real solutions. It prevents the chaos of a thousand half-baked tools while ensuring that genuine pain points get addressed. It transforms agents from reactive tool creators into strategic pattern recognizers.
Stop shipping features. Start shipping the factory—with quality control built in.
References
- [1] Vincent Quigley, ‘First Attempt Will Be 95% Garbage’
I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.