The promise of AI-assisted coding has always been compelling: describe what you want, and watch as intelligent systems transform your ideas into working code. But anyone who has wrestled with large language models knows the reality is more complex. Single-model approaches hit walls—context limits, hallucinations, incomplete implementations, and the dreaded ‘AI forgot what we were building’ syndrome.
After building dozens of AI-powered development tools, I’ve discovered that the future isn’t about finding the perfect model or building complex abstractions. In my previous post on agentic tools, I explored how giving AI agents access to the same command-line tools developers use—rather than abstract protocols—unlocks their true potential. Now, let’s take that principle further by orchestrating multiple AI systems together, still following the philosophy that ‘code is all you need.’
This is the story of a three-step coding revolution that’s changing how we build software: Specification → Planning → Execution. It’s a workflow that can take a high-level idea and transform it into production-ready code in 1-10 minutes, with minimal human intervention—all using simple CLI tools and shell commands.
No complex protocols. No abstract tool definitions. Just AI agents wielding the same powerful command-line tools that developers have perfected over decades.
Let’s explore how this orchestrated approach solves the fundamental problems that plague single-model AI development.
The Problem: Single-Model Limitations
Traditional AI coding assistance asks too much of a single model. Even our most capable models have hard limits: Claude Opus 4 and Sonnet 3.7 are constrained to 200K tokens, while Gemini 2.5 Pro offers 1M tokens. But context windows alone don’t solve the fundamental problem.
When you prompt any single model—whether Claude, GPT-4, or Gemini—to ‘build a CLI tool that processes specifications and generates implementation plans,’ you’re asking it to:
- Understand your requirements (specification comprehension)
- Plan the implementation (architectural decisions)
- Execute the code changes (file modifications)
- Validate the results (testing and verification)
This cognitive load often leads to:
- Context collapse: Even with 200K tokens, Claude loses track of earlier decisions when deep in implementation
- Incomplete implementations: Features get partially built then abandoned as context fills
- Architectural inconsistency: Different parts of the system follow different patterns
- Token limit frustration: Running out of context just as things get interesting
For example, Claude’s 200K context might seem generous, but a typical codebase easily consumes 50-100K tokens. Add in the conversation history, tool outputs, and implementation details, and you’re hitting limits before completing complex features. Gemini’s 1M tokens help with understanding large codebases but can’t solve the fundamental issue of mixing comprehension, planning, and execution in a single context.
The solution isn’t a bigger model—it’s a better workflow that respects each model’s strengths and limitations.
The Three-Step Solution: Orchestrated AI Development
Our breakthrough came from recognizing that different AI models excel at different tasks. By creating a structured handoff between specialized systems, we can build complex software that neither model could create alone.
Step 1: Specification Clarification (Gemini 2.5 Pro)
The first step uses Gemini 2.5 Pro with its massive 1M token context window to understand and clarify requirements. This isn’t just about reading a spec—it’s about comprehending the full context of your codebase.
Key Capabilities:
- Massive Context: 1M tokens allows ingesting entire codebases via CLI tools like repomix
- Specification Analysis: Deep understanding of requirements and constraints
- Clarification Generation: Identifying ambiguities and edge cases
- Domain Knowledge: Leveraging training on diverse coding patterns
Example Workflow with Enhanced File Selection:
The magic happens when Gemini can see your entire codebase structure, existing patterns, and architectural decisions. With repomix’s new stdin feature, you can now precisely control which files are included in the context, ensuring Gemini focuses on the most relevant code for your specification.
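As a sketch of this step (the `SpecProcessor` search pattern and file names are hypothetical; `--stdin` reads file paths from standard input, per the repomix CLI):

```shell
# Select only the files relevant to this spec (pattern hypothetical)
rg -l "SpecProcessor" src/ | repomix --stdin --output spec-context.xml

# Ask Gemini to clarify the raw spec against that focused context
cat USER-SPEC.md spec-context.xml | gemini "Clarify this specification: resolve ambiguities and list edge cases as AI-SPEC.md"
```

Any command that prints file paths works as the selector; the pipe is the whole integration.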
Step 2: Implementation Planning (Gemini 2.5 Pro)
The second step takes the clarified specification and generates a detailed implementation plan. This is where Gemini’s reasoning capabilities shine, creating step-by-step instructions that account for:
- Existing Code Patterns: Following established conventions
- Dependency Management: Understanding what libraries and frameworks are already in use
- Integration Points: Identifying where new code connects to existing systems
- Testing Strategy: Planning verification and validation steps
Enhanced Planning with Targeted Context:
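The planning step reuses the same pattern. A sketch, with a hypothetical search pattern and file names:

```shell
# Pack only the modules the feature will touch (pattern hypothetical)
rg -l "ReportGenerator" src/ | repomix --stdin --output plan-context.xml

# Generate a step-by-step implementation plan from the clarified spec
cat AI-SPEC.md plan-context.xml | gemini "Let's plan this! Produce a detailed AI-PLAN.md"
```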
Output Format:
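The plan that comes back is ordinary markdown. A hypothetical excerpt (file paths, line numbers, and test names invented for illustration):

```markdown
## Step 3: Add input validation to the CLI entry point

- File: `src/cli/main.ts` (around line 42)
- Why: the spec requires rejecting empty spec files before processing
- Change: guard the `run()` call with an existence and size check
- Verify: `npm test -- cli` passes, including the new `empty-spec` case
```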
This level of specificity—including exact locations and reasoning—is what makes the handoff to the execution phase seamless. With repomix’s stdin feature, Gemini has precisely the context it needs to generate accurate, comprehensive plans.
Step 3: Code Execution (Claude Opus/Sonnet)
The final step uses Claude Opus 4 or Sonnet 3.7 to execute the generated plan. While Claude has a smaller context window (200K tokens) compared to Gemini’s 1M, it excels at precise implementation tasks.
Key Capabilities:
- Precise Code Modification: Making exact changes to specific files based on the plan
- Error Handling: Debugging and fixing issues as they arise during implementation
- Incremental Execution: Working through plans systematically, task by task
- Validation: Running tests, linters, and builds to ensure correctness
- Tool Mastery: Expert use of CLI tools for file manipulation and verification
Execution Workflow with Focused Context:
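A sketch of the hand-off, assuming the plan from the previous step (the selection pattern is hypothetical):

```shell
# Pack only the files the plan names (pattern hypothetical)
rg -l "ReportGenerator" src/ | repomix --stdin --output execution-context.xml

# Hand Claude the plan plus exactly the files it needs
cat AI-PLAN.md execution-context.xml | claude "Let's build this! Follow the plan step by step, running tests after each change."
```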
Why Claude Excels at Execution:
- Surgical Precision: With a clear plan, Claude doesn’t need to understand the entire codebase—just the specific files to modify
- Reliability: Following a detailed plan reduces hallucination and ensures consistent implementation
- Speed: No time wasted on architectural decisions or exploring the codebase
- Validation Focus: Claude’s training emphasizes correctness and testing, perfect for the execution phase
This targeted approach ensures Claude operates within its optimal context window while delivering reliable, tested implementations.
The Technical Architecture: How It Actually Works
The three-step workflow relies on sophisticated orchestration between AI models and supporting infrastructure:
CLI Tool Integration: Code Is All You Need
Command-line tools like repomix embody the ‘code is all you need’ philosophy perfectly. Rather than complex protocols or abstractions, they give AI agents the same powerful tools human developers use, and they solve the ‘codebase context’ problem by packing an entire repository into a single, structured file that a model can ingest in one pass.
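For example, one command produces a complete, AI-readable snapshot of the repository (flags per the repomix CLI; output names arbitrary):

```shell
# Pack the current repository into one XML file an LLM can read end to end
repomix --output full-context.xml

# Or compress with tree-sitter, keeping signatures and structure while dropping bodies
repomix --compress --output compressed-context.xml
```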
Benefits:
- Compression: Tree-sitter parsing removes implementation details while preserving structure, reducing token usage by roughly 70%. This is an experimental feature; test it thoroughly on your own codebase before relying on it
- Precision: Stdin feature allows exact file selection based on any criteria
- Flexibility: Combine with any file discovery tool (find, rg, fd, git, grep)
- Efficiency: Structured XML format optimizes token usage
Why CLI Tools Beat Complex Abstractions
This approach aligns perfectly with the ‘code is all you need’ philosophy. Instead of building complex protocols like MCP that require inference and consume massive token counts, we give AI agents the same tools developers use:
- Direct Execution: No abstraction layer means no ambiguity. When an agent runs `rg -l "TODO" | repomix --stdin`, it’s as clear as when a human runs it.
- Perfect Composability: Unix pipes and shell commands have solved composability for decades. Why reinvent it?
- Zero Learning Curve: Every developer understands `grep`, `find`, and pipes. AI agents trained on code already know these patterns.
- Deterministic Workflows: Unlike tool protocols that rely on model inference, shell commands execute predictably every time.
As Armin Ronacher notes: ‘The way to think about this problem is that when you don’t have an AI, and you’re solving a problem as a software engineer, your tool of choice is code.’ The repomix CLI tool exemplifies this—it’s not a special ‘AI tool,’ it’s just a good developer tool that happens to work perfectly with AI agents.
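The composability point is easy to demonstrate with nothing but standard tools. This toy pipeline (demo files created inline) mirrors the `rg -l "TODO" | repomix --stdin` shape, with `grep` as the selector and `wc` standing in for the consumer:

```shell
# Create a tiny demo tree
mkdir -p demo/src
printf '// TODO: handle errors\n' > demo/src/a.js
printf '// all done here\n' > demo/src/b.js

# Select files by content, then hand the selection to the next stage --
# the same shape as: rg -l "TODO" | repomix --stdin
grep -rl "TODO" demo/src | xargs wc -l
```

Swap the selector or the consumer and the pipeline still works; that is the entire integration contract.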
Token Economics: Division of Labor
The workflow strategically divides cognitive load based on each model’s strengths:
| Phase | Model | Context Limit | Optimized For |
|---|---|---|---|
| Specification | Gemini 2.5 Pro | 1M tokens | Understanding, reasoning, planning |
| Planning | Gemini 2.5 Pro | 1M tokens | Architecture, dependencies, integration |
| Execution | Claude Opus/Sonnet | 200K tokens | Precise edits, error handling, validation |
This division ensures that:
- Gemini handles the ‘big picture’ thinking with massive context
- Claude focuses on precise execution with proven reliability
- Neither model is overwhelmed by tasks outside its sweet spot
Three-Step Workflow Visualization
The Three-Step Process:
- SPEC: Human writes USER-SPEC → Gemini clarifies into AI-SPEC
- PLAN: Gemini transforms AI-SPEC into detailed AI-PLAN → Human reviews
- EXECUTE: Claude implements AI-PLAN → Human validates code changes
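The same flow as a simple diagram:

```
USER-SPEC.md   (human writes)
      |  Gemini 2.5 Pro: clarify
      v
AI-SPEC.md
      |  Gemini 2.5 Pro: plan
      v
AI-PLAN.md     (human reviews)
      |  Claude Opus/Sonnet: execute
      v
Code changes   (human validates)
```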
Optimization Strategies with Repomix Stdin
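A few selection strategies, sketched with hypothetical patterns; any command that prints file paths can drive `--stdin`:

```shell
# Only the files changed on this branch
git diff --name-only main | repomix --stdin --output diff-context.xml

# Only the files that mention a feature
rg -l "PaymentService" src/ | repomix --stdin --output feature-context.xml

# Implementation files only, skipping tests
fd -e ts -E '*.test.ts' src/ | repomix --stdin --output impl-context.xml
```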
Pro Tip: Claude Code Optimization Workflows
Claude Code offers powerful optimization features that complement the three-step workflow:
Multiple Agents for Parallel Execution:
```shell
# This prompt triggers Claude Code to spawn multiple AI agents
cat AI-PLAN.md execution-context.xml | claude "Use multiple agents. Let's build this!"
```
Extended Thinking for Complex Problems:
```shell
# Progressively more thinking budget: think < think hard < think harder < ultrathink
cat AI-PLAN.md execution-context.xml | claude "Let's ultrathink about this implementation"
```
These optimization workflows—similar to compiler optimization flags—work best when Claude has a clear plan to follow rather than discovering requirements through tool calls. This is especially effective when:
- Multiple files need independent modifications (use multiple agents)
- Complex architectural decisions need deep reasoning (use ultrathink)
- Test suites can run in parallel while implementation proceeds
Beyond Single-Model Limitations
The three-step workflow solves fundamental problems that plague single-model approaches:
Problem 1: Context Collapse
- Single Model: Loses track of earlier decisions as context fills up
- Three-Step Solution: Each model operates within its optimal context window
Problem 2: Inconsistent Quality
- Single Model: Quality degrades as tasks become more complex
- Three-Step Solution: Each model handles tasks it’s optimized for
Problem 3: All-or-Nothing Execution
- Single Model: If anything fails, the entire session is lost
- Three-Step Solution: Failures are isolated to specific steps with clear recovery paths
Problem 4: Unclear Progress
- Single Model: Black box execution with no intermediate feedback
- Three-Step Solution: Clear progress tracking with validation at each stage
The Workflow in Action: A Real Example
Let’s walk through a real implementation that demonstrates the power of this approach:
Input: AI-SPEC.md
Processing: Gemini Analysis
```shell
gemini "Let's plan this!" < AI-SPEC.md
```
Gemini processes the specification with full codebase context and generates a detailed 475-line implementation plan covering:
- 12 specific implementation steps
- Exact file paths and line numbers
- Code snippets with before/after comparisons
- Assertion requirements for safety compliance
- Integration testing strategies
Execution: Claude Implementation
```shell
claude "Let's build this!" < AI-PLAN.md
```
Claude executes the plan systematically, modifying files, running tests, and validating each change. The execution is precise and reliable because Claude follows the detailed plan rather than making architectural decisions on the fly.
Result: A fully functional three-step CLI workflow, built in 6 minutes, with comprehensive error handling and safety assertions.
Implications for the Future of AI Development
This orchestrated approach represents a fundamental shift in how we think about AI-assisted development:
From Prompting to Orchestration
Instead of crafting the perfect prompt, we design systems that coordinate multiple AI capabilities.
From Single-Shot to Workflow
Instead of hoping one model can handle everything, we create reliable handoffs between specialized systems.
From Manual to Automated
Instead of copy-pasting AI suggestions, we build systems that can modify codebases autonomously.
From Experimental to Production
Instead of treating AI coding as a toy, we create workflows robust enough for real software development.
The Fundamental Problem: Tool Calls and Runtime Context Building
The current generation of coding agents—even sophisticated planning systems like Gemini—suffers from a critical architectural flaw: they’re forced to build context on the go through tool calls rather than building context up-front. This is the difference between compile-time and runtime context building, and it fundamentally limits what AI agents can achieve.
The Tool Call Trap
When a coding agent starts working on your codebase, it faces an immediate problem: it doesn’t know what it doesn’t know. Every piece of information must be discovered through explicit tool calls:
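The discovery loop looks something like this in practice. The demo below fakes a two-file repo so the sequence is runnable as-is; in a real agent session, each command is a separate round-trip through the model:

```shell
# A tiny fake repo
mkdir -p repo/src
printf 'import { session } from "./session";\nexport function auth() {}\n' > repo/src/auth.ts
printf 'export const session = { ttl: 3600 };\n' > repo/src/session.ts

# Tool call 1: where is "session" used?
grep -rl "session" repo/src
# Tool call 2: read one of the hits
cat repo/src/auth.ts
# Tool call 3: chase the import it revealed
cat repo/src/session.ts
```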
Each tool call consumes tokens, adds latency, and most critically—the agent must infer what to search for based on incomplete information. It’s like trying to understand a codebase while blindfolded, only able to touch one file at a time.
Compile-Time vs Runtime Context Building
The three-step workflow revolutionizes this by introducing compile-time context building:
Traditional Agent (Runtime Context):
- Starts with zero knowledge
- Discovers context through sequential tool calls
- Makes decisions based on partial information
- Context building and execution are interleaved
Three-Step Workflow (Compile-Time Context):
- Builds complete context before planning
- Makes decisions with full codebase visibility
- Separates context building from execution
- Context is immutable during execution
This is analogous to the difference between interpreted and compiled languages. Just as a compiler can optimize better with a complete view of the program, our specification and planning phases can make better decisions with complete codebase context.
Extended Thinking and the Context Problem
Claude Code’s ‘ultrathink’ feature—where using phrases like ‘think’, ‘think hard’, ‘think harder’, or ‘ultrathink’ allocates progressively more thinking budget—exemplifies both the potential and limitations of current coding agents. When Claude engages in extended reasoning, forcing it to interrupt that thinking for tool calls is like forcing a compiler to pause optimization to check if a variable exists.
The three-step workflow complements these optimization features perfectly:
- Specification Phase: Gemini processes the entire codebase context upfront, no interruptions
- Planning Phase: Extended reasoning generates comprehensive plans with full context
- Execution Phase: Claude can use ‘ultrathink’ for complex implementations, but with a clear plan to follow
This separation means that extended thinking—whether in Gemini’s planning or Claude’s execution—operates on complete context rather than fragments discovered through tool calls.
The Hidden Cost of Tool Protocols
Modern tool protocols like MCP (Model Context Protocol) seem sophisticated but actually worsen this problem:
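With a tool protocol, even a simple lookup becomes a structured round-trip. A hypothetical MCP-style request (the shape is illustrative, not the exact wire format):

```json
{
  "method": "tools/call",
  "params": {
    "name": "search_files",
    "arguments": { "query": "session", "path": "src/" }
  }
}
```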
This requires the agent to:
- Infer what to search for (runtime decision)
- Wait for results (latency)
- Process partial results (limited context)
- Repeat until sufficient context is built (token waste)
Compare this to our approach:
```shell
# Compile-time: build complete context upfront
repomix --output=full-context.xml
# The agent now has EVERYTHING and can reason holistically
```
Real-World Impact: 10x Faster, 100x More Reliable
This architectural difference has profound implications:
| Metric | Tool-Call Agents | Three-Step Workflow |
|---|---|---|
| Context Building | 50-100 tool calls | 1 upfront build |
| Token Efficiency | 10-20% wasted on discovery | <1% overhead |
| Decision Quality | Based on fragments | Based on complete view |
| Reasoning Interruptions | Constant | Zero during planning |
| Failure Recovery | Must rebuild context | Context persists |
The Missing Piece in Current Agents
Every major coding agent today—GitHub Copilot Workspace, Cursor, Codeium, even Claude Code—operates in this tool-call paradigm. They’re trying to be ‘smart’ about what context to fetch, but this is fundamentally the wrong approach. It’s like trying to optimize a program while only being able to see one function at a time.
The solution isn’t smarter tool selection or better search queries. The solution is to eliminate runtime context discovery entirely by building complete context upfront—exactly what the three-step workflow achieves.
Conclusion: The Dawn of Orchestrated AI Development
The three-step workflow isn’t just a technical improvement—it’s a new paradigm for human-AI collaboration in software development. By recognizing that different AI models excel at different tasks, and by giving them the same CLI tools developers use, we can build systems that are more reliable, more powerful, and more practical than any single model or complex protocol.
The results speak for themselves: complex codebases modified in minutes, not hours. Comprehensive implementations that follow existing patterns and conventions. Error handling and validation that actually works. And most importantly, a workflow that scales from simple scripts to production applications—all without a single abstract ‘tool definition.’
As the industry chases complex protocols and abstractions, the real breakthrough has been hiding in plain sight: code is all you need. The most powerful tool you can give an AI agent isn’t a JSON schema—it’s a shell prompt and the ability to run commands like rg, grep, and repomix.
The future of coding isn’t about replacing developers with AI or building elaborate tool frameworks. It’s about augmenting human creativity with orchestrated AI workflows that use the battle-tested tools we already have.
Ready to build your own orchestrated AI development workflow? Forget the complex abstractions. Install repomix, give your agents access to the command line, and discover how the ‘code is all you need’ philosophy can transform your development process.