Single-model AI coding is broken. LLMs hallucinate, lose context, and produce inconsistent code. The solution isn’t a bigger model; it’s a better workflow.
I’ve found success by orchestrating multiple AI systems in a simple, two-step process: Plan → Execute. This workflow uses specialized AI for each phase, coordinated by the same CLI tools developers use daily. No complex protocols, no abstract definitions—just pure, efficient code generation.
This is a CLI-first revolution that transforms a specification into production-ready code in minutes.
The Problem: Why Single Models Fail
Current AI coding assistance fails because it mixes planning, execution, and validation in a single, fragile process. This leads to:
- Context Collapse: The AI forgets the goal midway through implementation.
- Inconsistent Quality: Output quality degrades unpredictably as task complexity grows.
- All-or-Nothing Execution: A single failure destroys the entire session.
- No Visibility: It’s a black box until it’s too late.
The root cause is that agents build context on the fly through inefficient tool calls. They are forced to discover what they don’t know, one API call at a time.
The Two-Step Solution: Plan-Execute
Our breakthrough separates the cognitive load, assigning each task to the best-suited AI model.
Step 1: Plan with Gemini 2.5 Pro
We start with Gemini 2.5 Pro and its 1M token context window. It ingests the user’s specification and the entire relevant codebase to produce a detailed AI-PLAN.md. This plan is exhaustive, covering file paths, code changes, dependencies, and test strategies.
Create the Plan:
# 1. Gather relevant codebase context using git and repomix
git ls-files "*.go" "go.mod" "Makefile" | repomix --stdin --output=codebase.xml
# 2. Generate a detailed plan using the spec and codebase context
cat USER-SPEC.md codebase.xml | gemini "Create a plan from this spec" > AI-PLAN.md
Gemini’s massive context allows it to create a holistic, actionable plan that respects existing patterns and architecture.
Step 2: Execute with Claude Opus/Sonnet
Next, we hand the AI-PLAN.md to Claude Opus or Sonnet. With a smaller 200K token window, Claude excels at precise, reliable execution when given clear instructions. It doesn’t need to understand the whole system—only the plan and the files it needs to modify.
Execute the Plan:
# 1. Extract only the files mentioned in the plan for focused context
rg -o '[a-zA-Z0-9_/]+\.go' AI-PLAN.md | sort -u | repomix --stdin --output=plan-files.xml
# 2. Execute the plan with the focused context
cat AI-PLAN.md plan-files.xml | claude "Implement this plan"
Claude works through the plan task-by-task, modifying code, running tests, and validating each step.
Why It Works: CLI-First Orchestration
This workflow is effective because it relies on simple, powerful principles.
Code Is All You Need
Instead of abstract protocols, we give AI agents the same tools developers use: git, grep, rg, and repomix. These CLI tools are:
- Direct: No ambiguity.
- Composable: Unix pipes have solved integration for decades.
- Familiar: AIs trained on code already understand them.
- Deterministic: They execute predictably every time.
The repomix tool is key, allowing us to package specific codebase slices into a token-efficient XML format for the AI to consume.
Smart Token Economics
We use the right model for the right job, optimizing for both capability and cost.
| Phase | Model | Context Limit | Optimized For |
|---|---|---|---|
| Plan | Gemini 2.5 Pro | 1M tokens | Holistic understanding, architecture, planning |
| Execute | Claude Opus/Sonnet | 200K tokens | Precise edits, error handling, validation |
This division of labor prevents any single model from being overwhelmed.
Token Usage by Project Size
The Plan-Execute workflow adapts to projects of any size by managing context intelligently.
Small Projects (<100K Tokens)
For small codebases, the entire project fits comfortably within both Gemini’s and Claude’s context windows. This allows for straightforward, holistic analysis and execution without complex context engineering.
# The entire codebase can be passed to both models
git ls-files | repomix --stdin --output=codebase.xml
cat spec.md codebase.xml | gemini "Create a plan..." > plan.md
cat plan.md codebase.xml | claude "Implement this plan..."
Medium Projects (100K-200K Tokens)
Here, the full codebase fits into Gemini’s 1M context window for planning, but exceeds Claude’s 200K limit. For execution, we must provide only the files referenced in the AI-PLAN.md.
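One practical wrinkle at this size: a plan can mention paths that no longer exist, and a missing file would break or pollute the packed context. Here is a small sketch that keeps only plan-referenced files actually present on disk; the plan contents and the `/tmp/demo` layout are invented for the demo, and the surviving list is what would be piped into `repomix --stdin`.

```shell
# Demo setup: a fake plan plus a fake repo layout under /tmp.
mkdir -p /tmp/demo/cmd /tmp/demo/internal
touch /tmp/demo/cmd/main.go /tmp/demo/internal/run.go
printf 'Edit cmd/main.go and internal/run.go; old/gone.go was deleted.\n' \
  > /tmp/demo/AI-PLAN.md

cd /tmp/demo
# Extract .go paths from the plan, drop duplicates, keep only real files.
grep -oE '[a-zA-Z0-9_/]+\.go' AI-PLAN.md | sort -u \
  | while read -r f; do
      if [ -f "$f" ]; then echo "$f"; fi
    done
```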
Large Projects (>200K Tokens)
For large codebases, even Gemini’s 1M context window may be challenged. Here, repomix with git becomes critical for creating a representative, token-efficient slice of the codebase for planning. Execution remains targeted, using only the files specified in the plan.
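For building that planning slice, git history is one useful relevance signal: files touched recently are often the files the spec is about. That relevance assumption is mine, not a claim from the workflow, and the throwaway repo below is fabricated so the example is self-contained.

```shell
# Build a throwaway repo with two commits to demonstrate history-based slicing.
rm -rf /tmp/bigrepo
git -c init.defaultBranch=main init -q /tmp/bigrepo && cd /tmp/bigrepo
touch cold.go && git add cold.go
git -c user.email=demo@x -c user.name=demo commit -qm "old code"
touch hot.go && git add hot.go
git -c user.email=demo@x -c user.name=demo commit -qm "recent work"

# Files touched by the most recent commit; in the real workflow this list
# would be piped into: repomix --stdin --output=slice.xml
git log -1 --name-only --pretty=format: | grep -v '^$' | sort -u
```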
The Workflow in Action
We used this process to build the two-step CLI itself.
- Input: A USER-SPEC.md file detailing the goal.
- Planning: Gemini analyzed our spec and the existing codebase, producing a 475-line AI-PLAN.md with 12 distinct steps.
- Execution: Claude followed the plan, modifying files, running tests, and validating each change.
Result: A fully functional, two-step CLI workflow with error handling and safety assertions, built and validated in just six minutes.
Conclusion: The Future is Orchestrated
The Plan-Execute workflow represents a paradigm shift from single-model prompting to multi-system orchestration. By separating planning from execution and using standard developer tools, we create AI coding systems that are robust, scalable, and practical for production use.
The breakthrough isn’t a complex new protocol. It’s the realization that code is all you need. The most powerful tool for an AI agent is a shell prompt and access to git, grep, and repomix. The future of AI development lies in augmenting human creativity with simple, powerful, and orchestrated workflows.