
Spec-Plan-Execute: Going Beyond Plan-And-Execute

DRAFT
Published: Jun 26, 2025
Punta Cana, Dominican Republic

The promise of AI-assisted coding has always been compelling: describe what you want, and watch as intelligent systems transform your ideas into working code. But anyone who has wrestled with large language models knows the reality is more complex. Single-model approaches hit walls—context limits, hallucinations, incomplete implementations, and the dreaded ‘AI forgot what we were building’ syndrome.

After building dozens of AI-powered development tools, I’ve discovered that the future isn’t about finding the perfect model or building complex abstractions. In my previous post on agentic tools, I explored how giving AI agents access to the same command-line tools developers use—rather than abstract protocols—unlocks their true potential. Now, let’s take that principle further by orchestrating multiple AI systems together, still following the philosophy that ‘code is all you need.’

This is the story of a three-step coding revolution that’s changing how we build software: Specification → Planning → Execution. It’s a workflow that can take a high-level idea and transform it into production-ready code in 1-10 minutes, with minimal human intervention—all using simple CLI tools and shell commands.

No complex protocols. No abstract tool definitions. Just AI agents wielding the same powerful command-line tools that developers have perfected over decades.

Let’s explore how this orchestrated approach solves the fundamental problems that plague single-model AI development.

The Problem: Single-Model Limitations

Traditional AI coding assistance asks too much of a single model. Even our most capable models have hard limits: Claude Opus 4 and Sonnet 3.7 are constrained to 200K tokens, while Gemini 2.5 Pro offers 1M tokens. But context windows alone don’t solve the fundamental problem.

When you prompt any single model—whether Claude, GPT-4, or Gemini—to ‘build a CLI tool that processes specifications and generates implementation plans,’ you’re asking it to:

  1. Understand your requirements (specification comprehension)
  2. Plan the implementation (architectural decisions)
  3. Execute the code changes (file modifications)
  4. Validate the results (testing and verification)

This cognitive load often leads to:

  • Context collapse: Even with 200K tokens, Claude loses track of earlier decisions when deep in implementation
  • Incomplete implementations: Features get partially built then abandoned as context fills
  • Architectural inconsistency: Different parts of the system follow different patterns
  • Token limit frustration: Running out of context just as things get interesting

For example, Claude’s 200K context might seem generous, but a typical codebase easily consumes 50-100K tokens. Add in the conversation history, tool outputs, and implementation details, and you’re hitting limits before completing complex features. Gemini’s 1M tokens help with understanding large codebases but can’t solve the fundamental issue of mixing comprehension, planning, and execution in a single context.
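A quick way to check whether a codebase will fit is a back-of-envelope estimate. The ratio below (about four characters per token) is a common rule of thumb, not an exact figure for any particular tokenizer:

```shell
# Back-of-envelope token estimate, assuming the rough heuristic of
# ~4 characters per token (the exact ratio varies by tokenizer).
estimate_tokens() {
  chars=$(cat "$@" | wc -c)
  echo $(( chars / 4 ))
}

# e.g.: estimate_tokens src/*.ts   # will this fit in a 200K window?
```

Run it over your source tree before deciding whether a full-codebase dump or a targeted selection is the right move.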

The solution isn’t a bigger model—it’s a better workflow that respects each model’s strengths and limitations.

The Three-Step Solution: Orchestrated AI Development

Our breakthrough came from recognizing that different AI models excel at different tasks. By creating a structured handoff between specialized systems, we can build complex software that neither model could create alone.

Step 1: Specification Clarification (Gemini 2.5 Pro)

The first step uses Gemini 2.5 Pro with its massive 1M token context window to understand and clarify requirements. This isn’t just about reading a spec—it’s about comprehending the full context of your codebase.

Key Capabilities:

  • Massive Context: 1M tokens allows ingesting entire codebases via CLI tools like repomix
  • Specification Analysis: Deep understanding of requirements and constraints
  • Clarification Generation: Identifying ambiguities and edge cases
  • Domain Knowledge: Leveraging training on diverse coding patterns

Example Workflow with Enhanced File Selection:

# User provides high-level specification
echo "Build a CLI that generates AI plans from specifications" > USER-SPEC.md

# NEW: Use repomix stdin feature for precise context inclusion
# Find all TypeScript files containing "spec" or "plan" logic
rg -l "spec|plan" --type ts | repomix --stdin --output=spec-context.xml

# Or interactively select relevant files with fzf
find . -name "*.ts" -type f | fzf -m | repomix --stdin --output=spec-context.xml

# Gemini processes with targeted codebase context
cat USER-SPEC.md spec-context.xml | gemini "Let's clarify this spec!"

# Output: Detailed clarifications and refined requirements
(This example is complete, it can be run "as is")

The magic happens when Gemini can see your entire codebase structure, existing patterns, and architectural decisions. With repomix’s new stdin feature, you can now precisely control which files are included in the context, ensuring Gemini focuses on the most relevant code for your specification.

Step 2: Implementation Planning (Gemini 2.5 Pro)

The second step takes the clarified specification and generates a detailed implementation plan. This is where Gemini’s reasoning capabilities shine, creating step-by-step instructions that account for:

  • Existing Code Patterns: Following established conventions
  • Dependency Management: Understanding what libraries and frameworks are already in use
  • Integration Points: Identifying where new code connects to existing systems
  • Testing Strategy: Planning verification and validation steps

Enhanced Planning with Targeted Context:

# Find all files that will be affected by the implementation
# 1. Search for existing command patterns
rg -l "AddCommand|NewCmd" --type go | repomix --stdin --output=command-patterns.xml

# 2. Find related test files
fd -g '*_test.go' | repomix --stdin --output=test-patterns.xml

# 3. Include configuration and dependency files
printf 'go.mod\ngo.sum\nMakefile\n' | repomix --stdin --output=deps.xml

# Generate comprehensive plan with all relevant context
cat AI-SPEC.md command-patterns.xml test-patterns.xml deps.xml | gemini "Create implementation plan"
(This example is complete, it can be run "as is")

Output Format:

# Gemini generates a detailed AI-PLAN.md with:
- Specific file paths and line numbers
- Code changes required
- Integration points
- Test strategies
- Validation steps

This level of specificity—including exact locations and reasoning—is what makes the handoff to the execution phase seamless. With repomix’s stdin feature, Gemini has precisely the context it needs to generate accurate, comprehensive plans.

Step 3: Code Execution (Claude Opus/Sonnet)

The final step uses Claude Opus 4 or Sonnet 3.7 to execute the generated plan. While Claude has a smaller context window (200K tokens) compared to Gemini’s 1M, it excels at precise implementation tasks.

Key Capabilities:

  • Precise Code Modification: Making exact changes to specific files based on the plan
  • Error Handling: Debugging and fixing issues as they arise during implementation
  • Incremental Execution: Working through plans systematically, task by task
  • Validation: Running tests, linters, and builds to ensure correctness
  • Tool Mastery: Expert use of CLI tools for file manipulation and verification

Execution Workflow with Focused Context:

# Extract only the files mentioned in the AI-PLAN.md
rg -o '[a-zA-Z0-9_/]+\.(go|ts|py|js)' AI-PLAN.md | \
  sort -u | \
  repomix --stdin --output=plan-files.xml

# Claude executes with the plan and targeted file context
cat AI-PLAN.md plan-files.xml | claude "Let's build this!"

# Output: Systematic task completion with validation
# - Modifies files according to plan
# - Runs tests after each major change
# - Validates builds and linting
# - Reports completion status for each task
(This example is complete, it can be run "as is")
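To make the extraction step above concrete, here is what that regex pulls out of a hypothetical plan excerpt (`grep -oE` behaves like `rg -o` here):

```shell
# A hypothetical AI-PLAN.md excerpt, to show what the extraction
# one-liner pulls out before handing the file list to repomix.
plan='Modify cmd/root.go to register the new subcommand.
Add tests in internal/plan_test.go and update internal/plan.go.'

printf '%s\n' "$plan" | grep -oE '[a-zA-Z0-9_/]+\.(go|ts|py|js)' | sort -u
# → cmd/root.go
# → internal/plan.go
# → internal/plan_test.go
```

The deduplicated list is exactly what gets piped into `repomix --stdin`, so Claude only ever sees the files the plan actually touches.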

Why Claude Excels at Execution:

  • Surgical Precision: With a clear plan, Claude doesn’t need to understand the entire codebase—just the specific files to modify
  • Reliability: Following a detailed plan reduces hallucination and ensures consistent implementation
  • Speed: No time wasted on architectural decisions or exploring the codebase
  • Validation Focus: Claude’s training emphasizes correctness and testing, perfect for the execution phase

This targeted approach ensures Claude operates within its optimal context window while delivering reliable, tested implementations.

The Technical Architecture: How It Actually Works

The three-step workflow relies on sophisticated orchestration between AI models and supporting infrastructure:

CLI Tool Integration: Code Is All You Need

Command-line tools like repomix embody the ‘code is all you need’ philosophy perfectly. Rather than complex protocols or abstractions, they give AI agents the same powerful tools human developers use. They solve the ‘codebase context’ problem by:

# Traditional approach: repomix compacts entire codebase
repomix --compress --output=codebase.xml

# NEW: Targeted context selection with stdin
# Find files modified in the last week
git ls-files --modified | repomix --stdin --output=recent-changes.xml

# Search for files containing specific patterns
rg -l "TODO|FIXME" --type ts | repomix --stdin --output=todos.xml

# Include only files in specific directories
fd -e ts -e tsx . src/components src/utils | repomix --stdin --output=components.xml

# Remove comments to save on token count  
rg --files --type ts | repomix --stdin --compress --remove-comments --output=lean-context.xml

# Gemini processes targeted context efficiently
gemini -p "Analyze these specific files" < components.xml
(This example is complete, it can be run "as is")

Benefits:

  • Compression: Tree-sitter parsing removes implementation details while preserving structure (reduces token usage by ~70%). This is an experimental feature - test thoroughly on your own codebase before relying on it
  • Precision: Stdin feature allows exact file selection based on any criteria
  • Flexibility: Combine with any file discovery tool (find, rg, fd, git, grep)
  • Efficiency: Structured XML format optimizes token usage

Why CLI Tools Beat Complex Abstractions

This approach aligns perfectly with the ‘code is all you need’ philosophy. Instead of building complex protocols like MCP that require inference and consume massive token counts, we give AI agents the same tools developers use:

  1. Direct Execution: No abstraction layer means no ambiguity. When an agent runs rg -l "TODO" | repomix --stdin, it’s as clear as when a human runs it.

  2. Perfect Composability: Unix pipes and shell commands have solved composability for decades. Why reinvent it?

  3. Zero Learning Curve: Every developer understands grep, find, and pipes. AI agents trained on code already know these patterns.

  4. Deterministic Workflows: Unlike tool protocols that rely on model inference, shell commands execute predictably every time.

As Armin Ronacher notes: ‘The way to think about this problem is that when you don’t have an AI, and you’re solving a problem as a software engineer, your tool of choice is code.’ The repomix CLI tool exemplifies this—it’s not a special ‘AI tool,’ it’s just a good developer tool that happens to work perfectly with AI agents.
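To see the composability claim concretely, here is a tiny end-to-end pipe using only standard Unix tools (the files are created on the spot for illustration):

```shell
# Pipes compose discovery and processing with no protocol layer:
# "which files mention TODO?" feeds directly into "how big are they?"
dir=$(mktemp -d)
printf 'TODO: fix parser\nok\n' > "$dir/a.ts"
printf 'clean file\n'           > "$dir/b.ts"

grep -rl 'TODO' "$dir" | xargs wc -l
```

An agent composing `grep | xargs wc` needs no schema, no inference step, and no tool registry; the pipe itself is the contract.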

Token Economics: Division of Labor

The workflow strategically divides cognitive load based on each model’s strengths:

| Phase | Model | Context Limit | Optimized For |
| --- | --- | --- | --- |
| Specification | Gemini 2.5 Pro | 1M tokens | Understanding, reasoning, planning |
| Planning | Gemini 2.5 Pro | 1M tokens | Architecture, dependencies, integration |
| Execution | Claude Opus/Sonnet | 200K tokens | Precise edits, error handling, validation |

This division ensures that:

  • Gemini handles the ‘big picture’ thinking with massive context
  • Claude focuses on precise execution with proven reliability
  • Neither model is overwhelmed by tasks outside their sweet spot

Three-Step Workflow Visualization

flowchart TB
    subgraph "Human Developer"
        H1[Write Requirements<br/>USER-SPEC.md]
        H2[Review & Refine Plan]
        H3[Validate Results]
    end
    
    subgraph "AI Agents"
        AI1[Gemini: Clarify Spec<br/>→ AI-SPEC.md]
        AI2[Gemini: Generate Plan<br/>→ AI-PLAN.md]
        AI3[Claude: Execute Plan<br/>→ Code Changes]
    end
    
    H1 -->|USER-SPEC| AI1
    AI1 -->|AI-SPEC| AI2
    AI2 -->|AI-PLAN| H2
    H2 -->|Approved Plan| AI3
    AI3 -->|Implementation| H3
    
    style H1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style H2 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style H3 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style AI1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style AI2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style AI3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px

The Three-Step Process:

  1. SPEC: Human writes USER-SPEC → Gemini clarifies into AI-SPEC
  2. PLAN: Gemini transforms AI-SPEC into detailed AI-PLAN → Human reviews
  3. EXECUTE: Claude implements AI-PLAN → Human validates code changes
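The handoff contract above (each step reads one file and writes the next) can be sketched as a small shell function. This is an illustrative sketch, not a shipped tool; the model commands are injected as parameters rather than hard-coded:

```shell
# Minimal sketch of the SPEC → PLAN → EXECUTE handoff. Each step is a
# single executable passed in as a parameter; in a real run these would
# be small scripts wrapping the gemini and claude CLI invocations.
spec_plan_execute() {
  spec=$1
  clarify=$2   # e.g. wraps: gemini "Let's clarify this spec!"
  plan=$3      # e.g. wraps: gemini "Create implementation plan"
  execute=$4   # e.g. wraps: claude "Let's build this!"

  "$clarify" < "$spec"    > AI-SPEC.md   # Step 1: SPEC
  "$plan"    < AI-SPEC.md > AI-PLAN.md   # Step 2: PLAN
  "$execute" < AI-PLAN.md                # Step 3: EXECUTE
}
```

You can dry-run the handoff with `cat` standing in for each model (`spec_plan_execute USER-SPEC.md cat cat cat`), which makes the file contract between stages easy to verify before any tokens are spent.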

Optimization Strategies with Repomix Stdin

# Traditional approach (slow for large codebases)
repomix --output=full-codebase.xml  # Can be 100MB+

# Optimized approach with stdin (fast and targeted)
# Only include files modified in current branch
git diff --name-only main...HEAD | repomix --stdin --output=branch-changes.xml

# Include only source files, exclude tests and docs
find src -name "*.ts" -not -path "*/test/*" | repomix --stdin --output=src-only.xml

# Smart filtering based on imports/dependencies
rg -l "import.*gemini" --type ts | repomix --stdin --output=gemini-deps.xml

# Parallel execution with targeted contexts
rg -l "spec" --type ts | repomix --stdin --output=spec-files.xml &
rg -l "plan" --type ts | repomix --stdin --output=plan-files.xml &
rg -l "execute" --type ts | repomix --stdin --output=exec-files.xml &
wait

# Process with minimal, relevant context
cat AI-SPEC.md spec-files.xml plan-files.xml | gemini "Let's plan this!"
(This example is complete, it can be run "as is")
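The backgrounded `repomix` jobs above follow a standard shell fan-out pattern. Here is the same pattern in a self-contained form, using `grep` in place of `rg` and `repomix` so it runs anywhere:

```shell
# Fan-out/join: run independent filters concurrently, then `wait`
# blocks until every background job has finished.
dir=$(mktemp -d)
printf 'spec logic here\n' > "$dir/a.ts"
printf 'plan logic here\n' > "$dir/b.ts"

grep -rl 'spec' "$dir" > "$dir/spec-files.txt" &
grep -rl 'plan' "$dir" > "$dir/plan-files.txt" &
wait

cat "$dir/spec-files.txt"
```

Because each filter writes to its own output file, there is no shared state to coordinate; `wait` is the only synchronization needed.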

Pro Tip: Claude Code Optimization Workflows

Claude Code offers powerful optimization features that complement the three-step workflow:

Multiple Agents for Parallel Execution:

# This prompt triggers Claude Code to spawn multiple AI agents
cat AI-PLAN.md execution-context.xml | claude "Use multiple agents. Let's build this!"

Extended Thinking for Complex Problems:

# Progressively more thinking budget: think < think hard < think harder < ultrathink
cat AI-PLAN.md execution-context.xml | claude "Let's ultrathink about this implementation"

These optimization workflows—similar to compiler optimization flags—work best when Claude has a clear plan to follow rather than discovering requirements through tool calls. This is especially effective when:

  • Multiple files need independent modifications (use multiple agents)
  • Complex architectural decisions need deep reasoning (use ultrathink)
  • Test suites can run in parallel while implementation proceeds

Beyond Single-Model Limitations

The three-step workflow solves fundamental problems that plague single-model approaches:

Problem 1: Context Collapse

Single Model: Loses track of earlier decisions as context fills up
Three-Step Solution: Each model operates within its optimal context window

Problem 2: Inconsistent Quality

Single Model: Quality degrades as tasks become more complex
Three-Step Solution: Each model handles tasks it's optimized for

Problem 3: All-or-Nothing Execution

Single Model: If anything fails, the entire session is lost
Three-Step Solution: Failures are isolated to specific steps with clear recovery paths

Problem 4: Unclear Progress

Single Model: Black box execution with no intermediate feedback
Three-Step Solution: Clear progress tracking with validation at each stage

The Workflow in Action: A Real Example

Let’s walk through a real implementation that demonstrates the power of this approach:

Input: AI-SPEC.md

# AI-SPEC.md

### 1. Goal
To create a three-step command-line interface (CLI) workflow where:
1. A "Gemini CLI" first generates a software implementation plan
2. The generated plan is refined based on user specifications  
3. A "Claude Code CLI" then executes this plan to modify the codebase

### 2. System Context & Boundaries
- **Gemini CLI**: Receives AI-SPEC.md via stdin, outputs AI-PLAN.md
- **Claude Code CLI**: Consumes AI-PLAN.md, modifies local git repository
- **Data Contract**: Markdown-formatted files with structured instructions

Processing: Gemini Analysis

gemini "Let's plan this!" < AI-SPEC.md

Gemini processes the specification with full codebase context and generates a detailed 475-line implementation plan covering:

  • 12 specific implementation steps
  • Exact file paths and line numbers
  • Code snippets with before/after comparisons
  • Assertion requirements for safety compliance
  • Integration testing strategies

Execution: Claude Implementation

claude "Let's build this!" < AI-PLAN.md

Claude executes the plan systematically, modifying files, running tests, and validating each change. The execution is precise and reliable because Claude follows the detailed plan rather than making architectural decisions on the fly.

Result: A fully functional three-step CLI workflow, built in 6 minutes, with comprehensive error handling and safety assertions.

Implications for the Future of AI Development

This orchestrated approach represents a fundamental shift in how we think about AI-assisted development:

From Prompting to Orchestration

Instead of crafting the perfect prompt, we design systems that coordinate multiple AI capabilities.

From Single-Shot to Workflow

Instead of hoping one model can handle everything, we create reliable handoffs between specialized systems.

From Manual to Automated

Instead of copy-pasting AI suggestions, we build systems that can modify codebases autonomously.

From Experimental to Production

Instead of treating AI coding as a toy, we create workflows robust enough for real software development.

The Fundamental Problem: Tool Calls and Runtime Context Building

The current generation of coding agents—even sophisticated planning systems like Gemini—suffer from a critical architectural flaw: they’re forced to build context on the go through tool calls rather than building context up-front. This is the difference between compile-time and runtime context building, and it fundamentally limits what AI agents can achieve.

The Tool Call Trap

When a coding agent starts working on your codebase, it faces an immediate problem: it doesn’t know what it doesn’t know. Every piece of information must be discovered through explicit tool calls:

# Agent: "I need to understand the project structure"
ls -la
# Agent: "Now I need to see what's in src/"
ls src/
# Agent: "Let me check if there's a README"
cat README.md
# Agent: "What testing framework is used?"
grep -r "test" package.json
# Agent: "Are there existing patterns for commands?"
rg "AddCommand" --type go
(This example is complete, it can be run "as is")

Each tool call consumes tokens, adds latency, and most critically—the agent must infer what to search for based on incomplete information. It’s like trying to understand a codebase while blindfolded, only able to touch one file at a time.

Compile-Time vs Runtime Context Building

The three-step workflow revolutionizes this by introducing compile-time context building:

Traditional Agent (Runtime Context):

  • Starts with zero knowledge
  • Discovers context through sequential tool calls
  • Makes decisions based on partial information
  • Context building and execution are interleaved

Three-Step Workflow (Compile-Time Context):

  • Builds complete context before planning
  • Makes decisions with full codebase visibility
  • Separates context building from execution
  • Context is immutable during execution

This is analogous to the difference between interpreted and compiled languages. Just as a compiler can optimize better with a complete view of the program, our specification and planning phases can make better decisions with complete codebase context.

Extended Thinking and the Context Problem

Claude Code’s ‘ultrathink’ feature—where using phrases like ‘think’, ‘think hard’, ‘think harder’, or ‘ultrathink’ allocates progressively more thinking budget—exemplifies both the potential and limitations of current coding agents. When Claude engages in extended reasoning, forcing it to interrupt that thinking for tool calls is like forcing a compiler to pause optimization to check if a variable exists.

The three-step workflow complements these optimization features perfectly:

  1. Specification Phase: Gemini processes the entire codebase context upfront, no interruptions
  2. Planning Phase: Extended reasoning generates comprehensive plans with full context
  3. Execution Phase: Claude can use ‘ultrathink’ for complex implementations, but with a clear plan to follow

This separation means that extended thinking—whether in Gemini’s planning or Claude’s execution—operates on complete context rather than fragments discovered through tool calls.

The Hidden Cost of Tool Protocols

Modern tool protocols like MCP (Model Context Protocol) seem sophisticated but actually worsen this problem:

{
  "tool": "search_codebase",
  "parameters": {
    "query": "command pattern",
    "file_types": ["go", "ts"]
  }
}

This requires the agent to:

  1. Infer what to search for (runtime decision)
  2. Wait for results (latency)
  3. Process partial results (limited context)
  4. Repeat until sufficient context is built (token waste)

Compare this to our approach:

# Compile-time: Build complete context upfront
repomix --output=full-context.xml
# Agent now has EVERYTHING, can reason holistically

Real-World Impact: 10x Faster, 100x More Reliable

This architectural difference has profound implications:

| Metric | Tool-Call Agents | Three-Step Workflow |
| --- | --- | --- |
| Context Building | 50-100 tool calls | 1 upfront build |
| Token Efficiency | 10-20% wasted on discovery | <1% overhead |
| Decision Quality | Based on fragments | Based on complete view |
| Reasoning Interruptions | Constant | Zero during planning |
| Failure Recovery | Must rebuild context | Context persists |

The Missing Piece in Current Agents

Every major coding agent today—GitHub Copilot Workspace, Cursor, Codeium, even Claude Code—operates in this tool-call paradigm. They’re trying to be ‘smart’ about what context to fetch, but this is fundamentally the wrong approach. It’s like trying to optimize a program while only being able to see one function at a time.

The solution isn’t smarter tool selection or better search queries. The solution is to eliminate runtime context discovery entirely by building complete context upfront—exactly what the three-step workflow achieves.

Conclusion: The Dawn of Orchestrated AI Development

The three-step workflow isn’t just a technical improvement—it’s a new paradigm for human-AI collaboration in software development. By recognizing that different AI models excel at different tasks, and by giving them the same CLI tools developers use, we can build systems that are more reliable, more powerful, and more practical than any single model or complex protocol.

The results speak for themselves: complex codebases modified in minutes, not hours. Comprehensive implementations that follow existing patterns and conventions. Error handling and validation that actually works. And most importantly, a workflow that scales from simple scripts to production applications—all without a single abstract ‘tool definition.’

As the industry chases complex protocols and abstractions, the real breakthrough has been hiding in plain sight: code is all you need. The most powerful tool you can give an AI agent isn’t a JSON schema—it’s a shell prompt and the ability to run commands like rg, grep, and repomix.

The future of coding isn’t about replacing developers with AI or building elaborate tool frameworks. It’s about augmenting human creativity with orchestrated AI workflows that use the battle-tested tools we already have.

Ready to build your own orchestrated AI development workflow? Forget the complex abstractions. Install repomix, give your agents access to the command line, and discover how the ‘code is all you need’ philosophy can transform your development process.