OpenAI recently released a comprehensive guide titled ‘Building an AI-native engineering team.’ [1] It charts the evolution of coding agents from simple autocomplete to autonomous partners capable of navigating the entire Software Development Lifecycle (SDLC)—Plan, Design, Build, Test, Review, and Deploy.
It is a polished, optimistic vision of the future. It is also fundamentally incomplete.
The guide assumes that if you point an agent at a repository, it will ‘figure it out.’ It operates on the assumption that the lights are on in the entire building. But anyone who has actually built complex software with LLMs knows the reality is darker.
OpenAI’s guide starts at Phase 1: Plan. To succeed, you must start at Phase 0: Discovery.
The Spotlight Is The Stage
OpenAI’s guide relies on ‘Unified context across systems.’ They describe a workflow where an agent reads a spec, scans the codebase, and magically understands the implications.
A single model can read code, configuration, and telemetry, providing consistent reasoning across layers that previously required separate tooling.
This ignores the fundamental constraint of LLMs: The Spotlight is the Stage.
Imagine a pitch-black theater. Your agent is the actor. The code repository is the stage. The context window is the spotlight.
To the actor, only what is illuminated exists.
If the spotlight (context) is focused on User.ts, but the critical validation logic is in Auth.ts standing three feet away in the dark, the agent does not ‘know’ about the validation. It cannot see it. It cannot reason about it. If you ask it to plan a feature, it will confidently hallucinate a plan that walks right off the edge of the stage.
You cannot just dump the entire theater into the spotlight—that creates noise and confusion. You must direct the light.
Phase 0: The Discovery Phase
Before you ask an agent to Plan (act), you must employ a Discovery Agent (lighting technician).
The goal of Discovery is not to solve the problem. The goal is to set the stage.
A human engineer acts as their own lighting technician. They grep (scan the dark), find the relevant files, and open them (turn on the light). Only then do they start coding.
An AI-native workflow must replicate this. The Discovery Agent’s job is to:
- Explore: Move the light around the dark stage to find the necessary props.
- Curate: Lock the spotlight onto only the specific files required for the scene.
- Handoff: Freeze this state for the Builder agent.
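The three steps above can be sketched as a concrete handoff artifact. This is a hedged illustration, not a prescribed schema: the class and field names are hypothetical, chosen only to show what a Discovery Agent might freeze and pass to the Builder.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the Explore -> Curate -> Handoff output.
# Field names are illustrative, not part of any tool's API.
@dataclass
class DiscoveryHandoff:
    curated_files: list          # Curate: the files locked in the spotlight
    directory_tree: str          # Explore: spatial awareness of the dark stage
    ambiguities: list = field(default_factory=list)  # flagged before Build starts

# Example handoff for the Auth.ts/User.ts scenario from the theater metaphor.
handoff = DiscoveryHandoff(
    curated_files=["Auth.ts", "User.ts"],
    directory_tree="src/\n  Auth.ts\n  User.ts",
    ambiguities=["Validation lives in Auth.ts, not User.ts — confirm intended owner"],
)
```

The key design choice is that ambiguities travel with the context: the Builder receives the lit stage and an explicit list of what is still in the dark.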
The Tooling: Discovery Tools as Set Designers
Three tools excel at this phase: RepoMix, RepoPrompt, and code2prompt. All provide token management features—essential for working within context window constraints. A fourth option, gptree, exists but is not recommended for production use.
RepoMix packs your repository into a single, LLM-friendly file with comprehensive token management. I personally hit friction when focused file selection hid the full directory structure, so I contributed the --include-full-directory-structure flag (merged in v1.8.0). [2][3] This flag lets agents see the complete directory tree while keeping file processing scoped to --include patterns, giving Discovery Agents spatial awareness. For token awareness, RepoMix offers several options: --token-count-tree visualizes token distribution hierarchically (optionally with a threshold such as --token-count-tree 1000 to show only high-impact files), --verbose includes token counts in debug logging, and --token-count-encoding <encoding> lets you pick the tokenizer model (o200k_base for GPT-4o, cl100k_base for GPT-3.5/4, etc.). The --compress option uses Tree-sitter to reduce tokens while preserving code structure. Together, these features make RepoMix ideal for understanding and optimizing context allocation.
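To make the flag combinations concrete, here is a small sketch that assembles a RepoMix invocation for a Discovery run. The flags are the ones discussed above; the include globs and threshold value are hypothetical examples, and you should check `repomix --help` for the exact syntax your installed version accepts.

```python
import shlex

def repomix_discovery_cmd(include_globs, token_threshold=1000, encoding="o200k_base"):
    """Build a repomix command line for the Discovery phase (illustrative only)."""
    cmd = [
        "repomix",
        "--include", ",".join(include_globs),        # scope file processing
        "--include-full-directory-structure",        # but keep the full tree visible
        "--token-count-tree", str(token_threshold),  # show only high-impact files
        "--token-count-encoding", encoding,          # tokenizer model
        "--compress",                                # Tree-sitter structural reduction
    ]
    return shlex.join(cmd)

print(repomix_discovery_cmd(["src/auth/**", "src/user/**"]))
```

A Discovery Agent can emit exactly this kind of command after its first grep pass: scoped files for the spotlight, full tree for spatial awareness.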
RepoPrompt is a Mac-native application that automates context assembly through intelligent code mapping and token efficiency. It reduces token usage by 80% compared to naive approaches through its CodeMaps feature, which extracts classes, functions, and references to create semantic understanding without verbosity. It provides persistent context sync across AI tools and MCP Server integration with 15+ specialized tools—perfect for agent-to-agent collaboration during discovery. Important limitation: RepoPrompt runs only on macOS. Its MCP server for cloud deployments requires Mac instances (e.g., AWS EC2 Mac instances); most third-party platforms like Fly.io, DigitalOcean, and Heroku do not offer macOS compute, restricting RepoPrompt’s use to local development environments or costly dedicated Mac infrastructure.
code2prompt is a high-performance CLI tool that combines speed with unprecedented token transparency. Every execution automatically displays token counts, and its --token-map flag visualizes token distribution across files like a disk-usage tool—showing you exactly which files dominate your context budget. This visual transparency is ideal when you need to understand the token footprint of your entire codebase at a glance. It supports multiple tokenizer encodings (o200k_base, cl100k_base, etc.) and exports to Markdown, JSON, or XML with custom Handlebars templates.
gptree is a lightweight CLI tool for quick context assembly with built-in directory visualization. It automatically creates a visual directory tree while respecting .gitignore patterns and offers both simple file-type filtering and advanced glob patterns. Critical limitations: gptree defaults to Safe Mode with a ~25K token limit (inadequate for production workflows) and lacks token counting entirely. While you can override the limit to 60K via --disable-safe-mode, the model operates blind—it has no visibility into token consumption. You must implement your own token counting workflows externally. This makes gptree fundamentally different from RepoMix, RepoPrompt, and code2prompt, all of which provide token transparency. I do not recommend gptree for production discovery workflows, but developers should be aware of it as an option for small, simple projects where token counting is less critical.
Token Management Features:
- RepoMix: --token-count-tree [threshold] (hierarchical view) + --verbose (debug logging) + --token-count-encoding <encoding> (multiple tokenizer models) + --compress (Tree-sitter reduction)
- RepoPrompt: Smart 80% reduction via CodeMaps + persistent sync across tools
- code2prompt: Automatic counting on every run + --token-map visual distribution + multiple tokenizer encodings
- gptree: Safe Mode only (no token counting; model operates blind)
Choose based on your workflow and constraints:
- For token optimization & file filtering: RepoMix (CLI, cross-platform, scriptable, compression, threshold filtering)
- For interactive exploration & token reduction (Mac-only): RepoPrompt (Mac native UI, CodeMaps, persistent context sync; MCP server requires Mac instances)
- For token transparency & visualization (cross-platform): code2prompt (Rust speed, automatic token maps, multiple formats, CLI portability)
Note on gptree: While available, I do not recommend it for production discovery workflows due to lack of token counting. It’s suitable only for small, simple projects where you can afford to operate without token visibility.
The Ideal Discovery Token Budget: 50K–70K
From extensive testing across multiple models (including OpenAI’s GPT 5.1 and Claude variants), I’ve found that the ideal Discovery phase token budget is approximately 60K tokens, with a practical range of 50K to 70K.
This is not arbitrary. At 60K tokens, a Discovery Agent can:
- Load the full directory structure (1K–3K tokens)
- Include 20–40 carefully curated source files (40K–50K tokens)
- Leave 7K–20K tokens for reasoning and output generation
- Avoid the noise and hallucination that comes from ‘show me everything’ (200K+ tokens)
Below 50K: You risk incomplete context. Agents miss critical files and start making assumptions. This is why tools with inadequate defaults fall short—developers become responsible for knowing and overriding constraints they shouldn’t need to think about.
Above 70K: You hit diminishing returns. Agents spend cognitive effort on noise. Hallucination rates increase. You’re no longer doing Discovery; you’re just dumping code.
The sweet spot (50K–70K): Agents have enough signal to ask clarifying questions. They can reason about architectural patterns. They identify edge cases. They hand off to the Builder with confidence, not guesses.
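The budget arithmetic above can be turned into a simple pre-flight check. This is a hedged sketch: the chars-per-token heuristic is a crude stand-in for a real tokenizer, and the thresholds simply encode the 50K–70K range argued for in this section.

```python
# Budget bounds from the Discovery sweet spot discussed above.
DISCOVERY_MIN, DISCOVERY_MAX = 50_000, 70_000

def estimate_tokens(text: str) -> int:
    """Rough rule of thumb (~4 chars/token); swap in a real tokenizer if available."""
    return len(text) // 4

def check_budget(tree_tokens: int, file_tokens: list, reserve: int = 20_000) -> str:
    """Classify a curated Discovery context against the 50K-70K budget."""
    total = tree_tokens + sum(file_tokens) + reserve
    if total < DISCOVERY_MIN:
        return "under-budget: risk of incomplete context"
    if total > DISCOVERY_MAX:
        return "over-budget: trim files or compress harder"
    return "ok"

# Example: 2K tree + 28 curated files (~1.5K each) + 15K reasoning reserve.
print(check_budget(tree_tokens=2_000, file_tokens=[1_500] * 28, reserve=15_000))
```

Running a check like this before handoff keeps the Builder from stepping onto a half-lit or over-crowded stage.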
I recommend RepoMix and code2prompt for production discovery workflows—both provide token transparency as a first-class feature and run cross-platform. RepoPrompt is excellent for token efficiency (80% reduction via CodeMaps) but is Mac-only, limiting its use to local development or organizations with dedicated Mac infrastructure (AWS EC2 Mac instances, etc.). If you need cloud-based discovery agents on standard Linux platforms, RepoPrompt’s MCP server is unavailable. gptree is available but not recommended: it lacks token counting entirely, meaning the model operates blind with no visibility into token consumption. You would need to implement external token counting workflows—a burden the recommended tools eliminate. In a mature discovery workflow, token transparency should be built in, not bolted on, and platform constraints should not block deployment.
The Engine: Claude Code & Haiku 4.5
If RepoMix and RepoPrompt are the set design, Claude Code is the fast-moving technician.
For the Discovery phase, you don’t need the deep, slow introspection of GPT-5.1 Codex (gpt-5.1-codex-max, gpt-5.1-codex), Gemini Pro (gemini-3-pro-preview, gemini-2.5-pro), or Sonnet 4.5. You need speed. You need an agent that can swing the spotlight wildly, run grep, execute RepoMix, and filter results in seconds.
My favorites?
Claude Code (Haiku 4.5 model) is perfect for this. It acts as a high-speed scout, exploring the dark corners of the repo, running the repomix commands with the right flags, and setting the scene.
OpenAI Codex (GPT 5.1 Codex Mini or GPT 5.1 Low model) is also a great choice. It’s faster than the full Codex models and still powerful enough to handle the Discovery phase.
Gemini CLI (Gemini 2.5 Flash or Flash Lite model) is excellent for quick, lightweight Discovery tasks. It’s fast and cost-effective, making it ideal for initial exploration.
The Builder Should Not Be The Scout
OpenAI’s guide argues that during the Build phase, agents can ‘search and modify code across dozens of files.’
This is like asking the lead actor to hang the lights while delivering a monologue.
When an agent is in ‘Build’ mode, it should be an Executor, not an Explorer. If a Builder agent has to waste its attention span fumbling around in the dark looking for a file, its performance degrades.
In a mature AI-native workflow:
- Phase 0 (Discover): Haiku 4.5 model uses RepoMix to find the props and aim the spotlight.
- Phase 1 (Build): GPT 5.1 Codex (Max High or Extra High model) steps onto a fully lit stage with exactly the context it needs, and performs immediately.
Proactive Ambiguity Resolution
OpenAI suggests reviewing the agent’s work after the plan is made. This puts the burden of catching errors on the audience.
A Discovery-first approach handles ambiguity proactively.
During the Discovery phase, if the technician realizes a prop is missing or the stage layout doesn’t match the script, they flag it immediately. The ‘Handoff Prompt’ explicitly lists these ambiguities. We fix the set before the curtain rises.
From Optimism to Determinism
OpenAI’s guide describes the future we all want: a theater where the lights are always on everywhere.
But we are building teams for today. Today’s context windows are finite spotlights.
If you want to build an AI-native engineering team, don’t just automate the actors. Engineer the lighting.
Insert Phase 0. Use token-aware discovery tools—RepoMix or code2prompt (cross-platform), or RepoPrompt (Mac-only)—with a target budget of 50K–70K tokens paired with fast scouts like Haiku. Don’t let your agents perform in the dark.
References
- Building an AI-native engineering team by OpenAI: https://cdn.openai.com/business-guides-and-resources/building-an-ai-native-engineering-team.pdf
- RepoMix PR #896 - Add --include-full-directory-structure flag: https://github.com/yamadashy/repomix/pull/896
- RepoMix v1.8.0 Release: https://github.com/yamadashy/repomix/releases/tag/v1.8.0
I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.