
Agent Stories: Frameworks for AI Agents and Agentic Developers

Published: Sep 18, 2025
Vancouver, Canada

In the evolving landscape of AI-assisted software development, I’m witnessing a fundamental shift in how I collaborate with intelligent systems. Just as user stories revolutionized agile development by focusing on human outcomes rather than technical tasks, I need a similar framework for AI agents. Enter agent stories - a concept that adapts the proven user stories format to manage AI collaborators in software creation.

This exploration draws inspiration from recent insights into unit of work management in AI-assisted development, where the authors argue that ‘the craft of AI-assisted software creation is substantially about correctly managing units of work.’ As AI agents become more autonomous and begin to outnumber human users—potentially scaling infinitely compared to our 8 billion human cap—I need frameworks that ensure their work delivers legible business value while maintaining human oversight.

This isn’t just theoretical. Cloud platforms like Fly.io recently discovered that AI agents have become their fastest-growing customer segment, outpacing human developers. As Kurt Mackey writes in ‘Our Best Customers Are Now Robots’, ‘the users driving the most growth on the platform aren’t people at all. They’re… robots.’

Agent stories align with the emerging paradigm of Agentic Experience (AX), where I design software primarily for AI agents rather than humans. In a world where agents become the dominant ‘users,’ Agent stories provide the structured framework needed to coordinate this new workforce effectively.

Put simply: as AI agents and agentic workflows rise, I expect ‘user stories’ to increasingly be written with an AI agent as the user. Agent Stories make this explicit—they are user stories where the ‘user’ is an AI agent, treated as a first‑class actor with clear context, constraints, and verifiable outcomes.

Business Implications: The Agentclass

A business is, at core, an organization that provides a product or service. Historically, those offerings were designed for human users. As more ‘users’ become AI agents, an evolution of the user base is emerging—the agentclass. This shift means AI agents will increasingly dictate how businesses are built: what gets prioritized, how value is exposed, and how products are packaged for machine consumption.

Implications for companies:

  • Design for agent utilization: assume agents need fast, reliable paths to discover, understand, and use your product/service without human mediation.
  • Treat agents as first‑class users: publish clear constraints, quotas, and verifiable outcomes; make success machine‑checkable.
  • Offer agent‑friendly surfaces: stable APIs, machine‑readable docs, context bundles, and predictable auth flows.

One concrete pattern: llms.txt. Inspired by the emerging specification at llmstxt.org, companies can expose a public, agent‑friendly entry point at /llms.txt that:

  • Summarizes what the service does in concise, expert‑level language
  • Links to LLM‑readable markdown versions of key docs and endpoints
  • Curates the minimal set of resources agents need to act effectively

This is the agent‑era analogue to robots.txt/sitemap.xml: a simple, predictable surface that helps agents get the right context quickly. Pairing llms.txt with Agent Stories creates a closed loop—Agent Stories define the machine‑verifiable work to produce and maintain these surfaces; llms.txt makes them discoverable and consumable by agents in the wild.
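
A minimal sketch of what such a file might look like, following the llmstxt.org layout (an H1 title, a short blockquote summary, then H2 sections of links); the product, URLs, and doc paths here are hypothetical:

```markdown
# Acme Payments API

> Acme provides a REST API for creating and reconciling payments. Agents should
> authenticate with a scoped API token and prefer the v2 endpoints below.

## Docs

- [Quickstart](https://docs.acme.example/quickstart.md): authenticated request in under 5 minutes
- [API reference](https://docs.acme.example/api.md): endpoints, schemas, error codes
- [Rate limits](https://docs.acme.example/rate-limits.md): quotas and retry guidance

## Optional

- [Changelog](https://docs.acme.example/changelog.md): recent breaking changes
```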

Two Types of Agent Stories

As the agentclass grows (AI agents becoming first‑class ‘users’), two formats emerge that work together:

Agent Stories (for AI Agents)

Machine‑actionable stories written for agents to execute autonomously. They package just‑enough context and guardrails to deliver a small, verifiable unit of business value.

  • Inputs: curated context bundle (files/APIs/constraints), links to LLM‑readable docs (e.g., /llms.txt)
  • Behavior: multi‑step workflows with explicit boundaries and fallback paths
  • Output: concrete diffs/artifacts plus scriptable verification and human‑legible acceptance
  • Horizon: short execution windows (minutes → hours)

Agentic Developer Stories (for Human Developers)

Human‑readable stories for developers who build and operate agent‑ready systems. They create the capabilities and surfaces agents depend on, with reliability and safety baked in.

  • Focus: APIs/MCP endpoints, /llms.txt, .md docs mirrors, routing/affinity, tokenized secrets, SLOs
  • Objective: enable, constrain, and measure agent utilization of products/services
  • Output: running endpoints, docs and samples, monitors/runbooks, policy/quotas
  • Horizon: longer cycles (days → sprints) to shape durable capabilities

While both adapt the user stories format, Agent Stories target execution by agents, and Agentic Developer Stories target enablement by humans.


Key Distinctions: Agent Stories vs. Agentic Developer Stories

| Element | Agent Stories | Agentic Developer Stories |
| --- | --- | --- |
| Audience | AI agents (machine executors) | Human developers (system builders) |
| Author & Voice | Imperative, machine‑actionable; increasingly agent‑authored | Human‑written; design/coordination‑oriented |
| Objective | Small, verifiable unit of business value | Enable systems so agents deliver value reliably |
| Inputs | Curated context bundle; links (e.g., /llms.txt); explicit guardrails | PRDs/specs/SLAs; platform constraints; architecture/ops |
| Outputs | Concrete diffs/artifacts; tests/docs updates; script‑verifiable success | New/updated agent surfaces (APIs, MCP, /llms.txt, routing, secrets) |
| Verification | Deterministic checks/commands/fixtures | Reviewable designs; working endpoints; monitors; playbooks |
| Lifecycle Horizon | Minutes → hours | Days → sprints |

Agent Onboarding Surfaces: AGENTS.md and ai-docs

A great first surface for agents is AGENTS.md — a ‘README for agents.’ It gives coding agents a predictable place to find setup, test, and style guidance.

  • AGENTS.md: See https://agents.md/. Keep human‑focused details in README.md; put agent‑focused instructions in AGENTS.md (build/test commands, code style, CI expectations, security notes). Nested AGENTS.md files can live in subprojects; the closest one to a file takes precedence.
  • What to include: quick setup, how to run tests, code style, lint/type rules, repo quirks, PR/commit guidelines, and any programmatic checks the agent should run before finishing work (a minimal sketch follows below).
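
A minimal AGENTS.md along those lines; the commands and paths are illustrative placeholders, not a prescribed format:

```markdown
# AGENTS.md

## Setup
- `npm install` (Node 20+ assumed)

## Checks to run before finishing work
- `npm test` (unit tests must pass)
- `npm run lint && npm run typecheck`

## Code style
- TypeScript strict mode; Prettier defaults; no default exports

## Repo quirks
- Files under `src/generated/` are produced by a build step; never edit them by hand

## PR guidelines
- One small, verifiable unit of work per PR; paste the verification command output in the description
```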

Best practice from my client work: create an ai-docs/ folder at the repo root to host agent‑friendly context that’s safe to ingest and easy to link from /llms.txt and AGENTS.md.

  • ai-docs/agent-stories/: living, agent‑readable stories with tight scope and verifiable checks.
  • ai-docs/ai-specs/ and ai-docs/ai-reqs/: concise specifications and requirements written for agents (analogs to specs/reqs), focused on runnable examples, standards, and acceptance.
  • Link these from /llms.txt and AGENTS.md so agents find the minimal, authoritative context first (a sample repo layout is sketched below).
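
One way these pieces might sit together in a repository; only the folders named above come from this article, and the individual file names are illustrative:

```text
repo-root/
├── README.md                  # human-facing overview
├── AGENTS.md                  # agent-facing setup, checks, and style guidance
├── public/llms.txt            # served at /llms.txt; links into ai-docs/ and .md doc mirrors
└── ai-docs/
    ├── agent-stories/         # small, machine-verifiable stories agents can execute
    │   └── 0001-api-quickstart.md
    ├── ai-specs/              # concise specs with runnable examples and standards
    └── ai-reqs/               # requirements and acceptance criteria written for agents
```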

The User Stories Foundation

User stories emerged from agile methodologies as a way to capture requirements from the perspective of end users. The classic format - ‘As a [type of user], I want [some goal] so that [some reason]’ - serves several crucial purposes:

  1. Human-centric focus: Stories center on user outcomes rather than technical implementation
  2. Conversation starters: They provide just enough detail to spark discussion
  3. Deliverable value: Each story represents a complete, valuable increment
  4. Scope negotiation: Stories help teams agree on what’s ‘done’

These principles work because software development involves complex coordination between technical teams, product owners, and business stakeholders. User stories bridge the communication gap between these groups.

The AI Collaboration Challenge

AI agents introduce new coordination challenges. Unlike human developers who share context through meetings, documentation, and code reviews, AI agents operate in isolated context windows. As the nilenso article points out:

‘The right sized unit of work respects the context… If you don’t provide the necessary information in the context to do a good job, your AI will hallucinate or generate code that is not congruent with the practices of your codebase.’

The problem compounds in multi-turn workflows where errors accumulate. With a 5% error rate per action, a 10-turn task succeeds only about 59.9% of the time (0.95^10 ≈ 0.599). AI agents need verifiable checkpoints that are legible to humans.

This challenge is already manifesting in production systems. Fly.io’s experience shows how AI agents operate differently from humans - they need fast VM startup/shutdown cycles, persistent storage for iterative development, and specialized networking for protocols like MCP (Model Context Protocol). As Mackey notes, ‘robots don’t run existing applications. They build new ones’ through what he calls ‘vibe coding’ - iterative, stateful development that requires different infrastructure than traditional container workflows.

Agent Stories: Framework for AI Agents

Agent stories are a direct response to the challenge outlined in nilenso’s ‘The quality of AI-assisted software depends on unit of work management.’ They adapt the proven user stories format to specify small, verifiable, business‑value units of work that AI agents can reliably execute with the right context. Crucially, they are written for agents to read and execute to achieve user outcomes—agents act on the story; humans verify the value. Authorship can be human or agent; as LLM/model costs drop, I expect most Agent Stories to be agent‑authored, including agents writing stories that coordinate other agents (or generate new agents) in recursive workflows.

Agent Stories adapt the user stories format for AI collaborators:

As an [AI Agent], I need [specific context and constraints] to deliver [business value outcome] so that [human stakeholders can verify and integrate the work].

Agent Story Template (Agent‑Readable, User‑Outcome Oriented)

Title: <short, outcome-oriented title>
Audience: AI agent (executor); Stakeholder: <human user/role>

As a <Agent/Tool + capability>, I need <precise context package: files, APIs, standards, constraints>, to deliver <business-value outcome visible to humans (user outcome)>, so that <verification + integration path is clear>.

Acceptance Criteria:

  • Value:
  • Verification: <command/check/script or artifact to confirm>
  • Integration: <where/how this plugs into the system>
  • Constraints: <performance/security/accessibility/compatibility>
  • Definition of Done: <files updated, tests added, docs touched>

Notes:

  • Risks/Open Questions:
  • Dependencies: <required approvals, secrets, environments>
  • Fallback:

Key Components of an Agent Story

  1. Agent Role Definition: Specifies which AI agent or tool will perform the work
  2. Context Requirements: Defines the exact information the agent needs
  3. Success Criteria: Clear, verifiable outcomes that demonstrate business value
  4. Integration Points: How the work connects to the broader system

Example Agent Story (consumes llms.txt)

Title: Build agent‑readable API quickstart
Audience: AI agent (docs generator); Stakeholder: Developer Relations

As an API‑docs generator agent, I need the service OpenAPI spec, auth instructions, and the /llms.txt index (plus links to .md docs), to deliver a concise quickstart with runnable code samples and a 5‑minute setup path, so that new developers can make a successful authenticated request.

Acceptance Criteria:

  • Value: quickstart.md added and linked from /llms.txt
  • Verification: make validate-quickstart executes all sample calls locally
  • Integration: Docs sidebar updated; CI job validates samples on PRs
  • Constraints: Uses stable endpoints and rate‑limit guidance from /llms.txt
  • DoD: Files committed; CI green; docs preview link posted

Agentic Developer Stories: Framework for Humans

Agentic Developer Stories define the human work required to build and maintain the capabilities, guardrails, and surfaces that agents rely on. They translate business needs into agent‑usable platforms (APIs, MCP endpoints, /llms.txt, context pipelines, dynamic routing, tokenized secrets) and ensure reliability, observability, and compliance.

Agentic Developer Story Template (Human‑Readable, System‑Building)

Title: <capability/outcome>
Audience: Human developers (platform/feature/ops); Stakeholder: <product/partner/agent>

As an agentic developer, I will design/implement <capability or surface> with <SLOs/quotas/guardrails>, so that AI agents can autonomously achieve <outcome> with verifiable success and bounded risk.

Acceptance Criteria:

  • Agent Surfaces: /llms.txt updated; LLM‑readable .md docs published; APIs/MCP endpoints exposed
  • Verification: Postman/pytest collection executes golden flows; health checks and monitors in place
  • Operations: Dynamic request routing, idempotency, rate limits, and tokenized secrets configured
  • Compliance/Security: Access scopes and audit logs documented; failure modes and rollout plan prepared
  • DoD: Runbooks and examples added; success measured via defined SLIs

Example Agentic Developer Story (produces llms.txt)

Title: Publish agent‑friendly docs surface
Audience: Platform team; Stakeholder: External AI agents and partners

As an agentic developer, I will add a /llms.txt endpoint and a docs pipeline that emits .md mirrors for key pages, so that agents can discover and ingest minimal, expert‑level context reliably.

Acceptance Criteria:

  • Value: /llms.txt deployed; primary docs pages mirrored at the same URLs with a .md suffix
  • Verification: CLI check returns 200 for /llms.txt and linked .md pages (a minimal version of this check is sketched below); link rot CI gate enabled
  • Integration: Developer portal navigation updated; examples for Claude/OpenAI ingestion published
  • Constraints: Backwards‑compatible URLs; CDN cache headers set; update cadence documented
  • DoD: Observability dashboards and alerts configured; ownership recorded
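
As a sketch of what that CLI check might look like (it is not part of the llms.txt spec), here is a small Node/TypeScript script that fetches /llms.txt, extracts the markdown link targets, and exits non-zero if any linked page does not return 200. It assumes Node 18+ for the global fetch and a BASE_URL environment variable:

```typescript
// check-llms-txt.ts: fail CI when /llms.txt or any page it links to is unreachable.
// Assumes Node 18+ (global fetch). Example: BASE_URL=https://docs.example.com npx tsx check-llms-txt.ts
const base = process.env.BASE_URL ?? "http://localhost:4321";

async function main(): Promise<void> {
  const index = await fetch(new URL("/llms.txt", base));
  if (index.status !== 200) throw new Error(`/llms.txt returned ${index.status}`);
  const body = await index.text();

  // Pull targets out of markdown links of the form [label](url).
  const urls = [...body.matchAll(/\]\((\S+?)\)/g)].map((m) => m[1]);
  const failures: string[] = [];

  for (const url of urls) {
    const target = new URL(url, base); // resolve relative links against the docs site
    const res = await fetch(target, { method: "HEAD" }); // status check only, no body needed
    if (res.status !== 200) failures.push(`${target.href} -> ${res.status}`);
  }

  if (failures.length > 0) {
    console.error(`Link check failed:\n${failures.join("\n")}`);
    process.exit(1);
  }
  console.log(`OK: /llms.txt and ${urls.length} linked pages returned 200`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Wired into CI, this doubles as the link rot gate named in the acceptance criteria.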

Example Agent Stories

Story 1: Database Schema Creation

As a Claude Code agent,
I need access to the existing database schema, business requirements document, and data model standards
to create a new user authentication table with proper relationships
so that the authentication system can store user credentials securely and the backend team can integrate it immediately.

Story 2: API Endpoint Implementation

As a ChatGPT Agent with web browsing tools,
I need the API specification, existing endpoint patterns, and error handling standards
to implement a RESTful user profile update endpoint
so that mobile developers can integrate user profile management without additional backend work.

Story 3: Component Development

As a Cursor AI agent,
I need the design system documentation, existing component patterns, and accessibility requirements
to build a responsive navigation component with keyboard navigation
so that UX designers can immediately test the component in the design system.

Story 4: Cloud Infrastructure Deployment

As a Claude Code agent with Fly.io CLI access,
I need the application Dockerfile, environment configuration, and deployment standards
to deploy a containerized application to Fly.io with proper networking and scaling
so that the development team can immediately access the live application and stakeholders can review the deployment.

Generate Agent Stories with Context Engineering

Agent Stories get their power from great context. One reliable way to produce that context is to pair the story with a dedicated ‘discover’ pass that curates the exact files and clarifies the handoff. Inspired by RepoPrompt’s MCP Discover approach, you can use a system prompt like the following to generate the selection and handoff prompt that accompany each Agent Story.

System: MCP Discover (Essentials)
Your output must be a short handoff with exactly these sections:

Task:
- <one‑sentence restatement of the goal>

Context:
- Selected files/dirs: [paths]
- Key symbols/entry points: [functions, components, routes]
- Related standards/patterns: [coding/UX/infra]
- Context gaps: [missing docs, env, secrets]

Constraints:
- Tools allowed: workspace_context, get_file_tree, get_code_structure, file_search, read_file, manage_selection, prompt
- Token target: 50–80k (exceed if relevance demands)
- No implementation — discovery + handoff only

Open questions:
- <clarifying questions or decisions needed>

How to use it with Agent stories

  • Attach this ‘Discover’ step to each story before implementation.
  • Treat the resulting selection + handoff prompt as the story’s context package (an abridged example follows this list).
  • Enforce Definition of Done to include verification steps that humans can check.
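
For instance, a discover pass for the API-quickstart Agent Story above might hand off something like the following; the file paths and endpoints are illustrative, not drawn from a real repository:

```text
Task:
- Produce quickstart.md with runnable, authenticated sample calls, linked from /llms.txt.

Context:
- Selected files/dirs: openapi.yaml, docs/, public/llms.txt, examples/auth/
- Key symbols/entry points: POST /v2/tokens, GET /v2/payments
- Related standards/patterns: docs tone guide, rate-limit guidance referenced in /llms.txt
- Context gaps: no sandbox API key in the repo; CI secret required for sample calls

Constraints:
- Tools allowed: workspace_context, get_file_tree, get_code_structure, file_search, read_file, manage_selection, prompt
- Token target: 50–80k
- No implementation; discovery + handoff only

Open questions:
- Should samples use curl only, or curl plus one SDK language?
```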

Why Agent Stories Matter

1. Context Engineering for Scale

Agent stories force explicit definition of what context an AI agent needs. This combats ‘context rot’ where too much or too little information degrades performance. Each story becomes a carefully engineered context window - essential in a world where agents will outnumber humans and require precise, machine-readable instructions.

2. Error Containment at Agent Scale

By breaking work into small, verifiable units, Agent stories prevent error propagation. Each completed story provides a checkpoint where humans can verify correctness before proceeding. This becomes critical when managing thousands of agents rather than dozens of developers.

3. Business Value Alignment for Infinite Agents

Unlike technical tasks that focus on implementation details, Agent stories emphasize outcomes that matter to stakeholders. This ensures AI work contributes to actual business objectives, even as the ratio of agents to humans approaches infinity.

This shift has immediate business implications. As Fly.io discovered, AI agents spend money just the same as human customers do. When agents become the dominant users, platforms must prioritize ‘Robot Experience (RX)’ alongside traditional ‘Developer Experience (DX)’. This means designing infrastructure that supports the unique workflows of AI agents - their need for rapid iteration, stateful development, and specialized protocols - rather than optimizing solely for human preferences.

4. Human-Agent Coordination

Agent stories create a shared language between humans and AI agents. Product owners, developers, and AI systems can all understand and contribute to the same framework. In the AX paradigm, this coordination becomes the primary interface between the finite human workforce and the infinite agent workforce.

Context Engineering for Agent Stories

One of the most powerful aspects of Agent stories is how they operationalize context engineering - the practice of providing AI agents with exactly the right information to succeed. Rather than vague task descriptions, Agent Stories structure context in a way that agents can immediately understand and act upon.

Consider how an agent might transform a simple user request into a comprehensive Agent Story:

User Request: ‘Debug the extension context invalidation error in my Chrome extension’

Agent-Generated Agent Story:

Task: Debug the 'extension_context_invalid' error occurring in the extension's heartbeat mechanism. The error manifests as "Scheduling recovery attempt 1/5 for extension_context_invalid in 1000ms" from the content script's heartbeat function.

Context: The heartbeat function validates chrome.runtime availability through isExtensionContextValid(). When invalid, it triggers recovery with cleanup and reinitialization. Selected files include the main content script, service worker, manifest, and related tests.

Constraints: This is a Chrome extension MV3 issue, likely related to extension context invalidation during reloads, updates, or background script lifecycle events.

Open questions: What specific conditions cause chrome.runtime to become invalid? How can validation and recovery be improved to prevent this error?

This transformation from vague request to structured context demonstrates the power of Agent Stories. The agent doesn’t just receive a task - it receives a complete operational framework with:

  • Precise problem definition with specific error messages and locations
  • Technical context including relevant files and system components
  • Operational constraints that bound the solution space
  • Investigation roadmap through open questions

RepoPrompt: The Context Engineering Inspiration

My inspiration for context engineering in Agent Stories comes from RepoPrompt, a powerful tool that demonstrates how systematic context curation transforms AI-assisted development.

RepoPrompt is a Mac-native application that builds optimal prompts from codebases, featuring:

  • CodeMaps: Intelligent code structure analysis that extracts classes, functions, and relationships
  • MCP Server Integration: 15+ specialized tools for AI editors, enabling persistent context sync across tools
  • Agent-to-Agent Collaboration: Allows AI assistants to consult with advanced reasoning models

The real breakthrough comes from RepoPrompt’s MCP Discover system, which operationalizes context engineering through a structured workflow:

The Discovery Workflow:

  1. Understand existing context - Map current selection and reasoning
  2. Map the terrain - Use directory-first exploration to understand architecture
  3. Bulk investigate - Generate codemaps for entire modules at once
  4. Connect user language to code - Find where terms appear in the codebase
  5. Surgical reads - Targeted file inspection guided by codemaps
  6. Execute selection - Curate the perfect file set (50-80k tokens)
  7. Craft handoff prompt - Distill discovery into actionable clarity

This systematic approach transforms vague requests into structured, context-rich Agent Stories. Rather than hoping AI agents will ‘figure it out,’ RepoPrompt shows how deliberate context engineering creates reliable, scalable AI collaboration.

The StoryMachine Experiment

The concept of Agent Stories finds practical expression in tools like StoryMachine, a CLI tool that generates context-enriched user stories from Product Requirements Documents (PRDs) and Technical Specifications.

StoryMachine represents an early attempt to operationalize these ideas:

  • It processes PRDs and technical specs using AI
  • Generates structured stories with acceptance criteria
  • Focuses on creating ‘context-enriched’ stories that provide better guidance for implementation

While StoryMachine currently generates traditional user stories, its architecture suggests a path toward Agent Stories that explicitly consider AI agent capabilities and context requirements.

Practical Agent Story — End-to-End Example

Story: ‘Generate RSS feed for deep posts with correct metadata’

As a Claude Code agent with file edit capabilities, I need the content folder for deep posts, existing RSS generator patterns, and site metadata standards, to deliver a valid RSS feed at /deep-posts/rss.xml with correct titles, links, dates, and tags, so that readers can subscribe and the feed validates in common readers.

Acceptance Criteria

  • Value: New feed available at /deep-posts/rss.xml and renders in a reader
  • Verification: Run npm run build and validate XML with xmllint; open the file in a browser
  • Integration: Link added to deep-posts index page; robots.txt allows crawling
  • Constraints: Must match site’s existing RSS style; handle 100+ items within memory/time limits
  • Definition of Done: New generator file, route added, tests updated if present, short docs note

Discover Pass Result (abridged)

  • Selected files/dirs: src/pages/deep-posts, src/utils, src/pages/rss.xml.ts, astro.config.ts
  • Handoff Prompt: ‘Implement deep-posts RSS using getSortedContent(); mirror structure from rss.xml.ts; update robots.txt.ts and sitemap if needed. Verify with xmllint.’
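
A minimal sketch of the implementation that handoff points at, assuming an Astro site that uses the official @astrojs/rss helper; the getSortedContent() signature and the post field names are assumptions inferred from the handoff, not details confirmed by the actual repo:

```typescript
// src/pages/deep-posts/rss.xml.ts: mirrors the structure of the existing rss.xml.ts (assumed).
import rss from "@astrojs/rss";
import type { APIContext } from "astro";
// Assumed helper in src/utils: returns deep posts sorted newest-first, each with
// title, description, date, slug, and tags (all field names are assumptions).
import { getSortedContent } from "../../utils/content";

export async function GET(context: APIContext) {
  const posts = await getSortedContent("deep-posts");
  return rss({
    title: "Deep Posts",
    description: "Long-form posts as an RSS feed",
    site: context.site!, // requires `site` to be set in astro.config
    items: posts.map((post) => ({
      title: post.title,
      description: post.description,
      pubDate: new Date(post.date),
      link: `/deep-posts/${post.slug}/`,
      categories: post.tags,
    })),
  });
}
```

Verification then follows the story: run npm run build, check the emitted feed with xmllint --noout (the output path depends on the build), and open it in a reader before linking it from the deep-posts index.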

This pattern scales: one Agent Story + one Discover pass yields a verifiable, human‑legible unit of business value with bounded risk and clear integration.

Challenges and Considerations

Technical Context Requirements

AI agents need different types of context than human developers:

  • Code context: Existing patterns, conventions, and architecture
  • Tool access: Available APIs, libraries, and development tools
  • Quality standards: Testing requirements, security constraints, performance benchmarks
  • Integration points: How the work connects to existing systems

Real-world implementations reveal specific context needs for AI agents:

  • Compute flexibility: Fast startup/shutdown cycles for conversational development sessions
  • Stateful storage: Filesystems for incremental, trial-and-error development (contrary to immutable containers)
  • Network protocols: MCP support for external tool integration and API access
  • Security models: Tokenized secrets that provide temporary, scoped access rather than persistent credentials

Fly.io’s adaptation to these requirements demonstrates how infrastructure must evolve to support AI agent workflows effectively.

Agent Capability Matching

Different AI agents have different strengths:

  • Claude Code excels at terminal-based development tasks
  • ChatGPT Agents work well with GUI tools and web interfaces
  • Specialized agents might focus on testing, documentation, or deployment

Agent Stories need to match tasks to appropriate agent capabilities.

Verification and Quality Gates

The nilenso article emphasizes the need for ‘verifiable checkpoints’ that are ‘legible to humans.’ Agent Stories must define clear success criteria that humans can evaluate without deep technical expertise.

Implementation Strategies

1. Story Refinement Process

Start with traditional user stories, then refine them for AI execution:

  • Identify which AI agent is best suited for the task
  • Specify required context and tools
  • Define success criteria that can be automatically verified where possible

2. Context Packaging

Develop standard ways to package context for AI agents:

  • Codebase summaries and architecture documents
  • Tool configurations and API specifications
  • Quality standards and testing requirements
  • Integration guides and deployment procedures

3. Progress Tracking

Create dashboards that track Agent Story completion:

  • Success rates by agent type and task category
  • Common failure modes and context gaps
  • Business value delivered per story

The Future of Agent Stories

As AI agents become more sophisticated, Agent Stories could evolve to include:

  • Self-verification: Agents that can automatically validate their own work
  • Dynamic context loading: Stories that specify how agents should gather additional context as needed
  • Collaborative refinement: Human-AI pairs that iteratively improve story definitions
  • Cross-agent workflows: Stories that coordinate multiple AI agents working together

As AI agents become economically significant customers, Agent Stories could incorporate:

  • Cost optimization: Stories that balance agent productivity with infrastructure costs
  • Platform integration: Native support for agent-specific protocols and authentication
  • Economic coordination: Stories that manage agent-to-agent transactions and resource sharing
  • Robot experience design: Infrastructure optimized for agent workflows rather than human preferences

Conclusion

Agent Stories represent a bridge between traditional agile practices and the emerging world of Agentic Experience (AX). By adapting the User Stories framework to explicitly consider AI agent capabilities and context requirements, I can create a more reliable, collaborative, and valuable approach to AI-assisted software creation.

As agents scale infinitely beyond our 8 billion human limit, Agent Stories become essential infrastructure for coordinating this new dominant workforce. The key insight from the nilenso article—that ‘managing units of work is perhaps the most important technique to get better results out of AI tools’—points to Agent Stories as a crucial evolution in how I structure AI collaboration.

In the AX paradigm, Agent Stories aren’t just a development methodology—they’re the coordination language for an agent-dominated future. As tools like StoryMachine mature, I expect the emergence of standardized Agent Story formats that make AI-assisted development as reliable and predictable as human-only development.

The question isn’t whether AI agents will transform software development, but whether the industry will develop the frameworks to harness that transformation effectively. Agent Stories offer a promising path forward in a world where agents become the primary users, collaborators, and creators.

  Let an Agentic AI Expert Review Your Code

I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.

Content Attribution: 60% by Alpha, 40% by Claude 3.5 Sonnet
  • 60% by Alpha: Core concept development, research synthesis, and structural framework.
  • 40% by Claude 3.5 Sonnet: Content drafting and technical analysis