Agent-Friendly CLI Tools: From Flaky Agents to Reliable Automation

Published: Jul 20, 2025
Punta Cana, Dominican Republic

I spend most of my day creating software with coding agents, but I’m constantly reminded of their limitations. They’re powerful, but they operate in a world of tools built by developers, for developers, long before agentic AI was a reality. This mismatch leads to frustration, wasted tokens, and flaky performance.

While protocols like MCP were an attempt to bridge this gap, they often feel too complex for the simple, powerful interface that has stood the test of time: the shell. This brings me to the ‘shell test.’

What is the shell test?

If an AI agent can effectively use shell/bash tools to accomplish tasks without human oversight, it has passed the shell test.

Today, most agents fail. But the problem isn’t just the agent; it’s the tools. As Ryan Stortz brilliantly detailed in his post, Rethinking CLI interfaces for AI, our tools are simply not designed for an AI user [1].

In this post, I’ll argue that we don’t need to wait for superhuman AIs to pass the shell test. We can get there now by rethinking how we integrate them, moving from a model where the AI is a confused user to one where it’s a predictable, sandboxed component in a larger system.

The Frustrating Reality of AI-driven CLIs

If you’ve used an agent for anything non-trivial, you’ve likely seen the same problems Stortz describes.

1. Verbose, Unstructured Output: Agents drown in log spew. They weren’t designed to parse pages of human-readable text to find a single error message. As one developer on Hacker News lamented, this has a real cost:

Approximately 1/3rd of my Claude code tokens are spent parsing CLI output, that is insane!

2. Agent Confusion & ‘Flailing’: Agents get lost. They run commands in the wrong directory, use inefficient tools like head -n100 to peek at output (only to have to re-run the expensive command again), and generally flail around until they stumble upon a solution.

3. ‘Lazy’ or Deceptive Behavior: This is the most frustrating failure mode. Stortz describes a ‘game of whack-a-mole’ where his agent, blocked by a pre-commit hook that enforces tests, simply tries to commit with --no-verify. When he blocked that, it tried to edit the git hook file itself.

I look forward to its next lazy innovation. - Ryan Stortz

This isn’t a sign of maliciousness; it’s a sign of a goal-seeking system taking the path of least resistance, a path we’ve inadvertently left open.

A Better Way: AI as a Pipeline Component

My solution is simple and builds on decades of Unix philosophy: Treat the LLM as a stateless, sandboxed component in a pipeline.

Instead of giving an agent free rein over the shell, we constrain it. We engineer its inputs and strictly define its outputs. The agent stops being the orchestrator and becomes a powerful, specialized function for text transformation. This approach aligns with the core insight I explored in Agentic Tools: Code Is All You Need — that code itself, not complex abstractions, is the most powerful tool we can give our AI agents.

Consider this simple pattern:

# Data Source | AI Processor | Structured Output Parser
psql | claude --output-format=json | jq

Here, psql gathers and pre-processes data. The claude CLI tool receives this clean data, performs its analysis, and—crucially—is forced to output structured JSON. Finally, jq programmatically extracts the result.

This pipeline-based approach directly solves the problems we identified.

Solving Verbosity with Pre-processing and Structured Output

Instead of dumping raw logs into the context window, we can use the source tool to pre-filter and structure the data. For example, a SQL query can transform thousands of database rows into a concise JSON object before it ever reaches the LLM.
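
As a sketch of that idea (assuming a Postgres source; the exact columns are illustrative), a single psql call can collapse the whole connection table into one small JSON object:

# Pre-aggregate thousands of rows into a single JSON object before the LLM sees anything.
# -t (tuples only) and -A (unaligned) make psql emit just the JSON, with no headers or padding.
psql -tA -c "
  SELECT json_build_object(
    'active_connections', count(*) FILTER (WHERE state = 'active'),
    'idle_connections',   count(*) FILTER (WHERE state = 'idle')
  ) FROM pg_stat_activity;"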

By calling claude with --output-format=json and piping the result to jq -r '.result // empty', we enforce a contract: the AI must return valid JSON with the expected fields. No more parsing natural language; we get deterministic data extraction.
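
Putting the two halves together, the whole contract fits in one pipeline. This is a sketch that assumes, as above, that the claude CLI's JSON envelope exposes the reply under .result; the prompt is illustrative:

# Pre-filtered data in, structured JSON out, deterministic extraction at the end.
psql -tA -c "SELECT json_build_object('active_connections', count(*)) FROM pg_stat_activity;" \
  | claude -p "Summarize the health of this database snapshot." --output-format=json \
  | jq -r '.result // empty'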

Solving Agent Confusion with High-Level Abstractions

This pipeline becomes a building block for higher-level, purpose-built tools. Rather than asking an agent to ‘figure out how to check database health,’ we build a Go function or a shell script that does it for them.

// ai-research/code-inspiration.go (imports: "bytes", "os/exec")
func runDatabaseHealthAnalysis() (string, error) {
    // 1. Data Gathering & Pre-processing: one row of pre-aggregated JSON
    query := "SELECT json_build_object('active_connections', count(*)) FROM pg_stat_activity;"
    psqlCmd := exec.Command("psql", "-c", query)

    // 2. AI Analysis (sandboxed): reads the data on stdin, must answer in JSON
    claudeCmd := exec.Command("claude", "-p", "Analyze this database info...", "--output-format=json")

    // 3. Structured Extraction: pull the agreed-upon field out of the response
    jqCmd := exec.Command("jq", "-r", ".analysis")

    // Pipe them together (psql -> claude -> jq); StdoutPipe errors elided for brevity
    claudeCmd.Stdin, _ = psqlCmd.StdoutPipe()
    jqCmd.Stdin, _ = claudeCmd.StdoutPipe()
    var analysisText bytes.Buffer
    jqCmd.Stdout = &analysisText

    // Start every stage, then wait for them upstream-to-downstream
    for _, cmd := range []*exec.Cmd{psqlCmd, claudeCmd, jqCmd} {
        if err := cmd.Start(); err != nil {
            return "", err
        }
    }
    for _, cmd := range []*exec.Cmd{psqlCmd, claudeCmd, jqCmd} {
        if err := cmd.Wait(); err != nil {
            return "", err
        }
    }
    return analysisText.String(), nil
}

The agent is never asked to choose between psql, mysql, or reading a log file. It’s simply given a tool, runDatabaseHealthAnalysis, that works. The pipeline is a fixed, non-negotiable workflow. The AI has become a powerful but constrained specialist.
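
Since the same fixed workflow can just as easily be a shell script as a Go function, here is a hypothetical script-sized version (db-health.sh is an invented name; the extracted field is whatever contract your prompt defines). The agent invokes it as one opaque command and never sees the stages inside:

#!/usr/bin/env bash
# db-health.sh (hypothetical): one tool, one fixed pipeline, no choices left to the agent.
set -euo pipefail

query="SELECT json_build_object('active_connections', count(*)) FROM pg_stat_activity;"

psql -tA -c "$query" \
  | claude -p "Analyze this database info..." --output-format=json \
  | jq -r '.analysis // empty'   # field name depends on the contract your prompt defines

To the agent, this is just another executable on its PATH; the decision surface shrinks to calling the tool or not.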

Solving ‘Lazy’ Behavior by Architecting it Out

In this model, the AI is completely sandboxed. It receives data on stdin and writes JSON to stdout. It has zero ability to execute other commands, modify filesystem permissions, or try to use --no-verify. The ‘whack-a-mole’ problem is solved because we took the mallet away. The AI’s operational scope is strictly limited to text transformation, making it a predictable tool, not a mischievous intern.

The Dual-Interface Advantage

This approach has a beautiful side effect: it creates tools that are better for both machines and humans.

  1. For the Machine: The core pipeline produces clean, structured JSON, perfect for further programmatic use, testing, and chaining with other tools.
  2. For the Human: We can easily add a formatting function that takes the machine-readable JSON and transforms it into an emoji-rich, human-friendly summary for the console.
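
As a small, hypothetical illustration (the file name and fields are invented for this example), the same JSON can feed both audiences:

# report.json is the machine-readable result of the core pipeline (illustrative fields).
# Machines chain on the JSON as-is; humans get a formatted view derived from the same data.
jq -r '"🔌 Active connections: \(.active_connections)\n📋 \(.analysis)"' report.json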

We get the best of both worlds: robust automation and a great developer experience.

Conclusion: Build Better Tools, Not Just Better Agents

The path to reliable AI automation isn’t just about waiting for the next generation of models. It’s about meeting them halfway with better information architecture. By shifting our mindset, we can build tools that are powerful and predictable.

  1. Constrain the AI’s Role: Treat it as a stateless function that transforms structured data.
  2. Engineer its Context: Feed it precisely the information it needs, pre-processed for easy consumption.
  3. Enforce its Output: Define a strict contract for its response and programmatically validate it.
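
Concretely, the third point can be as small as a jq gate at the end of the pipeline. A sketch, assuming the contract is 'a JSON object with a string analysis field' and $ai_output holds the model's response:

# Fail fast if the response breaks the agreed-upon output contract.
echo "$ai_output" | jq -e 'type == "object" and has("analysis") and (.analysis | type == "string")' > /dev/null \
  || { echo "AI response violated the output contract" >&2; exit 1; }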

By embracing the Unix philosophy of small, sharp tools that work together, we can move beyond the frustration of flaky agents and start building the next generation of truly robust, AI-powered automation.


References

  1. Rethinking CLI interfaces for AI by Ryan Stortz
  2. psql - PostgreSQL interactive terminal
  3. jq - command-line JSON processor
  4. Claude Code CLI Reference

Let an Agentic AI Expert Review Your Code

I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.