Vercel published results claiming AGENTS.md beats skills 100% to 79%. They embedded an 8KB docs index directly in AGENTS.md and watched it outperform a .skills/ folder, even when they told the agent to use the skills.
The test had a flaw.
Vercel tested Claude Code using only the name and description frontmatter fields, ignoring the optional fields that control behavior. That matters because agent systems handle frontmatter differently, and those differences determine whether skills trigger reliably.
OpenAI documented the problem:
When you’re iterating on a skill for an agent like Codex, it’s hard to tell whether you’re actually improving it or just changing its behavior. One version feels faster, another seems more reliable, and then a regression slips in: the skill doesn’t trigger, it skips a required step, or it leaves extra files behind.
The issue is not skills versus context stuffing. It is frontmatter quality.
How Claude Code Handles Frontmatter
Claude Code supports multiple YAML fields beyond name and description. The description determines when to apply the skill. Other fields control behavior: argument-hint, disable-model-invocation, user-invocable, allowed-tools, model, context, agent, hooks.
String substitutions work inside skill content: $ARGUMENTS, $ARGUMENTS[N], $N, ${CLAUDE_SESSION_ID}.
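For example, a skill body can interpolate arguments at invocation time. A minimal sketch; the skill name, description, and output path are hypothetical:
---
name: triage-issue
description: Investigate a GitHub issue by number; use when the user asks to triage or debug a reported issue.
argument-hint: [issue-number]
---
Investigate issue #$1. If extra context was passed, $ARGUMENTS holds the full argument string.
Write findings to .claude/triage-${CLAUDE_SESSION_ID}.md so parallel sessions do not collide.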
Claude Code frontmatter configures real behavior—gating, tool allowlists, subagent execution—not just metadata.
How Codex Handles Frontmatter
Codex requires name and description. It ignores extra YAML keys. At startup, Codex loads only the skill’s name, description, and file path into context. The instruction body loads when the skill is invoked.
Constraints: name must fit in 100 characters, description in 500 characters.
In Codex, adding more YAML fields does nothing. Description quality is the only lever.
Does Frontmatter Solve Skill Selection?
Frontmatter solves context bloat through progressive disclosure. It does not automatically solve selection reliability when you have many overlapping skills.
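Concretely, both systems eager-load only the frontmatter; the instruction body enters context when the skill fires. A sketch (hypothetical skill):
---
name: fix-typescript-build
description: Fix TypeScript build errors in a Vite/React repo.
---
Run the build, read the first error, fix it, rebuild. Do not refactor unrelated code.
At startup the agent sees only the two frontmatter fields (plus the file path, in Codex); the body costs nothing until invocation.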
Both systems activate skills based on a short semantic summary. With many skills, vague or similar descriptions cause the model to pick the wrong one. Sharp descriptions fix this.
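For instance, two deployment skills with interchangeable descriptions force a guess, while sharpened triggers make the choice unambiguous (hypothetical skills):
Ambiguous pair:
- deploy-app: Deploy the application.
- ship-release: Ship a release.
Disambiguated pair:
- deploy-preview: Deploy a preview build and return a shareable URL; trigger on requests for a preview or demo link. Do NOT touch production.
- deploy-prod: Promote a build to production; trigger only on an explicit 'deploy to production' request.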
Frontmatter Strategy for Many Skills
For Codex
Use only name and description:
---
name: fix-typescript-build
description: Fix TypeScript build errors in a Vite/React repo; do not refactor unrelated code or change formatting.
---
Make the description do three jobs in one or two lines:
- Trigger: ‘When user asks X’
- Scope: repo/framework/context keywords
- Hard negatives: ‘Do NOT do Y’
Vague descriptions cause over-triggering.
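For example, a description like 'Helps with TypeScript projects' matches almost any TypeScript-adjacent request. Rewritten to do all three jobs (hypothetical skill):
description: Fix TypeScript build errors in a Vite/React repo. Trigger: build or typecheck failures. Do NOT refactor unrelated code, add features, or change formatting.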
For Claude Code
Use the same description discipline, and add behavior controls:
---
name: setup-demo-app
description: Scaffold a Vite+React+Tailwind demo app; use when starting a new UI repro repo.
argument-hint: [project-name]
allowed-tools: Read, Grep
disable-model-invocation: false
user-invocable: true
---
Use disable-model-invocation: true for skills you only want via /name. Use allowed-tools to prevent permission friction. Use context: fork + agent for clean subagent execution boundaries.
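Putting those three controls together, a sketch of a fork-isolated, slash-only skill (the skill and agent names are hypothetical; agent must name a subagent you have defined):
---
name: audit-deps
description: Audit project dependencies for known vulnerabilities; use when asked for a security or dependency audit.
context: fork
agent: security-reviewer
allowed-tools: Read, Grep
disable-model-invocation: true
---
With disable-model-invocation: true, the skill runs only via /audit-deps; context: fork keeps its work out of the main conversation's context.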
Make Skills Reliable
Standardize descriptions with a consistent formula:
Do <action> for <context>. Trigger: <requests>. Exclude: <non-goals>.
This consistency makes the model’s matching stable across dozens or hundreds of skills.
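Applied across a skill set, the formula yields entries like these (hypothetical examples):
- description: Do database migration review for a Postgres/Prisma repo. Trigger: requests to review or write migrations. Exclude: greenfield schema design, data backfills.
- description: Do accessibility audits for React components. Trigger: requests for a11y or WCAG checks. Exclude: visual redesign, performance tuning.
Every description parses the same way, so ranking one skill against another compares like with like.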