Micro Models: The $100 AI Revolution

The AI industry tells one story: building a language model costs millions. OpenAI, Anthropic, Meta—they all spend fortunes on compute. The narrative is clear: serious AI requires serious money.

Then Andrej Karpathy trained a working ChatGPT clone for $92.40.

Ninety-two dollars. About four hours on a rented eight-GPU cloud node. That cost covers the entire pipeline, from raw text to a web UI where you can chat with an LLM you own.

This isn’t a toy. It’s nanochat, a full-stack, 8,000-line blueprint for building your own AI.

Welcome to the age of micro models.

What Are Micro Models?

Micro models are language models trained for under $1,000. That’s the total cost—from raw data to a deployed model you fully own.

Until recently, this idea was a fantasy. Training a useful AI was a fortress only the rich could enter. nanochat provides the key. It’s a clean, minimal blueprint showing one clear path from start to finish.

The magic number? 4e19 FLOPs. On modern hardware, that’s four hours of work and less than $100. Spend $300, and you match GPT-2. Spend $1,000, and your model can solve simple math and code problems.
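To make those price tiers concrete, here is a rough sketch of the arithmetic. It takes the $92.40 and four-hour figures quoted above and assumes the rented hardware bills at a flat hourly rate, which is a simplification; the function names are illustrative, not part of nanochat.

```python
# Figures quoted in this article. The flat hourly rate is an assumption:
# real cloud pricing varies by provider, region, and commitment.
SPEEDRUN_COST_USD = 92.40
SPEEDRUN_HOURS = 4.0


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Rental price per hour implied by a total cost and a run length."""
    return cost_usd / hours


def hours_for_budget(budget_usd: float, hourly_rate: float) -> float:
    """Training hours a budget buys at a flat hourly rate."""
    return budget_usd / hourly_rate


rate = implied_hourly_rate(SPEEDRUN_COST_USD, SPEEDRUN_HOURS)
for budget in (100, 300, 1000):
    hours = hours_for_budget(budget, rate)
    print(f"${budget}: about {hours:.1f} hours at ${rate:.2f}/hour")
```

Under that assumption, $300 buys roughly 13 hours on the same hardware and $1,000 roughly 43.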

The `nanochat` Blueprint

The nanochat script is a four-stage recipe for a $100 model.

1. Tokenization. First, the script trains a custom tokenizer on two billion characters of web text. In one minute, it creates a vocabulary that’s more efficient than GPT-2’s.

2. Pretraining. Next, it pretrains a 560-million-parameter Transformer on 11 billion tokens. This is the heavy lift: three hours on eight GPUs. The model learns facts (Paris is in France, gold is Au) and becomes a powerful autocomplete engine, already outperforming GPT-2 Large.

3. Midtraining. An eight-minute finetuning session teaches the model to be a chatbot. It learns conversational structure, how to answer multiple-choice questions, and how to use a Python interpreter as a tool.

4. Final Polish (SFT). A final seven-minute round of supervised finetuning on high-quality examples tightens the model’s alignment, boosting its benchmark scores.
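The numbers in this recipe can be sanity-checked against the 4e19 FLOPs budget. The sketch below uses a widely cited rule of thumb, roughly 6 FLOPs per parameter per training token, which is an approximation and not something nanochat itself reports here. The stage durations are the ones listed above; whatever fills the rest of the four hours is not itemized in this article.

```python
# Model size and token count from the pretraining step above.
N_PARAMS = 560e6
N_TOKENS = 11e9


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per token
    (forward plus backward pass). An approximation, not a measurement."""
    return 6.0 * n_params * n_tokens


flops = approx_training_flops(N_PARAMS, N_TOKENS)
tokens_per_param = N_TOKENS / N_PARAMS

# Stage durations in minutes, as listed in the four steps above.
STAGE_MINUTES = {
    "tokenization": 1,
    "pretraining": 180,
    "midtraining": 8,
    "sft": 7,
}
total_minutes = sum(STAGE_MINUTES.values())

print(f"Estimated pretraining compute: {flops:.2e} FLOPs")
print(f"Tokens per parameter: {tokens_per_param:.1f}")
print(f"Itemized stage time: {total_minutes // 60}h {total_minutes % 60}m")
```

The estimate comes out near 3.7e19 FLOPs, in line with the 4e19 figure, at just under 20 training tokens per parameter.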

The result is a model you can talk to. It’s not GPT-4, but it’s yours. You built it, you own it, and you control it.

Why This Matters

Why does a $100 model matter in a world of billion-dollar AIs?

It reopens AI research. When training costs fall from millions to hundreds, anyone can experiment. A PhD student can test a new architecture. A startup can build a dozen specialized models. The scientific method depends on reproducibility; micro models make it possible again.

It transforms learning. Instead of just using AI APIs, students can now build their own models from scratch. This creates a deeper, more visceral understanding of how AI works—its strengths, its flaws, its limits. You learn more from building one model than from making a thousand API calls.

It enables true specialization. Why use a massive, general-purpose model when you can train a specialized one for less? A medical AI doesn’t need to write poetry. A code review bot doesn’t need to know sports trivia. Micro models trained on domain-specific data can outperform larger models on targeted tasks for a fraction of the cost.

It keeps sensitive data in-house. Training your own model on your own infrastructure means your data never leaves your control. For healthcare, finance, and law, this is a game-changer. The cost of compliance for an external API can easily exceed the cost of training a private micro model.

The Performance Reality Check

Let’s be clear: a $100 model is not GPT-4. Karpathy calls it ‘like talking to a kindergartener.’ It scores just above random chance on tough benchmarks.

But it shows real sparks of knowledge and reason. It’s a starting point.

A $300 model matches GPT-2, a landmark AI from 2019. A $1,000 model solves basic math and code problems. The gap is closing. Techniques developed for huge models are making small models surprisingly capable.

Who Should Build a Micro Model?

Startups Creating a Moat

Most startups should begin by fine-tuning an open model. As I wrote in Private Models, GPT-OSS is a game-changer that gets you most of the way there for a fraction of the cost.

But if fine-tuning doesn’t create the edge you need, training a micro model from scratch can build a durable moat. Building on a closed API leaves your competitive advantage thin. Your costs scale with usage, and you’re subject to the whims of your provider. A micro model offers an escape. For less than $1,000, you can train a model on your unique data. If it reaches, say, 85% accuracy on your task where general APIs reach only 70%, that 15-point gap is your moat. You control your roadmap, your costs, and your destiny.
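One way to reason about "costs scale with usage" is a break-even count: how many requests it takes before a one-time training cost pays for itself. This is a back-of-the-envelope sketch, and every price in it is a hypothetical placeholder to replace with your own numbers. It also ignores engineering time and any difference in quality.

```python
from typing import Optional


def break_even_requests(
    training_cost_usd: float,
    api_cost_per_request: float,
    self_host_cost_per_request: float,
) -> Optional[float]:
    """Requests after which owning the model is cheaper than the API.

    Returns None when self-hosting saves nothing per request, in which
    case the training cost is never recovered.
    """
    saving = api_cost_per_request - self_host_cost_per_request
    if saving <= 0:
        return None
    return training_cost_usd / saving


# Hypothetical inputs: a $1,000 training run, $0.01 per API request,
# and $0.002 per request to serve your own model.
requests = break_even_requests(1000.0, 0.01, 0.002)
print(f"Break-even after about {requests:,.0f} requests")
```

If the per-request saving is small or negative, the function makes that visible instead of returning a misleading number.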

Enterprises with Ironclad Compliance

For any company handling sensitive data, external AI APIs are a compliance nightmare. Where does the data go? Who sees it? A micro model erases these questions. Train it on your own servers with your own data. Nothing ever leaves your control. For healthcare, finance, or government, a micro model isn’t just cheaper—it’s often the only compliant path forward.

A New Way Forward

Micro models are part of a larger shift. The first wave of AI was about accessing massive, centralized models. This next wave is about owning smaller, specialized ones. It’s a move toward control, privacy, and true innovation.

Ready to build your own?

1. Define your goal. Do you need a generalist or a specialist? Fine-tuning an open model like GPT-OSS might be enough. If you need total control and data sovereignty, train from scratch.

2. Experiment with nanochat. Clone the repo and run the tutorial. Break things. Change the data. See what happens. This hands-on experience is invaluable.

3. Test honestly. Don’t just look at benchmarks. Does it solve your problem? If a $500 model doesn’t meet your needs, don’t force it.

4. Iterate. The beauty of cheap training is that you can do it again. Collect new data, tweak the architecture, and retrain. A cycle that costs millions for big tech costs you hundreds.
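Testing honestly can start small: score your model and a baseline on the same held-out examples from your own domain. The sketch below is one minimal way to do that. The `predict` callables, the example format, and the exact-match scoring are all assumptions for illustration; a real task may need a more forgiving metric.

```python
from typing import Callable, Sequence

# Hypothetical example format: (prompt, expected answer).
Example = tuple[str, str]


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of held-out examples answered correctly (exact match,
    ignoring case and surrounding whitespace)."""
    if not examples:
        raise ValueError("need at least one held-out example")
    correct = sum(
        predict(prompt).strip().lower() == answer.strip().lower()
        for prompt, answer in examples
    )
    return correct / len(examples)


# Stand-in "models" so the sketch runs; swap in calls to your real models.
held_out = [
    ("2+2=", "4"),
    ("Capital of France?", "Paris"),
    ("Symbol for gold?", "Au"),
]


def micro_model(prompt: str) -> str:
    return {"2+2=": "4", "Capital of France?": "paris"}.get(prompt, "")


def baseline_model(prompt: str) -> str:
    return {"2+2=": "4"}.get(prompt, "")


micro_acc = accuracy(micro_model, held_out)
baseline_acc = accuracy(baseline_model, held_out)
print(f"micro: {micro_acc:.2f}, baseline: {baseline_acc:.2f}")
```

Keeping the held-out set fixed between retraining runs makes each iteration comparable to the last.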

The frontier models will always push the limits of what’s possible. But for most real-world problems, you don’t need the biggest model. You need the right one. Micro models make it possible for anyone to build it.

The future of AI isn’t just about building bigger models. It’s about building the right model for the right job.

The revolution isn’t coming. It’s here. And it costs $100.

Further Reading:

Explore nanochat on GitHub for the complete implementation
Learn about Zero-Dependency Advantage for strategic AI infrastructure decisions
Understand Private Models and the GPT-OSS revolution
Master Agentic AI fundamentals for building effective workflows
See how ctx solves token economics for local and cloud models