
Inferring Intent

Published: Jun 1, 2026
Updated: Jun 4, 2026
Vancouver, Canada

My client sent me:

still researching the right wedge.

there’s a market, but no real defensibility.

building software is getting easier, building a business is a lot harder.

That is the new default.

Software is cheaper to build. Agents can write code, generate documents, test flows, summarize calls, and ship artifacts faster than small teams could a few years ago.

That does not make companies easier to build. It moves the moat.

For a long time, the dashboard was the product. The company collected data, organized it in a web app, and sold access to the workflow.

That still works. But if an interface can be generated, copied, or replaced by an agent, the interface is not the moat.

The moat shifts to what the agent can see, infer, and do.

‘Data’ is too broad to be useful. Most teams already have more data than they can use. The question is whether the company captures data that reveals intent.

Agents matter because they can infer intent from messy signals. They can ask follow-up questions. They can observe hesitation. They can remember constraints. They can notice what a user avoids saying. They can compare stated preferences with actual behavior.

A dashboard waits. An agent participates and creates the next signal.

The next signal is voice.

Intent is not only in words. It is in pace, timing, interruption, silence, certainty, confusion, warmth, impatience, and the way someone changes tone when the conversation touches the real constraint. A transcript captures what was said. Audio captures how it was said.

GPT Realtime 2 makes audio a first-class input, not just a preprocessing step. OpenAI describes it as its ‘most capable realtime voice model’ for speech-to-speech interactions, with text, audio, and image input and text and audio output.

Traditional voice agents stitch together speech-to-text, a language model, and text-to-speech. That works, but the transcript becomes a lossy interface. It may tell you a user said ‘sure.’ It may not preserve whether the ‘sure’ sounded excited, reluctant, exhausted, sarcastic, rushed, or socially polite.
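To make the lossy-interface point concrete, here is a toy sketch. The `Utterance` fields and the `to_transcript` function are hypothetical stand-ins for the speech-to-text stage. They are not how GPT Realtime 2 or any real pipeline represents audio. Two answers with very different delivery collapse into the same string once only the words survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """A hypothetical utterance: the words plus a few illustrative prosodic features."""
    text: str
    pause_before_s: float  # silence before the speaker answered, in seconds
    words_per_min: float   # speaking rate
    pitch_range_hz: float  # rough proxy for how animated the voice sounded


def to_transcript(utterance: Utterance) -> str:
    """Toy speech-to-text stage of a stitched pipeline: it keeps only the words."""
    return utterance.text


# Two answers that read identically on paper.
eager = Utterance("sure", pause_before_s=0.2, words_per_min=180, pitch_range_hz=120)
reluctant = Utterance("sure", pause_before_s=1.8, words_per_min=95, pitch_range_hz=25)

# Downstream, the language model receives the same string for both.
print(to_transcript(eager) == to_transcript(reluctant))  # True
# The utterances themselves differ in every audio-level field.
print(eager == reluctant)  # False
```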

Native multimodal voice models keep the voice signal inside the reasoning loop. That gives builders richer data than a transcript alone.

The highest-value intent signals appear in negotiation.

Negotiation is where hidden state becomes visible. A buyer reveals urgency. A seller reveals flexibility. A founder reveals what they actually need. A partner reveals which terms matter and which terms are theater.

Google DeepMind’s work on AI for the board game Diplomacy points at the same primitive. Diplomacy is hard because the board is a negotiation surface. DeepMind describes it as ‘a seven-player game of negotiation and alliance formation’ and writes that ‘The heart of Diplomacy is the negotiation phase.’

A clever agent can evaluate the board. A powerful agent can infer intent, propose terms, model who will honor an agreement, and adjust when another party defects.

Who is willing to move?

Who has budget?

Who needs the deal this week?

Who says no to the listed price but yes to a different package?

That is the signal an agent can capture if it is close enough to the transaction.

Relationship agents point toward this future. A matching agent can learn who should meet whom. That is useful. But the deeper primitive is not matching. It is helping the parties reach the right terms after the match.

Boardy is an early proof point. Boardy describes itself as ‘The AI Superconnector’ and promises, ‘I know who you should meet before you do.’ It is building a business around inferred intent in relationship matching. Its public materials point to multi-million-dollar ARR momentum, including a Series A deck that says Boardy added $4.4M ARR in the last four weeks of 2025.

That traction is a signal. Boardy is learning from how people behave inside a network. Rudeness, responsiveness, follow-through, and reputation all reveal intent. But if the product mostly sees transcripts and text artifacts, it is still working with second-tier data.

The deeper opportunity is voice-native intent. A voice agent can hear hesitation before the user says no. It can notice when someone becomes defensive, curious, bored, rushed, or unusually engaged. It can preserve emotional context instead of reducing the conversation to text.

That gives a new entrant a way to compete without owning the incumbent’s historical database. It can capture a richer class of signal from day one.

Imagine the difference between an agent that says, ‘You two should talk,’ and an agent that says, ‘You two should talk, here is the useful reason, here are the constraints on both sides, here is the smallest next step, and here is the deal structure most likely to survive first contact.’

The second agent owns more of the value chain.

Freight is an obvious example. A shipment is not just an origin, destination, cargo type, and price. It is timing, trust, capacity, risk, payment terms, customs friction, lane familiarity, and the cost of being wrong. The negotiation reveals the real market.

An agent that sits inside that negotiation creates three moats.

First, proprietary negotiation traces. The company learns from the path to agreement, not just the final outcome.

Second, better liquidity. If the agent understands both sides, it can create matches that search, filters, and generic outreach miss.

Third, execution trust. Once users let the agent negotiate, coordinate, and close small steps, the agent becomes part of the relationship. The value is not just prediction. It is reputation, memory, and knowing which promises survive incentives.
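The first moat, proprietary negotiation traces, can be illustrated with a minimal sketch. It assumes a single-issue freight price negotiation. The `Offer` type, the `concessions` helper, and the prices are hypothetical and are not data from any real product. An outcome-only record keeps only the agreed price. The trace also shows which side moved further to reach it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    party: str  # who made the offer, e.g. "shipper" or "carrier"
    price: int  # proposed price for the load


def concessions(trace: list[Offer]) -> dict[str, int]:
    """How far each party moved from its opening offer to its last offer."""
    first: dict[str, int] = {}
    last: dict[str, int] = {}
    for offer in trace:
        first.setdefault(offer.party, offer.price)  # remember the opening offer
        last[offer.party] = offer.price             # keep overwriting with the latest
    return {party: abs(last[party] - first[party]) for party in first}


# Hypothetical path to agreement on one lane.
trace = [
    Offer("carrier", 2600),
    Offer("shipper", 2000),
    Offer("carrier", 2500),
    Offer("shipper", 2250),
    Offer("carrier", 2400),
    Offer("shipper", 2400),
]

final_price = trace[-1].price  # all an outcome-only record keeps
print(final_price)             # 2400
print(concessions(trace))      # {'carrier': 200, 'shipper': 400}
```

Here the shipper moved twice as far as the carrier. An outcome-only record cannot answer "Who is willing to move?"; the trace can.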

This is why ‘AI agent’ is not enough as a company thesis. An agent wrapped around commodity data is a feature. An agent that sits inside a high-value negotiation loop can become infrastructure.

The product question becomes sharper:

What negotiation does your agent see that others do not?

What can it infer from that negotiation?

What action can it take that makes both sides better off?

If the answer is weak, the agent is probably just a nicer interface.

If the answer is strong, the agent may be the moat.

Content Attribution: 20% by Alpha, 80% by Codex (GPT-5.5 High, OpenAI)
  • 20% by Alpha: Original draft and core concepts
  • 80% by Codex (GPT-5.5 High, OpenAI): Content editing and refinement
  • Note: Estimated 80% AI contribution based on 20% lexical similarity and 210% content expansion.