May 10, 2026 · 3 min read
How agents actually plan, in practice
Notes from six months of shipping production agents — and why the textbook ReAct loop is almost never what you want.
The textbook agent loop is beautiful: observe, think, act, observe, think, act. It is also, almost without exception, not what ends up in production. After six months of shipping agentic systems for real customers, the pattern that survives contact with reality looks less like a loop and more like a thin scaffold around a much more opinionated planner.
This post is a field note: what we tried, what broke, and the heuristics that finally stuck.
The honeymoon
Every new agent project starts in the same place. Someone wires up a tool-using loop with a frontier model, gives it five or six tools, and watches it solve a demo task on the first try. The room exhales. We have a product.
Two weeks later, the same loop is failing on prompts that look almost identical to the ones it aced on day one — just with a slightly fuller context, slightly noisier tool outputs, or a slightly broader question. Welcome.
What actually breaks
Three failure modes, in order of how often we hit them:
- The model plans, then forgets it was planning. Mid-trajectory, attention drifts to whatever the last tool returned, and the agent silently abandons the original objective.
- The model commits to a bad plan early and refuses to revise. This is the inverse problem. Once the first tool call is in the transcript, the model treats it like load-bearing evidence rather than a guess.
- The model hallucinates affordances. It invents tool parameters, infers tools that don't exist, or assumes that calling a tool twice will produce a different result.
None of these are model bugs. They are predictable consequences of asking a next-token predictor to act as a long-horizon planner over a noisy state.
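Of the three, hallucinated affordances are the easiest to catch mechanically, before a call ever reaches a tool. Here is a minimal sketch, assuming a hand-maintained registry of tool names and parameters. `ToolSpec`, `CallGuard`, and the registry shape are hypothetical names for illustration, and the duplicate-call check assumes tools are deterministic.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical description of a tool: its name and the parameters it accepts."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()


@dataclass
class CallGuard:
    """Checks proposed tool calls against a registry before they are executed."""
    registry: dict                      # tool name -> ToolSpec
    seen: set = field(default_factory=set)

    def check(self, name, args):
        """Return a list of problems; an empty list means the call may proceed."""
        spec = self.registry.get(name)
        if spec is None:
            return [f"unknown tool: {name}"]

        problems = []
        given = set(args)
        for param in sorted(spec.required - given):
            problems.append(f"missing parameter: {param}")
        for param in sorted(given - spec.required - spec.optional):
            problems.append(f"unknown parameter: {param}")

        # Assumption: tools are deterministic, so an identical repeat is wasted work.
        key = (name, json.dumps(args, sort_keys=True))
        if key in self.seen:
            problems.append("identical call already made")
        elif not problems:
            self.seen.add(key)
        return problems


registry = {"search": ToolSpec("search", frozenset({"query"}), frozenset({"limit"}))}
guard = CallGuard(registry)
print(guard.check("search", {"query": "agents"}))   # first valid call passes
print(guard.check("search", {"q": "agents"}))       # invented parameter is reported
print(guard.check("lookup", {}))                    # invented tool is reported
```

A guard like this does not stop the model from hallucinating. It turns the hallucination into an explicit, loggable rejection instead of a confusing tool error three steps later.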
The shape that survives
What works in production looks much more like a two-tier system: a deliberate planner that produces an explicit, inspectable plan, and an executor that follows the plan with a much tighter loop and clear stopping conditions.
The planner is allowed to be slow and expensive. The executor is allowed to be dumb and fast. The boundary between them is the contract that makes everything else debuggable.
The single biggest lift you can give a production agent is to make the plan a first-class artifact — written down, versioned, and re-examinable — rather than something that lives only inside the model's attention.
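To make "first-class artifact" concrete, here is a minimal sketch of a plan that can be serialized and inspected, plus an executor with explicit stopping conditions. It is an illustration under assumptions, not a description of any particular system: `Step`, `Plan`, `execute`, and the `(outcome, index)` return shape are hypothetical, and tools are assumed to be plain Python callables.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Step:
    tool: str
    args: dict
    status: str = "pending"   # pending | done | failed
    result: object = None
    error: str = ""


@dataclass
class Plan:
    objective: str
    steps: list
    version: int = 1          # bump whenever the planner revises the plan

    def to_json(self):
        """Serialize the whole plan so it can be written to disk and diffed."""
        return json.dumps(asdict(self), indent=2, sort_keys=True, default=str)


def execute(plan, tools, max_steps=10):
    """Run pending steps in order.

    Stops on the first failure or when max_steps steps have run.
    Returns (outcome, index), where outcome is "done", "failed", or "budget".
    """
    ran = 0
    for index, step in enumerate(plan.steps):
        if step.status == "done":
            continue
        if ran >= max_steps:
            return ("budget", index)
        ran += 1
        try:
            step.result = tools[step.tool](**step.args)
        except Exception as exc:      # the executor never retries on its own
            step.status = "failed"
            step.error = str(exc)
            return ("failed", index)
        step.status = "done"
    return ("done", None)


tools = {"add": lambda a, b: a + b}
plan = Plan("sum two numbers", [Step("add", {"a": 1, "b": 2})])
print(execute(plan, tools))
print(plan.to_json())
```

The executor here makes no decisions beyond "run the next step or stop". Anything that requires judgment shows up as a return value, with the plan's current state available as JSON for whoever decides what happens next.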
Three heuristics worth stealing
- Always have a written plan. If you can't `cat` the plan, the agent doesn't have one.
- Failure is a planning event, not an execution event. When a tool errors, route control back to the planner, not back to the same model that just failed.
- Treat the context as a budget. Past a certain depth, summarize aggressively. The model that thinks it can hold ten tool outputs in working memory is lying to you.
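The second and third heuristics can be sketched in a few lines each. This is a rough illustration under stated assumptions: `run`, `compact`, the `replan` callback, and the `summarize` callback are hypothetical names, character counts stand in for token counts, and in a real system both callbacks would be model calls rather than plain functions.

```python
def run(steps, tools, replan, max_replans=2):
    """Execute (tool, args) steps; on a tool error, hand control to replan().

    replan(history, remaining) is a hypothetical planner callback that returns
    the new list of steps. The failed step is never retried blindly.
    """
    history = []
    queue = list(steps)
    replans = 0
    while queue:
        tool, args = queue.pop(0)
        try:
            history.append(f"{tool} -> {tools[tool](**args)}")
        except Exception as exc:
            if replans == max_replans:
                raise RuntimeError("replan budget exhausted") from exc
            replans += 1
            history.append(f"{tool} failed: {exc}")
            queue = list(replan(history, queue))
    return history


def compact(history, budget_chars, summarize):
    """Keep the newest entries verbatim within the budget; fold the rest into one summary.

    Assumption: len(entry) in characters approximates token cost. The summary
    entry itself is not counted against the budget in this sketch.
    """
    kept, used = [], 0
    for entry in reversed(history):
        if used + len(entry) > budget_chars:
            break
        kept.append(entry)
        used += len(entry)
    kept.reverse()
    older = history[: len(history) - len(kept)]
    if not older:
        return kept
    return [summarize(older)] + kept


def flaky():
    raise ValueError("service down")


tools = {"primary": flaky, "backup": lambda: "ok"}
history = run(
    [("primary", {})],
    tools,
    replan=lambda hist, remaining: [("backup", {})] + remaining,
)
print(history)
print(compact(history, budget_chars=15, summarize=lambda old: f"[{len(old)} earlier entries]"))
```

Bounding the number of replans matters as much as routing to the planner in the first place. Without `max_replans`, a planner that keeps proposing the same broken step turns one failure into an unbounded loop.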
I'll write up the supervisor pattern we ended up with in a follow-up. The short version is that the most expensive token in your agent system is probably the one that isn't a plan.