I have built around 300 agents and worked at 5 startups. Here's what I learnt about AI agents.
The shape of an agent
Most "agents" in production are not the autonomous things the marketing implies. They are pipelines with one or two LLM calls and a lot of validation around them. That's fine — pipelines work — but it pays to be honest about what you're actually shipping.
Here's the rough taxonomy I ended up with after building this stuff for a year and a half:
- Reactive agents. Take an input, do one step of reasoning, return a structured output. ~70% of what I shipped.
- Stepwise agents. Plan a small finite sequence of tool calls, execute, return. ~25%.
- Open-loop agents. Decide their own stopping condition. ~5%, and almost always too expensive to run in production at scale.
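To make the three shapes concrete, here is a minimal Python sketch. `Model` is a hypothetical stand-in for whatever LLM client you use (a function from prompt to text). The prompts, tool names and the `max_plan_len`/`max_steps` bounds are illustrative assumptions, not part of any framework. The structural difference is where the loop lives: the reactive agent has none, the stepwise agent runs a bounded plan, and the open-loop agent lets the model decide when to stop.

```python
from typing import Callable

# Hypothetical stand-ins: an LLM client and a tool are both "text in, text out".
Model = Callable[[str], str]
Tool = Callable[[str], str]


def reactive_agent(model: Model, text: str) -> dict:
    """One reasoning step: take an input, return a structured output."""
    label = model(f"Classify this text: {text}").strip()
    return {"input": text, "label": label}


def stepwise_agent(model: Model, tools: dict[str, Tool], task: str, max_plan_len: int = 5) -> list[str]:
    """Plan a short, finite list of tool calls up front, then execute them in order."""
    plan = [name.strip() for name in model(f"List tools, comma-separated, for: {task}").split(",")]
    if len(plan) > max_plan_len:
        raise ValueError(f"plan too long: {len(plan)} steps")
    results, arg = [], task
    for name in plan:
        if name not in tools:
            raise ValueError(f"unknown tool in plan: {name!r}")
        arg = tools[name](arg)  # each tool's output feeds the next tool
        results.append(arg)
    return results


def open_loop_agent(model: Model, task: str, max_steps: int = 10) -> str:
    """The model decides when to stop; max_steps is the only hard bound on cost."""
    state = task
    for _ in range(max_steps):
        reply = model(f"Continue working on: {state}")
        if reply.startswith("DONE"):
            return reply
        state = reply
    raise RuntimeError("hit max_steps before the model chose to stop")
```

The open-loop version is the only one whose cost you cannot read off the code: it depends on how many turns the model takes before saying it is done.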
What actually moved the needle
Three things, in order of impact:
- Tighter input schemas. The single biggest reliability gain I got wasn't from a better model — it was from refusing to accept loosely-typed inputs.
- Structured outputs everywhere. JSON mode + zod schemas. If you can't validate the output, you can't trust it.
- A clear retry policy. Not "try again on failure" — a specific, bounded fallback for the specific failure mode.
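The post uses JSON mode with Zod schemas (TypeScript). Keeping every example here in Python, the same idea can be sketched with the standard library: parse untrusted input into a strict type before the model sees it, and parse the model's reply into a strict type before anything downstream touches it. `TicketInput`, `TriageOutput`, their fields and the allowed priorities are hypothetical, and the hand-written checks stand in for Zod; this is not Zod's API.

```python
import json
from dataclasses import dataclass

ALLOWED_PRIORITIES = {"low", "medium", "high"}  # hypothetical enum


class ValidationError(ValueError):
    """Input or model output did not match its schema."""


@dataclass(frozen=True)
class TicketInput:
    """Hypothetical strict input: exactly these keys, exactly these types."""
    customer_id: int
    subject: str
    body: str

    @classmethod
    def parse(cls, raw: object) -> "TicketInput":
        if not isinstance(raw, dict) or set(raw) != {"customer_id", "subject", "body"}:
            raise ValidationError("expected exactly the keys customer_id, subject, body")
        cid = raw["customer_id"]
        if not isinstance(cid, int) or isinstance(cid, bool):  # bool is a subclass of int
            raise ValidationError("customer_id must be an int")
        for key in ("subject", "body"):
            if not isinstance(raw[key], str) or not raw[key].strip():
                raise ValidationError(f"{key} must be a non-empty string")
        return cls(cid, raw["subject"], raw["body"])


@dataclass(frozen=True)
class TriageOutput:
    """Hypothetical schema for the model's JSON reply."""
    priority: str
    summary: str

    @classmethod
    def parse(cls, text: str) -> "TriageOutput":
        try:
            data = json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValidationError(f"reply is not JSON: {exc}") from exc
        if not isinstance(data, dict):
            raise ValidationError("reply must be a JSON object")
        priority, summary = data.get("priority"), data.get("summary")
        if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
            raise ValidationError(f"invalid priority: {priority!r}")
        if not isinstance(summary, str) or not summary.strip():
            raise ValidationError("summary must be a non-empty string")
        return cls(priority, summary)
```

Nothing downstream ever sees a raw dict or raw model text, only `TicketInput` and `TriageOutput` instances, and a bad value becomes a specific, catchable `ValidationError` instead of a silent surprise.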
If the agent doesn't have a clear escape hatch when the model returns garbage, you don't have an agent. You have a flaky API call.
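A minimal sketch of such an escape hatch, assuming two failure modes you can tell apart: a reply that fails validation (`MalformedOutput`) and a call that times out (`ModelTimeout`). Both exception types, `call_with_policy`, and the specific choices (re-ask once with the error message, never retry a timeout) are hypothetical illustrations of a bounded policy, not a library API. `parse` is expected to raise `MalformedOutput` on a bad reply.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class MalformedOutput(ValueError):
    """The model replied, but the reply failed validation."""


class ModelTimeout(TimeoutError):
    """The model call exceeded its deadline."""


@dataclass
class Outcome(Generic[T]):
    value: T
    fell_back: bool
    reason: str = ""


def call_with_policy(
    call_model: Callable[[str], str],
    parse: Callable[[str], T],
    prompt: str,
    fallback: T,
    max_repairs: int = 1,
) -> Outcome[T]:
    """Apply a bounded, failure-specific policy (hypothetical, not a library API).

    - MalformedOutput: re-ask up to max_repairs times, feeding the error back.
    - ModelTimeout: do not retry; return the fallback immediately.
    - Any other exception propagates: it is a bug, not model noise.
    """
    attempt_prompt = prompt
    for _ in range(max_repairs + 1):
        try:
            return Outcome(parse(call_model(attempt_prompt)), fell_back=False)
        except ModelTimeout:
            return Outcome(fallback, fell_back=True, reason="timeout")
        except MalformedOutput as exc:
            attempt_prompt = f"{prompt}\n\nYour previous reply was invalid ({exc}). Try again."
    return Outcome(fallback, fell_back=True, reason="malformed after repairs")
```

Returning `fell_back` and `reason` instead of raising lets the caller route the request somewhere deliberate (a default answer, a human queue) rather than passing garbage along.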
What I'd skip
- Multi-agent frameworks. I never shipped one that beat a well-instrumented single agent.
- Vector stores for FAQs. A keyword search and 30 lines of Python beat a vector DB for almost every "internal docs" use case I tried.
- Custom evals before product-market fit. Spend that time talking to users.
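For a sense of scale, a keyword search of roughly that size might look like the sketch below. It is not the author's code: it is an assumed, simple TF-IDF-style scorer (IDF-weighted term overlap with log-scaled term frequency, a tiny hand-picked stopword list, no stemming), and `KeywordIndex` and the sample FAQ entries are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny, hand-picked stopword list (an assumption; tune for your corpus).
STOPWORDS = {"a", "an", "the", "is", "to", "of", "and", "or", "in", "on", "how", "do", "i", "my", "what", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens minus stopwords; no stemming."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class KeywordIndex:
    """Rank documents by IDF-weighted overlap with the query terms."""

    def __init__(self, docs: dict[str, str]):
        self.docs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        df = Counter(term for counts in self.docs.values() for term in counts)
        n = len(self.docs)
        # Smoothed IDF: terms that appear in fewer documents weigh more.
        self.idf = {term: math.log((n + 1) / (freq + 1)) + 1 for term, freq in df.items()}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        terms = set(tokenize(query))
        scored = []
        for doc_id, counts in self.docs.items():
            score = sum((1 + math.log(counts[t])) * self.idf[t] for t in terms if t in counts)
            if score > 0:
                scored.append((doc_id, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


faq = {
    "reset-password": "How do I reset my password? Use the forgot password link.",
    "billing": "Where can I download invoices for billing?",
    "vpn": "How do I connect to the VPN from home?",
}
index = KeywordIndex(faq)
print(index.search("forgot my password"))
```

Every score is a sum over matched terms, so a bad ranking can be debugged by reading the query tokens. Adding stemming (so "invoice" matches "invoices") or switching to BM25 are the natural next steps if recall is too low.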
The honest summary
Agents are infrastructure. They are not magic. The teams I saw succeed treated them like databases: boring, observable, well-fenced. The teams I saw fail treated them like brilliant interns and were surprised when the intern made things up.
