The One-Sentence Diagnostic for Every Underperforming AI Agent

I was building a weekly briefing agent for Evan Baehr, and it kept missing calendar events.

This wasn't a minor issue. The whole point of the agent was to give him a clear view of his upcoming week. An agent that drops events is broken.

I tried the obvious things. Rewrote the prompt. Added more specific instructions. Tested with a different model. Nothing worked.

Then I looked at what the agent was actually processing: 50 meetings' worth of data in a single pass. A full week of calendar entries, meeting notes, and context, all dumped in at once.

That's not a prompting problem. That's a size problem.

The fix was simple: stop processing a whole week at once. Run the agent on each day separately. Then combine the daily summaries into a weekly brief at the end. The task got smaller. The results got accurate.
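The day-by-day restructuring can be sketched roughly like this. It is a minimal illustration, not the actual briefing agent: `Event`, `summarize_day`, and `combine_summaries` are hypothetical names, the sample events are made up, and the two summarizer functions are plain-Python stand-ins for what would be model calls.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    day: date
    title: str


def group_by_day(events):
    """Split one week of events into per-day buckets, in date order."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event.day].append(event)
    return dict(sorted(buckets.items()))


def summarize_day(day, events):
    """Stand-in for one small model call that sees a single day only."""
    titles = ", ".join(e.title for e in events)
    return f"{day.isoformat()}: {len(events)} event(s): {titles}"


def combine_summaries(daily_summaries):
    """Stand-in for the final call that merges the short daily summaries."""
    return "\n".join(daily_summaries)


def weekly_brief(events):
    by_day = group_by_day(events)
    daily = [summarize_day(day, evs) for day, evs in by_day.items()]
    return combine_summaries(daily)


# Made-up sample data, purely for illustration
week = [
    Event(date(2025, 1, 7), "Board prep"),
    Event(date(2025, 1, 6), "Team sync"),
    Event(date(2025, 1, 6), "Investor call"),
]
print(weekly_brief(week))
```

The point of the shape is that no single call ever sees more than one day of data, and the final call only sees the short summaries.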

The Diagnostic

After building enough of these agents, I've landed on a single rule for debugging AI performance:

Whenever your AI isn't producing the results you want, the task is almost always too big. Break it down.

Not a better model. Not a more detailed system prompt. The task itself needs to shrink.

This sounds counterintuitive because most people's instinct when something isn't working is to add more — more context, more instructions, more examples. But the context window doesn't work like that.

Think of it as a sheet of paper. The AI can read everything on one sheet with full attention and accuracy. Once you start cramming more onto it — data, instructions, previous outputs, background context — it starts skimming. It misses details. It hallucinates. It produces outputs that technically address the prompt but miss the point.

In my experience, once the input fills more than about half of the context window, performance degrades noticeably: accuracy drops and errors increase. No amount of additional instruction fixes it, because the problem isn't the instruction. It's the space.
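One way to act on that rule of thumb is a budget check before each call. The sketch below leans on several assumptions: the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the 0.5 threshold simply encodes the "about half" figure, and `CONTEXT_WINDOW_TOKENS` is a placeholder you would replace with your model's actual limit.

```python
CONTEXT_WINDOW_TOKENS = 100_000  # placeholder; use your model's real limit
BUDGET_FRACTION = 0.5            # the "about half" rule of thumb


def estimate_tokens(text):
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return len(text) // 4 + 1


def fits_budget(parts, window=CONTEXT_WINDOW_TOKENS, fraction=BUDGET_FRACTION):
    """Return True if the combined inputs stay under the chosen share of the window."""
    total = sum(estimate_tokens(p) for p in parts)
    return total <= window * fraction


prompt = "Summarize the meetings below."
small_input = "Team sync at 10am. " * 100
large_input = "Team sync at 10am. " * 20_000

print(fits_budget([prompt, small_input]))
print(fits_budget([prompt, large_input]))
```

If the check fails, that is the signal to split the input rather than to add instructions.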

What This Looks Like in Practice

A client was running an agent to process a large business database and generate reports. The agent worked fine at small scale. At larger volume, quality declined: outputs became vague, missed specific data points, and felt generically correct without being reliably accurate.

Cost per query: $9. That's also unsustainable at scale, but the quality issue was the real problem.

We restructured the architecture. Instead of feeding the agent the entire database and asking it to figure out what was relevant, we pre-processed the data into smaller, use-case-specific slices. Each agent run got exactly the information it needed for that particular task — nothing more.

Output quality matched what we had seen at small scale. Cost dropped to $0.07 per query.
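To put the two cost figures side by side, here is the arithmetic as a short sketch. The per-query costs come from the example above; the monthly query volume is a made-up number for illustration only.

```python
COST_BEFORE = 9.00   # dollars per query, from the example above
COST_AFTER = 0.07    # dollars per query after slicing the data

reduction_factor = COST_BEFORE / COST_AFTER
savings_pct = (1 - COST_AFTER / COST_BEFORE) * 100

print(f"Roughly {reduction_factor:.0f}x cheaper per query")
print(f"About {savings_pct:.1f}% saved per query")

# Hypothetical volume, purely for illustration
queries_per_month = 1_000
print(f"Before: ${COST_BEFORE * queries_per_month:,.2f} per month")
print(f"After:  ${COST_AFTER * queries_per_month:,.2f} per month")
```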

The AI didn't get smarter. It got a smaller problem to solve.

The Scale Test

I use a simple mental check when building any agent: how would I do this for 150,000 items?

If I can picture the architecture running at that scale without falling apart, it's probably designed right. If I can't picture it — if the approach only works because the dataset is small — I need to redesign before building further.

This scale thinking catches most architectural problems early. A weekly briefing that processes everything at once might work fine for a light week. But give it 50 meetings and it breaks. That's a sign the architecture was never right; I just hadn't stressed it yet.
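A design that passes this check usually processes items in bounded batches, so the work per call stays constant no matter how large the dataset grows. A minimal sketch, assuming a hypothetical `process_batch` function standing in for one agent run and an arbitrary batch size of 500:

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Hypothetical stand-in for one agent run on a bounded slice."""
    return len(batch)


def run_at_scale(items, batch_size=500):
    """Each call sees at most `batch_size` items, whatever the total is."""
    return [process_batch(b) for b in batched(items, batch_size)]


results = run_at_scale(range(150_000))
print(len(results), "batches, largest batch:", max(results))
```

Going from 50 items to 150,000 changes the number of batches, not the size of any single call.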

Breaking into smaller components usually means one of a few things:

Chunking the data. If you're processing a week's worth of content, process it day by day. If you're processing a database, slice it by category or use case. The agent only sees what it needs for the task at hand.

Staging the workflow. Run one agent to extract raw data, a second to analyze it, a third to format the output. Each step gets a clean context window rather than inheriting the full weight of every previous step.

Filtering before processing. Instead of giving the agent everything and asking it to figure out what matters, filter the data first. Extract the relevant subset, then run the agent on that subset.

Any of these approaches can make a failing agent work. And they usually don't require touching the prompt at all.
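Filtering and staging can be combined in one small pipeline. The sketch below is illustrative only: the record fields, the `is_relevant` rule, and the three stage functions are hypothetical, and each stage is plain Python standing in for a separate agent call with its own fresh context.

```python
def is_relevant(record, category):
    """Filter step: plain code, no model needed (hypothetical rule)."""
    return record["category"] == category


def extract(records):
    """Stage 1 stand-in: keep only the raw fields needed downstream."""
    return [{"name": r["name"], "amount": r["amount"]} for r in records]


def analyze(rows):
    """Stage 2 stand-in: receives the extracted rows, not the full records."""
    total = sum(row["amount"] for row in rows)
    return {"count": len(rows), "total": total}


def format_report(stats, category):
    """Stage 3 stand-in: receives only the small stats dict."""
    return f"{category}: {stats['count']} records, total {stats['total']}"


def run_pipeline(records, category):
    subset = [r for r in records if is_relevant(r, category)]
    return format_report(analyze(extract(subset)), category)


# Made-up sample records, purely for illustration
database = [
    {"name": "A", "category": "sales", "amount": 120, "notes": "long text..."},
    {"name": "B", "category": "support", "amount": 40, "notes": "long text..."},
    {"name": "C", "category": "sales", "amount": 80, "notes": "long text..."},
]
print(run_pipeline(database, "sales"))
```

Each stage receives only the output of the previous one, so the bulky `notes` field never reaches the analysis or formatting steps.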

The Real Problem With “Add More”

The reflex to add more — more context, more examples, more instructions — makes sense from a human perspective. When we need to explain something more clearly, we add detail. We think the AI works the same way.

It doesn't. The AI works better with a small amount of precisely relevant information than with a large amount of broadly related information.

When an agent is underperforming, the question isn't “what else can I tell it?” It's “what can I take away?”

The answer to that question usually fixes the problem.


Thanh Pham is the founder of Asian Efficiency and an AI consultant based in Austin, TX. If you want to get better at building AI agents that actually work at scale, start with the 4-Day AI Sprint.




ABOUT THE AUTHOR

Thanh Pham

Founder of Asian Efficiency, where we help people become more productive at work and in life. I've been featured in Forbes, Fast Company, and The Globe & Mail as a productivity thought leader. At AE I'm responsible for leading teams and executing our vision of helping people all over the world live their best life possible.

