When I was building a super agent for Evan at Arena Hall, the goal was straightforward: pull relevant context from Gmail, check the calendar, do some web research, and produce a polished meeting prep document.
My instinct was ChatGPT. It's what I use most days.
I tested it. The results were fine but inconsistent. Some steps in the chain would stall. The synthesis at the end sometimes lost context from earlier steps. It worked, but it felt like it was working harder than it should.
I switched to Gemini 3.0. Same workflow, different model. The difference was noticeable: faster, cleaner handoffs between steps, better at keeping the reasoning thread intact across all four actions.
I haven't switched back.
What Multi-Step Agents Actually Do
A lot of AI use is single-step: you give it a prompt, it gives you an output. That's fine for most tasks.
Multi-step agents are different. They chain several actions together in sequence, where the output of one step feeds the input of the next.
An example from the Arena Hall workflow:
- Search Gmail for emails from specific contacts in the last 14 days
- Check the calendar for meeting context and logistics
- Pull web research on the person or company you're meeting
- Synthesize everything into a one-page briefing
Each of those steps is its own action. The agent has to complete them in order, pass context forward, and produce something coherent at the end.
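To make the chaining concrete, here's a minimal Python sketch of that pattern. Everything in it is an assumption for illustration: the `Context` class, the step functions, and the stubbed return values are hypothetical placeholders, not the actual Arena Hall build or any real Gmail or Calendar API call. The point is only the shape: each step takes the accumulated context, adds to it, and hands it forward.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """Everything the agent has gathered so far; each step adds to it."""
    contact: str
    notes: Dict[str, str] = field(default_factory=dict)


# A step takes the running context and returns it with more filled in.
Step = Callable[[Context], Context]


def search_gmail(ctx: Context) -> Context:
    # Hypothetical stub: a real step would search the last 14 days of mail.
    ctx.notes["email"] = f"(stub) recent emails from {ctx.contact}"
    return ctx


def check_calendar(ctx: Context) -> Context:
    # Hypothetical stub: a real step would look up the meeting details.
    ctx.notes["calendar"] = f"(stub) upcoming meeting with {ctx.contact}"
    return ctx


def web_research(ctx: Context) -> Context:
    # Hypothetical stub: a real step would run a web search.
    ctx.notes["research"] = f"(stub) background on {ctx.contact}"
    return ctx


def synthesize(ctx: Context) -> Context:
    # The last step depends on everything before it, so a dropped
    # handoff shows up here as a missing key.
    required = ("email", "calendar", "research")
    missing = [key for key in required if key not in ctx.notes]
    if missing:
        raise ValueError(f"missing context from earlier steps: {missing}")
    ctx.notes["briefing"] = "\n".join(ctx.notes[key] for key in required)
    return ctx


def run_chain(steps: List[Step], ctx: Context) -> Context:
    # Run the steps in order, feeding each one the previous step's output.
    for step in steps:
        ctx = step(ctx)
    return ctx


PIPELINE: List[Step] = [search_gmail, check_calendar, web_research, synthesize]
```

Calling `run_chain(PIPELINE, Context(contact="alice@example.com"))` returns a context whose `notes["briefing"]` combines all three earlier outputs. Run `synthesize` on its own and it fails, which is the "lost context" problem in miniature.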
That chaining is where Gemini 3.0 currently shines. In my testing, it ran this pattern faster than ChatGPT and handled the context handoffs better. And because it's a Google product working with Google tools (Gmail, Calendar, Drive), there's a native integration advantage that isn't just marketing.
The Bigger Point: Route Work to the Right Tool
Here's what I've noticed watching people level up with AI: the ones getting the best results stopped being loyal to one model.
That sounds obvious. But most people — even people who use AI a lot — tend to have a primary tool they default to for almost everything. ChatGPT for most users. Claude for technical folks. Gemini for people deep in Google Workspace.
The better approach is what I call being multi-tool native: route work to whichever model handles that specific job best, rather than forcing your primary tool to do everything.
Here's roughly how I route things now:
ChatGPT — daily driver, general-purpose thinking, strategy, brainstorming. The most capable overall. Best for open-ended tasks that don't require specialized performance.
Claude — technical reasoning, code review, careful long-form writing, anything where the thinking process should be visible and auditable. Better at showing its work.
Gemini — multi-step agents, tasks involving Google products, image analysis and generation, and anything where you need fast output at scale. Currently the best model for chaining agent actions.
Perplexity — real-time research, anything where you need current information rather than trained knowledge.
Lindy — recurring automations that run without you. Once a workflow is proven, it moves to Lindy.
This isn't a fixed system — model capabilities shift, and I update the routing as things change. Gemini 3.0 wasn't always my answer for multi-step agents. It's the right answer now, and something better might come along next quarter.
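If you want that routing written down somewhere you can actually update, a plain lookup table is enough. This is a rough sketch in Python, not a real integration: the task-type labels and the `route` function are names I'm making up for illustration, and the tool names are just strings.

```python
# Hypothetical task-type labels mapped to the tool I'd reach for today.
# Edit this table when model capabilities shift.
ROUTES = {
    "general": "ChatGPT",
    "technical": "Claude",
    "multi_step_agent": "Gemini",
    "live_research": "Perplexity",
    "recurring_automation": "Lindy",
}

# Fall back to the daily driver for anything that isn't listed.
DEFAULT_TOOL = "ChatGPT"


def route(task_type: str) -> str:
    """Return the tool name for a task type, or the default if unknown."""
    return ROUTES.get(task_type, DEFAULT_TOOL)
```

With this in place, `route("multi_step_agent")` gives `"Gemini"`, and changing your mind next quarter is a one-line edit to `ROUTES`.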
Why Tool Loyalty Is the Wrong Mindset
The instinct toward one tool is understandable. Learning a new AI platform takes time. Building familiarity with its strengths and quirks takes repetition. It feels inefficient to spread that attention across multiple tools.
But the cost of tool loyalty is that you force a model onto tasks it isn't well suited for. You get mediocre results that require more correction, and you miss performance improvements that are sitting right there in a different tool.
The people getting the most out of AI right now treat model selection as a skill in its own right. Beyond prompting and picking use cases, they know which model to reach for based on the job at hand.
From tool loyalty to tool literacy. That's the shift.
A Simple Starting Point
If you're building agents and you haven't tried Gemini 3.0 for multi-step workflows, that's the easiest experiment you can run.
Take a workflow that chains several actions together — email + calendar + research + synthesis is the classic combo — and run it in Gemini. Compare it to whatever you've been using.
You might find it works better. You might find the difference is minimal for your specific use case. Either way, you're building the routing judgment that separates good AI users from great ones.
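One way to keep that comparison honest is to run the same task through each tool and record the time and output side by side. Here's a small Python sketch under stated assumptions: each "runner" is a hypothetical function you'd write yourself to call a given tool, and `compare` is an illustrative name, not part of any library. It only measures wall-clock time; judging output quality is still on you.

```python
import time
from typing import Callable, Dict


def compare(runners: Dict[str, Callable[[str], str]], task: str) -> Dict[str, dict]:
    """Run the same task through each runner and record time and output."""
    results = {}
    for name, run in runners.items():
        start = time.perf_counter()
        output = run(task)
        elapsed = time.perf_counter() - start
        results[name] = {"seconds": elapsed, "output": output}
    return results


# Hypothetical stand-ins; replace each with a function that calls the real tool.
def run_with_current_tool(task: str) -> str:
    return f"(stub) current tool briefing for: {task}"


def run_with_gemini(task: str) -> str:
    return f"(stub) Gemini briefing for: {task}"
```

Call it as `compare({"current": run_with_current_tool, "gemini": run_with_gemini}, "prep for Tuesday meeting")` and read the two outputs next to each other. Run it on a handful of real tasks before drawing conclusions, since a single run tells you very little.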
I help founders and operators build AI systems that actually fit how they work — including figuring out which tools belong in which parts of the workflow. If you're building agents and want an outside perspective, reach out or check out my AI consulting and workshop programs.
