The mistake everyone makes

The biggest risk with multi-agent isn't building the wrong system. It's building a complex system when a simple one would've worked.

Every other pitch deck mentions "multi-agent AI" now. Your CEO saw Manus (acquired by Meta for $2B+) and wants to know why you don't have one. Your engineering team is excited about the architecture. And the instinct is to go big.

Don't.

I shipped a multi-agent system at a major cloud company. The most important thing I learned: we almost over-engineered it. We initially scoped 40 features from competitor analysis, and the instinct was to build complex multi-agent orchestration from day one. We didn't. We scoped down to 10 features, started with a simpler architecture, and only introduced multi-agent coordination when we hit a concrete wall.

That wall was specific: a single agent couldn't reason across 5 distributed cloud services simultaneously. It couldn't hold enough context. Different parts of the diagnosis needed different specializations. And the reasoning chain was too many steps deep for one model to handle reliably.

The result? We shipped 50% faster than the original timeline and cut SRE incident diagnosis time by 90%. Not because multi-agent was magic, but because we only used it where it was actually necessary.

When you actually need it

Multi-agent earns its complexity in exactly three situations. Everything else is a single agent with good tools.

| Signal | Verdict |
| --- | --- |
| Task has 1-3 clear steps | Single agent. Give it good tools and a clear prompt. Done. |
| Steps need different expertise | Multi-agent starts making sense. Diagnosis needs network logs, remediation needs infra configs. Those are different specializations. |
| Steps can run in parallel | Strong case. Researching 5 data sources simultaneously instead of sequentially saves real time. Anthropic's research system does exactly this, and their data shows it outperformed a single agent by over 90% on complex tasks. |
| Problem spans multiple systems | Strongest signal. Root-cause analysis across 5 services with different data formats and access patterns. A single agent chokes on context. |
| It sounds cool | Stop. Ship the single-agent version first. You'll learn what actually needs splitting. |

So what does multi-agent actually look like in practice? Simple: multiple AI agents working on the same problem, each specialized. One plans, another researches, another executes, another verifies. They pass work between each other, sometimes in parallel, sometimes in sequence.

Manus AI uses a three-agent split: Planner (strategist), Execution (doer), Verification (quality checker). That's what lets it handle end-to-end tasks autonomously, which is why Meta paid $2B+ for it. Anthropic's research system uses a lead agent that spawns parallel sub-agents, each searching different aspects simultaneously, then synthesizing results back.
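To make the shape concrete, here's a minimal sketch of that orchestrator pattern: a lead agent plans, sub-agents research in parallel, and a verifier checks the synthesis. This is not Manus's or Anthropic's actual implementation; call_llm is a stand-in for whatever model API you use, and the prompts are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Stand-in for whatever model API you actually use."""
    return f"[model output for: {prompt.splitlines()[0][:60]}]"

def plan(task: str) -> list[str]:
    # Lead agent breaks the task into independent sub-questions.
    response = call_llm(f"Split this task into 3-5 independent sub-questions:\n{task}")
    return [line.strip("- ") for line in response.splitlines() if line.strip()]

def research(question: str) -> str:
    # Each sub-agent gets its own context window and one narrow question.
    return call_llm(f"Research and answer concisely:\n{question}")

def verify(task: str, draft: str) -> str:
    # A separate verifier checks the synthesis against the original task.
    return call_llm(f"Task: {task}\nDraft: {draft}\nFlag and fix unsupported claims.")

def run(task: str) -> str:
    questions = plan(task)
    # Fan out: sub-agents run in parallel, which is where the time savings come from.
    with ThreadPoolExecutor(max_workers=5) as pool:
        findings = list(pool.map(research, questions))
    draft = call_llm(f"Task: {task}\nFindings:\n" + "\n".join(findings) + "\nSynthesize a final answer.")
    return verify(task, draft)

print(run("Why did checkout latency spike across three regions last night?"))
```

If the sub-questions can't actually run independently, the fan-out buys you nothing; that's the "steps can run in parallel" signal from the table above.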

5 questions to ask before your team builds anything

If your engineering team proposes multi-agent and can't answer these, they haven't done the thinking yet.

Your cheat sheet

1. "What happens if we build this as a single agent with good tools?"
If they can't articulate why that won't work, stop. Always start with the simplest architecture that could work. You can split later.

2. "How do agents communicate failures to each other?"
This is the hardest part and the one most teams underestimate. Agent A passes bad data to Agent B, which confidently produces a wrong answer. What catches that? If the answer is "nothing," you have a reliability problem that will kill you in production.
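One cheap defense is to validate every handoff before the downstream agent consumes it. A minimal sketch, assuming a diagnosis agent hands findings to a remediation agent; the field names and thresholds are illustrative:

```python
from dataclasses import dataclass

@dataclass
class DiagnosisHandoff:
    service: str          # which service the finding applies to
    root_cause: str       # the claimed root cause
    evidence: list[str]   # log lines or metrics the claim rests on
    confidence: float     # self-reported confidence, 0.0-1.0

def validate_handoff(h: DiagnosisHandoff) -> list[str]:
    """Return a list of problems; an empty list means the handoff is usable."""
    problems = []
    if not h.service:
        problems.append("no service named; remediation can't be scoped")
    if not h.evidence:
        problems.append("root cause asserted with no supporting evidence")
    if h.confidence < 0.6:
        problems.append(f"confidence {h.confidence:.2f} is below the 0.6 threshold")
    return problems

# If validation fails, don't pass the payload along: retry the upstream agent
# with the problems appended to its prompt, or escalate to a human.
```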

3. "What's our human-in-the-loop strategy?"
Full automation isn't always the goal. KPMG's Q4 survey shows 60% of enterprises won't let agents touch sensitive data without human oversight. Know where humans approve, escalate, or override.
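In code, that usually reduces to a gate in front of sensitive actions. A hypothetical sketch; the action names and approval flow are placeholders for your own policy:

```python
SENSITIVE_ACTIONS = {"delete_resource", "modify_iam_policy", "read_customer_data"}

def perform(action: str, payload: dict) -> str:
    """Stand-in for the agent's real tool execution."""
    return f"executed {action}"

def execute_with_oversight(action: str, payload: dict, request_approval) -> str:
    # Low-risk actions run autonomously; sensitive ones wait on a human decision,
    # e.g. a ticket or chat approval flow wired into request_approval.
    if action in SENSITIVE_ACTIONS and not request_approval(action, payload):
        return f"escalated: {action} held for human review"
    return perform(action, payload)

# Demo policy: nothing sensitive gets auto-approved.
print(execute_with_oversight("read_customer_data", {}, lambda a, p: False))
print(execute_with_oversight("restart_pod", {}, lambda a, p: False))
```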

4. "How do we measure success before we build?"
Define the metric and target before writing a line of code. If the target can be hit with a single agent, you just saved your team months of orchestration complexity.

5. "What's the cost model?"
Each agent call costs tokens. Multi-agent multiplies costs by agents × interactions. A 5-agent system running 20 interactions per agent per task is roughly 100 calls; at $0.01/call, that's $1/task. At 10,000 tasks/month, that's $10K in inference alone. Model that before you architect.
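The back-of-the-envelope version of that math; every input here is an assumption you should replace with your own numbers:

```python
agents = 5                    # specialized agents in the system
interactions_per_agent = 20   # calls each agent makes per task
cost_per_call = 0.01          # dollars per model call; varies a lot with model and context size
tasks_per_month = 10_000

calls_per_task = agents * interactions_per_agent   # 100 calls
cost_per_task = calls_per_task * cost_per_call     # $1.00
monthly_cost = cost_per_task * tasks_per_month     # $10,000
print(f"${cost_per_task:.2f}/task, ${monthly_cost:,.0f}/month in inference")
```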

Who's building the infrastructure (and why PMs should care)

The vendor lock-in era for agents is ending. Three competing standards were just brought under one roof. This directly affects your build-vs-buy decisions.

In December 2025, something unusual happened: Anthropic, OpenAI, and Block co-founded the Agentic AI Foundation under the Linux Foundation, with AWS, Google, Microsoft, Bloomberg, and Cloudflare as platinum members. Each donated their core agent infrastructure:

Anthropic donated MCP (Model Context Protocol), the standard for connecting agents to tools and data. Think USB-C for AI: before MCP, every tool integration was custom. Now it has 10,000+ active public servers and 97 million monthly SDK downloads. TechCrunch covered this as the foundational plumbing of the agent era.

OpenAI donated AGENTS.md — a standard that tells agents how to behave inside specific projects. Over 60,000 open-source projects adopted it within months. Less about agent-to-agent communication, more about making agents predictable inside your codebase. OpenAI called it the foundation for portable agent tooling.

Google released A2A (Agent-to-Agent) — the protocol for peer-to-peer agent collaboration. Agents from different vendors can negotiate, share findings, and coordinate without a central controller. IBM predicts 2026 is when these patterns move from lab to production.

Why does this matter for PMs? Because the question used to be "which vendor do we lock into?" Now it's "which standard do we build on?" And increasingly the answer is all of them, because they're converging. Your build-vs-buy calculus just shifted.

So what do you actually do on Monday?

Don't start with multi-agent. Start with one agent, measure where it breaks, and introduce multi-agent only at those seams.

Gartner predicts 40% of enterprise applications will embed AI agents by end of 2026 (up from under 5% in 2025). The market is projected to reach $52B by 2030. This isn't hype that goes away. This is the next layer of how software gets built.

But the organizations getting stuck right now are the ones that jumped straight to complex orchestration without understanding what a well-tooled single agent could do. They're in what analysts call "perpetual pilot purgatory": impressive demos, no production value.

Your roadmap as a PM:

The playbook

Week 1-2: Ship a single agent with good tool access on your highest-value workflow. Measure its accuracy, speed, and failure modes.

Week 3-4: Document exactly where and why it fails. Is it context overload? Reasoning depth? Specialization gaps? Parallelism needs?

Month 2: Only if failures map to the multi-agent signals above, scope a multi-agent architecture targeting those specific seams. Not the whole workflow, just the parts where single-agent concretely breaks.

Ongoing: Monitor cost per task, accuracy by agent, and human escalation rate. These three metrics tell you if multi-agent is earning its complexity or just adding it.
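Here's a sketch of the minimal instrumentation that answers those three questions, fed from whatever logging you already have; the field names are illustrative:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class TaskRecord:
    cost_usd: float      # total model spend for the task
    escalated: bool      # did a human have to step in
    agent_correct: dict[str, bool] = field(default_factory=dict)  # per-agent verdict from spot checks

def summarize(records: list[TaskRecord]) -> dict:
    n = len(records)
    per_agent = defaultdict(list)
    for r in records:
        for agent, ok in r.agent_correct.items():
            per_agent[agent].append(ok)
    return {
        "cost_per_task": sum(r.cost_usd for r in records) / n,
        "escalation_rate": sum(r.escalated for r in records) / n,
        "accuracy_by_agent": {a: sum(v) / len(v) for a, v in per_agent.items()},
    }

# If cost per task climbs while accuracy stays flat, the extra agents
# are adding complexity without earning it.
```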

Start with one agent. Give it great tools. Measure where it fails. That's your multi-agent roadmap.

Sources

[1] Anthropic, How we built our multi-agent research system, Jun 2025

[2] Linux Foundation, Agentic AI Foundation announcement, Dec 2025

[3] TechCrunch, OpenAI, Anthropic, and Block join Linux Foundation effort, Dec 2025

[4] OpenAI, OpenAI for Developers in 2025

[5] KPMG, Q4 AI Pulse Survey: Agent-Driven Enterprise, Jan 2026

[6] Gartner via MLMastery, 7 Agentic AI Trends 2026

[7] IBM Think, AI and tech trends shaping 2026

[8] Manus AI, Introducing Wide Research

[9] Wikipedia, Manus (AI agent) - Meta acquisition