Multi-Agent AI Patterns: A Developer’s Field Guide

AI Agents

Design Patterns

LLMs

Software Architecture

Suman Das catalogs the eight core multi-agent design patterns every AI developer needs — from sequential pipelines to evaluator-optimizer loops — with clear guidance on when to use each, which frameworks support them, and how production systems combine them.

Author

Sean Lewis

Published

March 4, 2026


The Gist

We solved the monolith problem in software engineering decades ago — we broke systems into microservices, each doing one thing well, communicating through clean interfaces. The same evolution is now happening in AI. When a single agent tries to research, write, code, review, and deploy all at once, it loses context, hallucinates, and forgets what it said three steps ago. The fix is the same: decompose.

Suman Das’s April 2026 article provides the most practical taxonomy I’ve seen of multi-agent design patterns — the architectural templates that define how multiple AI agents coordinate to solve problems that overwhelm any single agent. He identifies eight core patterns and three emerging ones, each with clear when-to-use / when-not-to-use guidance and framework support across LangGraph, CrewAI, AutoGen, Google ADK, and the Anthropic Agent SDK.

The key insight isn’t whether to use multiple agents — it’s which pattern fits your problem and when. Das provides a decision framework: start simple, add complexity only when you need it.

Why It Matters Now

Every major AI framework now supports multi-agent orchestration, but the documentation tends to show you how to wire agents together without telling you which wiring pattern to choose. This article fills that gap. If you’re building anything beyond a single chatbot — a research pipeline, a code generation system, a customer support platform — you’re implicitly choosing one of these patterns whether you know it or not. Knowing the taxonomy means making that choice deliberately.

The timing is also significant. As of early 2026, LangGraph, CrewAI, AutoGen, Google ADK, and the Anthropic Agent SDK have all matured their multi-agent APIs. The patterns Das describes aren’t theoretical — they map directly to production-ready framework primitives.

The Eight Core Patterns

flowchart TD
    A["🤖 Multi-Agent Patterns"] --> B["1. Sequential / Pipeline"]
    A --> C["2. Orchestrator-Worker"]
    A --> D["3. Parallel Fan-Out → Fan-In"]
    A --> E["4. Reflection / Self-Critique"]
    A --> F["5. Router / Dispatch"]
    A --> G["6. Planning + Execution"]
    A --> H["7. Handoff"]
    A --> I["8. Evaluator-Optimizer Loop"]

    style A fill:#f0ebe4,stroke:#0d7c5f,color:#1a1a1a
    style B fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style C fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style D fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style E fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style F fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style G fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style H fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a
    style I fill:#faf6f1,stroke:#0d7c5f,color:#1a1a1a

Multi-Agent Pattern Taxonomy

1. Sequential / Pipeline

Agents in a straight line — each does its job and passes the result to the next. The assembly line of multi-agent systems.

When to use: Clear, ordered stages where each step has a distinct responsibility and you need auditability at every step.

When NOT to use: Stages are independent (use Parallel instead), or latency matters. End-to-end latency is the sum of every stage, and one slow agent stalls everything downstream.

Sample Use Case: A content publishing pipeline — Agent A extracts key points from raw research, Agent B transforms them into a blog draft, Agent C validates facts and grammar, Agent D generates the final summary and SEO metadata.

# Sequential Pipeline with LangGraph
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

llm = ChatOpenAI(model="gpt-4o-mini")  # model choice is illustrative

class PipelineState(TypedDict, total=False):
    """Shared state; each agent fills in one more key."""
    raw_research: str
    key_points: str
    draft: str
    validated_draft: str
    final_output: str

def extract_agent(state: PipelineState) -> dict:
    """Agent A: Extract key points from research."""
    prompt = f"Extract the 5 key findings from: {state['raw_research']}"
    # .content pulls the text out of the returned message object
    return {"key_points": llm.invoke(prompt).content}

def draft_agent(state: PipelineState) -> dict:
    """Agent B: Transform key points into a blog draft."""
    prompt = f"Write a blog post based on these points: {state['key_points']}"
    return {"draft": llm.invoke(prompt).content}

def validate_agent(state: PipelineState) -> dict:
    """Agent C: Fact-check and grammar review."""
    prompt = f"Review this draft for accuracy and grammar: {state['draft']}"
    return {"validated_draft": llm.invoke(prompt).content}

def summarize_agent(state: PipelineState) -> dict:
    """Agent D: Generate summary and SEO metadata."""
    prompt = f"Create a summary and SEO tags for: {state['validated_draft']}"
    return {"final_output": llm.invoke(prompt).content}

# Wire the pipeline
graph = StateGraph(PipelineState)
graph.add_node("extract", extract_agent)
graph.add_node("draft", draft_agent)
graph.add_node("validate", validate_agent)
graph.add_node("summarize", summarize_agent)

graph.add_edge("extract", "draft")
graph.add_edge("draft", "validate")
graph.add_edge("validate", "summarize")
graph.add_edge("summarize", END)
graph.set_entry_point("extract")

pipeline = graph.compile()
result = pipeline.invoke({"raw_research": "..."})

Framework	Implementation
LangGraph	Native linear graph edges
CrewAI	`Process.sequential`
AutoGen	`initiate_chats` with sequential carryover
Google ADK	`SequentialAgent` (native workflow agent)
Anthropic Agent SDK	Claude chains steps through its reasoning loop
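
The auditability point above can be illustrated without any framework. The sketch below threads a state dict through ordered stages and records which keys each stage touched. The `run_pipeline` helper and the two stage functions are hypothetical stand-ins for LLM-backed agents, not part of any library.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]

def run_pipeline(stages: list[tuple[str, Stage]], state: State) -> tuple[State, list[dict]]:
    """Run stages in order and log the keys each stage added or changed."""
    audit_log = []
    for name, stage in stages:
        before = dict(state)
        state = stage(dict(state))  # pass a copy so `before` stays untouched
        changed = sorted(k for k in state if before.get(k) != state[k])
        audit_log.append({"stage": name, "changed_keys": changed})
    return state, audit_log

# Hypothetical stand-ins for LLM-backed agents.
def extract(state: State) -> State:
    state["key_points"] = state["raw_research"].split(". ")
    return state

def draft(state: State) -> State:
    state["draft"] = " ".join(f"* {point}" for point in state["key_points"])
    return state

final_state, audit_log = run_pipeline(
    [("extract", extract), ("draft", draft)],
    {"raw_research": "Agents specialize. Pipelines are auditable"},
)
```

Because every stage boundary is logged, a bad final output can be traced back to the first stage whose contribution looks wrong.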

2. Orchestrator-Worker (Hierarchical)

A smart manager agent that understands the big picture, dynamically decides what sub-tasks to create, delegates to specialists, monitors progress, and stitches results together. The key word is dynamic — the orchestrator reasons about the task and may change its plan mid-execution.

How is this different from Parallel Fan-Out? In Fan-Out, you know all sub-tasks upfront. Here, the orchestrator figures out the sub-tasks at runtime, may run some in sequence and others in parallel, and can reassign or retry if something fails.

When to use: Sub-tasks aren’t known upfront, workers may have dependencies on each other’s output, you need centralized coordination and adaptive replanning.

When NOT to use: The task is simple enough for a single agent, or all sub-tasks are independent and known upfront (use Parallel instead).

Sample Use Case: An e-commerce order system — the orchestrator delegates to Inventory, Payment, and Shipping agents. If Inventory reports “out of stock,” the orchestrator adapts by asking a Recommendation agent to suggest alternatives.

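# Orchestrator-Worker with CrewAI
# Note: inventory_lookup_tool, payment_tool, and shipping_tool are placeholders
# for tool objects you define elsewhere.
from crewai import Agent, Task, Crew, Process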

orchestrator = Agent(
    role="Project Manager",
    goal="Coordinate the team to fulfill customer orders",
    backstory="Expert at breaking down complex orders and delegating.",
    llm="gpt-4o"
)

inventory_agent = Agent(
    role="Inventory Specialist",
    goal="Check stock availability and report status",
    backstory="Has access to the warehouse database.",
    tools=[inventory_lookup_tool],
    llm="gpt-4o-mini"
)

payment_agent = Agent(
    role="Payment Processor",
    goal="Process payments securely",
    backstory="Handles all payment gateway interactions.",
    tools=[payment_tool],
    llm="gpt-4o-mini"
)

shipping_agent = Agent(
    role="Shipping Coordinator",
    goal="Calculate delivery options and schedule dispatch",
    backstory="Manages logistics and carrier APIs.",
    tools=[shipping_tool],
    llm="gpt-4o-mini"
)

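# CrewAI's hierarchical process lets the manager delegate dynamically.
# The manager is passed separately from the workers, and the task is left
# unassigned so the manager decides who handles it. (Passing manager_llm
# instead is the alternative when you do not define a manager agent.)
crew = Crew(
    agents=[inventory_agent, payment_agent, shipping_agent],
    tasks=[
        Task(
            description="Process order #{order_id}",
            expected_output="Order status covering inventory, payment, and shipping",
        )
    ],
    process=Process.hierarchical,
    manager_agent=orchestrator,
)

# {order_id} in the task description is filled from `inputs` (example value)
result = crew.kickoff(inputs={"order_id": "1001"})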

Framework	Implementation
LangGraph	Supervisor pattern with sub-graphs
CrewAI	`Process.hierarchical` with `manager_llm`
AutoGen	`GroupChat` with `GroupChatManager`
Google ADK	`LlmAgent` with `sub_agents` and `transfer_to_agent`
Anthropic Agent SDK	Parent agent spawns subagents via Agent tool
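
To make the "dynamic" part concrete, here is a framework-free sketch of the order scenario above. The worker functions and the rule-based `orchestrate` loop are hypothetical. In a real system an LLM would pick the next sub-task, where this sketch uses hard-coded branching.

```python
from collections import deque

# Hypothetical workers; each returns a small status dict.
def check_inventory(order: dict) -> dict:
    return {"in_stock": order["item"] in {"keyboard", "mouse"}}

def recommend_alternative(order: dict) -> dict:
    return {"alternative": "keyboard"}

def take_payment(order: dict) -> dict:
    return {"paid": True}

def schedule_shipping(order: dict) -> dict:
    return {"shipped": True}

WORKERS = {
    "inventory": check_inventory,
    "recommend": recommend_alternative,
    "payment": take_payment,
    "shipping": schedule_shipping,
}

def orchestrate(order: dict) -> list[str]:
    """Run sub-tasks from a queue, changing the plan based on worker results."""
    queue = deque(["inventory"])  # only the first sub-task is known upfront
    trace = []
    while queue:
        task = queue.popleft()
        result = WORKERS[task](order)
        trace.append(task)
        if task == "inventory":
            # Adapt the plan at runtime depending on what the worker reported.
            if result["in_stock"]:
                queue.extend(["payment", "shipping"])
            else:
                queue.append("recommend")
        elif task == "recommend":
            order = {**order, "item": result["alternative"]}
            queue.append("inventory")  # re-check stock for the substitute
    return trace

happy_path = orchestrate({"item": "mouse"})
replanned = orchestrate({"item": "monitor"})
```

The two traces differ even though the code path is the same, which is the property that separates this pattern from a fixed fan-out.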

3. Parallel / Fan-Out → Fan-In

The speed pattern. When independent sub-tasks don’t depend on each other, fire them all at once, then merge results. Unlike Orchestrator-Worker, there’s no smart manager — a simple splitter distributes pre-defined tasks, and a simple aggregator combines results.

When to use: All sub-tasks are known upfront and independent, latency is critical, you’re gathering data from multiple sources.

When NOT to use: Tasks have dependencies on each other, or you need dynamic task creation.

Sample Use Case: A competitive analysis tool — Agent 1 scrapes websites, Agent 2 pulls financial data, Agent 3 gathers social sentiment, Agent 4 searches news. None need each other’s output. Total time = slowest agent, not the sum of all four.

# Parallel Fan-Out with Python asyncio
# (LangGraph's Send API is the graph-native alternative; it is not used here)
import asyncio
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

async def scrape_websites(query: str) -> str:
    """Agent 1: Scrape competitor websites."""
    reply = await llm.ainvoke(f"Summarize competitor info for: {query}")
    return reply.content  # plain text, matching the str annotation

async def pull_financials(query: str) -> str:
    """Agent 2: Pull financial data from APIs."""
    reply = await llm.ainvoke(f"Get financial summary for: {query}")
    return reply.content

async def gather_sentiment(query: str) -> str:
    """Agent 3: Analyze social media sentiment."""
    reply = await llm.ainvoke(f"Analyze social sentiment for: {query}")
    return reply.content

async def search_news(query: str) -> str:
    """Agent 4: Search recent news articles."""
    reply = await llm.ainvoke(f"Find recent news about: {query}")
    return reply.content

async def competitive_analysis(query: str) -> str:
    """Fan-out to all agents, fan-in to aggregate."""
    # Fan-out: all agents run concurrently
    results = await asyncio.gather(
        scrape_websites(query),
        pull_financials(query),
        gather_sentiment(query),
        search_news(query)
    )

    # Fan-in: aggregate results (each result is already a string)
    combined = "\n\n".join(results)
    final = await llm.ainvoke(
        f"Synthesize this competitive analysis:\n{combined}"
    )
    return final.content

report = asyncio.run(competitive_analysis("Tesla Q1 2026"))

Framework	Implementation
LangGraph	Parallel branches with fan-in node, `Send` API
CrewAI	`async_execution=True` on tasks
AutoGen	`a_initiate_chats` with concurrent execution
Google ADK	`ParallelAgent` (native workflow agent)
Anthropic Agent SDK	Multiple Agent tool calls in a single message
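
The latency claim in the use case above ("slowest agent, not the sum") is easy to check with a little arithmetic. This is a minimal sketch. The timings are made-up illustrative numbers, and `fake_agent` sleeps where a real agent would call a model.

```python
import asyncio

def sequential_latency(stage_seconds: list[float]) -> float:
    """A pipeline waits for every stage in turn, so latencies add up."""
    return sum(stage_seconds)

def parallel_latency(stage_seconds: list[float]) -> float:
    """Independent agents overlap, so the slowest one sets the total."""
    return max(stage_seconds)

async def fake_agent(name: str, seconds: float) -> str:
    """Stand-in for an LLM call; sleeps instead of hitting an API."""
    await asyncio.sleep(seconds)
    return name

async def fan_out() -> list[str]:
    # gather() returns results in argument order, not completion order.
    return await asyncio.gather(
        fake_agent("web", 0.03),
        fake_agent("financials", 0.01),
        fake_agent("sentiment", 0.02),
    )

timings = [4.0, 9.0, 6.0, 3.0]  # illustrative per-agent seconds
order = asyncio.run(fan_out())
```

With these numbers the same four agents take 22 seconds in sequence and 9 seconds in parallel. Because `gather()` preserves argument order, the fan-in step can rely on result positions.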

4. Reflection / Self-Critique

One agent generates, another reviews, and they iterate until the output meets a quality bar. This gives your AI a built-in editor.

When to use: Code generation (write → test → fix cycles), content creation where first drafts aren’t good enough, any task with clear quality criteria.

When NOT to use: Real-time responses needed (iteration adds latency), first pass is usually correct, or no clear “good enough” criteria.

Sample Use Case: An automated code review system — the Generator writes a function, the Reviewer runs tests and checks for edge cases. If anything fails, feedback goes back. Repeat until all tests pass.

# Reflection / Self-Critique Loop
import json  # needed by review() below

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
MAX_ITERATIONS = 5

def generate(task: str, feedback: str = "") -> str:
    """Generator agent: produce or revise code."""
    context = f"Previous feedback: {feedback}\n" if feedback else ""
    return llm.invoke(
        f"{context}Write a Python function that: {task}"
    ).content

def review(code: str) -> dict:
    """Reviewer agent: evaluate code quality."""
    response = llm.invoke(
        f"""Review this code. Return JSON with:
        - "approved": true/false
        - "score": 1-10
        - "feedback": specific improvement suggestions

        Code:
        ```python
        {code}
        ```"""
    ).content
    return json.loads(response)

def reflection_loop(task: str) -> str:
    """Run the generate-review loop until quality threshold."""
    code = generate(task)

    for i in range(MAX_ITERATIONS):
        review_result = review(code)

        if review_result["approved"] and review_result["score"] >= 8:
            print(f"✅ Approved after {i+1} iteration(s)")
            return code

        print(f"🔄 Iteration {i+1}: Score {review_result['score']}/10")
        code = generate(task, feedback=review_result["feedback"])

    return code  # Return best effort after max iterations

final_code = reflection_loop("sort a list using merge sort")

Framework	Implementation
LangGraph	Cycles with conditional stop
CrewAI	`guardrail` for validation, `max_iter` for retries
AutoGen	`register_nested_chats` for inner critic
Google ADK	`LoopAgent` with `max_iterations` and `escalate=True`
Anthropic Agent SDK	Claude self-evaluates through its reasoning loop
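
One practical wrinkle in the loop above: `review` hands the model's reply straight to `json.loads`, which raises if the model wraps its JSON in a code fence or adds a sentence of commentary. A defensive parser is one way to soften that. The `parse_json_reply` helper below is a hypothetical sketch, not part of LangChain or any other library.

```python
import json
import re

def parse_json_reply(reply: str):
    """Parse JSON from a model reply, tolerating code fences and extra prose."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        pass
    # Fall back to the first {...} or [...] span found anywhere in the reply.
    match = re.search(r"\{.*\}|\[.*\]", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object or array found in reply")
    return json.loads(match.group(0))

fence = "`" * 3  # built indirectly so the example does not embed a literal fence
reply = f'Here is my review:\n{fence}json\n{{"approved": false, "score": 6}}\n{fence}'
review_result = parse_json_reply(reply)
```

The same helper would apply anywhere a later pattern parses model output as JSON. Where the provider supports structured output, that is usually the sturdier option.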

5. Router / Dispatch

A lightweight router classifies the input once and sends it to the best-fit specialist. That’s it — the router’s job is done. It’s a traffic cop, not a project manager.

How is this different from Orchestrator-Worker? The router makes one decision (which agent?) and steps aside. It doesn’t break tasks into sub-tasks, wait for results, or aggregate anything. Use Router when the task goes to one specialist. Use Orchestrator when the task needs to be split across multiple specialists.

When to use: Diverse input types each needing a single specialist, cost optimization (route simple queries to cheaper models), customer support and helpdesks.

When NOT to use: Task needs splitting across multiple agents, all queries need the same processing, or you only have one specialist.

Sample Use Case: A SaaS customer support system in which the Router classifies tickets as billing, technical, or sales, and each goes entirely to the appropriate specialist.

# Router / Dispatch Pattern
from langchain_openai import ChatOpenAI
from enum import Enum

class TicketType(Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    SALES = "sales"

# Lightweight router — uses a small, fast model
router_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Specialist agents — can use different models/tools
billing_llm = ChatOpenAI(model="gpt-4o-mini")
technical_llm = ChatOpenAI(model="gpt-4o")  # Harder problems, better model
sales_llm = ChatOpenAI(model="gpt-4o-mini")

def route(query: str) -> TicketType:
    """Classify the query and return the ticket type."""
    response = router_llm.invoke(
        f"""Classify this customer query into exactly one category:
        - billing: payment, invoices, subscriptions, pricing
        - technical: bugs, errors, how-to, integrations
        - sales: upgrades, new features, enterprise plans

        Query: {query}
        Category:"""
    ).content.strip().lower()
    return TicketType(response)

def dispatch(query: str) -> str:
    """Route to the appropriate specialist agent."""
    ticket_type = route(query)

    agents = {
        TicketType.BILLING: billing_llm,
        TicketType.TECHNICAL: technical_llm,
        TicketType.SALES: sales_llm,
    }

    specialist = agents[ticket_type]
    return specialist.invoke(
        f"You are a {ticket_type.value} specialist. Help with: {query}"
    ).content

answer = dispatch("My API keeps returning 429 errors")

Framework	Implementation
LangGraph	Conditional edges with routing functions
CrewAI	`@router` decorator in Flows, `ConditionalTask`
AutoGen	Custom `speaker_selection_method` in GroupChat
Google ADK	`LlmAgent` with `sub_agents` and routing instructions
Anthropic Agent SDK	LLM-driven dispatch via Agent tool descriptions
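
Routing errors are one of the failure modes of this pattern. In the example above, `TicketType(response)` raises `ValueError` whenever the model answers with anything other than an exact label. A minimal sketch of a more forgiving parser follows. The `parse_label` helper and the choice of `TECHNICAL` as the fallback are assumptions for illustration. In production you might route unknowns to a human queue instead.

```python
from enum import Enum

class TicketType(Enum):
    BILLING = "billing"
    TECHNICAL = "technical"
    SALES = "sales"

def parse_label(raw: str, default: TicketType = TicketType.TECHNICAL) -> TicketType:
    """Map a free-text model reply to a TicketType, falling back to a default."""
    cleaned = raw.strip().lower().strip(".:\"'")
    for ticket_type in TicketType:
        # Accept exact labels and replies such as "Category: billing".
        if cleaned == ticket_type.value or cleaned.endswith(ticket_type.value):
            return ticket_type
    return default

exact = parse_label("Billing.")
prefixed = parse_label("Category: sales")
unknown = parse_label("I am not sure")
```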

6. Planning + Execution

Separate the thinking from the doing. A planner agent creates the full roadmap upfront, then executor agents carry out each step. The planner steps back and only re-engages if something fails and requires replanning.

How is this different from Orchestrator-Worker? The Orchestrator is a micromanager — involved at every step. Plan + Execute is more like an architect and builders — the architect draws the blueprint, hands it to builders, and only comes back if the foundation cracks. Planning and execution are clearly separated phases.

When to use: Complex multi-step goals, you need the ability to replan when intermediate results change the approach, research tasks and coding projects.

When NOT to use: Workflow is fixed (use Pipeline), need real-time decision-making at every step (use Orchestrator), or task is too simple to justify a planning step.

Sample Use Case: An automated research assistant — the Planner breaks down “write a market analysis report on EV batteries” into: (1) identify top 5 companies, (2) gather financial data, (3) analyze patents, (4) compare market share, (5) draft report. Executors handle each step. If Step 2 reveals a major company was missed, the Planner revises.

# Planning + Execution Pattern
from langchain_openai import ChatOpenAI
import json

planner_llm = ChatOpenAI(model="gpt-4o", temperature=0)
executor_llm = ChatOpenAI(model="gpt-4o-mini")

def create_plan(goal: str) -> list[dict]:
    """Planner agent: decompose goal into ordered steps."""
    response = planner_llm.invoke(
        f"""Break this goal into 3-7 concrete steps.
        Return JSON array of objects with "step" and "description".

        Goal: {goal}"""
    ).content
    return json.loads(response)

def execute_step(step: dict, context: str = "") -> str:
    """Executor agent: carry out a single step."""
    return executor_llm.invoke(
        f"""Previous context: {context}
        Execute this step: {step['description']}
        Be thorough and specific."""
    ).content

def should_replan(step_result: str, remaining_steps: list) -> bool:
    """Check if results require replanning."""
    response = planner_llm.invoke(
        f"""Given this result: {step_result}
        And remaining plan: {json.dumps(remaining_steps)}
        Should we replan? Reply YES or NO with brief reason."""
    ).content
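    # Match only a leading YES so a "NO" reply that mentions "yes" later is not misread
    return response.strip().upper().startswith("YES")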

def plan_and_execute(goal: str) -> str:
    """Full plan-and-execute loop with replanning."""
    steps = create_plan(goal)
    context = ""

    i = 0
    while i < len(steps):
        result = execute_step(steps[i], context)
        context += f"\nStep {i+1} result: {result}"

        if i < len(steps) - 1 and should_replan(result, steps[i+1:]):
            print(f"🔄 Replanning after step {i+1}")
            steps = steps[:i+1] + create_plan(
                f"Continue from: {context}\nOriginal goal: {goal}"
            )
        i += 1

    return context

report = plan_and_execute("Write a market analysis of EV batteries")

Framework	Implementation
LangGraph	Plan-and-execute pattern (well-documented)
CrewAI	Task `context` dependencies + Flows for planning
AutoGen	Planner + executor via two-agent or GroupChat
Google ADK	Compose with `SequentialAgent` + `LoopAgent`
Anthropic Agent SDK	Claude naturally plans and executes through its loop

7. Handoff

Sometimes an agent knows it’s out of its depth. Instead of hallucinating, it hands off the conversation — with full context — to a more capable agent or a human.

When to use: Customer service with escalation tiers, multi-domain assistants where scope changes mid-conversation, human-in-the-loop workflows.

When NOT to use: A single agent can handle everything, or you prefer routing upfront rather than mid-conversation handoffs.

Sample Use Case: A healthcare triage bot — Agent A handles general wellness questions. When it detects symptoms needing medical advice, it hands off to Agent B (medically-trained, with access to clinical guidelines). If urgent, Agent B escalates to Agent C — a human doctor — with the full conversation history.

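# Handoff Pattern
from dataclasses import dataclass

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")  # model choice is illustrative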

@dataclass
class ConversationState:
    messages: list
    current_agent: str
    metadata: dict

def general_agent(state: ConversationState) -> ConversationState:
    """Agent A: Handle general queries, detect escalation needs."""
    response = llm.invoke(
        f"""You are a general wellness assistant.
        If the query involves symptoms, medications, or urgent health
        concerns, respond with HANDOFF: medical_agent.
        If you can handle it, respond normally.

        Conversation: {state.messages}"""
    ).content

    if "HANDOFF:" in response:
        target = response.split("HANDOFF:")[1].strip()
        state.current_agent = target
        state.messages.append({
            "role": "system",
            "content": f"Handed off to {target} with full context"
        })
    else:
        state.messages.append({"role": "assistant", "content": response})

    return state

def medical_agent(state: ConversationState) -> ConversationState:
    """Agent B: Specialized medical guidance with escalation."""
    response = llm.invoke(
        f"""You are a medical advisor with access to clinical guidelines.
        Review the FULL conversation history for context.
        If the situation is urgent, respond with HANDOFF: human_doctor.

        Full history: {state.messages}"""
    ).content

    if "HANDOFF:" in response:
        state.current_agent = "human_doctor"
        state.messages.append({
            "role": "system",
            "content": "Escalated to human doctor — urgent case"
        })
    else:
        state.messages.append({"role": "assistant", "content": response})

    return state

# Dispatch loop
agents = {
    "general": general_agent,
    "medical_agent": medical_agent,
}

def run_conversation(query: str):
    state = ConversationState(
        messages=[{"role": "user", "content": query}],
        current_agent="general",
        metadata={}
    )
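    while state.current_agent in agents:
        active = state.current_agent
        state = agents[active](state)
        if state.current_agent == active:
            break  # the agent answered without handing off, so the turn is over
    return state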

Framework	Implementation
LangGraph	Conditional edges with full state transfer
CrewAI	`allow_delegation=True` for autonomous delegation
AutoGen	`register_hand_off` in Swarm pattern (v0.4+)
Google ADK	`transfer_to_agent` with session state context
Anthropic Agent SDK	Subagents return to parent (no direct peer handoff)

8. Evaluator-Optimizer Loop

Generate multiple candidates, score them, and use the feedback to generate better ones. It’s evolution in action — each iteration gets closer to optimal.

When to use: You can define a clear scoring function, prompt optimization, query refinement, code optimization where you can measure performance.

When NOT to use: No clear evaluation criteria, you need a single quick answer, or generation cost is too high for multiple candidates.

Sample Use Case: An ad copy optimizer — the Generator creates 10 variations, the Evaluator scores each on readability, brand alignment, and CTA strength. The top 3 go back to the Generator with feedback. Repeat for 3 rounds and pick the winner.

# Evaluator-Optimizer Loop
from langchain_openai import ChatOpenAI
import json

generator_llm = ChatOpenAI(model="gpt-4o", temperature=0.9)
evaluator_llm = ChatOpenAI(model="gpt-4o", temperature=0)

def generate_candidates(brief: str, n: int = 5, feedback: str = "") -> list:
    """Generator: produce N candidate outputs."""
    context = f"Previous feedback: {feedback}\n" if feedback else ""
    response = generator_llm.invoke(
        f"""{context}Generate {n} different ad copy variations for:
        {brief}
        Return as JSON array of strings."""
    ).content
    return json.loads(response)

def evaluate(candidates: list, criteria: dict) -> list[dict]:
    """Evaluator: score each candidate on multiple criteria."""
    response = evaluator_llm.invoke(
        f"""Score each candidate on these criteria (1-10 each):
        {json.dumps(criteria)}

        Candidates: {json.dumps(candidates)}

        Return JSON array of objects with "text", "scores", "total", "feedback"
        Sorted by total score descending."""
    ).content
    return json.loads(response)

def evaluator_optimizer(brief: str, rounds: int = 3) -> str:
    """Run the full eval-optimize loop."""
    criteria = {
        "readability": "Clear, concise, easy to understand",
        "brand_voice": "Matches professional but friendly tone",
        "cta_strength": "Compelling call to action",
        "emotional_appeal": "Creates urgency or desire"
    }

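    feedback = ""
    best_candidate = None
    best_score = float("-inf")

    for round_num in range(rounds):
        candidates = generate_candidates(brief, n=5, feedback=feedback)
        # Sort locally rather than trusting the model to return sorted output
        scored = sorted(
            evaluate(candidates, criteria), key=lambda c: c["total"], reverse=True
        )

        best = scored[0]
        if best["total"] > best_score:  # keep the winner across all rounds
            best_candidate, best_score = best["text"], best["total"]
        feedback = f"Top scorer ({best['total']}/40): {best['text']}\n"
        feedback += f"Improve on: {best['feedback']}"

        print(f"Round {round_num+1}: Best score = {best['total']}/40")

    return best_candidate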

winner = evaluator_optimizer("Launch ad for an AI-powered code review tool")

Framework	Implementation
LangGraph	Cycles with scoring nodes and conditional exit
CrewAI	Task output validation + `guardrail` + `max_iter`
AutoGen	`register_nested_chats` for evaluation before reply
Google ADK	`LoopAgent` with evaluator `LlmAgent`
Anthropic Agent SDK	Custom scoring tool, Claude iterates through loop
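
The use case above describes carrying the top 3 candidates into each new round, which the LLM-based example simplifies to a single top scorer. The sketch below shows the top-k variant with deterministic stand-ins, so it runs without an API key. The `score` and `generate` functions are toy assumptions, not real evaluators.

```python
import heapq

def score(text: str) -> int:
    """Toy scoring function: reward short copy, a call to action, and an offer."""
    points = 10 - min(len(text.split()), 10)
    if "try" in text.lower():
        points += 5
    if "free" in text.lower():
        points += 4
    return points

def generate(seeds: list[str], round_num: int) -> list[str]:
    """Stand-in generator: derive one new variant from each surviving seed."""
    suffixes = ["today", "for free", "now"]
    return [f"{seed} {suffixes[round_num % len(suffixes)]}" for seed in seeds]

def optimize(initial: list[str], rounds: int = 3, keep: int = 3) -> tuple[str, int]:
    """Keep the top `keep` candidates each round and track the best one overall."""
    pool = list(initial)
    best_text = max(pool, key=score)
    best_score = score(best_text)
    for round_num in range(rounds):
        survivors = heapq.nlargest(keep, pool, key=score)
        pool = survivors + generate(survivors, round_num)
        top = max(pool, key=score)
        if score(top) > best_score:  # a later round can only improve the winner
            best_text, best_score = top, score(top)
    return best_text, best_score

winner, winner_score = optimize(
    ["Try our AI reviewer", "Ship cleaner code", "Reviews done by robots", "Code review"]
)
```

Keeping survivors in the pool means a round that produces only weaker variants cannot displace an earlier winner.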

Bonus: Emerging Patterns

Three patterns that are powerful in specific scenarios but less commonly deployed in production today.

Debate / Adversarial: Set up agents on opposing sides and let them challenge each other. A judge agent listens to both and makes the final call. Best for high-stakes decisions, fact verification, and red-teaming. Sample use case: an investment system where a Bull agent argues for, a Bear agent argues against, and a Judge synthesizes.
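
A minimal sketch of the Bull, Bear, and Judge structure follows, with hard-coded stand-ins for the three agents. The `strength` field is an assumed proxy for whatever score a judge model would assign. All names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Argument:
    side: str
    claim: str
    strength: int  # stand-in for a judge-assigned score from 1 to 10

def bull(topic: str, round_num: int) -> Argument:
    return Argument("bull", f"{topic}: revenue is growing (round {round_num})", 6 + round_num)

def bear(topic: str, rebutting: Argument) -> Argument:
    # The bear sees the bull's latest claim, which is what makes this a debate.
    return Argument("bear", f"Rebuttal to '{rebutting.claim}': margins are shrinking", 7)

def judge(transcript: list[Argument]) -> str:
    """Weigh both sides and return a verdict."""
    totals = {"bull": 0, "bear": 0}
    for argument in transcript:
        totals[argument.side] += argument.strength
    if totals["bull"] == totals["bear"]:
        return "hold"
    return "buy" if totals["bull"] > totals["bear"] else "sell"

def debate(topic: str, rounds: int = 2) -> tuple[str, list[Argument]]:
    transcript: list[Argument] = []
    for round_num in range(1, rounds + 1):
        opening = bull(topic, round_num)
        transcript.append(opening)
        transcript.append(bear(topic, rebutting=opening))
    return judge(transcript), transcript

verdict, transcript = debate("ACME", rounds=2)
tied_verdict, _ = debate("ACME", rounds=1)
```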

Multi-Agent Group Chat: Multiple agents sit in a shared conversation, each contributing from their expertise. Best for brainstorming and simulating cross-functional team discussions. Expensive — every agent reads every message. AutoGen’s GroupChat + GroupChatManager has the best native support here.
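
The cost warning can be made concrete with a back-of-the-envelope model. The sketch assumes each speaker rereads the entire transcript before replying and ignores any extra calls a chat manager makes to pick the next speaker, so treat it as a lower bound.

```python
def messages_read(turns: int, opening_messages: int = 1) -> int:
    """Total messages consumed when each speaker rereads the whole transcript."""
    total = 0
    transcript_len = opening_messages
    for _ in range(turns):
        total += transcript_len  # the speaker reads everything said so far
        transcript_len += 1      # then appends one new message
    return total

reads_for_4 = messages_read(4)
reads_for_8 = messages_read(8)
reads_for_16 = messages_read(16)
```

Doubling the conversation from 8 to 16 turns nearly quadruples the reading load (36 versus 136 messages), which is why group chats get expensive faster than the turn count suggests.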

Mixture of Agents (MoA): Use multiple different models to generate diverse responses, then refine and aggregate. Different models have different strengths — together, they’re better than any one alone. Best for maximum accuracy when cost and latency aren’t constraints. Sample use case: a legal contract review where Claude, GPT-4, and Gemini each analyze from different angles, and an aggregator synthesizes.
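
Structurally, MoA is a stack of proposer layers followed by an aggregator, where each layer sees the previous layer's answers. The sketch below mirrors that shape. Sets of findings stand in for model outputs, and a plain union stands in for the aggregator model. The proposer names and findings are invented for illustration.

```python
from typing import Callable

Proposer = Callable[[str, list[set]], set]

def make_proposer(own_findings: set) -> Proposer:
    """Build a stand-in proposer; a real one would prompt an LLM."""
    def propose(question: str, previous: list[set]) -> set:
        merged = set(own_findings)
        for answer in previous:  # refine using the prior layer's answers
            merged |= answer
        return merged
    return propose

def mixture_of_agents(question: str, proposers: list[Proposer], layers: int = 2) -> list[str]:
    previous: list[set] = []
    for _ in range(layers):
        # Every proposer in this layer sees all answers from the layer before.
        previous = [propose(question, previous) for propose in proposers]
    # Aggregator: a plain union here, where a real system would use a model.
    return sorted(set().union(*previous))

proposers = [
    make_proposer({"indemnity cap missing"}),
    make_proposer({"auto-renewal clause"}),
    make_proposer({"indemnity cap missing", "governing law unclear"}),
]
findings = mixture_of_agents("Review this contract", proposers)
```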

The Decision Framework

Das closes with a simple decision framework that maps your core need to the right pattern:

Need	Pattern
Can one agent handle it?	Don’t use multi-agent. Keep it simple.
Specialization?	Orchestrator-Worker or Router
Speed?	Parallel Fan-Out
Accuracy?	Reflection, Debate, or Mixture of Agents
Adaptability?	Plan + Execute
Graceful escalation?	Handoff

Combining Patterns

In practice, production systems rarely use a single pattern in isolation. Das highlights common combinations: Router + Reflection (route to specialists, each with a quality loop), Orchestrator + Parallel + Reflection (decompose, fan out, each worker self-critiques, then aggregate), Plan + Execute + Debate (plan, execute, and use debate at critical decision points), and Router + Handoff + Human-in-the-Loop (classify, handle, escalate to humans if confidence is low).
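
As one illustration of composition, here is a minimal Router + Reflection sketch: a single routing decision, followed by a bounded quality loop inside the chosen specialist. The three functions are keyword-based stand-ins for LLM calls, and every name is hypothetical.

```python
def classify(query: str) -> str:
    """Stand-in router: keyword match instead of an LLM classifier."""
    return "billing" if "invoice" in query.lower() else "technical"

def specialist(kind: str, query: str, feedback: str = "") -> str:
    """Stand-in specialist: the draft improves once feedback is supplied."""
    answer = f"[{kind}] answer to: {query}"
    return answer + " (with steps)" if feedback else answer

def critique(answer: str) -> str:
    """Stand-in reviewer: returns feedback, or an empty string to approve."""
    return "" if "(with steps)" in answer else "add concrete steps"

def route_then_reflect(query: str, max_rounds: int = 3) -> tuple[str, int]:
    kind = classify(query)  # Router: one decision, made once
    answer = specialist(kind, query)
    for round_num in range(1, max_rounds + 1):  # Reflection: bounded loop
        feedback = critique(answer)
        if not feedback:
            return answer, round_num
        answer = specialist(kind, query, feedback)
    return answer, max_rounds

answer, rounds_used = route_then_reflect("Where is my invoice?")
```

The router never re-enters the picture after its first call, so the two patterns stay cleanly separated even when combined.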

Rubber-Ducking the Jargon

Orchestrator: A manager agent that decomposes tasks, delegates to workers, monitors progress, and reassembles results. Unlike a simple splitter, it reasons and adapts.

Fan-Out / Fan-In: Scatter-gather. Fan-out = split a task into parallel sub-tasks. Fan-in = collect and merge results. Borrowed from distributed systems terminology.

Reflection: An agent loop where output is reviewed and iteratively improved. The “reviewer” can be the same agent with a different prompt, or a separate specialized agent.

Router / Dispatch: A classification step that sends each input to exactly one specialist. Makes one decision and exits — no coordination, no aggregation.

Handoff: Mid-conversation transfer from one agent to another, including full context/state. Distinguished from routing (which happens upfront) by occurring after an agent realizes it can’t complete the task.

Mixture of Agents (MoA): Ensemble approach using multiple different LLMs, inspired by “Mixture-of-Agents Enhances Large Language Model Capabilities” (2024). Each model contributes unique strengths.

What to Watch Out For

This is a practitioner’s taxonomy, not an academic one. Das is writing from production experience, not from a formal analysis of multi-agent theory. The patterns are presented as clean categories, but real systems are messy — and as the “Combining Patterns” section acknowledges, you’ll usually blend two or three.

The framework comparisons are useful but will date quickly — all five frameworks are under active development. The article also doesn’t address cost analysis or failure modes in depth. Running multiple agents means multiplying API calls, and coordination failures (agents misunderstanding each other, infinite loops in reflection, routing errors) are real production concerns.
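
One inexpensive guard against both runaway cost and infinite loops is a shared call budget that every agent must charge before calling a model. This is a minimal sketch. The `CallBudget` class is a hypothetical helper, not something the article or any of the frameworks provides.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run tries to exceed its call allowance."""

class CallBudget:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.used = 0

    def charge(self, label: str) -> None:
        """Record one model call, or refuse if the budget is spent."""
        if self.used >= self.max_calls:
            raise BudgetExceeded(
                f"budget of {self.max_calls} calls spent before '{label}'"
            )
        self.used += 1

def never_satisfied_loop(budget: CallBudget) -> None:
    """A reflection loop whose reviewer never approves, to show the guard firing."""
    while True:
        budget.charge("generate")
        budget.charge("review")

budget = CallBudget(max_calls=6)
try:
    never_satisfied_loop(budget)
    stopped_because = ""
except BudgetExceeded as exc:
    stopped_because = str(exc)
```

A budget like this complements the per-pattern limits (`MAX_ITERATIONS`, `rounds`) by capping the total across combined patterns.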

Finally, the article is a Medium blog post, not a peer-reviewed paper. It’s well-sourced and practical, but treat it as an experienced engineer’s field guide rather than a formal reference.

So What?

If you’re building AI systems beyond a single chatbot, this taxonomy gives you the vocabulary and decision framework to choose your architecture deliberately. The practical takeaways: start with the simplest pattern that could work (often a single agent), escalate to Pipeline or Router when you need structure, and only reach for Orchestrator-Worker or Plan + Execute when you genuinely need dynamic coordination. The most common mistake is over-engineering — reaching for multi-agent when good tool use and a clean single-agent flow would suffice.

The code examples for each pattern give you a starting template you can adapt to your framework of choice. And the decision framework above is worth printing out and sticking on your monitor.

Reproduction & Implementation

Environment Setup

# Core dependencies
pip install langchain langchain-openai langgraph

# Framework-specific (pick your stack)
pip install crewai           # CrewAI
pip install pyautogen        # AutoGen
pip install google-adk       # Google ADK

# Set your API key
export OPENAI_API_KEY="sk-..."

Pattern Selection Pseudo-Code

def select_pattern(requirements: dict) -> str:
    """Decision framework from the article."""
    if requirements.get("single_agent_sufficient"):
        return "No multi-agent needed"

    if requirements.get("fixed_ordered_stages"):
        return "Sequential / Pipeline"

    if requirements.get("diverse_input_types"):
        if requirements.get("one_specialist_per_input"):
            return "Router / Dispatch"

    if requirements.get("independent_subtasks"):
        if requirements.get("latency_critical"):
            return "Parallel Fan-Out → Fan-In"

    if requirements.get("dynamic_decomposition"):
        if requirements.get("realtime_coordination"):
            return "Orchestrator-Worker"
        else:
            return "Planning + Execution"

    if requirements.get("quality_iteration"):
        if requirements.get("clear_scoring_function"):
            return "Evaluator-Optimizer Loop"
        else:
            return "Reflection / Self-Critique"

    if requirements.get("escalation_tiers"):
        return "Handoff"

    if requirements.get("high_stakes_decisions"):
        return "Debate / Adversarial"

    return "Start with single agent + good tools"

Resources & Links

Original Article:

Multi-Agent AI Patterns for Developers — Suman Das (Medium, Apr 2026)
