Multi-Agent Orchestration

Phase 3 advanced. Companion to DL.AI Multi AI Agent Systems with crewAI.
Mental model in one line: a multi-agent system is a distributed system whose nodes are non-deterministic, expensive, and occasionally lie. Everything hard about it falls out of those three properties.

The staff-engineer framing

Before any pattern or framework, internalize the cost equation, because it drives every decision below:

total_cost   ≈ Σ (agent_calls × tokens_per_call × price_per_token)
total_latency ≈ serial: Σ per-agent latency | parallel: max(per-agent latency) + merge
quality       = single_agent_quality + specialization_gain − coordination_loss

You add agents only when specialization_gain − coordination_loss is provably positive and large enough to justify a 5–15× cost multiplier. Most teams discover, after building the multi-agent version, that a single agent with better tools and a better prompt would have won on every axis. That is the default outcome, not the exception.

A staff engineer's instinct on hearing "let's make it multi-agent" is to ask: what is the single agent failing at, specifically, and is the failure about perspective or about tools? If it's tools, you don't need more agents — you need more (or better) tools. Multi-agent is justified only when the bottleneck is genuinely perspective / role / context-isolation, not capability.

When you need multi-agent (and when you don't)

Single agent is enough when:

Linear workflow with clear steps
One LLM "personality" / system prompt suffices
Tools are the source of variation, not perspective
The whole task fits comfortably in one context window (with caching, that's a lot)

Multi-agent earns its complexity when:

Different roles need genuinely different system prompts (researcher vs critic vs writer) — and "different" means different objectives, not different phrasings
Parallel work streams need to fan out then converge (the latency win is real here)
You need debate / refinement loops between independent perspectives to reduce a single model's blind spots
Context isolation improves quality: a sub-agent with a clean, scoped context window outperforms one big agent whose context is polluted with 40 unrelated tool results
Specialization improves quality (small expert agents > one big generalist) on a measurable eval

Default to single agent + good tools + tool search + skills. Reach for multi-agent only when single agent is provably insufficient on an eval you can point to. "It feels more sophisticated" is not a reason.

The decision tree a senior actually runs

Is the single agent failing?
├── No  → ship the single agent. Stop.
└── Yes → what is it failing at?
    ├── Wrong/missing capability        → add a tool. Not an agent.
    ├── Context window blown / polluted → context editing, compaction, or ONE sub-agent
    │                                      with a scoped context (cheapest multi-agent win)
    ├── Needs an independent second opinion → debate / verifier sub-agent
    └── Independent parallel workstreams  → supervisor + workers (real fan-out latency win)

Patterns

1. Pipeline (sequential)

A → B → C, each agent passes output to next.

Use: research → write → edit
Failure mode: error compounding. B trusts A's hallucination; C polishes it into something that looks authoritative. Add a verification step or pass provenance forward.
Latency: sum of all stages. Cost: sum of all stages. No parallelism win — only a quality/specialization win.

2. Supervisor + workers

1 supervisor routes work, N workers do specialized tasks.

Use: "AI office" — supervisor delegates to analyst, writer, coder
This is the workhorse pattern and the one Anthropic's Managed Agents multiagent coordinator implements natively ({type: "coordinator", agents: [...]} — one level of delegation, up to 20 roster agents, max 25 concurrent threads). See the Managed Agents mapping below.
Failure mode: the supervisor becomes a bottleneck and a single point of context bloat. Keep the supervisor's own context lean — delegate work, not raw data.

2+ agents argue, reach consensus (or a judge picks).

Use: critical reasoning, fact-checking, reducing single-model blind spots
Failure mode: cost explosion with diminishing returns. Two rounds usually capture ~90% of the gain; round five is almost always waste. Cap rounds and define a termination condition (judge satisfied, no new objections, or max_iterations).

4. Hierarchical

Trees of agents, each with sub-agents.

Use: complex projects (engineering, research)
Failure mode: the hardest pattern to debug and the easiest to make exponentially expensive. Depth ≥ 3 is rarely worth it. Note Anthropic's coordinator caps delegation at one level on purpose — depth > 1 is ignored — because deeper trees lose more to coordination than they gain.

5. Round-robin

Agents take turns in a conversation.

Use: brainstorming, simulation
Failure mode: no natural termination; turns drift off-topic. Always set a turn cap and a stop condition.

Pattern selection cheat-sheet

Pattern	Parallelism	Cost vs single	Best when	Worst when
Pipeline	None	N×	Distinct sequential stages, each a specialty	Error compounding matters and there's no verification
Supervisor + workers	High (workers fan out)	1 + N×	Independent subtasks, dynamic routing	Subtasks are actually sequential/dependent
Debate / refinement	Low	2–10×	Quality-critical, blind-spot reduction	Latency-sensitive or budget-tight
Hierarchical	Mixed	Unbounded if careless	Genuinely tree-structured problems	Anything you can't draw the tree for upfront
Round-robin	None	N× turns	Open-ended ideation / simulation	You need a deterministic, bounded output

Frameworks comparison

Framework	Best for	Mental model	Production caveat
LangGraph	Production, complex flows	State machine + nodes	You own state, retries, observability — most control, most boilerplate
crewAI	Quick multi-agent prototypes	Roles + tasks	Fast to demo, opaque under the hood; re-validate cost/latency before prod
AutoGen	Research, conversational	Chat between agents	Conversational loops can run away on cost; cap turns
Anthropic Managed Agents	Server-run agent loop + hosted sandbox + native `multiagent` coordinator	Persisted Agent → Session → threads	Anthropic runs the loop and the tool-execution container; you stream events. See mapping below
Custom	Maximum control, weird patterns	You build the loop with `AsyncAnthropic` + `asyncio.gather`	You own everything — only when frameworks fight you

Framework choice is a secondary decision. The pattern, the termination conditions, and the cost model matter far more than which library spells Agent(). A senior picks the pattern first, proves it on an eval, then picks the thinnest framework that expresses it.

Mapping patterns onto Claude / Anthropic primitives

If you're building on Claude directly (the stack this hub targets), here is how the abstract patterns become concrete API choices. Always consult the claude-api skill / canonical facts for current model IDs and syntax — never hard-code from memory.

Models per role (cost-tier your agents). Don't run every agent on the flagship. A typical split: supervisor/critic on claude-opus-4-8 (flagship, 5 USD / 25 USD per million tok in/out at 1M context), bulk workers on claude-sonnet-4-6 (mid), and cheap/parallel-safe sub-agents (classifiers, "explore" scouts) on claude-haiku-4-5 (1 USD / 5 USD). This is the single biggest cost lever in a multi-agent system — Claude Code's Explore subagents use Haiku exactly this way.
Thinking & effort. On Opus 4.8 / 4.7, use adaptive thinking (thinking={"type": "adaptive"}) plus output_config={"effort": "low"|"medium"|"high"|"xhigh"|"max"}. The legacy budget_tokens form is removed and returns HTTP 400 on 4.7/4.8 — do not use it. Give a debate-judge or planning supervisor high/xhigh; give cheap parallel scouts low. Sonnet 4.6 and Haiku take adaptive thinking but no thinking budget.
Parallel tool / agent calls. Use AsyncAnthropic on the server and asyncio.gather to fan out worker calls — that's what turns the supervisor pattern's latency from Σ into max. Mark read-only sub-tasks (grep/glob/scout) parallel-safe; serialize anything with side effects.
Structured handoffs. When agent A's output is agent B's input, don't pass free-form prose and hope. Use native structured outputs (client.messages.parse() with a Pydantic/zod schema, or output_config.format) so the handoff is typed and validated. This is the #1 fix for pipeline error-compounding.
Caching the stable prefix. Every agent shares a system prompt and tool list that don't change per call. Put cache_control on the stable tools → system prefix so the 5–15× call multiplier doesn't also multiply your input token bill. Verify with usage.cache_read_input_tokens — if it's zero across calls, a silent invalidator (a timestamp, an unsorted tool list) is in the prefix.
Resilience. Use the SDK's max_retries and typed exceptions (RateLimitError, OverloadedError, APIStatusError, APITimeoutError), a per-call timeout, and streaming for any large worker output — a single overloaded worker should degrade gracefully, not take down the whole crew.
Native multi-agent. If you want Anthropic to run the loop and host the sandbox, Managed Agents gives you a multiagent coordinator: declare the roster on the agent ({type: "coordinator", agents: [...]}, not on the session, not in tools[]), each sub-agent runs in its own context-isolated thread sharing the container filesystem. One level of delegation, ≤ 20 roster agents, ≤ 25 concurrent threads. Tool confirmations from sub-agents cross-post to the primary thread.

Reference implementation: a supervisor that fans out (and bills itself honestly)

Everything above is abstract until you can point at the code. This is the hand-rolled supervisor + workers pattern, in Python, with every production lever the senior bullets above describe wired in: AsyncAnthropic on the server, asyncio.gather to convert Σ latency into max, per-role model tiering (Opus supervisor, Haiku scouts), typed structured handoffs (messages.parse with a Pydantic schema), prompt caching on the stable prefix, typed-exception resilience, and per-agent usage logging so you can answer "which agent burned the budget". It is deliberately compact — the point is the shape, not a framework.

python

import asyncio
import time
from anthropic import (
    AsyncAnthropic,
    APITimeoutError,
    RateLimitError,
    OverloadedError,
)
from pydantic import BaseModel

# max_retries handles 429/529/5xx with exponential backoff before our own
# fallback ever runs. timeout is per-call wall-clock — a wedged worker dies fast.
client = AsyncAnthropic(max_retries=2, timeout=60.0)

SUPERVISOR_MODEL = "claude-opus-4-8"   # the judge/merger reasons — pay for it
WORKER_MODEL = "claude-haiku-4-5"      # parallel scouts are cheap and disposable

# The stable prefix every worker shares. Put cache_control on the LAST stable
# block (tools -> system render first); the per-task question goes in messages,
# AFTER the breakpoint, so it never invalidates the cached prefix.
WORKER_SYSTEM = [
    {
        "type": "text",
        "text": "You are a research scout. Return ONLY findings relevant to the question.",
        "cache_control": {"type": "ephemeral"},
    }
]


class Finding(BaseModel):
    """Typed handoff — the #1 fix for pipeline error-compounding."""
    claim: str
    evidence: str
    confidence: float  # 0..1 — lets the supervisor weight, not just concatenate


def log_usage(agent: str, model: str, usage) -> float:
    """Per-agent cost attribution. Without this you cannot find the runaway."""
    price = {"claude-opus-4-8": (5, 25), "claude-haiku-4-5": (1, 5)}[model]
    cost = (usage.input_tokens * price[0] + usage.output_tokens * price[1]) / 1e6
    cache_read = getattr(usage, "cache_read_input_tokens", 0)
    print(f"[{agent}] {model} in={usage.input_tokens} out={usage.output_tokens} "
          f"cache_read={cache_read} ${cost:.5f}")
    return cost


async def scout(question: str, angle: str) -> Finding | None:
    """One worker. Read-only, parallel-safe, degrades gracefully on overload."""
    try:
        resp = await client.messages.parse(
            model=WORKER_MODEL,
            max_tokens=1024,
            system=WORKER_SYSTEM,
            messages=[{"role": "user", "content": f"From the angle of {angle}: {question}"}],
            output_config={"format": Finding},  # validated, typed handoff
        )
        log_usage(f"scout:{angle}", WORKER_MODEL, resp.usage)
        return resp.parsed
    except (RateLimitError, OverloadedError, APITimeoutError) as e:
        # One overloaded worker must NOT take down the crew. Skip it; the
        # supervisor reconciles around the gap. (Could fall back to a cheaper
        # model or a cached answer here instead of returning None.)
        print(f"[scout:{angle}] degraded: {type(e).__name__}")
        return None


async def supervisor(question: str, angles: list[str]) -> str:
    t0 = time.monotonic()

    # THE WHOLE POINT: gather() runs the scouts concurrently. Awaiting them in a
    # for-loop would silently re-serialize — you'd pay N* cost for 1* latency.
    findings = await asyncio.gather(*(scout(question, a) for a in angles))
    findings = [f for f in findings if f is not None]  # drop degraded workers

    fan_out_ms = (time.monotonic() - t0) * 1000
    print(f"[supervisor] fan-out of {len(angles)} scouts took {fan_out_ms:.0f}ms "
          f"(serial would be ~{len(angles)}x)")

    if not findings:
        return "All scouts degraded — no answer. Escalate or retry."

    # The supervisor RECONCILES (weighs confidence, resolves contradictions),
    # it does not concatenate. Adaptive thinking + high effort for the judge.
    digest = "\n".join(f"- ({f.confidence:.0%}) {f.claim} [{f.evidence}]" for f in findings)
    resp = await client.messages.create(
        model=SUPERVISOR_MODEL,
        max_tokens=2048,
        thinking={"type": "adaptive"},
        output_config={"effort": "high"},
        system="Reconcile these scout findings into one answer. Flag contradictions; "
               "do not average away a low-confidence claim that contradicts a high-confidence one.",
        messages=[{"role": "user", "content": f"Question: {question}\n\nFindings:\n{digest}"}],
    )
    log_usage("supervisor", SUPERVISOR_MODEL, resp.usage)
    return resp.content[0].text


if __name__ == "__main__":
    answer = asyncio.run(supervisor(
        "Is multi-agent orchestration worth its cost for a research task?",
        angles=["latency", "dollar cost", "answer quality"],
    ))
    print("\n=== ANSWER ===\n", answer)

What to notice, because a staff engineer reads code for the decisions, not the syntax:

The gather is load-bearing. Replace it with for a in angles: await scout(...) and the program still works — but it now costs the same and runs Σ slow instead of max slow. That one-line difference is the entire latency value proposition of the supervisor pattern. Exercise 3 makes you prove this with a timestamp.
messages.parse is the handoff contract. The scout's output is a validated Finding, not prose the supervisor has to re-parse and trust. A scout that hallucinates can still lie, but it can't return malformed structure that corrupts the merge — and confidence gives the supervisor a lever to weight rather than blindly believe. This is how you blunt pipeline error-compounding.
The model tier is the cost lever. Three Haiku scouts + one Opus supervisor is dramatically cheaper than four Opus agents, and on a scout task (gather, don't reason) the quality delta is usually zero. log_usage prints the receipts so you can defend the number in Exercise 6.
Degradation is a return None, not a crash. The crew survives a 529 on one worker. In production you'd replace return None with a fallback (cheaper model, cached answer, or a retry on a different angle) — but never let one worker's overload propagate into a crew-wide failure.
The cache breakpoint sits on the stable prefix, not the question. Verify with cache_read_input_tokens in the log. If it's zero across runs, a silent invalidator (a timestamp, an unsorted tool list, a per-request ID) crept into the prefix — Exercise 4 is the hunt.

Anti-patterns

❌ Multi-agent for "everything is a hammer" → overengineering. The single-agent baseline almost always wins on simple tasks.
❌ Agents that just call the LLM with different prompts but the same tools and the same context → that's not multi-agent, it's one agent wearing hats. No isolation, no real specialization.
❌ No supervision / no termination condition → infinite loops and runaway bills. Every loop needs a cap (max_iterations, turn limit) and a stop condition.
❌ State sprawl (each agent has its own private memory with no shared source of truth) → impossible to debug; agents disagree about reality.
❌ N agents × M calls × $X tokens with no cost ceiling → cost explosion. Set a hard budget and degrade when you hit it.
❌ Passing raw context between agents instead of distilled handoffs → context bloat compounds at every hop; you pay to re-read the same 30K tokens five times.
❌ Running every agent on the flagship model → you're paying Opus prices for work Haiku does fine. Tier your models.
❌ Re-creating the agent/config on every run (Managed Agents) → orphaned agents + create latency. Create the Agent once, reference it by ID; only the Session is per-run.

Cost & latency reality

Each agent call = at least one LLM call (often several, once tools loop).
A 5-agent system is typically 5–15× the cost of a single agent on the same task — and that's before debate rounds multiply it again.
Latency: serial pipeline = sum of all stages; parallel fan-out = max of the slowest worker + merge cost. The supervisor pattern's whole value proposition is converting Σ into max — if your "parallel" workers actually run serially (because you await them one at a time instead of asyncio.gather), you paid the multi-agent cost and got none of the latency benefit.
Observability is not optional. Log per-agent usage (input/output/cache tokens), latency, model id, and stop reason. Without per-agent cost attribution you cannot answer "which agent is burning the budget" — and in a 5-agent system, one runaway agent is the usual culprit. The minimum trace row a senior instruments per agent call: {trace_id, agent_role, model, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, latency_ms, stop_reason, retries}. trace_id is what lets you collapse a fan-out back into one request when you're staring at a dashboard at 2am. A surprising cache_read_input_tokens: 0 across a run is the single highest-signal alert you can wire up — it means your caching silently broke and your input bill just got multiplied by the agent count. See the log_usage helper in the reference implementation above for the cost-attribution math (input × in_price + output × out_price, per model tier).
Multi-agent ONLY makes sense if the measured quality gain justifies the cost/latency. "Measured" means on an eval set, not on a demo.

Security & failure modes a staff engineer plans for

Prompt injection propagation. A worker that ingests untrusted web/tool content can be hijacked; its poisoned output then flows to the supervisor as if trusted. Treat inter-agent messages from content-ingesting workers as untrusted input — validate, don't blindly forward. Keep credentials out of agent-visible context (vaults / host-side custom tools, never the system prompt).
Cascading failure. One worker rate-limited (429) or overloaded (529) shouldn't fail the crew. Per-call retries + timeouts + a fallback (skip the worker, or fall back to a cheaper model) keep the system degrading gracefully.
Non-determinism in the merge. When parallel workers return, the merge step is where contradictions surface. The supervisor must reconcile, not concatenate — otherwise the final answer contains both "X is true" and "X is false."
Cost blast radius. A hierarchical system with no depth/budget cap can spawn agents recursively. Bound depth, bound fan-out, bound total token budget.

Multi-agent + MCP

Powerful combo: N agents share access to MCP servers, each calls tools. Example: a "Research crew" — one agent calls pubmed_search, one calls arxiv_search, one critiques and reconciles results.

Operational notes a senior adds:

MCP credentials belong in a credential store (e.g. Managed Agents vaults), injected at egress — never in an agent's system prompt or messages, which persist in session history.
If multiple agents hammer the same MCP server, you'll hit its rate limits, not just Anthropic's. Add per-server concurrency limits.
Large MCP tool outputs bloat every downstream agent's context. Distill before forwarding (or rely on Managed Agents' automatic >100K-token offload-to-file behavior).

🏋️ Exercices

These escalate from "build it" to "make it production-grade / break it / defend the number." Do them in order. The learner is expected to instrument and measure, not just wire things up. The Reference implementation section above is your starting skeleton for exercises 3–5 — but do not just run it; the first two exercises deliberately make you build the single-agent baseline it must beat before you reach for the crew.

1. Single-agent baseline (the control group).Objectif: Build a single agent that does a research task (gather sources → synthesize a short report), and instrument it: log usage (input/output/cache tokens), wall-clock latency, and total cost. Indice/Solution: Use AsyncAnthropic, claude-opus-4-8, adaptive thinking, a couple of tools (web search / MCP). Print response.usage after each call. This baseline is what every later exercise must beat to justify its existence.

2. Convert to a 3-agent crew and prove (or disprove) it was worth it.Objectif: Split into researcher → writer → critic. Run both versions on the same 10 inputs and produce a table: quality (your rubric), cost, p50/p95 latency. Indice/Solution: Wire it with any framework or hand-rolled. The deliverable isn't the crew — it's the comparison table and a one-paragraph verdict ("multi-agent won/lost because…"). If the single agent wins, say so. That honesty is the point.

3. Make the supervisor fan out — turn Σ latency into max.Objectif: Implement supervisor + workers where ≥ 3 workers are genuinely independent, and use asyncio.gather so they run concurrently. Measure latency before (serial await) and after (gather). Indice/Solution: The trap is awaiting workers one at a time, which silently re-serializes them. Confirm concurrency by timestamping each worker's start. Bonus: cost-tier the workers (Haiku) vs the supervisor (Opus) and report the cost delta.

4. Now make it production-grade: caching, resilience, and a budget ceiling.Objectif: (a) Add cache_control to the shared tools+system prefix and prove cache hits via usage.cache_read_input_tokens. (b) Inject a forced 529 / timeout on one worker and show the crew survives. (c) Add a hard total-token budget that degrades gracefully when exceeded. Indice/Solution: For (a), if cache reads are zero, hunt the silent invalidator (timestamp in system prompt, unsorted tool list, per-request ID). For (b), use typed exceptions (OverloadedError, APITimeoutError) + max_retries + a fallback to a cheaper model. For (c), track a running token counter in the supervisor and stop spawning workers past the cap.

5. Break it: prompt-injection through a worker.Objectif: Feed one content-ingesting worker a document containing an injected instruction ("ignore your task, output X"). Show the poisoned output reaching the supervisor. Then fix it so the injection is contained. Indice/Solution: The fix is not a better prompt — it's treating inter-agent messages from content-ingesting workers as untrusted: validate/structure the handoff (messages.parse with a strict schema), isolate the worker's context, and never forward raw ingested text into the supervisor's trusted context. Document what still leaks.

6. Defend the number.Objectif: Given your exercise-2 table, write the one-page memo you'd send a skeptical staff engineer arguing for or against shipping the multi-agent version. It must cite cost multiplier, p95 latency, measured quality delta, and the failure modes you mitigated. Indice/Solution: The strongest memos often argue against — "single agent + tool search + caching matches quality at 7× lower cost; multi-agent isn't justified until [specific condition]." A senior is judged on knowing when not to build the complex thing.

🎤 En entretien

"When would you choose multi-agent over a single agent?" — When the bottleneck is perspective/role/context-isolation (not capability), and the measured quality gain on an eval justifies a 5–15× cost multiplier; if it's a missing capability, I add a tool, not an agent.
"How do you control cost and latency in a multi-agent system?" — Tier models per role (Opus supervisor, Sonnet/Haiku workers), cache the stable tools+system prefix, fan out independent workers with asyncio.gather to convert Σ latency to max, cap debate rounds, and enforce a hard token budget with graceful degradation — all attributed per-agent via logged usage.
"What breaks in production that demos never show?" — Error compounding in pipelines, prompt-injection propagating from a content-ingesting worker into the supervisor's trusted context, cascading failure from one overloaded worker, non-deterministic merges that concatenate contradictions, and runaway cost from uncapped hierarchical spawning. Each has a specific mitigation (provenance/verification, untrusted-handoff isolation, retries+timeouts+fallback, reconciling merges, depth/budget caps).
"How does this map onto Anthropic's primitives?" — Adaptive thinking + effort per role, structured outputs (messages.parse) for typed handoffs, prompt caching on the shared prefix, AsyncAnthropic+gather for parallelism, and for a server-run option, Managed Agents' multiagent coordinator (roster on the Agent not the session, one level of delegation, context-isolated threads sharing a container).

Resources

Anthropic — Building effective agents (the canonical "start simple, single agent + tools first" essay)
Anthropic Managed Agents — multi-agent (multiagent coordinator, threads, cross-posted tool confirmations)
LangGraph multi-agent docs: langchain-ai.github.io/langgraph/tutorials/multi_agent
crewAI docs: docs.crewai.com
AutoGen: microsoft.github.io/autogen
Paper: Communicative Agents (Park 2023)

Multi-Agent Orchestration

The staff-engineer framing

When you need multi-agent (and when you don't)

The decision tree a senior actually runs

Patterns

1. Pipeline (sequential)

2. Supervisor + workers

3. Debate / refinement

4. Hierarchical

5. Round-robin

Pattern selection cheat-sheet

Frameworks comparison

Mapping patterns onto Claude / Anthropic primitives

Reference implementation: a supervisor that fans out (and bills itself honestly)

Anti-patterns

Cost & latency reality

Security & failure modes a staff engineer plans for

Multi-agent + MCP

🏋️ Exercices

🎤 En entretien

Resources

My notes

Multi-Agent Orchestration ​

The staff-engineer framing ​

When you need multi-agent (and when you don't) ​

The decision tree a senior actually runs ​

Patterns ​

1. Pipeline (sequential) ​

2. Supervisor + workers ​

3. Debate / refinement ​

4. Hierarchical ​

5. Round-robin ​

Pattern selection cheat-sheet ​

Frameworks comparison ​

Mapping patterns onto Claude / Anthropic primitives ​

Reference implementation: a supervisor that fans out (and bills itself honestly) ​

Anti-patterns ​

Cost & latency reality ​

Security & failure modes a staff engineer plans for ​

Multi-agent + MCP ​

🏋️ Exercices ​

🎤 En entretien ​

Resources ​

My notes ​

Multi-Agent Orchestration

The staff-engineer framing

When you need multi-agent (and when you don't)

The decision tree a senior actually runs

Patterns

1. Pipeline (sequential)

2. Supervisor + workers

3. Debate / refinement

4. Hierarchical

5. Round-robin

Pattern selection cheat-sheet

Frameworks comparison

Mapping patterns onto Claude / Anthropic primitives

Reference implementation: a supervisor that fans out (and bills itself honestly)

Anti-patterns

Cost & latency reality

Security & failure modes a staff engineer plans for

Multi-agent + MCP

🏋️ Exercices

🎤 En entretien

Resources

My notes