Step 8: Multi-Agent Systems — AI Agents Tutorial

Starting Point

I. When Do You Actually Need Multiple Agents?

CrewAI and AutoGen set off the "multi-agent" trend in 2024, but both OpenAI's own 2024 Swarm report and Anthropic's 2025 multi-agent best practices open with the same caution: get the task working on a single agent first, and split it across several agents only when there is a clear justification.

The costs of going multi-agent:

Communication overhead: every hand-off costs an LLM call plus tokens
Error propagation: agent A's bad output becomes agent B's input
Debugging difficulty: traces span multiple agents, so locating a bug is harder
Consistency: different agents may reach contradictory conclusions about the same fact

Four scenarios do justify splitting: (1) the task needs distinct domain "expert personas"; (2) subtasks can run in parallel; (3) trust levels must be isolated (one restricted agent, one privileged agent); (4) the workflow needs an explicit "review / approval" stage.

Four Collaboration Patterns

II. Four Mainstream Multi-Agent Patterns

👑

① Supervisor / Orchestrator

A "manager agent" decides which specialist agent receives each task and aggregates the final answer. Each worker agent sees only its own sub-task.

Best for: tasks that can be cleanly delegated. A common architecture in LangGraph and the OpenAI Agents SDK.

🐝

② Swarm / Hand-off

No manager. Each agent knows "what I can do, and who to hand off to when I can't." The conversation is passed between agents like a relay baton.

Best for: customer support, ticket routing. OpenAI Swarm (2024) and the Agents SDK both use this pattern.

🗣️

③ Debate / Group Chat

Multiple agents discuss in a shared "group chat," and a selector decides who speaks next. A "for vs. against" structure can improve decision quality.

Best for: high-stakes decisions and creative work that needs multiple perspectives. This is AutoGen / AG2's GroupChat pattern.
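The selector loop can be sketched without any framework. In this minimal sketch the agents are plain functions standing in for LLM calls, and the names `pro`, `con`, `judge`, `select_next`, and `run_group_chat` are hypothetical; none of them are AutoGen / AG2 API.

```python
from typing import Callable

# Each "agent" is a plain function standing in for an LLM call (an assumption of this sketch).
Agent = Callable[[list[str]], str]

def pro(history: list[str]) -> str:
    return "PRO: the proposal reduces cost."

def con(history: list[str]) -> str:
    return "CON: the proposal adds operational risk."

def judge(history: list[str]) -> str:
    return f"JUDGE: heard {len(history)} arguments, decision recorded."

AGENTS: dict[str, Agent] = {"pro": pro, "con": con, "judge": judge}

def select_next(turn: int, debate_turns: int) -> str:
    """Selector: alternate pro and con, then give the judge the last word."""
    if turn >= debate_turns:
        return "judge"
    return "pro" if turn % 2 == 0 else "con"

def run_group_chat(debate_turns: int = 4) -> list[str]:
    history: list[str] = []  # the shared "group chat" that every agent can read
    for turn in range(debate_turns + 1):
        speaker = select_next(turn, debate_turns)
        history.append(AGENTS[speaker](history))
    return history

transcript = run_group_chat()
for line in transcript:
    print(line)
```

A real selector would usually be an LLM call or a rule over the transcript; the fixed alternation here only illustrates where that decision sits in the loop.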

🪜

④ Hierarchical

Stacked supervisors, for example CEO agent → department manager agents → worker agents. Deeper hierarchies can handle more complex tasks, but communication overhead explodes.

Best for: enterprise workflow automation and large research-style agents (such as AutoGPT successors).
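A rough count shows how quickly that overhead grows. The sketch below assumes a complete tree in which every supervisor delegates to the same number of children, and every delegation costs one message down and one report back up; the function name `handoff_messages` and the branching factor of 3 are illustrative assumptions, not figures from any framework.

```python
def handoff_messages(branching: int, depth: int) -> int:
    """Messages for one full pass through a complete supervisor tree.

    Assumes every supervisor delegates to `branching` children and every
    delegation costs one message down plus one report back up.
    """
    non_root_agents = sum(branching ** level for level in range(1, depth + 1))
    return 2 * non_root_agents

for depth in (1, 2, 3):
    print(f"depth {depth}: {handoff_messages(branching=3, depth=depth)} messages")
```

Under these assumptions, each extra layer roughly triples the message count, and each message is an LLM call.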

Implementation: Supervisor

III. Supervisor Pattern Example

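# LangGraph version: the supervisor routes between three workers.
# pip install langgraph langchain-anthropic
# Requires ANTHROPIC_API_KEY in the environment.
from typing import Literal, TypedDict

from langchain_anthropic import ChatAnthropic
from langgraph.graph import END, StateGraph

class State(TypedDict, total=False):  # total=False: only "query" is set at the start
    query: str
    next: Literal["researcher", "writer", "reviewer", "end"]
    notes: str
    draft: str
    final: str

llm = ChatAnthropic(model="claude-sonnet-4-6")
ROUTES = {"researcher", "writer", "reviewer", "end"}

def supervisor(s: State):
    # Once the reviewer has produced a final text, stop without another LLM call.
    if s.get("final"):
        return {"next": "end"}
    prompt = f"""You manage a 3-agent team: researcher, writer, reviewer.
    Task: {s['query']}
    State: notes={bool(s.get('notes'))}, draft={bool(s.get('draft'))}
    Decide next: researcher / writer / reviewer / end. Reply ONE word."""
    choice = llm.invoke(prompt).content.strip().lower()
    # The model may reply with something outside the routing table;
    # end the run instead of raising KeyError in the conditional edge.
    return {"next": choice if choice in ROUTES else "end"}

# Workers: each reads only the field it needs and writes exactly one field back.
def researcher(s: State): return {"notes": llm.invoke(f"Research: {s['query']}").content}
def writer(s: State):     return {"draft": llm.invoke(f"Write using notes: {s.get('notes', '')}").content}
def reviewer(s: State):   return {"final": llm.invoke(f"Polish: {s.get('draft', '')}").content}

g = StateGraph(State)
g.add_node("supervisor", supervisor)
g.add_node("researcher", researcher)
g.add_node("writer", writer)
g.add_node("reviewer", reviewer)
g.set_entry_point("supervisor")
g.add_conditional_edges("supervisor", lambda s: s["next"],
    {"researcher": "researcher", "writer": "writer", "reviewer": "reviewer", "end": END})
# Every worker reports back to the supervisor, which decides the next step.
for n in ["researcher", "writer", "reviewer"]:
    g.add_edge(n, "supervisor")

app = g.compile()
# recursion_limit makes the run fail fast if the routing never converges.
result = app.invoke({"query": "Write a 200-word brief on CRISPR base editing"},
                    config={"recursion_limit": 12})
print(result.get("final", ""))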

# CrewAI version of the same three roles (a separate script from the one above).
# Process.sequential runs the tasks in list order, so this is a fixed pipeline,
# not an LLM-routed supervisor.
# pip install crewai
# Requires an LLM provider API key configured in the environment.
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Gather facts", backstory="PhD in molecular biology.")
writer     = Agent(role="Writer",     goal="Compose readable prose", backstory="Science journalist.")
reviewer   = Agent(role="Reviewer",   goal="Verify accuracy", backstory="Editor-in-chief.")

# One task per agent; expected_output tells the agent what shape of result to return.
t1 = Task(description="Research CRISPR base editing", agent=researcher, expected_output="3 bullets")
t2 = Task(description="Write 200-word brief",       agent=writer,     expected_output="draft")
t3 = Task(description="Polish for accuracy",       agent=reviewer,   expected_output="final")

crew = Crew(agents=[researcher, writer, reviewer], tasks=[t1, t2, t3], process=Process.sequential)
print(crew.kickoff())

Implementation: Hand-off

IV. Designing Hand-offs in a Swarm

A hand-off is more than "throw the task to the next agent." The design has to settle:

When to hand off (trigger conditions)
What context to pass along (not everything, or token usage blows up)
How the recipient learns who handed the task over (to avoid endless bouncing back and forth)
What the failure fallback is (if no agent takes the task, route to a human)

The OpenAI Agents SDK treats a hand-off as a "special tool": each agent's tool list includes a "transfer_to_X" function, and the LLM itself decides when to call it.

User ─► [Triage Agent]
              │
   ┌──────────┼──────────┐
   ▼ billing  ▼ tech     ▼ angry
[Billing]   [Tech]   [Retention]
   │          │          │
   └──── all can ─► [Human escalation]
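The four design points can be captured in a small hand-off record plus a dispatcher. This is a framework-free sketch, not the OpenAI Agents SDK API: `HandOff`, `REGISTERED`, and `dispatch` are hypothetical names, and the agent names are taken from the diagram above.

```python
from dataclasses import dataclass

@dataclass
class HandOff:
    sender: str   # who is handing off, so the recipient cannot bounce it straight back unnoticed
    target: str   # requested recipient
    reason: str   # the trigger condition that fired
    summary: str  # condensed context instead of the full history

# Hypothetical set of specialist agents that can accept a hand-off.
REGISTERED = {"billing", "tech", "retention"}

def dispatch(h: HandOff) -> str:
    """Return the agent that takes over, falling back to a human if nobody can."""
    if h.target in REGISTERED and h.target != h.sender:
        return h.target
    return "human"

ok = dispatch(HandOff("triage", "billing", "refund request", "User was charged twice in March."))
fallback = dispatch(HandOff("triage", "legal", "contract dispute", "User disputes the terms of service."))
print(ok, fallback)
```

Keeping `reason` and `summary` as explicit fields makes each hand-off auditable in a trace, which helps with the debugging cost discussed earlier.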

Interactive Simulation

V. Multi-Agent Routing Simulation

Pick a user query and watch how the triage agent hands it off:
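A minimal offline stand-in for the simulation, assuming simple keyword rules in place of the triage agent's LLM decision. The keyword lists, the sample queries, and the names `ROUTES` and `triage` are all illustrative assumptions.

```python
# Keyword rules stand in for the triage agent's LLM decision (an assumption of this sketch).
ROUTES = {
    "billing": ("refund", "invoice", "charge"),
    "tech": ("error", "crash", "login"),
    "retention": ("cancel", "angry", "terrible"),
}

def triage(query: str) -> str:
    text = query.lower()
    for agent, keywords in ROUTES.items():
        if any(word in text for word in keywords):
            return agent
    return "human"  # no specialist matched: escalate

queries = [
    "I was charged twice, I want a refund",
    "The app crashes when I open settings",
    "This is terrible, cancel my account",
    "Can you recommend a good book?",
]
for q in queries:
    print(f"{q!r} -> {triage(q)}")
```

In a real system the routing decision would come from the triage agent's model call; the fall-through to `"human"` mirrors the escalation path in the diagram.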

Pitfalls and Best Practices

VI. Four Pitfalls of Multi-Agent Systems, Plus One Anti-Pattern

① Endless ping-pong

A hands the task to B, and B hands it straight back to A. Fix: log the hand-off chain, forbid reverse transfers, and cap the total number of hand-offs.
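Both rules fit in a few lines. This is a sketch under stated assumptions: `can_hand_off` is a hypothetical helper, and the cap of 5 is an arbitrary illustrative value.

```python
MAX_HANDOFFS = 5  # hypothetical cap; tune per application

def can_hand_off(chain: list[str], target: str) -> bool:
    """`chain` is the ordered list of agents that have held the task so far."""
    if len(chain) - 1 >= MAX_HANDOFFS:  # hand-offs so far = agents in chain minus one
        return False
    if len(chain) >= 2 and chain[-2] == target:  # reverse transfer back to the previous holder
        return False
    return True

chain = ["triage", "billing"]
print(can_hand_off(chain, "tech"))    # new recipient
print(can_hand_off(chain, "triage"))  # billing trying to bounce the task back
```

A stricter variant could reject any agent already present in the chain, at the cost of blocking legitimate returns.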

② Context avalanche

Every hand-off forwards the complete history, so by the fifth agent the context has already passed 100K tokens. Fix: pass only "relevant summary + new task" at each hand-off.
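The difference is easy to see with a back-of-the-envelope model. The figures below (25,000 tokens produced per agent, a 1,000-token summary) are illustrative assumptions rather than measurements, and `context_received` is a hypothetical helper.

```python
from typing import Optional

def context_received(agent_index: int, tokens_per_agent: int,
                     summary_tokens: Optional[int] = None) -> int:
    """Tokens handed to the agent at 1-based position `agent_index`."""
    previous_agents = agent_index - 1
    if previous_agents == 0:
        return 0  # the first agent starts from the user query alone
    if summary_tokens is None:
        return previous_agents * tokens_per_agent  # full history: grows with every hop
    return summary_tokens  # summary only: stays flat

for i in range(1, 6):
    full = context_received(i, tokens_per_agent=25_000)
    lean = context_received(i, tokens_per_agent=25_000, summary_tokens=1_000)
    print(f"agent {i}: full history {full:>7,} tokens | summary {lean:>5,} tokens")
```

Under these assumptions the fifth agent inherits 100,000 tokens with full-history forwarding, while the summary approach stays constant no matter how long the chain gets.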

③ Consistency breakdown

The research agent cites "BRCA1 is pathogenic," while the writing agent cites a different paper and writes "BRCA1 pathogenicity is uncertain." Fix: use a shared vector memory as the single source of truth.
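The idea of a single source of truth can be illustrated with an in-memory store. This sketch swaps the vector memory for an exact-key dictionary and uses a first-writer-wins policy; both are simplifying assumptions, and `SharedFacts` and `assert_fact` are hypothetical names.

```python
class SharedFacts:
    """In-memory stand-in for a shared vector memory.

    Assumption: exact-key lookup replaces embedding search, and the first
    recorded value wins.
    """

    def __init__(self) -> None:
        self._facts: dict[str, tuple[str, str]] = {}  # key -> (value, source)

    def assert_fact(self, key: str, value: str, source: str) -> str:
        """Record a fact, or return the already stored value if one exists."""
        if key not in self._facts:
            self._facts[key] = (value, source)
        return self._facts[key][0]

memory = SharedFacts()
researcher_view = memory.assert_fact("BRCA1 variant status", "pathogenic", "paper A")
writer_view = memory.assert_fact("BRCA1 variant status", "uncertain", "paper B")
print(researcher_view, writer_view)
```

Both agents end up working from the same stored value. A production system would more likely flag the conflicting claim for review than silently keep the first one.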

④ Permission confusion

An agent that should not have write_db permission obtains it through a hand-off. Fix: per-agent ACLs; permissions are not inherited across a hand-off and must be requested again.
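A per-agent ACL check might look like the sketch below. The agent names, the `read_db` permission, and the `call_tool` helper are hypothetical; only `write_db` comes from the text.

```python
# Hypothetical agents and permissions, keyed by the agent that makes the call.
ACL = {
    "triage": {"read_db"},
    "billing": {"read_db", "write_db"},
}

def call_tool(agent: str, tool: str) -> str:
    # Permissions are looked up for the agent actually making the call,
    # never for whoever handed the task over.
    if tool not in ACL.get(agent, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    return f"{agent} ran {tool}"

print(call_tool("billing", "write_db"))
try:
    call_tool("triage", "write_db")  # even if billing handed the task over, triage's own ACL applies
except PermissionError as exc:
    print("denied:", exc)
```

Because the check never consults the hand-off chain, a task passed from a privileged agent to a restricted one cannot carry permissions with it.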

🚫

Anti-pattern: "Let's spin up 10 agents and see what happens." Anthropic's 2025 multi-agent best practices recommend starting with one agent and a well-structured prompt, and splitting only when that fails; every additional agent needs a concrete, stated reason.

🎓 Chapter Quiz

Q1. Which of the following is not a typical cost of multi-agent systems?

A) Communication overhead

B) Error propagation

C) Increased debugging difficulty

D) The model itself gets dumber

✅ D. The model itself does not degrade; what degrades is the overall system, through coordination friction.

Q2. What is the core property of the Supervisor pattern?

A) A manager agent assigns tasks

B) All agents discuss at the same time

C) There is no center at all

D) Two agents debate each other

✅ A. The supervisor is an explicit center; the other options describe Debate / Swarm.

Q3. What is the most direct fix for endless ping-pong hand-offs?

A) Switch to a larger LLM

B) Log the hand-off chain, forbid reverse transfers, and cap the number of hand-offs

C) Set temperature to 1.5

D) Add more agents

✅ B. Hard system rules that cut off repeated transfers are the most effective fix.
