Step 10: Framework Comparison — AI Agents Tutorial

出發點

一、要不要用框架？

2025 年 Anthropic 工程部落格寫過一篇有名的文章「Building effective agents」：對絕大多數任務，純 Python while loop + LLM API + 一份工具註冊表就夠了。框架的價值在於：

提供已測試的設計模式（避免重造輪子）
觀測性 / 持久化 / checkpoint 已內建
多 agent 的 hand-off / 訊息協定已標準化
能畫出圖、視覺化 debug

但框架也有代價：

抽象層讓除錯變難（「為什麼這個 prompt 變成這樣？」）
升級風險（LangChain 0.x → 0.y 的破壞性改動）
過度工程化簡單任務

本章對比六大主流選項，幫你做出有依據的選擇。

Anthropic's famous 2025 engineering post "Building effective agents" argues that for most tasks, a plain Python while-loop + LLM API + a tool registry is enough. The value of a framework is:

Battle-tested patterns (no reinventing wheels)
Built-in observability / persistence / checkpointing
Standardized multi-agent hand-off & messaging
Visualizable graphs for debugging

The cost:

Abstractions make debugging harder ("why did this prompt mutate?")
Upgrade risk (LangChain 0.x → 0.y breaking changes)
Overkill for simple tasks

This chapter compares the six mainstream options to inform a grounded choice.

六大框架

二、2026 主流 Agent 框架對照表

框架	主打	心智模型	多 agent	觀測性	學習曲線	2026 狀態
LangGraph	高可控、複雜流程	Maximum control, complex flows	有向狀態圖	Directed state graph	★★★★★	★★★★★ (LangSmith)	★★★★★ (LangSmith)	中等偏陡	Steep-ish	🚀 生產首選	🚀 production king
CrewAI	快速組「角色團隊」	Quick role-based teams	role + task DSL	role + task DSL	★★★★	★★★	★ 最低	★ lowest	🌱 快速 prototype	🌱 fast prototyping
AutoGen / AG2	會話式多 agent、群聊	Conversational multi-agent, group chat	GroupChat + Speaker	GroupChat + Speaker	★★★★	★★★	中	Moderate	⚠️ Microsoft AutoGen 維護中、AG2 fork 接手	⚠️ MS AutoGen in maintenance; AG2 fork active
LlamaIndex Agents	RAG 為主軸的 agent	RAG-first agents	QueryEngine + Agent	QueryEngine + Agent	★★★	★★★	中	Moderate	📚 文件密集場景	📚 doc-heavy use cases
OpenAI Agents SDK	OpenAI 模型 + Swarm 模式	OpenAI models + Swarm pattern	Agent + Tool + Hand-off	Agent + Tool + Hand-off	★★★★	★★★★ (Traces)	★★ 低	★★ low	🆕 2025 Q1 GA	🆕 2025 Q1 GA
Anthropic Agent SDK	原生 MCP + Claude + sub-agent	Native MCP + Claude + sub-agent	Sub-agent + MCP servers	Sub-agent + MCP servers	★★★★	★★★★	★★ 低	★★ low	🆕 2025 Q4 推出	🆕 launched 2025 Q4

深度解析

三、三大主流框架的代碼風味比較

同樣的「研究 → 寫稿」雙 agent 任務，三種寫法：

The same "researcher → writer" two-agent task, three ways:

from langgraph.graph import StateGraph, END
from typing import TypedDict
class S(TypedDict): query:str; notes:str; draft:str

def research(s): return {"notes": llm.invoke(f"Research: {s['query']}").content}
def write(s):    return {"draft": llm.invoke(f"Write using: {s['notes']}").content}

g = StateGraph(S)
g.add_node("research", research); g.add_node("write", write)
g.set_entry_point("research"); g.add_edge("research","write"); g.add_edge("write", END)
app = g.compile(checkpointer=memory)   # 內建持久化
result = app.invoke({"query":"CRISPR base editing"})

from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Gather 3 facts", backstory="PhD biologist.")
writer     = Agent(role="Writer",     goal="200-word brief", backstory="Science journalist.")

t1 = Task(description="Research CRISPR base editing", agent=researcher, expected_output="3 bullets")
t2 = Task(description="Write 200-word brief",        agent=writer,     expected_output="draft", context=[t1])

crew = Crew(agents=[researcher,writer], tasks=[t1,t2], process=Process.sequential)
result = crew.kickoff()

from openai_agents import Agent, Runner

researcher = Agent(name="Researcher", instructions="Gather 3 key facts on the topic.")
writer     = Agent(name="Writer",     instructions="Write a 200-word brief from the researcher's notes.",
                  handoffs=[])
researcher.handoffs = [writer]   # researcher 完成後可 hand off

result = Runner.run_sync(researcher, "CRISPR base editing")
print(result.final_output)

🔍

觀察風味差異：LangGraph 要求你顯式定義 state 與圖（最 verbose 但最可控）；CrewAI 把 agent 當「角色」描述（最直覺，自然語言為主）；OpenAI SDK 介於兩者間，hand-off 是 first-class citizen。 Style notes: LangGraph forces explicit state and graph (most verbose, most controllable); CrewAI talks about agents as "roles" (most intuitive, natural-language-first); OpenAI SDK sits in between with hand-offs as first-class citizens.

決策樹

四、該選哪個框架？三步決策樹

🌳 選擇你的框架

Q1:任務 < 5 步、單 agent、無持久化需求？< 5 steps, single agent, no persistence?▶ Pure Python loop

Q2:否則，需要正規生產部署 + 複雜流程 + 完整觀測？Else, formal production + complex flow + full observability?▶ LangGraph

Q3:需要快速 prototype 多 agent 團隊、非工程師也能改？Quick multi-agent prototype, non-engineers can tweak?▶ CrewAI

Q4:需要 agent 群聊辯論、人類 proxy 介入？Group-chat debate, human-proxy intervention?▶ AG2 (or AutoGen)

Q5:主軸是 RAG / 文件密集？Mostly RAG / doc-heavy?▶ LlamaIndex Agents

Q6:完全綁 OpenAI / Anthropic 一家、想用最薄抽象？Pinned to one vendor, thinnest abstraction?▶ OpenAI Agents SDK / Anthropic SDK

實務告白

五、選框架時的五個務實考量

① 團隊熟悉度

選團隊已熟悉的框架，學習成本比框架本身優劣更重要。LangChain 生態最大，找人最容易。

Pick what your team already knows — learning cost matters more than fine differences. LangChain has the largest ecosystem and talent pool.

② 模型廠商鎖定

OpenAI Agents SDK 與 Anthropic SDK 都只支援自家模型。LangGraph / CrewAI 跨廠商。看你是否在意。

OpenAI Agents SDK and Anthropic SDK lock you to one vendor; LangGraph / CrewAI are cross-vendor. Weigh the trade-off.

③ 可觀測性

LangGraph + LangSmith 是 2026 觀測性最完整組合（trace、time-travel debug、A/B）。其他框架靠第三方 (Langfuse、Arize)。

LangGraph + LangSmith is the most complete 2026 observability stack (traces, time-travel debug, A/B). Others rely on third-party (Langfuse, Arize).

④ 人類介入

LangGraph 原生支援「暫停 → 等人 → 繼續」(interrupt_before)。AutoGen/AG2 用 human-proxy agent；CrewAI 需要自己包。

LangGraph natively supports pause → wait for human → resume (interrupt_before). AutoGen/AG2 use a human-proxy agent; CrewAI needs custom wrappers.

⑤ 授權與營運

確認框架的 license 與你公司政策相容、確認原作者 / 公司會持續維護。AutoGen 2026 進入 maintenance 模式（MS 投入 Agent Framework），是個警訊。

Verify license compatibility and whether maintainers will keep shipping. AutoGen entered maintenance mode in 2026 (MS shifted to Agent Framework) — a cautionary signal.

2026 推薦組合

六、Charlene 推薦的 2026 起手式

🚀 從零開始的個人 / 小團隊路線

第 1 週：純 Python + Anthropic SDK 或 OpenAI SDK 跑通 while loop 版的 Step 1 agent，理解每一行代碼。
第 2 週：把工具改用 MCP 標準化（Step 11），用 Claude Desktop 或 Cursor 直接連線測試。
第 3 週：加上 Chroma 做 episodic memory + Cohere rerank 做 RAG。
第 4 週：把流程搬到 LangGraph，加上 checkpoint、observability、A/B canary。
之後：需要多 agent 時引入 OpenAI Agents SDK（單廠）或 LangGraph supervisor 模式（跨廠）。

Week 1: Plain Python + Anthropic / OpenAI SDK with a while loop agent (Chapter 1 starter). Understand every line.
Week 2: Standardize tools via MCP (Chapter 11). Test with Claude Desktop or Cursor.
Week 3: Add Chroma for episodic memory + Cohere rerank for RAG.
Week 4: Port to LangGraph; enable checkpointing, observability, A/B canary.
Later: Introduce OpenAI Agents SDK (single-vendor) or LangGraph supervisor pattern (cross-vendor) when multi-agent is justified.

🎓 章節小測

Q1. Anthropic「Building effective agents」一文的核心建議是？

Q1. The core advice of Anthropic's "Building effective agents" post?

A) 一定要用 LangChain

B) 多數任務純 Python loop + SDK 就夠

C) 必須多 agent

D) 不要用 LLM

✅ 簡潔優於複雜，必要時才引入框架。✅ Simplicity beats complexity; bring in frameworks only when justified.

Q2. 哪個框架原生支援「暫停 → 等人 → 繼續」？

Q2. Which framework natively supports pause → wait-for-human → resume?

A) LangGraph

B) CrewAI

C) LlamaIndex Agents

D) 都不支援

✅ LangGraph 的 interrupt_before + checkpointer 提供原生 HITL。✅ LangGraph's interrupt_before + checkpointer give native HITL.

Q3. 「先用框架再說」的最大風險？

Q3. The biggest risk of "framework-first"?

A) 太貴

B) 抽象層讓除錯與升級變複雜

C) 違反 GDPR

D) 模型會掛掉

✅ 套用前先確定真的需要，否則只是在自找麻煩。✅ Adopt only when the need is concrete; otherwise it's self-inflicted complexity.

← Step 9Agent 評估 Step 11 →MCP 協定