STEP 10 / 12 · 生產實戰

框架比較

LangGraph、CrewAI、AutoGen / AG2、LlamaIndex Agents、OpenAI Agents SDK、Anthropic SDK——一份 2026 選擇指南。

LangGraph, CrewAI, AutoGen / AG2, LlamaIndex Agents, OpenAI Agents SDK, Anthropic SDK — a 2026 selection guide.

一、要不要用框架?

2025 年 Anthropic 工程部落格寫過一篇有名的文章「Building effective agents」:對絕大多數任務,純 Python while loop + LLM API + 一份工具註冊表就夠了。框架的價值在於:

  • 提供已測試的設計模式(避免重造輪子)
  • 觀測性 / 持久化 / checkpoint 已內建
  • 多 agent 的 hand-off / 訊息協定已標準化
  • 能畫出圖、視覺化 debug

但框架也有代價:

  • 抽象層讓除錯變難(「為什麼這個 prompt 變成這樣?」)
  • 升級風險(LangChain 0.x → 0.y 的破壞性改動)
  • 過度工程化簡單任務

本章對比六大主流選項,幫你做出有依據的選擇。

Anthropic's famous 2025 engineering post "Building effective agents" argues that for most tasks, a plain Python while-loop + LLM API + a tool registry is enough. The value of a framework is:

  • Battle-tested patterns (no reinventing wheels)
  • Built-in observability / persistence / checkpointing
  • Standardized multi-agent hand-off & messaging
  • Visualizable graphs for debugging

The cost:

  • Abstractions make debugging harder ("why did this prompt mutate?")
  • Upgrade risk (LangChain 0.x → 0.y breaking changes)
  • Overkill for simple tasks

This chapter compares the six mainstream options to inform a grounded choice.

二、2026 主流 Agent 框架對照表

框架主打心智模型多 agent觀測性學習曲線2026 狀態
LangGraph高可控、複雜流程Maximum control, complex flows有向狀態圖Directed state graph★★★★★★★★★★ (LangSmith)★★★★★ (LangSmith)中等偏陡Steep-ish🚀 生產首選🚀 production king
CrewAI快速組「角色團隊」Quick role-based teamsrole + task DSLrole + task DSL★★★★★★★★ 最低★ lowest🌱 快速 prototype🌱 fast prototyping
AutoGen / AG2會話式多 agent、群聊Conversational multi-agent, group chatGroupChat + SpeakerGroupChat + Speaker★★★★★★★Moderate⚠️ Microsoft AutoGen 維護中、AG2 fork 接手⚠️ MS AutoGen in maintenance; AG2 fork active
LlamaIndex AgentsRAG 為主軸的 agentRAG-first agentsQueryEngine + AgentQueryEngine + Agent★★★★★★Moderate📚 文件密集場景📚 doc-heavy use cases
OpenAI Agents SDKOpenAI 模型 + Swarm 模式OpenAI models + Swarm patternAgent + Tool + Hand-offAgent + Tool + Hand-off★★★★★★★★ (Traces)★★ 低★★ low🆕 2025 Q1 GA🆕 2025 Q1 GA
Anthropic Agent SDK原生 MCP + Claude + sub-agentNative MCP + Claude + sub-agentSub-agent + MCP serversSub-agent + MCP servers★★★★★★★★★★ 低★★ low🆕 2025 Q4 推出🆕 launched 2025 Q4

三、三大主流框架的代碼風味比較

同樣的「研究 → 寫稿」雙 agent 任務,三種寫法:

The same "researcher → writer" two-agent task, three ways:

from langgraph.graph import StateGraph, END
from typing import TypedDict
class S(TypedDict): query:str; notes:str; draft:str

def research(s): return {"notes": llm.invoke(f"Research: {s['query']}").content}
def write(s):    return {"draft": llm.invoke(f"Write using: {s['notes']}").content}

g = StateGraph(S)
g.add_node("research", research); g.add_node("write", write)
g.set_entry_point("research"); g.add_edge("research","write"); g.add_edge("write", END)
app = g.compile(checkpointer=memory)   # 內建持久化
result = app.invoke({"query":"CRISPR base editing"})
from crewai import Agent, Task, Crew, Process

researcher = Agent(role="Researcher", goal="Gather 3 facts", backstory="PhD biologist.")
writer     = Agent(role="Writer",     goal="200-word brief", backstory="Science journalist.")

t1 = Task(description="Research CRISPR base editing", agent=researcher, expected_output="3 bullets")
t2 = Task(description="Write 200-word brief",        agent=writer,     expected_output="draft", context=[t1])

crew = Crew(agents=[researcher,writer], tasks=[t1,t2], process=Process.sequential)
result = crew.kickoff()
from openai_agents import Agent, Runner

researcher = Agent(name="Researcher", instructions="Gather 3 key facts on the topic.")
writer     = Agent(name="Writer",     instructions="Write a 200-word brief from the researcher's notes.",
                  handoffs=[])
researcher.handoffs = [writer]   # researcher 完成後可 hand off

result = Runner.run_sync(researcher, "CRISPR base editing")
print(result.final_output)
🔍
觀察風味差異:LangGraph 要求你顯式定義 state 與圖(最 verbose 但最可控);CrewAI 把 agent 當「角色」描述(最直覺,自然語言為主);OpenAI SDK 介於兩者間,hand-off 是 first-class citizen。 Style notes: LangGraph forces explicit state and graph (most verbose, most controllable); CrewAI talks about agents as "roles" (most intuitive, natural-language-first); OpenAI SDK sits in between with hand-offs as first-class citizens.

四、該選哪個框架?三步決策樹

🌳 選擇你的框架

Q1:任務 < 5 步、單 agent、無持久化需求?< 5 steps, single agent, no persistence?Pure Python loop
Q2:否則,需要正規生產部署 + 複雜流程 + 完整觀測?Else, formal production + complex flow + full observability?LangGraph
Q3:需要快速 prototype 多 agent 團隊、非工程師也能改?Quick multi-agent prototype, non-engineers can tweak?CrewAI
Q4:需要 agent 群聊辯論、人類 proxy 介入?Group-chat debate, human-proxy intervention?AG2 (or AutoGen)
Q5:主軸是 RAG / 文件密集?Mostly RAG / doc-heavy?LlamaIndex Agents
Q6:完全綁 OpenAI / Anthropic 一家、想用最薄抽象?Pinned to one vendor, thinnest abstraction?OpenAI Agents SDK / Anthropic SDK

五、選框架時的五個務實考量

團隊熟悉度

選團隊已熟悉的框架,學習成本比框架本身優劣更重要。LangChain 生態最大,找人最容易。

Pick what your team already knows — learning cost matters more than fine differences. LangChain has the largest ecosystem and talent pool.

模型廠商鎖定

OpenAI Agents SDK 與 Anthropic SDK 都只支援自家模型。LangGraph / CrewAI 跨廠商。看你是否在意。

OpenAI Agents SDK and Anthropic SDK lock you to one vendor; LangGraph / CrewAI are cross-vendor. Weigh the trade-off.

可觀測性

LangGraph + LangSmith 是 2026 觀測性最完整組合(trace、time-travel debug、A/B)。其他框架靠第三方 (Langfuse、Arize)。

LangGraph + LangSmith is the most complete 2026 observability stack (traces, time-travel debug, A/B). Others rely on third-party (Langfuse, Arize).

人類介入

LangGraph 原生支援「暫停 → 等人 → 繼續」(interrupt_before)。AutoGen/AG2 用 human-proxy agent;CrewAI 需要自己包。

LangGraph natively supports pause → wait for human → resume (interrupt_before). AutoGen/AG2 use a human-proxy agent; CrewAI needs custom wrappers.

授權與營運

確認框架的 license 與你公司政策相容、確認原作者 / 公司會持續維護。AutoGen 2026 進入 maintenance 模式(MS 投入 Agent Framework),是個警訊。

Verify license compatibility and whether maintainers will keep shipping. AutoGen entered maintenance mode in 2026 (MS shifted to Agent Framework) — a cautionary signal.

六、Charlene 推薦的 2026 起手式

🚀 從零開始的個人 / 小團隊路線

  1. 第 1 週:純 Python + Anthropic SDK 或 OpenAI SDK 跑通 while loop 版的 Step 1 agent,理解每一行代碼。
  2. 第 2 週:把工具改用 MCP 標準化(Step 11),用 Claude Desktop 或 Cursor 直接連線測試。
  3. 第 3 週:加上 Chroma 做 episodic memory + Cohere rerank 做 RAG。
  4. 第 4 週:把流程搬到 LangGraph,加上 checkpoint、observability、A/B canary。
  5. 之後:需要多 agent 時引入 OpenAI Agents SDK(單廠)或 LangGraph supervisor 模式(跨廠)。
  1. Week 1: Plain Python + Anthropic / OpenAI SDK with a while loop agent (Chapter 1 starter). Understand every line.
  2. Week 2: Standardize tools via MCP (Chapter 11). Test with Claude Desktop or Cursor.
  3. Week 3: Add Chroma for episodic memory + Cohere rerank for RAG.
  4. Week 4: Port to LangGraph; enable checkpointing, observability, A/B canary.
  5. Later: Introduce OpenAI Agents SDK (single-vendor) or LangGraph supervisor pattern (cross-vendor) when multi-agent is justified.

🎓 章節小測

Q1. Anthropic「Building effective agents」一文的核心建議是?

Q1. The core advice of Anthropic's "Building effective agents" post?

A) 一定要用 LangChain
B) 多數任務純 Python loop + SDK 就夠
C) 必須多 agent
D) 不要用 LLM
✅ 簡潔優於複雜,必要時才引入框架。✅ Simplicity beats complexity; bring in frameworks only when justified.

Q2. 哪個框架原生支援「暫停 → 等人 → 繼續」?

Q2. Which framework natively supports pause → wait-for-human → resume?

A) LangGraph
B) CrewAI
C) LlamaIndex Agents
D) 都不支援
✅ LangGraph 的 interrupt_before + checkpointer 提供原生 HITL。✅ LangGraph's interrupt_before + checkpointer give native HITL.

Q3. 「先用框架再說」的最大風險?

Q3. The biggest risk of "framework-first"?

A) 太貴
B) 抽象層讓除錯與升級變複雜
C) 違反 GDPR
D) 模型會掛掉
✅ 套用前先確定真的需要,否則只是在自找麻煩。✅ Adopt only when the need is concrete; otherwise it's self-inflicted complexity.