STEP 6 / 12 · Core Components

Planning and Reasoning

ReAct, Plan-and-Execute, Tree-of-Thoughts, Reflection: four mainstream planning loops and when to use each.

1. The Planning Loop Is the Agent's "Control Flow"

A single LLM call can only "think once." To solve a multi-step task, an agent has to run "think → act → observe" as a loop. How that loop is organized is the planning pattern.

As of 2026, four patterns dominate production use:

  1. ReAct: think and act interleaved; the most widely used
  2. Plan-and-Execute: write the full plan first, then execute
  3. Tree-of-Thoughts (ToT): explore multiple paths at once
  4. Reflection: self-critique after acting, then redo

Different tasks suit different patterns. This chapter dissects all four, gives selection rules, and includes code.

2. ReAct: Reasoning + Acting

ReAct (Yao et al. 2023) is the most influential agent planning pattern. The core prompt structure:

Thought: I need to find X. Let me call tool Y.
Action: Y[args]
Observation: ...
Thought: Now I have X. I still need Z.
Action: ...
...
Thought: I have enough. Final answer.
Final Answer: ...
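
When the model emits this trace as plain text instead of going through a native tool-calling API, each turn has to be parsed before a tool can run. The following is a minimal sketch, assuming the `Action: tool[args]` and `Final Answer:` line formats shown above. The function name `parse_react_step` and both regular expressions are illustrative and do not come from any library.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the two line types in the trace format above.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.*)$", re.MULTILINE | re.DOTALL)


def parse_react_step(text: str) -> Tuple[str, Optional[str], str]:
    """Classify one model turn.

    Returns ("final", None, answer) or ("action", tool_name, raw_args).
    """
    final = FINAL_RE.search(text)
    if final:
        # A final answer ends the loop, so it takes priority over any Action.
        return ("final", None, final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return ("action", action.group(1), action.group(2).strip())
    # Neither line present: the caller can re-prompt or stop.
    raise ValueError("no Action or Final Answer found in model output")
```

A caller would feed the `("action", tool, args)` result to its own tool dispatcher and append the tool output as the next `Observation:` line.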

Pros: implementable with a single LLM; the reasoning trace is auditable; error recovery comes naturally, because an observed failure feeds into the next Thought, which adjusts.

Cons: every step is an LLM call, so it is slow and expensive; long loops are prone to endless repetition; reasoning quality depends on prompt quality.

Best for: tool-call-heavy tasks of roughly 3–8 steps and moderate difficulty. It is also LangChain's default mode.

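import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MAX_STEPS = 10                  # hard cap on loop iterations

REACT_SYS = """You are a ReAct agent. Solve the user's task by alternating:
Thought: ...
Action: one tool call
After receiving the tool result you'll see an Observation, then continue.
When you have the answer, output:
Final Answer: ..."""

# TOOLS (tool schemas), run_tool(name, args) -> str, and user_question
# are assumed to be defined elsewhere.
messages = [{"role": "user", "content": user_question}]
for step in range(MAX_STEPS):
    # The Messages API takes the system prompt as a top-level parameter,
    # not as a message with role "system".
    resp = client.messages.create(model="claude-sonnet-4-6", system=REACT_SYS,
                                  tools=TOOLS, max_tokens=1024,
                                  messages=messages)
    messages.append({"role": "assistant", "content": resp.content})
    if resp.stop_reason != "tool_use":
        break  # no tool requested: the model gave its final answer (or hit a limit)
    # Run every requested tool and return the results as the Observation.
    tool_results = [{"type": "tool_result",
                     "tool_use_id": b.id,
                     "content": run_tool(b.name, b.input)}
                    for b in resp.content if b.type == "tool_use"]
    messages.append({"role": "user", "content": tool_results})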

3. Plan-and-Execute: Plan First, Then Execute

This pattern decouples planning from execution:

  1. Planner LLM: decomposes the user's goal into an explicit 5–15 step plan, output as JSON.
  2. Executor: runs the steps one by one (a cheaper model or a script can do this).
  3. Re-planner: goes back and re-plans if execution diverges from the plan.

Pros: the planner uses an expensive model to think things through once, and the executor then runs many steps on a cheap model, which can cut cost substantially; the plan can be reviewed by a human; independent steps can run in parallel.

Cons: if the initial plan is wrong, everything downstream drifts; information discovered mid-execution is hard to use without a re-plan.

Best for: long flows (>10 steps) that are stable and predictable. LangGraph's planner/executor example uses this pattern.

User goal ─► [Planner LLM] ─► PLAN[step1, step2, ..., stepN]
                                │
                                ▼
                      ┌─ Execute step1 ─┐
                      ├─ Execute step2 ─┤ ◄─ re-plan if needed
                      └─ Execute stepN ─┘
                                ▼
                          Final result
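
The flow in the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LangGraph's implementation: `plan_and_execute`, `fake_planner`, and `fake_executor` are hypothetical names, the plan is assumed to be a JSON object of the form `{"steps": [...]}`, and the two stubs stand in for real LLM or script calls so the example runs offline.

```python
import json
from typing import Callable, List


def plan_and_execute(goal: str,
                     planner: Callable[[str, List[str]], str],
                     executor: Callable[[str], str],
                     max_replans: int = 2) -> List[str]:
    """Run a JSON plan step by step; ask the planner again if a step fails."""
    done: List[str] = []  # results of completed steps, shared with the re-planner
    for _ in range(max_replans + 1):
        steps = json.loads(planner(goal, done))["steps"]
        try:
            for step in steps:
                done.append(executor(step))
            return done  # every step in the current plan succeeded
        except RuntimeError as err:
            # Record the failure so the next plan can route around it.
            done.append(f"FAILED: {err}")
    raise RuntimeError("plan still failing after re-planning")


def fake_planner(goal: str, done: List[str]) -> str:
    """Stub planner: the first plan has a bad step, the re-plan skips it."""
    if any(r.startswith("FAILED") for r in done):
        return json.dumps({"steps": ["summarize"]})
    return json.dumps({"steps": ["fetch", "bad_step", "summarize"]})


def fake_executor(step: str) -> str:
    """Stub executor: pretends every step works except 'bad_step'."""
    if step == "bad_step":
        raise RuntimeError(f"cannot run {step}")
    return f"ok:{step}"


results = plan_and_execute("write a report", fake_planner, fake_executor)
```

Passing the list of completed results back to the planner is one simple way to let a re-plan use mid-execution information. A real system would probably pass richer state.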

4. Tree-of-Thoughts (ToT): Branching Search

When a task requires exploring several candidate solutions (math puzzles, games, creative writing), ReAct's linear reasoning gets stuck: one wrong turn and the whole attempt is lost. ToT (Yao et al. 2023) turns reasoning into a tree search:

  1. Generate several candidate thoughts at each step (3–5 branches)
  2. Have the LLM rate how promising each thought is
  3. Expand the most promising branches with BFS / DFS
  4. Stop when the goal is reached or the budget runs out

Pros: markedly stronger than single-path reasoning (the ReAct style) on tasks that need backtracking (Sudoku, Game of 24, creative composition).

Cons: the call count explodes (roughly 10×–100× that of ReAct); it is slow and costly; most practical tasks do not need it.

Best for: discrete search spaces whose intermediate states can be evaluated clearly. Rare in production, common in academic papers.
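
The four steps above map onto a breadth-first search with a fixed beam. The sketch below is a toy under loud assumptions: instead of an LLM, candidate "thoughts" are three fixed arithmetic moves, and the "promise" score is simply the distance to a target number. The name `tot_bfs` and the puzzle itself are hypothetical and chosen only so the control flow is runnable offline.

```python
from typing import List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, moves taken so far)


def tot_bfs(start: int, target: int,
            beam: int = 3, max_depth: int = 6) -> Optional[Tuple[str, ...]]:
    """Search for a sequence of moves that turns start into target."""
    frontier: List[State] = [(start, ())]
    for _ in range(max_depth):  # step 4: the depth budget
        candidates: List[State] = []
        for value, path in frontier:
            # Step 1: generate several candidate thoughts per state.
            # (An LLM would propose these; here they are fixed moves.)
            candidates += [(value + 3, path + ("+3",)),
                           (value * 2, path + ("*2",)),
                           (value - 1, path + ("-1",))]
        for value, path in candidates:
            if value == target:  # step 4: goal reached
                return path
        # Step 2: score each candidate (an LLM rating in real ToT).
        candidates.sort(key=lambda s: abs(target - s[0]))
        # Step 3: expand only the most promising branches.
        frontier = candidates[:beam]
    return None  # budget exhausted without reaching the target


path = tot_bfs(1, 10)
```

Even in this toy, each level evaluates `3 × beam` candidates, which hints at why the number of LLM calls grows so quickly when both proposing and scoring are model calls.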

5. Reflection: Self-Critique + Redo

Reflexion (Shinn et al. 2023) and Self-Refine (Madaan et al. 2023) popularized this pattern: after finishing a task, the agent assesses its own work, writes "what to avoid next time / what to change" into memory, and tries again.

Typical flow:

  1. The agent produces a first draft, A1
  2. A critic LLM (it can be the same model) reviews A1 and lists its problems
  3. The critique is added to the context and the agent produces A2
  4. Repeat N times or until the critic is satisfied

Pros: effective on writing, coding, and long reasoning tasks (gains of around +15% in unit-test pass rate are commonly cited for code).

Cons: 2–5× the cost; if the critic itself is wrong, it reinforces the error.

Best for: quality-critical tasks that can run offline and have an automatic evaluation signal (tests, rules).
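
The flow above can be sketched as a short loop. This is a minimal illustration, assuming a generator and a critic that are plain callables. `reflect_loop`, `toy_generate`, and `toy_critic` are hypothetical names, and the stubs replace LLM calls: the critic here is a single unit test, which is the kind of automatic evaluation signal the pattern relies on.

```python
from typing import Callable, List


def reflect_loop(task: str,
                 generate: Callable[[str, List[str]], str],
                 critique: Callable[[str], List[str]],
                 max_rounds: int = 3) -> str:
    """Draft, critique, and redo until the critic is satisfied or rounds run out."""
    notes: List[str] = []            # accumulated critiques, fed back as context
    draft = generate(task, notes)    # A1
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:               # critic satisfied
            break
        notes.extend(issues)
        draft = generate(task, notes)  # A2, A3, ...
    return draft


def toy_generate(task: str, notes: List[str]) -> str:
    """Stub generator: the first draft forgets `return`, the revision fixes it."""
    if not notes:
        return "def add(a, b): a + b"
    return "def add(a, b): return a + b"


def toy_critic(draft: str) -> List[str]:
    """Stub critic: run one unit test against the draft."""
    scope: dict = {}
    exec(draft, scope)  # toy only: never exec untrusted model output like this
    return [] if scope["add"](2, 3) == 5 else ["add(2, 3) should return 5"]


final = reflect_loop("write add(a, b)", toy_generate, toy_critic)
```

Capping the loop with `max_rounds` matters because a critic that is never satisfied, or that is simply wrong, would otherwise keep the loop running.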

6. Decision Tree: Which Planning Pattern to Choose?

🌳 Choosing a planning pattern

Q1: More than 10 steps and a predictable flow? ▶ Plan-and-Execute
Q2: Otherwise, does it need backtracking or branch search (puzzles, games)? ▶ Tree-of-Thoughts
Q3: Otherwise, is it quality-critical with automatic evaluation? ▶ Reflection
Q4: None of the above (a typical 3–8 step tool-calling task) ▶ ReAct
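
For teams that want the rule of thumb in code, for example in a router that picks an agent template, the four questions translate directly into a function. This is only a sketch: the name `choose_pattern` and its boolean parameters are hypothetical, and the thresholds are the ones stated in the questions above.

```python
def choose_pattern(steps: int, predictable: bool, needs_search: bool,
                   quality_critical: bool, auto_eval: bool) -> str:
    """Apply Q1 to Q4 in order and return the first pattern that matches."""
    if steps > 10 and predictable:       # Q1
        return "Plan-and-Execute"
    if needs_search:                     # Q2: backtracking / branch search
        return "Tree-of-Thoughts"
    if quality_critical and auto_eval:   # Q3
        return "Reflection"
    return "ReAct"                       # Q4: the default


choice = choose_pattern(steps=5, predictable=False, needs_search=False,
                        quality_critical=False, auto_eval=False)
```

The order of the checks is part of the rule: a long, predictable task is routed to Plan-and-Execute even if it is also quality-critical.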
💡 Hybrid patterns: production agents often mix patterns. For example, an outer Plan-and-Execute layer splits the work into high-level steps, each step runs ReAct internally, and a final Reflection layer reviews the result. LangGraph was designed to support this kind of composition.

7. Five Fuses Against Infinite Loops

Whichever planning pattern is used, an agent can get stuck in a loop. Production systems must install:

  1. max_iterations cap: force a stop after N steps (usually 10–30).
  2. Repeat-action detection: three consecutive calls with the same (tool, args) force a stop.
  3. Token / cost cap: escalate once cumulative cost exceeds a threshold.
  4. Progress check: every K steps, ask the LLM "how far from the goal?"; if there is no progress, stop or re-plan.
  5. Human-in-the-loop checkpoint: require human approval before sensitive actions.
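
The first three fuses are mechanical and can share one small guard object that the agent loop consults on every step. The following is a minimal sketch, assuming the loop can report the tool name, its arguments, and an estimated cost for each step. `LoopGuard` and its parameter names are hypothetical, and the default limits are placeholders. Fuses 4 and 5 need an LLM call and a human respectively, so they are left out.

```python
import json
from typing import Any, Dict, List, Optional


class LoopGuard:
    """Tracks iterations, repeated actions, and cumulative cost for one run."""

    def __init__(self, max_iterations: int = 20, max_repeats: int = 3,
                 max_cost_usd: float = 1.0) -> None:
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0
        self.recent: List[str] = []  # keys of the most recent actions

    def check(self, tool: str, args: Dict[str, Any],
              step_cost_usd: float) -> Optional[str]:
        """Record one step; return a stop reason, or None to continue."""
        self.iterations += 1
        self.cost_usd += step_cost_usd
        # Canonical key so that argument order does not hide a repeat.
        key = tool + json.dumps(args, sort_keys=True)
        self.recent = (self.recent + [key])[-self.max_repeats:]

        if self.iterations > self.max_iterations:          # fuse 1
            return "max_iterations"
        if (len(self.recent) == self.max_repeats
                and len(set(self.recent)) == 1):           # fuse 2
            return "repeated_action"
        if self.cost_usd > self.max_cost_usd:              # fuse 3
            return "cost_cap"
        return None


guard = LoopGuard(max_iterations=5, max_repeats=3, max_cost_usd=1.0)
reasons = [guard.check("search", {"q": "a"}, 0.01) for _ in range(3)]
```

Returning a reason string rather than raising lets the caller decide what each fuse means, for instance stopping outright on `repeated_action` but escalating on `cost_cap`.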

🎓 Chapter Quiz

Q1. What is the core structure of ReAct?

A) Running multiple reasoning paths at once
B) An alternating Thought → Action → Observation loop
C) Writing the full plan first, then executing
D) Self-critique followed by a redo
✅ B. ReAct = Reasoning + Acting: thinking and acting alternate.

Q2. Why is Tree-of-Thoughts rare in production agents?

A) The concept is too complex
B) It needs 10–100× the calls, so cost and latency are too high
C) Models do not support it
D) It violates OWASP
✅ B. Branch search means a large number of LLM calls, which is rarely worth it for practical tasks.

Q3. Which of the following is not an effective safeguard against infinite agent loops?

A) A max_iterations cap
B) Detecting repeated (tool, args) calls
C) A cumulative cost cap
D) Setting temperature to 0
✅ D. Temperature=0 can actually make the agent keep choosing the same wrong action.