1. The planning loop is the agent's control flow
A single LLM call can only "think once." To solve a multi-step task, an agent has to run "think → act → observe" as a loop. How that loop is organized is the planning pattern.
As of 2026, four patterns dominate production use:
- ReAct: interleaves thinking and acting; the most widely used
- Plan-and-Execute: writes the full plan first, then executes it
- Tree-of-Thoughts (ToT): explores several candidate paths at once
- Reflection: self-critiques after acting, then redoes the work
Different tasks suit different patterns. This chapter breaks down all four, gives selection guidance, and includes full code.
2. ReAct: Reasoning + Acting
ReAct (Yao et al. 2023) is the most influential agent planning pattern. Its core prompt structure alternates Thought, Action, and Observation steps until the model emits a Final Answer.
Pros: implementable with a single LLM; the reasoning trace is auditable; error recovery comes naturally, because an observed failure lets the next Thought adjust.
Cons: every step is an LLM call, so it is slow and expensive; long loops can fall into endless repetition; reasoning quality depends on prompt quality.
Best for: tool-call-heavy tasks of roughly 3–8 steps and moderate difficulty. It is also LangChain's default mode.
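    import anthropic

    # Assumed to be defined elsewhere: TOOLS (tool schemas), run_tool(name, args),
    # MAX_STEPS (iteration cap), and user_question (the task text).
    client = anthropic.Anthropic()

    REACT_SYS = """You are a ReAct agent. Solve the user's task by alternating:
    Thought: ...
    Action: tool_name[args]   (one tool call)
    After receiving the tool result you'll see an Observation, then continue.
    When you have the answer, output:
    Final Answer: ..."""

    messages = [{"role": "user", "content": user_question}]

    for step in range(MAX_STEPS):
        # The Messages API takes the system prompt as a top-level parameter,
        # not as a message with role "system".
        resp = client.messages.create(
            model="claude-sonnet-4-6", system=REACT_SYS, tools=TOOLS,
            max_tokens=1024, messages=messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            break  # final answer (or token limit): no tool calls left to run
        # Run every requested tool and feed the results back as Observations.
        tool_results = [{"type": "tool_result", "tool_use_id": b.id,
                         "content": run_tool(b.name, b.input)}
                        for b in resp.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": tool_results})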
3. Plan-and-Execute: plan first, then execute
This pattern decouples planning from execution:
- Planner LLM: breaks the user's goal into an explicit plan of 5–15 steps, emitted as JSON.
- Executor: runs the steps one by one (a cheaper model or a script will do).
- Re-planner: goes back and re-plans if execution drifts from the plan.
Pros: the planner uses an expensive model to think things through once, and the executor then makes many calls on a cheap model, which cuts cost substantially; the plan can be reviewed by a human; steps can run in parallel.
Cons: if the initial plan is wrong, every later step inherits the error; information discovered mid-execution is hard to use without a re-plan.
Best for: tasks with many steps (>10) and a stable, predictable flow. LangGraph's planner/executor example follows this pattern.
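The planner / executor / re-planner split can be sketched as a small control loop. This is a minimal sketch, not LangGraph's implementation: `plan_and_execute`, `fake_planner`, and `fake_executor` are hypothetical names, and the two stubs stand in for what would be LLM or tool calls in a real system. It assumes the planner returns a JSON list of step strings and the executor returns an `(ok, result)` pair.

```python
import json
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (step, result) pairs


def plan_and_execute(goal: str,
                     planner: Callable[[str, History], str],
                     executor: Callable[[str], Tuple[bool, str]],
                     max_replans: int = 2) -> History:
    """Run a JSON plan step by step; re-plan when a step fails."""
    history: History = []                      # successfully completed steps
    replans = 0
    plan = json.loads(planner(goal, history))  # one (expensive) planning call
    while plan:
        step = plan.pop(0)
        ok, result = executor(step)            # cheap execution call
        if ok:
            history.append((step, result))
            continue
        # Execution diverged from the plan: ask the planner for a new one,
        # showing it what has been done and what just failed.
        if replans >= max_replans:
            raise RuntimeError(f"step {step!r} failed after {replans} re-plans")
        replans += 1
        plan = json.loads(planner(goal, history + [(step, result)]))
    return history


# Hypothetical stubs so the loop runs without a model.
def fake_planner(goal: str, history: History) -> str:
    if any(result == "error" for _, result in history):
        return json.dumps(["parse_fallback", "summarize"])
    return json.dumps(["fetch", "parse", "summarize"])


def fake_executor(step: str) -> Tuple[bool, str]:
    if step == "parse":
        return False, "error"
    return True, f"{step} done"


completed = plan_and_execute("summarize a page", fake_planner, fake_executor)
print([step for step, _ in completed])
```

In this run the second step fails, the planner is called again with the failure in its history, and the run finishes with the fallback step.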
4. Tree-of-Thoughts (ToT): branching search
When a task requires exploring several candidate solutions (math puzzles, games, creative writing), ReAct's linear reasoning gets stuck: one wrong turn and the whole attempt fails. ToT (Yao et al. 2023) turns reasoning into a tree search:
- Generate several candidate thoughts at each step (3–5 branches)
- Have the LLM rate how promising each thought is
- Expand the most promising branches with BFS or DFS
- Stop on reaching the goal or exhausting the budget
Pros: clearly outperforms ReAct on tasks that need backtracking (Sudoku, the 24 game, creative composition).
Cons: the number of calls explodes (10×–100× that of ReAct); it is slow and expensive; most practical tasks do not need it.
Best for: discrete search spaces whose intermediate states can be clearly evaluated. Rare in production, common in academic papers.
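The four steps above map onto a breadth-first search with a fixed beam width. The sketch below is an illustration under assumptions, not the procedure from the ToT paper: `propose`, `score`, and `is_goal` are hypothetical callables that would wrap LLM calls in practice. Here they are replaced by a toy arithmetic puzzle (reach 10 from 1 using +3 or ×2) so the search can run without a model.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, operations applied so far)


def tree_of_thoughts_bfs(root: State,
                         propose: Callable[[State], List[State]],
                         score: Callable[[State], float],
                         is_goal: Callable[[State], bool],
                         beam_width: int = 3,
                         max_depth: int = 4) -> Optional[State]:
    """Breadth-first search that keeps only the best `beam_width` states per level."""
    frontier = [root]
    for _ in range(max_depth):                 # max_depth acts as the budget
        # 1. Generate candidate thoughts from every state on the frontier.
        candidates = [child for state in frontier for child in propose(state)]
        if not candidates:
            return None
        # 4. Stop as soon as any candidate reaches the goal.
        for cand in candidates:
            if is_goal(cand):
                return cand
        # 2 + 3. Rate the candidates and expand only the most promising ones.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None                                # budget exhausted


TARGET = 10


def toy_propose(state: State) -> List[State]:
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]


def toy_score(state: State) -> float:
    return -abs(TARGET - state[0])             # closer to the target is better


def toy_is_goal(state: State) -> bool:
    return state[0] == TARGET


solution = tree_of_thoughts_bfs((1, ()), toy_propose, toy_score, toy_is_goal)
print(solution)
```

With an LLM behind `propose` and `score`, every level costs roughly one generation call per frontier state plus one rating call per candidate, which is where the call-count blow-up comes from.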
5. Reflection: self-critique and retry
Reflexion (Shinn et al. 2023) and Self-Refine (Madaan et al. 2023) popularized this pattern: after finishing a task, the agent evaluates its own output, writes "what to avoid next time / what to change" into short-term memory, and tries again.
A typical implementation:
- The agent produces a first draft, A1
- A critic LLM (which can be the same model) reviews A1 and outputs a list of issues
- The critique is added to the context and the agent produces A2
- Repeat N times or until the critic is satisfied
Pros: very effective for writing, code, and long reasoning tasks (gains of around +15% in unit-test pass rate are common for code).
Cons: costs 2–5× as much; if the critic itself is wrong, it reinforces the error.
Best for: tasks that demand high quality, can run offline, and have an automatic evaluation signal (tests, rules).
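The draft → critique → redraft flow can be written as a short loop. This is a minimal sketch under assumptions rather than the Reflexion or Self-Refine implementation: `reflect_and_refine`, `toy_generate`, and `toy_critic` are hypothetical names, and the toy critic is a rule-based check standing in for a critic LLM or a test suite.

```python
from typing import Callable, List, Tuple


def reflect_and_refine(task: str,
                       generate: Callable[[str, List[str]], str],
                       critique: Callable[[str, str], List[str]],
                       max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Produce a draft, collect critic feedback, and redraft until clean."""
    notes: List[str] = []                  # short-term memory of critiques
    draft = generate(task, notes)          # A1
    for _ in range(max_rounds):
        issues = critique(task, draft)
        if not issues:                     # critic is satisfied
            break
        notes.extend(issues)               # add the critique to the context
        draft = generate(task, notes)      # A2, A3, ...
    return draft, notes


# Hypothetical stubs: turn a title into a URL slug.
def toy_generate(task: str, notes: List[str]) -> str:
    draft = task
    if "use lowercase" in notes:
        draft = draft.lower()
    if "replace spaces with hyphens" in notes:
        draft = draft.replace(" ", "-")
    return draft


def toy_critic(task: str, draft: str) -> List[str]:
    issues = []
    if draft != draft.lower():
        issues.append("use lowercase")
    if " " in draft:
        issues.append("replace spaces with hyphens")
    return issues


final_draft, memory = reflect_and_refine("Hello World", toy_generate, toy_critic)
print(final_draft, memory)
```

Because the critic here is a deterministic rule, it cannot reinforce a wrong judgment; with an LLM critic, the `max_rounds` cap is what bounds the extra cost.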
6. Decision tree: which planning pattern should you choose?
🌳 Choosing a planning pattern
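One way to read the "Best for" lines of the four pattern sections as a decision procedure is sketched below. The function name `choose_pattern`, its parameters, and in particular the order in which the checks are applied are assumptions made for illustration; only the thresholds (more than 10 steps, backtracking, automatic evaluation signal) come from the text.

```python
def choose_pattern(expected_steps: int,
                   needs_backtracking: bool = False,
                   quality_critical: bool = False,
                   has_auto_eval: bool = False,
                   stable_flow: bool = False) -> str:
    """Map rough task traits to one of the four planning patterns.

    The precedence of the checks is an assumption, not part of the chapter.
    """
    # Discrete search with evaluable intermediate states and backtracking.
    if needs_backtracking:
        return "Tree-of-Thoughts"
    # High quality bar plus an automatic signal (tests, rules) to critique against.
    if quality_critical and has_auto_eval:
        return "Reflection"
    # Long, stable, predictable flows.
    if expected_steps > 10 and stable_flow:
        return "Plan-and-Execute"
    # Default: moderate tool-call-heavy tasks of a few steps.
    return "ReAct"


print(choose_pattern(5))
print(choose_pattern(12, stable_flow=True))
```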
7. Five fuses against infinite loops
Whatever the planning pattern, an agent can get stuck in a loop. Production systems should install all of the following:
- max_iterations cap: force a stop after N steps (typically 10–30).
- Repeated-action detection: three consecutive calls with the same (tool, args) force termination.
- Token / cost cap: escalate once cumulative cost exceeds a threshold.
- Progress check: every K steps, ask the LLM "how far are we from the goal?"; if there is no progress, force a stop or re-plan.
- Human-in-the-loop checkpoint: require human approval before any sensitive operation.
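The first three fuses are purely mechanical and can live in one small helper that the agent loop calls after every step. This is a sketch under assumptions: `LoopGuard` and its parameter names are hypothetical, and the default thresholds are placeholders. The progress check and the human checkpoint are left out because they need an LLM call and a person, respectively.

```python
import json
from typing import Any, Dict, Optional


class LoopGuard:
    """Tracks steps, repeated actions, and cumulative cost for one agent run."""

    def __init__(self, max_iterations: int = 20, max_repeats: int = 3,
                 max_cost_usd: float = 1.0) -> None:
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0
        self._last_action: Optional[str] = None
        self._repeat_count = 0

    def check(self, tool: str, args: Dict[str, Any],
              step_cost_usd: float) -> Optional[str]:
        """Record one executed step; return a stop reason, or None to continue."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        # Canonical form of (tool, args) so that key order does not matter.
        action = tool + ":" + json.dumps(args, sort_keys=True)
        if action == self._last_action:
            self._repeat_count += 1
        else:
            self._last_action = action
            self._repeat_count = 1
        if self.steps >= self.max_iterations:
            return "max_iterations"
        if self._repeat_count >= self.max_repeats:
            return "repeated_action"
        if self.cost_usd >= self.max_cost_usd:
            return "cost_cap"
        return None


guard = LoopGuard(max_iterations=10, max_repeats=3, max_cost_usd=1.0)
reasons = [guard.check("search", {"q": "x"}, 0.01) for _ in range(3)]
print(reasons)
```

In an agent loop, a non-`None` return value would trigger a forced stop, a re-plan, or an escalation, depending on which fuse tripped.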
🎓 Chapter quiz
Q1. What is the core structure of the ReAct pattern?
Q2. Why is Tree-of-Thoughts rarely used in production agents?
Q3. Which of the following is not an effective way to prevent an agent from looping forever?