Step 4: Tool Use — AI Agents Tutorial

Starting point

1. What Is Function Calling?

When OpenAI released function calling in 2023, the agent ecosystem changed. Before, triggering external actions from an LLM meant brittle regex parsing of its text output; afterwards, the model could request tool calls as structured JSON objects.

The mechanism is simple: pass a tools list (each tool = name + description + JSON schema) with the API call. The model returns either a plain text reply or a request of the form "call tool X with arguments Y." Your code executes the tool, appends the result to the conversation, and the model decides what to do next.

This loop is the engine of every agent. Without tool use, an LLM is just a well-spoken chatbot.

User → LLM ──tool_call──▶ Your code ──result──▶ LLM ──tool_call──▶ ... ↘ final answer → User
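The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: `fake_model` is a hypothetical stand-in for the LLM call, `search_papers` is a stub rather than a real PubMed client, and the message shapes are simplified.

```python
import json

def search_papers(query, max_results=10):
    """Stand-in tool; a real one would query PubMed."""
    return [{"title": f"Paper about {query}", "year": 2023}][:max_results]

TOOLS = {"search_papers": search_papers}

def fake_model(messages):
    """Hypothetical stand-in for an LLM API call.

    Requests a tool on the first turn, then answers once a tool
    result is present in the conversation.
    """
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search_papers",
                              "arguments": {"query": "BRCA1"}}}
    return {"text": "Found 1 paper about BRCA1."}

def run_agent(user_message, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):          # cap steps so the loop always ends
        reply = model(messages)
        if "tool_call" not in reply:    # plain text reply: we are done
            return reply["text"], messages
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Record both the request and the result so the model sees them next turn
        messages.append({"role": "assistant", "content": json.dumps(call)})
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped: step limit reached.", messages

answer, transcript = run_agent("Any papers on BRCA1?")
print(answer)
```

The step cap is a design choice worth keeping in a real loop: a model that keeps requesting tools would otherwise never return to the user.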

Designing tools

2. What a Tool Looks Like

Anthropic-style definition (uses input_schema):

{
  "name": "search_papers",
  "description": "Search PubMed for papers matching a query. Use when the user asks about scientific literature. Returns up to max_results result objects (default 10) with title, authors, year, doi.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string", "description": "PubMed boolean query syntax" },
      "max_results": { "type": "integer", "minimum": 1, "maximum": 50, "default": 10 },
      "year_min": { "type": "integer", "description": "Inclusive lower bound" }
    },
    "required": ["query"]
  }
}

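OpenAI-style definition (with "strict": true, every property must appear in "required", so an optional value is modeled as nullable):

{
  "type": "function",
  "function": {
    "name": "search_papers",
    "description": "Search PubMed ...",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string"},
        "max_results": {"type": ["integer", "null"], "description": "Null means use the default of 10"}
      },
      "required": ["query", "max_results"],
      "additionalProperties": false
    },
    "strict": true
  }
}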

🎯

The description is the agent's only signal for picking a tool. Don't just write "Search papers"; cover when to use the tool, its inputs and outputs, and its edge cases. A good description is usually 2–4 sentences and includes at least one example.

Design principles

3. Seven Guidelines for Designing Tools

① Few but sharp

Tool-selection accuracy is reported to drop noticeably once an agent has more than about 15 tools. If one high-level tool solves the problem, don't split it into three low-level ones.

② Verb + noun naming

search_papers ✅; papers ❌; do_thing ❌. A name that states the action makes it easier for the model to reason about which tool to call.

③ Idempotency

Calling a tool several times with the same arguments should have the same effect, and return the same result, as calling it once. Agents retry, and non-idempotent tools turn retries into bugs (double charges, duplicate tickets).
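One common way to get this property is an idempotency key derived from the call itself. The sketch below is illustrative only: `charge_customer`, its parameters, and the in-memory `_completed` store are all hypothetical, and a real service would persist keys in a database.

```python
import hashlib
import json

_completed = {}   # idempotency key -> stored result (in-memory for the sketch)
charges = []      # stands in for the real side effect (e.g. a payment API)

def idempotency_key(tool_name, args):
    """Derive a stable key from the tool name and its arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def charge_customer(customer_id, amount_cents, order_id):
    args = {"customer_id": customer_id, "amount_cents": amount_cents,
            "order_id": order_id}
    key = idempotency_key("charge_customer", args)
    if key in _completed:            # retry: return the earlier result, no new charge
        return _completed[key]
    charges.append(args)             # the side effect happens exactly once
    result = {"status": "charged", "charge_number": len(charges)}
    _completed[key] = result
    return result

first = charge_customer("c_42", 1999, "order_7")
retry = charge_customer("c_42", 1999, "order_7")   # simulated agent retry
```

Including `order_id` in the arguments matters here: two genuinely separate purchases of the same amount get different keys, while a retry of the same purchase does not.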

④ Return structured errors

On failure, return {"error": "...", "hint": "try X"} instead of throwing an exception. The agent can then read the error and adjust.
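A decorator is one lightweight way to apply this rule to every tool. The following is a sketch under assumptions: `structured_errors`, `get_user`, and the `list_users` tool named in the hint are hypothetical names invented for the example.

```python
import functools

def structured_errors(hints=None):
    """Wrap a tool so exceptions become {"error", "hint"} dicts."""
    hints = hints or {}
    def decorate(tool):
        @functools.wraps(tool)
        def wrapper(**kwargs):
            try:
                return {"result": tool(**kwargs)}
            except Exception as exc:
                kind = type(exc).__name__
                # Look up a hint by exception type, with a generic fallback
                return {"error": f"{kind}: {exc}",
                        "hint": hints.get(kind, "check the arguments and try again")}
        return wrapper
    return decorate

@structured_errors(hints={"KeyError": "call list_users to see valid ids"})
def get_user(user_id):
    users = {"u1": {"name": "Ada"}}   # toy data store
    return users[user_id]

ok = get_user(user_id="u1")
failed = get_user(user_id="u9")       # unknown id: returns a dict, does not raise
```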

⑤ Least privilege

read_file ≠ execute_arbitrary_shell. Give the agent the permissions it needs, not the permissions that are convenient.
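As an illustration of a narrowly scoped tool, here is a sketch of a `read_file` that can only read inside one directory. The factory name `make_read_file` and the error vocabulary are assumptions made for the example; `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path

def make_read_file(root):
    """Build a read-only tool confined to one directory tree."""
    root = Path(root).resolve()
    def read_file(path):
        target = (root / path).resolve()       # collapses ".." and symlinks
        if not target.is_relative_to(root):    # Python 3.9+
            return {"error": "permission",
                    "hint": "paths must stay inside the workspace"}
        if not target.is_file():
            return {"error": "not_found", "hint": "check the path"}
        return {"content": target.read_text(encoding="utf-8")}
    return read_file

workspace = tempfile.mkdtemp()
Path(workspace, "notes.txt").write_text("hello", encoding="utf-8")
read_file = make_read_file(workspace)

inside = read_file("notes.txt")
escape = read_file("../../etc/passwd")   # traversal attempt is refused
```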

⑥ Keep output size bounded

Keep a single tool return under about 5 KB; oversized returns flood the context window. Offer pagination or a summary mode.
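Pagination by byte budget could look roughly like the sketch below. The helper name `paginate` and the `next_cursor` field are assumptions, and the size estimate relies on Python's default `json.dumps` separators.

```python
import json

MAX_BYTES = 5_000   # budget taken from the guideline above

def paginate(items, cursor=0, max_bytes=MAX_BYTES):
    """Return as many items as fit in the byte budget, plus a cursor."""
    page, size = [], 2                       # 2 bytes for the enclosing "[]"
    i = cursor
    while i < len(items):
        # +2 over-counts slightly by charging a ", " separator to every item
        item_size = len(json.dumps(items[i]).encode("utf-8")) + 2
        if page and size + item_size > max_bytes:
            break                            # page is full; stop before this item
        page.append(items[i])
        size += item_size
        i += 1
    next_cursor = i if i < len(items) else None
    return {"items": page, "next_cursor": next_cursor, "total": len(items)}

rows = [{"id": n, "abstract": "x" * 400} for n in range(40)]
first = paginate(rows)
second = paginate(rows, cursor=first["next_cursor"])
```

The `if page and ...` guard means a single oversized item is still returned on its own page, so the cursor always advances; a stricter tool might truncate or summarize such an item instead.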

⑦ Require confirmation for sensitive operations

delete_user, send_money, and send_email should require an explicit confirm: true parameter, or be split into a two-stage pair: a low-impact preview tool and a high-impact commit tool.
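The two-stage variant can be sketched with a one-time token that links the preview to the commit. All names here (`preview_delete_user`, `commit_delete_user`, `confirm_token`) are hypothetical, and the in-memory dictionaries stand in for real storage.

```python
import secrets

USERS = {"u1": "Ada", "u2": "Grace"}   # toy data store
_pending = {}                           # token -> user_id awaiting confirmation

def preview_delete_user(user_id):
    """Low-impact step: describe the effect and issue a one-time token."""
    if user_id not in USERS:
        return {"error": "not_found", "hint": "check the user id"}
    token = secrets.token_hex(8)
    _pending[token] = user_id
    return {"summary": f"Would delete user {user_id} ({USERS[user_id]})",
            "confirm_token": token}

def commit_delete_user(confirm_token):
    """High-impact step: only runs with a token from a prior preview."""
    user_id = _pending.pop(confirm_token, None)   # tokens are single-use
    if user_id is None:
        return {"error": "permission", "hint": "call preview_delete_user first"}
    USERS.pop(user_id, None)
    return {"status": "deleted", "user_id": user_id}

blocked = commit_delete_user("made-up-token")          # no preview: refused
preview = preview_delete_user("u1")
done = commit_delete_user(preview["confirm_token"])
replay = commit_delete_user(preview["confirm_token"])  # token already used
```

In practice the preview summary is what you would show to a human for approval before the agent is allowed to call the commit tool.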

Interactive simulation

4. Agent Tool-Loop Simulator

The simulator below shows a ReAct agent's internal log after receiving a user question. Click "Next step" to advance:


Advanced: parallelism

5. Parallel Tool Use

Since 2024, flagship models can emit multiple tool_use blocks in a single turn. An agent that wants to query PubMed, ClinVar, and OMIM returns three tool_use blocks at once; your runtime fans them out concurrently, awaits all results, and sends the three tool_result blocks back in one message.

Production payoff: for agents bound by network I/O, p95 latency can drop by 3–5×.

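import asyncio
import json

# TOOL_REGISTRY maps tool name -> async callable; assumed to be defined elsewhere.
async def run_tools_parallel(tool_uses):
    async def one(tu):
        try:
            result = await TOOL_REGISTRY[tu.name](**tu.input)
            content, is_error = json.dumps(result), False
        except Exception as exc:  # report the failure to the model instead of crashing
            content, is_error = json.dumps({"error": str(exc)}), True
        return {"type": "tool_result", "tool_use_id": tu.id,
                "content": content, "is_error": is_error}
    # gather preserves input order, so results line up with tool_uses
    return await asyncio.gather(*(one(tu) for tu in tool_uses))

# In the agent loop (inside an async function, after appending the assistant turn):
tool_uses = [b for b in resp.content if b.type == "tool_use"]
results = await run_tools_parallel(tool_uses)
messages.append({"role": "user", "content": results})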

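// TOOL_REGISTRY maps tool name -> async function; assumed to be defined elsewhere.
async function runToolsParallel(toolCalls) {
  return Promise.all(toolCalls.map(async (tc) => {
    let content;
    try {
      const args = JSON.parse(tc.function.arguments); // arguments arrive as a JSON string
      content = JSON.stringify(await TOOL_REGISTRY[tc.function.name](args));
    } catch (err) {
      // Report the failure to the model instead of rejecting the whole batch
      content = JSON.stringify({ error: String(err) });
    }
    return { role: "tool", tool_call_id: tc.id, content };
  }));
}
// In the agent loop, after pushing the assistant message `msg`:
const outs = await runToolsParallel(msg.tool_calls);
messages.push(...outs);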
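To see where the latency gain comes from, the following self-contained sketch times the same three lookups run one after another and then concurrently. `slow_lookup` is a hypothetical stand-in that merely sleeps, and the `SimpleNamespace` objects only mimic the shape of tool_use blocks; real gains depend on the actual network calls.

```python
import asyncio
import time
from types import SimpleNamespace

async def slow_lookup(source, delay=0.05):
    """Stand-in for a network call to PubMed, ClinVar, or OMIM."""
    await asyncio.sleep(delay)
    return f"{source}: ok"

async def run_sequential(calls):
    # Each call waits for the previous one to finish
    return [await slow_lookup(**c.input) for c in calls]

async def run_parallel(calls):
    # All calls are in flight at once; gather keeps input order
    return await asyncio.gather(*(slow_lookup(**c.input) for c in calls))

async def timed(runner, calls):
    start = time.perf_counter()
    results = await runner(calls)
    return results, time.perf_counter() - start

calls = [SimpleNamespace(id=f"tu_{i}", input={"source": s})
         for i, s in enumerate(["PubMed", "ClinVar", "OMIM"])]

seq_results, seq_time = asyncio.run(timed(run_sequential, calls))
par_results, par_time = asyncio.run(timed(run_parallel, calls))
print(f"sequential {seq_time:.2f}s, parallel {par_time:.2f}s")
```

With three equally slow calls, the sequential run takes roughly the sum of the delays while the parallel run takes roughly the longest single delay.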

Error handling

6. Four Common Tool Errors and Recovery Strategies

Error type	Message to return to the agent	Expected agent behavior
Invalid arguments	`{error:"validation", details:"max_results must be ≤ 50, got 200"}`	Adjust args and retry
Transient API failure	`{error:"transient", retry_after:5}`	Wait and retry (cap total retries)
Permission denied	`{error:"permission", hint:"escalate or ask user"}`	Ask the user or escalate
Empty result	`{result:[], hint:"try broader query"}`	Broaden the query and retry

🚫

Anti-pattern: never let a tool failure raise an Exception that crashes the whole agent loop. The agent should be able to read the error message and self-recover; that is the whole point.
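The transient-failure case is the one the runtime can often handle on the agent's behalf. Below is a minimal sketch of a capped retry wrapper, assuming tools follow the `{error: "transient", retry_after: N}` convention used in this section; `call_with_recovery` and `flaky_search` are hypothetical names, and `sleep` is injectable so the example runs without waiting.

```python
import time

def call_with_recovery(tool, args, max_retries=3, sleep=time.sleep):
    """Retry only transient failures; pass every other outcome to the agent."""
    for attempt in range(max_retries + 1):
        try:
            out = tool(**args)
        except Exception as exc:          # never let a tool crash the loop
            return {"error": "exception", "details": str(exc)}
        if not (isinstance(out, dict) and out.get("error") == "transient"):
            return out                    # success, or an error the agent must handle
        if attempt < max_retries:
            sleep(out.get("retry_after", 1))
    return {"error": "transient", "hint": f"gave up after {max_retries} retries"}

attempts = {"n": 0}

def flaky_search(query):
    """Toy tool: fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        return {"error": "transient", "retry_after": 5}
    return {"result": [f"hit for {query}"]}

waits = []   # records requested waits instead of actually sleeping
outcome = call_with_recovery(flaky_search, {"query": "BRCA1"}, sleep=waits.append)
```

Validation, permission, and empty-result responses pass straight through, since those are the cases where the agent itself has to change its arguments or ask the user.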

2026 trends

7. Computer Use and Browser Use

Anthropic released Computer Use in beta in 2024 and took it to GA in 2025: the model directly "sees screenshots, moves the mouse, clicks buttons, types text." OpenAI shipped its counterpart, Operator/Computer Use, in 2025.

Conceptually this is turning the whole GUI into a toolset: screenshot(), click(x,y), type(text). The big win: no API required — any website usable with a mouse becomes operable by an agent.

The cost: every step needs a vision LLM on the screenshot — expensive and slow. And safety risk is real: a malicious page's injected prompt can trick the agent into phishing clicks. Both Anthropic and OpenAI recommend running Computer Use only in sandboxes.

🎓 Chapter quiz

Q1. What happens if a tool description is poor?

A) The API rejects the request

B) The agent picks the wrong tool or misses the right one

C) The model crashes

D) The tool restarts automatically

✅ B. The description is the only signal the LLM uses to choose a tool.

Q2. Why design tools to be idempotent?

A) So they can be encrypted

B) The agent may retry, and a non-idempotent tool then repeats its side effects

C) To make responses faster

D) It is just a convention

✅ B. Double charges and duplicate tickets both stem from non-idempotent tools.

Q3. Which failure response is most useful to the agent?

A) raise Exception

B) Return the bare string "error"

C) A structured error that includes a hint

D) No response at all

✅ C. The agent can read the hint and self-recover; this is the core agentic advantage.
