1. Why Do Agents Need "Memory"?
A bare LLM is stateless: every API call starts fresh, and the model itself remembers nothing of yesterday's conversation. The context window is the LLM's only "memory," and it has to be refilled from scratch on every call.
That is fine for a single-turn chatbot but a disaster for an agent:
- Multi-step tasks need to remember what was done in earlier steps
- Cross-session personalization (user preferences) needs persistent storage
- Once a long conversation exceeds the context window, older information is pushed out
- Collaboration between agents requires shared state
A memory system is the design that turns "ephemeral tokens" into "persistent knowledge." This chapter breaks down four memory roles.
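To make the statelessness point concrete, here is a minimal sketch. `fake_llm` is a hypothetical stand-in for a chat API (not a real library call); the only thing it can "know" is what appears in the `messages` list it is handed.

```python
from typing import Dict, List

Message = Dict[str, str]

def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stand-in for a chat API: it sees only `messages`."""
    # Look for a name the user stated anywhere in the supplied history.
    for m in messages:
        if m["role"] == "user" and m["content"].startswith("My name is "):
            name = m["content"][len("My name is "):].rstrip(".")
            return f"Your name is {name}."
    return "I don't know your name."

# Call 1: the user states a fact, and the caller keeps the history.
history: List[Message] = [{"role": "user", "content": "My name is Ada."}]
history.append({"role": "assistant", "content": fake_llm(history)})

# Call 2 without the history: nothing carries over between calls.
stateless_reply = fake_llm([{"role": "user", "content": "What is my name?"}])

# Call 2 with the history resent: the fact is available again.
history.append({"role": "user", "content": "What is my name?"})
stateful_reply = fake_llm(history)

print(stateless_reply)
print(stateful_reply)
```

The caller, not the model, owns the history. Everything else in this chapter is about deciding what goes into that list and where the rest is kept.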
2. The Four Types of Agent Memory Compared
Borrowing from cognitive science, the agent community commonly divides memory into four types, consistent with the CoALA framework:
Working Memory
Holds: the current conversation history, tool results, and a scratchpad.
Implementation: the messages array in the context window.
Lifespan: a single task.
Key issues: token budgeting and trimming strategy.
Episodic Memory
Holds: full traces of past tasks (query → steps → outcome).
Implementation: vector DB + timestamp index.
Lifespan: persists across sessions.
Key issues: relevance retrieval and the privacy retention period.
Semantic Memory
Holds: abstract facts and knowledge (user preferences, domain rules, document knowledge).
Implementation: vector DB + knowledge graph + RAG.
Lifespan: persistent.
Key issues: resolving conflicting updates and evicting stale information.
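The two key issues for semantic memory, conflicting updates and stale facts, can be sketched with a small in-memory store. This is only an illustration under stated assumptions: the `SemanticStore` class, the last-write-wins rule, and the `max_age_days` cutoff are hypothetical choices, not something the frameworks above prescribe.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple

class SemanticStore:
    """Toy key-value fact store (hypothetical, for illustration only)."""

    def __init__(self, max_age_days: int = 365):
        self.max_age = timedelta(days=max_age_days)
        self._facts: Dict[str, Tuple[str, date]] = {}

    def upsert(self, key: str, value: str, observed: date) -> bool:
        """Last-write-wins by observation date. Returns True if stored."""
        current = self._facts.get(key)
        if current is not None and current[1] > observed:
            return False  # the fact already stored is newer; ignore this one
        self._facts[key] = (value, observed)
        return True

    def get(self, key: str, today: date) -> Optional[str]:
        """Return the fact, or None if it is missing or older than max_age."""
        entry = self._facts.get(key)
        if entry is None:
            return None
        value, observed = entry
        if today - observed > self.max_age:
            del self._facts[key]  # stale: evict lazily on read
            return None
        return value

store = SemanticStore(max_age_days=30)
store.upsert("package_manager", "npm", date(2026, 1, 1))
store.upsert("package_manager", "pnpm", date(2026, 2, 1))
older_write_accepted = store.upsert("package_manager", "yarn", date(2026, 1, 15))
fresh = store.get("package_manager", today=date(2026, 2, 10))
stale = store.get("package_manager", today=date(2026, 4, 1))
```

Last-write-wins is the simplest conflict rule. A production system might instead keep both versions with provenance, or ask the user to confirm.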
Procedural Memory
Holds: "how to do it," meaning successful prompt templates, tool sequences, and skill libraries.
Implementation: prompt library, skill registry, fine-tuned weights.
Lifespan: persistent.
Key issues: generalization and versioning.
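As a rough sketch of how the four roles above might sit side by side in code, the container below uses plain Python collections in place of a vector DB or skill registry. The `AgentMemory` and `Episode` names and the `finish_task` method are hypothetical, chosen only to show how working memory is archived into episodic memory when a task ends.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Episode:
    query: str
    steps: List[str]
    outcome: str
    timestamp: str  # ISO date string, e.g. "2026-03-14"

@dataclass
class AgentMemory:
    # Working: messages for the current task only.
    working: List[Dict[str, str]] = field(default_factory=list)
    # Episodic: traces of finished tasks.
    episodic: List[Episode] = field(default_factory=list)
    # Semantic: abstract facts such as user preferences.
    semantic: Dict[str, str] = field(default_factory=dict)
    # Procedural: named skills as ordered tool sequences.
    procedural: Dict[str, List[str]] = field(default_factory=dict)

    def finish_task(self, query: str, outcome: str, timestamp: str) -> Episode:
        """Archive the tool results of the current task, then clear working memory."""
        steps = [m["content"] for m in self.working if m["role"] == "tool"]
        episode = Episode(query, steps, outcome, timestamp)
        self.episodic.append(episode)
        self.working.clear()
        return episode

memory = AgentMemory()
memory.semantic["package_manager"] = "pnpm"
memory.procedural["booking"] = ["check_availability", "confirm", "send_to_user"]
memory.working += [
    {"role": "user", "content": "Book that hotel I stayed at last time"},
    {"role": "tool", "content": "check_availability: 3 rooms free"},
]
episode = memory.finish_task("Book hotel", "booked", "2026-03-14")
```

Only working memory is cleared at the end of a task. The other three stores outlive it, which matches the lifespans listed above.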
3. Trimming Strategies for the Context Window
When the accumulated messages exceed the context limit, or when key passages end up in unhelpful positions, trim proactively. Three common strategies:
| Strategy | How it works | Pros / cons |
|---|---|---|
| Sliding Window | Keep only the last N turns | ✅ Simple, predictable / ❌ Loses all early context |
| Summarization | Every K turns, summarize old messages with an LLM and replace them with the summary | ✅ Retains the gist / ❌ Summaries can distort, and each one adds cost |
| Hierarchical / Selective | Move old messages to a vector DB and retrieve them on demand | ✅ No hard limit, high fidelity / ❌ More engineering complexity |
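```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def count_tokens(messages):
    # Assumed helper: rough estimate (about 4 characters per token).
    return sum(len(str(m["content"])) for m in messages) // 4

def render(messages):
    # Assumed helper: flatten messages into "role: content" lines.
    return "\n".join(f'{m["role"]}: {m["content"]}' for m in messages)

def trim_messages(messages, max_tokens=12000, keep_last=8):
    # Assumes messages[0] is the system message.
    if count_tokens(messages) <= max_tokens:
        return messages
    system, *rest = messages
    head, tail = rest[:-keep_last], rest[-keep_last:]
    if not head:  # nothing older than the kept tail, so nothing to summarize
        return messages
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=600,
        messages=[{"role": "user", "content":
                   "Summarize the following conversation in <200 words, "
                   "preserve all decisions and tool results:\n" + render(head)}],
    ).content[0].text
    return [system,
            {"role": "user", "content": f"[Earlier conversation summary] {summary}"},
            *tail]
```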
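For comparison, the sliding-window row of the table can be sketched in a few lines. This is a minimal illustration under one assumption that the table does not spell out: a "turn" is taken to start at each user message and to include everything up to the next one. The `sliding_window` name and `keep_turns` parameter are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]

def sliding_window(messages: List[Message], keep_turns: int = 2) -> List[Message]:
    """Keep a leading system message (if any) plus the last `keep_turns` turns."""
    if keep_turns <= 0:
        raise ValueError("keep_turns must be positive")
    # Separate an optional leading system message from the rest.
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    # Each user message marks the start of a new turn.
    starts = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(starts) <= keep_turns:
        return list(messages)  # already short enough
    cut = starts[-keep_turns]
    return system + rest[cut:]

conversation = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
    {"role": "assistant", "content": "a3"},
]
trimmed = sliding_window(conversation, keep_turns=2)
print([m["content"] for m in trimmed])
```

Cutting on turn boundaries rather than on a raw message count keeps each assistant reply or tool result attached to the user message that triggered it. Everything before the cut is lost, which is the drawback the table lists.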
4. Implementing Episodic + Semantic Memory with a Vector Database
The standard architecture for long-term memory: use an embedding model to turn each "memory-worthy" piece of content into a vector and store it in a vector DB. At the start of a new conversation, or whenever the agent needs to recall something, embed the current query as well and retrieve the K most similar items into the context.
Mainstream vector DBs in 2026:
- Pinecone: managed, production-grade, pricey
- Qdrant / Milvus: open source, self-hostable
- Weaviate: built-in hybrid search
- pgvector: Postgres extension, the first choice for <1M vectors
- Chroma / LanceDB: lightweight, for local development
```python
# pip install chromadb openai
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
db = chromadb.PersistentClient(path="./agent_memory")
mem = db.get_or_create_collection("episodes")

def embed(text):
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def remember(episode_id, text, metadata):
    mem.add(ids=[episode_id], documents=[text],
            embeddings=[embed(text)], metadatas=[metadata])

def recall(query, k=5, where=None):  # `where` avoids shadowing the builtin `filter`
    res = mem.query(query_embeddings=[embed(query)], n_results=k, where=where)
    return [{"text": d, "meta": m}
            for d, m in zip(res["documents"][0], res["metadatas"][0])]

# Usage in agent loop
remember("task_2026_05_12_001",
         "User asked to triage BRCA1 c.5266dupC. Agent used PubMed + ClinVar. "
         "Verdict: pathogenic.",
         {"user_id": "u123", "date": "2026-05-12", "domain": "variant"})

# On new conversation start
system_prompt = "You are a helpful assistant."  # placeholder base prompt
recent = recall("BRCA variant questions", k=3, where={"user_id": "u123"})
past = "\n".join(f'- {r["text"]}' for r in recent)
system_prompt += f"\n\nRelevant past tasks:\n{past}"
```
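Conceptually, the retrieval step is a nearest-neighbor search over vectors. The sketch below shows the idea with brute-force cosine similarity in plain Python. The three-dimensional vectors are hand-made toy values, not real embeddings, and a real vector DB would typically use an approximate index and possibly a different distance metric.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec: Sequence[float],
          items: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[float, str]]:
    """Score every stored item against the query and keep the k best."""
    scored = [(cosine(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy memory store: (text, hand-made vector) pairs.
store = [
    ("BRCA1 variant triage", [0.9, 0.1, 0.0]),
    ("Hotel booking in Tokyo", [0.0, 0.2, 0.9]),
    ("ClinVar lookup for TP53", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
print([text for _, text in results])
```

The query vector points along the first axis, so the two genetics-related entries score highest and the hotel entry is left out.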
5. What Belongs in Long-Term Memory?
"Write everything" sounds convenient, but it turns the vector DB into a noisy junk drawer. Guidance from Anthropic and LangChain suggests writing only the following:
- Anything the user explicitly asks you to remember ("please remember X")
- Preference-type facts: the user's usual language, timezone, professional background
- Non-obvious decisions: "Our H2 freeze is on 6/1," "this repo uses pnpm, not npm"
- Failure or correction trajectories: the last attempt with X failed, and Y is what worked
Do NOT write:
- Facts that can be looked up live every time (use RAG or an API instead)
- Sensitive personal data (medical, financial, home address) unless the user explicitly consents
- Ephemeral conversation state (working memory is enough)
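The rules above can be read as a small write-policy gate that runs before anything is stored. The sketch below is one hypothetical encoding: the `Candidate` fields and the `kind` labels are invented for illustration, and it assumes some upstream step (a classifier or an LLM call, not shown) has already labeled each candidate.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    kind: str  # hypothetical label assigned upstream
    sensitive: bool = False
    user_consented: bool = False

# Kinds that the guidelines say are worth persisting.
WORTH_KEEPING = {"explicit_request", "preference", "decision", "correction"}

def should_remember(c: Candidate) -> bool:
    """Return True only if the candidate passes the write policy."""
    # Sensitive data is blocked unless the user explicitly consented.
    if c.sensitive and not c.user_consented:
        return False
    # Live-lookup facts and ephemeral state fall outside WORTH_KEEPING.
    return c.kind in WORTH_KEEPING

candidates = [
    Candidate("This repo uses pnpm, not npm", kind="decision"),
    Candidate("Current weather in Taipei is 28C", kind="lookup_fact"),
    Candidate("User's home address", kind="preference", sensitive=True),
    Candidate("User allows storing their diagnosis", kind="explicit_request",
              sensitive=True, user_consented=True),
    Candidate("User is on step 2 of the form", kind="ephemeral"),
]
kept = [c.text for c in candidates if should_remember(c)]
print(kept)
```

Checking sensitivity first means a sensitive item is rejected even when its kind would otherwise qualify.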
6. Simulating How the Memory Layers Interact
The simulation below shows how the memory layers cooperate when an agent receives "Book that hotel I stayed at last time":
🧠 Working Memory
Holds: the current turn, "Book that hotel I stayed at last time"
📖 Episodic Memory recall
Recall: on 2026-03-14, booked Tokyo Hyatt Regency, room 1502, 4 nights
📚 Semantic Memory recall
Recall: the user prefers a window-side room on a high floor, with a bed no smaller than King
🛠️ Procedural Memory recall
Recall: booking skill = check_availability → confirm → send_to_user
✅ Final behavior
The agent simply asks, "Tokyo Hyatt Regency, 4 nights, same preferences?" for a zero-friction experience.
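One way to picture the cooperation is a function that gathers what each layer recalled and assembles it into a single context string for the model. This is a sketch only: `build_context` and its parameters are hypothetical, and the recalled values are hard-coded from the example above instead of being fetched from real stores.

```python
from typing import Dict, List

def build_context(user_msg: str,
                  episodes: List[str],
                  preferences: Dict[str, str],
                  skills: Dict[str, List[str]],
                  skill_name: str) -> str:
    """Combine recalls from each memory layer into one prompt section."""
    parts = [
        "Past episodes:",
        *[f"- {e}" for e in episodes],                     # episodic
        "User preferences:",
        *[f"- {k}: {v}" for k, v in preferences.items()],  # semantic
        f"Procedure ({skill_name}): " + " -> ".join(skills[skill_name]),  # procedural
        f"Current request: {user_msg}",                    # working
    ]
    return "\n".join(parts)

context = build_context(
    user_msg="Book that hotel I stayed at last time",
    episodes=["2026-03-14: booked Tokyo Hyatt Regency, room 1502, 4 nights"],
    preferences={"room": "window side, high floor", "bed": "King or larger"},
    skills={"booking": ["check_availability", "confirm", "send_to_user"]},
    skill_name="booking",
)
print(context)
```

With this context in place, the model has what it needs to propose the same hotel and preferences without asking the user to repeat them.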
🎓 Chapter Quiz
Q1. Which of the following is not suitable for long-term memory?
Q2. What is the biggest drawback of sliding-window trimming?
Q3. "The user discussed X with me last week" is which kind of memory?