STEP 5 / 12 · Core Components

Memory Systems

Working, episodic, semantic, and procedural memory: the layers that let an agent stay coherent and keep growing across sessions.

1. Why Does an Agent Need "Memory"?

A bare LLM is stateless: every API call is a fresh start, and the model itself remembers nothing of yesterday's conversation. The context window is the LLM's only "memory," and it has to be refilled from scratch on every call.

That is fine for a single-turn chatbot, but a disaster for an agent:

  • Multi-step tasks need to remember what was already done
  • Cross-session personalization (user preferences) needs persistent storage
  • Once a long conversation outgrows the context window, older information is pushed out
  • Collaboration between agents requires shared state

A memory system is the design that turns "ephemeral tokens" into "persistent knowledge." This chapter breaks down four memory roles.
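
To make the stateless point concrete, here is a minimal sketch. `fake_llm` is a hypothetical stand-in for a real model call (it is not any vendor's API); like a real LLM, it can only see the messages passed in on that one call. The `Conversation` class is likewise illustrative: it shows that any "memory" across turns exists only because the caller stores the history and resends it.

```python
def fake_llm(messages):
    """Hypothetical stand-in for an LLM call: it sees only `messages`."""
    prefix = "My name is "
    names = [m["content"][len(prefix):].rstrip(".")
             for m in messages
             if m["role"] == "user" and m["content"].startswith(prefix)]
    if messages[-1]["content"] == "What is my name?":
        return f"Your name is {names[-1]}." if names else "I don't know your name."
    return "Noted."


class Conversation:
    """Keeps the history on the caller's side and resends it on every call."""

    def __init__(self):
        self.messages = []

    def send(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = fake_llm(self.messages)  # the full history goes in every time
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Two independent calls: the second has no access to the first.
fake_llm([{"role": "user", "content": "My name is Alice."}])
stateless_reply = fake_llm([{"role": "user", "content": "What is my name?"}])

# The same two turns, but the caller carries the history forward.
chat = Conversation()
chat.send("My name is Alice.")
stateful_reply = chat.send("What is my name?")
```

The only difference between the two runs is who holds the history, which is why working memory is usually described as the messages array itself.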

2. The Four Types of Agent Memory Compared

Borrowing from cognitive science, the agent community commonly divides memory into four types (consistent with the CoALA framework):

🧠 Working Memory

Holds: the current conversation history, tool results, and scratchpad.
Implementation: the messages array inside the context window.
Lifespan: a single task.
Key issues: token budgeting and trimming strategy.

📖 Episodic Memory

Holds: full traces of past tasks (query → steps → outcome).
Implementation: vector DB plus a timestamp index.
Lifespan: persists across sessions.
Key issues: relevance retrieval and the privacy retention period.
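
The timestamp index and the retention policy can be sketched without any database. The following is an illustrative in-memory version only: the `Episode` and `EpisodeLog` names and the 90-day default are assumptions made for this example, and relevance retrieval (the vector search part) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Episode:
    query: str
    steps: list
    outcome: str
    created_at: datetime


class EpisodeLog:
    """In-memory episode store with a time index and a retention window."""

    def __init__(self, retention_days=90):  # 90 days is an arbitrary example value
        self.retention = timedelta(days=retention_days)
        self.episodes = []

    def record(self, episode):
        self.episodes.append(episode)

    def since(self, start):
        """Return episodes created at or after `start`, oldest first."""
        recent = [e for e in self.episodes if e.created_at >= start]
        return sorted(recent, key=lambda e: e.created_at)

    def purge_expired(self, now):
        """Drop episodes older than the retention window; return how many were removed."""
        cutoff = now - self.retention
        before = len(self.episodes)
        self.episodes = [e for e in self.episodes if e.created_at >= cutoff]
        return before - len(self.episodes)


log = EpisodeLog()
log.record(Episode("old task", ["search"], "done", datetime(2026, 1, 1)))
log.record(Episode("new task", ["search", "summarize"], "done", datetime(2026, 5, 1)))
removed = log.purge_expired(now=datetime(2026, 5, 12))
```

Running the purge on a schedule, rather than at read time, is one simple way to make the retention period an enforced property of the store.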

📚 Semantic Memory

Holds: abstract facts and knowledge (user preferences, domain rules, document knowledge).
Implementation: vector DB, knowledge graph, and RAG.
Lifespan: persistent.
Key issues: resolving conflicting updates and evicting stale information.
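
One simple way to handle update conflicts and staleness is to key each fact by (subject, attribute) and let the newest write win. The sketch below assumes exactly that policy; `FactStore` and its method names are hypothetical, and real systems may prefer other rules such as source priority or human review.

```python
from datetime import datetime, timedelta


class FactStore:
    """Latest-write-wins fact store keyed by (subject, attribute)."""

    def __init__(self):
        self._facts = {}  # (subject, attribute) -> (value, updated_at)

    def assert_fact(self, subject, attribute, value, at):
        """Store the fact unless a newer value already exists; return True if stored."""
        key = (subject, attribute)
        current = self._facts.get(key)
        if current is None or at >= current[1]:
            self._facts[key] = (value, at)
            return True
        return False

    def get(self, subject, attribute):
        entry = self._facts.get((subject, attribute))
        return entry[0] if entry else None

    def evict_stale(self, now, max_age):
        """Remove facts not updated within `max_age`; return how many were removed."""
        stale = [k for k, (_, at) in self._facts.items() if now - at > max_age]
        for key in stale:
            del self._facts[key]
        return len(stale)


store = FactStore()
jan, may = datetime(2026, 1, 1), datetime(2026, 5, 1)
store.assert_fact("u123", "timezone", "Asia/Taipei", jan)
store.assert_fact("u123", "timezone", "Europe/Berlin", may)   # newer value replaces the old one
late_write = store.assert_fact("u123", "timezone", "UTC", jan)  # older write is rejected
store.assert_fact("u123", "language", "zh-TW", jan)
evicted = store.evict_stale(now=datetime(2026, 5, 12), max_age=timedelta(days=90))
```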

🛠️ Procedural Memory

Holds: "how to do it," meaning successful prompt templates, tool sequences, and skill libraries.
Implementation: prompt library, skill registry, fine-tuned weights.
Lifespan: persistent.
Key issues: generalization and version management.
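
A skill registry with version management can be as small as a nested dictionary. This is a sketch under assumptions: `SkillRegistry`, integer version numbers, and the rule that published versions are immutable are choices made for illustration, not a description of any particular framework.

```python
class SkillRegistry:
    """Maps a skill name to versioned tool sequences."""

    def __init__(self):
        self._skills = {}  # name -> {version: [tool names]}

    def register(self, name, version, steps):
        versions = self._skills.setdefault(name, {})
        if version in versions:
            # Published versions are treated as immutable; bump the version instead.
            raise ValueError(f"{name} v{version} is already registered")
        versions[version] = list(steps)

    def get(self, name, version=None):
        """Return (version, steps); the highest version when none is requested."""
        versions = self._skills[name]
        if version is None:
            version = max(versions)
        return version, versions[version]


registry = SkillRegistry()
registry.register("book_hotel", 1, ["check_availability", "confirm"])
registry.register("book_hotel", 2, ["check_availability", "confirm", "send_to_user"])
latest = registry.get("book_hotel")
pinned = registry.get("book_hotel", version=1)
```

Pinning a version lets a long-running task keep using the procedure it started with while newer versions are rolled out.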

🧭 Human analogy: working memory ≈ what you are holding in mind right now; episodic memory ≈ "I discussed X with Alice last week"; semantic memory ≈ "Paris is the capital of France"; procedural memory ≈ "how to ride a bike."

3. Context Window Trimming Strategies

When the accumulated messages exceed the context limit, or when critical passages end up in unhelpful positions, trim proactively. Three common strategies:

| Strategy | Approach | Pros / Cons |
| --- | --- | --- |
| Sliding Window | Keep only the last N turns | ✅ simple, predictable / ❌ loses all early context |
| Summarization | Every K turns, summarize old messages with an LLM and replace them with the summary | ✅ retains the gist / ❌ summaries also distort, extra cost |
| Hierarchical / Selective | Move old messages to a vector DB and retrieve them on demand | ✅ no hard limit, high fidelity / ❌ engineering complexity |
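
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def render(messages):
    # Helper (assumed): flatten messages to "role: content" lines; expects string content.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

def count_tokens(messages):
    # Helper (assumed): rough estimate at ~4 characters per token; use a real tokenizer in production.
    return len(render(messages)) // 4

def trim_messages(messages, max_tokens=12000, keep_last=8):
    if count_tokens(messages) <= max_tokens:
        return messages
    system, *rest = messages  # messages[0] is assumed to hold the system prompt
    head, tail = rest[:-keep_last], rest[-keep_last:]
    if not head:  # nothing older than the tail, so there is nothing to summarize
        return messages
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=600,
        messages=[{"role": "user",
                   "content": "Summarize the following conversation in <200 words, "
                              "preserve all decisions and tool results:\n" + render(head)}],
    ).content[0].text
    return [system, {"role": "user", "content": f"[Earlier conversation summary] {summary}"}, *tail]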
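
For comparison, the sliding-window strategy can be sketched without any model call. This version is an assumption-laden illustration: `approx_tokens` uses a crude 4-characters-per-token estimate, message content is assumed to be a plain string, and the `pinned` parameter is a hypothetical addition that keeps the leading system message so the agent does not lose its instructions.

```python
def approx_tokens(messages):
    # Crude heuristic (assumption): roughly 4 characters per token.
    return sum(len(m["content"]) // 4 for m in messages)


def sliding_window(messages, max_tokens, pinned=1):
    """Keep the first `pinned` messages plus as many of the newest ones as fit."""
    head, rest = messages[:pinned], messages[pinned:]
    budget = max_tokens - approx_tokens(head)
    kept = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = approx_tokens([msg])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return head + kept[::-1]  # restore chronological order


history = [{"role": "system", "content": "s" * 40}]
history += [{"role": "user", "content": str(i) * 40} for i in range(5)]
trimmed = sliding_window(history, max_tokens=35)
```

Everything older than the window is simply dropped, which is the trade-off listed for this strategy: predictable cost, but no trace of early context beyond what is pinned.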

4. Implementing Episodic and Semantic Memory with a Vector Database

The standard architecture for long-term memory: convert each "memory-worthy" piece of content into a vector with an embedding model and store it in a vector DB. At the start of a new conversation, or whenever the agent needs to recall something, embed the current query as well and load the K most similar entries into the context.

Mainstream vector DBs in 2026:

  • Pinecone: managed, production-grade, pricey
  • Qdrant / Milvus: open source, self-hostable
  • Weaviate: built-in hybrid search
  • pgvector: Postgres extension, a first choice below roughly 1M vectors
  • Chroma / LanceDB: lightweight, for local development

# pip install chromadb openai
import chromadb
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
db = chromadb.PersistentClient(path="./agent_memory")
mem = db.get_or_create_collection("episodes")

def embed(text):
    # Returns the embedding vector (a list of floats) for one string.
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def remember(episode_id, text, metadata):
    # upsert instead of add, so re-recording the same episode_id overwrites it.
    mem.upsert(ids=[episode_id], documents=[text], embeddings=[embed(text)], metadatas=[metadata])

def recall(query, k=5, where=None):
    # `where` is a Chroma metadata filter, e.g. {"user_id": "u123"}.
    res = mem.query(query_embeddings=[embed(query)], n_results=k, where=where)
    return [{"text": d, "meta": m} for d, m in zip(res["documents"][0], res["metadatas"][0])]

# Usage in agent loop
remember("task_2026_05_12_001",
         "User asked to triage BRCA1 c.5266dupC. Agent used PubMed + ClinVar. Verdict: pathogenic.",
         {"user_id": "u123", "date": "2026-05-12", "domain": "variant"})

# On new conversation start
recent = recall("BRCA variant questions", k=3, where={"user_id": "u123"})
system_prompt = "You are a helpful assistant."  # placeholder base prompt
past = "\n".join(f"- [{r['meta']['date']}] {r['text']}" for r in recent)
system_prompt += f"\n\nRelevant past tasks:\n{past}"
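
To show what the vector DB is doing during recall, here is a dependency-free sketch of top-K retrieval by cosine similarity. `toy_embed` is a toy stand-in (a bag-of-words count vector) for a real embedding model, so it only matches on shared words; the retrieval logic is the part being illustrated.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, memories, k=2):
    """Return the k stored texts most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, toy_embed(m)), reverse=True)
    return ranked[:k]


memories = [
    "user prefers replies in traditional chinese",
    "brca1 variant triage verdict pathogenic",
    "repo uses pnpm not npm",
]
hits = top_k("brca1 variant question", memories, k=1)
```

A production vector DB replaces the linear scan with an approximate nearest-neighbor index, but the contract is the same: embed the query, score the stored vectors, return the best K.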

5. What Belongs in Long-Term Memory?

"Write everything" sounds convenient, but it turns the vector DB into a noisy junk drawer. Guidance from Anthropic and LangChain suggests persisting only:

  • Anything the user explicitly asks for: "please remember X"
  • Preference-type facts: the user's habitual language, timezone, professional background
  • Non-obvious decisions: "our H2 freeze starts 6/1," "this repo uses pnpm, not npm"
  • Failure or correction trajectories: the last attempt with X failed, and only Y worked

Do NOT write:

  • Facts that can always be looked up live (RAG or an API call is enough)
  • Sensitive personal data (medical, financial, home address) unless the user has explicitly consented
  • Transient conversation state (working memory is enough)
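
The rules above can be turned into a small write gate that runs before anything is persisted. This is only a sketch: the `Candidate` fields, the category names, and the precedence (sensitive data is checked first and requires its own consent flag) are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical category labels; a real system would need a classifier to assign them.
SENSITIVE_KINDS = {"medical", "financial", "address"}
DURABLE_KINDS = {"preference", "decision", "correction"}


@dataclass
class Candidate:
    text: str
    kind: str                      # e.g. "preference", "lookup", "ephemeral", "medical"
    explicit_request: bool = False  # the user said "please remember X"
    user_consented: bool = False    # explicit consent to store sensitive data


def should_persist(c):
    """Decide whether a candidate memory should be written to long-term storage."""
    if c.kind in SENSITIVE_KINDS:
        return c.user_consented      # sensitive data needs explicit consent
    if c.explicit_request:
        return True                  # the user asked for it to be remembered
    return c.kind in DURABLE_KINDS   # lookups and ephemeral state are skipped


decisions = {
    "pnpm": should_persist(Candidate("this repo uses pnpm, not npm", "decision")),
    "asked": should_persist(Candidate("remember my cat's name", "other", explicit_request=True)),
    "medical": should_persist(Candidate("blood test results", "medical")),
    "weather": should_persist(Candidate("current weather", "lookup")),
    "draft": should_persist(Candidate("step 3 of the current plan", "ephemeral")),
}
```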

6. Simulating Interaction Across Memory Layers

The simulation below shows how the memory layers cooperate when an agent receives "Book that hotel I stayed at last time":

🧠 Working Memory

Holds: the current turn, "Book that hotel I stayed at last time"

📖 Episodic Memory recall

Recall: on 2026-03-14 the user booked Tokyo Hyatt Regency, room 1502, 4 nights

📚 Semantic Memory recall

Recall: user preferences are window side, high floor, and a King bed or larger

🛠️ Procedural Memory recall

Recall: booking skill = check_availability → confirm → send_to_user

Final behavior

The agent asks directly, "Tokyo Hyatt Regency, 4 nights, same preferences?" for a zero-friction experience
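
The walkthrough above can be expressed as a function that merges the four layers into one context block. This is a minimal sketch: `assemble_context` and the dictionary shapes are hypothetical, the 2026-03-14 stay comes from the example above, and the earlier Osaka stay is invented purely to show that the most recent episode is selected.

```python
def assemble_context(user_message, episodes, preferences, skills):
    """Combine working, episodic, semantic, and procedural memory into one context string."""
    # ISO date strings sort chronologically, so max() picks the latest stay.
    last_stay = max(episodes, key=lambda e: e["date"])
    lines = [
        f"User: {user_message}",                                   # working memory
        f"Last stay: {last_stay['hotel']}, {last_stay['nights']} nights "
        f"({last_stay['date']})",                                  # episodic memory
        "Preferences: " + ", ".join(preferences),                  # semantic memory
        "Plan: " + " -> ".join(skills["book_hotel"]),              # procedural memory
    ]
    return "\n".join(lines)


episodes = [
    {"date": "2025-11-02", "hotel": "Osaka Example Inn", "nights": 2},  # hypothetical older stay
    {"date": "2026-03-14", "hotel": "Tokyo Hyatt Regency", "nights": 4},
]
preferences = ["window side", "high floor", "King bed or larger"]
skills = {"book_hotel": ["check_availability", "confirm", "send_to_user"]}

context = assemble_context("Book that hotel I stayed at last time",
                           episodes, preferences, skills)
```

With this context in place, the model has what it needs to ask a single confirming question instead of re-collecting the hotel, the length of stay, and the room preferences.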

🎓 Chapter Quiz

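Q1. Which of the following is not suitable for long-term memory?

A) The user's preferred programming language
B) The correction trajectory from the last task
C) Medical records stored without consent
D) The project's special naming conventions
✅ Answer: C. Sensitive PII must never be written to long-term memory without explicit consent.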

Q2. What is the biggest drawback of sliding-window trimming?

A) It is too expensive
B) It loses all early context
C) It violates GDPR
D) It cannot be combined with tools
✅ Answer: B. System instructions and the original task description get cut, so the agent "forgets" its goal.

Q3. "The user discussed X with me last week" is which kind of memory?

A) Working
B) Episodic
C) Semantic
D) Procedural
✅ Answer: B. A concrete, time-stamped event is episodic memory.