diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index c24c9b701..2f065c4fe 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -20,7 +20,7 @@ jobs:
           python-version: "3.11"
 
       - name: Install dependencies
-        run: pip install anthropic python-dotenv pytest
+        run: pip install -r requirements.txt pytest
 
       - name: Run Python smoke tests
         run: python -m pytest tests -q
diff --git a/s08_context_compact/README.en.md b/s08_context_compact/README.en.md
index 6c5941296..6b2f23593 100644
--- a/s08_context_compact/README.en.md
+++ b/s08_context_compact/README.en.md
@@ -39,20 +39,24 @@ Core design: cheap first, expensive last.
 
 The agent ran 80 turns of conversation, accumulating 160 `messages`. The very first "help me create hello.py" is barely relevant to current work, yet it still occupies space.
 
-Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle:
+Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), and trim the middle. The only extra boundary rule is that an `assistant(tool_use)` message must not be separated from the `user(tool_result)` message that follows it:
 
 ```python
 def snip_compact(messages, max_messages=50):
     if len(messages) <= max_messages:
         return messages
-    keep_head, keep_tail = 3, max_messages - 3
-    snipped = len(messages) - keep_head - keep_tail
-    placeholder = {"role": "user",
-                   "content": f"[snipped {snipped} messages from conversation middle]"}
-    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
+    head_end, tail_start = 3, len(messages) - (max_messages - 3)
+    if _message_has_tool_use(messages[head_end - 1]):
+        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):
+            head_end += 1
+    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
+    snipped = tail_start - head_end
+    placeholder = {"role": "user", "content": f"[snipped {snipped} messages from conversation middle]"}
+    return messages[:head_end] + [placeholder] + messages[tail_start:]
 ```
 
-Entire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.
+Whole messages are still trimmed; the only addition is a guard at each cut boundary. `tool_result` content inside the remaining messages keeps accumulating, though: message #34 may still hold 30KB of old file contents. → L2.
 
 ### L2: micro_compact — Placeholder for Old Tool Results
 
@@ -130,15 +134,17 @@ def compact_history(messages):
 
 Sometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.
 
-This triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary.
+This triggers **reactive_compact**: more aggressive than compact_history, it keeps only a summary plus the last 5 messages, moving the cut back by one message when needed so that no `tool_result` is left orphaned.
 
 ```python
 def reactive_compact(messages):
     transcript = write_transcript(messages)
     summary = summarize_history(messages)
-    tail = messages[-5:]
+    tail_start = max(0, len(messages) - 5)
+    if tail_start > 0 and _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
     return [{"role": "user",
-             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
+             "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]]
 ```
 
 Reactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11.
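
The tail guard shared by `snip_compact` and `reactive_compact` can be tried in isolation. The following is a minimal sketch, not the repository's code: `safe_tail_start`, `has_tool_use`, `is_tool_result`, and the sample `history` are hypothetical names chosen for illustration, and messages are assumed to be plain dicts shaped like the ones in the diff above.

```python
from typing import Any


def block_type(block: Any):
    """Return a block's type whether it is a dict or an SDK-style object."""
    return block.get("type") if isinstance(block, dict) else getattr(block, "type", None)


def has_tool_use(msg: dict) -> bool:
    """True for an assistant message containing at least one tool_use block."""
    content = msg.get("content")
    return (msg.get("role") == "assistant" and isinstance(content, list)
            and any(block_type(b) == "tool_use" for b in content))


def is_tool_result(msg: dict) -> bool:
    """True for a user message containing at least one tool_result dict block."""
    content = msg.get("content")
    return (msg.get("role") == "user" and isinstance(content, list)
            and any(isinstance(b, dict) and b.get("type") == "tool_result"
                    for b in content))


def safe_tail_start(messages: list, keep: int = 5) -> int:
    """Hypothetical helper: index where the kept tail begins, moved back one
    message if the naive cut would split a tool_use / tool_result pair."""
    start = max(0, len(messages) - keep)
    if (0 < start < len(messages)
            and is_tool_result(messages[start])
            and has_tool_use(messages[start - 1])):
        start -= 1
    return start


# Illustrative history: the naive cut for keep=3 lands on the tool_result.
history = [
    {"role": "user", "content": "list the files"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t1", "name": "bash"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "a.py"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "done"}]},
    {"role": "user", "content": "thanks"},
]
naive_start = len(history) - 3
guarded_start = safe_tail_start(history, keep=3)
print(naive_start, guarded_start)  # prints "2 1"
```

The guard only ever steps back one message, which matches the alternating `assistant(tool_use)` / `user(tool_result)` layout assumed here; the `0 < start` check keeps the index from wrapping around to `messages[-1]` when the whole history is kept.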
diff --git a/s08_context_compact/README.ja.md b/s08_context_compact/README.ja.md
index 934ae5564..84bfb381a 100644
--- a/s08_context_compact/README.ja.md
+++ b/s08_context_compact/README.ja.md
@@ -39,20 +39,24 @@ s07 のフック構造、スキルロード、サブ Agent の骨格を維持し
 
 Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。
 
-メッセージ数が 50 を超えた場合 → 先頭 3 件（初期コンテキスト）と末尾 47 件（現在の作業）を保持し、中間を切り捨て：
+メッセージ数が 50 を超えた場合 → 先頭 3 件（初期コンテキスト）と末尾 47 件（現在の作業）を保持して中間を切り詰める。ただし切れ目だけは調整し、`assistant(tool_use)` と後続の `user(tool_result)` を分断しない：
 
 ```python
 def snip_compact(messages, max_messages=50):
     if len(messages) <= max_messages:
         return messages
-    keep_head, keep_tail = 3, max_messages - 3
-    snipped = len(messages) - keep_head - keep_tail
-    placeholder = {"role": "user",
-                   "content": f"[snipped {snipped} messages from conversation middle]"}
-    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
+    head_end, tail_start = 3, len(messages) - (max_messages - 3)
+    if _message_has_tool_use(messages[head_end - 1]):
+        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):
+            head_end += 1
+    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
+    snipped = tail_start - head_end
+    placeholder = {"role": "user", "content": f"[snipped {snipped} messages from conversation middle]"}
+    return messages[:head_end] + [placeholder] + messages[tail_start:]
 ```
 
-メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。
+切り捨て自体は単純なままで、境界だけを保護する。残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。
 
 ### L2: micro_compact — 古いツール結果をプレースホルダに置換
 
@@ -130,15 +134,17 @@ def compact_history(messages):
 
 API がまだ `prompt_too_long`（413）を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。
 
-この時 **reactive_compact** がトリガーされる：compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。
+この時 **reactive_compact** がトリガーされる：compact_history よりもさらに積極的だが、末尾を残す際も孤立した `tool_result` を残さないようにする。
 
 ```python
 def reactive_compact(messages):
     transcript = write_transcript(messages)
     summary = summarize_history(messages)
-    tail = messages[-5:]
+    tail_start = max(0, len(messages) - 5)
+    if tail_start > 0 and _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
     return [{"role": "user",
-             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
+             "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]]
 ```
 
 reactive compact にはリトライ上限がある（デフォルト 1 回）。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。
diff --git a/s08_context_compact/README.md b/s08_context_compact/README.md
index c8e3c1cb4..22d967156 100644
--- a/s08_context_compact/README.md
+++ b/s08_context_compact/README.md
@@ -39,20 +39,24 @@ Agent 跑着跑着，不动了。
 
 Agent 跑了 80 轮对话，`messages` 攒了 160 条。最前面的"帮我创建 hello.py"和当前工作几乎无关了，但全占着位置。
 
-消息数超过 50 条 → 保留头部 3 条（初始上下文）和尾部 47 条（当前工作），中间裁掉：
+消息数超过 50 条 → 保留头部 3 条（初始上下文）和尾部 47 条（当前工作），中间裁掉；唯一额外边界条件是，不能把 `assistant(tool_use)` 和后面的 `user(tool_result)` 拆开：
 
 ```python
 def snip_compact(messages, max_messages=50):
     if len(messages) <= max_messages:
         return messages
-    keep_head, keep_tail = 3, max_messages - 3
-    snipped = len(messages) - keep_head - keep_tail
-    placeholder = {"role": "user",
-                   "content": f"[snipped {snipped} messages from conversation middle]"}
-    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]
+    head_end, tail_start = 3, len(messages) - (max_messages - 3)
+    if _message_has_tool_use(messages[head_end - 1]):
+        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):
+            head_end += 1
+    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
+    snipped = tail_start - head_end
+    placeholder = {"role": "user", "content": f"[snipped {snipped} messages from conversation middle]"}
+    return messages[:head_end] + [placeholder] + messages[tail_start:]
 ```
 
-裁掉了整条消息，但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。
+裁掉的是消息本身，只是在切口处多做一步保护；剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。
 
 ### L2: micro_compact — 旧工具结果占位
 
@@ -130,15 +134,17 @@ def compact_history(messages):
 
 有时候 API 还是返回 `prompt_too_long`（413），上下文增长速度快于压缩触发速度时。
 
-这时触发 **reactive_compact**：比 compact_history 更激进，从尾部回退，以字节级精度裁剪到 API 可接受的大小，只保留最后 5 条消息 + 摘要。
+这时触发 **reactive_compact**：比 compact_history 更激进，从尾部回退，但仍要避免留下孤立 `tool_result`。
 
 ```python
 def reactive_compact(messages):
     transcript = write_transcript(messages)
     summary = summarize_history(messages)
-    tail = messages[-5:]
+    tail_start = max(0, len(messages) - 5)
+    if tail_start > 0 and _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):
+        tail_start -= 1
     return [{"role": "user",
-             "content": f"[Reactive compact]\n\n{summary}"}, *tail]
+             "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]]
 ```
 
 reactive compact 有重试上限（默认 1 次）。再失败就抛出异常，不无限循环。完整的错误恢复逻辑留给 s11。
diff --git a/s08_context_compact/code.py b/s08_context_compact/code.py
index a9cc3092b..b9d78d425 100644
--- a/s08_context_compact/code.py
+++ b/s08_context_compact/code.py
@@ -268,13 +268,45 @@ def spawn_subagent(task: str) -> str:
 
 def estimate_size(msgs): return len(str(msgs))
 
+def _block_type(block):
+    return block.get("type") if isinstance(block, dict) else getattr(block, "type", None)
+
+
+def _message_has_tool_use(msg):
+    if msg.get("role") != "assistant":
+        return False
+    content = msg.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(_block_type(block) == "tool_use" for block in content)
+
+
+def _is_tool_result_message(msg):
+    if msg.get("role") != "user":
+        return False
+    content = msg.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(isinstance(block, dict) and block.get("type") == "tool_result"
+               for block in content)
+
 
 # L1: snipCompact — trim middle messages
 def snip_compact(messages, max_messages=50):
     if len(messages) <= max_messages: return messages
     keep_head, keep_tail = 3, max_messages - 3
-    snipped = len(messages) - keep_head - keep_tail
-    return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[-keep_tail:]
+    head_end, tail_start = keep_head, len(messages) - keep_tail
+    if head_end > 0 and _message_has_tool_use(messages[head_end - 1]):
+        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):
+            head_end += 1
+    if (tail_start > 0 and tail_start < len(messages)
+            and _is_tool_result_message(messages[tail_start])
+            and _message_has_tool_use(messages[tail_start - 1])):
+        tail_start -= 1
+    if head_end >= tail_start:
+        return messages
+    snipped = tail_start - head_end
+    return messages[:head_end] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[tail_start:]
 
 
 # L2: microCompact — old result placeholders
@@ -351,7 +383,12 @@ def compact_history(messages):
 def reactive_compact(messages):
     transcript = write_transcript(messages)
     summary = summarize_history(messages)
-    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[-5:]]
+    tail_start = max(0, len(messages) - 5)
+    if (tail_start > 0 and tail_start < len(messages)
+            and _is_tool_result_message(messages[tail_start])
+            and _message_has_tool_use(messages[tail_start - 1])):
+        tail_start -= 1
+    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]]
 
 
 # ═══════════════════════════════════════════════════════════
diff --git a/s09_memory/code.py b/s09_memory/code.py
index 2f660e769..f80a92636 100644
--- a/s09_memory/code.py
+++ b/s09_memory/code.py
@@ -449,9 +449,38 @@ def spawn_subagent(task: str) -> str:
 
 def estimate_size(msgs): return len(str(msgs))
 
+def _block_type(block):
+    return block.get("type") if isinstance(block, dict) else getattr(block, "type", None)
+
+def _message_has_tool_use(msg):
+    if msg.get("role") != "assistant":
+        return False
+    content = msg.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(_block_type(block) == "tool_use" for block in content)
+
+def _is_tool_result_message(msg):
+    if msg.get("role") != "user":
+        return False
+    content = msg.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(isinstance(block, dict) and block.get("type") == "tool_result" for block in content)
+
 def snip_compact(msgs, mx=50):
     if len(msgs) <= mx: return msgs
-    return msgs[:3] + [{"role": "user", "content": f"[snipped {len(msgs)-mx} msgs]"}] + msgs[-(mx-3):]
+    head_end, tail_start = 3, len(msgs) - (mx - 3)
+    if head_end > 0 and _message_has_tool_use(msgs[head_end - 1]):
+        while head_end < len(msgs) and _is_tool_result_message(msgs[head_end]):
+            head_end += 1
+    if (tail_start > 0 and tail_start < len(msgs)
+            and _is_tool_result_message(msgs[tail_start])
+            and _message_has_tool_use(msgs[tail_start - 1])):
+        tail_start -= 1
+    if head_end >= tail_start:
+        return msgs
+    return msgs[:head_end] + [{"role": "user", "content": f"[snipped {tail_start - head_end} msgs]"}] + msgs[tail_start:]
 
 def collect_tool_results(msgs):
     blocks = []
@@ -512,7 +541,12 @@ def compact_history(msgs):
 def reactive_compact(msgs):
     write_transcript(msgs)
     summary = summarize_history(msgs)
-    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[-5:]]
+    tail_start = max(0, len(msgs) - 5)
+    if (tail_start > 0 and tail_start < len(msgs)
+            and _is_tool_result_message(msgs[tail_start])
+            and _message_has_tool_use(msgs[tail_start - 1])):
+        tail_start -= 1
+    return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[tail_start:]]
 
 
 # ═══════════════════════════════════════════════════════════
diff --git a/s20_comprehensive/code.py b/s20_comprehensive/code.py
index 12142e775..bd62553e0 100644
--- a/s20_comprehensive/code.py
+++ b/s20_comprehensive/code.py
@@ -1060,6 +1060,28 @@ def spawn_subagent(description: str) -> str:
 def estimate_size(messages: list) -> int:
     return len(json.dumps(messages, default=str))
 
+def block_type(block):
+    return block.get("type") if isinstance(block, dict) else getattr(block, "type", None)
+
+
+def message_has_tool_use(message: dict) -> bool:
+    if message.get("role") != "assistant":
+        return False
+    content = message.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(block_type(block) == "tool_use" for block in content)
+
+
+def is_tool_result_message(message: dict) -> bool:
+    if message.get("role") != "user":
+        return False
+    content = message.get("content")
+    if not isinstance(content, list):
+        return False
+    return any(isinstance(block, dict) and block.get("type") == "tool_result"
+               for block in content)
+
 
 def collect_tool_results(messages: list):
     found = []
@@ -1111,11 +1133,20 @@ def tool_result_budget(messages: list, max_bytes: int = 200_000) -> list:
 def snip_compact(messages: list, max_messages: int = 50) -> list:
     if len(messages) <= max_messages:
         return messages
-    keep_head, keep_tail = 3, max_messages - 3
-    snipped = len(messages) - keep_head - keep_tail
-    return (messages[:keep_head]
+    head_end, tail_start = 3, len(messages) - (max_messages - 3)
+    if head_end > 0 and message_has_tool_use(messages[head_end - 1]):
+        while head_end < len(messages) and is_tool_result_message(messages[head_end]):
+            head_end += 1
+    if (tail_start > 0 and tail_start < len(messages)
+            and is_tool_result_message(messages[tail_start])
+            and message_has_tool_use(messages[tail_start - 1])):
+        tail_start -= 1
+    if head_end >= tail_start:
+        return messages
+    snipped = tail_start - head_end
+    return (messages[:head_end]
             + [{"role": "user", "content": f"[snipped {snipped} messages]"}]
-            + messages[-keep_tail:])
+            + messages[tail_start:])
 
 
 def micro_compact(messages: list) -> list:
@@ -1163,8 +1194,13 @@ def reactive_compact(messages: list) -> list:
         summary = summarize_history(messages)
     except Exception:
         summary = "Earlier conversation was trimmed after a prompt-too-long error."
+    tail_start = max(0, len(messages) - 5)
+    if (tail_start > 0 and tail_start < len(messages)
+            and is_tool_result_message(messages[tail_start])
+            and message_has_tool_use(messages[tail_start - 1])):
+        tail_start -= 1
     return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"},
-            *messages[-5:]]
+            *messages[tail_start:]]
 
 
 # ── Error Recovery ──
diff --git a/tests/test_compaction_tool_pairs.py b/tests/test_compaction_tool_pairs.py
new file mode 100644
index 000000000..e4f67d7b2
--- /dev/null
+++ b/tests/test_compaction_tool_pairs.py
@@ -0,0 +1,189 @@
+import importlib.util
+import os
+import sys
+import tempfile
+import types
+import unittest
+from pathlib import Path
+
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+MODULES = {
+    "s08": REPO_ROOT / "s08_context_compact" / "code.py",
+    "s09": REPO_ROOT / "s09_memory" / "code.py",
+    "s20": REPO_ROOT / "s20_comprehensive" / "code.py",
+}
+
+
+def load_module(name: str, path: Path, temp_cwd: Path):
+    fake_anthropic = types.ModuleType("anthropic")
+
+    class FakeAnthropic:
+        def __init__(self, *args, **kwargs):
+            self.messages = types.SimpleNamespace(create=None)
+
+    fake_dotenv = types.ModuleType("dotenv")
+    setattr(fake_anthropic, "Anthropic", FakeAnthropic)
+    setattr(fake_dotenv, "load_dotenv", lambda override=True: None)
+
+    previous_anthropic = sys.modules.get("anthropic")
+    previous_dotenv = sys.modules.get("dotenv")
+    previous_cwd = Path.cwd()
+    previous_model = os.environ.get("MODEL_ID")
+    previous_key = os.environ.get("ANTHROPIC_API_KEY")
+
+    spec = importlib.util.spec_from_file_location(name, path)
+    if spec is None or spec.loader is None:
+        raise RuntimeError(f"Unable to load {path}")
+    module = importlib.util.module_from_spec(spec)
+
+    sys.modules["anthropic"] = fake_anthropic
+    sys.modules["dotenv"] = fake_dotenv
+    os.environ["MODEL_ID"] = "test-model"
+    os.environ["ANTHROPIC_API_KEY"] = "test-key"
+    try:
+        os.chdir(temp_cwd)
+        spec.loader.exec_module(module)
+        return module
+    finally:
+        os.chdir(previous_cwd)
+        if previous_anthropic is None:
+            sys.modules.pop("anthropic", None)
+        else:
+            sys.modules["anthropic"] = previous_anthropic
+        if previous_dotenv is None:
+            sys.modules.pop("dotenv", None)
+        else:
+            sys.modules["dotenv"] = previous_dotenv
+        if previous_model is None:
+            os.environ.pop("MODEL_ID", None)
+        else:
+            os.environ["MODEL_ID"] = previous_model
+        if previous_key is None:
+            os.environ.pop("ANTHROPIC_API_KEY", None)
+        else:
+            os.environ["ANTHROPIC_API_KEY"] = previous_key
+
+
+def assistant_text():
+    return {"role": "assistant", "content": [types.SimpleNamespace(type="text", text="ok")]}
+
+
+def user_text():
+    return {"role": "user", "content": "continue"}
+
+
+def tool_use_message(tool_id="tool-1"):
+    return {
+        "role": "assistant",
+        "content": [types.SimpleNamespace(type="tool_use", id=tool_id, name="bash")],
+    }
+
+
+def tool_result_message(tool_id="tool-1"):
+    return {
+        "role": "user",
+        "content": [{"type": "tool_result", "tool_use_id": tool_id, "content": "ok"}],
+    }
+
+
+def message_has_tool_use(message):
+    content = message.get("content")
+    return (
+        message.get("role") == "assistant"
+        and isinstance(content, list)
+        and any(getattr(block, "type", None) == "tool_use" for block in content)
+    )
+
+
+def assert_no_orphan_tool_results(testcase, messages):
+    for idx, message in enumerate(messages):
+        content = message.get("content")
+        if message.get("role") != "user" or not isinstance(content, list):
+            continue
+        if not any(isinstance(block, dict) and block.get("type") == "tool_result" for block in content):
+            continue
+        testcase.assertGreater(idx, 0)
+        testcase.assertTrue(message_has_tool_use(messages[idx - 1]), messages)
+
+
+class CompactionToolPairTests(unittest.TestCase):
+    def test_snip_compact_keeps_head_tool_pair(self):
+        messages = [
+            user_text(),
+            assistant_text(),
+            tool_use_message("head-tool"),
+            tool_result_message("head-tool"),
+            assistant_text(),
+            user_text(),
+            assistant_text(),
+            user_text(),
+            assistant_text(),
+            user_text(),
+        ]
+
+        for name, path in MODULES.items():
+            with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp:
+                module = load_module(f"{name}_head_under_test", path, Path(tmp))
+                if name == "s09":
+                    compacted = module.snip_compact(list(messages), mx=6)
+                else:
+                    compacted = module.snip_compact(list(messages), max_messages=6)
+                self.assertEqual(compacted[2], messages[2])
+                self.assertEqual(compacted[3], messages[3])
+                assert_no_orphan_tool_results(self, compacted)
+
+    def test_snip_compact_keeps_tail_tool_pair(self):
+        messages = [
+            user_text(),
+            assistant_text(),
+            user_text(),
+            assistant_text(),
+            user_text(),
+            assistant_text(),
+            tool_use_message("tail-tool"),
+            tool_result_message("tail-tool"),
+            assistant_text(),
+            user_text(),
+        ]
+
+        for name, path in MODULES.items():
+            with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp:
+                module = load_module(f"{name}_under_test", path, Path(tmp))
+                if name == "s09":
+                    compacted = module.snip_compact(list(messages), mx=6)
+                else:
+                    compacted = module.snip_compact(list(messages), max_messages=6)
+                assert_no_orphan_tool_results(self, compacted)
+
+    def test_reactive_compact_keeps_tail_tool_pair(self):
+        messages = [
+            user_text(),
+            assistant_text(),
+            user_text(),
+            tool_use_message("reactive-tool"),
+            tool_result_message("reactive-tool"),
+            assistant_text(),
+            user_text(),
+            assistant_text(),
+            user_text(),
+        ]
+
+        for name, path in MODULES.items():
+            with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp:
+                module = load_module(f"{name}_reactive_under_test", path, Path(tmp))
+                module.write_transcript = lambda _messages: Path("transcript.jsonl")
+                module.summarize_history = lambda _messages: "summary"
+                compacted = module.reactive_compact(list(messages))
+                self.assertEqual(compacted[1], messages[3])
+                assert_no_orphan_tool_results(self, compacted)
+
+    def test_s20_has_tool_use_still_accepts_content_blocks(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            module = load_module("s20_has_tool_use_under_test", MODULES["s20"], Path(tmp))
+            self.assertTrue(module.has_tool_use([types.SimpleNamespace(type="tool_use")]))
+            self.assertFalse(module.has_tool_use([types.SimpleNamespace(type="text")]))
+
+
+if __name__ == "__main__":
+    unittest.main()
diff --git a/web/src/data/generated/docs.json b/web/src/data/generated/docs.json
index 3f6bb41c5..4e50b7380 100644
--- a/web/src/data/generated/docs.json
+++ b/web/src/data/generated/docs.json
@@ -129,19 +129,19 @@
     "version": "s08",
     "locale": "en",
     "title": "s08: Context Compact — Context Will Fill Up, Have a Way to Make Room",
-    "content": "# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/en/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — Four-layer compression pipeline: cheap first, expensive last.\n>\n> **Harness Layer**: Compression — clean memory, unlimited sessions.\n\n---\n\n## The Problem\n\nThe agent is running along, then freezes.\n\nIt has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.\n\nThe context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.\n\nWithout compression, an agent simply cannot work on large projects.\n\n---\n\n## The Solution\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.en.svg)\n\nThe hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.\n\nCore design: cheap first, expensive last.\n\n---\n\n## How It Works\n\n![Four-layer compression pipeline](/course-assets/s08_context_compact/compaction-layers.en.svg)\n\n### L1: snip_compact — Trim Irrelevant Old Conversation\n\nThe agent ran 80 turns of conversation, accumulating 160 `messages`. 
The very first \"help me create hello.py\" is barely relevant to current work, yet it still occupies space.\n\nMessage count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle:\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    keep_head, keep_tail = 3, max_messages - 3\n    snipped = len(messages) - keep_head - keep_tail\n    placeholder = {\"role\": \"user\",\n                   \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\nEntire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.\n\n### L2: micro_compact — Placeholder for Old Tool Results\n\n![Old results placeholder](/course-assets/s08_context_compact/micro-compact.en.svg)\n\nThe agent read 10 files consecutively. The full contents of reads 1–7 are still sitting in context, no longer needed, but hogging large amounts of space.\n\nKeep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n            block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\nOld results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. 
→ L3.\n\n### L3: tool_result_budget — Persist Large Results to Disk\n\n![Large results to disk](/course-assets/s08_context_compact/layer1-budget.en.svg)\n\nThe model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB.\n\nSum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `<persisted-output>` marker + a 2000-character preview in context. The model sees the marker and knows the full content is on disk, re-reading it when needed.\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\nThe first three layers are all plain-text / structural operations — 0 API calls — but they cannot \"understand\" conversation content. Context may still be too large. → L4.\n\n### L4: compact_history — Full LLM Summary\n\n![Full LLM summary](/course-assets/s08_context_compact/auto-compact.en.svg)\n\nAll three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.\n\nThree-step process:\n\n1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. The transcript preserves a recoverable record, but the model's active context only contains the summary. 
For the model's current reasoning, the details are no longer in context. The teaching code does not provide a transcript retrieval tool.\n2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc.\n3. **Replace message list**: All old messages are replaced with a single summary. The teaching version only keeps the summary; the real Claude Code re-attaches some recent files, plans, agent/skill/tool context after compaction.\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # Save full conversation first\n    summary = summarize_history(messages)          # LLM generates summary\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls.\n\n### Reactive: reactive_compact\n\nSometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.\n\nThis triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary.\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail = messages[-5:]\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nReactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. 
Full error recovery is deferred to s11.\n\n### Putting It All Together\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # Three pre-processors (0 API calls)\n        # Order: budget first, so large content is persisted before placeholders\n        messages[:] = tool_result_budget(messages)    # L3: persist large results\n        messages[:] = snip_compact(messages)          # L1: trim middle\n        messages[:] = micro_compact(messages)         # L2: old result placeholders\n\n        # Still too much? LLM summary (1 API call)\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # Emergency\n                reactive_retries += 1\n                continue\n            raise  # retry limit exceeded, raise exception\n        # ... tool execution ...\n\n        # compact tool: when the model actively calls it, triggers compact_history\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # end current turn, start fresh with compacted context\n```\n\n**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large tool_results with one-line placeholders — budget must persist the full content before that happens. 
This is why CC source puts `applyToolResultBudget` first.\n\n---\n\n## Changes From s07\n\n| Component | Before (s07) | After (s08) |\n|-----------|-------------|-------------|\n| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |\n| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |\n| Design principle | — | Cheap first, expensive last |\n\n---\n\n## Try It\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\nTry these prompts:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results)\n2. `Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk)\n3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears\n\nWhat to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically?\n\n---\n\n## What's Next\n\nContext compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent selectively remember important things?\n\ns09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. 
Across compressions, across sessions.\n\n<details>\n<summary>Deep Dive Into CC Source Code</summary>\n\n> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.\n\n### Execution Order Comparison\n\nThe teaching version labels layers L1/L2/L3/L4 for pedagogical clarity, but actual execution order does not match the numbering:\n\n| Dimension | Teaching Version | Claude Code |\n|-----------|-----------------|-------------|\n| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |\n| snip_compact | Keep head 3 + tail 47 | CC only enables on main thread; implementation not in open-source repo (`HISTORY_SNIP` feature gate), but interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, also exposes `SnipTool` for model-initiated snipping. Teaching version's 3/47 are simplified parameters |\n| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |\n| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |\n| tool_result_budget | 200KB characters | 200,000 characters (`toolLimits.ts:49`) |\n| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |\n| Summary requirements | 5 categories of info | 9 sections + `<analysis>`/`<summary>` dual tags |\n| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |\n| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |\n| Post-compaction recovery | None (teaching version only keeps summary) | Auto re-read recent files, plans, agent/skill/tool context |\n| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |\n| Reactive retry | 1 time | CC 
has more granular tiered retries |\n\n### Execution Order Details\n\nThe real order in CC source `query.ts`:\n\n1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved\n2. `snipCompact` (L403): trim middle messages\n3. `microcompact` (L414): old result placeholders\n4. `contextCollapse` (L441): independent context management system (not in teaching version)\n5. `autoCompact` (L454): LLM full summary\n\nThe teaching version's budget → snip → micro order matches this. The teaching version does not have the contextCollapse mechanism.\n\n### Full Constant Reference\n\n| Constant | Value | Source File |\n|----------|-------|-------------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| Time micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse and sessionMemoryCompact\n\nCC source code has two additional mechanisms not covered in this teaching version:\n\n- **contextCollapse**: An independent context management system that, when enabled, suppresses proactive autocompact (`autoCompact.ts:215-222`), with collapse's commit/blocking flow taking over context management. Manual `/compact` and reactive fallback remain independent paths, unaffected by contextCollapse.\n- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.\n\n### What Does the Compression Prompt Look Like?\n\nCC's compression prompt has two hard requirements:\n\n1. 
**Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`, and appends another REMINDER at the end\n2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. The analysis is stripped during formatting\n\n### Teaching Version Simplifications Are Intentional\n\n- micro_compact uses text placeholders → we don't have API-level `cache_edits` access\n- Tokens estimated via character count → precise tokenizers are out of scope\n- Post-compaction recovery omitted → teaching version only keeps summary, does not auto re-attach files\n- Two auxiliary mechanisms not covered → they fall in the 10% detail category\n\nThe core design principle, cheap first, expensive last, is fully preserved.\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
+    "content": "# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/en/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — Four-layer compression pipeline: cheap first, expensive last.\n>\n> **Harness Layer**: Compression — clean memory, unlimited sessions.\n\n---\n\n## The Problem\n\nThe agent is running along, then freezes.\n\nIt has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.\n\nThe context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.\n\nWithout compression, an agent simply cannot work on large projects.\n\n---\n\n## The Solution\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.en.svg)\n\nThe hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.\n\nCore design: cheap first, expensive last.\n\n---\n\n## How It Works\n\n![Four-layer compression pipeline](/course-assets/s08_context_compact/compaction-layers.en.svg)\n\n### L1: snip_compact — Trim Irrelevant Old Conversation\n\nThe agent ran 80 turns of conversation, accumulating 160 `messages`. 
The very first \"help me create hello.py\" is barely relevant to current work, yet it still occupies space.\n\nMessage count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle; the only extra boundary rule is that `assistant(tool_use)` must not be separated from the following `user(tool_result)`:\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    head_end, tail_start = 3, len(messages) - (max_messages - 3)\n    if _message_has_tool_use(messages[head_end - 1]):\n        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n            head_end += 1\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    snipped = tail_start - head_end\n    placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\nMessages are still trimmed directly; this just adds one boundary guard. `tool_result` content within remaining messages still keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.\n\n### L2: micro_compact — Placeholder for Old Tool Results\n\n![Old results placeholder](/course-assets/s08_context_compact/micro-compact.en.svg)\n\nThe agent read 10 files consecutively. 
The full contents of reads 1–7 are still sitting in context, no longer needed, but hogging large amounts of space.\n\nKeep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n            block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\nOld results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. → L3.\n\n### L3: tool_result_budget — Persist Large Results to Disk\n\n![Large results to disk](/course-assets/s08_context_compact/layer1-budget.en.svg)\n\nThe model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB.\n\nSum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `<persisted-output>` marker + a 2000-character preview in context. 
The model sees the marker and knows the full content is on disk, re-reading it when needed.\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\nThe first three layers are all plain-text / structural operations — 0 API calls — but they cannot \"understand\" conversation content. Context may still be too large. → L4.\n\n### L4: compact_history — Full LLM Summary\n\n![Full LLM summary](/course-assets/s08_context_compact/auto-compact.en.svg)\n\nAll three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.\n\nThree-step process:\n\n1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. The transcript preserves a recoverable record, but the model's active context only contains the summary. For the model's current reasoning, the details are no longer in context. The teaching code does not provide a transcript retrieval tool.\n2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc.\n3. **Replace message list**: All old messages are replaced with a single summary. 
The teaching version only keeps the summary; the real Claude Code re-attaches some recent files, plans, agent/skill/tool context after compaction.\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # Save full conversation first\n    summary = summarize_history(messages)          # LLM generates summary\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls.\n\n### Reactive: reactive_compact\n\nSometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.\n\nThis triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, but still avoids leaving an orphaned `tool_result`.\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail_start = max(0, len(messages) - 5)\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nReactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11.\n\n### Putting It All Together\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # Three pre-processors (0 API calls)\n        # Order: budget first, so large content is persisted before placeholders\n        messages[:] = tool_result_budget(messages)    # L3: persist large results\n        messages[:] = snip_compact(messages)          # L1: trim middle\n        messages[:] = micro_compact(messages)         # L2: old result placeholders\n\n        # Still too much? 
LLM summary (1 API call)\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # Emergency\n                reactive_retries += 1\n                continue\n            raise  # retry limit exceeded, raise exception\n        # ... tool execution ...\n\n        # compact tool: when the model actively calls it, triggers compact_history\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # end current turn, start fresh with compacted context\n```\n\n**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large tool_results with one-line placeholders — budget must persist the full content before that happens. This is why CC source puts `applyToolResultBudget` first.\n\n---\n\n## Changes From s07\n\n| Component | Before (s07) | After (s08) |\n|-----------|-------------|-------------|\n| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |\n| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |\n| Design principle | — | Cheap first, expensive last |\n\n---\n\n## Try It\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\nTry these prompts:\n\n1. 
`Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results)\n2. `Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk)\n3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears\n\nWhat to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically?\n\n---\n\n## What's Next\n\nContext compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent selectively remember important things?\n\ns09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions.\n\n<details>\n<summary>Deep Dive Into CC Source Code</summary>\n\n> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.\n\n### Execution Order Comparison\n\nThe teaching version labels layers L1/L2/L3/L4 for pedagogical clarity, but actual execution order does not match the numbering:\n\n| Dimension | Teaching Version | Claude Code |\n|-----------|-----------------|-------------|\n| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |\n| snip_compact | Keep head 3 + tail 47 | CC only enables on main thread; implementation not in open-source repo (`HISTORY_SNIP` feature gate), but interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, also exposes `SnipTool` for model-initiated snipping. 
Teaching version's 3/47 are simplified parameters |\n| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |\n| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |\n| tool_result_budget | 200KB characters | 200,000 characters (`toolLimits.ts:49`) |\n| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |\n| Summary requirements | 5 categories of info | 9 sections + `<analysis>`/`<summary>` dual tags |\n| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |\n| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |\n| Post-compaction recovery | None (teaching version only keeps summary) | Auto re-read recent files, plans, agent/skill/tool context |\n| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |\n| Reactive retry | 1 time | CC has more granular tiered retries |\n\n### Execution Order Details\n\nThe real order in CC source `query.ts`:\n\n1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved\n2. `snipCompact` (L403): trim middle messages\n3. `microcompact` (L414): old result placeholders\n4. `contextCollapse` (L441): independent context management system (not in teaching version)\n5. `autoCompact` (L454): LLM full summary\n\nThe teaching version's budget → snip → micro order matches this. 
The teaching version does not have the contextCollapse mechanism.\n\n### Full Constant Reference\n\n| Constant | Value | Source File |\n|----------|-------|-------------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| Time micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse and sessionMemoryCompact\n\nCC source code has two additional mechanisms not covered in this teaching version:\n\n- **contextCollapse**: An independent context management system that, when enabled, suppresses proactive autocompact (`autoCompact.ts:215-222`), with collapse's commit/blocking flow taking over context management. Manual `/compact` and reactive fallback remain independent paths, unaffected by contextCollapse.\n- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.\n\n### What Does the Compression Prompt Look Like?\n\nCC's compression prompt has two hard requirements:\n\n1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`, and appends another REMINDER at the end\n2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. 
The analysis is stripped during formatting\n\n### Teaching Version Simplifications Are Intentional\n\n- micro_compact uses text placeholders → we don't have API-level `cache_edits` access\n- Tokens estimated via character count → precise tokenizers are out of scope\n- Post-compaction recovery omitted → teaching version only keeps summary, does not auto re-attach files\n- Two auxiliary mechanisms not covered → they fall in the 10% detail category\n\nThe core design principle, cheap first, expensive last, is fully preserved.\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
   },
   {
     "version": "s08",
     "locale": "zh",
     "title": "s08: Context Compact — 上下文总会满，要有办法腾地方",
-    "content": "# s08: Context Compact — 上下文总会满，要有办法腾地方\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/zh/s09) → s10 → ... → s20\n> *\"上下文总会满, 要有办法腾地方\"* — 四层压缩策略, 便宜的先跑贵的后跑。\n>\n> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。\n\n---\n\n## 问题\n\nAgent 跑着跑着，不动了。\n\n手里有 bash、有 read、有 write，能力是够的。但它读了一个 1000 行的文件（~4000 token），又读了 30 个文件，跑了 20 条命令。每条命令的输出、每个文件的内容，全都堆在 `messages` 列表里。\n\n上下文窗口是有限的。满了之后，API 直接拒绝：`prompt_too_long`。\n\n不压缩，Agent 根本没法在大项目里干活。\n\n---\n\n## 解决方案\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.svg)\n\n保留 s07 的 hook 结构、技能加载、子 Agent 等骨架，省略部分工具细节以聚焦压缩。核心变动：每轮 LLM 调用前插入三层预处理器（0 API），token 仍超阈值时触发 LLM 摘要（1 API），API 报错时应急裁剪。\n\n核心设计：便宜的先跑，贵的后跑。\n\n---\n\n## 工作原理\n\n![四层压缩管线](/course-assets/s08_context_compact/compaction-layers.svg)\n\n### L1: snip_compact — 裁掉无关的旧对话\n\nAgent 跑了 80 轮对话，`messages` 攒了 160 条。最前面的\"帮我创建 hello.py\"和当前工作几乎无关了，但全占着位置。\n\n消息数超过 50 条 → 保留头部 3 条（初始上下文）和尾部 47 条（当前工作），中间裁掉：\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    keep_head, keep_tail = 3, max_messages - 3\n    snipped = len(messages) - keep_head - keep_tail\n    placeholder = {\"role\": \"user\",\n                   \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\n裁掉了整条消息，但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。\n\n### L2: micro_compact — 旧工具结果占位\n\n![旧结果占位](/course-assets/s08_context_compact/micro-compact.svg)\n\nAgent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里，早就不需要了，但占着大量空间。\n\n只保留最近 3 条 `tool_result` 的完整内容，更旧的替换为一行占位符：\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n   
         block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\n旧结果清掉了，但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。\n\n### L3: tool_result_budget — 大结果落盘\n\n![大结果落盘](/course-assets/s08_context_compact/layer1-budget.svg)\n\n模型一次读了 5 个大文件，单条 user 消息里所有 `tool_result` 加起来 500KB。\n\n统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序，从最大的开始落盘到 `.task_outputs/tool-results/`，上下文里只留 `<persisted-output>` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上，需要时可以重新读。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\n前三层都是纯文本/结构操作，0 API 调用，但也无法\"理解\"对话内容。上下文可能仍然太大。→ L4。\n\n### L4: compact_history — LLM 全量摘要\n\n![LLM 全量摘要](/course-assets/s08_context_compact/auto-compact.svg)\n\n前三层全跑完了，但在超大项目中连续工作 30 分钟后，token 仍然超过阈值。\n\n三步流程：\n\n1. **保存 transcript**：完整对话写入 `.transcripts/`，JSONL 格式。transcript 保留了可恢复记录，但模型的活跃上下文里只剩摘要。对模型当下推理来说，细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。\n2. **LLM 生成摘要**：把对话历史发给 LLM，要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。\n3. 
**替换消息列表**：所有旧消息被替换为一条摘要。教学版只保留摘要；真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # 先保存完整对话\n    summary = summarize_history(messages)          # LLM 生成摘要\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**熔断器**：连续失败 3 次后停止重试，防止死循环浪费 API 调用。\n\n### 应急: reactive_compact\n\n有时候 API 还是返回 `prompt_too_long`（413），上下文增长速度快于压缩触发速度时。\n\n这时触发 **reactive_compact**：比 compact_history 更激进，从尾部回退，以字节级精度裁剪到 API 可接受的大小，只保留最后 5 条消息 + 摘要。\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail = messages[-5:]\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nreactive compact 有重试上限（默认 1 次）。再失败就抛出异常，不无限循环。完整的错误恢复逻辑留给 s11。\n\n### 合起来跑\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # 三个预处理器（0 API 调用）\n        # 顺序：budget 先跑，确保大内容落盘后再做占位和裁剪\n        messages[:] = tool_result_budget(messages)    # L3: 大结果落盘\n        messages[:] = snip_compact(messages)          # L1: 裁中间\n        messages[:] = micro_compact(messages)         # L2: 旧结果占位\n\n        # 还不够？LLM 摘要（1 API 调用）\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # 应急\n                reactive_retries += 1\n                continue\n            raise  # 超过重试上限，抛出异常\n        # ... 工具执行 ...\n\n        # compact 工具：模型主动调用时触发 compact_history\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. 
History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # 结束当前 turn，用压缩后的上下文开始新一轮\n```\n\n**顺序不能换。** L3（budget）在 L2（micro）前面，因为 micro 会把旧的大 tool_result 替换成一行占位符，budget 必须在那之前把完整内容落盘。这也是为什么 CC 源码把 `applyToolResultBudget` 放在最前面。\n\n---\n\n## 相对 s07 的变更\n\n| 组件 | 之前 (s07) | 之后 (s08) |\n|------|-----------|-----------|\n| 上下文管理 | 无（上下文无限膨胀） | 四层压缩管线 + 应急 |\n| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| 工具 | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history |\n| 设计原则 | — | 便宜的先跑，贵的后跑 |\n\n---\n\n## 试一下\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n试试这些 prompt：\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`（连续读多个文件，观察 L2 压缩旧结果）\n2. `Read every file in s08_context_compact/`（一次性读大量内容，观察 L3 落盘）\n3. 反复对话 20+ 轮，观察是否出现 `[auto compact]` 或 `[reactive compact]`\n\n观察重点：每次工具执行后，旧 tool_result 是否被压缩？连续对话后 token 超阈值时，是否自动触发了摘要？\n\n---\n\n## 接下来\n\n上下文压缩让 Agent 能跑很久不会崩。但每次压缩后，用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent 有选择地记住重要的事？\n\ns09 Memory → 三个子系统：选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。\n\n<details>\n<summary>深入 CC 源码</summary>\n\n> 以下基于 CC 源码 `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` 的分析。\n\n### 执行顺序对照\n\n教学版为了讲解方便按 L1/L2/L3/L4 编号，但实际执行顺序和编号不完全对应：\n\n| 维度 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 执行顺序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto（`query.ts:379-468`） |\n| snip_compact | 保留头 3 + 尾 47 | CC 仅主线程启用；实现不在开源仓库中（`HISTORY_SNIP` feature gate），但接口可见：`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? 
}`，还暴露了 `SnipTool` 工具让模型主动调用。教学版的 3/47 是简化参数 |\n| micro_compact | 文本占位符替换 | 两条路径：time-based 直接清内容，cached 走 API `cache_edits`（legacy path 已移除） |\n| micro_compact 白名单 | 按位置（最近 3 条） | time-based 按时间阈值触发；cached 按计数触发（`microCompact.ts`） |\n| tool_result_budget | 200KB 字符 | 200,000 字符（`toolLimits.ts:49`） |\n| compact_history 阈值 | 字符数估算 | 精确 token：`contextWindow - maxOutputTokens - 13_000` |\n| 摘要要求 | 5 类信息 | 9 个部分 + `<analysis>`/`<summary>` 双标签 |\n| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |\n| PTL retry | 有（简化） | `truncateHeadForPTLRetry()` 按消息组回退（`compact.ts:243-290`） |\n| 后压缩恢复 | 无（教学版只保留摘要） | 自动重新读取最近文件、计划、agent/skill/tool 等 |\n| 熔断器 | 3 次 | 3 次（`autoCompact.ts:70`） |\n| reactive 重试 | 1 次 | CC 有更精细的分级重试 |\n\n### 执行顺序详解\n\nCC 源码 `query.ts` 中的真实顺序：\n\n1. `applyToolResultBudget`（L379）：先处理大结果，确保完整内容落盘\n2. `snipCompact`（L403）：裁中间消息\n3. `microcompact`（L414）：旧结果占位\n4. `contextCollapse`（L441）：独立的上下文管理系统（教学版无）\n5. `autoCompact`（L454）：LLM 全量摘要\n\n教学版的 budget → snip → micro 顺序与此一致。教学版没有 contextCollapse 机制。\n\n### 完整常量参考\n\n| 常量 | 值 | 源文件 |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse 和 sessionMemoryCompact\n\nCC 源码中还有两个机制本教学版没有展开：\n\n- **contextCollapse**：独立的上下文管理系统，启用时抑制 proactive autocompact（`autoCompact.ts:215-222`），由 collapse 的 commit/blocking 流程接管上下文管理。但 manual `/compact` 和 reactive fallback 仍是独立路径，不受 contextCollapse 影响。\n- **sessionMemoryCompact**：compact_history 之前，CC 会先尝试用已有的 session memory（s09 会讲到）做轻量摘要，不调 LLM。这个机制等学完 s09 之后回头看会更清楚。\n\n### 压缩 prompt 长什么样？\n\nCC 的压缩 
prompt 有两个硬性要求：\n\n1. **绝对禁止调用工具**：开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`，末尾还会再 REMINDER 一次\n2. **先分析再总结**：模型需要先在 `<analysis>` 标签里理清思路，然后在 `<summary>` 标签里输出正式摘要。analysis 在格式化时被剥离\n\n### 教学版的简化是刻意的\n\n- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限\n- token 用字符数估算 → 精确 tokenizer 不在教学范围内\n- 后压缩恢复省略 → 教学版只保留摘要，不自动重新附加文件\n- 两个辅助机制不展开 → 属于 10% 的细节\n\n核心设计思想，便宜的先跑贵的后跑，完整保留。\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
+    "content": "# s08: Context Compact — 上下文总会满，要有办法腾地方\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/zh/s09) → s10 → ... → s20\n> *\"上下文总会满, 要有办法腾地方\"* — 四层压缩策略, 便宜的先跑贵的后跑。\n>\n> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。\n\n---\n\n## 问题\n\nAgent 跑着跑着，不动了。\n\n手里有 bash、有 read、有 write，能力是够的。但它读了一个 1000 行的文件（~4000 token），又读了 30 个文件，跑了 20 条命令。每条命令的输出、每个文件的内容，全都堆在 `messages` 列表里。\n\n上下文窗口是有限的。满了之后，API 直接拒绝：`prompt_too_long`。\n\n不压缩，Agent 根本没法在大项目里干活。\n\n---\n\n## 解决方案\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.svg)\n\n保留 s07 的 hook 结构、技能加载、子 Agent 等骨架，省略部分工具细节以聚焦压缩。核心变动：每轮 LLM 调用前插入三层预处理器（0 API），token 仍超阈值时触发 LLM 摘要（1 API），API 报错时应急裁剪。\n\n核心设计：便宜的先跑，贵的后跑。\n\n---\n\n## 工作原理\n\n![四层压缩管线](/course-assets/s08_context_compact/compaction-layers.svg)\n\n### L1: snip_compact — 裁掉无关的旧对话\n\nAgent 跑了 80 轮对话，`messages` 攒了 160 条。最前面的\"帮我创建 hello.py\"和当前工作几乎无关了，但全占着位置。\n\n消息数超过 50 条 → 保留头部 3 条（初始上下文）和尾部 47 条（当前工作），中间裁掉；唯一额外边界条件是，不能把 `assistant(tool_use)` 和后面的 `user(tool_result)` 拆开：\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    head_end, tail_start = 3, len(messages) - (max_messages - 3)\n    if _message_has_tool_use(messages[head_end - 1]):\n        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n            head_end += 1\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    snipped = tail_start - head_end\n    placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\n裁掉的是消息本身，只是在切口处多做一步保护；剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。\n\n### L2: micro_compact — 旧工具结果占位\n\n![旧结果占位](/course-assets/s08_context_compact/micro-compact.svg)\n\nAgent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里，早就不需要了，但占着大量空间。\n\n只保留最近 3 条 
`tool_result` 的完整内容，更旧的替换为一行占位符：\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n            block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\n旧结果清掉了，但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。\n\n### L3: tool_result_budget — 大结果落盘\n\n![大结果落盘](/course-assets/s08_context_compact/layer1-budget.svg)\n\n模型一次读了 5 个大文件，单条 user 消息里所有 `tool_result` 加起来 500KB。\n\n统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序，从最大的开始落盘到 `.task_outputs/tool-results/`，上下文里只留 `<persisted-output>` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上，需要时可以重新读。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\n前三层都是纯文本/结构操作，0 API 调用，但也无法\"理解\"对话内容。上下文可能仍然太大。→ L4。\n\n### L4: compact_history — LLM 全量摘要\n\n![LLM 全量摘要](/course-assets/s08_context_compact/auto-compact.svg)\n\n前三层全跑完了，但在超大项目中连续工作 30 分钟后，token 仍然超过阈值。\n\n三步流程：\n\n1. **保存 transcript**：完整对话写入 `.transcripts/`，JSONL 格式。transcript 保留了可恢复记录，但模型的活跃上下文里只剩摘要。对模型当下推理来说，细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。\n2. **LLM 生成摘要**：把对话历史发给 LLM，要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。\n3. 
**替换消息列表**：所有旧消息被替换为一条摘要。教学版只保留摘要；真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # 先保存完整对话\n    summary = summarize_history(messages)          # LLM 生成摘要\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**熔断器**：连续失败 3 次后停止重试，防止死循环浪费 API 调用。\n\n### 应急: reactive_compact\n\n有时候 API 还是返回 `prompt_too_long`（413），上下文增长速度快于压缩触发速度时。\n\n这时触发 **reactive_compact**：比 compact_history 更激进，从尾部回退，但仍要避免留下孤立 `tool_result`。\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail_start = max(0, len(messages) - 5)\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nreactive compact 有重试上限（默认 1 次）。再失败就抛出异常，不无限循环。完整的错误恢复逻辑留给 s11。\n\n### 合起来跑\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # 三个预处理器（0 API 调用）\n        # 顺序：budget 先跑，确保大内容落盘后再做占位和裁剪\n        messages[:] = tool_result_budget(messages)    # L3: 大结果落盘\n        messages[:] = snip_compact(messages)          # L1: 裁中间\n        messages[:] = micro_compact(messages)         # L2: 旧结果占位\n\n        # 还不够？LLM 摘要（1 API 调用）\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # 应急\n                reactive_retries += 1\n                continue\n            raise  # 超过重试上限，抛出异常\n        # ... 
工具执行 ...\n\n        # compact 工具：模型主动调用时触发 compact_history\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # 结束当前 turn，用压缩后的上下文开始新一轮\n```\n\n**顺序不能换。** L3（budget）在 L2（micro）前面，因为 micro 会把旧的大 tool_result 替换成一行占位符，budget 必须在那之前把完整内容落盘。这也是为什么 CC 源码把 `applyToolResultBudget` 放在最前面。\n\n---\n\n## 相对 s07 的变更\n\n| 组件 | 之前 (s07) | 之后 (s08) |\n|------|-----------|-----------|\n| 上下文管理 | 无（上下文无限膨胀） | 四层压缩管线 + 应急 |\n| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| 工具 | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history |\n| 设计原则 | — | 便宜的先跑，贵的后跑 |\n\n---\n\n## 试一下\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n试试这些 prompt：\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`（连续读多个文件，观察 L2 压缩旧结果）\n2. `Read every file in s08_context_compact/`（一次性读大量内容，观察 L3 落盘）\n3. 反复对话 20+ 轮，观察是否出现 `[auto compact]` 或 `[reactive compact]`\n\n观察重点：每次工具执行后，旧 tool_result 是否被压缩？连续对话后 token 超阈值时，是否自动触发了摘要？\n\n---\n\n## 接下来\n\n上下文压缩让 Agent 能跑很久不会崩。但每次压缩后，用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent 有选择地记住重要的事？\n\ns09 Memory → 三个子系统：选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。\n\n<details>\n<summary>深入 CC 源码</summary>\n\n> 以下基于 CC 源码 `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` 的分析。\n\n### 执行顺序对照\n\n教学版为了讲解方便按 L1/L2/L3/L4 编号，但实际执行顺序和编号不完全对应：\n\n| 维度 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 执行顺序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto（`query.ts:379-468`） |\n| snip_compact | 保留头 3 + 尾 47 | CC 仅主线程启用；实现不在开源仓库中（`HISTORY_SNIP` feature gate），但接口可见：`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? 
}`，还暴露了 `SnipTool` 工具让模型主动调用。教学版的 3/47 是简化参数 |\n| micro_compact | 文本占位符替换 | 两条路径：time-based 直接清内容，cached 走 API `cache_edits`（legacy path 已移除） |\n| micro_compact 白名单 | 按位置（最近 3 条） | time-based 按时间阈值触发；cached 按计数触发（`microCompact.ts`） |\n| tool_result_budget | 200KB 字符 | 200,000 字符（`toolLimits.ts:49`） |\n| compact_history 阈值 | 字符数估算 | 精确 token：`contextWindow - maxOutputTokens - 13_000` |\n| 摘要要求 | 5 类信息 | 9 个部分 + `<analysis>`/`<summary>` 双标签 |\n| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |\n| PTL retry | 有（简化） | `truncateHeadForPTLRetry()` 按消息组回退（`compact.ts:243-290`） |\n| 后压缩恢复 | 无（教学版只保留摘要） | 自动重新读取最近文件、计划、agent/skill/tool 等 |\n| 熔断器 | 3 次 | 3 次（`autoCompact.ts:70`） |\n| reactive 重试 | 1 次 | CC 有更精细的分级重试 |\n\n### 执行顺序详解\n\nCC 源码 `query.ts` 中的真实顺序：\n\n1. `applyToolResultBudget`（L379）：先处理大结果，确保完整内容落盘\n2. `snipCompact`（L403）：裁中间消息\n3. `microcompact`（L414）：旧结果占位\n4. `contextCollapse`（L441）：独立的上下文管理系统（教学版无）\n5. `autoCompact`（L454）：LLM 全量摘要\n\n教学版的 budget → snip → micro 顺序与此一致。教学版没有 contextCollapse 机制。\n\n### 完整常量参考\n\n| 常量 | 值 | 源文件 |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse 和 sessionMemoryCompact\n\nCC 源码中还有两个机制本教学版没有展开：\n\n- **contextCollapse**：独立的上下文管理系统，启用时抑制 proactive autocompact（`autoCompact.ts:215-222`），由 collapse 的 commit/blocking 流程接管上下文管理。但 manual `/compact` 和 reactive fallback 仍是独立路径，不受 contextCollapse 影响。\n- **sessionMemoryCompact**：compact_history 之前，CC 会先尝试用已有的 session memory（s09 会讲到）做轻量摘要，不调 LLM。这个机制等学完 s09 之后回头看会更清楚。\n\n### 压缩 prompt 长什么样？\n\nCC 的压缩 
prompt 有两个硬性要求：\n\n1. **绝对禁止调用工具**：开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`，末尾还会再 REMINDER 一次\n2. **先分析再总结**：模型需要先在 `<analysis>` 标签里理清思路，然后在 `<summary>` 标签里输出正式摘要。analysis 在格式化时被剥离\n\n### 教学版的简化是刻意的\n\n- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限\n- token 用字符数估算 → 精确 tokenizer 不在教学范围内\n- 后压缩恢复省略 → 教学版只保留摘要，不自动重新附加文件\n- 两个辅助机制不展开 → 属于 10% 的细节\n\n核心设计思想，便宜的先跑贵的后跑，完整保留。\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
   },
   {
     "version": "s08",
     "locale": "ja",
     "title": "s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要",
-    "content": "# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/ja/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。\n>\n> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。\n\n---\n\n## 課題\n\nAgent が動いている途中で、止まってしまう。\n\nbash、read、write は揃っており、能力は十分。しかし 1000 行のファイル（~4000 token）を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。\n\nコンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します：`prompt_too_long`。\n\n圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。\n\n---\n\n## ソリューション\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.ja.svg)\n\ns07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点：各 LLM 呼び出し前に 3 層のプリプロセッサ（0 API）を挿入し、token が閾値を超えた場合は LLM 要約（1 API）をトリガー、API エラー時には緊急トリムを実行。\n\nコア設計：安価なものを先に、高価なものを後に。\n\n---\n\n## 仕組み\n\n![4層圧縮パイプライン](/course-assets/s08_context_compact/compaction-layers.ja.svg)\n\n### L1: snip_compact — 無関係な古い会話を切り捨て\n\nAgent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。\n\nメッセージ数が 50 を超えた場合 → 先頭 3 件（初期コンテキスト）と末尾 47 件（現在の作業）を保持し、中間を切り捨て：\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    keep_head, keep_tail = 3, max_messages - 3\n    snipped = len(messages) - keep_head - keep_tail\n    placeholder = {\"role\": \"user\",\n                   \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\nメッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。\n\n### L2: micro_compact — 古いツール結果をプレースホルダに置換\n\n![古い結果のプレースホルダ](/course-assets/s08_context_compact/micro-compact.ja.svg)\n\nAgent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。\n\n直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 
行のプレースホルダに置換：\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n            block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\n古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。\n\n### L3: tool_result_budget — 大きな結果をディスクに退避\n\n![大きな結果のディスク退避](/course-assets/s08_context_compact/layer1-budget.ja.svg)\n\nモデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。\n\n最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `<persisted-output>` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\n最初の 3 層はすべて純粋なテキスト/構造操作（0 API 呼び出し）だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。\n\n### L4: compact_history — LLM 全量要約\n\n![LLM 全量要約](/course-assets/s08_context_compact/auto-compact.ja.svg)\n\n最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。\n\n3 ステップのフロー：\n\n1. 
**transcript を保存**：完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。\n2. **LLM で要約を生成**：会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。\n3. **メッセージリストを置換**：すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # 先に完全な会話を保存\n    summary = summarize_history(messages)          # LLM で要約を生成\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**サーキットブレーカー**：連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。\n\n### 緊急: reactive_compact\n\nAPI がまだ `prompt_too_long`（413）を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。\n\nこの時 **reactive_compact** がトリガーされる：compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail = messages[-5:]\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nreactive compact にはリトライ上限がある（デフォルト 1 回）。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。\n\n### 合わせて実行\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # 3 つのプリプロセッサ（0 API 呼び出し）\n        # 順序：budget を先に実行し、大きな内容をプレースホルダ化する前に退避\n        messages[:] = tool_result_budget(messages)    # L3: 大きな結果を退避\n        messages[:] = snip_compact(messages)          # L1: 中間を切り捨て\n        messages[:] = micro_compact(messages)         # L2: 古い結果をプレースホルダに\n\n        # まだ足りない？LLM 要約（1 API 呼び出し）\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if 
reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # 緊急対応\n                reactive_retries += 1\n                continue\n            raise  # リトライ上限超過、例外をスロー\n        # ... ツール実行 ...\n\n        # compact ツール：モデルが能動的に呼び出した場合、compact_history をトリガー\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # 現在のターンを終了し、圧縮後のコンテキストで新しく開始\n```\n\n**順序は変えられない。** L3（budget）が L2（micro）の前に実行される理由：micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。\n\n---\n\n## s07 からの変更点\n\n| コンポーネント | 変更前 (s07) | 変更後 (s08) |\n|------|-----------|-----------|\n| コンテキスト管理 | なし（コンテキストが無限に膨張） | 4 層圧縮パイプライン + 緊急対応 |\n| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |\n| 設計原則 | — | 安価なものを先に、高価なものを後に |\n\n---\n\n## 試してみよう\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n以下のプロンプトを試してみてください：\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`（連続して複数のファイルを読み、L2 の古い結果圧縮を観察）\n2. `Read every file in s08_context_compact/`（一度に大量の内容を読み込み、L3 のディスク退避を観察）\n3. 
20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察\n\n観察のポイント：ツール実行のたびに、古い tool_result は圧縮されているか？連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか？\n\n---\n\n## 次へ\n\nコンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか？\n\ns09 Memory → 3 つのサブシステム：何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。\n\n<details>\n<summary>CC ソースコードの詳細</summary>\n\n> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。\n\n### 実行順序の対応\n\n教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない：\n\n| 項目 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto（`query.ts:379-468`） |\n| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効；実装はオープンソースリポジトリにない（`HISTORY_SNIP` feature gate）、インターフェースは確認可能：`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`、`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |\n| micro_compact | テキストプレースホルダで置換 | 2 つのパス：time-based は直接内容をクリア、cached は API の `cache_edits` を使用（legacy パスは削除済み） |\n| micro_compact ホワイトリスト | 位置による（直近 3 件） | time-based は時間閾値でトリガー、cached はカウントでトリガー（`microCompact.ts`） |\n| tool_result_budget | 200KB 文字 | 200,000 文字（`toolLimits.ts:49`） |\n| compact_history 閾値 | 文字数で推定 | 精密な token 数：`contextWindow - maxOutputTokens - 13_000` |\n| 要約の要求 | 5 種類の情報 | 9 つのセクション + `<analysis>`/`<summary>` デュアルタグ |\n| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |\n| PTL retry | あり（簡略版） | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック（`compact.ts:243-290`） |\n| 圧縮後のリカバリ | なし（教学版は要約のみ保持） | 直近のファイル、計画、agent/skill/tool などの自動再付加 |\n| サーキットブレーカー | 3 回 | 3 回（`autoCompact.ts:70`） |\n| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |\n\n### 実行順序の詳細\n\nCC ソース `query.ts` での実際の順序：\n\n1. `applyToolResultBudget`（L379）：まず大きな結果を処理し、完全な内容を退避\n2. `snipCompact`（L403）：中間メッセージを切り捨て\n3. `microcompact`（L414）：古い結果のプレースホルダ化\n4. `contextCollapse`（L441）：独立したコンテキスト管理システム（教学版にはなし）\n5. 
`autoCompact`（L454）：LLM 全量要約\n\n教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。\n\n### 完全な定数リファレンス\n\n| 定数 | 値 | ソースファイル |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse と sessionMemoryCompact\n\nCC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する：\n\n- **contextCollapse**：独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し（`autoCompact.ts:215-222`）、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。\n- **sessionMemoryCompact**：compact_history の前に、CC は既存の session memory（s09 で解説）を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。\n\n### 圧縮プロンプトの中身\n\nCC の圧縮プロンプトには 2 つの厳格な要件がある：\n\n1. **ツール呼び出しの絶対禁止**：冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある\n2. **先に分析してから要約**：モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される\n\n### 教学版の簡略化は意図的\n\n- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため\n- token を文字数で推定 → 精密な tokenizer は教学の対象外\n- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない\n- 2 つの補助メカニズムを展開しない → 10% の細部に属する\n\nコア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
+    "content": "# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/ja/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。\n>\n> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。\n\n---\n\n## 課題\n\nAgent が動いている途中で、止まってしまう。\n\nbash、read、write は揃っており、能力は十分。しかし 1000 行のファイル（~4000 token）を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。\n\nコンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します：`prompt_too_long`。\n\n圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。\n\n---\n\n## ソリューション\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.ja.svg)\n\ns07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点：各 LLM 呼び出し前に 3 層のプリプロセッサ（0 API）を挿入し、token が閾値を超えた場合は LLM 要約（1 API）をトリガー、API エラー時には緊急トリムを実行。\n\nコア設計：安価なものを先に、高価なものを後に。\n\n---\n\n## 仕組み\n\n![4層圧縮パイプライン](/course-assets/s08_context_compact/compaction-layers.ja.svg)\n\n### L1: snip_compact — 無関係な古い会話を切り捨て\n\nAgent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。\n\nメッセージ数が 50 を超えた場合 → 先頭 3 件（初期コンテキスト）と末尾 47 件（現在の作業）を保持して中間を切り詰める。ただし切れ目だけは調整し、`assistant(tool_use)` と後続の `user(tool_result)` を分断しない：\n\n```python\ndef snip_compact(messages, max_messages=50):\n    if len(messages) <= max_messages:\n        return messages\n    head_end, tail_start = 3, len(messages) - (max_messages - 3)\n    if _message_has_tool_use(messages[head_end - 1]):\n        while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n            head_end += 1\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    snipped = tail_start - head_end\n    placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n    return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\n切り捨て自体は単純なままで、境界だけを保護する。残ったメッセージ内の 
`tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。\n\n### L2: micro_compact — 古いツール結果をプレースホルダに置換\n\n![古い結果のプレースホルダ](/course-assets/s08_context_compact/micro-compact.ja.svg)\n\nAgent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。\n\n直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換：\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n    tool_results = collect_tool_result_blocks(messages)\n    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n        return messages\n    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n        if len(block.get(\"content\", \"\")) > 120:\n            block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n    return messages\n```\n\n古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。\n\n### L3: tool_result_budget — 大きな結果をディスクに退避\n\n![大きな結果のディスク退避](/course-assets/s08_context_compact/layer1-budget.ja.svg)\n\nモデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。\n\n最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `<persisted-output>` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n    last = messages[-1]\n    blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n              if b.get(\"type\") == \"tool_result\"]\n    total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n    if total <= max_bytes:\n        return messages\n    ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n    for idx, block in ranked:\n        if total <= max_bytes:\n            break\n        block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n        total = recalculate_total(blocks)\n    return messages\n```\n\n最初の 3 
層はすべて純粋なテキスト/構造操作（0 API 呼び出し）だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。\n\n### L4: compact_history — LLM 全量要約\n\n![LLM 全量要約](/course-assets/s08_context_compact/auto-compact.ja.svg)\n\n最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。\n\n3 ステップのフロー：\n\n1. **transcript を保存**：完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。\n2. **LLM で要約を生成**：会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。\n3. **メッセージリストを置換**：すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。\n\n```python\ndef compact_history(messages):\n    transcript_path = write_transcript(messages)  # 先に完全な会話を保存\n    summary = summarize_history(messages)          # LLM で要約を生成\n    return [{\"role\": \"user\",\n             \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**サーキットブレーカー**：連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。\n\n### 緊急: reactive_compact\n\nAPI がまだ `prompt_too_long`（413）を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。\n\nこの時 **reactive_compact** がトリガーされる：compact_history よりもさらに積極的だが、末尾を残す際も孤立した `tool_result` を残さないようにする。\n\n```python\ndef reactive_compact(messages):\n    transcript = write_transcript(messages)\n    summary = summarize_history(messages)\n    tail_start = max(0, len(messages) - 5)\n    if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n        tail_start -= 1\n    return [{\"role\": \"user\",\n             \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nreactive compact にはリトライ上限がある（デフォルト 1 回）。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。\n\n### 合わせて実行\n\n```python\ndef agent_loop(messages):\n    reactive_retries = 0\n    while True:\n        # 3 つのプリプロセッサ（0 API 呼び出し）\n        # 順序：budget を先に実行し、大きな内容をプレースホルダ化する前に退避\n        messages[:] = tool_result_budget(messages)    # L3: 大きな結果を退避\n     
   messages[:] = snip_compact(messages)          # L1: 中間を切り捨て\n        messages[:] = micro_compact(messages)         # L2: 古い結果をプレースホルダに\n\n        # まだ足りない？LLM 要約（1 API 呼び出し）\n        if estimate_token_count(messages) > THRESHOLD:\n            messages[:] = compact_history(messages)\n\n        try:\n            response = client.messages.create(...)\n        except PromptTooLongError:\n            if reactive_retries < MAX_REACTIVE_RETRIES:\n                messages[:] = reactive_compact(messages)  # 緊急対応\n                reactive_retries += 1\n                continue\n            raise  # リトライ上限超過、例外をスロー\n        # ... ツール実行 ...\n\n        # compact ツール：モデルが能動的に呼び出した場合、compact_history をトリガー\n        if block.name == \"compact\":\n            messages[:] = compact_history(messages)\n            results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n            messages.append({\"role\": \"user\", \"content\": results})\n            break  # 現在のターンを終了し、圧縮後のコンテキストで新しく開始\n```\n\n**順序は変えられない。** L3（budget）が L2（micro）の前に実行される理由：micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。\n\n---\n\n## s07 からの変更点\n\n| コンポーネント | 変更前 (s07) | 変更後 (s08) |\n|------|-----------|-----------|\n| コンテキスト管理 | なし（コンテキストが無限に膨張） | 4 層圧縮パイプライン + 緊急対応 |\n| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |\n| 設計原則 | — | 安価なものを先に、高価なものを後に |\n\n---\n\n## 試してみよう\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n以下のプロンプトを試してみてください：\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`（連続して複数のファイルを読み、L2 の古い結果圧縮を観察）\n2. `Read every file in s08_context_compact/`（一度に大量の内容を読み込み、L3 のディスク退避を観察）\n3. 
20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察\n\n観察のポイント：ツール実行のたびに、古い tool_result は圧縮されているか？連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか？\n\n---\n\n## 次へ\n\nコンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか？\n\ns09 Memory → 3 つのサブシステム：何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。\n\n<details>\n<summary>CC ソースコードの詳細</summary>\n\n> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。\n\n### 実行順序の対応\n\n教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない：\n\n| 項目 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto（`query.ts:379-468`） |\n| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効；実装はオープンソースリポジトリにない（`HISTORY_SNIP` feature gate）、インターフェースは確認可能：`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`、`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |\n| micro_compact | テキストプレースホルダで置換 | 2 つのパス：time-based は直接内容をクリア、cached は API の `cache_edits` を使用（legacy パスは削除済み） |\n| micro_compact ホワイトリスト | 位置による（直近 3 件） | time-based は時間閾値でトリガー、cached はカウントでトリガー（`microCompact.ts`） |\n| tool_result_budget | 200KB 文字 | 200,000 文字（`toolLimits.ts:49`） |\n| compact_history 閾値 | 文字数で推定 | 精密な token 数：`contextWindow - maxOutputTokens - 13_000` |\n| 要約の要求 | 5 種類の情報 | 9 つのセクション + `<analysis>`/`<summary>` デュアルタグ |\n| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |\n| PTL retry | あり（簡略版） | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック（`compact.ts:243-290`） |\n| 圧縮後のリカバリ | なし（教学版は要約のみ保持） | 直近のファイル、計画、agent/skill/tool などの自動再付加 |\n| サーキットブレーカー | 3 回 | 3 回（`autoCompact.ts:70`） |\n| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |\n\n### 実行順序の詳細\n\nCC ソース `query.ts` での実際の順序：\n\n1. `applyToolResultBudget`（L379）：まず大きな結果を処理し、完全な内容を退避\n2. `snipCompact`（L403）：中間メッセージを切り捨て\n3. `microcompact`（L414）：古い結果のプレースホルダ化\n4. `contextCollapse`（L441）：独立したコンテキスト管理システム（教学版にはなし）\n5. 
`autoCompact`（L454）：LLM 全量要約\n\n教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。\n\n### 完全な定数リファレンス\n\n| 定数 | 値 | ソースファイル |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse と sessionMemoryCompact\n\nCC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する：\n\n- **contextCollapse**：独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し（`autoCompact.ts:215-222`）、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。\n- **sessionMemoryCompact**：compact_history の前に、CC は既存の session memory（s09 で解説）を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。\n\n### 圧縮プロンプトの中身\n\nCC の圧縮プロンプトには 2 つの厳格な要件がある：\n\n1. **ツール呼び出しの絶対禁止**：冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある\n2. **先に分析してから要約**：モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される\n\n### 教学版の簡略化は意図的\n\n- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため\n- token を文字数で推定 → 精密な tokenizer は教学の対象外\n- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない\n- 2 つの補助メカニズムを展開しない → 10% の細部に属する\n\nコア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。\n\n</details>\n\n<!-- translation-sync: zh@v1, en@v1, ja@v1 -->\n"
   },
   {
     "version": "s09",