diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index c24c9b701..2f065c4fe 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -20,7 +20,7 @@ jobs: python-version: "3.11" - name: Install dependencies - run: pip install anthropic python-dotenv pytest + run: pip install -r requirements.txt pytest - name: Run Python smoke tests run: python -m pytest tests -q diff --git a/s08_context_compact/README.en.md b/s08_context_compact/README.en.md index 6c5941296..6b2f23593 100644 --- a/s08_context_compact/README.en.md +++ b/s08_context_compact/README.en.md @@ -39,20 +39,24 @@ Core design: cheap first, expensive last. The agent ran 80 turns of conversation, accumulating 160 `messages`. The very first "help me create hello.py" is barely relevant to current work, yet it still occupies space. -Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle: +Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle; the only extra boundary rule is that `assistant(tool_use)` must not be separated from the following `user(tool_result)`: ```python def snip_compact(messages, max_messages=50): if len(messages) <= max_messages: return messages - keep_head, keep_tail = 3, max_messages - 3 - snipped = len(messages) - keep_head - keep_tail - placeholder = {"role": "user", - "content": f"[snipped {snipped} messages from conversation middle]"} - return messages[:keep_head] + [placeholder] + messages[-keep_tail:] + head_end, tail_start = 3, len(messages) - (max_messages - 3) + if _message_has_tool_use(messages[head_end - 1]): + while head_end < len(messages) and _is_tool_result_message(messages[head_end]): + head_end += 1 + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 + snipped = tail_start - head_end + placeholder = {"role": "user", "content": f"[snipped {snipped} messages from 
conversation middle]"} + return messages[:head_end] + [placeholder] + messages[tail_start:] ``` -Entire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2. +Messages are still trimmed directly; this just adds one boundary guard. `tool_result` content within remaining messages still keeps accumulating — message #34 may still hold 30KB of old file contents. → L2. ### L2: micro_compact — Placeholder for Old Tool Results @@ -130,15 +134,17 @@ def compact_history(messages): Sometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers. -This triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary. +This triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, but still avoids leaving an orphaned `tool_result`. ```python def reactive_compact(messages): transcript = write_transcript(messages) summary = summarize_history(messages) - tail = messages[-5:] + tail_start = max(0, len(messages) - 5) + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 return [{"role": "user", - "content": f"[Reactive compact]\n\n{summary}"}, *tail] + "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]] ``` Reactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11. 
diff --git a/s08_context_compact/README.ja.md b/s08_context_compact/README.ja.md index 934ae5564..84bfb381a 100644 --- a/s08_context_compact/README.ja.md +++ b/s08_context_compact/README.ja.md @@ -39,20 +39,24 @@ s07 のフック構造、スキルロード、サブ Agent の骨格を維持し Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。 -メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て: +メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持して中間を切り詰める。ただし切れ目だけは調整し、`assistant(tool_use)` と後続の `user(tool_result)` を分断しない: ```python def snip_compact(messages, max_messages=50): if len(messages) <= max_messages: return messages - keep_head, keep_tail = 3, max_messages - 3 - snipped = len(messages) - keep_head - keep_tail - placeholder = {"role": "user", - "content": f"[snipped {snipped} messages from conversation middle]"} - return messages[:keep_head] + [placeholder] + messages[-keep_tail:] + head_end, tail_start = 3, len(messages) - (max_messages - 3) + if _message_has_tool_use(messages[head_end - 1]): + while head_end < len(messages) and _is_tool_result_message(messages[head_end]): + head_end += 1 + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 + snipped = tail_start - head_end + placeholder = {"role": "user", "content": f"[snipped {snipped} messages from conversation middle]"} + return messages[:head_end] + [placeholder] + messages[tail_start:] ``` -メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。 +切り捨て自体は単純なままで、境界だけを保護する。残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。 ### L2: micro_compact — 古いツール結果をプレースホルダに置換 @@ -130,15 +134,17 @@ def compact_history(messages): API がまだ `prompt_too_long`(413)を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。 -この時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。 +この時 **reactive_compact** 
がトリガーされる:compact_history よりもさらに積極的だが、末尾を残す際も孤立した `tool_result` を残さないようにする。 ```python def reactive_compact(messages): transcript = write_transcript(messages) summary = summarize_history(messages) - tail = messages[-5:] + tail_start = max(0, len(messages) - 5) + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 return [{"role": "user", - "content": f"[Reactive compact]\n\n{summary}"}, *tail] + "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]] ``` reactive compact にはリトライ上限がある(デフォルト 1 回)。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。 diff --git a/s08_context_compact/README.md b/s08_context_compact/README.md index c8e3c1cb4..22d967156 100644 --- a/s08_context_compact/README.md +++ b/s08_context_compact/README.md @@ -39,20 +39,24 @@ Agent 跑着跑着,不动了。 Agent 跑了 80 轮对话,`messages` 攒了 160 条。最前面的"帮我创建 hello.py"和当前工作几乎无关了,但全占着位置。 -消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉: +消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉;唯一额外边界条件是,不能把 `assistant(tool_use)` 和后面的 `user(tool_result)` 拆开: ```python def snip_compact(messages, max_messages=50): if len(messages) <= max_messages: return messages - keep_head, keep_tail = 3, max_messages - 3 - snipped = len(messages) - keep_head - keep_tail - placeholder = {"role": "user", - "content": f"[snipped {snipped} messages from conversation middle]"} - return messages[:keep_head] + [placeholder] + messages[-keep_tail:] + head_end, tail_start = 3, len(messages) - (max_messages - 3) + if _message_has_tool_use(messages[head_end - 1]): + while head_end < len(messages) and _is_tool_result_message(messages[head_end]): + head_end += 1 + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 + snipped = tail_start - head_end + placeholder = {"role": "user", "content": f"[snipped {snipped} messages from conversation middle]"} + return messages[:head_end] + [placeholder] + 
messages[tail_start:] ``` -裁掉了整条消息,但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。 +裁掉的是消息本身,只是在切口处多做一步保护;剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。 ### L2: micro_compact — 旧工具结果占位 @@ -130,15 +134,17 @@ def compact_history(messages): 有时候 API 还是返回 `prompt_too_long`(413),上下文增长速度快于压缩触发速度时。 -这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,以字节级精度裁剪到 API 可接受的大小,只保留最后 5 条消息 + 摘要。 +这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,但仍要避免留下孤立 `tool_result`。 ```python def reactive_compact(messages): transcript = write_transcript(messages) summary = summarize_history(messages) - tail = messages[-5:] + tail_start = max(0, len(messages) - 5) + if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]): + tail_start -= 1 return [{"role": "user", - "content": f"[Reactive compact]\n\n{summary}"}, *tail] + "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]] ``` reactive compact 有重试上限(默认 1 次)。再失败就抛出异常,不无限循环。完整的错误恢复逻辑留给 s11。 diff --git a/s08_context_compact/code.py b/s08_context_compact/code.py index a9cc3092b..b9d78d425 100644 --- a/s08_context_compact/code.py +++ b/s08_context_compact/code.py @@ -268,13 +268,45 @@ def spawn_subagent(task: str) -> str: def estimate_size(msgs): return len(str(msgs)) +def _block_type(block): + return block.get("type") if isinstance(block, dict) else getattr(block, "type", None) + + +def _message_has_tool_use(msg): + if msg.get("role") != "assistant": + return False + content = msg.get("content") + if not isinstance(content, list): + return False + return any(_block_type(block) == "tool_use" for block in content) + + +def _is_tool_result_message(msg): + if msg.get("role") != "user": + return False + content = msg.get("content") + if not isinstance(content, list): + return False + return any(isinstance(block, dict) and block.get("type") == "tool_result" + for block in content) + # L1: snipCompact — trim middle messages def snip_compact(messages, 
max_messages=50): if len(messages) <= max_messages: return messages keep_head, keep_tail = 3, max_messages - 3 - snipped = len(messages) - keep_head - keep_tail - return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[-keep_tail:] + head_end, tail_start = keep_head, len(messages) - keep_tail + if head_end > 0 and _message_has_tool_use(messages[head_end - 1]): + while head_end < len(messages) and _is_tool_result_message(messages[head_end]): + head_end += 1 + if (tail_start > 0 and tail_start < len(messages) + and _is_tool_result_message(messages[tail_start]) + and _message_has_tool_use(messages[tail_start - 1])): + tail_start -= 1 + if head_end >= tail_start: + return messages + snipped = tail_start - head_end + return messages[:head_end] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[tail_start:] # L2: microCompact — old result placeholders @@ -351,7 +383,12 @@ def compact_history(messages): def reactive_compact(messages): transcript = write_transcript(messages) summary = summarize_history(messages) - return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[-5:]] + tail_start = max(0, len(messages) - 5) + if (tail_start > 0 and tail_start < len(messages) + and _is_tool_result_message(messages[tail_start]) + and _message_has_tool_use(messages[tail_start - 1])): + tail_start -= 1 + return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[tail_start:]] # ═══════════════════════════════════════════════════════════ diff --git a/s09_memory/code.py b/s09_memory/code.py index 2f660e769..f80a92636 100644 --- a/s09_memory/code.py +++ b/s09_memory/code.py @@ -449,9 +449,38 @@ def spawn_subagent(task: str) -> str: def estimate_size(msgs): return len(str(msgs)) +def _block_type(block): + return block.get("type") if isinstance(block, dict) else getattr(block, "type", None) + +def _message_has_tool_use(msg): + if msg.get("role") != "assistant": + 
return False + content = msg.get("content") + if not isinstance(content, list): + return False + return any(_block_type(block) == "tool_use" for block in content) + +def _is_tool_result_message(msg): + if msg.get("role") != "user": + return False + content = msg.get("content") + if not isinstance(content, list): + return False + return any(isinstance(block, dict) and block.get("type") == "tool_result" for block in content) + def snip_compact(msgs, mx=50): if len(msgs) <= mx: return msgs - return msgs[:3] + [{"role": "user", "content": f"[snipped {len(msgs)-mx} msgs]"}] + msgs[-(mx-3):] + head_end, tail_start = 3, len(msgs) - (mx - 3) + if head_end > 0 and _message_has_tool_use(msgs[head_end - 1]): + while head_end < len(msgs) and _is_tool_result_message(msgs[head_end]): + head_end += 1 + if (tail_start > 0 and tail_start < len(msgs) + and _is_tool_result_message(msgs[tail_start]) + and _message_has_tool_use(msgs[tail_start - 1])): + tail_start -= 1 + if head_end >= tail_start: + return msgs + return msgs[:head_end] + [{"role": "user", "content": f"[snipped {tail_start - head_end} msgs]"}] + msgs[tail_start:] def collect_tool_results(msgs): blocks = [] @@ -512,7 +541,12 @@ def compact_history(msgs): def reactive_compact(msgs): write_transcript(msgs) summary = summarize_history(msgs) - return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[-5:]] + tail_start = max(0, len(msgs) - 5) + if (tail_start > 0 and tail_start < len(msgs) + and _is_tool_result_message(msgs[tail_start]) + and _message_has_tool_use(msgs[tail_start - 1])): + tail_start -= 1 + return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[tail_start:]] # ═══════════════════════════════════════════════════════════ diff --git a/s20_comprehensive/code.py b/s20_comprehensive/code.py index 12142e775..bd62553e0 100644 --- a/s20_comprehensive/code.py +++ b/s20_comprehensive/code.py @@ -1060,6 +1060,28 @@ def spawn_subagent(description: str) -> str: def 
estimate_size(messages: list) -> int: return len(json.dumps(messages, default=str)) +def block_type(block): + return block.get("type") if isinstance(block, dict) else getattr(block, "type", None) + + +def message_has_tool_use(message: dict) -> bool: + if message.get("role") != "assistant": + return False + content = message.get("content") + if not isinstance(content, list): + return False + return any(block_type(block) == "tool_use" for block in content) + + +def is_tool_result_message(message: dict) -> bool: + if message.get("role") != "user": + return False + content = message.get("content") + if not isinstance(content, list): + return False + return any(isinstance(block, dict) and block.get("type") == "tool_result" + for block in content) + def collect_tool_results(messages: list): found = [] @@ -1111,11 +1133,20 @@ def tool_result_budget(messages: list, max_bytes: int = 200_000) -> list: def snip_compact(messages: list, max_messages: int = 50) -> list: if len(messages) <= max_messages: return messages - keep_head, keep_tail = 3, max_messages - 3 - snipped = len(messages) - keep_head - keep_tail - return (messages[:keep_head] + head_end, tail_start = 3, len(messages) - (max_messages - 3) + if head_end > 0 and message_has_tool_use(messages[head_end - 1]): + while head_end < len(messages) and is_tool_result_message(messages[head_end]): + head_end += 1 + if (tail_start > 0 and tail_start < len(messages) + and is_tool_result_message(messages[tail_start]) + and message_has_tool_use(messages[tail_start - 1])): + tail_start -= 1 + if head_end >= tail_start: + return messages + snipped = tail_start - head_end + return (messages[:head_end] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] - + messages[-keep_tail:]) + + messages[tail_start:]) def micro_compact(messages: list) -> list: @@ -1163,8 +1194,13 @@ def reactive_compact(messages: list) -> list: summary = summarize_history(messages) except Exception: summary = "Earlier conversation was trimmed after 
a prompt-too-long error." + tail_start = max(0, len(messages) - 5) + if (tail_start > 0 and tail_start < len(messages) + and is_tool_result_message(messages[tail_start]) + and message_has_tool_use(messages[tail_start - 1])): + tail_start -= 1 return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, - *messages[-5:]] + *messages[tail_start:]] # ── Error Recovery ── diff --git a/tests/test_compaction_tool_pairs.py b/tests/test_compaction_tool_pairs.py new file mode 100644 index 000000000..e4f67d7b2 --- /dev/null +++ b/tests/test_compaction_tool_pairs.py @@ -0,0 +1,189 @@ +import importlib.util +import os +import sys +import tempfile +import types +import unittest +from pathlib import Path + + +REPO_ROOT = Path(__file__).resolve().parents[1] +MODULES = { + "s08": REPO_ROOT / "s08_context_compact" / "code.py", + "s09": REPO_ROOT / "s09_memory" / "code.py", + "s20": REPO_ROOT / "s20_comprehensive" / "code.py", +} + + +def load_module(name: str, path: Path, temp_cwd: Path): + fake_anthropic = types.ModuleType("anthropic") + + class FakeAnthropic: + def __init__(self, *args, **kwargs): + self.messages = types.SimpleNamespace(create=None) + + fake_dotenv = types.ModuleType("dotenv") + setattr(fake_anthropic, "Anthropic", FakeAnthropic) + setattr(fake_dotenv, "load_dotenv", lambda override=True: None) + + previous_anthropic = sys.modules.get("anthropic") + previous_dotenv = sys.modules.get("dotenv") + previous_cwd = Path.cwd() + previous_model = os.environ.get("MODEL_ID") + previous_key = os.environ.get("ANTHROPIC_API_KEY") + + spec = importlib.util.spec_from_file_location(name, path) + if spec is None or spec.loader is None: + raise RuntimeError(f"Unable to load {path}") + module = importlib.util.module_from_spec(spec) + + sys.modules["anthropic"] = fake_anthropic + sys.modules["dotenv"] = fake_dotenv + os.environ["MODEL_ID"] = "test-model" + os.environ["ANTHROPIC_API_KEY"] = "test-key" + try: + os.chdir(temp_cwd) + spec.loader.exec_module(module) + return 
module + finally: + os.chdir(previous_cwd) + if previous_anthropic is None: + sys.modules.pop("anthropic", None) + else: + sys.modules["anthropic"] = previous_anthropic + if previous_dotenv is None: + sys.modules.pop("dotenv", None) + else: + sys.modules["dotenv"] = previous_dotenv + if previous_model is None: + os.environ.pop("MODEL_ID", None) + else: + os.environ["MODEL_ID"] = previous_model + if previous_key is None: + os.environ.pop("ANTHROPIC_API_KEY", None) + else: + os.environ["ANTHROPIC_API_KEY"] = previous_key + + +def assistant_text(): + return {"role": "assistant", "content": [types.SimpleNamespace(type="text", text="ok")]} + + +def user_text(): + return {"role": "user", "content": "continue"} + + +def tool_use_message(tool_id="tool-1"): + return { + "role": "assistant", + "content": [types.SimpleNamespace(type="tool_use", id=tool_id, name="bash")], + } + + +def tool_result_message(tool_id="tool-1"): + return { + "role": "user", + "content": [{"type": "tool_result", "tool_use_id": tool_id, "content": "ok"}], + } + + +def message_has_tool_use(message): + content = message.get("content") + return ( + message.get("role") == "assistant" + and isinstance(content, list) + and any(getattr(block, "type", None) == "tool_use" for block in content) + ) + + +def assert_no_orphan_tool_results(testcase, messages): + for idx, message in enumerate(messages): + content = message.get("content") + if message.get("role") != "user" or not isinstance(content, list): + continue + if not any(isinstance(block, dict) and block.get("type") == "tool_result" for block in content): + continue + testcase.assertGreater(idx, 0) + testcase.assertTrue(message_has_tool_use(messages[idx - 1]), messages) + + +class CompactionToolPairTests(unittest.TestCase): + def test_snip_compact_keeps_head_tool_pair(self): + messages = [ + user_text(), + assistant_text(), + tool_use_message("head-tool"), + tool_result_message("head-tool"), + assistant_text(), + user_text(), + assistant_text(), + 
user_text(), + assistant_text(), + user_text(), + ] + + for name, path in MODULES.items(): + with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp: + module = load_module(f"{name}_head_under_test", path, Path(tmp)) + if name == "s09": + compacted = module.snip_compact(list(messages), mx=6) + else: + compacted = module.snip_compact(list(messages), max_messages=6) + self.assertEqual(compacted[2], messages[2]) + self.assertEqual(compacted[3], messages[3]) + assert_no_orphan_tool_results(self, compacted) + + def test_snip_compact_keeps_tail_tool_pair(self): + messages = [ + user_text(), + assistant_text(), + user_text(), + assistant_text(), + user_text(), + assistant_text(), + tool_use_message("tail-tool"), + tool_result_message("tail-tool"), + assistant_text(), + user_text(), + ] + + for name, path in MODULES.items(): + with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp: + module = load_module(f"{name}_under_test", path, Path(tmp)) + if name == "s09": + compacted = module.snip_compact(list(messages), mx=6) + else: + compacted = module.snip_compact(list(messages), max_messages=6) + assert_no_orphan_tool_results(self, compacted) + + def test_reactive_compact_keeps_tail_tool_pair(self): + messages = [ + user_text(), + assistant_text(), + user_text(), + tool_use_message("reactive-tool"), + tool_result_message("reactive-tool"), + assistant_text(), + user_text(), + assistant_text(), + user_text(), + ] + + for name, path in MODULES.items(): + with self.subTest(name=name), tempfile.TemporaryDirectory() as tmp: + module = load_module(f"{name}_reactive_under_test", path, Path(tmp)) + module.write_transcript = lambda _messages: Path("transcript.jsonl") + module.summarize_history = lambda _messages: "summary" + compacted = module.reactive_compact(list(messages)) + self.assertEqual(compacted[1], messages[3]) + assert_no_orphan_tool_results(self, compacted) + + def test_s20_has_tool_use_still_accepts_content_blocks(self): + with 
tempfile.TemporaryDirectory() as tmp: + module = load_module("s20_has_tool_use_under_test", MODULES["s20"], Path(tmp)) + self.assertTrue(module.has_tool_use([types.SimpleNamespace(type="tool_use")])) + self.assertFalse(module.has_tool_use([types.SimpleNamespace(type="text")])) + + +if __name__ == "__main__": + unittest.main() diff --git a/web/src/data/generated/docs.json b/web/src/data/generated/docs.json index 3f6bb41c5..4e50b7380 100644 --- a/web/src/data/generated/docs.json +++ b/web/src/data/generated/docs.json @@ -129,19 +129,19 @@ "version": "s08", "locale": "en", "title": "s08: Context Compact — Context Will Fill Up, Have a Way to Make Room", - "content": "# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/en/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — Four-layer compression pipeline: cheap first, expensive last.\n>\n> **Harness Layer**: Compression — clean memory, unlimited sessions.\n\n---\n\n## The Problem\n\nThe agent is running along, then freezes.\n\nIt has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.\n\nThe context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.\n\nWithout compression, an agent simply cannot work on large projects.\n\n---\n\n## The Solution\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.en.svg)\n\nThe hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. 
The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.\n\nCore design: cheap first, expensive last.\n\n---\n\n## How It Works\n\n![Four-layer compression pipeline](/course-assets/s08_context_compact/compaction-layers.en.svg)\n\n### L1: snip_compact — Trim Irrelevant Old Conversation\n\nThe agent ran 80 turns of conversation, accumulating 160 `messages`. The very first \"help me create hello.py\" is barely relevant to current work, yet it still occupies space.\n\nMessage count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n keep_head, keep_tail = 3, max_messages - 3\n snipped = len(messages) - keep_head - keep_tail\n placeholder = {\"role\": \"user\",\n \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\nEntire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.\n\n### L2: micro_compact — Placeholder for Old Tool Results\n\n![Old results placeholder](/course-assets/s08_context_compact/micro-compact.en.svg)\n\nThe agent read 10 files consecutively. 
The full contents of reads 1–7 are still sitting in context, no longer needed, but hogging large amounts of space.\n\nKeep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n return messages\n```\n\nOld results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. → L3.\n\n### L3: tool_result_budget — Persist Large Results to Disk\n\n![Large results to disk](/course-assets/s08_context_compact/layer1-budget.en.svg)\n\nThe model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB.\n\nSum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `` marker + a 2000-character preview in context. 
The model sees the marker and knows the full content is on disk, re-reading it when needed.\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\nThe first three layers are all plain-text / structural operations — 0 API calls — but they cannot \"understand\" conversation content. Context may still be too large. → L4.\n\n### L4: compact_history — Full LLM Summary\n\n![Full LLM summary](/course-assets/s08_context_compact/auto-compact.en.svg)\n\nAll three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.\n\nThree-step process:\n\n1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. The transcript preserves a recoverable record, but the model's active context only contains the summary. For the model's current reasoning, the details are no longer in context. The teaching code does not provide a transcript retrieval tool.\n2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc.\n3. **Replace message list**: All old messages are replaced with a single summary. 
The teaching version only keeps the summary; the real Claude Code re-attaches some recent files, plans, agent/skill/tool context after compaction.\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # Save full conversation first\n summary = summarize_history(messages) # LLM generates summary\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls.\n\n### Reactive: reactive_compact\n\nSometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.\n\nThis triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary.\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail = messages[-5:]\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nReactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. Full error recovery is deferred to s11.\n\n### Putting It All Together\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # Three pre-processors (0 API calls)\n # Order: budget first, so large content is persisted before placeholders\n messages[:] = tool_result_budget(messages) # L3: persist large results\n messages[:] = snip_compact(messages) # L1: trim middle\n messages[:] = micro_compact(messages) # L2: old result placeholders\n\n # Still too much? 
LLM summary (1 API call)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # Emergency\n reactive_retries += 1\n continue\n raise # retry limit exceeded, raise exception\n # ... tool execution ...\n\n # compact tool: when the model actively calls it, triggers compact_history\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # end current turn, start fresh with compacted context\n```\n\n**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large tool_results with one-line placeholders — budget must persist the full content before that happens. This is why CC source puts `applyToolResultBudget` first.\n\n---\n\n## Changes From s07\n\n| Component | Before (s07) | After (s08) |\n|-----------|-------------|-------------|\n| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |\n| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |\n| Design principle | — | Cheap first, expensive last |\n\n---\n\n## Try It\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\nTry these prompts:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results)\n2. 
`Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk)\n3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears\n\nWhat to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically?\n\n---\n\n## What's Next\n\nContext compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent selectively remember important things?\n\ns09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions.\n\n
\nDeep Dive Into CC Source Code\n\n> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.\n\n### Execution Order Comparison\n\nThe teaching version labels layers L1/L2/L3/L4 for pedagogical clarity, but actual execution order does not match the numbering:\n\n| Dimension | Teaching Version | Claude Code |\n|-----------|-----------------|-------------|\n| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |\n| snip_compact | Keep head 3 + tail 47 | CC only enables on main thread; implementation not in open-source repo (`HISTORY_SNIP` feature gate), but interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, also exposes `SnipTool` for model-initiated snipping. Teaching version's 3/47 are simplified parameters |\n| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |\n| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |\n| tool_result_budget | 200KB characters | 200,000 characters (`toolLimits.ts:49`) |\n| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |\n| Summary requirements | 5 categories of info | 9 sections + ``/`` dual tags |\n| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |\n| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |\n| Post-compaction recovery | None (teaching version only keeps summary) | Auto re-read recent files, plans, agent/skill/tool context |\n| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |\n| Reactive retry | 1 time | CC has more granular tiered retries |\n\n### Execution Order Details\n\nThe real order in CC 
source `query.ts`:\n\n1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved\n2. `snipCompact` (L403): trim middle messages\n3. `microcompact` (L414): old result placeholders\n4. `contextCollapse` (L441): independent context management system (not in teaching version)\n5. `autoCompact` (L454): LLM full summary\n\nThe teaching version's budget → snip → micro order matches this. The teaching version does not have the contextCollapse mechanism.\n\n### Full Constant Reference\n\n| Constant | Value | Source File |\n|----------|-------|-------------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| Time micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse and sessionMemoryCompact\n\nCC source code has two additional mechanisms not covered in this teaching version:\n\n- **contextCollapse**: An independent context management system that, when enabled, suppresses proactive autocompact (`autoCompact.ts:215-222`), with collapse's commit/blocking flow taking over context management. Manual `/compact` and reactive fallback remain independent paths, unaffected by contextCollapse.\n- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.\n\n### What Does the Compression Prompt Look Like?\n\nCC's compression prompt has two hard requirements:\n\n1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. 
Do NOT call any tools.`, and appends another REMINDER at the end\n2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. The analysis is stripped during formatting\n\n### Teaching Version Simplifications Are Intentional\n\n- micro_compact uses text placeholders → we don't have API-level `cache_edits` access\n- Tokens estimated via character count → precise tokenizers are out of scope\n- Post-compaction recovery omitted → teaching version only keeps summary, does not auto re-attach files\n- Two auxiliary mechanisms not covered → they fall in the 10% detail category\n\nThe core design principle, cheap first, expensive last, is fully preserved.\n\n
\n\n\n" + "content": "# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/en/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — Four-layer compression pipeline: cheap first, expensive last.\n>\n> **Harness Layer**: Compression — clean memory, unlimited sessions.\n\n---\n\n## The Problem\n\nThe agent is running along, then freezes.\n\nIt has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list.\n\nThe context window is finite. Once full, the API outright rejects the call: `prompt_too_long`.\n\nWithout compression, an agent simply cannot work on large projects.\n\n---\n\n## The Solution\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.en.svg)\n\nThe hook structure, skill loading, and sub-Agent from s07 are preserved, with some tools omitted to focus on compaction. The core change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error.\n\nCore design: cheap first, expensive last.\n\n---\n\n## How It Works\n\n![Four-layer compression pipeline](/course-assets/s08_context_compact/compaction-layers.en.svg)\n\n### L1: snip_compact — Trim Irrelevant Old Conversation\n\nThe agent ran 80 turns of conversation, accumulating 160 `messages`. 
The very first \"help me create hello.py\" is barely relevant to current work, yet it still occupies space.\n\nMessage count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle; the only extra boundary rule is that `assistant(tool_use)` must not be separated from the following `user(tool_result)`:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n head_end, tail_start = 3, len(messages) - (max_messages - 3)\n if _message_has_tool_use(messages[head_end - 1]):\n while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n head_end += 1\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n snipped = tail_start - head_end\n placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\nMessages are still trimmed directly; this just adds one boundary guard. `tool_result` content within remaining messages still keeps accumulating — message #34 may still hold 30KB of old file contents. → L2.\n\n### L2: micro_compact — Placeholder for Old Tool Results\n\n![Old results placeholder](/course-assets/s08_context_compact/micro-compact.en.svg)\n\nThe agent read 10 files consecutively. The full contents of reads 1–7 are still sitting in context, no longer needed, but hogging large amounts of space.\n\nKeep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. 
Re-run if needed.]\"\n return messages\n```\n\nOld results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. → L3.\n\n### L3: tool_result_budget — Persist Large Results to Disk\n\n![Large results to disk](/course-assets/s08_context_compact/layer1-budget.en.svg)\n\nThe model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB.\n\nSum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `` marker + a 2000-character preview in context. The model sees the marker and knows the full content is on disk, re-reading it when needed.\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\nThe first three layers are all plain-text / structural operations — 0 API calls — but they cannot \"understand\" conversation content. Context may still be too large. → L4.\n\n### L4: compact_history — Full LLM Summary\n\n![Full LLM summary](/course-assets/s08_context_compact/auto-compact.en.svg)\n\nAll three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold.\n\nThree-step process:\n\n1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. 
The transcript preserves a recoverable record, but the model's active context only contains the summary. For the model's current reasoning, the details are no longer in context. The teaching code does not provide a transcript retrieval tool.\n2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc.\n3. **Replace message list**: All old messages are replaced with a single summary. The teaching version only keeps the summary; the real Claude Code re-attaches some recent files, plans, agent/skill/tool context after compaction.\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # Save full conversation first\n summary = summarize_history(messages) # LLM generates summary\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls.\n\n### Reactive: reactive_compact\n\nSometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers.\n\nThis triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, but still avoids leaving an orphaned `tool_result`.\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail_start = max(0, len(messages) - 5)\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nReactive compact has a retry limit (default 1). If it still fails, an exception is raised instead of looping forever. 
Full error recovery is deferred to s11.\n\n### Putting It All Together\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # Three pre-processors (0 API calls)\n # Order: budget first, so large content is persisted before placeholders\n messages[:] = tool_result_budget(messages) # L3: persist large results\n messages[:] = snip_compact(messages) # L1: trim middle\n messages[:] = micro_compact(messages) # L2: old result placeholders\n\n # Still too much? LLM summary (1 API call)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # Emergency\n reactive_retries += 1\n continue\n raise # retry limit exceeded, raise exception\n # ... tool execution ...\n\n # compact tool: when the model actively calls it, triggers compact_history\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # end current turn, start fresh with compacted context\n```\n\n**The order must not be swapped.** L3 (budget) runs before L2 (micro) because micro replaces old large tool_results with one-line placeholders — budget must persist the full content before that happens. 
This is why CC source puts `applyToolResultBudget` first.\n\n---\n\n## Changes From s07\n\n| Component | Before (s07) | After (s08) |\n|-----------|-------------|-------------|\n| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency |\n| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history |\n| Design principle | — | Cheap first, expensive last |\n\n---\n\n## Try It\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\nTry these prompts:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results)\n2. `Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk)\n3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears\n\nWhat to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically?\n\n---\n\n## What's Next\n\nContext compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent selectively remember important things?\n\ns09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions.\n\n
\nDeep Dive Into CC Source Code\n\n> The following is based on analysis of CC source code `compact.ts`, `autoCompact.ts`, `microCompact.ts`, and `query.ts`.\n\n### Execution Order Comparison\n\nThe teaching version labels layers L1/L2/L3/L4 for pedagogical clarity, but actual execution order does not match the numbering:\n\n| Dimension | Teaching Version | Claude Code |\n|-----------|-----------------|-------------|\n| Execution order | budget → snip → micro → auto | budget → snip → micro → collapse → auto (`query.ts:379-468`) |\n| snip_compact | Keep head 3 + tail 47 | CC only enables on main thread; implementation not in open-source repo (`HISTORY_SNIP` feature gate), but interface is visible: `snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`, also exposes `SnipTool` for model-initiated snipping. Teaching version's 3/47 are simplified parameters |\n| micro_compact | Text placeholder replacement | Two paths: time-based clears content directly, cached uses API `cache_edits` (legacy path removed) |\n| micro_compact whitelist | By position (most recent 3) | time-based triggers by time threshold; cached triggers by count (`microCompact.ts`) |\n| tool_result_budget | 200KB characters | 200,000 characters (`toolLimits.ts:49`) |\n| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |\n| Summary requirements | 5 categories of info | 9 sections + ``/`` dual tags |\n| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |\n| PTL retry | Yes (simplified) | `truncateHeadForPTLRetry()` retreats by message groups (`compact.ts:243-290`) |\n| Post-compaction recovery | None (teaching version only keeps summary) | Auto re-read recent files, plans, agent/skill/tool context |\n| Circuit breaker | 3 times | 3 times (`autoCompact.ts:70`) |\n| Reactive retry | 1 time | CC has more granular tiered retries |\n\n### Execution Order Details\n\nThe real order in CC 
source `query.ts`:\n\n1. `applyToolResultBudget` (L379): persist large results first, ensuring full content is saved\n2. `snipCompact` (L403): trim middle messages\n3. `microcompact` (L414): old result placeholders\n4. `contextCollapse` (L441): independent context management system (not in teaching version)\n5. `autoCompact` (L454): LLM full summary\n\nThe teaching version's budget → snip → micro order matches this. The teaching version does not have the contextCollapse mechanism.\n\n### Full Constant Reference\n\n| Constant | Value | Source File |\n|----------|-------|-------------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| Time micro_compact interval | 60 minutes | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse and sessionMemoryCompact\n\nCC source code has two additional mechanisms not covered in this teaching version:\n\n- **contextCollapse**: An independent context management system that, when enabled, suppresses proactive autocompact (`autoCompact.ts:215-222`), with collapse's commit/blocking flow taking over context management. Manual `/compact` and reactive fallback remain independent paths, unaffected by contextCollapse.\n- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.\n\n### What Does the Compression Prompt Look Like?\n\nCC's compression prompt has two hard requirements:\n\n1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. 
Do NOT call any tools.`, and appends another REMINDER at the end\n2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. The analysis is stripped during formatting\n\n### Teaching Version Simplifications Are Intentional\n\n- micro_compact uses text placeholders → we don't have API-level `cache_edits` access\n- Tokens estimated via character count → precise tokenizers are out of scope\n- Post-compaction recovery omitted → teaching version only keeps summary, does not auto re-attach files\n- Two auxiliary mechanisms not covered → they fall in the 10% detail category\n\nThe core design principle, cheap first, expensive last, is fully preserved.\n\n
\n\n\n" }, { "version": "s08", "locale": "zh", "title": "s08: Context Compact — 上下文总会满,要有办法腾地方", - "content": "# s08: Context Compact — 上下文总会满,要有办法腾地方\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/zh/s09) → s10 → ... → s20\n> *\"上下文总会满, 要有办法腾地方\"* — 四层压缩策略, 便宜的先跑贵的后跑。\n>\n> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。\n\n---\n\n## 问题\n\nAgent 跑着跑着,不动了。\n\n手里有 bash、有 read、有 write,能力是够的。但它读了一个 1000 行的文件(~4000 token),又读了 30 个文件,跑了 20 条命令。每条命令的输出、每个文件的内容,全都堆在 `messages` 列表里。\n\n上下文窗口是有限的。满了之后,API 直接拒绝:`prompt_too_long`。\n\n不压缩,Agent 根本没法在大项目里干活。\n\n---\n\n## 解决方案\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.svg)\n\n保留 s07 的 hook 结构、技能加载、子 Agent 等骨架,省略部分工具细节以聚焦压缩。核心变动:每轮 LLM 调用前插入三层预处理器(0 API),token 仍超阈值时触发 LLM 摘要(1 API),API 报错时应急裁剪。\n\n核心设计:便宜的先跑,贵的后跑。\n\n---\n\n## 工作原理\n\n![四层压缩管线](/course-assets/s08_context_compact/compaction-layers.svg)\n\n### L1: snip_compact — 裁掉无关的旧对话\n\nAgent 跑了 80 轮对话,`messages` 攒了 160 条。最前面的\"帮我创建 hello.py\"和当前工作几乎无关了,但全占着位置。\n\n消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n keep_head, keep_tail = 3, max_messages - 3\n snipped = len(messages) - keep_head - keep_tail\n placeholder = {\"role\": \"user\",\n \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\n裁掉了整条消息,但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。\n\n### L2: micro_compact — 旧工具结果占位\n\n![旧结果占位](/course-assets/s08_context_compact/micro-compact.svg)\n\nAgent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里,早就不需要了,但占着大量空间。\n\n只保留最近 3 条 `tool_result` 的完整内容,更旧的替换为一行占位符:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if 
len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n return messages\n```\n\n旧结果清掉了,但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。\n\n### L3: tool_result_budget — 大结果落盘\n\n![大结果落盘](/course-assets/s08_context_compact/layer1-budget.svg)\n\n模型一次读了 5 个大文件,单条 user 消息里所有 `tool_result` 加起来 500KB。\n\n统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序,从最大的开始落盘到 `.task_outputs/tool-results/`,上下文里只留 `` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上,需要时可以重新读。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\n前三层都是纯文本/结构操作,0 API 调用,但也无法\"理解\"对话内容。上下文可能仍然太大。→ L4。\n\n### L4: compact_history — LLM 全量摘要\n\n![LLM 全量摘要](/course-assets/s08_context_compact/auto-compact.svg)\n\n前三层全跑完了,但在超大项目中连续工作 30 分钟后,token 仍然超过阈值。\n\n三步流程:\n\n1. **保存 transcript**:完整对话写入 `.transcripts/`,JSONL 格式。transcript 保留了可恢复记录,但模型的活跃上下文里只剩摘要。对模型当下推理来说,细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。\n2. **LLM 生成摘要**:把对话历史发给 LLM,要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。\n3. 
**替换消息列表**:所有旧消息被替换为一条摘要。教学版只保留摘要;真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # 先保存完整对话\n summary = summarize_history(messages) # LLM 生成摘要\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**熔断器**:连续失败 3 次后停止重试,防止死循环浪费 API 调用。\n\n### 应急: reactive_compact\n\n有时候 API 还是返回 `prompt_too_long`(413),上下文增长速度快于压缩触发速度时。\n\n这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,以字节级精度裁剪到 API 可接受的大小,只保留最后 5 条消息 + 摘要。\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail = messages[-5:]\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nreactive compact 有重试上限(默认 1 次)。再失败就抛出异常,不无限循环。完整的错误恢复逻辑留给 s11。\n\n### 合起来跑\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # 三个预处理器(0 API 调用)\n # 顺序:budget 先跑,确保大内容落盘后再做占位和裁剪\n messages[:] = tool_result_budget(messages) # L3: 大结果落盘\n messages[:] = snip_compact(messages) # L1: 裁中间\n messages[:] = micro_compact(messages) # L2: 旧结果占位\n\n # 还不够?LLM 摘要(1 API 调用)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # 应急\n reactive_retries += 1\n continue\n raise # 超过重试上限,抛出异常\n # ... 工具执行 ...\n\n # compact 工具:模型主动调用时触发 compact_history\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. 
History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # 结束当前 turn,用压缩后的上下文开始新一轮\n```\n\n**顺序不能换。** L3(budget)在 L2(micro)前面,因为 micro 会把旧的大 tool_result 替换成一行占位符,budget 必须在那之前把完整内容落盘。这也是为什么 CC 源码把 `applyToolResultBudget` 放在最前面。\n\n---\n\n## 相对 s07 的变更\n\n| 组件 | 之前 (s07) | 之后 (s08) |\n|------|-----------|-----------|\n| 上下文管理 | 无(上下文无限膨胀) | 四层压缩管线 + 应急 |\n| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| 工具 | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history |\n| 设计原则 | — | 便宜的先跑,贵的后跑 |\n\n---\n\n## 试一下\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n试试这些 prompt:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(连续读多个文件,观察 L2 压缩旧结果)\n2. `Read every file in s08_context_compact/`(一次性读大量内容,观察 L3 落盘)\n3. 反复对话 20+ 轮,观察是否出现 `[auto compact]` 或 `[reactive compact]`\n\n观察重点:每次工具执行后,旧 tool_result 是否被压缩?连续对话后 token 超阈值时,是否自动触发了摘要?\n\n---\n\n## 接下来\n\n上下文压缩让 Agent 能跑很久不会崩。但每次压缩后,用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent 有选择地记住重要的事?\n\ns09 Memory → 三个子系统:选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。\n\n
\n深入 CC 源码\n\n> 以下基于 CC 源码 `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` 的分析。\n\n### 执行顺序对照\n\n教学版为了讲解方便按 L1/L2/L3/L4 编号,但实际执行顺序和编号不完全对应:\n\n| 维度 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 执行顺序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |\n| snip_compact | 保留头 3 + 尾 47 | CC 仅主线程启用;实现不在开源仓库中(`HISTORY_SNIP` feature gate),但接口可见:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`,还暴露了 `SnipTool` 工具让模型主动调用。教学版的 3/47 是简化参数 |\n| micro_compact | 文本占位符替换 | 两条路径:time-based 直接清内容,cached 走 API `cache_edits`(legacy path 已移除) |\n| micro_compact 白名单 | 按位置(最近 3 条) | time-based 按时间阈值触发;cached 按计数触发(`microCompact.ts`) |\n| tool_result_budget | 200KB 字符 | 200,000 字符(`toolLimits.ts:49`) |\n| compact_history 阈值 | 字符数估算 | 精确 token:`contextWindow - maxOutputTokens - 13_000` |\n| 摘要要求 | 5 类信息 | 9 个部分 + ``/`` 双标签 |\n| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |\n| PTL retry | 有(简化) | `truncateHeadForPTLRetry()` 按消息组回退(`compact.ts:243-290`) |\n| 后压缩恢复 | 无(教学版只保留摘要) | 自动重新读取最近文件、计划、agent/skill/tool 等 |\n| 熔断器 | 3 次 | 3 次(`autoCompact.ts:70`) |\n| reactive 重试 | 1 次 | CC 有更精细的分级重试 |\n\n### 执行顺序详解\n\nCC 源码 `query.ts` 中的真实顺序:\n\n1. `applyToolResultBudget`(L379):先处理大结果,确保完整内容落盘\n2. `snipCompact`(L403):裁中间消息\n3. `microcompact`(L414):旧结果占位\n4. `contextCollapse`(L441):独立的上下文管理系统(教学版无)\n5. 
`autoCompact`(L454):LLM 全量摘要\n\n教学版的 budget → snip → micro 顺序与此一致。教学版没有 contextCollapse 机制。\n\n### 完整常量参考\n\n| 常量 | 值 | 源文件 |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse 和 sessionMemoryCompact\n\nCC 源码中还有两个机制本教学版没有展开:\n\n- **contextCollapse**:独立的上下文管理系统,启用时抑制 proactive autocompact(`autoCompact.ts:215-222`),由 collapse 的 commit/blocking 流程接管上下文管理。但 manual `/compact` 和 reactive fallback 仍是独立路径,不受 contextCollapse 影响。\n- **sessionMemoryCompact**:compact_history 之前,CC 会先尝试用已有的 session memory(s09 会讲到)做轻量摘要,不调 LLM。这个机制等学完 s09 之后回头看会更清楚。\n\n### 压缩 prompt 长什么样?\n\nCC 的压缩 prompt 有两个硬性要求:\n\n1. **绝对禁止调用工具**:开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`,末尾还会再 REMINDER 一次\n2. **先分析再总结**:模型需要先在 `` 标签里理清思路,然后在 `` 标签里输出正式摘要。analysis 在格式化时被剥离\n\n### 教学版的简化是刻意的\n\n- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限\n- token 用字符数估算 → 精确 tokenizer 不在教学范围内\n- 后压缩恢复省略 → 教学版只保留摘要,不自动重新附加文件\n- 两个辅助机制不展开 → 属于 10% 的细节\n\n核心设计思想,便宜的先跑贵的后跑,完整保留。\n\n
\n\n\n" + "content": "# s08: Context Compact — 上下文总会满,要有办法腾地方\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/zh/s09) → s10 → ... → s20\n> *\"上下文总会满, 要有办法腾地方\"* — 四层压缩策略, 便宜的先跑贵的后跑。\n>\n> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。\n\n---\n\n## 问题\n\nAgent 跑着跑着,不动了。\n\n手里有 bash、有 read、有 write,能力是够的。但它读了一个 1000 行的文件(~4000 token),又读了 30 个文件,跑了 20 条命令。每条命令的输出、每个文件的内容,全都堆在 `messages` 列表里。\n\n上下文窗口是有限的。满了之后,API 直接拒绝:`prompt_too_long`。\n\n不压缩,Agent 根本没法在大项目里干活。\n\n---\n\n## 解决方案\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.svg)\n\n保留 s07 的 hook 结构、技能加载、子 Agent 等骨架,省略部分工具细节以聚焦压缩。核心变动:每轮 LLM 调用前插入三层预处理器(0 API),token 仍超阈值时触发 LLM 摘要(1 API),API 报错时应急裁剪。\n\n核心设计:便宜的先跑,贵的后跑。\n\n---\n\n## 工作原理\n\n![四层压缩管线](/course-assets/s08_context_compact/compaction-layers.svg)\n\n### L1: snip_compact — 裁掉无关的旧对话\n\nAgent 跑了 80 轮对话,`messages` 攒了 160 条。最前面的\"帮我创建 hello.py\"和当前工作几乎无关了,但全占着位置。\n\n消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉;唯一额外边界条件是,不能把 `assistant(tool_use)` 和后面的 `user(tool_result)` 拆开:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n head_end, tail_start = 3, len(messages) - (max_messages - 3)\n if _message_has_tool_use(messages[head_end - 1]):\n while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n head_end += 1\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n snipped = tail_start - head_end\n placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\n裁掉的是消息本身,只是在切口处多做一步保护;剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。\n\n### L2: micro_compact — 旧工具结果占位\n\n![旧结果占位](/course-assets/s08_context_compact/micro-compact.svg)\n\nAgent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里,早就不需要了,但占着大量空间。\n\n只保留最近 3 条 `tool_result` 
的完整内容,更旧的替换为一行占位符:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n return messages\n```\n\n旧结果清掉了,但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。\n\n### L3: tool_result_budget — 大结果落盘\n\n![大结果落盘](/course-assets/s08_context_compact/layer1-budget.svg)\n\n模型一次读了 5 个大文件,单条 user 消息里所有 `tool_result` 加起来 500KB。\n\n统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序,从最大的开始落盘到 `.task_outputs/tool-results/`,上下文里只留 `` 标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上,需要时可以重新读。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\n前三层都是纯文本/结构操作,0 API 调用,但也无法\"理解\"对话内容。上下文可能仍然太大。→ L4。\n\n### L4: compact_history — LLM 全量摘要\n\n![LLM 全量摘要](/course-assets/s08_context_compact/auto-compact.svg)\n\n前三层全跑完了,但在超大项目中连续工作 30 分钟后,token 仍然超过阈值。\n\n三步流程:\n\n1. **保存 transcript**:完整对话写入 `.transcripts/`,JSONL 格式。transcript 保留了可恢复记录,但模型的活跃上下文里只剩摘要。对模型当下推理来说,细节已经不在上下文中了。教学代码没有提供 transcript 检索工具。\n2. **LLM 生成摘要**:把对话历史发给 LLM,要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。\n3. 
**替换消息列表**:所有旧消息被替换为一条摘要。教学版只保留摘要;真实 Claude Code 会在 compact 后重新附加部分最近文件、计划、agent/skill/tool 等上下文。\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # 先保存完整对话\n summary = summarize_history(messages) # LLM 生成摘要\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**熔断器**:连续失败 3 次后停止重试,防止死循环浪费 API 调用。\n\n### 应急: reactive_compact\n\n有时候 API 还是返回 `prompt_too_long`(413),上下文增长速度快于压缩触发速度时。\n\n这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,但仍要避免留下孤立 `tool_result`。\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail_start = max(0, len(messages) - 5)\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nreactive compact 有重试上限(默认 1 次)。再失败就抛出异常,不无限循环。完整的错误恢复逻辑留给 s11。\n\n### 合起来跑\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # 三个预处理器(0 API 调用)\n # 顺序:budget 先跑,确保大内容落盘后再做占位和裁剪\n messages[:] = tool_result_budget(messages) # L3: 大结果落盘\n messages[:] = snip_compact(messages) # L1: 裁中间\n messages[:] = micro_compact(messages) # L2: 旧结果占位\n\n # 还不够?LLM 摘要(1 API 调用)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # 应急\n reactive_retries += 1\n continue\n raise # 超过重试上限,抛出异常\n # ... 工具执行 ...\n\n # compact 工具:模型主动调用时触发 compact_history\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. 
History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # 结束当前 turn,用压缩后的上下文开始新一轮\n```\n\n**顺序不能换。** L3(budget)在 L2(micro)前面,因为 micro 会把旧的大 tool_result 替换成一行占位符,budget 必须在那之前把完整内容落盘。这也是为什么 CC 源码把 `applyToolResultBudget` 放在最前面。\n\n---\n\n## 相对 s07 的变更\n\n| 组件 | 之前 (s07) | 之后 (s08) |\n|------|-----------|-----------|\n| 上下文管理 | 无(上下文无限膨胀) | 四层压缩管线 + 应急 |\n| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| 工具 | bash, read, write, edit, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history |\n| 设计原则 | — | 便宜的先跑,贵的后跑 |\n\n---\n\n## 试一下\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n试试这些 prompt:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(连续读多个文件,观察 L2 压缩旧结果)\n2. `Read every file in s08_context_compact/`(一次性读大量内容,观察 L3 落盘)\n3. 反复对话 20+ 轮,观察是否出现 `[auto compact]` 或 `[reactive compact]`\n\n观察重点:每次工具执行后,旧 tool_result 是否被压缩?连续对话后 token 超阈值时,是否自动触发了摘要?\n\n---\n\n## 接下来\n\n上下文压缩让 Agent 能跑很久不会崩。但每次压缩后,用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent 有选择地记住重要的事?\n\ns09 Memory → 三个子系统:选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。\n\n
\n深入 CC 源码\n\n> 以下基于 CC 源码 `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` 的分析。\n\n### 执行顺序对照\n\n教学版为了讲解方便按 L1/L2/L3/L4 编号,但实际执行顺序和编号不完全对应:\n\n| 维度 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 执行顺序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |\n| snip_compact | 保留头 3 + 尾 47 | CC 仅主线程启用;实现不在开源仓库中(`HISTORY_SNIP` feature gate),但接口可见:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`,还暴露了 `SnipTool` 工具让模型主动调用。教学版的 3/47 是简化参数 |\n| micro_compact | 文本占位符替换 | 两条路径:time-based 直接清内容,cached 走 API `cache_edits`(legacy path 已移除) |\n| micro_compact 白名单 | 按位置(最近 3 条) | time-based 按时间阈值触发;cached 按计数触发(`microCompact.ts`) |\n| tool_result_budget | 200KB 字符 | 200,000 字符(`toolLimits.ts:49`) |\n| compact_history 阈值 | 字符数估算 | 精确 token:`contextWindow - maxOutputTokens - 13_000` |\n| 摘要要求 | 5 类信息 | 9 个部分 + ``/`` 双标签 |\n| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |\n| PTL retry | 有(简化) | `truncateHeadForPTLRetry()` 按消息组回退(`compact.ts:243-290`) |\n| 后压缩恢复 | 无(教学版只保留摘要) | 自动重新读取最近文件、计划、agent/skill/tool 等 |\n| 熔断器 | 3 次 | 3 次(`autoCompact.ts:70`) |\n| reactive 重试 | 1 次 | CC 有更精细的分级重试 |\n\n### 执行顺序详解\n\nCC 源码 `query.ts` 中的真实顺序:\n\n1. `applyToolResultBudget`(L379):先处理大结果,确保完整内容落盘\n2. `snipCompact`(L403):裁中间消息\n3. `microcompact`(L414):旧结果占位\n4. `contextCollapse`(L441):独立的上下文管理系统(教学版无)\n5. 
`autoCompact`(L454):LLM 全量摘要\n\n教学版的 budget → snip → micro 顺序与此一致。教学版没有 contextCollapse 机制。\n\n### 完整常量参考\n\n| 常量 | 值 | 源文件 |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse 和 sessionMemoryCompact\n\nCC 源码中还有两个机制本教学版没有展开:\n\n- **contextCollapse**:独立的上下文管理系统,启用时抑制 proactive autocompact(`autoCompact.ts:215-222`),由 collapse 的 commit/blocking 流程接管上下文管理。但 manual `/compact` 和 reactive fallback 仍是独立路径,不受 contextCollapse 影响。\n- **sessionMemoryCompact**:compact_history 之前,CC 会先尝试用已有的 session memory(s09 会讲到)做轻量摘要,不调 LLM。这个机制等学完 s09 之后回头看会更清楚。\n\n### 压缩 prompt 长什么样?\n\nCC 的压缩 prompt 有两个硬性要求:\n\n1. **绝对禁止调用工具**:开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`,末尾还会再 REMINDER 一次\n2. **先分析再总结**:模型需要先在 `<analysis>` 标签里理清思路,然后在 `<summary>` 标签里输出正式摘要。analysis 在格式化时被剥离\n\n### 教学版的简化是刻意的\n\n- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限\n- token 用字符数估算 → 精确 tokenizer 不在教学范围内\n- 后压缩恢复省略 → 教学版只保留摘要,不自动重新附加文件\n- 两个辅助机制不展开 → 属于 10% 的细节\n\n核心设计思想,便宜的先跑贵的后跑,完整保留。\n\n
\n\n\n" }, { "version": "s08", "locale": "ja", "title": "s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要", - "content": "# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/ja/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。\n>\n> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。\n\n---\n\n## 課題\n\nAgent が動いている途中で、止まってしまう。\n\nbash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。\n\nコンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します:`prompt_too_long`。\n\n圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。\n\n---\n\n## ソリューション\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.ja.svg)\n\ns07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点:各 LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。\n\nコア設計:安価なものを先に、高価なものを後に。\n\n---\n\n## 仕組み\n\n![4層圧縮パイプライン](/course-assets/s08_context_compact/compaction-layers.ja.svg)\n\n### L1: snip_compact — 無関係な古い会話を切り捨て\n\nAgent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。\n\nメッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n keep_head, keep_tail = 3, max_messages - 3\n snipped = len(messages) - keep_head - keep_tail\n placeholder = {\"role\": \"user\",\n \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:keep_head] + [placeholder] + messages[-keep_tail:]\n```\n\nメッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。\n\n### L2: micro_compact — 古いツール結果をプレースホルダに置換\n\n![古い結果のプレースホルダ](/course-assets/s08_context_compact/micro-compact.ja.svg)\n\nAgent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。\n\n直近 3 件の 
`tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n return messages\n```\n\n古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。\n\n### L3: tool_result_budget — 大きな結果をディスクに退避\n\n![大きな結果のディスク退避](/course-assets/s08_context_compact/layer1-budget.ja.svg)\n\nモデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。\n\n最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\n最初の 3 層はすべて純粋なテキスト/構造操作(0 API 呼び出し)だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。\n\n### L4: compact_history — LLM 全量要約\n\n![LLM 全量要約](/course-assets/s08_context_compact/auto-compact.ja.svg)\n\n最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。\n\n3 ステップのフロー:\n\n1. 
**transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。\n2. **LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。\n3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # 先に完全な会話を保存\n summary = summarize_history(messages) # LLM で要約を生成\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**サーキットブレーカー**:連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。\n\n### 緊急: reactive_compact\n\nAPI がまだ `prompt_too_long`(413)を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。\n\nこの時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail = messages[-5:]\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *tail]\n```\n\nreactive compact にはリトライ上限がある(デフォルト 1 回)。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。\n\n### 合わせて実行\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # 3 つのプリプロセッサ(0 API 呼び出し)\n # 順序:budget を先に実行し、大きな内容をプレースホルダ化する前に退避\n messages[:] = tool_result_budget(messages) # L3: 大きな結果を退避\n messages[:] = snip_compact(messages) # L1: 中間を切り捨て\n messages[:] = micro_compact(messages) # L2: 古い結果をプレースホルダに\n\n # まだ足りない?LLM 要約(1 API 呼び出し)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # 緊急対応\n reactive_retries += 1\n continue\n raise # リトライ上限超過、例外をスロー\n # ... 
ツール実行 ...\n\n # compact ツール:モデルが能動的に呼び出した場合、compact_history をトリガー\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # 現在のターンを終了し、圧縮後のコンテキストで新しく開始\n```\n\n**順序は変えられない。** L3(budget)が L2(micro)の前に実行される理由:micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。\n\n---\n\n## s07 からの変更点\n\n| コンポーネント | 変更前 (s07) | 変更後 (s08) |\n|------|-----------|-----------|\n| コンテキスト管理 | なし(コンテキストが無限に膨張) | 4 層圧縮パイプライン + 緊急対応 |\n| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |\n| 設計原則 | — | 安価なものを先に、高価なものを後に |\n\n---\n\n## 試してみよう\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n以下のプロンプトを試してみてください:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(連続して複数のファイルを読み、L2 の古い結果圧縮を観察)\n2. `Read every file in s08_context_compact/`(一度に大量の内容を読み込み、L3 のディスク退避を観察)\n3. 20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察\n\n観察のポイント:ツール実行のたびに、古い tool_result は圧縮されているか?連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか?\n\n---\n\n## 次へ\n\nコンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか?\n\ns09 Memory → 3 つのサブシステム:何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。\n\n
\nCC ソースコードの詳細\n\n> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。\n\n### 実行順序の対応\n\n教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない:\n\n| 項目 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |\n| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効;実装はオープンソースリポジトリにない(`HISTORY_SNIP` feature gate)、インターフェースは確認可能:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`、`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |\n| micro_compact | テキストプレースホルダで置換 | 2 つのパス:time-based は直接内容をクリア、cached は API の `cache_edits` を使用(legacy パスは削除済み) |\n| micro_compact ホワイトリスト | 位置による(直近 3 件) | time-based は時間閾値でトリガー、cached はカウントでトリガー(`microCompact.ts`) |\n| tool_result_budget | 200KB 文字 | 200,000 文字(`toolLimits.ts:49`) |\n| compact_history 閾値 | 文字数で推定 | 精密な token 数:`contextWindow - maxOutputTokens - 13_000` |\n| 要約の要求 | 5 種類の情報 | 9 つのセクション + ``/`` デュアルタグ |\n| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |\n| PTL retry | あり(簡略版) | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック(`compact.ts:243-290`) |\n| 圧縮後のリカバリ | なし(教学版は要約のみ保持) | 直近のファイル、計画、agent/skill/tool などの自動再付加 |\n| サーキットブレーカー | 3 回 | 3 回(`autoCompact.ts:70`) |\n| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |\n\n### 実行順序の詳細\n\nCC ソース `query.ts` での実際の順序:\n\n1. `applyToolResultBudget`(L379):まず大きな結果を処理し、完全な内容を退避\n2. `snipCompact`(L403):中間メッセージを切り捨て\n3. `microcompact`(L414):古い結果のプレースホルダ化\n4. `contextCollapse`(L441):独立したコンテキスト管理システム(教学版にはなし)\n5. 
`autoCompact`(L454):LLM 全量要約\n\n教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。\n\n### 完全な定数リファレンス\n\n| 定数 | 値 | ソースファイル |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse と sessionMemoryCompact\n\nCC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する:\n\n- **contextCollapse**:独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し(`autoCompact.ts:215-222`)、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。\n- **sessionMemoryCompact**:compact_history の前に、CC は既存の session memory(s09 で解説)を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。\n\n### 圧縮プロンプトの中身\n\nCC の圧縮プロンプトには 2 つの厳格な要件がある:\n\n1. **ツール呼び出しの絶対禁止**:冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある\n2. **先に分析してから要約**:モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される\n\n### 教学版の簡略化は意図的\n\n- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため\n- token を文字数で推定 → 精密な tokenizer は教学の対象外\n- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない\n- 2 つの補助メカニズムを展開しない → 10% の細部に属する\n\nコア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。\n\n
\n\n\n" + "content": "# s08: Context Compact — コンテキストはいつか満杯になる、場所を空ける方法が必要\n\ns01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](/ja/s09) → s10 → ... → s20\n> *\"Context will fill up — have a way to make room\"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。\n>\n> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。\n\n---\n\n## 課題\n\nAgent が動いている途中で、止まってしまう。\n\nbash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。\n\nコンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します:`prompt_too_long`。\n\n圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。\n\n---\n\n## ソリューション\n\n![Compact Overview](/course-assets/s08_context_compact/compact-overview.ja.svg)\n\ns07 のフック構造、スキルロード、サブ Agent の骨格を維持し、圧縮に焦点を当てるため一部のツールは省略。コアの変更点:各 LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。\n\nコア設計:安価なものを先に、高価なものを後に。\n\n---\n\n## 仕組み\n\n![4層圧縮パイプライン](/course-assets/s08_context_compact/compaction-layers.ja.svg)\n\n### L1: snip_compact — 無関係な古い会話を切り捨て\n\nAgent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。\n\nメッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持して中間を切り詰める。ただし切れ目だけは調整し、`assistant(tool_use)` と後続の `user(tool_result)` を分断しない:\n\n```python\ndef snip_compact(messages, max_messages=50):\n if len(messages) <= max_messages:\n return messages\n head_end, tail_start = 3, len(messages) - (max_messages - 3)\n if _message_has_tool_use(messages[head_end - 1]):\n while head_end < len(messages) and _is_tool_result_message(messages[head_end]):\n head_end += 1\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n snipped = tail_start - head_end\n placeholder = {\"role\": \"user\", \"content\": f\"[snipped {snipped} messages from conversation middle]\"}\n return messages[:head_end] + [placeholder] + messages[tail_start:]\n```\n\n切り捨て自体は単純なままで、境界だけを保護する。残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている。34 番目のメッセージに 30KB 
の古いファイル内容が残っているかもしれない。→ L2。\n\n### L2: micro_compact — 古いツール結果をプレースホルダに置換\n\n![古い結果のプレースホルダ](/course-assets/s08_context_compact/micro-compact.ja.svg)\n\nAgent が連続して 10 個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。\n\n直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:\n\n```python\nKEEP_RECENT_TOOL_RESULTS = 3\n\ndef micro_compact(messages):\n tool_results = collect_tool_result_blocks(messages)\n if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:\n return messages\n for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:\n if len(block.get(\"content\", \"\")) > 120:\n block[\"content\"] = \"[Earlier tool result compacted. Re-run if needed.]\"\n return messages\n```\n\n古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある。大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。\n\n### L3: tool_result_budget — 大きな結果をディスクに退避\n\n![大きな結果のディスク退避](/course-assets/s08_context_compact/layer1-budget.ja.svg)\n\nモデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。\n\n最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには `` マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。\n\n```python\ndef tool_result_budget(messages, max_bytes=200_000):\n last = messages[-1]\n blocks = [(i, b) for i, b in enumerate(last[\"content\"])\n if b.get(\"type\") == \"tool_result\"]\n total = sum(len(str(b.get(\"content\", \"\"))) for _, b in blocks)\n if total <= max_bytes:\n return messages\n ranked = sorted(blocks, key=lambda p: len(str(p[1].get(\"content\", \"\"))), reverse=True)\n for idx, block in ranked:\n if total <= max_bytes:\n break\n block[\"content\"] = persist_large_output(block[\"tool_use_id\"], str(block[\"content\"]))\n total = recalculate_total(blocks)\n return messages\n```\n\n最初の 3 層はすべて純粋なテキスト/構造操作(0 API 呼び出し)だが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。\n\n### L4: compact_history — LLM 全量要約\n\n![LLM 
全量要約](/course-assets/s08_context_compact/auto-compact.ja.svg)\n\n最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。\n\n3 ステップのフロー:\n\n1. **transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。transcript は回復可能な記録として保存されるが、モデルのアクティブなコンテキストには要約しか残らない。モデルの現在の推論にとって、詳細はすでにコンテキストにない。教学コードは transcript 検索ツールを提供しない。\n2. **LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。\n3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。教学版は要約のみを保持する。実際の Claude Code は compact 後に直近のファイル、計画、agent/skill/tool などのコンテキストを再付加する。\n\n```python\ndef compact_history(messages):\n transcript_path = write_transcript(messages) # 先に完全な会話を保存\n summary = summarize_history(messages) # LLM で要約を生成\n return [{\"role\": \"user\",\n \"content\": f\"[Compacted]\\n\\n{summary}\"}]\n```\n\n**サーキットブレーカー**:連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。\n\n### 緊急: reactive_compact\n\nAPI がまだ `prompt_too_long`(413)を返すことがある。コンテキストの増加速度が圧縮のトリガー速度を上回る場合。\n\nこの時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的だが、末尾を残す際も孤立した `tool_result` を残さないようにする。\n\n```python\ndef reactive_compact(messages):\n transcript = write_transcript(messages)\n summary = summarize_history(messages)\n tail_start = max(0, len(messages) - 5)\n if _is_tool_result_message(messages[tail_start]) and _message_has_tool_use(messages[tail_start - 1]):\n tail_start -= 1\n return [{\"role\": \"user\",\n \"content\": f\"[Reactive compact]\\n\\n{summary}\"}, *messages[tail_start:]]\n```\n\nreactive compact にはリトライ上限がある(デフォルト 1 回)。さらに失敗した場合は例外をスローし、無限ループしない。完全なエラー回復ロジックは s11 に委ねる。\n\n### 合わせて実行\n\n```python\ndef agent_loop(messages):\n reactive_retries = 0\n while True:\n # 3 つのプリプロセッサ(0 API 呼び出し)\n # 順序:budget を先に実行し、大きな内容をプレースホルダ化する前に退避\n messages[:] = tool_result_budget(messages) # L3: 大きな結果を退避\n messages[:] = snip_compact(messages) # L1: 中間を切り捨て\n messages[:] = micro_compact(messages) # L2: 古い結果をプレースホルダに\n\n # まだ足りない?LLM 要約(1 API 呼び出し)\n if estimate_token_count(messages) > THRESHOLD:\n messages[:] = 
compact_history(messages)\n\n try:\n response = client.messages.create(...)\n except PromptTooLongError:\n if reactive_retries < MAX_REACTIVE_RETRIES:\n messages[:] = reactive_compact(messages) # 緊急対応\n reactive_retries += 1\n continue\n raise # リトライ上限超過、例外をスロー\n # ... ツール実行 ...\n\n # compact ツール:モデルが能動的に呼び出した場合、compact_history をトリガー\n if block.name == \"compact\":\n messages[:] = compact_history(messages)\n results.append({..., \"content\": \"[Compacted. History summarized.]\"})\n messages.append({\"role\": \"user\", \"content\": results})\n break # 現在のターンを終了し、圧縮後のコンテキストで新しく開始\n```\n\n**順序は変えられない。** L3(budget)が L2(micro)の前に実行される理由:micro は古い大きな tool_result を 1 行のプレースホルダに置換するため、budget はその前に完全な内容を退避させる必要がある。CC ソースが `applyToolResultBudget` を最初に配置する理由も同じ。\n\n---\n\n## s07 からの変更点\n\n| コンポーネント | 変更前 (s07) | 変更後 (s08) |\n|------|-----------|-----------|\n| コンテキスト管理 | なし(コンテキストが無限に膨張) | 4 層圧縮パイプライン + 緊急対応 |\n| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact |\n| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | 8 + compact (9) |\n| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー |\n| 設計原則 | — | 安価なものを先に、高価なものを後に |\n\n---\n\n## 試してみよう\n\n```sh\ncd learn-claude-code\npython s08_context_compact/code.py\n```\n\n以下のプロンプトを試してみてください:\n\n1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(連続して複数のファイルを読み、L2 の古い結果圧縮を観察)\n2. `Read every file in s08_context_compact/`(一度に大量の内容を読み込み、L3 のディスク退避を観察)\n3. 20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察\n\n観察のポイント:ツール実行のたびに、古い tool_result は圧縮されているか?連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか?\n\n---\n\n## 次へ\n\nコンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを選択的に記憶できるようにできないか?\n\ns09 Memory → 3 つのサブシステム:何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。\n\n
\nCC ソースコードの詳細\n\n> 以下は CC ソースコード `compact.ts`、`autoCompact.ts`、`microCompact.ts`、`query.ts` の分析に基づく。\n\n### 実行順序の対応\n\n教学版は説明の便宜上 L1/L2/L3/L4 と番号を振っているが、実際の実行順序は番号と完全には一致しない:\n\n| 項目 | 教学版 | Claude Code |\n|------|--------|-------------|\n| 実行順序 | budget → snip → micro → auto | budget → snip → micro → collapse → auto(`query.ts:379-468`) |\n| snip_compact | 先頭 3 + 末尾 47 を保持 | CC はメインスレッドのみ有効;実装はオープンソースリポジトリにない(`HISTORY_SNIP` feature gate)、インターフェースは確認可能:`snipCompactIfNeeded(messages)` → `{ messages, tokensFreed, boundaryMessage? }`、`SnipTool` もモデルが能動的に呼び出し可能。教学版の 3/47 は簡略パラメータ |\n| micro_compact | テキストプレースホルダで置換 | 2 つのパス:time-based は直接内容をクリア、cached は API の `cache_edits` を使用(legacy パスは削除済み) |\n| micro_compact ホワイトリスト | 位置による(直近 3 件) | time-based は時間閾値でトリガー、cached はカウントでトリガー(`microCompact.ts`) |\n| tool_result_budget | 200KB 文字 | 200,000 文字(`toolLimits.ts:49`) |\n| compact_history 閾値 | 文字数で推定 | 精密な token 数:`contextWindow - maxOutputTokens - 13_000` |\n| 要約の要求 | 5 種類の情報 | 9 つのセクション + ``/`` デュアルタグ |\n| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |\n| PTL retry | あり(簡略版) | `truncateHeadForPTLRetry()` がメッセージグループ単位でロールバック(`compact.ts:243-290`) |\n| 圧縮後のリカバリ | なし(教学版は要約のみ保持) | 直近のファイル、計画、agent/skill/tool などの自動再付加 |\n| サーキットブレーカー | 3 回 | 3 回(`autoCompact.ts:70`) |\n| reactive リトライ | 1 回 | CC にはより精緻な段階別リトライがある |\n\n### 実行順序の詳細\n\nCC ソース `query.ts` での実際の順序:\n\n1. `applyToolResultBudget`(L379):まず大きな結果を処理し、完全な内容を退避\n2. `snipCompact`(L403):中間メッセージを切り捨て\n3. `microcompact`(L414):古い結果のプレースホルダ化\n4. `contextCollapse`(L441):独立したコンテキスト管理システム(教学版にはなし)\n5. 
`autoCompact`(L454):LLM 全量要約\n\n教学版の budget → snip → micro の順序はこれと一致する。教学版には contextCollapse メカニズムがない。\n\n### 完全な定数リファレンス\n\n| 定数 | 値 | ソースファイル |\n|------|-----|--------|\n| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |\n| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |\n| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |\n| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |\n| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |\n| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |\n| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts` |\n| `MAX_COMPACT_STREAMING_RETRIES` | 2 | `compact.ts:131` |\n\n### contextCollapse と sessionMemoryCompact\n\nCC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する:\n\n- **contextCollapse**:独立したコンテキスト管理システム。有効時には proactive autocompact を抑制し(`autoCompact.ts:215-222`)、collapse の commit/blocking フローがコンテキスト管理を引き継ぐ。ただし manual `/compact` と reactive fallback は独立パスのままで、contextCollapse の影響を受けない。\n- **sessionMemoryCompact**:compact_history の前に、CC は既存の session memory(s09 で解説)を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。\n\n### 圧縮プロンプトの中身\n\nCC の圧縮プロンプトには 2 つの厳格な要件がある:\n\n1. **ツール呼び出しの絶対禁止**:冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある\n2. **先に分析してから要約**:モデルはまず `<analysis>` タグで思考を整理し、その後 `<summary>` タグで正式な要約を出力する。analysis はフォーマット時に除去される\n\n### 教学版の簡略化は意図的\n\n- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため\n- token を文字数で推定 → 精密な tokenizer は教学の対象外\n- 圧縮後のリカバリを省略 → 教学版は要約のみを保持し、ファイルの自動再付加を行わない\n- 2 つの補助メカニズムを展開しない → 10% の細部に属する\n\nコア設計思想、安価なものを先に高価なものを後に、は完全に保持されている。\n\n
\n\n\n" }, { "version": "s09",