hashintel · kostandinang · Jun 18, 2026
diff --git a/memory/CARDS.md b/memory/CARDS.md
@@ -1,107 +1,179 @@
-# Scope cards — cook-artifact-lifecycle (FE-883)
+# Scope cards — epic-verify-recovery (FE-884)
+
+Execution queue for `epic-verify-recovery` (FE-884, branch
+`ka/fe-884-epic-verify-recovery`, stacked on FE-883's `ka/fe-883-worktree-gc`).
+
+**Core problem:** the orchestrator's two verification tiers are asymmetric. The
+slice tier is recoverable (`failing-tests → code-agent → run-tests`, in-net
+retry budget). The epic tier is terminal: `epic-verify:<epic>:fail` routes
+straight to `epicHaltedPlace` via `attach-halt-reason` (`net-compiler.ts`
+~458–535). So the one place cross-slice defects surface — epic integration — is
+the one place the harness cannot act on what it found. A failed epic halts the
+whole run and promotes nothing, discarding the diagnosis, the folded worktree,
+and every passing epic.
+
+**Worked example:** run `59100820-...` (spec 49, 3 epics / 11 slices, 60m26s).
+11/11 slices + 2/3 epics passed; `route-integration` failed on a real bug (view
+toggle wrote `?view=graph` but the sibling `useViewParam()` never resynced —
+`pushState` doesn't emit `popstate`). The verify agent named the exact fix; the
+run halted anyway. The fix (`brunch:viewparamchange` event, ~10 lines) was
+applied by hand afterward — on a worktree the harness already had, from a
+diagnosis it already produced.
+
+**Builds on FE-883:** the epic verify already composes the folded
+`__epic__/<epicId>/` tree (`materializeEpicVerifyTree`), and promotion folds
+slice commits (`harvestCookRun`, idempotent `commitSliceWorktree`). FE-884 makes
+a failed epic *recoverable* rather than *terminal* — substrate-free, distinct
+from Arc-2 `interactive-recovery`/`adaptive-replan` (which need the parked
+semantic substrate).
+
+**Owed reconciliation:** FE-884 is not yet a frontier in `memory/PLAN.md`, and
+the oracle strategy below is not yet folded into SPEC §Verification Design.
+Reconcile via ln-plan + ln-sync when FE-884 registers / lands.
 
-Execution queue for `cook-artifact-lifecycle` (FE-883, branch
-`ka/fe-883-orchestrator-improvements`, on FE-864).
+---
 
-**Reality check (corrected after basing on FE-864, the current seam):** the
-brownfield git-merge composer already exists — `run-artifact.ts` (commit
-871ef087): `commitSliceWorktree` + `foldSliceBranches` do a real `git merge-tree`
-3-way fold of per-slice branches in dependency order, fail-closed on conflicts,
-pure plumbing (I135-K preserved). It was deliberately left **unwired** pending "a
-live-run check of the dependency-seed interaction". So FE-883 is *wire the
-existing composer*, not *build it*.
+## Slice A — recoverable epic verification (the missing green step)
 
-This matches the Slice-1 spike decision (2026-06-18): git-merge for brownfield
-(common ancestor → real 3-way), file-copy union for greenfield (no common
-ancestor), elevate collisions to a first-class outcome.
+Status: **done** (2026-06-18). All 7 acceptance criteria proven (run-59100820
+by analog; the real-agent dogfood is outer-loop and was not run). Gate green: check 0
+errors, build pass, orchestrator + epic-recovery e2e 104 tests pass (full suite
+2101 pass / 2 skip; the single `build-boundary` failure is the pre-existing
+dev-worktree `node_modules` symlink artifact documented on FE-883's PR).
 
----
+**Design finding (round-trip assumption — VALIDATED with caveat):** the naive
+assumption was false — `harvestCookRun` folds only *slice* worktrees; the
+`__epic__/<id>/` tree is detached and discarded. A remediation fix made in the
+folded tree must be **diff-transferred and committed to the representative slice
+branch** (`transferFoldedFixToSlice` in `run-artifact.ts`) to be folded into the
+promoted artifact. → record as a SPEC decision + invariant on canonical
+reconciliation (owed).
 
-## Slice 1 — wire the run-artifact composer into the live path
+Full scope card — structural (changes the epic-verify topology; establishes the
+invariant *a failed epic is recoverable, not terminal*).
 
-Status: **in progress.**
+### Target Behavior
 
-### Sub-steps
+A failed epic verification dispatches a remediation code agent against the folded
+epic tree and re-verifies, reaching the halt sink only after the epic's
+remediation budget is exhausted.
+
+### Boundary Crossings
 
 ```
-✓ 1a (done, commit 2357f941) — composer correct under dependency-seeding. The
-  deferred "live-run check" failed: a dependent slice extending a dep-seeded file
-  false-conflicted because slice branches share no inter-slice ancestry. Fix:
-  commit each slice recording its dependency commits as parents, so the fold's
-  merge-base is the dependency. Regression test added; unfaithful happy-path test
-  corrected. (epic-sandbox-merge.ts file-copy untouched.)
-
-✓ mechanism (commits fadb1b52, 5e1d8d32) — proved + factored the fold so both
-  1b and 1c can use it: foldToCommit (fold N slice commits onto a base, fail-closed,
-  no ref write) + materializeFoldedWorktree (fold + `git worktree add --detach`,
-  rework-safe). Tests pin: 3-way merge of different-hunk edits to one file keeps
-  both; the fold materializes on disk in a verify worktree.
-
-✓ 1c DECISION (2026-06-18): verify against the folded tree (option i). One
-  composition path → the tree verified == the tree shipped; no verify≠ship gap on
-  same-file edits. The worktree-checkout unknown is de-risked by materializeFoldedWorktree.
-
-✓ 1b/1c INTEGRATION (done, commit d92ce38b) — engine wired end-to-end:
-  - net-compiler verify-epic: brownfield uses materializeEpicVerifyTree (commit
-    slices dep-order → fold → detached worktree at __epic__/<epicId>/ → relink
-    node_modules); fold conflict → fail the epic (passed:false report → fail sibling).
-    Greenfield keeps the file-copy union.
-  - cook-cli promotion: brownfield calls harvestCookRun; fold conflicts → fatal run
-    outcome. I135-K preserved (all plumbing).
-  - commitSliceWorktree made idempotent so promotion reuses the commits verify made.
-  - Stale epic-sandbox-merge.ts TODO updated; SPEC I124-K amended (plan.mode fork).
-  - Full orchestrator suite green (672). Single-slice brownfield-smoke exercises the
-    engine plumbing; a *multi-slice* end-to-end engine test is still a gap to add.
-
-○ 1d (remaining) — retire the now-dead promoteBrownfieldRun + BrownfieldPromoteOptions.
-  Blocked on rewriting the landCookBranch test fixture (repoWithPromotedCook uses
-  promoteBrownfieldRun to build a promoted branch — rebuild it via harvestCookRun or
-  a plain commit). mergeSlicesIntoEpicSandbox STAYS (it is the greenfield composer).
+→ epic-verify:<epic>:fail        (report.passed falsy — today's dead-end sibling)
+→ epic-remediate:<epic>:dispatch → epic-remediate:running   (new; mirrors the slice dispatch/running split)
+→ code agent in __epic__/<epic>/ folded worktree (FE-883), fed the verify diagnosis
+→ detect-and-reject guard: post-attempt git diff touches the epic integration test path → discard, count against budget
+→ commit fix into the owning slice branch via idempotent commitSliceWorktree (FE-883)
+→ epic-remediate:<epic>:complete → back to verifyPlace   (re-run verify-epic + slice suites on the folded tree)
+→ epic-retry-budget place: decrement; on exhaustion → epicHaltedPlace (attach-halt-reason, honest cause)
 ```
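
The crossings above amount to a verify, remediate, re-verify loop bounded by a budget. A minimal sketch of that loop follows, under loud assumptions: `runEpicLoop`, `VerifyReport`, and `EpicOutcome` are hypothetical names, the two callbacks are synchronous stand-ins for the verify-epic run and the remediation code agent, and the real orchestrator expresses this as places and transitions in `net-compiler.ts` rather than as a `for` loop.

```typescript
// Hypothetical shapes; only `passed` and the idea of a diagnosis come from the card.
interface VerifyReport {
  passed: boolean;
  diagnosis?: string;
}

type EpicOutcome =
  | { status: "done"; attemptsUsed: number }
  | { status: "halted"; attemptsUsed: number; reason: string };

// Drive verify -> remediate -> re-verify until the epic passes or the
// remediation budget (maxAttempts) is spent.
function runEpicLoop(
  verify: () => VerifyReport,
  remediate: (diagnosis: string) => void,
  maxAttempts: number,
): EpicOutcome {
  let attemptsUsed = 0;
  for (;;) {
    const report = verify();
    if (report.passed) {
      return { status: "done", attemptsUsed };
    }
    const diagnosis = report.diagnosis ?? "no diagnosis";
    if (attemptsUsed >= maxAttempts) {
      // Budget exhausted: halt, carrying the last diagnosis as the reason.
      return {
        status: "halted",
        attemptsUsed,
        reason: `remediation budget exhausted: ${diagnosis}`,
      };
    }
    attemptsUsed += 1;
    remediate(diagnosis); // the code agent is fed the verify diagnosis
  }
}
```

With `maxAttempts = 2`, an epic that never goes green is verified three times and remediated twice before it halts, which matches the "after N failed attempts" wording of the acceptance criteria.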
 
-### Acceptance Criteria (slice-level)
+### Risks and Assumptions
 
 ```
-✓ dep-seed — a dependent slice extending a dep-seeded file folds clean (done, 1a)
-○ brownfield-3way — two brownfield slices editing different hunks of the same
-  pre-existing file both survive promotion (the file-copy union drops one)
-○ brownfield-conflict — a true overlapping-hunk conflict surfaces as a fatal run
-  outcome, not a buried event field
-○ checkout-untouched — promotion still never touches the user's branch / tree /
-  index (I135-K)
-○ greenfield-unchanged — serial-greenfield shared-tree + parallel-greenfield
-  file-copy paths preserved
+- RISK: a remediation agent greens the epic by editing the integration test, not product code
+    → MITIGATION: detect-and-reject (git diff touches the epic test path → discard + budget); dual re-verify (slice suites must also pass)
+- RISK: a fix in the detached folded tree never reaches promotion
+    → MITIGATION: round-trip through commitSliceWorktree onto the owning slice branch so harvestCookRun folds it
+- ASSUMPTION: an epic-level fix can be attributed to one slice's branch (vs a synthetic "integration slice" commit)
+    → VALIDATE: trace harvestCookRun's fold over an added commit on a representative slice → [→ memory/SPEC.md §Assumptions]
+- ASSUMPTION: the slice-loop retry-budget machinery generalizes to the epic lane unchanged
+    → VALIDATE: epic-retry-budget place + dispatch/complete siblings reuse the existing in-net retry pattern
 ```
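
The first risk's two mitigations (detect-and-reject, then dual re-verify) can be sketched as one pure decision function. This is an illustration only: `judgeRemediationAttempt`, `AttemptResult`, and the verdict strings are hypothetical, the changed paths are assumed to come from something like `git diff --name-only` in the folded worktree, and the epic test is assumed to be a single file matched by exact path.

```typescript
// Hypothetical record of one remediation attempt.
interface AttemptResult {
  changedPaths: string[]; // assumed: output of `git diff --name-only`
  epicTestPassed: boolean;
  sliceSuitesPassed: boolean;
}

type AttemptVerdict = "accept" | "reject-test-edited" | "reject-still-failing";

// Normalize separators and a leading "./" so equivalent paths compare equal.
function normalizePath(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "");
}

function judgeRemediationAttempt(
  result: AttemptResult,
  epicTestPath: string,
): AttemptVerdict {
  const guarded = normalizePath(epicTestPath);
  const touchedOracle = result.changedPaths.some(
    (p) => normalizePath(p) === guarded,
  );
  // Oracle integrity comes first: a green run that edited the test is rejected.
  if (touchedOracle) {
    return "reject-test-edited";
  }
  // Dual re-verify: the epic test AND the slice suites must both pass.
  if (!(result.epicTestPassed && result.sliceSuitesPassed)) {
    return "reject-still-failing";
  }
  return "accept";
}
```

Checking the guard before the test results is a deliberate ordering in this sketch: an attempt that edits the oracle is discarded even when everything is green. Both rejection verdicts would count against the budget.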
 
-### Verification Approach
+### Acceptance Criteria
 
 ```
-- Inner: run-artifact.test.ts (done), promote-run.test.ts, epic-sandbox-merge.test.ts
-- Middle: brownfield-smoke.integration.test.ts — seeded repo, overlapping slices
-- Outer: dogfood a multi-slice brownfield cook with an intentional file overlap
+✓ epic-remediation-fires — a falsy verify report routes to epic-remediate, not directly to halt
+✓ re-verify-loop — remediate:complete returns to verifyPlace and re-runs verify-epic
+✓ dual-re-verify — remediation is accepted only if the epic integration test AND the slice suites pass on the folded tree
+✓ budget-exhaustion-halts — after N failed attempts the epic reaches epicHaltedPlace with an honest reason
+✓ oracle-integrity — an attempt that modifies the epic integration test file is rejected and counts against budget
+✓ fix-promotes — a remediation commit is folded by harvestCookRun (the fix survives into the promoted artifact)
+✓ run-59100820-closes — replaying the example run, the route-integration epic self-heals within budget (outer)
 ```
 
----
+### Verification Approach (oracle strategy)
+
+```
+- Inner:
+  · topology golden/adapter — :fail routes → epic-remediate → verifyPlace; budget decrement; exhaustion → halt
+  · negative-space test-path guard — post-attempt git diff touching the epic test path → reject + budget
+  · engine contract suite stays green (runtime equivalence on the unchanged paths)
+- Middle:
+  · scripted-agent integration (model-based) over the SYNTHETIC broken-then-fixable epic fixture:
+      (fail → edit product code → pass) reaches `done`; (fail → edit test) is rejected
+  · dual re-verify (invariant) — epic integration test + slice suites both green on the folded tree
+  · promotion round-trip (differential) — the remediation commit appears in the harvested tree
+- Outer:
+  · real-agent dogfood replay of run 59100820 — epic self-heals unattended (one-shot confidence, human-observed)
+```
 
-## Slice 2 — worktree + branch GC / lifecycle (light) — `done`
+### Acknowledged blind spots
 
-Branch `ka/fe-883-worktree-gc` (stacked on FE-883). `gcCookRun` (run-refs.ts,
-commit bf43477f) reclaims the run's worktrees (run + nested slice/__epic__,
-deepest-first) + the intermediate `brunch/slice/<runId>/*` branches, keeping the
-`brunch/run/<runId>` artifact branch and every other run untouched; realpath-safe
-(macOS /var→/private/var). Wired into cook-cli: auto-GC on a **completed +
-promoted** brownfield run, best-effort (never fails a good run); halted/conflicted
-runs return earlier and keep their worktrees for inspection (keep-on-failure).
-Decision: auto-GC (no flag) — "no leaks by default". Tests: run-refs.test.ts
-(reclaim + unrelated-run-untouched). Gap: no end-to-end runCook test exercises the
-auto-GC call (same gap as the promotion wiring).
+```
+- LLM remediation COMPETENCE is not oracle-able — only loop mechanics are. Mitigation: budget + honest halt.
+    Revisit: dogfood shows low fix-rate.
+- detect-and-reject guards only the EPIC test path; an agent could weaken a SLICE test instead.
+    Mitigation: dual re-verify (slice suites must pass). Revisit: a remediation greens by editing a slice test → freeze all *.test.* under the epic.
+- a flaky epic test (the original ETIMEDOUT) misread as a logic fail → deferred to Slice B.
+- wall-clock cost of extra agent round-trips — no time budget gate. Accept for now.
+```
+
+---
+
+## Slice B — infra/timeout classification at the epic verdict — `done` (2026-06-18)
+
+All 4 acceptance criteria met; gate green (check 0 errors, build ✓, full suite
+2110 pass / 2 skip; the lone `build-boundary` failure is the pre-existing
+dev-worktree symlink artifact). **Correctness finding:** the prior verify
+subprocess timeout was `60_000`, and `spawnSync` timeout surfaces as
+`error.code === 'ETIMEDOUT'` — but only `ENOENT` was classified infra, so a
+**timeout was misclassified as `test`** and (with Slice A) would have wrongly
+fed the remediation code agent a non-bug. Fixed: `ETIMEDOUT → infra`
+(`isInfraSpawnError`) + raise `VERIFY_TIMEOUT_MS` to `180_000` (npx + code-split
+warmup, ~25s observed). Distinct from FE-864's pi *session* deadline. Infra
+re-verify is counted by a separate `Token.infraRetryCount` /
+`RunPolicy.maxInfraRetries` (defaults to `maxRetries`), so blips don't consume
+remediation attempts.
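
The classification fix described above can be illustrated with a small pure function. The card names `isInfraSpawnError` but does not show its signature, so the sketch below uses a different, hypothetical name (`classifyVerifyFailure`) and a hypothetical `SpawnOutcome` shape. Only the two error codes and the `infra` / `test` labels come from the card.

```typescript
type FailureKind = "infra" | "test";

// Hypothetical summary of a finished verify subprocess.
interface SpawnOutcome {
  errorCode?: string; // assumed: the `code` on the spawn error, if any
  exitStatus: number | null; // null when the process did not exit normally
}

// Before the fix only ENOENT was in this list, so a timeout fell through to "test".
const INFRA_ERROR_CODES: readonly string[] = ["ENOENT", "ETIMEDOUT"];

// Returns null when the run succeeded, otherwise the kind of failure.
function classifyVerifyFailure(outcome: SpawnOutcome): FailureKind | null {
  if (
    outcome.errorCode !== undefined &&
    INFRA_ERROR_CODES.indexOf(outcome.errorCode) !== -1
  ) {
    return "infra"; // toolchain blip: nothing for a code agent to fix
  }
  if (outcome.exitStatus === 0) {
    return null;
  }
  return "test"; // a real test or logic failure
}
```

Keeping the infra codes in one list makes the misclassification visible: dropping `"ETIMEDOUT"` from it reproduces the old behaviour, where a timeout is reported as `test`.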
+
+Light-ish card (adds a small topology arm inside A's now-settled verify-epic seam).
+
+**Objective:** at the epic verdict, route on `failureKind` (already computed by
+`runVerification`, FE-872) so an infra/timeout failure is retried as a toolchain
+blip, not fed to the remediation code agent or silently halted.
+
+**Design decisions:**
+- Split the verify-epic fail-sibling by `failureKind`: `infra` → a bounded
+  **infra-retry** chain (re-dispatch verify; **no** code agent — nothing for an
+  agent to fix) → `verifyPlace`; exhaustion → `epicHaltedPlace` with an honest
+  *infra* reason. `test`/logic → the Slice-A remediation loop (unchanged).
+- A **separate `epic-infra-budget`** distinct from A's `epic-retry-budget`, so a
+  toolchain blip doesn't consume remediation attempts (and vice versa).
+- **Timeout sizing:** size the verify subprocess (`spawnSync`) timeout to the
+  target's real cost so `npx` resolution + code-split warmup doesn't spuriously
+  `ETIMEDOUT` (the `graph-route-wiring` test alone ran 25s). Coordinate with
+  FE-864's pi-timeout work; do not regress it.
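
The first two design decisions (split on `failureKind`, with a separate budget per arm) can be sketched as a single routing function. All names here are hypothetical stand-ins: `routeFailedVerdict`, `EpicBudgets`, and `NextStep` are not taken from the codebase, and the two counters only approximate the `epic-infra-budget` and `epic-retry-budget` places.

```typescript
type VerdictKind = "infra" | "test";

// Two independent counters, one per arm of the fail-sibling.
interface EpicBudgets {
  infraRetriesLeft: number;
  remediationAttemptsLeft: number;
}

type NextStep =
  | { action: "re-verify"; budgets: EpicBudgets }
  | { action: "remediate"; budgets: EpicBudgets }
  | { action: "halt"; reason: string };

function routeFailedVerdict(kind: VerdictKind, budgets: EpicBudgets): NextStep {
  if (kind === "infra") {
    if (budgets.infraRetriesLeft <= 0) {
      return { action: "halt", reason: "infra retries exhausted" };
    }
    // Re-dispatch verify only; no code agent is involved.
    return {
      action: "re-verify",
      budgets: { ...budgets, infraRetriesLeft: budgets.infraRetriesLeft - 1 },
    };
  }
  if (budgets.remediationAttemptsLeft <= 0) {
    return { action: "halt", reason: "remediation attempts exhausted" };
  }
  return {
    action: "remediate",
    budgets: {
      ...budgets,
      remediationAttemptsLeft: budgets.remediationAttemptsLeft - 1,
    },
  };
}
```

Because each arm decrements only its own counter, an infra blip leaves `remediationAttemptsLeft` untouched and a logic failure leaves `infraRetriesLeft` untouched. The halt reason also names the arm that ran out.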
+
+**Acceptance:**
+```
+✓ infra-retries — an infra/timeout verdict re-runs verify (bounded), not the code agent
+✓ infra-exhaustion-halts-honestly — exhausted infra retries halt with an infra reason (not "tests failed"/"remediation attempts")
+✓ logic-still-remediates — a test/logic failure still routes to the Slice-A remediation loop
+✓ timeout-sized — the verify subprocess timeout accommodates code-split warmup (ETIMEDOUT-class regression)
+```
 
-## Slice 3 — per-slice build-cache write isolation (candidate)
+**Verification:** topology goldens (fail-sibling splits on failureKind; infra-retry
+chain + budget; exhaustion→halt reason); engine-contract green; e2e scenario where
+verify returns `failureKind:'infra'` once then passes (retries, not remediated).
 
-May instead be an FE-879 follow-on (FE-879 owns `SHAREABLE_TOP_LEVEL_ENTRIES`).
-Decide ownership before scoping.
+Independent of A's logic path; lands on the same branch.
 
-## Out of scope (noted)
+## Slice C — partial promotion / salvage — deferred (not pre-carded)
 
-- Sync `git worktree add` serialization (`epic-sandbox-merge.ts:288`) — perf, not
-  correctness; FE-879 laziness already bounds worktree count.
+Extend `harvestCookRun` to promote passing epics and hand back the folded
+worktree + the failing epic's diagnosis instead of `nothing promoted`. Shape
+depends on A's commit-round-trip topology and FE-883's GC ref-set, so do **not**
+pre-card it until A lands.