diff --git a/memory/PLAN.md b/memory/PLAN.md index 80731af69..3d4a72964 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -19,6 +19,8 @@ The next product arc is the **Conversational Workspace Runtime** umbrella (`docs The **orchestrator / Petri-net execution substrate** is committed (2026-05-21) to Petri as the forward execution model, justified by parallelism, simulation, and resume value claims. Phases 0–2 are done: the dual-engine PoC (Phase 0, FE-730) validated the substrate and extracted the compiler/interpreter; Phase 1 (FE-738) added two-lane mechanical+semantic subnets, the compiler topology/wiring split, and §7 event vocabulary; Phase 2 (FE-743) added parallel firing policy with greedy token claiming, shared resource pool tokens bounding global concurrency, and worktree-per-slice isolation — the decision gate passed (parallel measurably beats serial on wall clock). Phase-3-prep `petri-declarative-routing` (FE-747) is done: typed Guard predicates on `HandlerDescriptor` plus `enumerateCandidateOutputs` make topology-only enumeration of reachable output places possible (I125-K). Phase 3 (graph compilation) remains blocked on `intent-graph-semantics` (FE-700) for relation-policy gates; Phase 4 (simulation oracle) now has its routing-side structural prerequisite satisfied but still needs Phase 3 for graph-derived gates. The north-star design is `docs/next/architecture/plan-graph-petri-orchestration.md`. +The orchestrator's forward direction is framed as two arcs toward a **full (autonomous) cook orchestrator** — "completed spec → feature built and glued into a real brownfield repo, no manual steps." **Arc 1 (feature delivery)** stacks on FE-843 and ships standalone without the semantic stack. 
`agent-extension-host` (the dual-mode pi-harness contract) **bases the Arc-1 linear stack** (2026-06-15 decision) — every Arc-1 frontier sits on it — followed by `brunch-detect` (resolve a registry profile id from repo manifest/lockfile evidence at plan time) → `harness-dep-install` (capture the dependency-delta for promotion + classify install/infra failures distinctly from test failures; the install action itself is agent-native) → `app-runtime-probe` (build + boot + exercise the host app — the concrete reachability mechanism) → `integration-oracle` (wire into host + product reachability, via the probe) → `brownfield-promotion` (glue back into the checkout) → `brunch-ship` (one-shot wrapper). A `dogfood-spike` (ln-spike) — run the full chain on one real brunch feature — should precede committing `integration-oracle`, to surface the reachability mechanism, dep-install, orientation depth, and brownfield plan-shape risks cheaply. CLI surface: the real commands are `brunch plan`, `brunch cook`, and `brunch serve` (the one-shot capstone, FE-878). The kitchen-brigade names (prep/recipe/taste/plate) are **phase labels, not commands** — detect runs inside `plan`; probe + oracle (verify) and promotion (plate) run inside `cook`/`serve`. Frontier ids stay descriptive; `serve` chains the phases end-to-end. The settled grounding decision is **cook-time** (planning stays host-blind per D160-K; the cook agent resolves real paths/wiring by reading the worktree), which softens FE-829's `writes` ownership to *advisory in brownfield only* — greenfield keeps it authoritative. Protecting invariant: **brownfield generalization must not change greenfield-mode behavior; shared contracts fork on `plan.mode`** (the 3 reference fixtures + a greenfield smoke must score identically before/after each frontier). 
**Arc 2 (full orchestrator)** is an autonomy ladder gated behind the parked semantic/Petri-Phase-3/4 substrate: `interactive-recovery` (halt → coherent question answered in a secondary chat, resumes the run) → `intent-conformance-oracle` (independent behavioral-kernel verification, requisite variety) → `adaptive-replan` (architect amends the plan from execution feedback, recompile + resume). Each rung raises the autonomy ceiling and is independently shippable. Non-additive work (refactors/migrations/debugging) is explicitly a separate `transformation-orchestrator` product line, not folded into either arc. The cook-time grounding decision, the D160-K `writes`-advisory amendment, and the greenfield-protecting invariant need recording in SPEC via ln-sync when the first Arc-1 frontier is scoped. **Agent-host coordination:** the pi harness is a dual-mode (`elicit`/`execute`) agent-extension host (`agent-extension-host`) — cook capabilities are `execute`-mode plugins on a shared, mode-neutral core; this contract is the serialization point with the unpublished pi-harness thread (which owns the core), validated against the existing interview as the `elicit` witness. It logically gates only the dispatch-seam frontiers (`integration-oracle`, Arc-2 `interactive-recovery`/`adaptive-replan`), but is sequenced at the **base of the Arc-1 linear stack** (2026-06-15 decision) — so the whole arc lands on it, deliberately serializing the cook stack behind the pi-harness-thread coordination rather than running the seam-independent infra (`brunch-detect`, `harness-dep-install`, `app-runtime-probe`, `brownfield-promotion`) in parallel ahead of it. + The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agent-mutation design notes are reconciled into one direction. `docs/design/MULTI_CHAT.md` is the substrate document. `docs/design/SIDE_CHAT.md` describes side-chat V1 / V2 / V3.0 / V3.1 / V4 phasing on top of that substrate. 
`docs/design/PATCH_LEDGER.md` remains historical deeper design pressure for semantic mutation history, but canonical future-facing vocabulary is `changeset` / `change`. The product-layer ontology trajectory is split out as `docs/design/INTENT_GRAPH_SEMANTICS.md` and `docs/design/BEHAVIORAL_KERNELS.md`; broader synthesis lives in `docs/archive/design/INTENT_SPEC_EVOLUTION.md`. FE-705's branch-local strategy/proposal notes add scenario options, graph-review oracle, chat-local strategies, and concern/dependency mapping; those notes should become a canonical design doc when the branch is integrated. Coordination uses a substrate-strangler posture: keep existing frontend REST/SSE contracts stable while route adapters and capability adapters converge on shared server-owned handlers, then cut over UI flows only after parity and changeset-backed authority exist. The dev-layer self-tooling trajectory lives in `docs/design/ln-skills/EVOLUTION.md`. ## Sequencing @@ -45,10 +47,24 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen #### Follow-ons surfaced by the 2026-05-26 cook-codebase-mode smoke - ~~**pi-actions evaluate-done collapses the TDD workflow**~~ — **resolved by `cook-harness-fidelity` (FE-813)**: Slice 1 (`d2139d8c`) scoped the evaluator to read-only tools so it cannot fix code during evaluation; Slice 2 (`fcba8ab3`) replaced the LLM verdict with executing the verification targets. -- **cook output promotion (follow-on)** — slice 3 creates real slice branches (`cook-slice//`) but never commits; `cook/` HEAD === source HEAD with modifications in untracked subdirs, so there is no promotion path into the user's checkout. To close: commit slice work, `git merge` slice→epic→`cook/`, then `git merge cook/` from the working branch. Pairs with worktree/branch GC. Quality-of-life; the run worktree is already inspectable by hand. 
+- **cook output promotion (follow-on)** — slice 3 creates real slice branches (`cook-slice//`) but never commits; `cook/` HEAD === source HEAD with modifications in untracked subdirs, so there is no promotion path into the user's checkout. To close: commit slice work, `git merge` slice→epic→`cook/`, then `git merge cook/` from the working branch. Pairs with worktree/branch GC. Quality-of-life; the run worktree is already inspectable by hand. **Resolved by `brownfield-promotion` (FE-877).** +- ~~**per-slice worktree over-copy + eager seeding (optimization)**~~ — **resolved by `cook-worktree-laziness` (FE-879)**: slice worktrees materialize lazily at fire time (idempotent `ensureSliceWorktree`) instead of all-up-front in `wireHandlers`, and each slice symlinks `node_modules` to the parent's single copy instead of CoW-copying per slice. Closes acceptance (8) "over-copy accepted as a known follow-on optimization" + the sandcastle trigger (b) "native epic-merge over-copy becomes a measurable performance bottleneck." ### Next +**Full cook orchestrator — Arc 1 (feature delivery; stacks on FE-843, ships without the semantic stack):** + +1. `agent-extension-host` — **(contract landed — FE-867)** the pi harness as a dual-mode (`elicit`/`execute`) extension host; cook capabilities register as `execute`-mode plugins. **Bases the Arc-1 linear stack** (2026-06-15 decision): the whole arc stacks on it, coordinated with the unpublished pi-harness thread (which owns the core). Logically it only gates the dispatch-seam frontier (`integration-oracle`), so serializing the seam-independent infra (2–5) behind it is a deliberate coupling of Arc 1 to that coordination, not a hard dependency. Sits over the FE-841 core. +2. `brunch-detect` — **(done — FE-871)** resolve a registry profile id from manifest/lockfile evidence at plan time; brownfield-only front of the chain, now wired into the emitter (slice 2). *(seam-independent)* +3. 
`harness-dep-install` — **(acceptance 1–2 landed except brownfield — FE-872)** dependency-delta capture + install-failure classification (the install *action* is agent-native via `bash` + FE-843 conventions; this owns lockfile capture for promotion + the fail/infra split). Done: classify + infra-aware halt reason + greenfield manifest/lockfile capture pinned. Remaining: brownfield dep-delta capture — **blocked on `brownfield-promotion`** (#7). +4. `dogfood-spike` (ln-spike) — **(done — 2026-06-16)** ran a real brownfield cook (hand-authored 2-slice plan: feature + wiring, `node:http` app) against a throwaway git repo. **Verdict:** chain works end-to-end (CoW worktree, clean-tree gate, per-slice→`__epic__` merge composed the wiring, TDD red/green, working branch untouched); the agent wired the feature reachable and **self-authored a genuine boot-and-probe** integration test (imports the real entry, `listen(0)`, `http.get('/health')`, asserts not-404). Orphan did **not** reproduce — but reachability was **agent-discretion, not enforced** → confirms the *value* of `integration-oracle`/`app-runtime-probe` (independent, unshortcuttable reachability). Two refinements surfaced: the probe should own the boot mechanism (the agent had to invent a `.js→.ts` resolve hook), and dep-install was unexercised (zero-dep app). Bonus: the `Cannot find module` TDD red was handled as a test-red (not infra) — validates FE-872 slice 1 live. +5. `app-runtime-probe` — **(slices 1–2 landed — FE-875, `runProbe` + `buildProbeSpec`)** build + boot + exercise the host app; the concrete reachability mechanism `integration-oracle` depends on (without it, "reachable" collapses back to "a test that imports the module"). Slice 1: boot + HTTP probe + reachable/not-reachable/infra classification + teardown. 
Slice 2: harness-owned `ProbeSpec` resolution — `buildProbeSpec(ProbeTarget)` allocates a free ephemeral port and assembles ready/feature URLs from boot-argv + *paths*, so a hardcoded port can't collide under parallel cook (the boot test's hand-rolled port dance is now the production primitive it dogfoods). Stays off the dispatch seam: argv + paths are inputs cook-time grounding will supply; the harness owns only the port pick + URL/env assembly (loopback-only; best-effort ephemeral port with an acknowledged TOCTOU window, no retry framework). Every probe HTTP call (readiness poll + feature request) carries a per-call `AbortSignal.timeout` so a server that accepts a connection but never responds can't hang the probe (and the cook) past the deadline; timeouts are overridable for tests. Remaining: mode-awareness, integration-oracle gating (where the `ProbeTarget` argv/paths come from = `integration-oracle` #6). +6. `integration-oracle` — **(Half A + Half B seam landed — FE-876)** oracle asserts product reachability via `app-runtime-probe`. Half A (off-seam): `Epic.probe?: ProbeTarget` folds a `runProbe` result into the `verify-epic` verdict — after slices merge into `__epic__//`, the epic is `done` only when tests pass **and** the feature is reachable; `not-reachable` is the FE-800 orphan, `infra` is a harness fault. Probe gated behind tests passing (never boot a known-broken build); absent → unchanged unit verdict; reachability rides the existing `report.passed` routing. Half B seam: host-blind `Epic.reachability?: ReachabilityIntent` (architect-emittable, D160-K) + an injectable `ProbeGrounder` (`createPiActions({ groundProbe })`) that cook-time-resolves intent → concrete `ProbeTarget` by reading the worktree; `verify-epic` resolves via `probe ?? ground(reachability)`, a grounder that throws is an `infra` fault (visible, not a silent pass), intent without a grounder is an inert no-op. 
**Remaining (dispatch seam, lands atomically with the pi-harness contract):** the production `ProbeGrounder` (an `execute`-mode agent that reads the worktree) + architect emission of `reachability` intent — deferred together so intent is enforced the moment it's emitted (avoids perturbing the 3 reference fixtures). Runs in the FE-738 semantic lane. Promotes FE-800's integration-blind follow-on to a frontier. *(grounder impl depends on `agent-extension-host`)* +7. `brownfield-promotion` — **(landed — FE-877, `promoteBrownfieldRun`)** commit a completed brownfield cook result onto the repo's own `cook/` branch as one reviewable commit; extends FE-827's greenfield promotion to brownfield and closes the cook-codebase-mode follow-on (the result no longer sits uncommitted in the worktree). Git plumbing only (`commit-tree` + CAS `update-ref`, parent = the existing `cook/` base, throwaway index + external work-tree), so the user's active branch, working tree, and index are never touched; gitignored deps don't land. Reuses `promotionSourceDir` to compose the tree across slice layouts. Auto-runs on a completed brownfield cook (no `--out` needed); merging into the working branch stays the **user's** call. Unblocks FE-872's brownfield dep-delta capture. +8. `brunch-ship` — **(landed — FE-878, `brunch serve`)** one-shot `brunch serve ` = `plan ` then `cook --spec=` (cook reads the plan just emitted), no manual steps. Pure glue, no new orchestration: serve's `--out` is the *promote* target → cook (brownfield auto-promotes via FE-877 regardless), `--profile` stamps the plan, petrinaut/policy/retry flags forward to cook, `--verbose` to both; a failed plan short-circuits (nothing cooked). Testable units `parseServeArgs` + `runServe` (stages injected); db/snapshot wiring stays in `cli.ts`. 
Cook's `dir` is threaded from the resolved launch cwd (the dir the plan was written to) — `runCook` reads `opts.dir` raw, so serve must supply it rather than rely on the `parseCookArgs`-only default (R46). **Closes Arc 1.** + +**Runtime umbrella + semantic substrate:** + 1. `intent-graph-semantics` — highest-coordination semantic substrate after FE-705 reconciliation. 4. `changeset-ledger` — Track 4 of the runtime umbrella; parallel with Track 2; semantic history spine needed before canonical proposal acceptance, direct-edit atomicity, and productized scenario options. 5. `chat-context-provision` — Track 5 of the runtime umbrella recast as transcript-first context; can proceed against chat/turn once secondary-chat entry/anchor shape is settled. @@ -64,6 +80,13 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen ### Horizon +**Full cook orchestrator — Arc 2 (full orchestrator; autonomy ladder, gated behind the semantic/Petri-Phase-3/4 substrate):** + +- `interactive-recovery` — keystone safety rung: on rework-budget exhaustion or irreducible oracle ambiguity, synthesize a question into a `qa`/`strategy` secondary chat; the answer resumes the run. Depends on chat runtime (FE-716, done) + run resume (Petri Phase 4) + `changeset-ledger` (FE-701). Do first — makes the orchestrator safe to run unattended before re-plan/intent-verification are perfect. +- `intent-conformance-oracle` — independent behavioral-kernel verification (requisite variety) separate from self-authored tests. Depends on `intent-graph-semantics` (FE-700) + `BEHAVIORAL_KERNELS.md`; reuses the `graph-review` rubric. +- `adaptive-replan` — architect amends the plan from execution feedback; recompile the affected sub-net + resume. Depends on Petri Phase 3 (`petri-graph-compilation`) + Phase 4 (`petri-simulation-oracle`) + FE-738's deferred stale-graph criterion. The latent `architect-generator-loop`; highest cost, last rung. 
+- `transformation-orchestrator` — separate product line for non-additive work (refactors, migrations, cross-cutting renames, debugging): transformation-shaped intent (transform existing→existing, behavior-preserving, test-guarded), not `requirement → additive slice`. Do not fold into Arc 1/2. + - `petri-graph-compilation` — compile Petri nets from workspace plan-graph + relation policy; depends on `intent-graph-semantics` (FE-700). Extends the existing FE-700 relation-policy registry. - `petri-simulation-oracle` — reachability analysis, deadlock detection, resume from durable markings. Planning oracle for plan-shape defects. Depends on `petri-graph-compilation`. - `parallel-merge-conflict-reconciliation` — LLM-assisted reconciliation of real content collisions in the parallel-greenfield whole-plan merge (two slices, same path, different content), replacing deterministic order-wins. Must be gated: LLM proposes → mandatory post-merge whole-plan verify (tests are the oracle) → repair or refuse. Reintroduces non-determinism at the assembly point, so it fights the FE-813 harness-fidelity direction (D161-K) and needs the verify gate to be trustworthy. Depends on `cook-greenfield-single-tree` (FE-827) whole-plan merge + a whole-plan verify step. @@ -171,11 +194,24 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen - **Spike on 2026-05-26 confirmed the hybrid path is technically viable.** `@ai-hero/sandcastle` (v0.5.12) exposes `createWorktree({ branchStrategy: 'merge-to-head' })` decoupled from agent invocation, exports a built-in `pi` agent provider, and supports `noSandbox()` (no Docker requirement). The hybrid v2 path (sandcastle worktree + sandcastle pi provider) would eliminate brunch's `pi-actions.ts spawnSync` boilerplate and retire `epic-sandbox-merge.ts`'s file-copy over-copy problem via git branch-merge. 
- **Why deferred now:** Too many integration issues at this stage — sandcastle is pre-1.0 (v0.5.12), pulls in Effect/effect-platform as runtime deps (~300KB), would require migrating brunch's Petri orchestrator to compose with sandcastle's worktree lifecycle, and locks in sandcastle's branch-naming + close-merge semantics. Premature adoption risks coupling brunch to an evolving upstream API before brunch's own brownfield needs are settled. - **Triggering criteria to revisit:** (a) sandcastle ships 1.0 with stable API; OR (b) brunch's native epic-merge over-copy becomes a measurable performance bottleneck; OR (c) brunch needs container-isolation paths (Docker/Vercel) for security or remote-execution reasons; OR (d) Effect-based runtime dependency becomes attractive for unrelated reasons. None of these are true today. -- **Acceptance:** (1) `brunch cook ` with `/.brunch/cook/plan.yaml` no longer exits with "not yet implemented." (2) Top-level sandbox worktree initialized via `git worktree add` of cwd repo on branch `cook/`. (3) Per-slice worktrees branch off the run-level branch. (4) Slices execute against pre-populated worktrees; `pi-actions.ts` unchanged — pi-tools operate on existing code. (5) Source branch in `` is byte-identical before and after a cook run (success or failure). (6) Cook runs leave a discoverable artifact (the `cook/` branch) for the user to review or discard. (7) Greenfield fixture-mode behavior is unchanged (empty worktree, generate-from-scratch); only the run output path moves from `/.cook/runs/` to `/.brunch/cook/runs/` per the SPEC §D50 / §A49 consolidation. All affected tests and fixture paths are updated. (8) `epic-sandbox-merge.ts` continues to work — over-copy accepted as a known follow-on optimization, flagged in code comments. +- **Acceptance:** (1) `brunch cook ` with `/.brunch/cook/plan.yaml` no longer exits with "not yet implemented." (2) Top-level sandbox worktree initialized via `git worktree add` of cwd repo on branch `cook/`. 
(3) Per-slice worktrees branch off the run-level branch. (4) Slices execute against pre-populated worktrees; `pi-actions.ts` unchanged — pi-tools operate on existing code. (5) Source branch in `` is byte-identical before and after a cook run (success or failure). (6) Cook runs leave a discoverable artifact (the `cook/` branch) for the user to review or discard. (7) Greenfield fixture-mode behavior is unchanged (empty worktree, generate-from-scratch); only the run output path moves from `/.cook/runs/` to `/.brunch/cook/runs/` per the SPEC §D50 / §A49 consolidation. All affected tests and fixture paths are updated. (8) `epic-sandbox-merge.ts` continues to work — over-copy accepted as a known follow-on optimization, flagged in code comments. **(Optimization later closed by `cook-worktree-laziness` / FE-879.)** - **Verification:** `brownfield-smoke.integration.test.ts` constructs a seeded git repo in tmpdir at test setup (NOT committed under `fixtures/` — nested `.git/` creates submodule weirdness), authors a `.brunch/cook/plan.yaml` carrying one slice that modifies an existing file, runs engine.run with fake actions, asserts (a) source branch unchanged, (b) modification landed in the slice worktree, (c) parent worktree is on `cook/`. CLI unit tests pin `resolveCookMode` + clean-tree gate. `worktree.test.ts` + `epic-sandbox-merge.test.ts` pin the codebase-mode seam components. Existing greenfield tests untouched. - **Traceability:** SPEC §D50 (reserved codebase-mode resolver); §A49 (worktree isolation at `/.brunch/cook/runs//worktree/`); Requirement 49. - **Design docs:** SPEC §D50 + §A49; `docs/next/architecture/plan-graph-petri-orchestration.md` (worktree section). 
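The over-copy optimization noted under acceptance (8) was later closed by making slice provisioning lazy and idempotent. A minimal sketch of that idea follows, under stated assumptions: only the name `ensureSliceWorktree` comes from this plan; `makeEnsureSliceWorktree`, the injected `create` callback, the in-memory record, and the `/tmp/cook-run` path are hypothetical stand-ins so the example runs without git.

```typescript
// Hypothetical sketch: only the name `ensureSliceWorktree` comes from the plan.
// `create` is injected so the example runs without git; the plan describes the
// real provisioning as a synchronous `git worktree add`.
type CreateWorktree = (sliceId: string) => string;

function makeEnsureSliceWorktree(create: CreateWorktree): (sliceId: string) => string {
  // Assumption: an in-memory record stands in for whatever the real code checks.
  const provisioned: { [sliceId: string]: string } = Object.create(null);
  return (sliceId: string): string => {
    const existing = provisioned[sliceId];
    if (existing !== undefined) {
      return existing; // a rework re-fire is a no-op
    }
    const dir = create(sliceId); // first fire pays for exactly one worktree
    provisioned[sliceId] = dir;
    return dir;
  };
}

// Usage: two slices fire (one of them twice), so only two worktrees are created.
const createdFor: string[] = [];
const ensureSliceWorktree = makeEnsureSliceWorktree((sliceId) => {
  createdFor.push(sliceId);
  return "/tmp/cook-run/slices/" + sliceId; // illustrative path only
});

const firstFire = ensureSliceWorktree("slice-2");
const reworkFire = ensureSliceWorktree("slice-2");
const otherFire = ensureSliceWorktree("slice-5");
```

Because the callback is synchronous, concurrent fires would serialize on the JS thread in this sketch, which is the property the plan attributes to the `execFileSync`-based provisioning.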
+### cook-worktree-laziness + +- **Name:** Cook worktree laziness — lazy per-slice provisioning + shared `node_modules` for brownfield cook +- **Linear:** FE-879 +- **Kind:** hardening (refinement on `cook-codebase-mode`) +- **Status:** done — branch-complete on `ka/fe-879-lazy-cook-worktrees` (PR #223, stacked on FE-864) +- **Objective:** Stop brownfield cook paying an eager, all-slices startup tax. In codebase mode `wireHandlers` provisioned every slice's git worktree up front (N × `git worktree add` + N recursive `node_modules` CoW copies) before any slice fired — for an 8-slice plan, 9 worktrees and 9 `node_modules` recursions, the copy dominating wall-clock. Make provisioning lazy and the dependency tree shared, without changing what cook produces. +- **What landed:** (1) slice-worktree creation moved out of the eager `wireHandlers` loop into `resolveSliceCwd`, materialized on first fire via idempotent `ensureSliceWorktree` — a run touching 2 of 8 slices pays for 2 worktrees, not 8; rework re-fires are no-ops; synchronous (`execFileSync`) provisioning serializes concurrent fires on the JS thread, so parallel-policy worktree adds never overlap. (2) each slice symlinks `node_modules` to the parent worktree's single copy instead of CoW-copying per slice (`SHAREABLE_TOP_LEVEL_ENTRIES` in `cow-copy.ts`); `walkFiles` already skips symlinks, so the shared tree is never re-walked during dependency seeding, merge, or promotion. Other gitignored dirs (`dist/`) still copy per slice. +- **Acceptance:** (1) ✅ codebase-mode slices provisioned lazily at fire time, not eagerly in `wireHandlers`. (2) ✅ `ensureSliceWorktree` idempotent across reworks. (3) ✅ slice `node_modules` is a symlink to the parent, not a copy; other gitignored content still copies. (4) ✅ correctness-neutral — same worktrees/branches, deps resolve through the symlink; brownfield smoke + engine-contract suites unchanged. 
+- **Risk:** build caches under `node_modules/` (`.cache`, `.vite`) become shared across parallel slices — acceptable for cook's transient runs; revisit if a toolchain needs per-slice write isolation (documented at the call site). +- **Verification:** `npm run verify` green; new `epic-sandbox-merge.test.ts` cases pin the `node_modules` symlink, per-slice copy of other content, and `ensureSliceWorktree` idempotency; brownfield integration smoke unchanged. +- **Traceability:** closes `cook-codebase-mode` acceptance (8) over-copy optimization + sandcastle trigger (b). Refinement on `cook-codebase-mode`; stacked on `orchestrator-enhancements` (FE-864) — closer to the cook engine it touches, and independent of the brownfield-promotion/serve work above it. + ### cook-harness-fidelity - **Name:** Cook harness fidelity — a trustworthy per-slice completion signal @@ -364,6 +400,151 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen - **Traceability:** Requirements 46–50; A98, D160-K, D164-K (pattern), D167-K; refines I130-K (resolved profile persisted; strict-on-unknown). New assumption on build: agent-side install suffices for node profiles. Refinement on `plan-build-architect` (FE-829). - **Design docs:** `docs/design/orchestrator.md`; SPEC §Future Direction Cook plan generation. +### agent-extension-host + +- **Name:** Agent extension host — dual-mode (`elicit`/`execute`) pi-harness contract +- **Linear:** FE-867 · branch `ka/fe-867-agent-extension-host` (under FE-864) · coordinated with the unpublished pi-harness thread +- **Kind:** structural (shared contract / serialization point) +- **Status:** in-progress (2026-06-15) — slice 1 landed (PR #213): contract-first, zero-runtime-migration. 
`src/agent-extension-host.ts` defines the mode-neutral core contract (mode / capability / plugin / consumer-witness, metadata-only, no imports, no `execute`-only concept); `src/agent-extension-host.test.ts` proves it against both real consumers — cook (`createPiActions()` action ids) and the interview (`createExplorationTools` + a type-enforced coverage check over `keyof InterviewerTools`) — as the `elicit` witness, without migrating the interview runtime (it keeps the Vercel AI SDK). The contract both this work and the unpublished pi-harness thread target; the pi thread owns the core *runtime* implementation. Deferred to later slices (only when a real driver lands): a runtime host/dispatch, a pi adapter for cook, growing `src/agent-extension-host/` as a private sub-tree. +- **Objective:** Categorize the pi harness as a **dual-mode agent-extension host**: a mode-agnostic **core** (session lifecycle [FE-841 in-process pi], dispatch interface, tool-scoping [FE-813 `toolsForAction`], confinement [FE-853], cwd/env/model policy) + two **modes** — `elicit` (drives specification: interviewer, observer, LLM-as-user probe [FE-705]) and `execute` (drives cook: test-writer, code-writer, evaluate-done, verify-epic, + new wiring / recovery-question / replan agents) — + **shared plugins** (context provision, tool adapters, dispatch-recipe format, model policy). Modes differ only by which plugins they load; capabilities register against the core via a stable plugin/dispatch contract. New cook capabilities are `execute`-mode plugins, never bespoke `pi` calls. +- **Why now / unlocks:** The pi harness is reused across spec elicitation and cook execution; without a shared host the modes duplicate dispatch/confinement/tool-scoping and the cook frontiers hardcode `pi`. This is the serialization point with the unpublished pi-harness thread — targeting the contract, not `pi`, keeps the dispatch-seam frontiers decoupled from that thread's rewrite. 
+- **Abstracted-enough bar (acceptance):** (1) **mode-neutral core** — the core module carries no `execute`-only concept (no worktree/slice/test-runner/`plan.yaml` types); checkable assertion. (2) **two-consumer proof** — the core is validated against ≥2 real consumers: cook (`execute`) and the **existing interview** as the `elicit` witness; if the interview can't sit on the core, it isn't neutral. (3) **open plugin seam** — capabilities register per mode; `elicit`-mode plugins are explicitly out of scope here and their absence does not break the core. (4) **no gold-plating** — the core is no richer than those two consumers justify; primitives serving neither are dropped (no speculative `elicit` features). +- **Out of scope:** the `elicit`-mode plugin implementations (interview internals, FE-705, the unpublished pi-thread work). This frontier owns the core + contract, validated against the interview as a witness — not the `elicit` roadmap. +- **Verification:** mode-neutrality test (core imports no `execute`-only modules); two-consumer compile/dispatch tests (a cook plugin + an existing-interview plugin both run on the core); plugin-registration contract tests; a "no orphan primitive" review gate. +- **Depends on:** FE-841 (in-process pi core), FE-813 (tool-scoping), FE-853 (confinement); coordinated with the unpublished pi-harness thread (owns the core). Sits over the FE-841 core and **bases the Arc-1 linear stack** (2026-06-15 decision) — the whole cook stack lands on it. Logically gates `integration-oracle`, `interactive-recovery`, `adaptive-replan`; the base placement extends that to a stack-order serialization of all of Arc 1 behind the pi-harness-thread coordination. +- **Traceability:** Requirements 46–50; abstract-dispatch-interface coordination note; orchestrator/harness lexicon. +- **Design docs:** `docs/design/orchestrator.md`; `docs/design/AGENT_MUTATION_SURFACE.md`; `docs/design/SUBSTRATE_STRANGLER_COORDINATION.md`. 
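To make the "modes differ only by which plugins they load" idea concrete, here is a rough sketch of a metadata-only registry. It is not the contract in `src/agent-extension-host.ts`, whose actual shapes are not shown in this plan; every type and method name below (`HostMode`, `HostCapability`, `HostPlugin`, `ExtensionRegistry`, `capabilityIds`) is hypothetical, and the capability ids are borrowed from the agent names listed in the Objective purely for illustration.

```typescript
// Hypothetical shapes; the real contract is not reproduced in this plan.
type HostMode = "elicit" | "execute";

interface HostCapability {
  id: string;
  summary: string;
}

interface HostPlugin {
  name: string;
  mode: HostMode;
  capabilities: HostCapability[];
}

// Metadata only: no dispatch, and no worktree/slice/test-runner types,
// so nothing here is an `execute`-only concept.
class ExtensionRegistry {
  private plugins: HostPlugin[] = [];

  register(plugin: HostPlugin): void {
    const duplicate = this.plugins.some((p) => p.name === plugin.name);
    if (duplicate) throw new Error("duplicate plugin: " + plugin.name);
    this.plugins.push(plugin);
  }

  // A mode is just a filter over the registered plugins.
  capabilityIds(mode: HostMode): string[] {
    const ids: string[] = [];
    for (const plugin of this.plugins) {
      if (plugin.mode !== mode) continue;
      for (const capability of plugin.capabilities) ids.push(capability.id);
    }
    return ids;
  }
}

// Usage: one `execute` consumer (cook) and one `elicit` witness (interview).
const registry = new ExtensionRegistry();
registry.register({
  name: "cook",
  mode: "execute",
  capabilities: [
    { id: "test-writer", summary: "writes failing tests" },
    { id: "code-writer", summary: "makes tests pass" },
    { id: "evaluate-done", summary: "checks slice completion" },
    { id: "verify-epic", summary: "verifies the merged epic" },
  ],
});
registry.register({
  name: "interview",
  mode: "elicit",
  capabilities: [{ id: "interviewer", summary: "drives specification" }],
});

const executeIds = registry.capabilityIds("execute");
const elicitIds = registry.capabilityIds("elicit");
```

Keeping the registry free of any cook-specific type is what would let a mode-neutrality check stay a simple assertion, in the spirit of acceptance (1).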
+ +### brunch-detect + +- **Name:** Brunch toolchain detection — read the project toolchain from the repo +- **Linear:** FE-871 · branch `ka/fe-871-brunch-detect` (stacked on FE-867) +- **Kind:** bounded feature +- **Status:** done (FE-871). Slice 1 — `detectProfile(repoDir)` / `project-detect.ts`: a pure, evidence-first detector mapping manifests/lockfiles to a registry `ProfileId` (bun lockfile → bun; deno config → deno; `package.json` vitest/jest/none → node-vitest/node-jest/node-test). One clear supported signal resolves; ambiguous evidence (both vitest **and** jest declared) and any repo with no JS/TS evidence return a loud `{detected:false, reason}` via one catch-all rather than silently defaulting to bun — the cheap "which lockfile is present" check, not a language-detection engine (no per-stack Python/Go branches; the catch-all message is already actionable). Slice 2 — `detected` is wired into the `plan-emitter` selection chain as the brownfield front (`flag ≫ detected (brownfield) ≫ spec ≫ architect-classified ≫ bun`) via `resolveEmittedProfile`; a loud detection failure throws rather than silently falling to bun (falling through to an explicit spec/architect choice first). Greenfield (or brownfield without a `repoDir`) keeps the unchanged FE-843 chain — the greenfield no-op. `repoDir` threads CLI launch cwd → `runPlan` → `emitPlanFromSnapshot`; an injectable `detect` seam keeps the emitter tests hermetic. Slice 3 — `detectTestDir(repoDir)` co-locates generated tests where the brownfield repo already keeps its own: detection picks the *runner* (profile), this picks the *path*. A profile's default test directory (`tests/{id}.test.ts`) can fall outside a host repo whose vitest `include` is narrowed (e.g. `src/**`), so the chosen path is unrunnable — vitest reports "No test files found" for an explicitly-named file (observed in a real brownfield cook). 
Rather than parse the runner's executable-TS config, it samples existing `*.test.*`/`*.spec.*` files (zero-dep bounded `fs` walk, skipping `node_modules`/build dirs) and returns the dominant directory; `withTestDir(toolchain, dir)` relocates the targets while preserving the filename convention. Brownfield-only; `null` (no existing tests) keeps the profile default; greenfield never relocates. Slice 4 — monorepo hardening: `detectTestDir` returns the dominant *full* directory (not just the top segment) so a package-rooted include glob still covers the path; `detectProfile` widens runner detection to declared workspace packages (npm/yarn `workspaces`, pnpm `pnpm-workspace.yaml`; literal + single-level `dir/*` globs) **only when the root declares no runner**, scoped to declared workspaces so a stray nested project (docs prototype, example app) can't poison detection — a root runner still wins without scanning, and workspaces collectively declaring both vitest+jest stays loudly ambiguous. Stacked on `agent-extension-host`. +- **Objective:** Resolve a registry `ProfileId` at **plan time** from the repo's manifest/lockfile evidence — the cheap "which lockfile/manifest is present" check, mapping only to ids already in the FE-843 registry. It is **not** a language-detection engine: anything without a single clear supported signal (ambiguous JS runners, or non-JS stacks like Python/Go) returns a loud `{detected:false}` reason via one actionable catch-all, never a guessed profile. Brownfield-only front of the selection chain (`flag ≫ detected ≫ spec ≫ architect ≫ bun`); the resolved id is stamped into `plan.yaml` so `brunch cook` runs the same toolchain. Greenfield never detects (empty worktree). Resolves toolchain **identity** only — real file paths / existing wiring / `writes` reconciliation is cook-time agent grounding, out of scope here. 
+- **Why now / unlocks:** The "no manual steps" goal requires reading the real toolchain rather than inferring from spec prose or a `--profile` flag — and it must happen at plan time, because the deterministic test runner reads the stamped `plan.profile` with **no agent in the loop** (`cook-cli.ts`, `pi-actions.ts`), so a wrong default runs the wrong test command with no diagnostic. The cook agent's `read`/`bash` cannot substitute. FE-843 built the registry but deferred detection; this closes that gap. +- **Acceptance:** (1) detection maps a real repo to a registry profile id from manifest/lockfile evidence *(slice 1, done)*; (2) brownfield cook/plan resolves toolchain via detection at the front of the FE-843 chain (`--profile` still overrides) *(slice 2)*; (3) greenfield resolution is unchanged (no detection input); (4) ambiguous/unknown repo fails with an actionable message, not a silent default *(slice 1, done)*; (5) the 3 reference fixtures + greenfield smoke score identically before/after. +- **Verification:** detector unit tests *(slice 1, done — per-stack fixtures + loud `{detected:false}`)*; slice 2: resolution-chain precedence tests (detect vs flag vs spec) + greenfield no-op / before-after-identical test; slice 3: `detectTestDir` clustering/skip/null tests + `withTestDir` relocation tests + emitter tests asserting brownfield targets follow the detected dir while greenfield keeps the profile default; slice 4: full-dir/monorepo `detectTestDir` tests + workspace runner-detection tests (npm/yarn/pnpm, root-wins, literal dir, cross-workspace ambiguity). +- **Depends on:** `toolchain-profile-expansion` (FE-843). +- **Traceability:** Requirements 46–50; refines I130-K; greenfield-protecting invariant (new — record in SPEC via ln-sync). 
**D160-K boundary:** detection is plan-time profile-*id* resolution (an input to authoring), not architect host-introspection — D160-K constrains the architect/authoring stage, not profile resolution, so `brunch-detect` needs no D160-K amendment. +- **Design docs:** `docs/design/orchestrator.md`. + +### harness-dep-install + +- **Name:** Dependency-delta capture + install-failure classification +- **Linear:** FE-872 · branch `ka/fe-872-dep-install-classification` (stacked on FE-871) +- **Kind:** bounded feature +- **Status:** acceptance 1 done (FE-872) — classify + react. **Slice 1 (classify):** `TestResult` gains a `failureKind?: 'infra' | 'test'` discriminant (`types.ts`); `ToolchainTestRunner.run` classifies a failed run via `classifyTestFailure` (`test-runner.ts`) — **conservative**: only an unambiguous "the runner itself isn't there" signal (spawn `ENOENT`, or a shell `command not found` / `is not recognized`) is `infra`; everything else is `test`, because a missing *module* is ambiguous with a legitimate TDD red and mislabeling a real failure as infra would silently skip it. The `tests-run` net report surfaces an aggregate `failureKind` (infra dominates) so consumers don't rescan `results` (`net-compiler.ts`). **Slice 2 (react):** an exhausted run whose tests never executed now halts with a `toolchain/install failure` reason instead of the misleading `retry exhaustion` (`net-compiler.ts`). Deliberately **not** a bespoke re-install net arc — the cook loop already cycles back and the agent re-installs natively via `bash` on its next turn; the harness only needs the honest terminal cause. **Slice 3 (greenfield dep capture):** the manifest + lockfile the agent produced are now pinned as a promotion invariant via `git ls-files` (`promote-run.test.ts`) — `promoteGreenfieldRun`'s blanket copy already lands them; this turns that incidental behavior into an asserted, reproducible-tree guarantee. 
**Remaining:** brownfield dep-delta capture over the CoW baseline is **blocked on `brownfield-promotion`** (no brownfield promote path exists yet). Reframed: the install *action* is agent-native (cook write-actions carry `bash`; FE-843 `testConventions` already inject per-profile install/scaffold prose per A98), so this is **not** an install verb — it owns only the two things the agent's bash install does not give for free. +- **Objective:** The cook agent already adds + installs deps via its `bash` tool, driven by FE-843's per-profile `testConventions` (A98) — no install verb or abstraction is introduced. This frontier closes the two not-free parts: (a) **dependency-delta capture** — the lockfile/manifest changes the agent produces are captured onto the promotion path (greenfield asserted, not incidental to `promote-run`'s blanket copy; brownfield captures the *delta* over the CoW-copied baseline, not the whole tree); (b) **install/infra-failure classification** — a failure-kind discriminant so a failed `npm install` is distinguishable from a test failure (the FE-843-deferred fail/infra split). +- **Why now / unlocks:** Today `TestResult` is a single `{passed}` boolean (`types.ts`) and `evaluateVerificationTargets` collapses any thrown/failed run to `passed:false` — so a failed install looks identical to a logic bug and sends the code-writer to "fix the code" while the toolchain never installed, burning cook-loop iterations. And without deliberate lockfile capture the promoted tree isn't reproducible. `app-runtime-probe` / `integration-oracle` depend on deps being **present and reproducible in the promoted tree** — i.e. on (a) — not on an install verb. 
+- **Acceptance:** (1) install/setup failure is a *distinct* outcome from test failure — the runner outcome type carries a failure-kind discriminant (`infra` vs `test`) and the cook loop / run report react accordingly; (2) lockfile / dependency-manifest changes the agent makes are captured on promotion — greenfield asserted (not incidental), brownfield as the delta over the CoW baseline; (3) install stays worktree-scoped, never the user's checkout (assert). *(The install action itself + greenfield scaffold-from-scratch are FE-843/A98 agent-native behavior, not acceptance criteria here.)* +- **Verification:** failure-classification unit tests on the runner outcome (install/infra vs test); lockfile / dep-delta capture tests on the promotion path (greenfield + brownfield-delta); worktree-scoped-install assertion test. +- **Depends on:** `brunch-detect` (profile), `cook-codebase-mode` (worktree). Upstream of `app-runtime-probe`, `integration-oracle`. +- **Traceability:** Requirements 46–50; A98 (cook agent scaffolds + installs — the agent-native install this frontier relies on, not re-builds); absorbs the FE-843-deferred fail/infra test-outcome split. +- **Design docs:** `docs/design/orchestrator.md`. + +### app-runtime-probe + +- **Name:** App runtime probe — build, boot, and exercise the host app +- **Linear:** FE-875 · branch `ka/fe-875-app-runtime-probe` (stacked on FE-872) +- **Kind:** structural +- **Status:** slice 1 landed (FE-875) — `runProbe(spec, sandboxDir)` / `app-probe.ts` + `ProbeSpec`/`ProbeResult`/`ProbeOutcomeKind` (`types.ts`): boots an app, polls readiness, probes one HTTP feature endpoint, classifies `reachable` (<400) / `not-reachable` (booted but endpoint absent/erroring — the orphan) / `infra` (never booted), and always tears the boot process down (SIGTERM→SIGKILL). The *app-execution* analogue of `test-runner.ts`, with the infra/feature split mirroring FE-872's infra/test. 
**Design decision:** the boot argv + URLs are `ProbeSpec` **inputs** (cook-time grounding supplies them later), not a per-stack boot engine — the harness owns only the deterministic, read-only *check*; boot mechanics may lean on agent `bash` (honors the boundary below). Tested against real seeded `node:http` apps (reachable / orphan-404 / boot-fail / missing-binary / teardown). **Remaining slices:** mode-awareness (#4 brownfield real host vs greenfield self-composed epic), integration-oracle gating (#3), and where the `ProbeSpec` comes from (architect wiring intent + cook grounding). Prior spike verdict (2026-06-16): boot over the wire is **feasible** (a `node:http` entry on `listen(0)` answered `http.get`), but boot carried per-stack friction (the agent hand-rolled a `.js→.ts` resolve hook); dep-carrying boot still unproven (spike app was zero-dep). +- **Objective:** Provide a harness that builds the host application, boots it, and exercises the cooked feature to confirm it is actually reachable in the running app — not merely unit-test-green. Mechanism beyond the test runner: app-boot + a runtime probe (dev-server boot + HTTP/CDP/Playwright-style check), toolchain-derived from the `ProjectProfile`. Mode-aware: brownfield boots the real host; greenfield boots the self-composed epic. +- **Agent-native action vs harness-owned verification:** the frontier's value is the **independent, deterministic assertion** the cook agent cannot shortcut or self-report — not the boot action (the agent already has `bash` and can start a dev server / curl it). FE-800's orphan problem is precisely that the agent's self-report can't be trusted, so what this frontier owns is a read-only probe result outside the agent's authorship (the same discipline that keeps `evaluate-done` read-only at `pi-actions.ts:70`). 
The **boot mechanics may lean on agent `bash`** (start dev server, hit an endpoint) rather than a bespoke per-stack boot engine; the deterministic, unshortcuttable *check* of the result is the part the harness must own. +- **Why now / unlocks:** `integration-oracle` asserts "feature reachable in the running app," but verification today only runs the test runner in the worktree. Without an app-boot probe, "reachable" degrades to "a test imports the module" and the orphan problem (FE-800) survives. This is the load-bearing reachability mechanism; `integration-oracle` depends on it. The hidden heavy lift inside Arc 1 — validate the mechanism with `dogfood-spike` before committing. +- **Acceptance:** (1) the probe builds + boots the host app from the worktree using the resolved toolchain; (2) it exercises the cooked feature and returns a structured reachable / not-reachable result; (3) the probe result is the evidence `integration-oracle` gates on; (4) brownfield boots the real host, greenfield boots the self-composed epic; (5) infra failure (build/boot broke) is distinguishable from feature-absent (not reachable). +- **Verification:** probe-harness integration test (seeded app + cooked feature → reachable); orphan-replay test (feature module present but unwired → not-reachable, replaying the `spatial_graph_layout` regression); toolchain-derived boot tests; infra-failure-vs-not-reachable split test. +- **Depends on:** `cook-codebase-mode` (done), `brunch-detect`, `harness-dep-install` (boot needs deps). Upstream of `integration-oracle`. Scoped after `dogfood-spike`. +- **Traceability:** Requirements 46–50; FE-800 integration-blind follow-on; complements FE-813 (real *test* execution) by adding real *app* execution. +- **Design docs:** `docs/design/orchestrator.md`; `docs/praxis/dev-server-logs.md`; `docs/praxis/manual-testing.md`. 
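The three-way outcome split described for `app-runtime-probe` (`reachable` / `not-reachable` / `infra`) can be illustrated with a small pure function. This is only a sketch under stated assumptions, not the code in `app-probe.ts`: `ProbeOutcomeKind` is named in the plan, but `ProbeObservation` and `classifyProbe` are hypothetical names introduced here, and the real `runProbe` may gather and weigh its evidence differently.

```typescript
// Outcome kinds as named in the plan text.
type ProbeOutcomeKind = "reachable" | "not-reachable" | "infra";

// Hypothetical shape (an assumption for illustration): what a probe
// might have observed after trying to boot the app and hit one endpoint.
interface ProbeObservation {
  booted: boolean; // did the app pass its readiness poll?
  status: number | null; // HTTP status of the feature endpoint, null if the request failed
}

// Hypothetical classifier mirroring the rule stated in the plan:
// never booted -> infra; booted and status < 400 -> reachable;
// booted but endpoint absent or erroring -> not-reachable (the orphan).
function classifyProbe(obs: ProbeObservation): ProbeOutcomeKind {
  if (!obs.booted) return "infra";
  if (obs.status !== null && obs.status < 400) return "reachable";
  return "not-reachable";
}

console.log(classifyProbe({ booted: true, status: 200 })); // reachable
console.log(classifyProbe({ booted: true, status: 404 })); // not-reachable
console.log(classifyProbe({ booted: false, status: null })); // infra
```

Keeping the classification separate from the boot mechanics is one way to honor the boundary above: the agent may start the server however it likes, while the verdict comes from a deterministic check it does not author.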
+ +### integration-oracle + +- **Name:** Integration oracle — host wiring + product reachability +- **Linear:** unassigned (create on start) +- **Kind:** structural +- **Status:** not-started (drafted 2026-06-15) — Arc 1; promotes the FE-800 integration-blind follow-on to a frontier. +- **Objective:** Make a cooked feature real and reachable in the host, not orphaned. Three parts: (a) the architect emits a **generic integration/wiring slice** ("wire feature into host") rather than only FE-829's per-epic integration-*test* seam; (b) **cook-time grounding** — the cook agent resolves the real wiring by reading the worktree (no host introspection at plan time, D160-K intact); (c) an **integration oracle** in the FE-738 semantic lane asserts product reachability **via `app-runtime-probe`** (build + boot + exercise the host app — not merely test-runner-green) — brownfield: feature exists/reachable in the running app; greenfield: the epic self-composes (the `__epic__` merge + integration test). Reachability definition forks on `plan.mode`. +- **Why now / unlocks:** The first brownfield cook produced orphan modules that passed criteria without existing in the running app (FE-800 follow-on, 2026-06-04). Reachability is the external reality check that turns "executes a plan" into "ships a feature." Builds on harness fidelity (FE-813 — the harness actually runs the targets) and FE-829 integration seams. **`dogfood-spike` (2026-06-16) sharpened the framing:** given an `integration-test` target + reachability-demanding criterion prose, the cook agent *did* self-author a genuine boot-and-probe test and wired the feature reachable — the orphan did not reproduce. But nothing **forced** it; reachability was agent-discretion. So this frontier's job is precisely to make reachability **enforced and independent of agent-authored tests**, not to hope the agent stays honest. 
+- **Agent-native action vs harness-owned verification:** the wiring *action* (part b) is agent-native — the cook agent reads the worktree and edits the wiring itself; the frontier does **not** build a wiring engine. What it owns is part (c): an **oracle the agent cannot author or shortcut**, asserting product reachability via `app-runtime-probe`'s independent result. The orphan problem is unsolvable by self-report, so the oracle's value is its independence (same read-only discipline as `evaluate-done`, `pi-actions.ts:70`), not the doing. +- **Cook-time grounding decision (settled 2026-06-15):** planning stays host-blind; the cook agent grounds against the real repo. This **softens FE-829 slice-4A `writes` single-writer ownership to *advisory in brownfield only*** (agent reconciles paths against the real layout); greenfield keeps `writes` authoritative (parallel race-safety + eval gate depend on it). Needs a **D160-K amendment + a new grounding decision** recorded in SPEC via ln-sync. +- **Acceptance:** (1) architect emits a generic wiring slice for feature epics; (2) cook agent resolves real wiring by reading the worktree; (3) integration oracle gates completion on product reachability, mode-forked (brownfield reachable-in-app / greenfield self-compose); (4) the brownfield orphan-module regression (`spatial_graph_layout`) is caught; (5) greenfield behavior unchanged — 3 reference fixtures + greenfield smoke score identically; (6) `writes` advisory in brownfield, authoritative in greenfield (contract forks on `plan.mode`); (7) the wiring agent is an `execute`-mode plugin on `agent-extension-host`, not a bespoke `pi` call. +- **Verification:** brownfield smoke asserting reachability (feature present in running app), replaying the orphan regression; greenfield self-compose oracle tests; mode-fork contract tests on `writes`/`checkPlan`; semantic-lane oracle adapter tests. 
+- **Depends on:** `cook-harness-fidelity` (FE-813, done), `plan-build-architect` (FE-829), `brunch-detect`, `harness-dep-install`, `app-runtime-probe` (the reachability mechanism), `agent-extension-host` (wiring agent = `execute`-mode plugin). Upstream of `brownfield-promotion`. +- **Traceability:** Requirements 46–50; D160-K (amendment pending), D161-K, D167-K, A98; FE-800 integration-blind follow-on; greenfield-protecting invariant (new). +- **Design docs:** `docs/design/orchestrator.md`; `docs/next/architecture/plan-graph-petri-orchestration.md` (semantic lane). + +### brownfield-promotion + +- **Name:** Brownfield output promotion — glue the cook result into the checkout +- **Linear:** unassigned (create on start) +- **Kind:** structural +- **Status:** not-started (drafted 2026-06-15) — Arc 1; promotes the cook-codebase-mode promotion follow-on to a frontier. +- **Objective:** Commit/merge a completed brownfield cook run into the user's checkout. Today slice branches (`cook-slice//`) commit but never merge: `cook/` HEAD === source HEAD with modifications in untracked subdirs, so there is no promotion path. Close it: commit slice work → merge slice→epic→`cook/` → merge `cook/` into the working branch (completed-gated, never silent), mirroring FE-827's greenfield `promote-run.ts`. Pairs with worktree/branch GC. +- **Why now / unlocks:** "Glue back to the original code" in the literal git sense. Greenfield promotion landed (FE-827, D166-K); brownfield is the open follow-on. Without it a brownfield cook runs but can't deliver. +- **Acceptance:** (1) completed brownfield run promotes into the working branch via `git merge` (never silent, completed-gated, `--out`/`--force` parity with greenfield); (2) source branch byte-identical until explicit promotion (cook-codebase-mode invariant preserved); (3) collisions reported, not silently overwritten; (4) greenfield promotion path unchanged. 
+- **Verification:** brownfield promotion integration test (seeded git repo → cook run → promote → assert merge into working branch); source-unchanged-until-promote test; collision-report test. +- **Depends on:** `cook-codebase-mode` (done), `cook-greenfield-single-tree` (FE-827, done), `integration-oracle`. +- **Traceability:** Requirement 49; D166-K (extend to brownfield), A49; cook-codebase-mode promotion follow-on. +- **Design docs:** `docs/design/orchestrator.md`; SPEC §A49. + +### brunch-ship + +- **Name:** Brunch ship — one-shot autonomous spec→feature wrapper +- **Linear:** unassigned (create on start) +- **Kind:** bounded feature +- **Status:** not-started (drafted 2026-06-15) — Arc 1 capstone. +- **Objective:** A single `brunch serve ` command running prep → recipe → cook → taste → plate end-to-end with no manual steps, reading `plan.mode` (FE-826) to pick greenfield vs brownfield resolution. The plan stays a reviewable artifact but requires no manual authoring/approval in ship mode. +- **Why now / unlocks:** Closes the "no manual steps" goal by composing the Arc 1 frontiers into one autonomous flow. +- **Acceptance:** (1) `brunch serve ` runs the full chain unattended; (2) mode-correct resolution via `plan.mode`; (3) failure surfaces a coherent halt (graceful — full recovery is Arc 2 `interactive-recovery`); (4) greenfield and brownfield both supported. +- **Verification:** end-to-end integration (greenfield fixture + brownfield seeded repo) asserting a promoted artifact; mode-routing tests; halt-surfacing test. +- **Depends on:** `brunch-detect`, `integration-oracle`, `brownfield-promotion`; `cook-mode-from-spec` (FE-826, done). +- **Traceability:** Requirements 46–50. +- **Design docs:** `docs/design/orchestrator.md`. 
+- **Presentation seam (sub-slices, ride under FE-878 — no separate issue/branch):** the `serve`/`cook`/`plan` CLI grows a full-screen Ink TUI (brunch wordmark header in the brand gradient, kitchen-brigade phase tracker, live activity panel). Design (ln-design 2026-06-16): a thin `emit(CookEvent)` boundary → pure `reduce(events)→RunState` → PendingActivity-centric Ink presenter; `selectPresenter` picks `ink`/`plain`/`silent` by env; `reports.jsonl` stays the durable medium (CookEvent is ephemeral). The brigade names stay phase labels, not commands. Oracle per **SPEC I136-K**. **Slices 1a + 1b (done)** — seam foundation (`presenter.ts` + `presenter/`) + CLI wiring; both `plan` and `cook` surfaces migrated to `emit(CookEvent)`, elapsed timer moved to a presenter-owned **injected clock** (seeded by `cook-start`); `pi-actions` console-free; verified end-to-end. **Slice 2a (done)** — Ink presenter is real (brunch wordmark header, monotonic brigade tracker, bounded activity log, renders to stderr); shared `format.ts`/`clock.ts` + `RunStore` + pure `nextPhase`; verified via reduce/store units + ink-testing-library frames + the server build. **Slice 2b (done)** — the dead-air fix: `activity-start/progress/end` events bracket the four waits (pi sessions self-bracket in `runPi` with a KB heartbeat; test-run/probe via `withActivity`; promotion in `cook-cli`), a `pending` map in `RunStore`, and an Ink `PendingPanel` (live spinner + label + elapsed + detail). The presentation seam is complete. Residual: spinner freezes during the synchronous `spawnSync` test run; real-terminal walkthrough is outer-loop debt. + +### interactive-recovery + +- **Name:** Interactive recovery — halt into an answerable question that resumes the run +- **Linear:** unassigned +- **Kind:** structural +- **Status:** horizon (Arc 2 keystone) — gated on run resume. 
+- **Objective:** When a slice exhausts its rework budget or an oracle rejects on irreducible ambiguity, synthesize a coherent question (what's blocking, options) and land it as a turn in a `qa`/`strategy` secondary chat; the user's answer resumes the run from durable markings. Makes unattended failure graceful (ask, don't orphan or ship-wrong) and fuses the interview and execution substrates into one loop. The asking reuses the existing `elicit`-mode / secondary-chat substrate (FE-716) — not a new Q&A channel; the load-bearing new work is **resume from durable markings** (Petri Phase 4). +- **Why now / unlocks:** The graceful-degradation layer that makes the orchestrator safe to run unattended even before re-plan and intent-verification are perfect. Highest value-per-cost Arc 2 rung; do first. +- **Acceptance:** (1) budget-exhaustion / irreducible-ambiguity halt emits a structured question, not just a halt reason; (2) the question renders in a secondary chat the user can answer; (3) the answer resumes the run from durable markings; (4) a durable record links question→answer→resumed run; (5) the question agent is an `execute`-mode plugin on `agent-extension-host`. +- **Verification:** halt-to-question synthesis tests; secondary-chat rendering/answer tests; resume-from-marking integration test; durable linkage test. +- **Depends on:** `chat-runtime-secondary-chats` (FE-716, done), Petri Phase 4 run resume (`petri-simulation-oracle`), `changeset-ledger` (FE-701) for durable answers, `agent-extension-host` (question agent = `execute`-mode plugin). +- **Traceability:** Requirement 45 (chat surface), Requirements 46–50 (execution); FE-819 halt visibility; D161-K. +- **Design docs:** `docs/design/orchestrator.md`; `docs/design/CONVERSATIONAL_WORKSPACE_RUNTIME.md`. 
+ +### intent-conformance-oracle + +- **Name:** Intent-conformance oracle — independent behavioral-kernel verification +- **Linear:** unassigned +- **Kind:** structural +- **Status:** horizon (Arc 2) — gated on FE-700. +- **Objective:** Verify a built feature against intent with requisite variety — independent of the agent's self-authored tests. The spec carries **behavioral kernels** (contrastive input→expected-behavior pairs produced by the interview, never seen by the build agent); a semantic-lane oracle runs them against the built feature. Reachability (integration oracle) + intent (kernel oracle) + real execution (FE-813) together give requisite variety. +- **Why now / unlocks:** "Done" currently means self-authored tests pass — no variety against intent, so an underspecified spec ships wrong work with green checks. Closes the spec-level verification gap. +- **Acceptance:** (1) behavioral kernels are first-class spec material (from FE-700); (2) a kernel oracle runs them against the built feature in the semantic lane, separate from self-authored tests; (3) completion requires kernel conformance + reachability + real test execution; (4) kernel failures surface as actionable findings. +- **Verification:** kernel-oracle adapter tests; end-to-end where self-authored tests pass but a kernel fails (proves independence); reuse of graph-review rubric dimensions. +- **Depends on:** `intent-graph-semantics` (FE-700), `BEHAVIORAL_KERNELS.md`; reuses `graph-review-scenario-options` (FE-702) rubric; complements `integration-oracle`. +- **Traceability:** Requirements 38, 46–50; A77, A78 (semantics); ln-oracles requisite variety. +- **Design docs:** `docs/design/BEHAVIORAL_KERNELS.md`; `docs/design/INTENT_GRAPH_SEMANTICS.md`; `docs/design/orchestrator.md`. 
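To make the kernel-oracle idea above concrete, here is a minimal sketch of running contrastive input→expected pairs against a built feature and collecting mismatches as findings. Every name in it (`BehavioralKernel`, `KernelFinding`, `runKernels`, the slug example) is hypothetical and chosen for illustration; the actual kernel format depends on FE-700 and `BEHAVIORAL_KERNELS.md`, which this sketch does not attempt to reproduce.

```typescript
// Hypothetical kernel shape: one input and the behavior the spec expects.
interface BehavioralKernel<I, O> {
  id: string;
  input: I;
  expected: O;
}

// Hypothetical finding: enough context to act on a failed kernel.
interface KernelFinding<I, O> {
  kernelId: string;
  input: I;
  expected: O;
  actual: O;
}

// Run every kernel against the built feature and report mismatches.
// Equality is structural via JSON.stringify, a simplification that
// only suits plain JSON-like values.
function runKernels<I, O>(
  feature: (input: I) => O,
  kernels: BehavioralKernel<I, O>[],
): KernelFinding<I, O>[] {
  const findings: KernelFinding<I, O>[] = [];
  for (const k of kernels) {
    const actual = feature(k.input);
    if (JSON.stringify(actual) !== JSON.stringify(k.expected)) {
      findings.push({ kernelId: k.id, input: k.input, expected: k.expected, actual });
    }
  }
  return findings;
}

// A built feature whose self-authored test (single space) would pass.
const naiveSlug = (s: string): string => s.toLowerCase().replace(" ", "-");

// Contrastive pair: the second kernel differs only in the doubled space.
const kernels: BehavioralKernel<string, string>[] = [
  { id: "single-space", input: "Hello World", expected: "hello-world" },
  { id: "collapse-spaces", input: "Hello  World", expected: "hello-world" },
];

const findings = runKernels(naiveSlug, kernels);
console.log(findings.map((f) => f.kernelId)); // [ 'collapse-spaces' ]
```

The example is the independence case named in the verification bullet: the feature passes the check its author would naturally write, yet a kernel it never saw fails and surfaces as a finding with the input, expected, and actual values attached.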
+ +### adaptive-replan + +- **Name:** Adaptive re-plan — amend the plan from execution feedback +- **Linear:** unassigned +- **Kind:** structural +- **Status:** horizon (Arc 2, highest cost) — gated on Petri Phase 3 + Phase 4. +- **Objective:** When execution reveals the plan is wrong (missing dep, absent integration point, wrong scope), re-invoke the architect with execution feedback + world state to amend the plan, recompile the affected sub-net, and resume — instead of retrying the same frozen slice. Requires the plan to be a mutable graph (Phase 3) with durable, resumable markings (Phase 4) and stale-graph detection (FE-738 deferred criterion 5). +- **Why now / unlocks:** Removes the last "plan was right" assumption from autonomy — the orchestrator becomes self-correcting. The latent `architect-generator-loop`. Most structurally expensive (touches the core substrate commitment); last rung. +- **Acceptance:** (1) a re-plan trigger fires on defined execution-feedback conditions; (2) the architect amends the plan (graph-level), not just retries a slice; (3) the affected sub-net recompiles and the run resumes from durable markings; (4) `graph_revision_stale` / `GraphRevisionCurrent` semantics gate stale work; (5) re-plans are recorded as changesets; (6) the replan agent is an `execute`-mode plugin on `agent-extension-host`. +- **Verification:** re-plan trigger tests; sub-net recompile + resume integration test; stale-graph gate tests; changeset linkage of plan amendments. +- **Depends on:** `petri-graph-compilation` (Phase 3), `petri-simulation-oracle` (Phase 4, resume), FE-738 deferred criterion 5, `intent-graph-semantics` (FE-700), `changeset-ledger` (FE-701), `agent-extension-host` (replan agent = `execute`-mode plugin). +- **Traceability:** Requirements 46–50; FE-738 acceptance criterion 5 (deferred); spec §graph-revision. +- **Design docs:** `docs/next/architecture/plan-graph-petri-orchestration.md`; `docs/design/orchestrator.md`. 
+ ### petrinaut-colour-fold - **Name:** Petrinaut export — colour-fold per-slice subnet diff --git a/memory/SPEC.md b/memory/SPEC.md index 316bed72c..e262a30d5 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -157,7 +157,7 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo | A95 | Transcript-first context with explicit context snapshots on turn rows plus active graph-item handles on chats can keep secondary chats useful across multi-chat item changes without a persisted context-spec table. Handles only need re-snapshotting when the referenced item's version/fingerprint advances. | medium | open | D139, D140, D154, Requirement 45 | Context-provision tests for snapshot insertion, item-list/neighborhood/economic-graph snapshot builders, stale-handle refresh, and prompt/context-pack rendering. | | A96 | Async-by-default reconciliation can move Pending review into an in-stream target-grouped reconciliation chat without hiding judgment work or surfacing auto-confirmed noise. | medium | open | D135, D137, D138, D146, D153 | Track 3 classifier scheduling, target-ordering tests, and dense reconciliation walkthroughs. | | A97 | A completed intent graph can be projected + planned into a valid `brunch cook` plan.yaml: `requirement` items and `criterion --verifies--> requirement` edges read deterministically, but execution-order `depends_on` is **not** spec truth (the observer captures only epistemic deps; FE-700 does not change this) and must come from an LLM planning pass plus a deterministic reconciliation stage, not a graph read. | high | validated | D160-K, Requirements 46–50 | Two spikes 2026-06-03 against real completed spec 2 ("brunch_graphs"): projection clean + verification fully covered; graph-read req→req deps = 0; one `generateObject` call yielded a credible acyclic DAG + free non-buildable-constraint detection, but dangled deps onto constraints (needs reconciliation). 
| -| A98 | Brownfield integration-demanding verification can be enforced **without** emitter-side host introspection: the integration requirement is latent in app-observable criterion prose, the runnable verification target carries that intent (`kind:'integration-test'` + criterion text instead of a flattened unit-test path), and the cook agent — which has full host access in the worktree — authors an app-reachability test. The emitter stays intent-graph-only (D160-K); enforcement is shared with the FE-813 harness (runs it) and the FE-738 semantic lane (judges reachability). | medium | open | D160-K, D161-K, Requirements 46–50 | Spike 2026-06-04 resolved the *source* question (host info comes from the agent at run time, not the emitter; `integration-test` kind + criterion prose already exist but are discarded by reconciliation). Enforcement *strength* still unproven — validate by building the emitter target-shaping slice and replaying the `spatial_graph_layout` orphan-code regression as an outer-loop oracle. | +| A98 | Brownfield integration-demanding verification can be enforced **without** emitter-side host introspection: the integration requirement is latent in app-observable criterion prose, the runnable verification target carries that intent (`kind:'integration-test'` + criterion text instead of a flattened unit-test path), and the cook agent — which has full host access in the worktree — authors an app-reachability test. The emitter stays intent-graph-only (D160-K); enforcement is shared with the FE-813 harness (runs it) and the FE-738 semantic lane (judges reachability). | medium | partially-validated | D160-K, D161-K, Requirements 46–50 | Spike 2026-06-04 resolved the *source* question (host info comes from the agent at run time, not the emitter; `integration-test` kind + criterion prose already exist but are discarded by reconciliation). 
**dogfood-spike 2026-06-16 validated the *mechanism*:** in a real brownfield cook (2-slice plan, `node:http` app), given an `integration-test` target + reachability-demanding prose, the cook agent self-authored a genuine boot-and-probe test (imported the real entry, `listen(0)`, `http.get`, asserted not-404) and wired the feature reachable — the orphan did not reproduce. **Enforcement *strength* remains the open gap:** nothing *forced* the boot-probe; reachability was agent-discretion. Close it via `integration-oracle` (independent, non-agent-authored reachability) + replaying the `spatial_graph_layout` orphan regression. | | A99 | **Resolved 2026-06-09 — reversed to arc-scoped deltas.** Petrinaut's actual-mode consumer expects each `TransitionFiring.input`/`output` to carry only the transition's arc-scoped delta: `input` = tokens consumed from input-arc places, `output` = the **new tokens** to add to output-arc places — never places the transition isn't connected to. Petrinaut reconstructs the running marking from `initialState` by folding each delta. This **reverses** the FE-819 Card A reading that the frame reader treats `firing.output` as a whole frame with no folding (the reading that drove the 2026-06-05 switch to full markings after pools/budgets rendered empty mid-run). Brunch now emits deltas again (FE-764's original shape). | high | resolved | FE-764 contract, FE-819 Card A (reversed) | Confirmed by the Petrinaut team (Chris, 2026-06-09): firings must be arc-scoped deltas; full markings list places a transition isn't connected to. **Watch:** re-verify on staging that delta rendering does not regress the 2026-06-05 "pools/budgets empty mid-run" symptom — i.e. Petrinaut now folds deltas onto `initialState` rather than treating `output` as a whole frame. 
| | A100-K | The schema-checkable executability invariants in `PlanContract` — acyclic `depends_on` over existing slice ids, ≥1 verification target per slice, every slice in an epic, every requirement covered or non-buildable, and every multi-slice epic carrying an `integration-test` seam — are a sufficient definition of "cook-executable" for `brunch plan` to gate and deterministically repair against. File-disjointness (one writer per file, or a declared join owner) is deliberately excluded until a `Slice.writes` field exists. | medium | partially-validated | D167-K, D158-K | **Static half validated (FE-829 slice 1):** all three reference fixtures pass `checkPlan` (base profile) unmodified — which forced the base/emitted profile split, since authored fixtures intentionally carry bare multi-slice epics; known-bad plans (cycle, self/dangling dep, unverified slice, orphan slice, uncomposed multi-slice epic) are flagged and auto-repaired; every emitted plan is accepted under the strict `emitted` profile. **File-disjointness half opened (FE-829 slice 4):** the `Slice.writes` field now exists and `checkPlan` enforces single-writer-per-file (`file-write-conflict`, a design-class warning, never auto-repaired); the "declared join owner" exception was dropped — a join slice is simply the sole writer of a shared coordination file, not a multi-writer exception (D160-K amendment, I132-K). LLM authoring of `writes` / decomposition / join synthesis remains deferred. **Still open:** cooking a repaired multi-slice-epic plan to an assembled, merge-verified artifact (closing the FE-800 integration-blind gap) needs the middle/outer loop — `brunch plan ` against a completed-spec DB, then `brunch cook`. | @@ -216,8 +216,9 @@ Brunch operates inside a **workspace**: the cwd-backed software context whose lo 163. 
**The Petrinaut actual-mode wire definition is plain-graph, not SDCPN** (FE-819) — Petrinaut's "actual/live" Brunch route narrowed to a `.strict()` schema accepting only `{version?, meta?, title?, places[id,name,x?,y?], transitions[id,name,inputArcs,outputArcs,x?,y?]}` (`brunchNetDefinitionSchema`); it supplies SDCPN defaults itself (`normalizeBrunchDefinition`) with extensions disabled. Brunch's `projectNetDefinition` therefore emits the slim shape and drops `types` plus every SDCPN-only place/transition field (`colorId`, `dynamicsEnabled`, `differentialEquationId`, `lambdaType`, `lambdaCode`, `transitionKernelCode`) — under `.strict()` these would be rejected, not ignored. Consequence: colour-fold slice identity is not expressible on this interface (identity fold only) until the standardized Brunch/Petrinaut protocol is owned in Petrinaut Core. The schema is mirrored in-repo (`petrinaut-brunch-contract-schema.ts`) as the projection oracle so a Petrinaut-side tightening fails a brunch test. Depends on: Requirement 48; D162-K. 164. **Greenfield/brownfield is spec-derived plan truth, not plan location** — the emitted `plan.yaml` carries `mode` (`Plan.mode`) from `specification.mode`; `brunch cook` reads `plan.mode` to choose the worktree strategy. The resolver splits into `resolveCookPlan` (locate the plan path) + `resolveSandboxPlan` (mode-driven worktree decision: greenfield → empty worktree; brownfield → clone the cwd repo + clean-tree gate, brownfield-only). Reverses the earlier location-keyed reading of Requirement 50 — a spec-emitted greenfield plan no longer clones the cwd. Authored/legacy plans without a `mode` load as greenfield. Depends on: Requirements 46, 49, 50; A65 (greenfield/brownfield grounding posture). (FE-826) 165. 
**Cook slice layout is policy-selected** (FE-827) — `OrchestratorInput.sliceLayout` is `'shared'` only for **serial greenfield** (all slices accrete into the single run sandbox; verify-epic runs in place; no per-slice dirs, no `__epic__` merge) and `'per-slice'` otherwise. Per-slice means git worktrees for brownfield and plain dirs for parallel greenfield, both merged into `__epic__//` for verification. The shared tree trades the per-slice dependency-correctness oracle and parallelism for one directly-usable, in-place-verified tree; parallel greenfield keeps isolation (race-safe) at the cost of a merge. `runCook` derives the layout; there is no policy refusal (greenfield parallel is allowed). Depends on: Requirements 46, 49; D164-K. -166. **Greenfield promotion-back is opt-in, completed-gated, and never silent** (FE-827) — `brunch cook --out=` promotes a greenfield run's tree into the target only when the run completed (`result.status === 'completed'`); halted/brownfield runs promote nothing (the run artifact stays inspectable). Landing is commit-on-branch: empty target → `git init` + commit on `main`; existing repo → commit on a `cook/` branch (the user's branch untouched); a non-empty target is refused unless `--force`. The promotion source follows the layout (D165-K): shared → the run sandbox; per-slice (parallel) → a whole-plan merge of all completed slices (declaration-order-wins, collisions reported). Closes the cook output-promotion gap for greenfield; brownfield promotion (git-merge chain) remains a follow-on. Depends on: Requirements 46, 49; D164-K, D165-K. +166. **Greenfield promotion-back is opt-in, completed-gated, and never silent** (FE-827) — `brunch cook --out=` promotes a greenfield run's tree into the target only when the run completed (`result.status === 'completed'`); halted/brownfield runs promote nothing (the run artifact stays inspectable). 
Landing is commit-on-branch: empty target → `git init` + commit on `main`; existing repo → commit on a `cook/` branch (the user's branch untouched); a non-empty target is refused unless `--force`. The promotion source follows the layout (D165-K): shared → the run sandbox; per-slice (parallel) → a whole-plan merge of all completed slices (declaration-order-wins, collisions reported). Closes the cook output-promotion gap for greenfield; brownfield promotion landed separately as decision 168 (FE-877). Depends on: Requirements 46, 49; D164-K, D165-K. 167. **The emitter guarantees cook-executability through a self-contained `PlanContract` + deterministic repair, separate from intent projection** (FE-829) — `brunch plan` gates its output on a producer-agnostic `PlanContract` that checks the schema-checkable executability invariants (I129-K), plus a deterministic repair loop that fixes the **mechanical class** (Kahn cycle-break; mint a missing verification target; **synthesize an `integration-test` seam on every multi-slice epic** so the per-epic merge runs and composition is proven) while surfacing the **design class** (uncovered requirement; shared file with no declared join owner) as typed warnings rather than silently inventing or dropping scope. This splits today's reconciliation ("always repair, never check") into detect-then-repair, makes "is this plan cook-executable?" one reusable predicate that also validates hand-authored fixtures, and directly closes the FE-800 integration-blind / "green checks, no assembled artifact" gap. Slice 1 (contract + repair; no LLM; no file/decomposition authoring) does **not** touch D160-K. 
**Slice-1 refinement (2026-06-09):** the reusable-predicate goal collided with the read-only reference fixtures — two of them carry intentionally bare multi-slice epics (their `core` and `pipeline` epics) — so the seam invariant is enforced through **two `checkPlan` profiles**: `base` (default, for authored/producer-input plans) reports the missing seam as a *warning*; `emitted` (for `brunch plan` output) reports it as an *error*. `repairPlan` always synthesizes the seam regardless of profile, so emitted plans pass `emitted` while fixtures pass `base` unmodified. Implemented as `plan-contract.ts` (`checkPlan`/`repairPlan`) + a shared `plan-graph.ts` Kahn helper (reused by `reconcilePlan` so the two cycle-break policies cannot drift) + a `project-profile.ts` `Toolchain` descriptor that *derives* verification targets (`sliceTarget`/`epicTarget`) instead of hardcoding `tests/.test.ts`. Depends on: Requirements 46–50; A97, D158-K, D161-K; establishes I129-K. See Future Direction §Cook plan generation for the build-architect arc and the deferred D160-K amendment. +168. **Brownfield promotion is automatic and plumbing-only; `brunch serve` is the one-shot capstone** (FE-877, FE-878) — a completed brownfield cook auto-commits its composed tree onto the repo's own `cook/` branch (the branch the CoW sandbox already created from `HEAD`) via git plumbing (`commit-tree` + CAS `update-ref`, throwaway index + external work-tree), so the user's active branch, working tree, and index are never touched; merging stays the user's call. `--out` is therefore greenfield-only — for brownfield it is ignored with a warning. 
`brunch serve ` = `plan ` then `cook --spec=` (cook reads the just-emitted plan; serve threads the resolved launch cwd as cook's `dir` because `runCook` reads `opts.dir` raw — the launch-cwd default lives only in `parseCookArgs`, R46); serve's `--out` is the greenfield promote target, petrinaut/policy/retry flags forward to cook, and a failed plan short-circuits (nothing cooked). Pure glue — no new orchestration; the testable units are `parseServeArgs` + `runServe` (stages injected) with db/snapshot wiring in `cli.ts`. **Closes Arc 1.** Depends on: Requirements 46, 49; decision 166; establishes I135-K. (FE-877, FE-878) #### Provider, prompt/context, and agent substrate @@ -270,15 +271,17 @@ Each invariant is a formalization candidate: the property is stated in human lan | I123-K | Worktree isolation holds — fixture directory and source repo are never mutated by an orchestrator run; worktree is cwd-scoped at `/.brunch/cook/runs//worktree/`. Slice layout follows policy (D165-K): serial greenfield runs all slices in the single shared run tree (verify-epic in place, no `__epic__`); parallel greenfield and brownfield isolate per slice and merge into `__epic__//`. Brownfield clones the cwd repo and preserves the source repo's HEAD and tracked-file state byte-identically; greenfield never clones the source. | worktree.test.ts, brownfield-smoke.integration.test.ts, engine-contract.test.ts | Requirement 49; D159-K, D164-K, D165-K | | I124-K | Epic verification runs against a freshly-rebuilt `/__epic__//` dir holding the deterministic merge of its completed slices' worktrees (later slices in plan declaration order overwrite earlier ones on path collisions; collisions are reported via the `epic-sandbox-merged` event). Per-slice worktrees are not mutated by the merge. 
| epic-sandbox-merge.test.ts, engine-contract.test.ts | Requirement 49; D159-K | | I125-K | Topology output-place candidates are fully declared in `HandlerDescriptor` via typed `Guard` predicates; `wireHandlers` introduces no new output places at fire time. Pure consumers can enumerate the reachable output-place set per transition from topology data alone via `enumerateCandidateOutputs(transition)`. Halt paths (budget exhaustion, verify-epic failure) and token transforms (reportId attach, retry/rework count propagation) remain runtime concerns and are explicitly not covered by this invariant. | topology.test.ts, engine-contract.test.ts | Requirements 46, 47, 48; D155-K (FE-747) | -| I126-K | The cook evaluator observes, never produces: `evaluate-done` runs with read-only tools (`toolsForAction('evaluate-done') === 'read'`) so it cannot mutate the sandbox during evaluation, and per-slice `done` reflects real execution of the slice's verification targets — ≥1 target and every target passing via `evaluateVerificationTargets` — rather than an LLM verdict. | pi-actions.test.ts, engine-contract.test.ts, brownfield-smoke.integration.test.ts | Requirements 46–50; D161-K (FE-813) | +| I126-K | The cook evaluator observes, never produces: `evaluate-done` runs with read-only tools (`toolsForAction('evaluate-done') === 'read'`) so it cannot mutate the sandbox during evaluation, and per-slice `done` reflects real execution of the slice's verification targets — ≥1 target and every target passing via the shared `runVerification` seam (one `TestRunner`; `evaluate-done`, `verify-epic`, and the net `run-tests` path share it — FE-872 unification; `evaluateVerificationTargets` / private `runTest` deleted) — rather than an LLM verdict. 
| pi-actions.test.ts, engine-contract.test.ts, brownfield-smoke.integration.test.ts | Requirements 46–50; D161-K (FE-813) | | I127-K | Brunch's Petrinaut stream markings are count-only (`Marking = Record`): the static reducer and the live bus produce per-place token counts in each firing's arc-scoped consume/produce delta (A99; `initialState` is the single full marking), with no `TokenColour[]` arm. The wire `NetDefinition` is plain-graph (no `colorId` or other SDCPN fields), so slice/colour identity has no wire carrier — identity fold only. The projected definition validates against the mirrored `brunchNetDefinitionSchema` under `.strict()`. | petrinaut-stream-export.test.ts (arc-scoped delta oracle + strict-schema validation), petrinaut-stream-bus.test.ts (replay-equivalence) | Requirement 48; D162-K, D163-K (FE-819) | -| I128-K | Greenfield promotion-back never silently overwrites and never promotes an incomplete run: `brunch cook --out` lands a tree only when the run completed, refuses a non-empty target without `--force`, and always lands as a git commit (init+commit on `main` for an empty target, or a `cook/` branch in an existing repo). The promotion source follows the slice layout — the run sandbox (serial) or a whole-plan merge of completed slices (parallel). | promote-run.test.ts, cook-cli.test.ts (`promotionSourceDir`) | Requirements 46, 49; D166-K (FE-827) | +| I128-K | Greenfield promotion-back never silently overwrites and never promotes an incomplete run: `brunch cook --out` lands a tree only when the run completed, refuses a non-empty target without `--force`, and always lands as a git commit (init+commit on `main` for an empty target, or a `cook/` branch in an existing repo). The promotion source follows the slice layout — the run sandbox (serial) or a whole-plan merge of completed slices (parallel). `--out` is **greenfield-only**: a brownfield run auto-promotes onto `cook/` and ignores `--out` with a warning (I135-K). 
| promote-run.test.ts, cook-cli.test.ts (`promotionSourceDir`) | Requirements 46, 49; D166-K (FE-827) | +| I135-K | Brownfield promotion never touches the user's checkout: `promoteBrownfieldRun` lands a completed brownfield cook's composed tree as one commit on the repo's existing `cook/` branch via git plumbing (`read-tree` base → `add -A` against an external work-tree → `write-tree` → `commit-tree -p base` → CAS `update-ref`) under a throwaway `GIT_INDEX_FILE`, so the active branch, working tree, and index are unchanged and gitignored deps don't land. Auto-runs on a completed brownfield run (no `--out` needed); a missing `cook/` branch throws. | promote-run.test.ts (lands on cook/ with parent=base; main/HEAD/working-tree/index untouched; tracked-deletion; real linked-worktree topology) | Requirements 46, 49; decision 168 (FE-877) | | I130-K | No tech stack is hardcoded in the cook harness: the `test-writer` prompt names no framework, and the runner (`ToolchainTestRunner`) plus the cook task builders (`sliceTestTask`/`epicVerifyTask`) take the test command and conventions from the `Toolchain` resolved from `plan.profile` (`resolveToolchain`, bun default). `brunch cook` and `brunch plan` resolve the same profile, so emitted targets and the runner that executes them agree. | project-profile.test.ts, test-runner.test.ts (toolchain command honored), pi-actions.test.ts (task builders carry conventions + prompt has no hardcoded stack) | SPEC §Future Direction Cook plan generation; D161-K, D167-K (FE-829 slice 2, FE-813) | | I129-K | Every plan `brunch plan` emits satisfies the schema-checkable executability contract: `depends_on` is acyclic over existing slice ids, every slice has ≥1 verification target, every slice belongs to an epic, every requirement is covered or explicitly non-buildable, and every multi-slice epic carries an `integration-test` verification target. 
`checkPlan` is total and pure with **two profiles**: `base` (default) treats a multi-slice epic missing its seam as a *warning* so authored/reference plans pass `check` unmodified, while `emitted` escalates it to an *error*; `repairPlan` always synthesizes the seam, so `brunch plan` output satisfies the strict `emitted` profile. Every deterministic repair is surfaced as a typed warning; `check(repair(plan))` is accepted under `emitted`. File-disjointness / join-ownership is out of scope until a `Slice.writes` field lands. | plan-contract.test.ts, plan-emitter.test.ts (FE-829 slice 1) | Requirements 46–50; D167-K; A100-K (FE-829) | | I133-K | `brunch plan` is a build-ARCHITECT: a single schema-constrained LLM call (`architectPlan`) AUTHORS a decomposed, file-disjoint slice set — scaffold + per-behaviour slices + a join slice that is the sole writer of shared coordination files — each carrying `writes` (file ownership) and `derivedFrom` (requirement provenance; never persisted on the emitted `Plan`). Per D160-K (amended) the architect does **no host introspection** and authors **no test content**: verification targets are synthesized deterministically by `materializeArchitectedPlan` (toolchain-derived) and the cook agent writes tests at run time (A98). The materializer is pure: it filters unknown requirement refs (slices kept), drops self/dangling deps, breaks cycles (shared Kahn policy), resolves epic membership from `slice.epic_id`, appends each requirement's criteria into the derived slice's definition prose, and emits a coverage sidecar. The emitter gates the authored plan with `repairPlan` + `checkPlan` (`emitted` profile + generalized requirement-provenance coverage); **if authoring throws/parses-malformed OR the authored plan is uncovered/contract-failing, it falls back to a deterministic projection plan** (`reconcilePlan(projected, ∅)` + repair — no second LLM call) and surfaces one `architect-failed-fallback-to-projection` warning. 
A surviving `file-write-conflict` is surfaced (never silently shipped). Deterministic plumbing only is verified; decomposition QUALITY is deferred to the slice-5 eval harness + opt-in real-LLM smoke. Supersedes the slice-3 enrichment stage (`planExecutionOrdering`, I131-K) on the mainline. | plan-architect.test.ts (schema/prompt/failure), plan-materialize.test.ts (coverage/provenance/dep-clean/purity), plan-emitter.test.ts (authored happy path, fallback on throw + uncovered + malformed, file-write-conflict surfaced, toolchain), plan-runner.test.ts | Requirements 46–50; A97, A98, A100-K; D160-K (amended), D167-K; supersedes I131-K (FE-829 slice 4B) | | I134-K | `brunch plan`'s authored output has a deterministic acceptance oracle, `evaluatePlanShape` (`plan-eval.ts`), separate from the emitter mainline (it is an outer-loop scorer, not wired into emission). It returns a `PlanEvalReport` with an **explicit, narrow `verdict` gate** — `reject` iff any `error`-severity finding under the strict `emitted` contract profile, OR any `file-write-conflict`, OR any slice missing a `writes` declaration — never a score threshold, so the non-deterministic architect cannot game a scalar into acceptance. Alongside the gate it reports a graded structural-feature vector (`metrics`) measured against the SHARED fixture-design principles (docs/design/orchestrator-demo-fixtures.md), **not** against any fixture's ids/paths/counts: verification coverage, integration-seam coverage, writes coverage, single-writer, transitively-redundant-dependency penalty, slice sharpness, dependency signal. `overall` is a weighted mean for trending only (soft heuristics half-weight). 
The three reference fixtures are the self-test: each must `accept` and score `overall === 1`, which required refreshing them (added `writes` to every slice; added the previously-missing integration seam to the `core` and `pipeline` epics of two reference fixtures — they now satisfy their own stated principle #2 under the strict `emitted` profile). | plan-eval.test.ts (fixture self-test scores 1; write-conflict/missing-seam/missing-writes/dangling-dep → reject; redundant-edge + flatten + over-broad-slice graded lower; monotonicity) | Requirements 46–50; A100-K; D167-K; I129-K, I132-K, I133-K (FE-829 slice 5) | | I132-K | `Slice.writes?: string[]` declares the repo-relative POSIX file paths a slice exclusively mutates (exact paths only — no globs/directories), and `checkPlan` enforces single-writer-per-file: a path declared by ≥2 slices is a `file-write-conflict` — a **design-class warning** (never an error, never auto-repaired), since resolving it changes decomposition/ownership. Duplicate paths within one slice are deduped first and never self-conflict. A "join slice" is the sole writer of a shared coordination file that `depends_on` the slices it joins — not a multi-writer exception. `repairPlan` preserves `writes` verbatim and never moves ownership or synthesizes a join slice; `loadPlan` round-trips the field (absent → undefined). Emitter/LLM authoring of `writes` + requirement decomposition + join synthesis is deferred (D160-K amendment + slice-5 eval). | plan-contract.test.ts (disjoint accepted, overlap warns, intra-slice dup no-false-positive, repair preserves), plan-loader.test.ts (writes round-trip) | Requirements 46–50; A98, A100-K; D160-K (amended), D167-K (FE-829 slice 4) | | I131-K | **Retired (FE-829 post-slice-5)** — `planExecutionOrdering` and its whole `plan-llm-planning.ts` module (+ test) are deleted, having been superseded on the mainline by the authoring architect (I133-K, slice 4B). 
The only load-bearing survivor, the `PlanningEnrichment` type (reconcile's deterministic-fallback input contract), now lives in `plan-reconciliation.ts` next to its consumer; the duplicate `RunModel` type consolidated onto `plan-architect.ts`. The Zod `planningEnrichmentSchema` and `defaultRunModel` went with the deleted function. Historical record (the enrichment-over-projected-slices stage): the slice-3 planner only classified/grouped/ordered the existing `req-*` slices — it never invented, split, merged, renamed, or removed them — and was prompt-enriched with per-slice criteria + `projectPlanningContext` relation edges + the inlined reference-fixture exemplars. That enrichment seam was never validated for model quality and is fully replaced by `architectPlan` (I133-K) + the slice-5 eval harness (I134-K). | plan-planning-context.test.ts (edge lifting/ownership/dedupe — the surviving context seam); architect + eval coverage per I133-K / I134-K | Requirements 46–50; A97; D167-K; superseded by I133-K, retired post-I134-K (FE-829 slice 3 → retired) | +| I136-K | **FE-878 presentation seam.** All `serve`/`cook`/`plan` terminal output flows through one `emit(CookEvent)` boundary, never direct `console.*`/`log()` outside `presenter/`; the orchestrator never imports the renderer. A pure `selectPresenter({command,isTTY,ci,reporterFlag})` chooses the backend — `plain` (CI / non-TTY / default), `silent` (`agent` mode), `ink` (interactive TTY; falls back to plain until slice 2). `PlainPresenter` reproduces pre-refactor stderr **byte-identically**; for the cook surface this is made deterministic by an **injected clock** (the presenter owns the elapsed/duration timer) plus a redaction normalizer for absolute paths and `runId`. The bus fans out synchronously and **swallows a thrown presenter** (`emitWarning`) so presentation can never abort a run; **stdout stays empty / JSONL-only**. 
Behavior-preserving — no `*-started`/activity instrumentation and no live Ink rendering (slice 2). **Slices 1a + 1b (done):** seam foundation (`presenter.ts` root + `presenter/{events,bus,select,plain,silent}.ts`) + CLI wiring; **both surfaces migrated** — `plan` (`plan-runner`) and `cook` (`cook-cli` banner/summary/promotion/petrinaut via a `line` passthrough arm + `pi-actions` per-action progress as structured `action`/`verbose` arms). The elapsed timer moved off `pi-actions`' module-level `Date.now()` into the presenter's **injected clock**, seeded by a `cook-start` event. `pi-actions` is now console-free; `cook-cli`'s only residual `console.error` is the injectable Petrinaut-setup default, which the cook path overrides with a bus-backed `log`. **Slice 2a (done):** the `ink` backend is real (no longer a plain fallback) — formatting consolidated into a shared `format.ts` + `clock.ts` (used by both backends so log bodies can't drift), a `RunStore` folds the event stream into `{phase, lines}`, a pure monotonic `nextPhase` projects the brigade tracker (coarse, from post-hoc events; precise in-flight transitions are 2b), and the Ink `App` (brunch-wordmark header in the brand gradient + brigade strip with `✓/◐/○` marks + bounded activity log) renders to **stderr**. **Slice 2b (done):** the dead-air fix — `activity-start`/`activity-progress`/`activity-end` events; the four long waits are bracketed (the three agent sessions self-bracket inside `runPi` with a throttled KB heartbeat; the test-run + probe waits use a `withActivity` helper; promotion brackets in `cook-cli`), always closing via `finally`. `RunStore` tracks a `pending` map; the Ink `PendingPanel` shows a live spinner + label + elapsed + detail (a tick interval runs only while pending is non-empty). Plain/CI renders one `⋯` start line per wait. The seam is now complete across all three commands and both backends. 
**Lifecycle:** the bus creator owns disposal — entry points run through `withCookBus(command, fn)`, which builds the bus and `dispose()`s it (unmounting Ink) in `finally`, so the TUI can't be left mounted and hang the process (ln-review finding). | bus.test.ts (fan-out + error isolation), presenter.test.ts (withCookBus disposes on success + throw), select.test.ts (decision table), plain.test.ts (byte-exact plan + cook arms incl. injected-clock elapsed + activity start-line), plan-runner.test.ts (golden stderr via capturing bus), brownfield-smoke.integration.test.ts (cook end-to-end through the bus), phase.test.ts (monotonic brigade), run-store.test.ts (event fold + pending map + stable snapshot), ink/app.test.tsx (frame: egg + active phase + activity + pending panel), pi-actions.test.ts (balanced activity start/end incl. on session failure), cook-report.test.ts (banner + completion-summary golden — the cook line strings are pure-tested, ln-review #3) | Requirements 46–50; D156-K (reports.jsonl stays the durable medium; CookEvent is ephemeral presentation only) (FE-878) | ## Future Direction Register @@ -451,6 +454,7 @@ Every meaningful code change should pass `npm run fix` in the inner loop and `np | Middle | Prompt/context golden and classifier corpora | Prompt/context output remains inspectable and regressable as prompts evolve. | Requirements 40, 41; A84, A88; I112, I114 | | Middle | Context-snapshot replay and handle-refresh oracles | Turn-level snapshots replay unchanged after graph edits; active handles re-snapshot only when changeset-backed item versions advance. | Requirement 45; A95; D154; I120 | | Middle | Structured context-builder assertions plus selected golden renderings | Item-list, neighborhood, and economic whole-graph snapshots contain required ids, sections, relation/provenance signals, and stable rendering boundaries without overfitting prose. 
| Requirements 40, 45; A84, A95; I112, I120 | +| Middle | Differential / golden-master with injected clock + path/runId redaction | The `serve`/`cook`/`plan` presentation refactor preserves stderr byte-for-byte; output stays behind the `emit(CookEvent)` boundary and off stdout. | Requirements 46–50; I136-K | | Outer | Fixture-backed manual walkthroughs | Phase transitions, export, resume, graph view, and waiting states feel legible. | Requirements 5, 13–15, 33 | | Outer | Brownfield and scenario-quality review | Generated questions/bundles are useful, grounded, honest about tradeoffs, and not overconfident. | Requirements 3, 16, 20; A67, A68, A90, A91 | | Outer | Dense cascade/reconciliation walkthroughs | Users can understand and resolve downstream graph impact without skipping necessary judgment. | A48, A88, I113, I114 | @@ -471,11 +475,14 @@ Every meaningful code change should pass `npm run fix` in the inner loop and `np | LLM classifier correctness and determinism | Proposals never auto-apply; re-run exists; corpora/goldens grow from failures. | Substantive items are mislabeled as auto-confirm or repeated runs diverge materially. | | Economic whole-graph snapshot quality | Structured assertions plus one selected golden rendering fixture; human review for whether compact context is useful. | Secondary-chat answers show missing authority/provenance/relation context or snapshots become too large for routine prompts. | | Context-handle refresh before real item versions | Defer handle freshness semantics until `changeset-ledger` supplies real item versions rather than blessing a temporary content fingerprint. | `chat-context-provision` is pulled before changeset-backed item versions exist. | +| Frozen spinner during a synchronous test run | Slice 2b brackets every wait, but `test-runner` uses blocking `spawnSync`, so the spinner can't animate (only the label + start-elapsed show) while a ≤60s test runs; the async pi session animates fine. 
| The test-run wait becomes a felt pain point — then move `test-runner` to an async spawn so the event loop can tick. | +| Real-terminal Ink *visual* behavior (resize, Ctrl-C, escape codes) | Teardown is now wired + tested (`withCookBus` disposes the bus → unmounts Ink in `finally`; ln-review caught that nothing disposed it before). Frames are unit-tested via ink-testing-library and bundled in the build; what's left is purely visual — not yet walked through in a live terminal. | A manual `brunch cook`/`serve` run shows visual/resize glitches, or before relying on the TUI for a demo. | ### Design Notes - Context-handle freshness should wait for real item versions from `changeset-ledger`; do not bless a temporary content/edge fingerprint as the durable refresh oracle. - Economic whole-graph snapshot verification should pair structured JSON assertions for required sections/counts/ids with a small number of golden renderings and human review, rather than treating exact prose as the primary oracle. +- The CLI presentation refactor's byte-identical golden oracle depends on making non-determinism injectable rather than masked: the elapsed/duration clock is a presenter-owned dependency (deterministic in tests, reused by the slice-2 Ink tests), and only paths/`runId` are redacted by a small normalizer. Avoid normalizing the timer itself — that would let the normalizer, the thing under test, mask real drift. 
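A minimal sketch of the last design note above (injectable clock, narrow redaction), under stated assumptions: `Clock`, `makeFakeClock`, `makePresenter`, and `redact` are hypothetical names for illustration and are not taken from the brunch codebase, and the path and run id in the usage note are made up. It shows only the shape of the idea: the presenter computes elapsed time from a clock it is handed, so a test can assert on the timer's output byte-for-byte, while the normalizer replaces only the machine- and run-specific values.

```typescript
// Hypothetical sketch: none of these names come from the brunch codebase.

// Returns milliseconds, in the same shape as Date.now.
type Clock = () => number;

// Deterministic clock for tests: the first call returns startMs and each
// later call advances by stepMs.
function makeFakeClock(startMs: number, stepMs: number): Clock {
  let now = startMs - stepMs;
  return () => {
    now += stepMs;
    return now;
  };
}

// The presenter owns the timer. It reads the start time from the injected
// clock and formats elapsed time itself, so tests never need to mask it.
function makePresenter(clock: Clock) {
  const startedAt = clock();
  return {
    line(message: string): string {
      const elapsedSec = ((clock() - startedAt) / 1000).toFixed(1);
      return `[${elapsedSec}s] ${message}`;
    },
  };
}

// Replace every occurrence of needle with label; an empty needle is a no-op.
function replaceEvery(text: string, needle: string, label: string): string {
  return needle === "" ? text : text.split(needle).join(label);
}

// The normalizer redacts only what genuinely varies between machines and
// runs: the absolute root path and the run id. It never touches the timer,
// so real timing drift stays visible to the golden comparison.
function redact(output: string, rootDir: string, runId: string): string {
  return replaceEvery(replaceEvery(output, rootDir, "<root>"), runId, "<runId>");
}
```

With `makeFakeClock(1000, 500)`, a presenter's first `line("cooked /home/u/repo/.brunch/cook/runs/abc123/worktree")`, passed through `redact(…, "/home/u/repo", "abc123")`, yields `[0.5s] cooked <root>/.brunch/cook/runs/<runId>/worktree`. The elapsed value is asserted exactly rather than normalized away.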
### Acceptance Criteria diff --git a/package-lock.json b/package-lock.json index cb16e0427..432b0c9b8 100644 --- a/package-lock.json +++ b/package-lock.json @@ -38,6 +38,7 @@ "drizzle-orm": "^0.45.2", "embla-carousel-react": "^8.6.0", "express": "^5.2.1", + "ink": "^7.0.6", "lucide-react": "^1.8.0", "md-pen": "^1.2.0", "motion": "^12.38.0", @@ -76,6 +77,7 @@ "code-inspector-plugin": "^1.5.1", "drizzle-kit": "^0.31.10", "happy-dom": "^20.8.9", + "ink-testing-library": "^4.0.0", "oxfmt": "^0.43.0", "oxlint": "^1.58.0", "oxlint-tsgolint": "^0.19.0", @@ -249,6 +251,46 @@ "zod": "^3.25.76 || ^4.1.8" } }, + "node_modules/@alcalzone/ansi-tokenize": { + "version": "0.3.0", + "resolved": "https://registry.npmjs.org/@alcalzone/ansi-tokenize/-/ansi-tokenize-0.3.0.tgz", + "integrity": "sha512-p+CMKJ93HFmLkjXKlXiVGlMQEuRb6H0MokBSwUsX+S6BRX8eV5naFZpQJFfJHjRZY0Hmnqy1/r6UWl3x+19zYA==", + "license": "MIT", + "dependencies": { + "ansi-styles": "^6.2.1", + "is-fullwidth-code-point": "^5.0.0" + }, + "engines": { + "node": ">=18" + } + }, + "node_modules/@alcalzone/ansi-tokenize/node_modules/ansi-styles": { + "version": "6.2.3", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.3.tgz", + "integrity": "sha512-4Dj6M28JB+oAH8kFkTLUo+a2jwOFkuqb3yucU0CANcRRUbxS0cP0nZYCGjcc3BNXwRIsUVmDGgzawme7zvJHvg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/ansi-styles?sponsor=1" + } + }, + "node_modules/@alcalzone/ansi-tokenize/node_modules/is-fullwidth-code-point": { + "version": "5.1.0", + "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-5.1.0.tgz", + "integrity": "sha512-5XHYaSyiqADb4RnZ1Bdad6cPp8Toise4TzEjcOYDHZkTCbKgiUl7WTUCpNWHuxmDt91wnsZBc9xinNzopv3JMQ==", + "license": "MIT", + "dependencies": { + "get-east-asian-width": "^1.3.1" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, 
"node_modules/@antfu/install-pkg": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/@antfu/install-pkg/-/install-pkg-1.1.0.tgz", @@ -1514,7 +1556,7 @@ "typebox": "1.1.38" }, "bin": { - "pi-ai": "dist/cli.js" + "pi-ai": "./dist/cli.js" }, "engines": { "node": ">=22.19.0" @@ -10631,6 +10673,21 @@ "string-width": "^4.1.0" } }, + "node_modules/ansi-escapes": { + "version": "7.3.0", + "resolved": "https://registry.npmjs.org/ansi-escapes/-/ansi-escapes-7.3.0.tgz", + "integrity": "sha512-BvU8nYgGQBxcmMuEeUEmNTvrMVjJNSH7RgW24vXexN4Ven6qCvy4TntnvlnwnMLTVlcRQQdbRY8NKnaIoeWDNg==", + "license": "MIT", + "dependencies": { + "environment": "^1.0.0" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/ansi-regex": { "version": "5.0.1", "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz", @@ -10788,6 +10845,18 @@ "dev": true, "license": "MIT" }, + "node_modules/auto-bind": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/auto-bind/-/auto-bind-5.0.1.tgz", + "integrity": "sha512-ooviqdwwgfIfNmDwo94wlshcdzfO64XV0Cg6oDsDYBJfITDz1EngD2z7DkbvCWn+XIMsIqW27sEVF6qcpJrRcg==", + "license": "MIT", + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/axe-core": { "version": "4.11.2", "resolved": "https://registry.npmjs.org/axe-core/-/axe-core-4.11.2.tgz", @@ -11639,6 +11708,65 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/cli-truncate": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/cli-truncate/-/cli-truncate-6.0.0.tgz", + "integrity": "sha512-3+YKIUFsohD9MIoOFPFBldjAlnfCmCDcqe6aYGFqlDTRKg80p4wg35L+j83QQ63iOlKRccEkbn8IuM++HsgEjA==", + "license": "MIT", + "dependencies": { + "slice-ansi": "^9.0.0", + "string-width": "^8.2.0" + }, + "engines": { + "node": ">=22" + }, + "funding": { + "url": 
"https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/cli-truncate/node_modules/ansi-regex": { + "version": "6.2.2", + "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.2.2.tgz", + "integrity": "sha512-Bq3SmSpyFHaWjPk8If9yc6svM8c56dB5BAtW4Qbw5jHTwwXXcTLoRMkpDJp6VL0XzlWaCHTXrkFURMYmD0sLqg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/ansi-regex?sponsor=1" + } + }, + "node_modules/cli-truncate/node_modules/string-width": { + "version": "8.2.1", + "resolved": "https://registry.npmjs.org/string-width/-/string-width-8.2.1.tgz", + "integrity": "sha512-IIaP0g3iy9Cyy18w3M9YcaDudujEAVHKt3a3QJg1+sr/oX96TbaGUubG0hJyCjCBThFH+tFpcIyoUHUn1ogaLA==", + "license": "MIT", + "dependencies": { + "get-east-asian-width": "^1.5.0", + "strip-ansi": "^7.1.2" + }, + "engines": { + "node": ">=20" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/cli-truncate/node_modules/strip-ansi": { + "version": "7.2.0", + "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-7.2.0.tgz", + "integrity": "sha512-yDPMNjp4WyfYBkHnjIRLfca1i6KMyGCtsVgoKe/z1+6vukgaENdgGBZt+ZmKPc4gavvEZ5OgHfHdrazhgNyG7w==", + "license": "MIT", + "dependencies": { + "ansi-regex": "^6.2.2" + }, + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/strip-ansi?sponsor=1" + } + }, "node_modules/cli-width": { "version": "4.1.0", "resolved": "https://registry.npmjs.org/cli-width/-/cli-width-4.1.0.tgz", @@ -11707,6 +11835,18 @@ "dev": true, "license": "MIT" }, + "node_modules/code-excerpt": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/code-excerpt/-/code-excerpt-4.0.0.tgz", + "integrity": "sha512-xxodCmBen3iy2i0WtAK8FlFNrRzjUqjRsMfho58xT/wvZU1YTM3fCnRjcy1gJPMepaRlgm/0e6w8SpWHpn3/cA==", + "license": "MIT", + "dependencies": { + "convert-to-spaces": "^2.0.1" + }, + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + } + }, 
"node_modules/code-inspector-plugin": { "version": "1.5.1", "resolved": "https://registry.npmjs.org/code-inspector-plugin/-/code-inspector-plugin-1.5.1.tgz", @@ -11841,6 +11981,15 @@ "integrity": "sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==", "license": "MIT" }, + "node_modules/convert-to-spaces": { + "version": "2.0.1", + "resolved": "https://registry.npmjs.org/convert-to-spaces/-/convert-to-spaces-2.0.1.tgz", + "integrity": "sha512-rcQ1bsQO9799wq24uE5AM2tAILy4gXGIK/njFWcVQkGNZ96edlpY+A7bjwvzjYvLDyzmG1MmMLZhpcsb+klNMQ==", + "license": "MIT", + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + } + }, "node_modules/cookie": { "version": "0.7.2", "resolved": "https://registry.npmjs.org/cookie/-/cookie-0.7.2.tgz", @@ -13613,6 +13762,18 @@ "node": ">=6" } }, + "node_modules/environment": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/environment/-/environment-1.1.0.tgz", + "integrity": "sha512-xUtoPkMggbz0MPyPiIWr1Kp4aeWJjDZ6SMvURhimjdZgsRuDplF5/s9hcgGhyXMhs+6vpnuoiZ2kFiu3FMnS8Q==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/error-ex": { "version": "1.3.4", "resolved": "https://registry.npmjs.org/error-ex/-/error-ex-1.3.4.tgz", @@ -13676,6 +13837,16 @@ "node": ">= 0.4" } }, + "node_modules/es-toolkit": { + "version": "1.47.1", + "resolved": "https://registry.npmjs.org/es-toolkit/-/es-toolkit-1.47.1.tgz", + "integrity": "sha512-5RAqEwf4P4E17p+W75KLOWw/nOvKZzSQpxM32IpI2KZLaVonjTrZ0Ai5ghMaVI9eKC2p8eoQgcBdkEDgzFk6+Q==", + "license": "MIT", + "workspaces": [ + "docs", + "benchmarks" + ] + }, "node_modules/esast-util-from-estree": { "version": "2.0.0", "resolved": "https://registry.npmjs.org/esast-util-from-estree/-/esast-util-from-estree-2.0.0.tgz", @@ -15420,6 +15591,18 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/indent-string": { + "version": "5.0.0", + 
"resolved": "https://registry.npmjs.org/indent-string/-/indent-string-5.0.0.tgz", + "integrity": "sha512-m6FAo/spmsW2Ab2fU35JTYwtOKa2yAwXSwgjSv1TJzh4Mh7mC3lzAOVLBprb72XsTrgkEIsl7YrFNAiDiRhIGg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/inherits": { "version": "2.0.4", "resolved": "https://registry.npmjs.org/inherits/-/inherits-2.0.4.tgz", @@ -15432,6 +15615,236 @@ "integrity": "sha512-JV/yugV2uzW5iMRSiZAyDtQd+nxtUnjeLt0acNdw98kKLrvuRVyB80tsREOE7yvGVgalhZ6RNXCmEHkUKBKxew==", "license": "ISC" }, + "node_modules/ink": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/ink/-/ink-7.0.6.tgz", + "integrity": "sha512-/KG651f+LHln9gumb5ltieFqzNGJdhX1b/WwsCUd2Py7Htuk9KUzyFrk25ugmzjXyDneXSoXD3cm4ql4dWFGsQ==", + "license": "MIT", + "dependencies": { + "@alcalzone/ansi-tokenize": "^0.3.0", + "ansi-escapes": "^7.3.0", + "ansi-styles": "^6.2.3", + "auto-bind": "^5.0.1", + "chalk": "^5.6.2", + "cli-boxes": "^4.0.1", + "cli-cursor": "^4.0.0", + "cli-truncate": "^6.0.0", + "code-excerpt": "^4.0.0", + "es-toolkit": "^1.45.1", + "indent-string": "^5.0.0", + "is-in-ci": "^2.0.0", + "patch-console": "^2.0.0", + "react-reconciler": "^0.33.0", + "scheduler": "^0.27.0", + "signal-exit": "^3.0.7", + "slice-ansi": "^9.0.0", + "stack-utils": "^2.0.6", + "string-width": "^8.2.0", + "terminal-size": "^4.0.1", + "type-fest": "^5.5.0", + "widest-line": "^6.0.0", + "wrap-ansi": "^10.0.0", + "ws": "^8.20.0", + "yoga-layout": "~3.2.1" + }, + "engines": { + "node": ">=22" + }, + "peerDependencies": { + "@types/react": ">=19.2.0", + "react": ">=19.2.0", + "react-devtools-core": ">=6.1.2" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + }, + "react-devtools-core": { + "optional": true + } + } + }, + "node_modules/ink-testing-library": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/ink-testing-library/-/ink-testing-library-4.0.0.tgz", + 
"integrity": "sha512-yF92kj3pmBvk7oKbSq5vEALO//o7Z9Ck/OaLNlkzXNeYdwfpxMQkSowGTFUCS5MSu9bWfSZMewGpp7bFc66D7Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "@types/react": ">=18.0.0" + }, + "peerDependenciesMeta": { + "@types/react": { + "optional": true + } + } + }, + "node_modules/ink/node_modules/ansi-regex": { + "version": "6.2.2", + "resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.2.2.tgz", + "integrity": "sha512-Bq3SmSpyFHaWjPk8If9yc6svM8c56dB5BAtW4Qbw5jHTwwXXcTLoRMkpDJp6VL0XzlWaCHTXrkFURMYmD0sLqg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/ansi-regex?sponsor=1" + } + }, + "node_modules/ink/node_modules/ansi-styles": { + "version": "6.2.3", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.3.tgz", + "integrity": "sha512-4Dj6M28JB+oAH8kFkTLUo+a2jwOFkuqb3yucU0CANcRRUbxS0cP0nZYCGjcc3BNXwRIsUVmDGgzawme7zvJHvg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/ansi-styles?sponsor=1" + } + }, + "node_modules/ink/node_modules/chalk": { + "version": "5.6.2", + "resolved": "https://registry.npmjs.org/chalk/-/chalk-5.6.2.tgz", + "integrity": "sha512-7NzBL0rN6fMUW+f7A6Io4h40qQlG+xGmtMxfbnH/K7TAtt8JQWVQK+6g0UXKMeVJoyV5EkkNsErQ8pVD3bLHbA==", + "license": "MIT", + "engines": { + "node": "^12.17.0 || ^14.13 || >=16.0.0" + }, + "funding": { + "url": "https://github.com/chalk/chalk?sponsor=1" + } + }, + "node_modules/ink/node_modules/cli-boxes": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/cli-boxes/-/cli-boxes-4.0.1.tgz", + "integrity": "sha512-5IOn+jcCEHEraYolBPs/sT4BxYCe2nHg374OPiItB1O96KZFseS2gthU4twyYzeDcFew4DaUM/xwc5BQf08JJw==", + "license": "MIT", + "engines": { + "node": ">=18.20 <19 || >=20.10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/cli-cursor": { + 
"version": "4.0.0", + "resolved": "https://registry.npmjs.org/cli-cursor/-/cli-cursor-4.0.0.tgz", + "integrity": "sha512-VGtlMu3x/4DOtIUwEkRezxUZ2lBacNJCHash0N0WeZDBS+7Ux1dm3XWAgWYxLJFMMdOeXMHXorshEFhbMSGelg==", + "license": "MIT", + "dependencies": { + "restore-cursor": "^4.0.0" + }, + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/onetime": { + "version": "5.1.2", + "resolved": "https://registry.npmjs.org/onetime/-/onetime-5.1.2.tgz", + "integrity": "sha512-kbpaSSGJTWdAY5KPVeMOKXSrPtr8C8C7wodJbcsd51jRnmD+GZu8Y0VoU6Dm5Z4vWr0Ig/1NKuWRKf7j5aaYSg==", + "license": "MIT", + "dependencies": { + "mimic-fn": "^2.1.0" + }, + "engines": { + "node": ">=6" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/restore-cursor": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/restore-cursor/-/restore-cursor-4.0.0.tgz", + "integrity": "sha512-I9fPXU9geO9bHOt9pHHOhOkYerIMsmVaWB0rA2AI9ERh/+x/i7MV5HKBNrg+ljO5eoPVgCcnFuRjJ9uH6I/3eg==", + "license": "MIT", + "dependencies": { + "onetime": "^5.1.0", + "signal-exit": "^3.0.2" + }, + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/signal-exit": { + "version": "3.0.7", + "resolved": "https://registry.npmjs.org/signal-exit/-/signal-exit-3.0.7.tgz", + "integrity": "sha512-wnD2ZE+l+SPC/uoS0vXeE9L1+0wuaMqKlfz9AMUo38JsyLSBWSFcHR1Rri62LZc12vLr1gb3jl7iwQhgwpAbGQ==", + "license": "ISC" + }, + "node_modules/ink/node_modules/string-width": { + "version": "8.2.1", + "resolved": "https://registry.npmjs.org/string-width/-/string-width-8.2.1.tgz", + "integrity": "sha512-IIaP0g3iy9Cyy18w3M9YcaDudujEAVHKt3a3QJg1+sr/oX96TbaGUubG0hJyCjCBThFH+tFpcIyoUHUn1ogaLA==", + "license": "MIT", + "dependencies": { + "get-east-asian-width": 
"^1.5.0", + "strip-ansi": "^7.1.2" + }, + "engines": { + "node": ">=20" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/strip-ansi": { + "version": "7.2.0", + "resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-7.2.0.tgz", + "integrity": "sha512-yDPMNjp4WyfYBkHnjIRLfca1i6KMyGCtsVgoKe/z1+6vukgaENdgGBZt+ZmKPc4gavvEZ5OgHfHdrazhgNyG7w==", + "license": "MIT", + "dependencies": { + "ansi-regex": "^6.2.2" + }, + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/strip-ansi?sponsor=1" + } + }, + "node_modules/ink/node_modules/widest-line": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/widest-line/-/widest-line-6.0.0.tgz", + "integrity": "sha512-U89AsyEeAsyoF0zVJBkG9zBgekjgjK7yk9sje3F4IQpXBJ10TF6ByLlIfjMhcmHMJgHZI4KHt4rdNfktzxIAMA==", + "license": "MIT", + "dependencies": { + "string-width": "^8.1.0" + }, + "engines": { + "node": ">=20" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/ink/node_modules/wrap-ansi": { + "version": "10.0.0", + "resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-10.0.0.tgz", + "integrity": "sha512-SGcvg80f0wUy2/fXES19feHMz8E0JoXv2uNgHOu4Dgi2OrCy1lqwFYEJz1BLbDI0exjPMe/ZdzZ/YpGECBG/aQ==", + "license": "MIT", + "dependencies": { + "ansi-styles": "^6.2.3", + "string-width": "^8.2.0", + "strip-ansi": "^7.1.2" + }, + "engines": { + "node": ">=20" + }, + "funding": { + "url": "https://github.com/chalk/wrap-ansi?sponsor=1" + } + }, "node_modules/inline-style-parser": { "version": "0.2.7", "resolved": "https://registry.npmjs.org/inline-style-parser/-/inline-style-parser-0.2.7.tgz", @@ -15598,6 +16011,21 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/is-in-ci": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/is-in-ci/-/is-in-ci-2.0.0.tgz", + "integrity": 
"sha512-cFeerHriAnhrQSbpAxL37W1wcJKUUX07HyLWZCW1URJT/ra3GyUTzBgUnh24TMVfNTV2Hij2HLxkPHFZfOZy5w==", + "license": "MIT", + "bin": { + "is-in-ci": "cli.js" + }, + "engines": { + "node": ">=20" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/is-in-ssh": { "version": "1.0.0", "resolved": "https://registry.npmjs.org/is-in-ssh/-/is-in-ssh-1.0.0.tgz", @@ -17946,7 +18374,6 @@ "version": "2.1.0", "resolved": "https://registry.npmjs.org/mimic-fn/-/mimic-fn-2.1.0.tgz", "integrity": "sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg==", - "dev": true, "license": "MIT", "engines": { "node": ">=6" @@ -18900,6 +19327,15 @@ "node": ">= 0.8" } }, + "node_modules/patch-console": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/patch-console/-/patch-console-2.0.0.tgz", + "integrity": "sha512-0YNdUceMdaQwoKce1gatDScmMo5pu/tfABfnzEqeG0gtTmd7mh/WcwgUjtAeOU7N8nFFlbQBnFK2gXW5fGvmMA==", + "license": "MIT", + "engines": { + "node": "^12.20.0 || ^14.13.1 || >=16.0.0" + } + }, "node_modules/path-browserify": { "version": "1.0.1", "resolved": "https://registry.npmjs.org/path-browserify/-/path-browserify-1.0.1.tgz", @@ -19617,6 +20053,21 @@ "license": "MIT", "peer": true }, + "node_modules/react-reconciler": { + "version": "0.33.0", + "resolved": "https://registry.npmjs.org/react-reconciler/-/react-reconciler-0.33.0.tgz", + "integrity": "sha512-KetWRytFv1epdpJc3J4G75I4WrplZE5jOL7Yq0p34+OVOKF4Se7WrdIdVC45XsSSmUTlht2FM/fM1FZb1mfQeA==", + "license": "MIT", + "dependencies": { + "scheduler": "^0.27.0" + }, + "engines": { + "node": ">=0.10.0" + }, + "peerDependencies": { + "react": "^19.2.0" + } + }, "node_modules/react-refresh": { "version": "0.18.0", "resolved": "https://registry.npmjs.org/react-refresh/-/react-refresh-0.18.0.tgz", @@ -20868,6 +21319,49 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/slice-ansi": { + "version": "9.0.0", + "resolved": 
"https://registry.npmjs.org/slice-ansi/-/slice-ansi-9.0.0.tgz", + "integrity": "sha512-SO/3iYL5S3W57LLEniscOGPZgOqZUPCx6d3dB+52B80yJ0XstzsC/eV8gnA4tM3MHDrKz+OCFSLNjswdSC+/bA==", + "license": "MIT", + "dependencies": { + "ansi-styles": "^6.2.3", + "is-fullwidth-code-point": "^5.1.0" + }, + "engines": { + "node": ">=22" + }, + "funding": { + "url": "https://github.com/chalk/slice-ansi?sponsor=1" + } + }, + "node_modules/slice-ansi/node_modules/ansi-styles": { + "version": "6.2.3", + "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.3.tgz", + "integrity": "sha512-4Dj6M28JB+oAH8kFkTLUo+a2jwOFkuqb3yucU0CANcRRUbxS0cP0nZYCGjcc3BNXwRIsUVmDGgzawme7zvJHvg==", + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/chalk/ansi-styles?sponsor=1" + } + }, + "node_modules/slice-ansi/node_modules/is-fullwidth-code-point": { + "version": "5.1.0", + "resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-5.1.0.tgz", + "integrity": "sha512-5XHYaSyiqADb4RnZ1Bdad6cPp8Toise4TzEjcOYDHZkTCbKgiUl7WTUCpNWHuxmDt91wnsZBc9xinNzopv3JMQ==", + "license": "MIT", + "dependencies": { + "get-east-asian-width": "^1.3.1" + }, + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/smart-buffer": { "version": "4.2.0", "resolved": "https://registry.npmjs.org/smart-buffer/-/smart-buffer-4.2.0.tgz", @@ -20972,6 +21466,27 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/stack-utils": { + "version": "2.0.6", + "resolved": "https://registry.npmjs.org/stack-utils/-/stack-utils-2.0.6.tgz", + "integrity": "sha512-XlkWvfIm6RmsWtNJx+uqtKLS8eqFbxUg0ZzLXqY0caEy9l7hruX8IpiDnjsLavoBgqCCR71TqWO8MaXYheJ3RQ==", + "license": "MIT", + "dependencies": { + "escape-string-regexp": "^2.0.0" + }, + "engines": { + "node": ">=10" + } + }, + "node_modules/stack-utils/node_modules/escape-string-regexp": { + "version": "2.0.0", + "resolved": 
"https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-2.0.0.tgz", + "integrity": "sha512-UpzcLCXolUWcNu5HtVMHYdXJjArjsF9C0aNnquZYY4uW/Vu0miy5YoWvbV345HauVvcAUnpRuhMMcqTcGOY2+w==", + "license": "MIT", + "engines": { + "node": ">=8" + } + }, "node_modules/stackback": { "version": "0.0.2", "resolved": "https://registry.npmjs.org/stackback/-/stackback-0.0.2.tgz", @@ -21237,7 +21752,6 @@ "version": "1.0.0", "resolved": "https://registry.npmjs.org/tagged-tag/-/tagged-tag-1.0.0.tgz", "integrity": "sha512-yEFYrVhod+hdNyx7g5Bnkkb0G6si8HJurOoOEgC8B/O0uXLHlaey/65KRv6cuWBNhBgHKAROVpc7QyYqE5gFng==", - "dev": true, "license": "MIT", "engines": { "node": ">=20" @@ -21303,6 +21817,18 @@ "node": ">=6" } }, + "node_modules/terminal-size": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/terminal-size/-/terminal-size-4.0.1.tgz", + "integrity": "sha512-avMLDQpUI9I5XFrklECw1ZEUPJhqzcwSWsyyI8blhRLT+8N1jLJWLWWYQpB2q2xthq8xDvjZPISVh53T/+CLYQ==", + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/throttleit": { "version": "2.1.0", "resolved": "https://registry.npmjs.org/throttleit/-/throttleit-2.1.0.tgz", @@ -22022,7 +22548,6 @@ "version": "5.5.0", "resolved": "https://registry.npmjs.org/type-fest/-/type-fest-5.5.0.tgz", "integrity": "sha512-PlBfpQwiUvGViBNX84Yxwjsdhd1TUlXr6zjX7eoirtCPIr08NAmxwa+fcYBTeRQxHo9YC9wwF3m9i700sHma8g==", - "dev": true, "license": "(MIT OR CC0-1.0)", "dependencies": { "tagged-tag": "^1.0.0" @@ -23507,6 +24032,12 @@ "url": "https://github.com/sponsors/sindresorhus" } }, + "node_modules/yoga-layout": { + "version": "3.2.1", + "resolved": "https://registry.npmjs.org/yoga-layout/-/yoga-layout-3.2.1.tgz", + "integrity": "sha512-0LPOt3AxKqMdFBZA3HBAt/t/8vIKq7VaQYbuA8WxCgung+p9TVyKRYdpvCb80HcdTN2NkbIKbhNwKUfm3tQywQ==", + "license": "MIT" + }, "node_modules/zod": { "version": "4.3.6", "resolved": 
"https://registry.npmjs.org/zod/-/zod-4.3.6.tgz", diff --git a/package.json b/package.json index 5d4980b65..db1350a9b 100644 --- a/package.json +++ b/package.json @@ -83,6 +83,7 @@ "drizzle-orm": "^0.45.2", "embla-carousel-react": "^8.6.0", "express": "^5.2.1", + "ink": "^7.0.6", "lucide-react": "^1.8.0", "md-pen": "^1.2.0", "motion": "^12.38.0", @@ -118,6 +119,7 @@ "code-inspector-plugin": "^1.5.1", "drizzle-kit": "^0.31.10", "happy-dom": "^20.8.9", + "ink-testing-library": "^4.0.0", "oxfmt": "^0.43.0", "oxlint": "^1.58.0", "oxlint-tsgolint": "^0.19.0", diff --git a/src/agent-extension-host.test.ts b/src/agent-extension-host.test.ts new file mode 100644 index 000000000..4d230dd92 --- /dev/null +++ b/src/agent-extension-host.test.ts @@ -0,0 +1,144 @@ +import { readFileSync } from 'node:fs'; +import { dirname, join } from 'node:path'; +import { fileURLToPath } from 'node:url'; + +import { describe, expect, it } from 'vitest'; + +import { type AgentExtensionConsumerWitness, flattenCapabilityIds } from './agent-extension-host.js'; +import { createPiActions } from './orchestrator/src/pi-actions.js'; +import type { InterviewerTools } from './server/interview.js'; +import { createExplorationTools } from './server/tools/index.js'; + +const here = dirname(fileURLToPath(import.meta.url)); + +// The cook (`execute`) consumer, described as host plugins — one cook action per +// capability. Proven below against the real `createPiActions()` surface. 
+const cookWitness = { + consumerId: 'cook', + mode: 'execute', + plugins: [ + { + id: 'execute.evaluate-done', + mode: 'execute', + capabilities: [ + { + id: 'evaluate-done', + summary: 'Decide a slice is done by running its verification targets.', + handler: null, + }, + ], + }, + { + id: 'execute.write-tests', + mode: 'execute', + capabilities: [{ id: 'write-tests', summary: 'Write failing tests for a slice.', handler: null }], + }, + { + id: 'execute.write-code', + mode: 'execute', + capabilities: [{ id: 'write-code', summary: 'Write code to make a slice pass.', handler: null }], + }, + { + id: 'execute.assess-semantic', + mode: 'execute', + capabilities: [ + { id: 'assess-semantic', summary: 'Assess semantic satisfaction of a slice.', handler: null }, + ], + }, + { + id: 'execute.verify-epic', + mode: 'execute', + capabilities: [{ id: 'verify-epic', summary: 'Write + run an epic integration test.', handler: null }], + }, + ], +} as const satisfies AgentExtensionConsumerWitness; + +// The interview (`elicit`) consumer as the neutrality WITNESS. The interview keeps +// its own runtime (Vercel AI SDK); this only proves its capability surface fits +// the same host contract. `as const` preserves the literal ids for the type-level +// coverage proof below. 
+const interviewWitness = { + consumerId: 'interview', + mode: 'elicit', + plugins: [ + { + id: 'elicit.ask-question', + mode: 'elicit', + capabilities: [{ id: 'ask_question', summary: 'Ask the user a structured question.', handler: null }], + }, + { + id: 'elicit.preface', + mode: 'elicit', + capabilities: [ + { id: 'present_preface', summary: 'Present a provisional context preface.', handler: null }, + ], + }, + { + id: 'elicit.phase-closure', + mode: 'elicit', + capabilities: [ + { id: 'propose_phase_closure', summary: 'Propose closing the current phase.', handler: null }, + ], + }, + { + id: 'elicit.workspace-exploration', + mode: 'elicit', + capabilities: [ + { id: 'read_file', summary: 'Read a workspace file.', handler: null }, + { id: 'grep', summary: 'Search workspace file contents.', handler: null }, + { id: 'find_files', summary: 'Find workspace files.', handler: null }, + { id: 'list_directory', summary: 'List a workspace directory.', handler: null }, + ], + }, + ], +} as const satisfies AgentExtensionConsumerWitness; + +describe('agent-extension-host contract is a mode-neutral core', () => { + it('the contract module is dependency-free, which is what keeps it mode-neutral', () => { + const src = readFileSync(join(here, 'agent-extension-host.ts'), 'utf8'); + // No imports is the load-bearing guarantee: a module that imports nothing + // cannot reference an `execute`-only type (Slice/Epic/Plan/Toolchain/worktree…) + // or an SDK type. That makes neutrality structural rather than a denylist of + // names we have to remember to update. 
+ expect(src).not.toMatch(/^\s*import[\s{*]/m); + }); + + it('a consumer witness only loads plugins of its own mode (per-mode registration)', () => { + for (const witness of [cookWitness, interviewWitness]) { + for (const plugin of witness.plugins) { + expect(plugin.mode).toBe(witness.mode); + } + } + }); +}); + +describe('two-consumer proof — both real surfaces fit the host contract', () => { + it('the cook execute surface matches the registered capabilities exactly', () => { + const registered = new Set(flattenCapabilityIds(cookWitness)); + const actual = new Set(Object.keys(createPiActions())); + expect(registered).toEqual(actual); + }); + + it('the interview exploration plugin matches the real tool surface exactly', () => { + // `createExplorationTools` is DB-free, so this family is proven bidirectionally + // against live code: the witness may neither omit a real tool nor invent a + // phantom one. The three native interviewer tools (ask_question / + // present_preface / propose_phase_closure) can't be checked this way — + // constructing them needs a live DB — so their coverage is type-level only + // (the `keyof InterviewerTools` assertion below), which is superset-only: it + // proves the witness omits no real tool, not that it invents none. + const explorationPlugin = interviewWitness.plugins.find((p) => p.id === 'elicit.workspace-exploration'); + const witnessed = new Set(explorationPlugin?.capabilities.map((c) => c.id)); + const actual = new Set(Object.keys(createExplorationTools(here))); + expect(witnessed).toEqual(actual); + }); + + it('the interview witness covers every interviewer tool id (type-enforced under lint --type-check)', () => { + type ElicitCapabilityId = (typeof interviewWitness.plugins)[number]['capabilities'][number]['id']; + // If the interview adds a tool not represented in the witness, `Covered` + // becomes `false` and this assignment fails the type-aware lint gate. + type Covered = keyof InterviewerTools extends ElicitCapabilityId ? 
true : false; + const covered: Covered = true; + expect(covered).toBe(true); + }); +}); diff --git a/src/agent-extension-host.ts b/src/agent-extension-host.ts new file mode 100644 index 000000000..cb826c246 --- /dev/null +++ b/src/agent-extension-host.ts @@ -0,0 +1,58 @@ +// Agent extension host — the mode-neutral contract (FE-867). +// +// The pi harness is reused across two jobs: driving specification (`elicit`) +// and driving cook (`execute`). Rather than two harnesses, treat it as one +// dual-mode *agent-extension host*: a mode-agnostic core that consumers extend +// by registering capabilities as per-mode plugins. Modes differ only by which +// plugins they load. +// +// This module is the serialization point with the parallel pi-harness work that +// owns the core *implementation*. It deliberately defines only transport-safe +// contract metadata — no session lifecycle, no stream/dispatch runtime, no SDK +// types — so it stays neutral across both consumers (cook via the pi SDK, the +// interview via the Vercel AI SDK) and across whichever runtime lands later. +// +// Invariant (checkable): this file has no imports and names no `execute`-only +// concept (slice / epic / plan / worktree / test-runner / toolchain). If it did, +// it would no longer be a mode-neutral core. See agent-extension-host.test.ts. + +/** The two ways the shared agent-extension host is driven. */ +export type AgentExtensionMode = 'elicit' | 'execute'; + +/** + * Transport-safe descriptor of one capability a consumer registers against the + * host. Mirrors `capability-registry.ts`: metadata only — the executable handler + * lives behind the host's dispatch, so this contract never owns runtime semantics. + */ +export interface AgentExtensionCapabilityContract { + id: string; + summary: string; + handler: null; +} + +/** + * A plugin is the unit of per-mode registration: a named bundle of capabilities + * loaded into one mode. 
"Modes differ only by which plugins they load" is exactly + * this — `execute` loads the cook plugins, `elicit` loads the interview plugins. + */ +export interface AgentExtensionPluginContract { + id: string; + mode: AgentExtensionMode; + capabilities: readonly AgentExtensionCapabilityContract[]; +} + +/** + * A consumer (e.g. cook, the interview) described as the set of plugins it loads + * into a single mode. Used to prove a real consumer fits the host contract + * without migrating its runtime — the "witness" of mode-neutrality. + */ +export interface AgentExtensionConsumerWitness { + consumerId: string; + mode: AgentExtensionMode; + plugins: readonly AgentExtensionPluginContract[]; +} + +/** Enumerate the capability ids a consumer registers — the host's dispatch keys. */ +export function flattenCapabilityIds(witness: AgentExtensionConsumerWitness): string[] { + return witness.plugins.flatMap((plugin) => plugin.capabilities.map((capability) => capability.id)); +} diff --git a/src/orchestrator/src/app-probe.test.ts b/src/orchestrator/src/app-probe.test.ts new file mode 100644 index 000000000..8e4a76ead --- /dev/null +++ b/src/orchestrator/src/app-probe.test.ts @@ -0,0 +1,170 @@ +// The probe boots a *real* app process in a tmp worktree and exercises it over +// the wire — no mocks — so these tests pin the actual boot/ready/probe/teardown +// behavior the orphan check depends on. Apps are zero-dep `node:http` scripts. 
+ +import { mkdtempSync, rmSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; + +import { afterEach, describe, expect, it } from 'vitest'; + +import { buildProbeSpec, runProbe } from './app-probe.js'; +import type { ProbeSpec } from './types.js'; + +const dirs: string[] = []; + +afterEach(() => { + for (const dir of dirs.splice(0)) rmSync(dir, { recursive: true, force: true }); +}); + +function sandbox(serverSource: string): string { + const dir = mkdtempSync(join(tmpdir(), 'app-probe-')); + dirs.push(dir); + writeFileSync(join(dir, 'server.js'), serverSource); + return dir; +} + +/** An app that answers `routes` (path → status); everything else is 404. */ +const appServing = (routes: Record<string, number>): string => + `const http = require('node:http');\n` + + `const routes = ${JSON.stringify(routes)};\n` + + `http.createServer((req, res) => {\n` + + ` const status = routes[req.url] ?? 404;\n` + + ` res.writeHead(status); res.end(String(status));\n` + + `}).listen(Number(process.env.PORT), '127.0.0.1');\n`; + +// Dogfoods the harness-owned spec builder: the test supplies only argv + paths, +// `buildProbeSpec` allocates the port and assembles the URLs the app boots on. 
+async function specFor(routes: Record<string, number>): Promise<{ spec: ProbeSpec; dir: string }> { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + return { dir: sandbox(appServing(routes)), spec }; +} + +describe('runProbe classifies real app reachability', () => { + it('an app whose feature endpoint answers 2xx → reachable', async () => { + const { spec, dir } = await specFor({ '/health': 200, '/feature': 200 }); + const result = await runProbe(spec, dir); + expect(result.kind).toBe('reachable'); + expect(result.reachable).toBe(true); + expect(result.status).toBe(200); + }); + + it('an app that boots but 404s the feature endpoint → not-reachable (the orphan)', async () => { + // Feature module present-but-unwired replays as: server up, route absent. + const { spec, dir } = await specFor({ '/health': 200 }); + const result = await runProbe(spec, dir); + expect(result.kind).toBe('not-reachable'); + expect(result.reachable).toBe(false); + expect(result.status).toBe(404); + }); + + it('a boot command that exits immediately → infra (distinct from not-reachable)', async () => { + const dir = sandbox('process.exit(1);\n'); + const result = await runProbe( + { boot: ['node', 'server.js'], readyUrl: 'http://127.0.0.1:1/x', featureUrl: 'http://127.0.0.1:1/x' }, + dir, + ); + expect(result.kind).toBe('infra'); + expect(result.reachable).toBe(false); + }); + + it('a missing boot binary → infra, not a crash', async () => { + const dir = sandbox(appServing({ '/health': 200 })); + const started = Date.now(); + const result = await runProbe( + { + boot: ['definitely-not-a-real-binary-xyz'], + readyUrl: 'http://127.0.0.1:1/x', + featureUrl: 'http://127.0.0.1:1/x', + }, + dir, + ); + expect(result.kind).toBe('infra'); + expect(Date.now() - started).toBeLessThan(1_000); + }); +}); + +describe('runProbe bounds its HTTP calls so a hung app cannot hang the probe', () => { + // A server that accepts connections (and the 
HTTP request) but never sends a + // response — the case the wall-clock deadline alone can't catch, because a + // bare `await fetch` would block forever between deadline checks. + const neverResponds = (readyRoutes: Record<string, number> = {}): string => + `const http = require('node:http');\n` + + `const ready = ${JSON.stringify(readyRoutes)};\n` + + `http.createServer((req, res) => {\n` + + ` if (ready[req.url] !== undefined) { res.writeHead(ready[req.url]); res.end('ok'); return; }\n` + + ` /* otherwise: never respond */\n` + + `}).listen(Number(process.env.PORT), '127.0.0.1');\n`; + + it('a ready path that accepts connections but never responds → infra within the deadline', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + const dir = sandbox(neverResponds()); + const started = Date.now(); + const result = await runProbe(spec, dir, { readyTimeoutMs: 600, readyAttemptMs: 2_000 }); + expect(result.kind).toBe('infra'); + expect(Date.now() - started).toBeLessThan(1_200); + }); + + it('a booted app whose feature endpoint never responds → infra, not a hang', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + const dir = sandbox(neverResponds({ '/health': 200 })); + const result = await runProbe(spec, dir, { requestTimeoutMs: 300 }); + expect(result.kind).toBe('infra'); + expect(result.output).toMatch(/feature probe request failed/); + }); +}); + +describe('runProbe tears the boot process down', () => { + it('the booted app is no longer listening after the probe returns', async () => { + const { spec, dir } = await specFor({ '/health': 200, '/feature': 200 }); + await runProbe(spec, dir); + // The port the app bound should be free again — nothing left listening. 
+ await expect(fetch(spec.featureUrl)).rejects.toThrow(); + }); +}); + +describe('buildProbeSpec resolves a target into a runnable spec', () => { + it('allocates a port and assembles ready/feature URLs from the paths', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + const port = Number(spec.env?.PORT); + expect(port).toBeGreaterThan(0); + expect(spec.readyUrl).toBe(`http://127.0.0.1:${port}/health`); + expect(spec.featureUrl).toBe(`http://127.0.0.1:${port}/feature`); + expect(spec.boot).toEqual(['node', 'server.js']); + }); + + it('layers caller env under the allocated PORT so PORT always wins', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/', + featurePath: '/', + env: { NODE_ENV: 'test', PORT: '1' }, + }); + expect(spec.env?.NODE_ENV).toBe('test'); + expect(Number(spec.env?.PORT)).toBeGreaterThan(1); + }); + + it('hands out distinct ports across concurrent allocations', async () => { + const specs = await Promise.all( + Array.from({ length: 8 }, () => buildProbeSpec({ boot: ['x'], readyPath: '/', featurePath: '/' })), + ); + const ports = specs.map((s) => Number(s.env?.PORT)); + expect(new Set(ports).size).toBe(ports.length); + }); +}); diff --git a/src/orchestrator/src/app-probe.ts b/src/orchestrator/src/app-probe.ts new file mode 100644 index 000000000..1ad4b733c --- /dev/null +++ b/src/orchestrator/src/app-probe.ts @@ -0,0 +1,193 @@ +// App runtime probe (FE-875): build/boot the host app from the cook worktree and +// exercise one feature endpoint over the wire, returning a structured, read-only +// verdict the cook agent cannot self-report. This is the *app-execution* analogue +// of `test-runner.ts`'s test execution — the reachability mechanism behind +// `integration-oracle`. +// +// Boundary (anti-overengineering): the value is the deterministic, unshortcuttable +// *check* of the result — not the boot action. 
The boot argv + URLs are inputs +// (`ProbeSpec`), so the boot mechanics may lean on the agent's `bash` rather than a +// bespoke per-stack boot engine. The same discipline keeps `evaluate-done` +// read-only (`pi-actions.ts`). + +import { type ChildProcess, spawn } from 'node:child_process'; +import { createServer } from 'node:net'; + +import type { ProbeResult, ProbeSpec, ProbeTarget } from './types.js'; + +const READY_TIMEOUT_MS = 10_000; +const READY_POLL_MS = 150; +const READY_ATTEMPT_MS = 2_000; +const REQUEST_TIMEOUT_MS = 5_000; +const TEARDOWN_GRACE_MS = 2_000; +const DEFAULT_HOST = '127.0.0.1'; + +/** + * Per-call timeouts so the probe can never hang on a server that accepts a + * connection but never responds. Overridable (tests use small values); each + * defaults to the module constant. + */ +export type ProbeTimeouts = { + /** Overall deadline for the app to become ready (default READY_TIMEOUT_MS). */ + readyTimeoutMs?: number; + /** Timeout for a single readiness poll (default READY_ATTEMPT_MS). */ + readyAttemptMs?: number; + /** Timeout for the feature-probe request (default REQUEST_TIMEOUT_MS). */ + requestTimeoutMs?: number; +}; + +const delay = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms)); + +/** + * Resolve a `ProbeTarget` (boot argv + paths) into a runnable `ProbeSpec` by + * picking a port and assembling the ready/feature URLs. URL/env assembly is the + * harness-owned piece: the boot argv + paths come from cook-time grounding, but + * the port must not be hardcoded — under parallel cook each slice boots its own + * app and a fixed port would collide. The allocated `PORT` is exposed to the + * boot process via env (the near-universal convention); caller env is layered + * first so `PORT` always wins. Always loopback — non-loopback bind-host + * semantics aren't owned here and aren't needed for the reachability check.
+ */ +export async function buildProbeSpec(target: ProbeTarget): Promise<ProbeSpec> { + const port = await allocatePort(); + const base = `http://${DEFAULT_HOST}:${port}`; + return { + boot: target.boot, + readyUrl: `${base}${target.readyPath}`, + featureUrl: `${base}${target.featurePath}`, + env: { ...target.env, PORT: String(port) }, + }; +} + +/** + * Best-effort free ephemeral port. Bind :0, read the assigned port, release it. + * There is an inherent release-then-claim window (TOCTOU): another process could + * grab the port before the boot child binds it. On loopback with OS-assigned + * ephemeral ports this is rare and acceptable for this harness — if it ever + * causes real flake, the booted app's actual bound port becomes the source of + * truth (a later frontier), not a retry loop here. + */ +function allocatePort(): Promise<number> { + return new Promise((resolve, reject) => { + const srv = createServer(); + srv.once('error', reject); + srv.listen(0, DEFAULT_HOST, () => { + const addr = srv.address(); + const port = typeof addr === 'object' && addr ? addr.port : 0; + srv.close(() => resolve(port)); + }); + }); +} + +/** + * Boot the app, wait until it accepts connections, probe the feature endpoint, + * classify the outcome, and always tear the boot process down. The feature is + * `reachable` when it answers < 400 (wired into the running app), `not-reachable` + * when the app is up but the endpoint is absent/erroring (the orphan), and + * `infra` when the app never came up or the probe request itself failed. + */ +export async function runProbe( + spec: ProbeSpec, + sandboxDir: string, + timeouts: ProbeTimeouts = {}, +): Promise<ProbeResult> { + const readyTimeoutMs = timeouts.readyTimeoutMs ?? READY_TIMEOUT_MS; + const readyAttemptMs = timeouts.readyAttemptMs ?? READY_ATTEMPT_MS; + const requestTimeoutMs = timeouts.requestTimeoutMs ??
REQUEST_TIMEOUT_MS; + const [command, ...args] = spec.boot; + const child = spawn(command!, args, { + cwd: sandboxDir, + env: { ...process.env, ...spec.env }, + stdio: ['ignore', 'pipe', 'pipe'], + }); + + const chunks: string[] = []; + child.stdout?.on('data', (c: Buffer) => chunks.push(c.toString('utf8'))); + child.stderr?.on('data', (c: Buffer) => chunks.push(c.toString('utf8'))); + // A spawn error (ENOENT) means the binary never started — bail immediately + // rather than polling for the full readiness timeout. + let bootError = ''; + let spawnFailed = false; + child.on('error', (err) => { + spawnFailed = true; + bootError = String(err); + }); + const output = (): string => [bootError, chunks.join('')].filter(Boolean).join('\n'); + + try { + const ready = await waitForReady( + spec.readyUrl, + () => hasExited(child) || bootError !== '', + readyTimeoutMs, + readyAttemptMs, + ); + if (!ready) { + const why = + bootError !== '' + ? 'boot process failed to start' + : hasExited(child) + ? 'boot process exited before becoming ready' + : 'boot did not become ready within timeout'; + return { kind: 'infra', reachable: false, output: `${why}\n${output()}` }; + } + + let status: number; + try { + status = (await fetch(spec.featureUrl, { signal: AbortSignal.timeout(requestTimeoutMs) })).status; + } catch (err) { + return { + kind: 'infra', + reachable: false, + output: `feature probe request failed: ${String(err)}\n${output()}`, + }; + } + + if (status < 400) return { kind: 'reachable', reachable: true, status, output: output() }; + // Booted but the endpoint is absent (404) or erroring — the feature is not + // wired into the running app. `status` is carried so callers see the detail. 
+ return { kind: 'not-reachable', reachable: false, status, output: output() }; + } finally { + await teardown(child, () => spawnFailed); + } +} + +function hasExited(child: ChildProcess): boolean { + return child.exitCode !== null || child.signalCode !== null; +} + +/** + * Poll until the app answers any HTTP response, boot gives up, or we time out. + * Each poll carries its own `attemptMs` timeout (`AbortSignal.timeout`) so a + * connection that is accepted but never answered aborts the attempt instead of + * blocking forever — otherwise the wall-clock `deadline` (only checked between + * attempts) would never be reached. + */ +async function waitForReady( + url: string, + bootGaveUp: () => boolean, + timeoutMs: number, + attemptMs: number, +): Promise<boolean> { + const deadline = Date.now() + timeoutMs; + while (Date.now() < deadline) { + if (bootGaveUp()) return false; + const remainingMs = deadline - Date.now(); + try { + // Any HTTP response (even 404) means the server is accepting connections. + await fetch(url, { signal: AbortSignal.timeout(Math.min(attemptMs, remainingMs)) }); + return true; + } catch { + await delay(Math.min(READY_POLL_MS, Math.max(0, deadline - Date.now()))); + } + } + return false; +} + +/** SIGTERM, then SIGKILL if it doesn't exit — never leave an orphaned boot.
*/ +async function teardown(child: ChildProcess, spawnFailed: () => boolean): Promise { + if (spawnFailed() || hasExited(child)) return; + const exited = new Promise((resolve) => child.once('exit', () => resolve())); + child.kill('SIGTERM'); + const died = await Promise.race([exited.then(() => true), delay(TEARDOWN_GRACE_MS).then(() => false)]); + if (!died) child.kill('SIGKILL'); +} diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index 98d5eb040..585dfe4a6 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -2,6 +2,7 @@ import { spawnSync } from 'node:child_process'; import { existsSync } from 'node:fs'; import { join, resolve } from 'node:path'; +import { cookBannerLines, cookSummaryLines } from './cook-report.js'; import { createOrchestrator } from './engine.js'; import { type MergeConflict, mergeCompletedSlicesIntoTree } from './epic-sandbox-merge.js'; import { FileReportSink } from './file-report-sink.js'; @@ -12,8 +13,9 @@ import { createPetrinautStreamBus, type PetrinautStreamBus } from './petrinaut-s import { createPetrinautStreamServer, type PetrinautStreamServer } from './petrinaut-stream-server.js'; import { createPiActions } from './pi-actions.js'; import { loadPlan } from './plan-loader.js'; +import type { CookBus } from './presenter.js'; import { resolveToolchain } from './project-profile.js'; -import { promoteGreenfieldRun } from './promote-run.js'; +import { promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; import { parseSpecId, resolveLatestSpecPlanPath, specPlanPath, specsRootDir } from './spec-plan-paths.js'; import { ToolchainTestRunner } from './test-runner.js'; import type { Plan, PlanMode } from './types.js'; @@ -401,7 +403,16 @@ function isCleanGitWorkingTree(dir: string): GitWorkingTreeCheck { return { kind: 'dirty', status }; } -export async function runCook(opts: CookOptions): Promise { +export async function runCook(opts: CookOptions, bus: CookBus): 
Promise { + const line = (text: string) => bus.emit({ kind: 'line', text }); + const promoting = (label: string, fn: () => T): T => { + bus.emit({ kind: 'activity-start', id: 'promote', label }); + try { + return fn(); + } finally { + bus.emit({ kind: 'activity-end', id: 'promote' }); + } + }; const launchCwd = process.env.BRUNCH_LAUNCH_CWD || process.cwd(); // Streaming pre-flight happens before any cook side effect (banner, plan @@ -416,7 +427,7 @@ export async function runCook(opts: CookOptions): Promise { env: { PETRINAUT_URL: process.env.PETRINAUT_URL }, }); if ('error' in resolvedUrl) { - console.error(resolvedUrl.error); + line(resolvedUrl.error); process.exit(1); } petrinautUrl = resolvedUrl.url; @@ -425,7 +436,7 @@ export async function runCook(opts: CookOptions): Promise { const resolved = resolveCookPlan(opts.dir, opts.specId); if (resolved.kind === 'error') { - console.error(resolved.message); + line(resolved.message); process.exit(1); } @@ -434,7 +445,7 @@ export async function runCook(opts: CookOptions): Promise { // Worktree strategy follows the plan's spec-derived mode, not its location. 
const sandbox = resolveSandboxPlan(plan.mode, resolved.sourceDir); if (sandbox.kind === 'error') { - console.error(sandbox.message); + line(sandbox.message); process.exit(1); } @@ -451,15 +462,16 @@ export async function runCook(opts: CookOptions): Promise { const epicCount = plan.epics.length; const sliceCount = plan.slices.length; - console.error(''); - console.error(` brunch cook`); - console.error(` ──────────────────────────────────────`); - console.error(` policy ${opts.policy}`); - console.error(` plan ${epicCount} epics, ${sliceCount} slices`); - console.error(` retries ${opts.maxRetries}`); - console.error(` sandbox ${sandboxDir}`); - console.error(` reports ${reportsPath}`); - console.error(''); + for (const l of cookBannerLines({ + policy: opts.policy, + epicCount, + sliceCount, + maxRetries: opts.maxRetries, + sandboxDir, + reportsPath, + })) { + line(l); + } const reports = new FileReportSink(reportsPath); const toolchain = resolveToolchain(plan.profile); @@ -468,7 +480,15 @@ export async function runCook(opts: CookOptions): Promise { const engine = createOrchestrator(opts.policy); const runStart = Date.now(); - const actions = createPiActions({ verbose: opts.verbose, runStart, toolchain }); + // Seed the presenter's elapsed clock; per-action progress carries no + // pre-formatted timing — the presenter owns it (I136-K). + bus.emit({ kind: 'cook-start', runStart }); + const actions = createPiActions({ + verbose: opts.verbose, + emit: (event) => bus.emit(event), + toolchain, + testRunner, + }); // Stand up the live-stream setup handle when streaming is enabled. // Auto-open is suppressed by `--no-petrinaut-open` or CI. @@ -478,6 +498,7 @@ export async function runCook(opts: CookOptions): Promise { petrinautUrl, shouldOpen: opts.petrinautOpen && !process.env.CI, openUrl: defaultOpenUrl, + log: (text) => line(text), ...(streamPort !== undefined ? 
{ port: streamPort } : {}), }) : undefined; @@ -504,41 +525,31 @@ export async function runCook(opts: CookOptions): Promise { const duration = fmtDuration(Date.now() - runStart); const ok = result.status === 'completed'; - console.error(''); - console.error(` ──────────────────────────────────────`); - console.error( - ` ${ok ? '✓' : '✗'} ${result.status}${result.reason ? ` — ${result.reason}` : ''} (${duration})`, - ); - for (const warning of result.warnings) { - console.error(` ! ${warning}`); - } - console.error(''); - - for (const e of result.epics) { - const icon = e.status === 'completed' ? '✓' : '✗'; - const slices = result.slices.filter( - (s) => plan.slices.find((ps) => ps.id === s.sliceId)?.epic_id === e.epicId, - ); - const sliceSummary = slices - .map((s) => `${s.status === 'completed' ? '✓' : '✗'} ${s.sliceId}`) - .join(' '); - console.error(` ${icon} ${e.epicId}`); - console.error(` ${sliceSummary}`); + for (const l of cookSummaryLines({ + status: result.status, + ...(result.reason ? { reason: result.reason } : {}), + duration, + warnings: result.warnings, + epics: result.epics, + slices: result.slices, + planSlices: plan.slices, + reportCount: result.reports.length, + reportsPath, + })) { + line(l); } - console.error(''); - console.error(` ${result.reports.length} events → ${reportsPath}`); - console.error(''); - - // Promotion-back is opt-in via --out and greenfield-only; a run that did - // not complete promotes nothing (the artifact stays inspectable). - if (opts.outDir) { - if (sandbox.kind === 'codebase') { - console.error(` ! --out promotion is greenfield-only; brownfield output stays at ${sandboxDir}`); - console.error(''); - } else if (!ok) { - console.error(` ! run did not complete — nothing promoted. Artifact: ${sandboxDir}`); - console.error(''); + // Brownfield promotion is automatic (the result already lives on the repo's + // own `cook/` branch); greenfield promotion is opt-in via --out. 
A run + // that did not complete promotes nothing — the artifact stays inspectable. + if (sandbox.kind === 'codebase') { + if (opts.outDir) { + line(` ! --out is ignored for brownfield; the result lands on cook/${runId} in the repo`); + line(''); + } + if (!ok) { + line(` ! run did not complete — nothing promoted. Artifact: ${sandboxDir}`); + line(''); } else { try { const source = promotionSourceDir({ @@ -549,23 +560,55 @@ export async function runCook(opts: CookOptions): Promise { completedSliceIds: result.slices.filter((s) => s.status === 'completed').map((s) => s.sliceId), }); for (const c of source.conflicts) { - console.error( - ` ! merge conflict on ${c.path} (slices ${c.slices.join(', ')}; kept ${c.winner})`, - ); + line(` ! merge conflict on ${c.path} (slices ${c.slices.join(', ')}; kept ${c.winner})`); } - const promoted = promoteGreenfieldRun({ - sandboxDir: source.dir, - target: opts.outDir, - runId, - force: opts.force, + const promoted = promoting(`promoting → cook/${runId}`, () => + promoteBrownfieldRun({ + sourceDir: sandbox.sourceDir, + sourceTreeDir: source.dir, + runId, + }), + ); + line( + ` ✓ promoted → ${promoted.branch} @ ${promoted.commit.slice(0, 8)} (merge it into your branch when ready)`, + ); + line(''); + } catch (err) { + line(` ✗ promotion failed: ${err instanceof Error ? err.message : String(err)}`); + line(''); + recordCookExitStatus(false); + return; + } + } + } else if (opts.outDir) { + if (!ok) { + line(` ! run did not complete — nothing promoted. Artifact: ${sandboxDir}`); + line(''); + } else { + try { + const source = promotionSourceDir({ + sliceLayout, + sandboxDir, + runDir, + plan, + completedSliceIds: result.slices.filter((s) => s.status === 'completed').map((s) => s.sliceId), }); - console.error( - ` ✓ promoted → ${promoted.target} (${promoted.branch} @ ${promoted.commit.slice(0, 8)})`, + for (const c of source.conflicts) { + line(` ! 
merge conflict on ${c.path} (slices ${c.slices.join(', ')}; kept ${c.winner})`); + } + const promoted = promoting(`promoting → ${opts.outDir}`, () => + promoteGreenfieldRun({ + sandboxDir: source.dir, + target: opts.outDir!, + runId, + force: opts.force, + }), ); - console.error(''); + line(` ✓ promoted → ${promoted.target} (${promoted.branch} @ ${promoted.commit.slice(0, 8)})`); + line(''); } catch (err) { - console.error(` ✗ promotion failed: ${err instanceof Error ? err.message : String(err)}`); - console.error(''); + line(` ✗ promotion failed: ${err instanceof Error ? err.message : String(err)}`); + line(''); recordCookExitStatus(false); return; } diff --git a/src/orchestrator/src/cook-report.test.ts b/src/orchestrator/src/cook-report.test.ts new file mode 100644 index 000000000..0eaf79848 --- /dev/null +++ b/src/orchestrator/src/cook-report.test.ts @@ -0,0 +1,94 @@ +import { describe, expect, it } from 'vitest'; + +import { cookBannerLines, cookSummaryLines } from './cook-report.js'; + +describe('cookBannerLines', () => { + it('renders the cook banner block byte-for-byte', () => { + expect( + cookBannerLines({ + policy: 'serial', + epicCount: 2, + sliceCount: 5, + maxRetries: 3, + sandboxDir: '/runs/abc/worktree', + reportsPath: '/runs/abc/reports.jsonl', + }), + ).toEqual([ + '', + ' brunch cook', + ' ──────────────────────────────────────', + ' policy serial', + ' plan 2 epics, 5 slices', + ' retries 3', + ' sandbox /runs/abc/worktree', + ' reports /runs/abc/reports.jsonl', + '', + ]); + }); +}); + +describe('cookSummaryLines', () => { + it('renders a completed run with its epic/slice tree and event count', () => { + expect( + cookSummaryLines({ + status: 'completed', + duration: '1m02s', + warnings: [], + epics: [{ epicId: 'api', status: 'completed' }], + slices: [ + { sliceId: 'login', status: 'completed' }, + { sliceId: 'logout', status: 'completed' }, + ], + planSlices: [ + { id: 'login', epic_id: 'api' }, + { id: 'logout', epic_id: 'api' }, + ], + 
reportCount: 12, + reportsPath: '/runs/abc/reports.jsonl', + }), + ).toEqual([ + '', + ' ──────────────────────────────────────', + ' ✓ completed (1m02s)', + '', + ' ✓ api', + ' ✓ login ✓ logout', + '', + ' 12 events → /runs/abc/reports.jsonl', + '', + ]); + }); + + it('renders a halted run with its reason and warnings, and per-epic/slice failure marks', () => { + expect( + cookSummaryLines({ + status: 'halted', + reason: 'budget exhausted', + duration: '8.4s', + warnings: ['retry budget hit on login'], + epics: [{ epicId: 'api', status: 'halted' }], + slices: [ + { sliceId: 'login', status: 'failed' }, + { sliceId: 'logout', status: 'completed' }, + ], + planSlices: [ + { id: 'login', epic_id: 'api' }, + { id: 'logout', epic_id: 'api' }, + ], + reportCount: 7, + reportsPath: '/r.jsonl', + }), + ).toEqual([ + '', + ' ──────────────────────────────────────', + ' ✗ halted — budget exhausted (8.4s)', + ' ! retry budget hit on login', + '', + ' ✗ api', + ' ✗ login ✓ logout', + '', + ' 7 events → /r.jsonl', + '', + ]); + }); +}); diff --git a/src/orchestrator/src/cook-report.ts b/src/orchestrator/src/cook-report.ts new file mode 100644 index 000000000..c4fd9eb76 --- /dev/null +++ b/src/orchestrator/src/cook-report.ts @@ -0,0 +1,63 @@ +// Pure line-builders for `brunch cook`'s banner and completion summary. +// +// Extracted from runCook so the exact text is golden-testable without booting +// the engine (ln-review #3 — the strings had no oracle). runCook feeds these +// lines to the presentation bus. 
+ +export type CookBannerInput = { + policy: string; + epicCount: number; + sliceCount: number; + maxRetries: number; + sandboxDir: string; + reportsPath: string; +}; + +export function cookBannerLines(input: CookBannerInput): string[] { + return [ + '', + ' brunch cook', + ' ──────────────────────────────────────', + ` policy ${input.policy}`, + ` plan ${input.epicCount} epics, ${input.sliceCount} slices`, + ` retries ${input.maxRetries}`, + ` sandbox ${input.sandboxDir}`, + ` reports ${input.reportsPath}`, + '', + ]; +} + +export type CookSummaryInput = { + status: string; + reason?: string; + duration: string; + warnings: string[]; + epics: { epicId: string; status: string }[]; + slices: { sliceId: string; status: string }[]; + planSlices: { id: string; epic_id: string }[]; + reportCount: number; + reportsPath: string; +}; + +export function cookSummaryLines(input: CookSummaryInput): string[] { + const ok = input.status === 'completed'; + const lines: string[] = [ + '', + ' ──────────────────────────────────────', + ` ${ok ? '✓' : '✗'} ${input.status}${input.reason ? ` — ${input.reason}` : ''} (${input.duration})`, + ]; + for (const warning of input.warnings) lines.push(` ! ${warning}`); + lines.push(''); + + for (const epic of input.epics) { + const icon = epic.status === 'completed' ? '✓' : '✗'; + const sliceSummary = input.slices + .filter((s) => input.planSlices.find((ps) => ps.id === s.sliceId)?.epic_id === epic.epicId) + .map((s) => `${s.status === 'completed' ? 
'✓' : '✗'} ${s.sliceId}`) + .join(' '); + lines.push(` ${icon} ${epic.epicId}`, ` ${sliceSummary}`); + } + + lines.push('', ` ${input.reportCount} events → ${input.reportsPath}`, ''); + return lines; +} diff --git a/src/orchestrator/src/cow-copy.ts b/src/orchestrator/src/cow-copy.ts index bbd90104d..5b19b22b2 100644 --- a/src/orchestrator/src/cow-copy.ts +++ b/src/orchestrator/src/cow-copy.ts @@ -1,5 +1,5 @@ import { spawnSync } from 'node:child_process'; -import { cpSync, existsSync, readdirSync } from 'node:fs'; +import { cpSync, existsSync, readdirSync, symlinkSync } from 'node:fs'; import { join, resolve } from 'node:path'; /** @@ -23,16 +23,24 @@ export function cowCopy(src: string, dest: string): void { /** Top-level names skipped when CoW-copying into cook sandboxes. */ export const COW_COPY_DEFAULT_EXCLUDE = new Set(['.git', '.brunch']); +const NO_SYMLINKS: ReadonlySet = new Set(); + /** - * CoW-copy top-level entries from `sourceDir` that are absent in `destDir` + * Provision top-level entries from `sourceDir` that are absent in `destDir` * (untracked/gitignored dirs like `node_modules/`, `dist/`). Skips names in * `exclude` and entries already present in the destination (typically tracked * files materialized by `git worktree add`). + * + * Names in `symlink` are linked to the source entry instead of copied — used to + * share a single read-only `node_modules/` across slice sandboxes rather than + * paying a CoW copy per slice. Everything else is CoW-copied (lazy on APFS / + * reflink filesystems, deep copy otherwise). 
*/ export function copyMissingTopLevelEntries( sourceDir: string, destDir: string, exclude: ReadonlySet = COW_COPY_DEFAULT_EXCLUDE, + symlink: ReadonlySet = NO_SYMLINKS, ): void { const source = resolve(sourceDir); const dest = resolve(destDir); @@ -40,6 +48,11 @@ export function copyMissingTopLevelEntries( if (exclude.has(entry)) continue; const destPath = join(dest, entry); if (existsSync(destPath)) continue; - cowCopy(join(source, entry), destPath); + const sourcePath = join(source, entry); + if (symlink.has(entry)) { + symlinkSync(sourcePath, destPath); + } else { + cowCopy(sourcePath, destPath); + } } } diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts index a801b7e28..5dd81ad96 100644 --- a/src/orchestrator/src/engine-contract.test.ts +++ b/src/orchestrator/src/engine-contract.test.ts @@ -36,6 +36,7 @@ const engines = [ function createFakes(opts?: { evalSequence?: boolean[]; // sequence of done values for evaluate-done testRunResults?: boolean[]; // sequence of passed values for test runner + testFailureKind?: 'infra' | 'test'; // failureKind stamped on failed runs (default: test) verifyEpicResult?: boolean; // result of verify-epic semanticResults?: boolean[]; // sequence of satisfied values for assess-semantic throwOnAction?: string; // action name that throws @@ -136,7 +137,8 @@ function createFakes(opts?: { const passed = testSeq[testRunIdx % testSeq.length]!; testRunIdx++; callOrder.push(`run-tests:${passed ? 'pass' : 'fail'}`); - return { passed, output: passed ? 'ok' : 'FAIL' }; + if (passed) return { passed, output: 'ok' }; + return { passed, output: 'FAIL', failureKind: opts?.testFailureKind ?? 'test' }; }, }; @@ -924,6 +926,36 @@ describe('Adapter: §7 event vocabulary', () => { // the halt token deposited on the `:halted` place. 
expect(halted[0]!.reason).toMatch(/retry exhaustion/); }); + + it('infra failure names the toolchain cause in the halt reason', async () => { + // FE-872: an exhausted run whose verification hit infra/toolchain failure + // must not read as "retry exhaustion" — that misdirects the reader to the + // code instead of the runner/toolchain. + const fakes = createFakes({ testRunResults: [false], testFailureKind: 'infra' }); + const ctx: RunCtx = { + reportIds: [], + sliceOutcomes: new Map(), + epicOutcomes: new Map(), + }; + const input: OrchestratorInput = { + plan: simplePlan, + sandboxDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 1 }, + }; + + const net = compilePlan(input, ctx); + const events: NetEvent[] = []; + await net.run('serial', () => net.hasHaltToken(), { emit: (e) => events.push(e) }); + + const halted = events.filter((e) => e.kind === 'net_halted'); + expect(halted.length).toBe(1); + expect(halted[0]!.reason).toMatch(/toolchain\/install failure/); + expect(halted[0]!.reason).not.toMatch(/never ran/); + expect(halted[0]!.reason).not.toMatch(/retry exhaustion/); + }); }); // --------------------------------------------------------------------------- @@ -1458,9 +1490,14 @@ describe('Engine contract test #12 — parallel fires concurrently', () => { const parallelMs = Date.now() - t1; // Parallel should be no slower than serial (they're effectively equal - // now that async dispatch lets handlers overlap in both policies). - // Allow a small constant slack for scheduling jitter. - expect(parallelMs).toBeLessThan(serialMs + 25); + // now that async dispatch lets handlers overlap in both policies). The + // tolerance scales with serialMs because both runs absorb scheduling jitter + // and CPU contention from concurrent test files (real-process suites elsewhere + // can starve the event loop); an absolute slack flakes under that load. 
A true + // regression — parallel policy serializing its handlers — would be many × + // serialMs (the plan fans out ~15 async handlers at DELAY_MS each), well past + // this bound. + expect(parallelMs).toBeLessThan(serialMs * 2 + 50); }); }); diff --git a/src/orchestrator/src/epic-sandbox-merge.test.ts b/src/orchestrator/src/epic-sandbox-merge.test.ts index 14cdb91ca..cae35ac12 100644 --- a/src/orchestrator/src/epic-sandbox-merge.test.ts +++ b/src/orchestrator/src/epic-sandbox-merge.test.ts @@ -1,9 +1,11 @@ import { execFileSync } from 'node:child_process'; import { existsSync, + lstatSync, mkdirSync, mkdtempSync, readFileSync, + readlinkSync, rmSync, symlinkSync, writeFileSync, @@ -14,6 +16,7 @@ import { dirname, join } from 'node:path'; import { afterEach, describe, expect, it } from 'vitest'; import { + ensureSliceWorktree, epicIdsForEpicVerifyMerge, mergeCompletedSlicesIntoTree, mergeSlicesIntoEpicSandbox, @@ -274,19 +277,31 @@ describe('seedSliceFromParentWorktree', () => { expect(readFileSync(join(sliceDir, 'src/a.ts'), 'utf8')).toBe('export const a = 1;\n'); }); - it('untracked content arrives via CoW copy from the parent', () => { + it('untracked content (other than node_modules) arrives via CoW copy from the parent', () => { const { parent, addUntracked } = makeGitParentWorktree('r2'); - // Simulate node_modules / generated artifacts present in the parent - // worktree but NOT tracked by git. - addUntracked('node_modules/dep/index.js', 'module.exports = 1;\n'); + // Simulate generated artifacts present in the parent worktree but NOT + // tracked by git. `dist/` is copied (a slice may rebuild it independently). 
addUntracked('dist/bundle.js', 'console.log("bundle");\n'); const sliceDir = seedSliceFromParentWorktree(parent, 'only', singleSlicePlan, 'r2'); - expect(readFileSync(join(sliceDir, 'node_modules/dep/index.js'), 'utf8')).toBe('module.exports = 1;\n'); + expect(lstatSync(join(sliceDir, 'dist')).isSymbolicLink()).toBe(false); expect(readFileSync(join(sliceDir, 'dist/bundle.js'), 'utf8')).toBe('console.log("bundle");\n'); }); + it('shares node_modules via a symlink to the parent rather than copying it', () => { + const { parent, addUntracked } = makeGitParentWorktree('r2b'); + addUntracked('node_modules/dep/index.js', 'module.exports = 1;\n'); + + const sliceDir = seedSliceFromParentWorktree(parent, 'only', singleSlicePlan, 'r2b'); + + const linkPath = join(sliceDir, 'node_modules'); + expect(lstatSync(linkPath).isSymbolicLink()).toBe(true); + expect(readlinkSync(linkPath)).toBe(join(parent, 'node_modules')); + // Resolves transparently for pi-actions reading deps through the link. + expect(readFileSync(join(linkPath, 'dep/index.js'), 'utf8')).toBe('module.exports = 1;\n'); + }); + it('slice worktree is checked out on a slice-level cook branch', () => { const { parent } = makeGitParentWorktree('r3'); @@ -343,6 +358,85 @@ describe('seedSliceFromParentWorktree', () => { ); }); +describe('ensureSliceWorktree', () => { + const dirs: string[] = []; + afterEach(() => { + for (const d of dirs) rmSync(d, { recursive: true, force: true }); + dirs.length = 0; + }); + + const singleSlicePlan: Plan = { + mode: 'brownfield', + epics: [{ id: 'e1', summary: '', depends_on: [], verification: [] }], + slices: [{ id: 'only', epic_id: 'e1', definition: '', depends_on: [], verification: [] }], + }; + + function makeGitParentWorktree(runId: string): string { + const source = mkdtempSync(join(tmpdir(), 'cook-source-')); + dirs.push(source); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: source }); + execFileSync('git', ['config', 'user.email', 'test@example.com'], { cwd: source 
}); + execFileSync('git', ['config', 'user.name', 'Test'], { cwd: source }); + writeFileSync(join(source, 'README.md'), '# project\n'); + execFileSync('git', ['add', '.'], { cwd: source }); + execFileSync('git', ['commit', '-q', '-m', 'initial'], { cwd: source }); + + const runDir = mkdtempSync(join(tmpdir(), 'cook-run-')); + dirs.push(runDir); + const parent = join(runDir, 'worktree'); + execFileSync('git', ['worktree', 'add', '-q', '-b', `cook/${runId}`, parent, 'HEAD'], { cwd: source }); + return parent; + } + + it( + 'creates the slice worktree on first call and is a no-op on repeat (rework-safe)', + () => { + const parent = makeGitParentWorktree('r1'); + + const first = ensureSliceWorktree(parent, 'only', singleSlicePlan, 'r1'); + expect(existsSync(join(first, 'README.md'))).toBe(true); + + // Second call must not throw (seedSliceFromParentWorktree would, via its + // path-availability assertion) and must return the same dir. + const second = ensureSliceWorktree(parent, 'only', singleSlicePlan, 'r1'); + expect(second).toBe(first); + }, + GIT_TEST_TIMEOUT_MS, + ); + + it( + 'fails loudly when a slice id collides with a tracked parent entry, not a worktree', + () => { + // A slice id matching a tracked top-level dir (here `src`) resolves to an + // existing path that is NOT a provisioned worktree. Early-returning it + // would hand the project source to the slice as its sandbox. 
+ const source = mkdtempSync(join(tmpdir(), 'cook-source-')); + dirs.push(source); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: source }); + execFileSync('git', ['config', 'user.email', 'test@example.com'], { cwd: source }); + execFileSync('git', ['config', 'user.name', 'Test'], { cwd: source }); + mkdirSync(join(source, 'src')); + writeFileSync(join(source, 'src', 'index.ts'), 'export {};\n'); + execFileSync('git', ['add', '.'], { cwd: source }); + execFileSync('git', ['commit', '-q', '-m', 'initial'], { cwd: source }); + + const runDir = mkdtempSync(join(tmpdir(), 'cook-run-')); + dirs.push(runDir); + const parent = join(runDir, 'worktree'); + execFileSync('git', ['worktree', 'add', '-q', '-b', 'cook/r2', parent, 'HEAD'], { cwd: source }); + + const collidingPlan: Plan = { + mode: 'brownfield', + epics: [{ id: 'e1', summary: '', depends_on: [], verification: [] }], + slices: [{ id: 'src', epic_id: 'e1', definition: '', depends_on: [], verification: [] }], + }; + + expect(() => ensureSliceWorktree(parent, 'src', collidingPlan, 'r2')).toThrow(/collides/i); + }, + GIT_TEST_TIMEOUT_MS, + ); +}); + describe('mergeSlicesIntoEpicSandbox', () => { const dirs: string[] = []; afterEach(() => { diff --git a/src/orchestrator/src/epic-sandbox-merge.ts b/src/orchestrator/src/epic-sandbox-merge.ts index 9bd2afb03..0f2cfc851 100644 --- a/src/orchestrator/src/epic-sandbox-merge.ts +++ b/src/orchestrator/src/epic-sandbox-merge.ts @@ -251,15 +251,59 @@ export function seedSliceFromParentWorktree( ); // 2. CoW-copy whatever's in the parent worktree but NOT in the slice - // worktree yet — i.e. untracked / gitignored content (`node_modules/`, - // `dist/`, etc.) that pi-actions might need at runtime. + // worktree yet — i.e. untracked / gitignored content (`dist/`, etc.) that + // pi-actions might need at runtime. 
`node_modules/` is symlinked to the + // parent's single copy instead of duplicated per slice (see + // SHAREABLE_TOP_LEVEL_ENTRIES); `walkFiles` skips symlinks, so the shared + // tree is never re-walked during dependency seeding, merge, or promotion. const excludedNames = new Set(['.git', '.brunch', EPIC_MERGE_SEGMENT]); for (const s of plan.slices) excludedNames.add(s.id); - copyMissingTopLevelEntries(parentSandboxDir, sliceDir, excludedNames); + copyMissingTopLevelEntries(parentSandboxDir, sliceDir, excludedNames, SHAREABLE_TOP_LEVEL_ENTRIES); return sliceDir; } +/** + * Top-level gitignored entries shared across slice sandboxes via symlink rather + * than CoW-copied per slice. `node_modules/` is install output that pi-actions + * read (resolve deps, run tests/build) but do not author, so a single + * parent-owned copy linked into each slice removes N-1 redundant tree copies. + * Build caches under it (`.cache`, `.vite`) become shared too — acceptable for + * cook's transient runs; revisit if a tool needs per-slice write isolation. + */ +const SHAREABLE_TOP_LEVEL_ENTRIES: ReadonlySet<string> = new Set(['node_modules']); + +/** + * Idempotent codebase-mode slice worktree provisioning: create the git worktree + * on first call, no-op if it already exists. Called from `resolveSliceCwd` on + * every fire (action, run-tests, assess) and across reworks, so it must tolerate + * repeats. Provisioning is synchronous (`execFileSync`), so concurrent fires of + * distinct slices under the parallel policy serialize on the JS thread — no two + * `git worktree add` invocations against the shared object store overlap. + */ +export function ensureSliceWorktree( + parentSandboxDir: string, + sliceId: string, + plan: Plan, + runId: string, +): string { + const sliceDir = resolveSliceWorktreeDir(parentSandboxDir, sliceId); + if (existsSync(sliceDir)) { + // An existing path is only a no-op when it is a real git worktree we + // provisioned (own `.git` gitfile). 
A bare existing entry means the slice + // id collided with a tracked parent path (e.g. slice `src` vs the repo's + // `src/`); adopting it as the sandbox would silently break per-slice + // isolation, so fail loudly — matching seedSliceFromParentWorktree's guard. + if (!existsSync(join(sliceDir, '.git'))) { + throw new Error( + `Slice id "${sliceId}" collides with an existing entry in the parent worktree (not a provisioned cook worktree)`, + ); + } + return sliceDir; + } + return seedSliceFromParentWorktree(parentSandboxDir, sliceId, plan, runId); +} + /** Copy completed dependency slice worktrees into `slice`'s sandbox (plan order). */ export function seedSliceSandboxFromDeps( parentSandboxDir: string, diff --git a/src/orchestrator/src/net-compiler.ts b/src/orchestrator/src/net-compiler.ts index 5343aec7d..075c0579a 100644 --- a/src/orchestrator/src/net-compiler.ts +++ b/src/orchestrator/src/net-compiler.ts @@ -5,12 +5,9 @@ // 3. compilePlan(input, ctx) → PetriNet (convenience wrapper) // --------------------------------------------------------------------------- -import { mkdirSync } from 'node:fs'; - import { + ensureSliceWorktree, mergeSlicesIntoEpicSandbox, - resolveSliceWorktreeDir, - seedSliceFromParentWorktree, seedSliceSandboxFromDeps, sliceIdsForEpicVerifyMerge, } from './epic-sandbox-merge.js'; @@ -19,6 +16,7 @@ import type { NetBlueprint, TokenSeed, TransitionSkeleton } from './net-blueprin import { PetriNet } from './petri-net.js'; import type { Token } from './petri-net.js'; import { createReport } from './report-helpers.js'; +import { runVerification } from './test-runner.js'; import type { ActionContext, OrchestratorInput, Plan, RunCtx, RunPolicy, Slice } from './types.js'; // --------------------------------------------------------------------------- @@ -555,35 +553,30 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, net.addPlace(place); } - // Runtime filesystem preparation lives in wireHandlers so every 
action/test - // cwd exists before any transition can fire. This is the one intentional side - // effect in the wiring pass; a future prepareRunFilesystem step can split it - // out if more provisioning responsibilities accumulate. - // Per-slice dirs are parallel-safe; dependency seeding happens at fire time. - // In codebase mode, seed each slice dir with the parent worktree's contents - // (the source repo's HEAD via `git worktree add`) so pi-actions can modify - // existing code instead of writing into an empty dir. + // Per-slice sandboxes are provisioned lazily at fire time (in resolveSliceCwd), + // not eagerly here: a run that touches 2 of 8 slices pays for 2 worktrees, not + // 8. Each slice dir is an independent root, so concurrent fires of distinct + // slices never contend; repeat fires of the same slice (rework) are idempotent. // 'shared' (serial greenfield): all slices accrete into the run sandbox. // 'per-slice': each slice gets its own git worktree (codebase) or plain dir // (greenfield parallel), merged into __epic__ for verification. + // Fail fast on the missing-runId precondition rather than at first fire. const sliceLayout = input.sliceLayout ?? 'per-slice'; - if (input.sandboxMode === 'codebase') { - if (!input.runId) { - throw new Error('codebase mode requires input.runId (used to name slice-level git branches)'); - } - for (const slice of plan.slices) { - seedSliceFromParentWorktree(input.sandboxDir, slice.id, plan, input.runId); - } - } else if (sliceLayout === 'per-slice') { - for (const slice of plan.slices) { - mkdirSync(resolveSliceWorktreeDir(input.sandboxDir, slice.id), { recursive: true }); - } + const { runId } = input; + if (input.sandboxMode === 'codebase' && !runId) { + throw new Error('codebase mode requires input.runId (used to name slice-level git branches)'); } - const resolveSliceCwd = (slice: Slice): string => - sliceLayout === 'shared' - ? 
input.sandboxDir - : seedSliceSandboxFromDeps(input.sandboxDir, plan, slice, { preserveExisting: true }); + const resolveSliceCwd = (slice: Slice): string => { + if (sliceLayout === 'shared') return input.sandboxDir; + // Codebase mode: materialize the slice's git worktree (HEAD checkout + + // symlinked node_modules) on first touch so pi-actions modify existing code + // rather than an empty dir; greenfield per-slice gets a plain dir below. + if (input.sandboxMode === 'codebase') { + ensureSliceWorktree(input.sandboxDir, slice.id, plan, runId!); + } + return seedSliceSandboxFromDeps(input.sandboxDir, plan, slice, { preserveExisting: true }); + }; // Register transitions with wired fire handlers for (const skel of blueprint.transitions) { @@ -714,18 +707,24 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, const deferred = (async () => { const slice = plan.slices.find((s) => s.id === sliceId)!; const sandboxDir = resolveSliceCwd(slice); - const results = []; - for (const target of targets) { - results.push({ target, ...(await testRunner.run(target, sandboxDir)) }); - } - const passed = results.length > 0 && results.every((result) => result.passed); + // Shared verification seam: same verdict rule + infra-dominates + // aggregate as evaluate-done / verify-epic (FE-872 unification). 
+ const { + done: passed, + failureKind, + results, + } = await runVerification( + targets.map((target) => ({ target })), + testRunner, + sandboxDir, + ); const output = results.map((result) => result.output).join('\n'); const reportId = createReport(reports, { epicId, sliceId, actor: 'test-runner', event: 'tests-run', - payload: { passed, output, results }, + payload: { passed, output, failureKind, results }, }); ctx.reportIds.push(reportId); @@ -738,12 +737,18 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, } if (retryCount >= maxRetries) { // FE-761 Slice 2b: structural halt — emit a halt token - // carrying its own reason. + // carrying its own reason. FE-872: when verification reports an + // infra failure, name that cause — "retry exhaustion" would + // misdirect the reader to the code. ctx.sliceOutcomes.set(sliceId, { sliceId, status: 'halted' }); + const haltReason = + failureKind === 'infra' + ? `Slice ${sliceId} toolchain/install failure during verification` + : `Slice ${sliceId} retry exhaustion`; return [ { place: p(sliceId, 'halted'), - token: { ...tok, haltReason: `Slice ${sliceId} retry exhaustion` }, + token: { ...tok, haltReason }, }, ]; } diff --git a/src/orchestrator/src/pi-actions.test.ts b/src/orchestrator/src/pi-actions.test.ts index 7a8ddb1d3..acc3325d8 100644 --- a/src/orchestrator/src/pi-actions.test.ts +++ b/src/orchestrator/src/pi-actions.test.ts @@ -3,20 +3,20 @@ import { tmpdir } from 'node:os'; import { dirname, join } from 'node:path'; import { fileURLToPath } from 'node:url'; -import { describe, expect, it } from 'vitest'; +import { afterEach, describe, expect, it } from 'vitest'; import { createPiActions, epicVerifyTask, - evaluateVerificationTargets, runPi, type SessionFactory, sliceTestTask, toolsForAction, } from './pi-actions.js'; +import type { CookEvent } from './presenter/events.js'; import { brunchProfile, bunProfile } from './project-profile.js'; import { InMemoryReportSink } from 
'./report-sink.js'; -import type { ActionContext, Epic, Slice } from './types.js'; +import type { ActionContext, Epic, Plan, ProbeGrounder, Slice, TestResult, TestRunner } from './types.js'; const promptsDir = join(dirname(fileURLToPath(import.meta.url)), '..', 'prompts'); @@ -58,34 +58,342 @@ describe('cook task builders carry the toolchain conventions, not a hardcoded st }); }); -describe('evaluateVerificationTargets — done reflects real test execution', () => { - it('done only when at least one target exists and every target passes', async () => { - const { done } = await evaluateVerificationTargets([{ target: 'a' }, { target: 'b' }], async () => true); - expect(done).toBe(true); +describe('evaluate-done / verify-epic share the runner seam — failureKind is visible at both', () => { + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const epic: Epic = { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + + function fakeRunner(result: TestResult): TestRunner { + return { + async run() { + return result; + }, + }; + } + + function ctx(reports: InMemoryReportSink): ActionContext { + return { slice, epic, plan, sandboxDir: '/tmp/unused', reports }; + } + + it('evaluate-done surfaces an infra failureKind in the eval-done report', async () => { + const reports = new InMemoryReportSink(); + const actions = createPiActions({ + testRunner: fakeRunner({ passed: false, output: 'no runner', failureKind: 'infra' }), + }); + const id = await actions['evaluate-done']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { done: boolean; failureKind?: string }; + expect(payload.done).toBe(false); + expect(payload.failureKind).toBe('infra'); + }); + + 
it('evaluate-done reports a passing verdict with no failureKind', async () => { + const reports = new InMemoryReportSink(); + const actions = createPiActions({ testRunner: fakeRunner({ passed: true, output: 'ok' }) }); + const id = await actions['evaluate-done']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { done: boolean; failureKind?: string }; + expect(payload.done).toBe(true); + expect(payload.failureKind).toBeUndefined(); + }); + + it('verify-epic surfaces an infra failureKind in the epic-verified report', async () => { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const reports = new InMemoryReportSink(); + // verify-epic first runs a pi session to author the integration test; stub + // it so no real agent runs, then the injected runner reports the infra fail. + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const actions = createPiActions({ + testRunner: fakeRunner({ passed: false, output: 'no runner', failureKind: 'infra' }), + createSession, + }); + const id = await actions['verify-epic']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { passed: boolean; failureKind?: string }; + expect(payload.passed).toBe(false); + expect(payload.failureKind).toBe('infra'); + }); + + it('brackets the test-run wait with a balanced activity-start/end', async () => { + const events: CookEvent[] = []; + const actions = createPiActions({ + testRunner: fakeRunner({ passed: true, output: 'ok' }), + emit: (e) => events.push(e), + }); + await actions['evaluate-done']!(ctx(new InMemoryReportSink())); + + const starts = events.filter((e) => e.kind === 'activity-start'); + const ends = events.filter((e) => e.kind === 'activity-end'); + expect(starts).toHaveLength(1); + expect(ends).toHaveLength(1); + expect((ends[0] as { id: string }).id).toBe((starts[0] as { id: string }).id); + }); + + it('closes the 
pi-session activity even when the session fails (finally)', async () => { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const events: CookEvent[] = []; + const createSession = (async () => { + throw new Error('session boom'); + }) as unknown as SessionFactory; + const actions = createPiActions({ createSession, emit: (e) => events.push(e) }); + + await expect(actions['write-tests']!(ctx(new InMemoryReportSink()))).rejects.toThrow(); + expect(events.filter((e) => e.kind === 'activity-start')).toHaveLength(1); + expect(events.filter((e) => e.kind === 'activity-end')).toHaveLength(1); + }); +}); + +describe('verify-epic integration oracle (FE-876) — reachability folds into the epic verdict', () => { + const probeDirs: string[] = []; + afterEach(() => { + for (const dir of probeDirs.splice(0)) rmSync(dir, { recursive: true, force: true }); }); - it('not done if any target fails, and reports per-target results', async () => { - const { done, results } = await evaluateVerificationTargets( - [{ target: 'a' }, { target: 'b' }], - async (t) => t === 'a', + // A real zero-dep app that answers `routes` (path → status); 404 otherwise. + function appSandbox(routes: Record<string, number>): string { + const dir = mkdtempSync(join(tmpdir(), 'verify-epic-probe-')); + probeDirs.push(dir); + writeFileSync( + join(dir, 'server.js'), + `const http = require('node:http');\n` + + `const routes = ${JSON.stringify(routes)};\n` + + `http.createServer((req, res) => {\n` + + ` const status = routes[req.url] ?? 
404;\n` + + ` res.writeHead(status); res.end(String(status));\n` + + `}).listen(Number(process.env.PORT), '127.0.0.1');\n`, ); - expect(done).toBe(false); - expect(results).toEqual([ - { target: 'a', passed: true }, - { target: 'b', passed: false }, - ]); + return dir; + } + + function epicWithProbe(): Epic { + return { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + probe: { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }, + }; + } + + function passingActions(sandboxDir: string): { + actions: ReturnType<typeof createPiActions>; + ctx: (reports: InMemoryReportSink) => ActionContext; + } { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic = epicWithProbe(); + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + createSession, + }); + return { actions, ctx: (reports) => ({ slice, epic, plan, sandboxDir, reports }) }; + } + + it('tests pass + feature reachable → epic passes (reachable)', async () => { + const reports = new InMemoryReportSink(); + const { actions, ctx } = passingActions(appSandbox({ '/health': 200, '/feature': 200 })); + const id = await actions['verify-epic']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(true); + expect(payload.reachability).toBe('reachable'); + }); + + it('tests pass but feature endpoint is absent → epic fails (the 
FE-800 orphan)', async () => { + const reports = new InMemoryReportSink(); + // App boots and answers /health, but /feature is 404 — merged but not wired in. + const { actions, ctx } = passingActions(appSandbox({ '/health': 200 })); + const id = await actions['verify-epic']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(false); + expect(payload.reachability).toBe('not-reachable'); }); - it('not done when there are no verification targets (nothing proves it)', async () => { - const { done } = await evaluateVerificationTargets([], async () => true); - expect(done).toBe(false); + it('failing tests short-circuit the probe — no boot, unchanged unit verdict', async () => { + const reports = new InMemoryReportSink(); + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic = epicWithProbe(); + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: false, output: 'no runner', failureKind: 'infra' }; + }, + }, + createSession, + }); + // Point at a dir with no server.js: if the probe booted, it would error — it + // must not run because tests failed first. 
+ const id = await actions['verify-epic']!({ slice, epic, plan, sandboxDir: tmpdir(), reports }); + const payload = reports.getById(id)!.payload as { + passed: boolean; + failureKind?: string; + reachability?: string; + }; + expect(payload.passed).toBe(false); + expect(payload.failureKind).toBe('infra'); + expect(payload.reachability).toBeUndefined(); + }); + + it('no probe target → unit-test verdict only (unchanged behavior)', async () => { + const reports = new InMemoryReportSink(); + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic: Epic = { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + }; + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + createSession, + }); + const id = await actions['verify-epic']!({ slice, epic, plan, sandboxDir: tmpdir(), reports }); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(true); + expect(payload.reachability).toBeUndefined(); + }); + + // ---- Half B: cook-time grounding seam ----------------------------------- + + function intentEpic(extra?: Partial<Epic>): Epic { + return { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + reachability: { feature: 'the /feature route responds' }, + ...extra, + }; + } + + function groundedVerifyEpic(opts: { + 
sandboxDir: string; + epic: Epic; + groundProbe?: ProbeGrounder; + }): Promise<{ passed: boolean; reachability?: string }> { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const reports = new InMemoryReportSink(); + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [opts.epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + createSession, + groundProbe: opts.groundProbe, + }); + return actions['verify-epic']!({ + slice, + epic: opts.epic, + plan, + sandboxDir: opts.sandboxDir, + reports, + }).then((id) => reports.getById(id)!.payload as { passed: boolean; reachability?: string }); + } + + it('grounds a reachability intent into a concrete target, then probes it', async () => { + let seenFeature = ''; + const payload = await groundedVerifyEpic({ + sandboxDir: appSandbox({ '/health': 200, '/feature': 200 }), + epic: intentEpic(), + groundProbe: async (intent) => { + seenFeature = intent.feature; + return { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }; + }, + }); + expect(seenFeature).toContain('/feature'); + expect(payload.passed).toBe(true); + expect(payload.reachability).toBe('reachable'); + }); + + it('a reachability intent with no injected grounder is a no-op (unit verdict only)', async () => { + // sandbox has no app; if grounding ran and probed, it would error/fail. 
+ const payload = await groundedVerifyEpic({ sandboxDir: tmpdir(), epic: intentEpic() }); + expect(payload.passed).toBe(true); + expect(payload.reachability).toBeUndefined(); + }); + + it('a grounder that throws is an infra fault — the epic fails, not silently passes', async () => { + const payload = await groundedVerifyEpic({ + sandboxDir: tmpdir(), + epic: intentEpic(), + groundProbe: async () => { + throw new Error('agent could not resolve wiring'); + }, + }); + expect(payload.passed).toBe(false); + expect(payload.reachability).toBe('infra'); }); - it('a throwing runner counts as a failed target', async () => { - const { done } = await evaluateVerificationTargets([{ target: 'x' }], async () => { - throw new Error('runner blew up'); + it('a concrete probe target wins over a reachability intent (Half A precedence)', async () => { + let grounderCalled = false; + const payload = await groundedVerifyEpic({ + sandboxDir: appSandbox({ '/health': 200, '/feature': 200 }), + epic: intentEpic({ + probe: { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }, + }), + groundProbe: async () => { + grounderCalled = true; + throw new Error('should not be called'); + }, }); - expect(done).toBe(false); + expect(grounderCalled).toBe(false); + expect(payload.reachability).toBe('reachable'); }); }); diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts index 20526ada4..a114a3849 100644 --- a/src/orchestrator/src/pi-actions.ts +++ b/src/orchestrator/src/pi-actions.ts @@ -1,4 +1,3 @@ -import { spawn } from 'node:child_process'; import { mkdtempSync, readFileSync, rmSync } from 'node:fs'; import { tmpdir } from 'node:os'; import { dirname, join } from 'node:path'; @@ -14,10 +13,22 @@ import { SettingsManager, } from '@earendil-works/pi-coding-agent'; +import { buildProbeSpec, runProbe } from './app-probe.js'; +import type { CookEvent } from './presenter/events.js'; import { defaultToolchain, type Toolchain } from 
'./project-profile.js'; import { createReport } from './report-helpers.js'; import { sliceLabel } from './slice-label.js'; -import type { ActionContext, ActionHandlers, Epic, Slice } from './types.js'; +import { runVerification, ToolchainTestRunner } from './test-runner.js'; +import type { + ActionContext, + ActionHandlers, + Epic, + ProbeGrounder, + ProbeResult, + ProbeTarget, + Slice, + TestRunner, +} from './types.js'; const __dirname = dirname(fileURLToPath(import.meta.url)); const promptsDir = __dirname.includes('dist') @@ -28,27 +39,30 @@ const promptsDir = __dirname.includes('dist') // Logging // --------------------------------------------------------------------------- -let t0 = 0; let _verbose = false; - -function elapsed(): string { - const s = ((Date.now() - t0) / 1000).toFixed(1); - return `${s}s`.padStart(7); -} +// Presentation boundary. Per-action progress flows to the CookBus as +// CookEvents; the presenter owns formatting (and the elapsed clock — +// I136-K). Defaults to a no-op so unit tests that ignore output run clean. +let _emit: (event: CookEvent) => void = () => {}; function log(icon: string, msg: string): void { - console.error(` ${elapsed()} ${icon} ${msg}`); + _emit({ kind: 'action', icon, message: msg }); } function logVerbose(output: string): void { if (!_verbose) return; - const trimmed = output.trim(); - if (!trimmed) return; - console.error(''); - for (const line of trimmed.split('\n')) { - console.error(` │ ${line}`); + // The presenter trims and skips blank output. + _emit({ kind: 'verbose', text: output }); +} + +/** Bracket a wait so it shows as a live pending activity; always closes. 
*/ +async function withActivity<T>(id: string, label: string, fn: () => Promise<T>): Promise<T> { + _emit({ kind: 'activity-start', id, label }); + try { + return await fn(); + } finally { + _emit({ kind: 'activity-end', id }); } - console.error(''); } // --------------------------------------------------------------------------- @@ -151,6 +165,9 @@ async function runPi( const timeoutMs = deps.timeoutMs ?? PI_TIMEOUT_MS; const maxOutput = deps.maxOutput ?? PI_MAX_OUTPUT; const start = Date.now(); + // Open a live wait so the (up to 5-minute) agent session isn't dead air. + _emit({ kind: 'activity-start', id: opts.label, label: opts.label }); + let heartbeatKb = 0; const isolatedDir = createAgentDir(); let cleanedAgentDir = false; @@ -206,6 +223,12 @@ async function runPi( } captured += delta; capturedBytes += deltaBytes; + // Throttled heartbeat — every 2 KB — so the spinner shows progress, not churn. + const kb = Math.floor(capturedBytes / 1024); + if (kb >= heartbeatKb + 2) { + heartbeatKb = kb; + _emit({ kind: 'activity-progress', id: opts.label, detail: `${kb} KB` }); + } } }); @@ -219,6 +242,9 @@ async function runPi( unsubscribe?.(); session?.dispose(); cleanupAgentDir(); + // Always close the wait — even on timeout / overflow / prompt error — so + // the spinner can never hang. + _emit({ kind: 'activity-end', id: opts.label }); } if (timedOut) throw piTimeoutError(timeoutMs); @@ -236,67 +262,27 @@ async function runPi( export { runPi }; -/** - * Decide whether a slice is done by executing its verification targets. `done` - * requires at least one target and every target passing — a slice with no - * runnable verification cannot be proven done (no requisite variety). This is - * the real oracle: it replaces the prior LLM verdict over criterion prose, - * which a standalone component or Ladle story could satisfy without the - * feature working. 
- */ -export async function evaluateVerificationTargets( - targets: readonly { target: string }[], - runTarget: (target: string) => Promise<boolean>, -): Promise<{ done: boolean; results: Array<{ target: string; passed: boolean }> }> { - const results: Array<{ target: string; passed: boolean }> = []; - for (const t of targets) { - let passed = false; - try { - passed = await runTarget(t.target); - } catch { - passed = false; - } - results.push({ target: t.target, passed }); - } - return { done: results.length > 0 && results.every((r) => r.passed), results }; -} - -async function runTest(toolchain: Toolchain, target: string, sandboxDir: string): Promise<boolean> { - return new Promise((resolve) => { - const [command, ...args] = toolchain.testCommand(target); - const child = spawn(command!, args, { - cwd: sandboxDir, - stdio: ['ignore', 'pipe', 'pipe'], - }); - const stdoutChunks: Buffer[] = []; - const stderrChunks: Buffer[] = []; - let settled = false; - - const finish = (passed: boolean): void => { - if (settled) return; - settled = true; - clearTimeout(timer); - const output = Buffer.concat([...stdoutChunks, ...stderrChunks]).toString('utf8'); - logVerbose(output); - resolve(passed); - }; - - const timer = setTimeout(() => { - child.kill('SIGTERM'); - finish(false); - }, 60_000); - - child.stdout?.on('data', (chunk: Buffer) => stdoutChunks.push(chunk)); - child.stderr?.on('data', (chunk: Buffer) => stderrChunks.push(chunk)); - child.on('error', () => finish(false)); - child.on('close', (code) => finish(code === 0)); - }); -} - function report(ctx: ActionContext, actor: string, event: string, payload: Record<string, unknown>): string { return createReport(ctx.reports, { epicId: ctx.epic.id, sliceId: ctx.slice.id, actor, event, payload }); } +/** + * Resolve the epic's reachability probe target (FE-876): a concrete `epic.probe` + * wins (Half A — fixtures / explicit); otherwise a host-blind `epic.reachability` + * intent is grounded into a `ProbeTarget` by the injected cook-time grounder + * (Half B). 
With no concrete target, no intent, or no grounder, there is nothing + * to probe — the epic falls back to the unit-test verdict alone. + */ +async function resolveProbeTarget( + epic: Epic, + sandboxDir: string, + ground: ProbeGrounder | undefined, +): Promise<ProbeTarget | undefined> { + if (epic.probe) return epic.probe; + if (epic.reachability && ground) return ground(epic.reachability, sandboxDir); + return undefined; +} + // --------------------------------------------------------------------------- // Actions // --------------------------------------------------------------------------- @@ -319,25 +305,42 @@ export function epicVerifyTask(epic: Epic, toolchain: Toolchain): string { export function createPiActions(opts?: { verbose?: boolean; - runStart?: number; + /** Presentation sink. Per-action progress is emitted as CookEvents; defaults to no-op. */ + emit?: (event: CookEvent) => void; toolchain?: Toolchain; + testRunner?: TestRunner; + /** Inject the agent-session factory (tests stub it so no real session runs). */ + createSession?: SessionFactory; + /** + * Cook-time probe grounding (FE-876 Half B): resolve an epic's host-blind + * `reachability` intent into a concrete `ProbeTarget`. Absent → reachability + * intents are not enforced (the agent grounder lands with the pi-harness + * contract); concrete `epic.probe` targets work regardless. + */ + groundProbe?: ProbeGrounder; }): ActionHandlers { _verbose = opts?.verbose ?? false; - t0 = opts?.runStart ?? Date.now(); + _emit = opts?.emit ?? (() => {}); const toolchain = opts?.toolchain ?? defaultToolchain; + const testRunner = opts?.testRunner ?? new ToolchainTestRunner(toolchain); + const groundProbe = opts?.groundProbe; + const piDeps = opts?.createSession ? 
{ createSession: opts.createSession } : {}; return { 'evaluate-done': async (ctx: ActionContext) => { const label = sliceLabel(ctx.slice); log('?', `evaluate ${label}`); - const { done, results } = await evaluateVerificationTargets(ctx.slice.verification, (target) => - runTest(toolchain, target, ctx.sandboxDir), + const { done, failureKind, results } = await withActivity( + `verify ${label}`, + `running tests · ${label}`, + () => runVerification(ctx.slice.verification, testRunner, ctx.sandboxDir), ); for (const r of results) { + logVerbose(r.output); log(r.passed ? '✓' : '✗', `verify ${r.target}`); } log(done ? '●' : '○', `verdict ${label} → ${done ? 'DONE' : 'NEEDS WORK'}`); - return report(ctx, 'evaluator', 'eval-done', { done, results }); + return report(ctx, 'evaluator', 'eval-done', { done, failureKind, results }); }, 'write-tests': async (ctx: ActionContext) => { @@ -345,14 +348,17 @@ export function createPiActions(opts?: { log('▸', `tests ${label}`); const task = sliceTestTask(ctx.slice, toolchain); - await runPi({ - label: `tests ${label}`, - model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'test-writer.md'), - task, - sandboxDir: ctx.sandboxDir, - tools: toolsForAction('write-tests'), - }); + await runPi( + { + label: `tests ${label}`, + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'test-writer.md'), + task, + sandboxDir: ctx.sandboxDir, + tools: toolsForAction('write-tests'), + }, + piDeps, + ); return report(ctx, 'test-writer', 'tests-written', { sliceId: ctx.slice.id, @@ -365,14 +371,17 @@ export function createPiActions(opts?: { log('▸', `code ${label}`); const task = `Write code to make tests pass for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nImplement the minimum code to make all tests pass.`; - await runPi({ - label: `code ${label}`, - model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'code-writer.md'), - task, - 
sandboxDir: ctx.sandboxDir, - tools: toolsForAction('write-code'), - }); + await runPi( + { + label: `code ${label}`, + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'code-writer.md'), + task, + sandboxDir: ctx.sandboxDir, + tools: toolsForAction('write-code'), + }, + piDeps, + ); return report(ctx, 'code-writer', 'code-written', { sliceId: ctx.slice.id, @@ -390,25 +399,66 @@ export function createPiActions(opts?: { log('▸', `verify ${ctx.epic.id}`); const writeTask = epicVerifyTask(ctx.epic, toolchain); - await runPi({ - label: `verify ${ctx.epic.id} (write)`, - model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'test-writer.md'), - task: writeTask, - sandboxDir: ctx.sandboxDir, - tools: toolsForAction('verify-epic'), - }); + await runPi( + { + label: `verify ${ctx.epic.id} (write)`, + model: 'claude-sonnet-4-6', + promptFile: join(promptsDir, 'test-writer.md'), + task: writeTask, + sandboxDir: ctx.sandboxDir, + tools: toolsForAction('verify-epic'), + }, + piDeps, + ); + + const { + done: testsPassed, + failureKind, + results, + } = await withActivity(`verify-epic ${ctx.epic.id}`, `running tests · ${ctx.epic.id}`, () => + runVerification(ctx.epic.verification, testRunner, ctx.sandboxDir), + ); + for (const r of results) { + logVerbose(r.output); + log(r.passed ? '✓' : '✗', `verify ${r.target}`); + } - let allPassed = true; - for (const v of ctx.epic.verification) { - const passed = await runTest(toolchain, v.target, ctx.sandboxDir); - log(passed ? '✓' : '✗', `verify ${v.target}`); - allPassed &&= passed; + // Integration oracle (FE-876): the epic is reachable only when the booted + // merged tree answers the feature endpoint. `not-reachable` is the FE-800 + // orphan (code merged but never wired into the running app); `infra` is a + // harness fault, not a wiring verdict. Gate the boot on tests passing — + // never boot a known-broken build. 
The probe target is either concrete + // (`epic.probe`, Half A) or cook-time-grounded from `epic.reachability` + // (Half B); a grounder that throws is itself an `infra` fault. + let probe: ProbeResult | undefined; + if (testsPassed) { + try { + const target = await resolveProbeTarget(ctx.epic, ctx.sandboxDir, groundProbe); + if (target) { + probe = await withActivity( + `probe ${ctx.epic.id}`, + `probing reachability · ${ctx.epic.id}`, + async () => runProbe(await buildProbeSpec(target), ctx.sandboxDir), + ); + } + } catch (err) { + probe = { kind: 'infra', reachable: false, output: `probe grounding failed: ${String(err)}` }; + } + if (probe) { + logVerbose(probe.output); + log( + probe.reachable ? '✓' : '✗', + `probe ${ctx.epic.id} → ${probe.kind}${probe.status === undefined ? '' : ` (${probe.status})`}`, + ); + } } + const passed = testsPassed && (probe === undefined || probe.reachable); - log(allPassed ? '●' : '✗', `epic ${ctx.epic.id} → ${allPassed ? 'PASS' : 'FAIL'}`); + log(passed ? '●' : '✗', `epic ${ctx.epic.id} → ${passed ? 'PASS' : 'FAIL'}`); return report(ctx, 'orchestrator', 'epic-verified', { - passed: allPassed, + passed, + failureKind, + ...(probe ? 
{ reachability: probe.kind } : {}), }); }, }; diff --git a/src/orchestrator/src/plan-emitter.test.ts b/src/orchestrator/src/plan-emitter.test.ts index ac8e8ad1b..418fd22e5 100644 --- a/src/orchestrator/src/plan-emitter.test.ts +++ b/src/orchestrator/src/plan-emitter.test.ts @@ -12,6 +12,7 @@ import { emitPlanFromSnapshot, emitterWarningCategory, formatEmitterWarning } fr import { evaluatePlanShape } from './plan-eval.js'; import { loadPlan } from './plan-loader.js'; import type { CompletedSpecSnapshot } from './plan-projection.js'; +import type { ProfileDetection } from './project-detect.js'; const snapshot: CompletedSpecSnapshot = { requirements: [ @@ -305,6 +306,107 @@ describe('emitPlanFromSnapshot', () => { expect(result.plan.profile).toBe('node-vitest'); }); + it('brownfield detection resolves the profile and beats the spec profile', async () => { + const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' }); + const result = await emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield', profile: 'brunch' }, + { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect }, + ); + expect(result.plan.profile).toBe('node-vitest'); + }); + + it('brownfield co-locates generated tests in the repo\u2019s own test directory', async () => { + // node-vitest defaults to tests/{id}.test.ts, but a repo whose vitest + // include is narrowed to src/** can\u2019t run that path. detectTestDir reports + // where the repo already keeps tests; the emitted targets follow it. 
+ const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' }); + const result = await emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield' }, + { + runModel: draftModel(coveringDraft()), + repoDir: '/repo', + detect, + detectTestDir: () => 'src', + }, + ); + expect(result.plan.profile).toBe('node-vitest'); + for (const slice of result.plan.slices) { + expect(slice.verification).toEqual([{ kind: 'unit-test', target: `src/${slice.id}.test.ts` }]); + } + }); + + it('brownfield keeps the profile default when the repo has no tests to learn from', async () => { + const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' }); + const result = await emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield' }, + { + runModel: draftModel(coveringDraft()), + repoDir: '/repo', + detect, + detectTestDir: () => null, + }, + ); + for (const slice of result.plan.slices) { + expect(slice.verification).toEqual([{ kind: 'unit-test', target: `tests/${slice.id}.test.ts` }]); + } + }); + + it('greenfield never relocates tests even with a repoDir (probes invariant)', async () => { + const result = await emitPlanFromSnapshot(snapshot, { + runModel: draftModel(coveringDraft()), + profile: 'node-vitest', + repoDir: '/repo', + detectTestDir: () => { + throw new Error('greenfield must not detect a test dir'); + }, + }); + for (const slice of result.plan.slices) { + expect(slice.verification).toEqual([{ kind: 'unit-test', target: `tests/${slice.id}.test.ts` }]); + } + }); + + it('the --profile flag beats detection and skips reading the repo', async () => { + const detect = (): ProfileDetection => { + throw new Error('detect should not run when --profile is set'); + }; + const result = await emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield' }, + { runModel: draftModel(coveringDraft()), profile: 'deno', repoDir: '/repo', detect }, + ); + expect(result.plan.profile).toBe('deno'); + }); + + it('a failed 
detection falls through to an explicit spec profile, not bun', async () => { + const detect = (): ProfileDetection => ({ detected: false, reason: 'no recognizable manifest' }); + const result = await emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield', profile: 'brunch' }, + { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect }, + ); + expect(result.plan.profile).toBe('brunch'); + }); + + it('a failed detection with no spec/architect signal fails loudly instead of defaulting to bun', async () => { + const detect = (): ProfileDetection => ({ detected: false, reason: 'no recognizable manifest' }); + await expect( + emitPlanFromSnapshot( + { ...snapshot, mode: 'brownfield' }, + { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect }, + ), + ).rejects.toThrow(/brunch detect/); + }); + + it('greenfield never detects even when a repoDir is supplied (protecting invariant)', async () => { + const detect = (): ProfileDetection => { + throw new Error('greenfield must not detect'); + }; + const result = await emitPlanFromSnapshot(snapshot, { + runModel: draftModel(coveringDraft()), + repoDir: '/repo', + detect, + }); + expect(result.plan.profile).toBe('bun'); + }); + it('round-trips the emitted plan (incl. 
writes) through loadPlan after YAML serialization', async () => { const result = await emitPlanFromSnapshot(snapshot, { runModel: draftModel(coveringDraft()) }); diff --git a/src/orchestrator/src/plan-emitter.ts b/src/orchestrator/src/plan-emitter.ts index 240b886db..c0cb482af 100644 --- a/src/orchestrator/src/plan-emitter.ts +++ b/src/orchestrator/src/plan-emitter.ts @@ -33,8 +33,9 @@ import { type PlanningEnrichment, type ReconciliationWarning, } from './plan-reconciliation.js'; -import { resolveToolchain, type ProfileId, type Toolchain } from './project-profile.js'; -import type { Plan } from './types.js'; +import { detectProfile, detectTestDir, type ProfileDetection } from './project-detect.js'; +import { resolveToolchain, withTestDir, type ProfileId, type Toolchain } from './project-profile.js'; +import type { Plan, PlanMode } from './types.js'; const EMPTY_ENRICHMENT: PlanningEnrichment = { sliceDependencies: [], @@ -80,8 +81,59 @@ export type EmitPlanOptions = { * one resolved from the selected profile (`resolveToolchain`). */ toolchain?: Toolchain; + /** + * Project directory to detect the toolchain from (`brunch-detect`). Used only + * for **brownfield** plans — greenfield has an empty worktree and never + * detects. When omitted, detection is skipped and the FE-843 chain is + * unchanged (back-compat for callers/tests that don't read a repo). + */ + repoDir?: string; + /** Injectable detector seam (tests). Defaults to `detectProfile`. */ + detect?: (repoDir: string) => ProfileDetection; + /** + * Injectable test-directory detector seam (tests). Defaults to + * `detectTestDir`. Brownfield-only; co-locates generated tests where the host + * repo already keeps its tests so a narrowed runner include glob still + * discovers them. 
+ */ + detectTestDir?: (repoDir: string) => string | null; }; +/** + * Resolve the profile id stamped onto the emitted plan, with `brunch-detect` + * inserted as the brownfield front of the FE-843 chain: + * + * flag ≫ detected (brownfield) ≫ spec ≫ architect-classified ≫ bun + * + * Detection reads the real repo, so its identity beats spec prose. A loud + * detection failure must not silently fall to bun: it falls through to an + * explicit spec/architect choice if one exists, otherwise throws — the + * actionable failure `brunch-detect` promises instead of cooking a brownfield + * repo under the wrong toolchain. Greenfield (or brownfield without a repo dir) + * keeps the unchanged FE-843 chain. + */ +function resolveEmittedProfile(args: { + flag?: ProfileId; + mode: PlanMode; + repoDir?: string; + specProfile?: ProfileId; + classified: ProfileId | null; + detect: (repoDir: string) => ProfileDetection; +}): ProfileId { + // Explicit flag wins and short-circuits detection (no repo read). + if (args.flag) return args.flag; + + if (args.mode === 'brownfield' && args.repoDir !== undefined) { + const detected = args.detect(args.repoDir); + if (detected.detected) return detected.profile; + if (args.specProfile) return args.specProfile; + if (args.classified) return args.classified; + throw new Error(`brunch detect: ${detected.reason}`); + } + + return args.specProfile ?? args.classified ?? 'bun'; +} + export async function emitPlanFromSnapshot( snapshot: CompletedSpecSnapshot, options: EmitPlanOptions = {}, @@ -93,12 +145,31 @@ export async function emitPlanFromSnapshot( const architectResult = await architectPlan(projected, runModel, planningContext); - // Selection chain: explicit flag ≫ spec profile ≫ architect-classified ≫ - // bun. Resolved exactly once, here; both paths below stamp the result onto + // Selection chain: flag ≫ detected (brownfield) ≫ spec ≫ architect-classified + // ≫ bun. 
Resolved exactly once, here; both paths below stamp the result onto // the emitted plan. A failed architect simply skips its rung. - const classified = architectResult.status === 'succeeded' ? architectResult.draft.profile : null; - const profile: ProfileId = options.profile ?? projected.profile ?? classified ?? 'bun'; - const toolchain = options.toolchain ?? resolveToolchain(profile); + const classified: ProfileId | null = + architectResult.status === 'succeeded' ? (architectResult.draft.profile ?? null) : null; + const profile: ProfileId = resolveEmittedProfile({ + flag: options.profile, + mode: projected.mode, + repoDir: options.repoDir, + specProfile: projected.profile, + classified, + detect: options.detect ?? detectProfile, + }); + // Co-locate generated tests where the brownfield repo already keeps its own. + // Detection picks the runner (profile); this picks the *path*, because a + // profile's default test directory can fall outside the host runner's + // (narrowed) include glob and so be unrunnable — the FE-871 "No test files + // found" failure. Skipped when a toolchain is injected directly, for + // greenfield, or when no repo dir is available; null = no existing tests to + // learn from, so the profile default stands. + let toolchain = options.toolchain ?? resolveToolchain(profile); + if (options.toolchain === undefined && projected.mode === 'brownfield' && options.repoDir !== undefined) { + const testDir = (options.detectTestDir ?? 
detectTestDir)(options.repoDir); + if (testDir !== null) toolchain = withTestDir(toolchain, testDir); + } if (architectResult.status === 'failed') { return fallback(projected, profile, toolchain, architectResult, architectResult.reason); diff --git a/src/orchestrator/src/presenter.test.ts b/src/orchestrator/src/presenter.test.ts new file mode 100644 index 000000000..19b35a4d4 --- /dev/null +++ b/src/orchestrator/src/presenter.test.ts @@ -0,0 +1,32 @@ +import { describe, expect, it, vi } from 'vitest'; + +import { withCookBus } from './presenter.js'; +import { CookBus } from './presenter/bus.js'; + +describe('withCookBus', () => { + it('runs the work with a bus and disposes it afterward', async () => { + const dispose = vi.spyOn(CookBus.prototype, 'dispose'); + let seen: CookBus | undefined; + + await withCookBus('cook', async (bus) => { + seen = bus; + }); + + expect(seen).toBeInstanceOf(CookBus); + expect(dispose).toHaveBeenCalledTimes(1); + dispose.mockRestore(); + }); + + it('disposes the bus even when the work throws', async () => { + const dispose = vi.spyOn(CookBus.prototype, 'dispose'); + + await expect( + withCookBus('plan', async () => { + throw new Error('work boom'); + }), + ).rejects.toThrow('work boom'); + + expect(dispose).toHaveBeenCalledTimes(1); + dispose.mockRestore(); + }); +}); diff --git a/src/orchestrator/src/presenter.ts b/src/orchestrator/src/presenter.ts new file mode 100644 index 000000000..a3c3d6a25 --- /dev/null +++ b/src/orchestrator/src/presenter.ts @@ -0,0 +1,64 @@ +// Public entry point for the CLI presentation seam. +// +// The orchestrator emits `CookEvent`s to a `CookBus`; a presenter chosen +// by environment renders them. `reports.jsonl` stays the durable medium +// (D156-K) — CookEvents are ephemeral presentation only. External callers +// import from here; only this root reaches into `presenter/`. 
+ +import { CookBus } from './presenter/bus.js'; +import type { Presenter } from './presenter/events.js'; +import { InkPresenter } from './presenter/ink/ink-presenter.js'; +import { PlainPresenter } from './presenter/plain.js'; +import { type PresenterCommand, type PresenterKind, selectPresenter } from './presenter/select.js'; +import { SilentPresenter } from './presenter/silent.js'; + +export { CookBus } from './presenter/bus.js'; +export type { CookEvent, Presenter } from './presenter/events.js'; +export { PlainPresenter } from './presenter/plain.js'; +export { SilentPresenter } from './presenter/silent.js'; +export { + type PresenterCommand, + type PresenterKind, + type SelectPresenterEnv, + selectPresenter, +} from './presenter/select.js'; + +export function makePresenter(kind: PresenterKind, command: PresenterCommand): Presenter { + if (kind === 'silent') return new SilentPresenter(); + if (kind === 'ink') return new InkPresenter(command); + return new PlainPresenter(); +} + +/** + * Own the bus lifecycle for one command run: build it, run the work, and + * always dispose it (which unmounts the Ink app) — even on throw. Entry points + * use this instead of scattering create/dispose, so the TUI can never be left + * mounted and hang the process. + */ +export async function withCookBus( + command: PresenterCommand, + fn: (bus: CookBus) => Promise<void>, +): Promise<void> { + const bus = createCookBus(command); + try { + await fn(bus); + } finally { + await bus.dispose(); + } +} + +/** Build a bus with the environment-selected presenter subscribed. */ +export function createCookBus( + command: PresenterCommand, + env: { isTTY?: boolean; ci?: boolean; reporterFlag?: PresenterKind } = {}, +): CookBus { + const kind = selectPresenter({ + command, + isTTY: env.isTTY ?? Boolean(process.stderr.isTTY), + ci: env.ci ?? Boolean(process.env.CI), + ...(env.reporterFlag ? 
{ reporterFlag: env.reporterFlag } : {}), + }); + const bus = new CookBus(); + bus.subscribe(makePresenter(kind, command)); + return bus; +} diff --git a/src/orchestrator/src/presenter/bus.test.ts b/src/orchestrator/src/presenter/bus.test.ts new file mode 100644 index 000000000..b7e360f19 --- /dev/null +++ b/src/orchestrator/src/presenter/bus.test.ts @@ -0,0 +1,63 @@ +import { describe, expect, it, vi } from 'vitest'; + +import { CookBus } from './bus.js'; +import type { CookEvent, Presenter } from './events.js'; + +function recorder(): Presenter & { events: CookEvent[] } { + const events: CookEvent[] = []; + return { events, onEvent: (e) => events.push(e), dispose: () => {} }; +} + +const ev: CookEvent = { kind: 'plan-start', specId: 2, outDir: '/x' }; + +describe('CookBus', () => { + it('fans every event out to all subscribed presenters in order', () => { + const a = recorder(); + const b = recorder(); + const bus = new CookBus(); + bus.subscribe(a); + bus.subscribe(b); + + bus.emit(ev); + bus.emit({ kind: 'plan-written', path: '/p', epics: 1, slices: 2 }); + + expect(a.events.map((e) => e.kind)).toEqual(['plan-start', 'plan-written']); + expect(b.events).toEqual(a.events); + }); + + it('isolates a throwing presenter so it cannot abort the run or starve siblings', () => { + const warn = vi.spyOn(process, 'emitWarning').mockImplementation(() => {}); + const boom: Presenter = { + onEvent: () => { + throw new Error('render-boom'); + }, + dispose: () => {}, + }; + const ok = recorder(); + const bus = new CookBus(); + bus.subscribe(boom); + bus.subscribe(ok); + + expect(() => bus.emit(ev)).not.toThrow(); + expect(ok.events).toEqual([ev]); + expect(warn).toHaveBeenCalled(); + warn.mockRestore(); + }); + + it('disposes every presenter, swallowing dispose errors', async () => { + const disposed: string[] = []; + const boom: Presenter = { + onEvent: () => {}, + dispose: () => { + throw new Error('dispose-boom'); + }, + }; + const ok: Presenter = { onEvent: () => {}, 
dispose: () => void disposed.push('ok') }; + const bus = new CookBus(); + bus.subscribe(boom); + bus.subscribe(ok); + + await expect(bus.dispose()).resolves.toBeUndefined(); + expect(disposed).toEqual(['ok']); + }); +}); diff --git a/src/orchestrator/src/presenter/bus.ts b/src/orchestrator/src/presenter/bus.ts new file mode 100644 index 000000000..8320e4065 --- /dev/null +++ b/src/orchestrator/src/presenter/bus.ts @@ -0,0 +1,36 @@ +// Synchronous fan-out from the orchestrator to its presenters. +// +// One producer, many presenters. A presenter that throws must never +// abort the run or starve its siblings — failures are downgraded to a +// process warning. Exactly one event type, one consumer shape: a bespoke +// class is clearer here than a generic EventEmitter. + +import type { CookEvent, Presenter } from './events.js'; + +export class CookBus { + private readonly presenters: Presenter[] = []; + + subscribe(presenter: Presenter): void { + this.presenters.push(presenter); + } + + emit(event: CookEvent): void { + for (const presenter of this.presenters) { + try { + presenter.onEvent(event); + } catch (err) { + process.emitWarning(`presenter failed on "${event.kind}": ${String(err)}`); + } + } + } + + async dispose(): Promise<void> { + for (const presenter of this.presenters) { + try { + await presenter.dispose(); + } catch { + // A presenter that fails to tear down must not mask the run's outcome. 
+ } + } + } +} diff --git a/src/orchestrator/src/presenter/clock.test.ts b/src/orchestrator/src/presenter/clock.test.ts new file mode 100644 index 000000000..eabbd15e0 --- /dev/null +++ b/src/orchestrator/src/presenter/clock.test.ts @@ -0,0 +1,22 @@ +import { describe, expect, it } from 'vitest'; + +import { formatElapsed } from './clock.js'; + +describe('formatElapsed', () => { + it('renders whole seconds under a minute — no decimals', () => { + expect(formatElapsed(0)).toBe('0s'); + expect(formatElapsed(2500)).toBe('2s'); // floors, doesn't round + expect(formatElapsed(18_600)).toBe('18s'); + expect(formatElapsed(59_900)).toBe('59s'); + }); + + it('renders m:ss at and beyond a minute', () => { + expect(formatElapsed(60_000)).toBe('1m00s'); + expect(formatElapsed(62_000)).toBe('1m02s'); + expect(formatElapsed(305_000)).toBe('5m05s'); + }); + + it('clamps negatives to 0s', () => { + expect(formatElapsed(-100)).toBe('0s'); + }); +}); diff --git a/src/orchestrator/src/presenter/clock.ts b/src/orchestrator/src/presenter/clock.ts new file mode 100644 index 000000000..1fd0d03b4 --- /dev/null +++ b/src/orchestrator/src/presenter/clock.ts @@ -0,0 +1,37 @@ +// Elapsed-since-cook-start clock, owned by a presenter (I136-K). +// +// `now` is injectable so goldens and frame tests are deterministic. +// Format matches the pre-refactor `elapsed()`: seconds to one decimal place, +// left-padded to 7 columns (right-aligned). + +export interface ElapsedClock { + seed(runStart: number): void; + elapsed(): string; +} + +/** + * Human elapsed for a live, ticking indicator: whole seconds under a minute, + * `MmSSs` (e.g. `1m02s`) from a minute up. Deliberately coarse — no decimals — so a fast re-render loop + * doesn't make the number flicker. 
+ */ +export function formatElapsed(ms: number): string { + const total = Math.max(0, Math.floor(ms / 1000)); + if (total < 60) return `${total}s`; + const minutes = Math.floor(total / 60); + const seconds = total % 60; + return `${minutes}m${String(seconds).padStart(2, '0')}s`; +} + +export function createElapsedClock(now: () => number = () => Date.now()): ElapsedClock { + let runStart: number | undefined; + return { + seed(rs) { + runStart = rs; + }, + elapsed() { + if (runStart === undefined) runStart = now(); + const seconds = ((now() - runStart) / 1000).toFixed(1); + return `${seconds}s`.padStart(7); + }, + }; +} diff --git a/src/orchestrator/src/presenter/events.ts b/src/orchestrator/src/presenter/events.ts new file mode 100644 index 000000000..6be9e6629 --- /dev/null +++ b/src/orchestrator/src/presenter/events.ts @@ -0,0 +1,37 @@ +// The presentation event stream for `plan` / `cook` / `serve`. +// +// This is the single boundary the orchestrator emits to; presenters +// (plain / ink / silent) consume it. It is *ephemeral* — `reports.jsonl` +// remains the durable communication medium (D156-K); a CookEvent never +// carries durable truth, only what the user should see happen. +// +// The union grows arm-by-arm as surfaces are migrated off direct +// `console.error`. Slice 1 covers the existing post-hoc output; live +// `activity-start`/`activity-end` waits are slice 2. + +export type CookEvent = + // --- plan surface --- + | { kind: 'plan-start'; specId: number; outDir: string } + | { kind: 'plan-written'; path: string; epics: number; slices: number } + | { kind: 'plan-warnings'; messages: string[] } + // --- cook surface --- + // Seeds the presenter's elapsed clock; renders nothing itself. + | { kind: 'cook-start'; runStart: number } + // A per-action progress line; the presenter prepends elapsed-since-cook-start. + | { kind: 'action'; icon: string; message: string } + // Raw agent output, shown only when the emit site is in verbose mode. 
+ | { kind: 'verbose'; text: string } + // A pre-formatted line rendered verbatim (banner / summary / promotion blocks). + | { kind: 'line'; text: string } + // --- live waits (slice 2b) --- + // Opens a pending activity: a long wait the user should see in progress. + | { kind: 'activity-start'; id: string; label: string } + // Updates the in-flight detail of an open activity (e.g. a pi token heartbeat). + | { kind: 'activity-progress'; id: string; detail: string } + // Closes the activity; the wait is over. + | { kind: 'activity-end'; id: string }; + +export interface Presenter { + onEvent(event: CookEvent): void; + dispose(): void | Promise<void>; +} diff --git a/src/orchestrator/src/presenter/format.ts b/src/orchestrator/src/presenter/format.ts new file mode 100644 index 000000000..5dc04935a --- /dev/null +++ b/src/orchestrator/src/presenter/format.ts @@ -0,0 +1,40 @@ +// CookEvent → display lines. The single formatting authority, shared by the +// plain backend (writes each line to stderr) and the Ink backend (accumulates +// them into the activity log), so the two can never drift. `cook-start` seeds +// the clock and yields no lines. + +import type { ElapsedClock } from './clock.js'; +import type { CookEvent } from './events.js'; + +const RULE = ' ──────────────────────────────────────'; + +export function formatCookEvent(event: CookEvent, clock: ElapsedClock): string[] { + switch (event.kind) { + case 'plan-start': + return ['', ' brunch plan', RULE, ` spec ${event.specId}`, ` out ${event.outDir}`, '']; + case 'plan-written': + return [` ✓ plan ${event.path}`, ` ${event.epics} epics, ${event.slices} slices`, '']; + case 'plan-warnings': + if (event.messages.length === 0) return []; + return [` ${event.messages.length} warnings:`, ...event.messages.map((m) => ` ! 
${m}`), '']; + case 'cook-start': + clock.seed(event.runStart); + return []; + case 'action': + return [` ${clock.elapsed()} ${event.icon} ${event.message}`]; + case 'verbose': { + const trimmed = event.text.trim(); + if (!trimmed) return []; + return ['', ...trimmed.split('\n').map((line) => ` │ ${line}`), '']; + } + case 'line': + return [event.text]; + case 'activity-start': + // Plain/CI can't animate; a single line breaks the silence at wait start. + return [` ${clock.elapsed()} ⋯ ${event.label}`]; + case 'activity-progress': + case 'activity-end': + // Live-only: the Ink panel reflects these; the existing completion log marks the end. + return []; + } +} diff --git a/src/orchestrator/src/presenter/ink/app.test.tsx b/src/orchestrator/src/presenter/ink/app.test.tsx new file mode 100644 index 000000000..667af51b6 --- /dev/null +++ b/src/orchestrator/src/presenter/ink/app.test.tsx @@ -0,0 +1,67 @@ +import { render } from 'ink-testing-library'; +import { describe, expect, it } from 'vitest'; + +import { RunStore } from '../run-store.js'; +import { App } from './app.js'; + +async function tick() { + await new Promise((resolve) => setTimeout(resolve, 0)); +} + +describe('Ink App', () => { + it('renders the wordmark header, brigade tracker, and recent activity', async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render(<App store={store} />); + + store.push({ kind: 'cook-start', runStart: 0 }); + store.push({ kind: 'action', icon: '▸', message: 'tests slice-1' }); + await tick(); + + const frame = lastFrame() ?? ''; + // Wordmark header + command. + expect(frame).toContain('brunch'); + expect(frame).toContain('cook'); + // Brigade tracker shows every phase, with cook active (◐) once cooking. + expect(frame).toContain('prep'); + expect(frame).toContain('cook ◐'); + // Activity log carries the formatted action line. 
+ expect(frame).toContain('tests slice-1'); + }); + + it('marks earlier phases done and reflects promotion as plate', async () => { + const store = new RunStore('serve', () => 0); + const { lastFrame } = render(<App store={store} />); + + store.push({ kind: 'cook-start', runStart: 0 }); + store.push({ kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' }); + await tick(); + + const frame = lastFrame() ?? ''; + expect(frame).toContain('plate ◐'); + expect(frame).toContain('cook ✓'); + expect(frame).toContain('promoted'); + }); + + it('shows a pending activity with label + elapsed + detail, and clears it on end', async () => { + let clock = 1000; + const store = new RunStore('cook', () => clock); + const { lastFrame } = render(<App store={store} now={() => clock} />); + + store.push({ kind: 'activity-start', id: 'tests:slice-1', label: 'agent writing tests' }); + store.push({ kind: 'activity-progress', id: 'tests:slice-1', detail: '8 KB' }); + clock = 3500; // 2.5s elapsed + await tick(); + + let frame = lastFrame() ?? ''; + expect(frame).toContain('agent writing tests'); + expect(frame).toContain('2s'); // whole seconds — no jittery decimal + expect(frame).not.toContain('2.5s'); + expect(frame).toContain('8 KB'); + + store.push({ kind: 'activity-end', id: 'tests:slice-1' }); + await tick(); + + frame = lastFrame() ?? ''; + expect(frame).not.toContain('agent writing tests'); + }); +}); diff --git a/src/orchestrator/src/presenter/ink/app.tsx b/src/orchestrator/src/presenter/ink/app.tsx new file mode 100644 index 000000000..d67ba0315 --- /dev/null +++ b/src/orchestrator/src/presenter/ink/app.tsx @@ -0,0 +1,105 @@ +// The full-screen Ink view: brunch wordmark header, brigade phase tracker, and a +// bounded live activity log. A thin projection of RunStore — all folding +// lives in the store + the pure phase tracker, so this stays declarative. 
+ +import { Box, Text } from 'ink'; +import { useEffect, useState, useSyncExternalStore } from 'react'; + +import { formatElapsed } from '../clock.js'; +import { BRIGADE, type BrigadePhase } from '../phase.js'; +import type { PendingActivity, RunStore } from '../run-store.js'; +import { BRUNCH_WORDMARK } from './wordmark.js'; + +const LOG_TAIL = 15; +const SPINNER = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']; +const TICK_MS = 250; + +function Header({ command }: { command: string }) { + return ( + + {BRUNCH_WORDMARK.map(({ ch, color }) => ( + + {ch} + + ))} + {command} + + ); +} + +const STATUS_ICON = { done: '✓', active: '◐', pending: '○' } as const; + +function Brigade({ phase }: { phase: BrigadePhase }) { + const active = BRIGADE.indexOf(phase); + return ( + + {BRIGADE.map((p, i) => { + const status = i < active ? 'done' : i === active ? 'active' : 'pending'; + const color = status === 'active' ? 'cyan' : status === 'done' ? 'green' : 'gray'; + return ( + + {p} {STATUS_ICON[status]} + {i < BRIGADE.length - 1 ? ' ' : ''} + + ); + })} + + ); +} + +function ActivityLog({ lines }: { lines: string[] }) { + return ( + + {lines.slice(-LOG_TAIL).map((line, i) => ( + {line === '' ? ' ' : line} + ))} + + ); +} + +function PendingPanel({ + pending, + now, + frame, +}: { + pending: PendingActivity[]; + now: () => number; + frame: string; +}) { + if (pending.length === 0) return null; + return ( + + {pending.map((a) => ( + + {frame} {a.label} · {formatElapsed(now() - a.startedAt)} + {a.detail ? ` · ${a.detail}` : ''} + + ))} + + ); +} + +export function App({ store, now = () => Date.now() }: { store: RunStore; now?: () => number }) { + const state = useSyncExternalStore(store.subscribe, store.getSnapshot, store.getSnapshot); + + // Tick only while something is pending, so the spinner/elapsed advance even + // between events; the interval is torn down as soon as the waits clear. 
+ const [tick, setTick] = useState(0); + const hasPending = state.pending.length > 0; + useEffect(() => { + if (!hasPending) return; + const id = setInterval(() => setTick((t) => t + 1), TICK_MS); + return () => clearInterval(id); + }, [hasPending]); + + return ( + +
+ + + + + + + ); +} diff --git a/src/orchestrator/src/presenter/ink/ink-presenter.tsx b/src/orchestrator/src/presenter/ink/ink-presenter.tsx new file mode 100644 index 000000000..ae9e02c81 --- /dev/null +++ b/src/orchestrator/src/presenter/ink/ink-presenter.tsx @@ -0,0 +1,29 @@ +// The interactive full-screen backend. Renders the Ink App to STDERR (stdout +// stays reserved), feeding it from a RunStore. Thin glue: state + formatting +// live in RunStore / format / phase, all unit-tested without a terminal. + +import { render } from 'ink'; + +import type { CookEvent, Presenter } from '../events.js'; +import { RunStore } from '../run-store.js'; +import { App } from './app.js'; + +export class InkPresenter implements Presenter { + private readonly store: RunStore; + private readonly instance: ReturnType; + + constructor(command: string, now: () => number = () => Date.now()) { + this.store = new RunStore(command, now); + // Render to stderr so stdout stays clean for piping / agent JSONL. + this.instance = render(, { stdout: process.stderr }); + } + + onEvent(event: CookEvent): void { + this.store.push(event); + } + + async dispose(): Promise { + this.instance.unmount(); + await this.instance.waitUntilExit(); + } +} diff --git a/src/orchestrator/src/presenter/ink/wordmark.ts b/src/orchestrator/src/presenter/ink/wordmark.ts new file mode 100644 index 000000000..fc793939e --- /dev/null +++ b/src/orchestrator/src/presenter/ink/wordmark.ts @@ -0,0 +1,12 @@ +// The "brunch" wordmark for the TUI header, tinted with the brunch.ai brand +// gradient (HASH blue → indigo → violet, from the product mark). One hex per +// letter, left to right. The plain/CI backend stays untinted. 
+ +export const BRUNCH_WORDMARK: readonly { ch: string; color: string }[] = [ + { ch: 'b', color: '#00BBFF' }, + { ch: 'r', color: '#0080FF' }, + { ch: 'u', color: '#0046FF' }, + { ch: 'n', color: '#3A36FF' }, + { ch: 'c', color: '#5424FF' }, + { ch: 'h', color: '#6D2BF6' }, +]; diff --git a/src/orchestrator/src/presenter/phase.test.ts b/src/orchestrator/src/presenter/phase.test.ts new file mode 100644 index 000000000..e3a78a813 --- /dev/null +++ b/src/orchestrator/src/presenter/phase.test.ts @@ -0,0 +1,40 @@ +import { describe, expect, it } from 'vitest'; + +import type { CookEvent } from './events.js'; +import { type BrigadePhase, nextPhase } from './phase.js'; + +function walk(events: CookEvent[]): BrigadePhase { + return events.reduce((phase, event) => nextPhase(phase, event), 'prep'); +} + +describe('nextPhase', () => { + it('lights recipe at plan-start and cook at cook-start', () => { + expect(nextPhase('prep', { kind: 'plan-start', specId: 1, outDir: '/x' })).toBe('recipe'); + expect(nextPhase('prep', { kind: 'cook-start', runStart: 0 })).toBe('cook'); + }); + + it('advances to taste on an epic/verify action and to plate on a promotion line', () => { + expect(nextPhase('cook', { kind: 'action', icon: '▸', message: 'verify api-auth' })).toBe('taste'); + expect(nextPhase('cook', { kind: 'action', icon: '●', message: 'epic api-auth → PASS' })).toBe( + 'taste', + ); + expect(nextPhase('taste', { kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' })).toBe('plate'); + }); + + it('never regresses to an earlier phase', () => { + // A per-slice action after taste must not pull the tracker back to cook. 
+ expect(nextPhase('taste', { kind: 'action', icon: '▸', message: 'tests slice-2' })).toBe('taste'); + expect(nextPhase('plate', { kind: 'cook-start', runStart: 0 })).toBe('plate'); + }); + + it('walks a full cook run prep → cook → taste → plate', () => { + expect( + walk([ + { kind: 'cook-start', runStart: 0 }, + { kind: 'action', icon: '▸', message: 'tests slice-1' }, + { kind: 'action', icon: '▸', message: 'verify api-auth' }, + { kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' }, + ]), + ).toBe('plate'); + }); +}); diff --git a/src/orchestrator/src/presenter/phase.ts b/src/orchestrator/src/presenter/phase.ts new file mode 100644 index 000000000..e74b1e0b6 --- /dev/null +++ b/src/orchestrator/src/presenter/phase.ts @@ -0,0 +1,35 @@ +// The kitchen-brigade phase tracker — a pure, monotonic projection of the +// CookEvent stream. The brigade names are phase labels, not commands +// (PLAN.md): detect→prep, plan→recipe, orchestrate→cook, verify→taste, +// promote→plate, ship→serve. +// +// Slice 2a derives the phase coarsely from the post-hoc event vocabulary; +// precise in-flight transitions arrive with the activity-start signals in +// slice 2b. The tracker never regresses. + +import type { CookEvent } from './events.js'; + +export type BrigadePhase = 'prep' | 'recipe' | 'cook' | 'taste' | 'plate' | 'serve'; + +export const BRIGADE: readonly BrigadePhase[] = ['prep', 'recipe', 'cook', 'taste', 'plate', 'serve']; + +export function nextPhase(current: BrigadePhase, event: CookEvent): BrigadePhase { + const target = phaseFor(event); + if (!target) return current; + return BRIGADE.indexOf(target) > BRIGADE.indexOf(current) ? target : current; +} + +function phaseFor(event: CookEvent): BrigadePhase | undefined { + switch (event.kind) { + case 'plan-start': + return 'recipe'; + case 'cook-start': + return 'cook'; + case 'action': + return /^(verify|epic)/.test(event.message) ? 'taste' : undefined; + case 'line': + return event.text.includes('promoted') ? 
'plate' : undefined; + default: + return undefined; + } +} diff --git a/src/orchestrator/src/presenter/plain.test.ts b/src/orchestrator/src/presenter/plain.test.ts new file mode 100644 index 000000000..d78b5dd69 --- /dev/null +++ b/src/orchestrator/src/presenter/plain.test.ts @@ -0,0 +1,103 @@ +import { describe, expect, it } from 'vitest'; + +import type { CookEvent } from './events.js'; +import { PlainPresenter } from './plain.js'; + +function render(events: CookEvent[]): string[] { + const lines: string[] = []; + const presenter = new PlainPresenter({ log: (line) => lines.push(line) }); + for (const event of events) presenter.onEvent(event); + return lines; +} + +/** Render with a fake clock so elapsed values are deterministic (I136-K). */ +function renderTimed(events: CookEvent[], nowValues: number[]): string[] { + const lines: string[] = []; + let i = 0; + const presenter = new PlainPresenter({ + log: (line) => lines.push(line), + now: () => nowValues[Math.min(i++, nowValues.length - 1)]!, + }); + for (const event of events) presenter.onEvent(event); + return lines; +} + +describe('PlainPresenter — plan surface', () => { + it('renders the plan banner byte-for-byte', () => { + expect(render([{ kind: 'plan-start', specId: 2, outDir: '/tmp/x' }])).toEqual([ + '', + ' brunch plan', + ' ──────────────────────────────────────', + ' spec 2', + ' out /tmp/x', + '', + ]); + }); + + it('renders the plan-written summary', () => { + expect(render([{ kind: 'plan-written', path: '/p/plan.yaml', epics: 1, slices: 2 }])).toEqual([ + ' ✓ plan /p/plan.yaml', + ' 1 epics, 2 slices', + '', + ]); + }); + + it('renders a warnings block with the printed count and one line per message', () => { + expect( + render([{ kind: 'plan-warnings', messages: ['cycle-break-dropped-edge: a→b', 'orphan: c'] }]), + ).toEqual([' 2 warnings:', ' ! cycle-break-dropped-edge: a→b', ' ! 
orphan: c', '']); + }); + + it('emits nothing for an empty warnings set', () => { + expect(render([{ kind: 'plan-warnings', messages: [] }])).toEqual([]); + }); +}); + +describe('PlainPresenter — cook surface', () => { + it('renders a verbatim line and nothing for cook-start', () => { + expect( + render([ + { kind: 'cook-start', runStart: 0 }, + { kind: 'line', text: ' brunch cook' }, + ]), + ).toEqual([' brunch cook']); + }); + + it('prepends elapsed-since-cook-start to an action line, padded like the original', () => { + // runStart 1000ms; clock reads 2500ms at the action → 1.5s elapsed. + expect( + renderTimed( + [ + { kind: 'cook-start', runStart: 1000 }, + { kind: 'action', icon: '▸', message: 'tests slice-1' }, + ], + [2500], + ), + ).toEqual([' 1.5s ▸ tests slice-1']); + }); + + it('keeps an inline duration in the action message untouched', () => { + expect( + renderTimed( + [ + { kind: 'cook-start', runStart: 0 }, + { kind: 'action', icon: '✓', message: 'write-tests (0.3s)' }, + ], + [12_300], + ), + ).toEqual([' 12.3s ✓ write-tests (0.3s)']); + }); + + it('renders the verbose block with a left border, blank-padded', () => { + expect(render([{ kind: 'verbose', text: 'line one\nline two' }])).toEqual([ + '', + ' │ line one', + ' │ line two', + '', + ]); + }); + + it('skips a verbose block whose text is blank', () => { + expect(render([{ kind: 'verbose', text: ' \n ' }])).toEqual([]); + }); +}); diff --git a/src/orchestrator/src/presenter/plain.ts b/src/orchestrator/src/presenter/plain.ts new file mode 100644 index 000000000..9a7186728 --- /dev/null +++ b/src/orchestrator/src/presenter/plain.ts @@ -0,0 +1,33 @@ +// Line-oriented presenter: CookEvent → stderr lines. +// +// The default backend (CI / non-TTY / piped) and the behavior-preserving +// reference: it reproduces the pre-refactor output of `plan` / `cook` / +// `serve` byte-for-byte. 
Formatting lives in `format.ts` (shared with the +// Ink backend); this class only owns the clock and the line sink, which +// defaults to `console.error` (stderr — stdout is reserved). + +import { createElapsedClock, type ElapsedClock } from './clock.js'; +import type { CookEvent, Presenter } from './events.js'; +import { formatCookEvent } from './format.js'; + +export type PlainPresenterOptions = { + log?: (line: string) => void; + /** Clock for the elapsed prefix; injectable for deterministic goldens (I136-K). */ + now?: () => number; +}; + +export class PlainPresenter implements Presenter { + private readonly log: (line: string) => void; + private readonly clock: ElapsedClock; + + constructor(options: PlainPresenterOptions = {}) { + this.log = options.log ?? ((line) => console.error(line)); + this.clock = createElapsedClock(options.now); + } + + onEvent(event: CookEvent): void { + for (const line of formatCookEvent(event, this.clock)) this.log(line); + } + + dispose(): void {} +} diff --git a/src/orchestrator/src/presenter/run-store.test.ts b/src/orchestrator/src/presenter/run-store.test.ts new file mode 100644 index 000000000..b7cda75d7 --- /dev/null +++ b/src/orchestrator/src/presenter/run-store.test.ts @@ -0,0 +1,70 @@ +import { describe, expect, it } from 'vitest'; + +import { RunStore } from './run-store.js'; + +describe('RunStore', () => { + it('folds cook events into phase + formatted lines, using the injected clock', () => { + const store = new RunStore('cook', () => 1500); + store.push({ kind: 'cook-start', runStart: 1000 }); + store.push({ kind: 'action', icon: '▸', message: 'tests slice-1' }); + + const state = store.getSnapshot(); + expect(state.command).toBe('cook'); + expect(state.phase).toBe('cook'); + // 1500 - 1000 = 0.5s, formatted exactly like the plain backend. 
+ expect(state.lines).toEqual([' 0.5s ▸ tests slice-1']); + }); + + it('advances the brigade phase on a promotion line', () => { + const store = new RunStore('serve', () => 0); + store.push({ kind: 'cook-start', runStart: 0 }); + store.push({ kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' }); + expect(store.getSnapshot().phase).toBe('plate'); + }); + + it('keeps a stable snapshot reference and does not notify on a no-op event', () => { + const store = new RunStore('cook', () => 0); + store.push({ kind: 'cook-start', runStart: 0 }); + const before = store.getSnapshot(); + + let notified = 0; + store.subscribe(() => notified++); + // A second cook-start adds no lines and cannot advance the phase. + store.push({ kind: 'cook-start', runStart: 0 }); + + expect(store.getSnapshot()).toBe(before); + expect(notified).toBe(0); + }); + + it('notifies subscribers when state changes', () => { + const store = new RunStore('cook', () => 0); + let notified = 0; + store.subscribe(() => notified++); + store.push({ kind: 'line', text: ' brunch cook' }); + expect(notified).toBe(1); + }); + + it('tracks pending activities: start adds, progress updates detail, end removes', () => { + let clock = 1000; + const store = new RunStore('cook', () => clock); + store.push({ kind: 'activity-start', id: 'tests:slice-1', label: 'agent writing tests' }); + + let pending = store.getSnapshot().pending; + expect(pending).toHaveLength(1); + expect(pending[0]).toMatchObject({ id: 'tests:slice-1', label: 'agent writing tests', startedAt: 1000 }); + + store.push({ kind: 'activity-progress', id: 'tests:slice-1', detail: '8 KB' }); + expect(store.getSnapshot().pending[0]).toMatchObject({ detail: '8 KB' }); + + clock = 5000; + store.push({ kind: 'activity-end', id: 'tests:slice-1' }); + expect(store.getSnapshot().pending).toHaveLength(0); + }); + + it('does not put activity events into the scrolling line log', () => { + const store = new RunStore('cook', () => 0); + store.push({ kind: 
'activity-start', id: 'a', label: 'booting app' }); + store.push({ kind: 'activity-end', id: 'a' }); + expect(store.getSnapshot().lines).toEqual([]); + }); +}); diff --git a/src/orchestrator/src/presenter/run-store.ts b/src/orchestrator/src/presenter/run-store.ts new file mode 100644 index 000000000..81e9add88 --- /dev/null +++ b/src/orchestrator/src/presenter/run-store.ts @@ -0,0 +1,79 @@ +// Observable run state for the Ink backend. +// +// Folds the CookEvent stream into { phase, lines } using the SAME formatter +// as the plain backend (so log bodies can't drift) and the pure brigade +// tracker. Exposes the subscribe/getSnapshot pair `useSyncExternalStore` +// needs; the snapshot identity is stable between no-op events. + +import { createElapsedClock, type ElapsedClock } from './clock.js'; +import type { CookEvent } from './events.js'; +import { formatCookEvent } from './format.js'; +import { type BrigadePhase, nextPhase } from './phase.js'; + +const MAX_LINES = 500; + +export interface PendingActivity { + id: string; + label: string; + detail?: string; + startedAt: number; +} + +export interface RunState { + command: string; + phase: BrigadePhase; + lines: string[]; + pending: PendingActivity[]; +} + +export class RunStore { + private state: RunState; + private readonly clock: ElapsedClock; + private readonly listeners = new Set<() => void>(); + + constructor( + command: string, + private readonly now: () => number = () => Date.now(), + ) { + this.clock = createElapsedClock(now); + this.state = { command, phase: 'prep', lines: [], pending: [] }; + } + + push(event: CookEvent): void { + if (event.kind === 'activity-start') { + this.commit({ + pending: [...this.state.pending, { id: event.id, label: event.label, startedAt: this.now() }], + }); + return; + } + if (event.kind === 'activity-progress') { + this.commit({ + pending: this.state.pending.map((a) => (a.id === event.id ? 
{ ...a, detail: event.detail } : a)), + }); + return; + } + if (event.kind === 'activity-end') { + this.commit({ pending: this.state.pending.filter((a) => a.id !== event.id) }); + return; + } + + const added = formatCookEvent(event, this.clock); + const phase = nextPhase(this.state.phase, event); + if (added.length === 0 && phase === this.state.phase) return; + this.commit({ phase, lines: [...this.state.lines, ...added].slice(-MAX_LINES) }); + } + + private commit(patch: Partial<RunState>): void { + this.state = { ...this.state, ...patch }; + for (const listener of this.listeners) listener(); + } + + getSnapshot = (): RunState => this.state; + + subscribe = (listener: () => void): (() => void) => { + this.listeners.add(listener); + return () => { + this.listeners.delete(listener); + }; + }; +} diff --git a/src/orchestrator/src/presenter/select.test.ts b/src/orchestrator/src/presenter/select.test.ts new file mode 100644 index 000000000..e5478f630 --- /dev/null +++ b/src/orchestrator/src/presenter/select.test.ts @@ -0,0 +1,29 @@ +import { describe, expect, it } from 'vitest'; + +import { selectPresenter } from './select.js'; + +describe('selectPresenter', () => { + it('honors an explicit reporter flag over every environment signal', () => { + expect(selectPresenter({ command: 'cook', isTTY: true, ci: false, reporterFlag: 'plain' })).toBe('plain'); + expect(selectPresenter({ command: 'agent', isTTY: false, ci: true, reporterFlag: 'ink' })).toBe('ink'); + expect(selectPresenter({ command: 'serve', isTTY: false, ci: true, reporterFlag: 'silent' })).toBe( + 'silent', + ); + }); + + it('forces silent for agent mode so stdout stays JSONL-clean', () => { + expect(selectPresenter({ command: 'agent', isTTY: true, ci: false })).toBe('silent'); + }); + + it('falls back to plain in CI or when stderr is not a TTY', () => { + expect(selectPresenter({ command: 'cook', isTTY: false, ci: false })).toBe('plain'); + expect(selectPresenter({ command: 'serve', isTTY: true, ci: true
})).toBe('plain'); + expect(selectPresenter({ command: 'plan', isTTY: false, ci: true })).toBe('plain'); + }); + + it('selects the ink TUI only on an interactive non-CI TTY', () => { + expect(selectPresenter({ command: 'cook', isTTY: true, ci: false })).toBe('ink'); + expect(selectPresenter({ command: 'serve', isTTY: true, ci: false })).toBe('ink'); + expect(selectPresenter({ command: 'plan', isTTY: true, ci: false })).toBe('ink'); + }); +}); diff --git a/src/orchestrator/src/presenter/select.ts b/src/orchestrator/src/presenter/select.ts new file mode 100644 index 000000000..f53a547d8 --- /dev/null +++ b/src/orchestrator/src/presenter/select.ts @@ -0,0 +1,26 @@ +// Which presenter renders a CLI run, chosen from command + environment. +// +// Pure so the decision is testable without a real TTY. `ink` is the +// interactive full-screen TUI (slice 2); `plain` is line-oriented for +// CI / non-TTY / piped output; `silent` keeps stdout clean for the +// `agent` JSONL protocol. An explicit `--reporter` flag overrides the +// environment entirely. (`json` is intentionally not modeled yet — no +// consumer exists; add it when one does.) + +export type PresenterKind = 'ink' | 'plain' | 'silent'; + +export type PresenterCommand = 'plan' | 'cook' | 'serve' | 'agent'; + +export type SelectPresenterEnv = { + command: PresenterCommand; + isTTY: boolean; + ci: boolean; + reporterFlag?: PresenterKind; +}; + +export function selectPresenter(env: SelectPresenterEnv): PresenterKind { + if (env.reporterFlag) return env.reporterFlag; + if (env.command === 'agent') return 'silent'; + if (env.ci || !env.isTTY) return 'plain'; + return 'ink'; +} diff --git a/src/orchestrator/src/presenter/silent.ts b/src/orchestrator/src/presenter/silent.ts new file mode 100644 index 000000000..1121c718a --- /dev/null +++ b/src/orchestrator/src/presenter/silent.ts @@ -0,0 +1,9 @@ +// Renders nothing. Used for `brunch agent`, whose stdout is the JSONL +// protocol and must never carry presentation noise. 
+ +import type { CookEvent, Presenter } from './events.js'; + +export class SilentPresenter implements Presenter { + onEvent(_event: CookEvent): void {} + dispose(): void {} +} diff --git a/src/orchestrator/src/project-detect.test.ts b/src/orchestrator/src/project-detect.test.ts new file mode 100644 index 000000000..0da4e0a6a --- /dev/null +++ b/src/orchestrator/src/project-detect.test.ts @@ -0,0 +1,189 @@ +import { mkdirSync, mkdtempSync, writeFileSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { dirname, join } from 'node:path'; + +import { describe, expect, it } from 'vitest'; + +import { detectProfile, detectTestDir } from './project-detect.js'; + +function repo(files: Record<string, string>): string { + const dir = mkdtempSync(join(tmpdir(), 'detect-')); + for (const [name, contents] of Object.entries(files)) { + const path = join(dir, name); + mkdirSync(dirname(path), { recursive: true }); + writeFileSync(path, contents); + } + return dir; +} + +const pkg = (deps: Record<string, string>): string => JSON.stringify({ devDependencies: deps }); + +describe('detectProfile maps real manifest/lockfile evidence to a registry profile', () => { + it('package.json with vitest → node-vitest', () => { + const result = detectProfile(repo({ 'package.json': pkg({ vitest: '^2.0.0' }) })); + expect(result).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); + + it('package.json with jest → node-jest', () => { + const result = detectProfile(repo({ 'package.json': pkg({ jest: '^29.0.0' }) })); + expect(result).toMatchObject({ detected: true, profile: 'node-jest' }); + }); + + it('package.json with no test framework → node-test (built-in runner)', () => { + const result = detectProfile(repo({ 'package.json': pkg({ typescript: '^5.0.0' }) })); + expect(result).toMatchObject({ detected: true, profile: 'node-test' }); + }); + + it('a bun lockfile → bun (and wins over package.json deps)', () => { + const result = detectProfile(repo({ 'bun.lockb': '', 'package.json': pkg({ vitest:
'^2.0.0' }) })); + expect(result).toMatchObject({ detected: true, profile: 'bun', evidence: 'bun.lockb' }); + }); + + it('a deno config → deno (even alongside a package.json for npm specifiers)', () => { + const result = detectProfile(repo({ 'deno.json': '{}', 'package.json': pkg({}) })); + expect(result).toMatchObject({ detected: true, profile: 'deno', evidence: 'deno.json' }); + }); + + it('every successful detection carries the evidence that selected it', () => { + const result = detectProfile(repo({ 'package.json': pkg({ vitest: '^2.0.0' }) })); + expect(result.detected && result.evidence).toContain('vitest'); + }); +}); + +describe('detectProfile fails loudly rather than defaulting silently', () => { + it('package.json declaring BOTH vitest and jest → ambiguous, not detected', () => { + // The cheap check resolves a single clear signal; two runners is genuinely + // ambiguous and must not be silently resolved by check-order. + const result = detectProfile(repo({ 'package.json': pkg({ vitest: '^2.0.0', jest: '^29.0.0' }) })); + expect(result.detected).toBe(false); + expect(!result.detected && result.reason).toMatch(/ambiguous/i); + expect(!result.detected && result.reason).toMatch(/--profile/); + }); + + it('a non-JS project (Python/Go) → not detected, actionable reason listing valid profiles', () => { + // No language-detection engine: any repo without JS/TS evidence falls to the + // same actionable catch-all (brunch only supports the registry's JS profiles). 
+ const nonJsRepos: Record<string, string>[] = [ + { 'pyproject.toml': '[project]\nname = "x"\n' }, + { 'go.mod': 'module x\n' }, + ]; + for (const files of nonJsRepos) { + const result = detectProfile(repo(files)); + expect(result.detected).toBe(false); + expect(!result.detected && result.reason).toMatch(/could not detect/); + expect(!result.detected && result.reason).toMatch(/node-vitest/); + } + }); + + it('an unrecognized directory → not detected, actionable reason', () => { + const result = detectProfile(repo({ 'README.md': '# hi\n' })); + expect(result.detected).toBe(false); + expect(!result.detected && result.reason).toMatch(/could not detect/); + }); + + it('a malformed package.json is still treated as a Node project (node-test)', () => { + const result = detectProfile(repo({ 'package.json': '{ not json' })); + expect(result).toMatchObject({ detected: true, profile: 'node-test' }); + }); +}); + +describe('detectTestDir learns the test directory from existing test files', () => { + it('returns the full directory tests cluster in, not just the top segment', () => { + const dir = repo({ + 'src/lib/bar.test.ts': '', + 'src/lib/qux.test.ts': '', + 'src/foo.test.ts': '', + 'src/lib/baz.ts': '', + }); + // src/lib has 2 test files, src has 1 → the deeper, dominant dir wins. + expect(detectTestDir(dir)).toBe('src/lib'); + }); + + it('returns a deep monorepo test root so a package-rooted include still covers it', () => { + const dir = repo({ + 'packages/app/src/a.test.ts': '', + 'packages/app/src/b.test.ts': '', + 'packages/lib/src/c.test.ts': '', + }); + expect(detectTestDir(dir)).toBe('packages/app/src'); + }); + + it('picks the dominant directory when tests are split across several', () => { + const dir = repo({ + 'src/a.test.ts': '', + 'src/b.test.ts': '', + 'src/c.test.ts': '', + 'tests/d.test.ts': '', + }); + expect(detectTestDir(dir)).toBe('src'); + }); + + it('recognizes .spec.
and jsx/tsx/mjs/cjs test files', () => { + expect(detectTestDir(repo({ 'app/x.spec.tsx': '' }))).toBe('app'); + expect(detectTestDir(repo({ 'app/x.test.mjs': '' }))).toBe('app'); + }); + + it('ignores node_modules and other build/vendor directories', () => { + const dir = repo({ + 'node_modules/pkg/dep.test.ts': '', + 'dist/out.test.ts': '', + 'src/real.test.ts': '', + }); + expect(detectTestDir(dir)).toBe('src'); + }); + + it('returns null when the repo has no test files to learn from', () => { + expect(detectTestDir(repo({ 'src/index.ts': '', 'package.json': '{}' }))).toBeNull(); + }); + + it('returns an empty directory for test files sitting directly at the repo root', () => { + expect(detectTestDir(repo({ 'root.test.ts': '' }))).toBe(''); + }); +}); + +describe('detectProfile resolves the runner from workspace packages in a monorepo', () => { + it('finds vitest in a workspace package when the root declares no runner', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'] }), + 'packages/app/package.json': pkg({ vitest: '^2.0.0' }), + 'packages/lib/package.json': pkg({ typescript: '^5.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); + + it('finds the runner via a pnpm-workspace.yaml package list', () => { + const dir = repo({ + 'package.json': JSON.stringify({ name: 'root' }), + 'pnpm-workspace.yaml': "packages:\n - 'packages/*'\n", + 'packages/web/package.json': pkg({ jest: '^29.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-jest' }); + }); + + it('a root runner wins without scanning (and a workspace cannot make it ambiguous)', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'], devDependencies: { vitest: '^2.0.0' } }), + 'packages/legacy/package.json': pkg({ jest: '^29.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); + + 
it('workspaces collectively declaring both runners is ambiguous, not silently picked', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'] }), + 'packages/a/package.json': pkg({ vitest: '^2.0.0' }), + 'packages/b/package.json': pkg({ jest: '^29.0.0' }), + }); + const result = detectProfile(dir); + expect(result.detected).toBe(false); + expect(!result.detected && result.reason).toMatch(/ambiguous/i); + }); + + it('a literal (non-wildcard) workspace directory is resolved', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['apps/web'] }), + 'apps/web/package.json': pkg({ vitest: '^2.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); +}); diff --git a/src/orchestrator/src/project-detect.ts b/src/orchestrator/src/project-detect.ts new file mode 100644 index 000000000..b745105e5 --- /dev/null +++ b/src/orchestrator/src/project-detect.ts @@ -0,0 +1,243 @@ +// Brownfield toolchain detection (FE-871): read the real repo and resolve it to +// a registry `ProfileId`, so cook can run a real project's tests without a human +// guessing the stack. This is the brownfield-only *front* of the FE-843 selection +// chain (`flag ≫ detected ≫ spec ≫ architect ≫ bun`); greenfield never detects +// (an empty worktree has nothing to read). +// +// Detection is evidence-first and deliberately conservative — the cheap +// "which lockfile/manifest is present" check, not a language-detection engine. +// One clear supported signal resolves; ambiguous evidence (two test runners) or +// no recognizable JS/TS toolchain returns an actionable `{detected:false}` reason +// rather than silently defaulting to bun — a wrong-but-silent toolchain produces +// unrunnable tests, the exact failure mode this closes. 
+ +import { type Dirent, existsSync, readdirSync, readFileSync } from 'node:fs'; +import { join } from 'node:path'; + +import { PROFILE_IDS, type ProfileId } from './project-profile.js'; + +/** A successful detection names the profile and the evidence that selected it. */ +export type ProfileDetection = + | { detected: true; profile: ProfileId; evidence: string } + | { detected: false; reason: string }; + +function fileExists(dir: string, name: string): boolean { + return existsSync(join(dir, name)); +} + +/** Dependency names declared in a repo's package.json (deps + devDeps); null if the file is absent, an empty set if it is malformed. */ +function readPackageJsonDeps(dir: string): Set<string> | null { + const path = join(dir, 'package.json'); + if (!existsSync(path)) return null; + try { + const pkg = JSON.parse(readFileSync(path, 'utf8')) as { + dependencies?: Record<string, string>; + devDependencies?: Record<string, string>; + }; + return new Set([...Object.keys(pkg.dependencies ?? {}), ...Object.keys(pkg.devDependencies ?? {})]); + } catch { + // A present-but-malformed package.json is evidence of a JS project we can't + // read, so treat it as a Node project with no detectable framework. + return new Set(); + } +} + +/** + * Workspace globs declared by a monorepo root: npm/yarn `workspaces` (array or + * `{ packages }`) or pnpm `pnpm-workspace.yaml`. Empty when the repo is not a + * declared monorepo. Scoping to *declared* workspaces (not every package.json on + * disk) keeps a stray nested project (a docs prototype, an example app) from + * poisoning runner detection. + */ +function readWorkspaceGlobs(repoDir: string): string[] { + const pkgPath = join(repoDir, 'package.json'); + if (existsSync(pkgPath)) { + try { + const pkg = JSON.parse(readFileSync(pkgPath, 'utf8')) as { + workspaces?: string[] | { packages?: string[] }; + }; + const ws = pkg.workspaces; + if (Array.isArray(ws)) return ws; + if (ws && Array.isArray(ws.packages)) return ws.packages; + } catch { + // Malformed root package.json: fall through to the pnpm manifest.
+ } + } + const pnpmPath = join(repoDir, 'pnpm-workspace.yaml'); + if (existsSync(pnpmPath)) { + try { + const globs: string[] = []; + for (const line of readFileSync(pnpmPath, 'utf8').split('\n')) { + const match = /^\s*-\s*['"]?([^'"#]+?)['"]?\s*$/.exec(line); + if (match) globs.push(match[1].trim()); + } + return globs; + } catch { + return []; + } + } + return []; +} + +/** + * Resolve a workspace glob to concrete package directories. Handles the two + * forms that cover virtually all real monorepos: a literal directory (`apps/web`) + * and a single-level wildcard (`packages/*`). Deeper/exotic globs are skipped: + * this is the cheap evidence check, not a glob engine. + */ +function resolveWorkspaceDirs(repoDir: string, glob: string): string[] { + const trimmed = glob.replace(/\/+$/, ''); + if (trimmed.endsWith('/*')) { + const base = trimmed.slice(0, -2); + try { + return readdirSync(join(repoDir, base), { withFileTypes: true }) + .filter((entry) => entry.isDirectory()) + .map((entry) => join(base, entry.name)); + } catch { + return []; + } + } + return trimmed.includes('*') ? [] : [trimmed]; +} + +/** Union of dependency names declared across a monorepo root's workspace packages. */ +function collectWorkspaceDeps(repoDir: string): Set<string> { + const deps = new Set<string>(); + for (const glob of readWorkspaceGlobs(repoDir)) { + for (const wsDir of resolveWorkspaceDirs(repoDir, glob)) { + const wsDeps = readPackageJsonDeps(join(repoDir, wsDir)); + if (wsDeps) for (const dep of wsDeps) deps.add(dep); + } + } + return deps; +} + +/** + * Detect the toolchain `ProfileId` for a repo by introspecting its manifests and + * lockfiles. Precedence is lockfile/config evidence first (most authoritative), + * then package.json dependencies, then a catch-all failure. `--profile` (handled + * upstream in the selection chain) always overrides this.
+ */ +export function detectProfile(repoDir: string): ProfileDetection { + // Bun: its lockfile is unambiguous evidence of the bun test runner. + if (fileExists(repoDir, 'bun.lockb')) return { detected: true, profile: 'bun', evidence: 'bun.lockb' }; + if (fileExists(repoDir, 'bun.lock')) return { detected: true, profile: 'bun', evidence: 'bun.lock' }; + + // Deno: config or lockfile. Checked before package.json because Deno repos may + // also carry a package.json for npm specifiers. + for (const name of ['deno.json', 'deno.jsonc', 'deno.lock']) { + if (fileExists(repoDir, name)) return { detected: true, profile: 'deno', evidence: name }; + } + + // Node/TypeScript: pick the runner from declared dependencies. + const rootDeps = readPackageJsonDeps(repoDir); + if (rootDeps !== null) { + // Root deps are most authoritative. Only when the root declares no runner do + // we widen to the monorepo's workspace packages — a monorepo root often holds + // just tooling while the runner lives in each package. A repo that already + // resolves at the root never pays the workspace scan and can't be made + // ambiguous by a workspace. + let deps = rootDeps; + let source = 'package.json'; + if (!rootDeps.has('vitest') && !rootDeps.has('jest')) { + const wsDeps = collectWorkspaceDeps(repoDir); + if (wsDeps.has('vitest') || wsDeps.has('jest')) { + deps = wsDeps; + source = 'workspace package.json'; + } + } + + const hasVitest = deps.has('vitest'); + const hasJest = deps.has('jest'); + // Two declared runners is genuinely ambiguous — picking one by check-order + // would silently run the wrong command. Fail loud and let `--profile` decide. + if (hasVitest && hasJest) { + return { + detected: false, + reason: `${source} declares both vitest and jest — ambiguous test runner. 
Pass --profile to pick node-vitest or node-jest.`, + }; + } + if (hasVitest) { + return { detected: true, profile: 'node-vitest', evidence: `${source} devDependency vitest` }; + } + if (hasJest) { + return { detected: true, profile: 'node-jest', evidence: `${source} devDependency jest` }; + } + // No third-party runner declared → the built-in node:test runner needs none. + return { + detected: true, + profile: 'node-test', + evidence: 'package.json with no test-framework dependency', + }; + } + + // No JS/TS evidence (could be a Python/Go/unknown repo — brunch only supports + // the registry's JS toolchains). Fail with an actionable reason rather than a + // silent default; the agent's bash can't substitute since the test runner reads + // the stamped profile with no agent in the loop. + return { + detected: false, + reason: `could not detect a supported toolchain in ${repoDir} (no package.json, deno config, or bun lockfile). Pass --profile to select one of: ${PROFILE_IDS.join(', ')}.`, + }; +} + +/** A test file the host runner already discovers; `.test.`/`.spec.` in js/ts/jsx/tsx. */ +const TEST_FILE_RE = /\.(test|spec)\.[cm]?[jt]sx?$/; + +/** Directories never worth walking for test-layout evidence. */ +const SKIP_DIRS = new Set(['node_modules', '.git', 'dist', 'build', '.brunch', '.next', 'coverage']); + +/** Bound the walk so a pathological tree can't stall plan emission. */ +const MAX_WALK_DEPTH = 8; + +/** + * Discover the top-level directory a brownfield repo already keeps its tests in, + * by sampling existing test files rather than parsing the host runner's config. + * + * A test runner's config (e.g. vitest's `test.include`) is executable TS/JS — + * there is no cheap, reliable way to read its globs statically. 
But the repo's + * *existing* test files are ground truth: whatever config the host uses already + * discovers and runs them, so co-locating cook's generated slice tests in the + * same top-level directory guarantees the same discovery covers them. This + * closes the brownfield failure where a profile's default `tests/{id}.test.ts` + * path falls outside a repo whose vitest `include` is narrowed to `src/**` + * (vitest then reports "No test files found" for an explicitly-named file). + * + * Returns the POSIX-relative directory tests cluster in (e.g. `'src'`, or + * `'packages/app/src'` in a monorepo), or `null` when the repo has no test files + * to learn from — cook then keeps the profile's default path. The *full* + * directory (not just the top segment) is returned so a monorepo whose runner + * include is rooted deep (e.g. a per-package `src` glob) still gets a covered + * path. + */ +export function detectTestDir(repoDir: string): string | null { + // Tally test files by their full directory relative to the repo root. Root + // tests use relDir '' so generated targets strip the profile's default tests/ + // prefix and stay at the repo root. Keys are POSIX paths so the emitted target + // matches profile conventions regardless of host separator. + const counts = new Map<string, number>(); + + const walk = (dir: string, depth: number, relDir: string): void => { + if (depth > MAX_WALK_DEPTH) return; + let entries: Dirent[]; + try { + entries = readdirSync(dir, { withFileTypes: true }); + } catch { + return; + } + for (const entry of entries) { + if (entry.isDirectory()) { + if (SKIP_DIRS.has(entry.name) || entry.name.startsWith('.')) continue; + walk(join(dir, entry.name), depth + 1, relDir === '' ? entry.name : `${relDir}/${entry.name}`); + } else if (entry.isFile() && TEST_FILE_RE.test(entry.name)) { + counts.set(relDir, (counts.get(relDir) ?? 
0) + 1); + } + } + }; + walk(repoDir, 0, ''); + + if (counts.size === 0) return null; + // Dominant directory wins; ties broken by name (shallower/earlier first) for + // determinism. + return [...counts].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))[0][0]; +} diff --git a/src/orchestrator/src/project-profile.test.ts b/src/orchestrator/src/project-profile.test.ts index b13f4d76c..63f4915e7 100644 --- a/src/orchestrator/src/project-profile.test.ts +++ b/src/orchestrator/src/project-profile.test.ts @@ -7,6 +7,7 @@ import { PROFILES, resolveToolchain, UnknownProfileError, + withTestDir, } from './project-profile.js'; describe('toolchain target shaping', () => { @@ -136,3 +137,30 @@ describe('parseProfileId', () => { expect(() => parseProfileId('toString')).toThrow(UnknownProfileError); }); }); + +describe('withTestDir relocates test targets while preserving the filename convention', () => { + it('moves a tests/-default profile into the detected directory', () => { + const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src'); + expect(relocated.sliceTarget('req-180')).toBe('src/req-180.test.ts'); + expect(relocated.epicTarget('epic-1')).toBe('src/epic-1.integration.test.ts'); + }); + + it('relocates the root-co-located brunch profile into a directory', () => { + const relocated = withTestDir(brunchProfile.toolchain, 'src'); + expect(relocated.sliceTarget('req-180')).toBe('src/req-180.test.ts'); + }); + + it('strips a trailing slash from the directory', () => { + expect(withTestDir(bunProfile.toolchain, 'pkg/').sliceTarget('s1')).toBe('pkg/s1.test.ts'); + }); + + it('an empty or "." 
directory places tests at the repo root', () => { + expect(withTestDir(bunProfile.toolchain, '').sliceTarget('s1')).toBe('s1.test.ts'); + expect(withTestDir(bunProfile.toolchain, '.').sliceTarget('s1')).toBe('s1.test.ts'); + }); + + it('leaves the test command untouched (only the target path changes)', () => { + const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src'); + expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']); + }); +}); diff --git a/src/orchestrator/src/project-profile.ts b/src/orchestrator/src/project-profile.ts index 9d9573da6..7d88bbb31 100644 --- a/src/orchestrator/src/project-profile.ts +++ b/src/orchestrator/src/project-profile.ts @@ -130,3 +130,26 @@ export function resolveToolchain(profile?: ProfileId): Toolchain { } export const defaultToolchain: Toolchain = bunProfile.toolchain; + +/** + * Relocate a toolchain's test targets into `dir`, preserving the profile's + * filename convention (`{id}.test.ts`, `{id}.integration.test.ts`). Brownfield + * detection uses this to co-locate cook's generated tests in the directory the + * host repo already keeps its tests — see `detectTestDir`. A profile's default + * test directory (e.g. `tests/`) can fall outside a repo's narrowed runner + * include glob, making the chosen path unrunnable; relocating to the repo's own + * test directory keeps it discoverable. `dir` of `''`/`'.'` strips the prefix + * (tests at the repo root). + */ +export function withTestDir(toolchain: Toolchain, dir: string): Toolchain { + const cleaned = dir.replace(/\/+$/, ''); + const relocate = (target: string): string => { + const basename = target.slice(target.lastIndexOf('/') + 1); + return cleaned === '' || cleaned === '.' ? 
basename : `${cleaned}/${basename}`; + }; + return { + ...toolchain, + sliceTarget: (sliceId) => relocate(toolchain.sliceTarget(sliceId)), + epicTarget: (epicId) => relocate(toolchain.epicTarget(epicId)), + }; +} diff --git a/src/orchestrator/src/promote-run.test.ts b/src/orchestrator/src/promote-run.test.ts index 7d4342b1a..e79a719e7 100644 --- a/src/orchestrator/src/promote-run.test.ts +++ b/src/orchestrator/src/promote-run.test.ts @@ -5,7 +5,7 @@ import { join } from 'node:path'; import { afterEach, describe, expect, it } from 'vitest'; -import { promoteGreenfieldRun } from './promote-run.js'; +import { promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; const dirs: string[] = []; const GIT_TEST_TIMEOUT_MS = 20_000; @@ -45,6 +45,23 @@ describe('promoteGreenfieldRun', () => { expect(result.branch.length).toBeGreaterThan(0); }); + it('captures the dependency manifest + lockfile in the promoted commit (reproducible tree)', () => { + // FE-872 acceptance 2 (greenfield): the cook agent installs deps via bash; + // promotion must capture the manifest + lockfile it produced so the promoted + // tree is reproducible — pinned as an invariant, not left incidental to the + // blanket copy. Asserted via `git ls-files` (tracked, not merely present). 
+ const sandbox = makeSandbox(); + writeFileSync(join(sandbox, 'package.json'), '{"name":"cooked","devDependencies":{"vitest":"^3"}}\n'); + writeFileSync(join(sandbox, 'bun.lock'), '{ "lockfileVersion": 1 }\n'); + + const target = tmpTarget(); + promoteGreenfieldRun({ sandboxDir: sandbox, target, runId: 'r1', force: false }); + + const tracked = execFileSync('git', ['ls-files'], { cwd: target, encoding: 'utf8' }); + expect(tracked).toContain('package.json'); + expect(tracked).toContain('bun.lock'); + }); + it('refuses a non-empty target without --force', () => { const sandbox = makeSandbox(); const target = tmpTarget(); @@ -188,3 +205,153 @@ describe('promoteGreenfieldRun', () => { GIT_TEST_TIMEOUT_MS, ); }); + +describe('promoteBrownfieldRun', () => { + const id = ['-c', 'user.name=t', '-c', 'user.email=t@e']; + + // A user repo on `main` with a base commit, plus a cook/ branch at the + // same base (as `git worktree add -b cook/ … HEAD` would create). + function userRepo(): { dir: string; baseHead: string } { + const dir = mkdtempSync(join(tmpdir(), 'cook-userrepo-')); + dirs.push(dir); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: dir }); + writeFileSync(join(dir, 'app.ts'), 'export const v = 1;\n'); + writeFileSync(join(dir, '.gitignore'), 'node_modules/\n'); + execFileSync('git', ['add', '.'], { cwd: dir }); + execFileSync('git', [...id, 'commit', '-q', '-m', 'base'], { cwd: dir }); + execFileSync('git', ['branch', 'cook/r1'], { cwd: dir }); + const baseHead = execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(); + return { dir, baseHead }; + } + + // The composed cook result: a full tree (base + the cook delta). 
+ function composedTree(): string { + const d = mkdtempSync(join(tmpdir(), 'cook-composed-')); + dirs.push(d); + writeFileSync(join(d, 'app.ts'), 'export const v = 2;\n'); // modified + writeFileSync(join(d, 'feature.ts'), 'export const f = true;\n'); // added + writeFileSync(join(d, '.gitignore'), 'node_modules/\n'); + mkdirSync(join(d, 'node_modules')); + writeFileSync(join(d, 'node_modules', 'dep.js'), 'junk\n'); // gitignored — must not land + return d; + } + + it( + 'commits the composed tree onto cook/, leaving the active branch and working tree untouched', + () => { + const { dir, baseHead } = userRepo(); + const tree = composedTree(); + const branchesBefore = execFileSync('git', ['branch', '--list'], { cwd: dir, encoding: 'utf8' }); + + const result = promoteBrownfieldRun({ sourceDir: dir, sourceTreeDir: tree, runId: 'r1' }); + + // cook/r1 advanced by one commit on top of the base. + expect(result.branch).toBe('cook/r1'); + expect(result.commit).not.toBe(baseHead); + const parent = execFileSync('git', ['rev-parse', 'cook/r1^'], { cwd: dir, encoding: 'utf8' }).trim(); + expect(parent).toBe(baseHead); + + // The commit's tree carries the delta — and not the gitignored deps. + const files = execFileSync('git', ['ls-tree', '-r', '--name-only', 'cook/r1'], { + cwd: dir, + encoding: 'utf8', + }); + expect(files).toContain('feature.ts'); + expect(files).toContain('app.ts'); + expect(files).not.toContain('node_modules'); + const appAtCook = execFileSync('git', ['show', 'cook/r1:app.ts'], { cwd: dir, encoding: 'utf8' }); + expect(appAtCook).toContain('v = 2'); + + // The user's active branch (main), HEAD, working tree, and index are untouched. 
+ expect(execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim()).toBe( + baseHead, + ); + expect( + execFileSync('git', ['symbolic-ref', '--short', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(), + ).toBe('main'); + expect(readFileSync(join(dir, 'app.ts'), 'utf8')).toContain('v = 1'); + expect(existsSync(join(dir, 'feature.ts'))).toBe(false); + expect(execFileSync('git', ['status', '--porcelain'], { cwd: dir, encoding: 'utf8' })).toBe(''); + // Only cook/r1 moved — no stray branches. + expect(execFileSync('git', ['branch', '--list'], { cwd: dir, encoding: 'utf8' })).toBe(branchesBefore); + }, + GIT_TEST_TIMEOUT_MS, + ); + + it('throws when the cook/ branch is absent (must be created by the worktree)', () => { + const { dir } = userRepo(); + const tree = composedTree(); + expect(() => promoteBrownfieldRun({ sourceDir: dir, sourceTreeDir: tree, runId: 'missing' })).toThrow( + /cook\/missing/, + ); + }); + + it( + 'works in the real linked-worktree topology — the live sandbox worktree is left to be discarded, the main checkout untouched', + () => { + // Mirror production: cook/r1 exists *because* a linked worktree checked it out. + const dir = mkdtempSync(join(tmpdir(), 'cook-userrepo-')); + dirs.push(dir); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: dir }); + writeFileSync(join(dir, 'app.ts'), 'export const v = 1;\n'); + writeFileSync(join(dir, '.gitignore'), 'node_modules/\n'); + execFileSync('git', ['add', '.'], { cwd: dir }); + execFileSync('git', [...id, 'commit', '-q', '-m', 'base'], { cwd: dir }); + const wt = join(dir, 'wt'); + execFileSync('git', ['worktree', 'add', '-q', '-b', 'cook/r1', wt, 'HEAD'], { cwd: dir }); + const baseHead = execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(); + + const result = promoteBrownfieldRun({ sourceDir: dir, sourceTreeDir: composedTree(), runId: 'r1' }); + + // Only cook/r1 moved (one commit on the base). 
+ expect(execFileSync('git', ['rev-parse', 'cook/r1^'], { cwd: dir, encoding: 'utf8' }).trim()).toBe( + baseHead, + ); + expect(execFileSync('git', ['show', 'cook/r1:app.ts'], { cwd: dir, encoding: 'utf8' })).toContain( + 'v = 2', + ); + // The main checkout is wholly untouched. + expect(execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim()).toBe( + baseHead, + ); + expect( + execFileSync('git', ['symbolic-ref', '--short', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(), + ).toBe('main'); + expect(readFileSync(join(dir, 'app.ts'), 'utf8')).toContain('v = 1'); + // tracked files untouched (the linked `wt/` dir is an expected untracked entry). + expect( + execFileSync('git', ['status', '--porcelain', '--untracked-files=no'], { + cwd: dir, + encoding: 'utf8', + }), + ).toBe(''); + expect(result.commit).not.toBe(baseHead); + }, + GIT_TEST_TIMEOUT_MS, + ); + + it('stages tracked deletions — a file removed in the composed tree is removed in the cook commit', () => { + const dir = mkdtempSync(join(tmpdir(), 'cook-userrepo-')); + dirs.push(dir); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: dir }); + writeFileSync(join(dir, 'keep.ts'), 'keep\n'); + writeFileSync(join(dir, 'old.ts'), 'remove me\n'); + execFileSync('git', ['add', '.'], { cwd: dir }); + execFileSync('git', [...id, 'commit', '-q', '-m', 'base'], { cwd: dir }); + execFileSync('git', ['branch', 'cook/r1'], { cwd: dir }); + + // Composed tree drops old.ts. 
+ const tree = mkdtempSync(join(tmpdir(), 'cook-composed-')); + dirs.push(tree); + writeFileSync(join(tree, 'keep.ts'), 'keep\n'); + + promoteBrownfieldRun({ sourceDir: dir, sourceTreeDir: tree, runId: 'r1' }); + + const files = execFileSync('git', ['ls-tree', '-r', '--name-only', 'cook/r1'], { + cwd: dir, + encoding: 'utf8', + }); + expect(files).toContain('keep.ts'); + expect(files).not.toContain('old.ts'); + }); +}); diff --git a/src/orchestrator/src/promote-run.ts b/src/orchestrator/src/promote-run.ts index 3b1124d9e..fa6d410a4 100644 --- a/src/orchestrator/src/promote-run.ts +++ b/src/orchestrator/src/promote-run.ts @@ -1,6 +1,7 @@ import { execFileSync } from 'node:child_process'; -import { cpSync, existsSync, mkdirSync, readdirSync, realpathSync } from 'node:fs'; -import { basename, isAbsolute, relative, resolve } from 'node:path'; +import { cpSync, existsSync, mkdirSync, mkdtempSync, readdirSync, realpathSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { basename, isAbsolute, join, relative, resolve } from 'node:path'; export type PromoteResult = { target: string; branch: string; commit: string }; @@ -11,8 +12,16 @@ export type PromoteOptions = { force: boolean; }; -function git(args: string[], cwd: string): string { - return execFileSync('git', args, { cwd, encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'] }).trim(); +export type BrownfieldPromoteOptions = { + /** The user's repo root the brownfield cook ran against (a worktree of it). */ + sourceDir: string; + /** The composed final tree to land (from `promotionSourceDir`). */ + sourceTreeDir: string; + runId: string; +}; + +function git(args: string[], cwd: string, env?: NodeJS.ProcessEnv): string { + return execFileSync('git', args, { cwd, env, encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'] }).trim(); } // Deterministic committer so promotion never depends on (or mutates) global git config. 
@@ -110,3 +119,65 @@ export function promoteGreenfieldRun(opts: PromoteOptions): PromoteResult { const commit = git(['rev-parse', 'HEAD'], target); return { target, branch, commit }; } + +/** + * Land a completed *brownfield* run's composed tree onto the `cook/` + * branch of the user's repo as one reviewable commit — the brownfield analogue + * of `promoteGreenfieldRun`. The brownfield sandbox was created with + * `git worktree add -b cook/ … HEAD`, so the branch already exists at the + * base the run started from; this commits the result on top of it via plumbing + * (`commit-tree` + compare-and-swap `update-ref`) using a throwaway index and an + * external work-tree, so the user's real working tree, index, and active branch + * are never touched. Merging `cook/` into the working branch stays the + * user's call — promotion never freelances into it. + */ +export function promoteBrownfieldRun(opts: BrownfieldPromoteOptions): PromoteResult { + const sourceDir = resolve(opts.sourceDir); + const sourceTreeDir = resolve(opts.sourceTreeDir); + const branch = `cook/${opts.runId}`; + const ref = `refs/heads/${branch}`; + + // The branch must already exist (the sandbox branched it from HEAD); its tip is + // the parent we commit on top of and the CAS expected-value for update-ref. + let parent: string; + try { + parent = git(['rev-parse', '--verify', ref], sourceDir); + } catch { + throw new Error( + `Brownfield promotion expects an existing ${branch} branch in ${sourceDir} (created by the cook worktree).`, + ); + } + + // Absolute git dir so a throwaway index + external work-tree can target the + // user's object store without depending on cwd. 
+ const gitDir = resolve(sourceDir, git(['rev-parse', '--git-dir'], sourceDir)); + const tmp = mkdtempSync(join(tmpdir(), 'brunch-promote-')); + const env: NodeJS.ProcessEnv = { ...process.env, GIT_INDEX_FILE: join(tmp, 'index') }; + const plumb = ['--git-dir', gitDir, '--work-tree', sourceTreeDir]; + try { + // Seed the index from the base, then stage the composed tree as the delta — + // adds, modifications, and deletions, all relative to the base commit. + git([...plumb, 'read-tree', parent], sourceDir, env); + git([...plumb, 'add', '-A'], sourceDir, env); + const tree = git(['--git-dir', gitDir, 'write-tree'], sourceDir, env); + const commit = git( + [ + ...COMMIT_IDENTITY, + '--git-dir', + gitDir, + 'commit-tree', + tree, + '-p', + parent, + '-m', + `cook: ${opts.runId}`, + ], + sourceDir, + env, + ); + git(['--git-dir', gitDir, 'update-ref', ref, commit, parent], sourceDir, env); + return { target: sourceDir, branch, commit }; + } finally { + rmSync(tmp, { recursive: true, force: true }); + } +} diff --git a/src/orchestrator/src/test-runner.test.ts b/src/orchestrator/src/test-runner.test.ts index 9d28e1fe9..c6f5fef46 100644 --- a/src/orchestrator/src/test-runner.test.ts +++ b/src/orchestrator/src/test-runner.test.ts @@ -14,7 +14,8 @@ import { join } from 'node:path'; import { afterEach, describe, expect, it } from 'vitest'; import { bunProfile, type Toolchain } from './project-profile.js'; -import { ToolchainTestRunner } from './test-runner.js'; +import { classifyTestFailure, runVerification, ToolchainTestRunner } from './test-runner.js'; +import type { TestResult, TestRunner } from './types.js'; const bun = bunProfile.toolchain; @@ -83,3 +84,132 @@ describe('ToolchainTestRunner honors the toolchain test command', () => { expect(failed.passed).toBe(false); }); }); + +describe('classifyTestFailure (infra vs test)', () => { + it('a spawn failure (missing runner binary) is infra', () => { + expect(classifyTestFailure('', true)).toBe('infra'); + }); + + it('a 
shell "command not found" is infra even with a normal exit', () => { + expect(classifyTestFailure('sh: 1: vitest: command not found', false)).toBe('infra'); + expect(classifyTestFailure("'jest' is not recognized as an internal or external command", false)).toBe( + 'infra', + ); + }); + + it('an assertion failure with no toolchain signal is a test failure', () => { + expect(classifyTestFailure('expect(received).toBe(expected)\n\n1 fail', false)).toBe('test'); + }); + + it('a missing *module* stays a test failure (ambiguous with TDD red), not infra', () => { + // A red test importing source that does not exist yet must not be mislabeled + // infra and skipped. + expect(classifyTestFailure("Cannot find module './widget' from 'widget.test.ts'", false)).toBe('test'); + }); +}); + +describe('ToolchainTestRunner stamps failureKind', () => { + function fakeToolchain(testCommand: (target: string) => string[]): Toolchain { + return { + sliceTarget: (id) => id, + epicTarget: (id) => id, + testCommand, + testConventions: 'fake', + }; + } + + it('a missing runner binary surfaces as failureKind "infra"', async () => { + const missing = fakeToolchain(() => ['definitely-not-a-real-binary-xyz', 'arg']); + const result = await new ToolchainTestRunner(missing).run('x', process.cwd()); + expect(result.passed).toBe(false); + expect(result.failureKind).toBe('infra'); + }); + + it('an assertion failure surfaces as failureKind "test"', async () => { + const fail = fakeToolchain(() => ['node', '-e', 'process.exit(1)']); + const result = await new ToolchainTestRunner(fail).run('x', process.cwd()); + expect(result.passed).toBe(false); + expect(result.failureKind).toBe('test'); + }); + + it('a runner output cap error is still a test failure, not missing-toolchain infra', async () => { + const noisy = fakeToolchain(() => [ + process.execPath, + '-e', + 'process.stdout.write("x".repeat(2 * 1024 * 1024)); process.exit(1);', + ]); + const result = await new ToolchainTestRunner(noisy).run('x', 
process.cwd()); + expect(result.passed).toBe(false); + expect(result.failureKind).toBe('test'); + }); + + it('a passing run carries no failureKind', async () => { + const pass = fakeToolchain(() => ['node', '-e', 'process.exit(0)']); + const result = await new ToolchainTestRunner(pass).run('x', process.cwd()); + expect(result.passed).toBe(true); + expect(result.failureKind).toBeUndefined(); + }); +}); + +describe('runVerification — the single verdict + aggregate seam', () => { + // Replays a fixed sequence of results across targets so the verdict and the + // infra-dominates aggregate can be pinned without spawning real runners. + function seqRunner(results: readonly TestResult[]): TestRunner { + let i = 0; + return { + async run() { + return results[i++ % results.length]!; + }, + }; + } + + it('done only when ≥1 target exists and every target passes', async () => { + const { done, failureKind } = await runVerification( + [{ target: 'a' }, { target: 'b' }], + seqRunner([{ passed: true, output: 'ok' }]), + '/tmp', + ); + expect(done).toBe(true); + expect(failureKind).toBeUndefined(); + }); + + it('not done with zero targets (nothing proves it)', async () => { + const { done, results } = await runVerification([], seqRunner([{ passed: true, output: 'ok' }]), '/tmp'); + expect(done).toBe(false); + expect(results).toEqual([]); + }); + + it('a plain assertion failure aggregates to "test"', async () => { + const { done, failureKind } = await runVerification( + [{ target: 'a' }], + seqRunner([{ passed: false, output: 'FAIL', failureKind: 'test' }]), + '/tmp', + ); + expect(done).toBe(false); + expect(failureKind).toBe('test'); + }); + + it('infra dominates: one infra failure makes the whole verdict infra', async () => { + const { done, failureKind } = await runVerification( + [{ target: 'a' }, { target: 'b' }], + seqRunner([ + { passed: false, output: 'assert', failureKind: 'test' }, + { passed: false, output: 'no runner', failureKind: 'infra' }, + ]), + '/tmp', + ); + 
expect(done).toBe(false); + expect(failureKind).toBe('infra'); + }); + + it('a runner that throws is treated as an infra failure, not a swallowed pass', async () => { + const throwing: TestRunner = { + async run() { + throw new Error('runner blew up'); + }, + }; + const { done, failureKind } = await runVerification([{ target: 'x' }], throwing, '/tmp'); + expect(done).toBe(false); + expect(failureKind).toBe('infra'); + }); +}); diff --git a/src/orchestrator/src/test-runner.ts b/src/orchestrator/src/test-runner.ts index 634c5d48d..e450de019 100644 --- a/src/orchestrator/src/test-runner.ts +++ b/src/orchestrator/src/test-runner.ts @@ -1,7 +1,35 @@ import { spawnSync } from 'node:child_process'; import { defaultToolchain, type Toolchain } from './project-profile.js'; -import type { TestResult, TestRunner } from './types.js'; +import type { + TestFailureKind, + TestResult, + TestRunner, + VerificationOutcome, + VerificationResult, +} from './types.js'; + +// Shell-reported "the runner binary doesn't exist" — the cross-platform spawn +// `error` (ENOENT) is the primary signal; these catch the case where a shell +// wrapper swallows that into a normal non-zero exit instead. +const RUNNER_MISSING_PATTERNS: readonly RegExp[] = [ + /command not found/i, + /is not recognized as an internal or external command/i, +]; + +/** + * Classify a **failed** test run as `infra` (the toolchain broke) vs `test` (the + * code failed its assertions). Deliberately conservative: only an unambiguous + * "the runner itself isn't there" signal counts as infra — a spawn failure + * (missing binary) or a shell "command not found". Everything else is `test`, + * because a missing *module* is ambiguous with a legitimate TDD red (a test + * importing source that doesn't exist yet), and misrouting a real failure as + * "infra noise" would silently skip it. 
+ */ +export function classifyTestFailure(output: string, spawnFailed: boolean): TestFailureKind { + if (spawnFailed) return 'infra'; + return RUNNER_MISSING_PATTERNS.some((re) => re.test(output)) ? 'infra' : 'test'; +} export class ToolchainTestRunner implements TestRunner { constructor(private readonly toolchain: Toolchain = defaultToolchain) {} @@ -20,6 +48,43 @@ export class ToolchainTestRunner implements TestRunner { const output = [result.stdout, result.stderr, result.error ? String(result.error) : ''] .filter(Boolean) .join(''); - return { passed: result.status === 0, output }; + const passed = result.status === 0; + if (passed) return { passed, output }; + // `spawnSync.error` also covers timeout / ENOBUFS after the runner started; + // only ENOENT proves the runner binary itself is missing. + const runnerMissing = result.error != null && (result.error as NodeJS.ErrnoException).code === 'ENOENT'; + return { passed, output, failureKind: classifyTestFailure(output, runnerMissing) }; + } +} + +/** + * The single verification seam: run every target through one `TestRunner` and + * fold the per-target results into one verdict. This is the one place the + * "≥1 target and all pass" oracle rule and the infra-dominates aggregate live, + * so `evaluate-done`, `verify-epic`, and the net `run-tests` handler can't drift + * apart (they each used to re-implement this). A runner that throws is treated + * as an `infra` failure — a harness fault, not a code assertion. 
+ */ +export async function runVerification( + targets: readonly { target: string }[], + runner: TestRunner, + sandboxDir: string, +): Promise<VerificationOutcome> { + const results: VerificationResult[] = []; + for (const t of targets) { + try { + results.push({ target: t.target, ...(await runner.run(t.target, sandboxDir)) }); + } catch (err) { + results.push({ target: t.target, passed: false, output: String(err), failureKind: 'infra' }); + } } + const done = results.length > 0 && results.every((r) => r.passed); + // infra (toolchain broke) dominates a plain test failure — if anything failed + // to even run, that's the actionable signal. Undefined when the verdict passed. + const failureKind: TestFailureKind | undefined = done + ? undefined + : results.some((r) => r.failureKind === 'infra') + ? 'infra' + : 'test'; + return { done, failureKind, results }; } diff --git a/src/orchestrator/src/types.ts b/src/orchestrator/src/types.ts index fa52802f1..e74eb851c 100644 --- a/src/orchestrator/src/types.ts +++ b/src/orchestrator/src/types.ts @@ -14,8 +14,44 @@ export type Epic = { summary: string; depends_on: string[]; verification: Verification[]; + /** + * Integration-oracle (FE-876) reachability target — a *concrete* probe + * (boot argv + paths). When present it is used directly; this is the Half-A + * path (fixtures / explicit targets). `not-reachable` is the FE-800 orphan + * (code merged but never wired into the running app). Absent + no + * `reachability` → unit-test verdict only (unchanged behavior). + */ + probe?: ProbeTarget; + /** + * Integration-oracle (FE-876) Half B — host-blind reachability *intent* the + * architect emits (D160-K: planning stays host-blind). Cook-time grounding + * resolves it into a concrete `ProbeTarget` by reading the worktree, via the + * injected `ProbeGrounder` (the dispatch-seam piece). `probe` takes precedence + * when both are set; intent without an injected grounder is a no-op (the + * grounder lands with the pi-harness contract). 
+ */ + reachability?: ReachabilityIntent; }; +/** + * A host-blind description of what must be reachable once the feature is wired, + * e.g. "the GET /health endpoint returns 200 and the new feature route + * responds". The architect emits this without knowing the boot command or port; + * cook-time grounding turns it into a concrete `ProbeTarget`. + */ +export type ReachabilityIntent = { + feature: string; +}; + +/** + * Cook-time grounding seam (FE-876 Half B, dispatch seam): resolve a host-blind + * `ReachabilityIntent` into a concrete `ProbeTarget` by reading the merged + * worktree. Injected into `createPiActions` so the agent dispatch is swappable + * and tests can stub it; the production implementation (an `execute`-mode agent + * that reads the worktree) lands with the pi-harness contract. + */ +export type ProbeGrounder = (intent: ReachabilityIntent, sandboxDir: string) => Promise<ProbeTarget>; + export type Slice = { id: string; epic_id: string; @@ -95,15 +131,101 @@ export type ActionHandlers = Record; // Test runner — deterministic, orchestrator-owned // --------------------------------------------------------------------------- +/** + * Why a failed test run failed. `infra` = the toolchain itself broke (the test + * runner binary is missing / deps never installed) — a different fix than `test` + * = the code under test failed its assertions. Distinguishing them stops the + * cook loop from sending the code-writer to "fix the code" when nothing was ever + * installed (`TestResult.passed` alone collapsed both into one failure). + */ +export type TestFailureKind = 'infra' | 'test'; + export type TestResult = { passed: boolean; output: string; + /** Set only when `passed` is false; classifies the failure. */ + failureKind?: TestFailureKind; }; export interface TestRunner { run(target: string, sandboxDir: string): Promise<TestResult>; } +/** One verification target's outcome: its id plus the runner's `TestResult`. 
*/ +export type VerificationResult = { target: string } & TestResult; + +/** + * The verdict over a set of verification targets. `done` is the single oracle + * rule — at least one target and every target passing (no requisite variety + * otherwise). `failureKind` is the aggregate over the failed targets: `infra` + * (the toolchain broke) dominates a plain `test` failure, because a run that + * never executed is the actionable signal. Undefined when `done`. + */ +export type VerificationOutcome = { + done: boolean; + failureKind?: TestFailureKind; + results: VerificationResult[]; +}; + +// --------------------------------------------------------------------------- +// App runtime probe — real *app* execution, the analogue of test execution +// --------------------------------------------------------------------------- + +/** + * The verdict of booting the host app and exercising one feature endpoint: + * - `reachable` — the app answered the feature probe (wired into the running app) + * - `not-reachable` — the app booted but the feature endpoint is absent (the + * FE-800 orphan: a module that exists but isn't wired in) + * - `infra` — the app never booted / never became ready (a different + * fix than "feature absent", mirroring `TestFailureKind`) + */ +export type ProbeOutcomeKind = 'reachable' | 'not-reachable' | 'infra'; + +/** + * What the probe needs to boot + exercise an app. The boot argv and URLs are + * **inputs** (later supplied by cook-time grounding), not a per-stack boot + * engine — the harness owns the deterministic check, the boot mechanics may + * lean on the agent's `bash`. + */ +export type ProbeSpec = { + /** Argv that boots the app in the sandbox (e.g. `['node','server.js']`). */ + boot: readonly string[]; + /** URL polled until the app accepts connections (any HTTP response = ready). */ + readyUrl: string; + /** URL whose response decides feature reachability. */ + featureUrl: string; + /** Extra env for the boot process (e.g. 
a chosen `PORT`). */ + env?: Record; +}; + +/** + * The harness-resolvable shape of a probe: boot argv + the *paths* to poll and + * exercise, before a concrete port is bound. `buildProbeSpec` turns this into a + * `ProbeSpec` by allocating a free port — the deterministic, harness-owned piece + * (a hardcoded port collides under parallel cook). Cook-time grounding later + * supplies the argv + paths; the harness never guesses them. + */ +export type ProbeTarget = { + /** Argv that boots the app in the sandbox (e.g. `['node','server.js']`). */ + boot: readonly string[]; + /** Path polled until the app accepts connections (e.g. `/health`). */ + readyPath: string; + /** Path whose response decides feature reachability (e.g. `/feature`). */ + featurePath: string; + /** Extra env for the boot process; the allocated `PORT` is added on top. */ + env?: Record; +}; + +export type ProbeResult = { + kind: ProbeOutcomeKind; + /** Convenience: `kind === 'reachable'`. */ + reachable: boolean; + /** HTTP status of the feature probe, when the app answered. */ + status?: number; + /** Boot output + diagnostics, for the run report. */ + output: string; +}; + // --------------------------------------------------------------------------- // Orchestrator seam // --------------------------------------------------------------------------- diff --git a/src/server/cli.ts b/src/server/cli.ts index 0a7d3f418..00f133fca 100644 --- a/src/server/cli.ts +++ b/src/server/cli.ts @@ -20,6 +20,45 @@ const launchCwd = process.env.BRUNCH_LAUNCH_CWD || process.cwd(); loadLocalEnvFile(launchCwd); +/** + * Shared completed-spec gate for the spec-driven commands (`plan`, `serve`): + * parse → open the project DB → assert the spec exists and is planning-ready → + * run the command body → always close the DB. Parsing is passed as a thunk so a + * parse error is reported through the same `Failed to run brunch ` + * channel and exit code as the spec/DB errors. 
Keeps the two commands from + * drifting on the gate while leaving each command's parsing and body its own. + */ +async function withCompletedSpec( + command: string, + parse: () => O, + run: ( + opts: O, + ctx: { + project: ReturnType; + snapshot: ReturnType; + }, + ) => Promise, +): Promise { + let db: ReturnType | undefined; + try { + const opts = parse(); + const project = resolveBrunchProject(launchCwd); + db = createDb(project.dbPath); + if (!getSpecification(db, opts.specificationId)) { + throw new Error(`specification ${opts.specificationId} not found`); + } + const snapshot = buildCompletedSpecSnapshot(db, opts.specificationId); + assertCompletedSpecReadyForPlanning(db, opts.specificationId, snapshot); + await run(opts, { project, snapshot }); + } catch (error) { + const message = error instanceof Error ? error.message : String(error); + console.error(`Failed to run brunch ${command}: ${message}`); + process.exit(1); + } finally { + db?.$client.close(); + } +} + if (rawArgs[0] === '--version' || rawArgs[0] === '-V') { const pkgPath = join(dirname(fileURLToPath(import.meta.url)), '../../package.json'); const { version } = JSON.parse(readFileSync(pkgPath, 'utf8')) as { version: string }; @@ -38,6 +77,9 @@ if (args.has('--help') || args.has('-h') || args.has('help')) { console.log( ' plan [flags] Emit .brunch/cook/specs//plan.yaml from a completed specification.', ); + console.log( + ' serve [flags] One shot: plan then cook a completed specification (no manual steps).', + ); console.log(''); console.log('Environment:'); console.log(' ANTHROPIC_API_KEY Required. 
Brunch will not start without it; it powers the'); @@ -98,37 +140,63 @@ exitIfAnthropicApiKeyMissing(); if (rawArgs[0] === 'cook') { const { parseCookArgs, runCook } = await import('../orchestrator/src/cook-cli.js'); + const { withCookBus } = await import('../orchestrator/src/presenter.js'); const opts = parseCookArgs(rawArgs.slice(1)); - runCook(opts).catch((error) => { + // withCookBus disposes the bus (unmounts the Ink app) in finally so the TTY run exits. + await withCookBus('cook', (bus) => runCook(opts, bus)).catch((error) => { console.error('Failed to run brunch cook:', error); process.exit(1); }); +} else if (rawArgs[0] === 'serve') { + const { runPlan } = await import('./plan-runner.js'); + const { runCook } = await import('../orchestrator/src/cook-cli.js'); + const { parseServeArgs, runServe } = await import('./serve-runner.js'); + const { withCookBus } = await import('../orchestrator/src/presenter.js'); + await withCookBus('serve', (bus) => + withCompletedSpec( + 'serve', + () => parseServeArgs(rawArgs.slice(1)), + async (opts, { project, snapshot }) => { + // Cook runs against the same dir the plan was written to (launchCwd); see + // serveCookOptions — runCook reads opts.dir raw, so serve must thread it. + await runServe(opts, launchCwd, { + plan: () => + runPlan({ + specificationId: opts.specificationId, + snapshot, + outDir: launchCwd, + verbose: opts.verbose, + profile: opts.profile, + // Brownfield detection reads the launch cwd (the user's repo); greenfield ignores it. 
+ repoDir: project.cwd, + bus, + }), + cook: (cookOpts) => runCook(cookOpts, bus), + }); + }, + ), + ); } else if (rawArgs[0] === 'plan') { const { parsePlanArgs, runPlan } = await import('./plan-runner.js'); - let db: ReturnType | undefined; - try { - const opts = parsePlanArgs(rawArgs.slice(1), launchCwd); - const project = resolveBrunchProject(launchCwd); - db = createDb(project.dbPath); - if (!getSpecification(db, opts.specificationId)) { - throw new Error(`specification ${opts.specificationId} not found`); - } - const snapshot = buildCompletedSpecSnapshot(db, opts.specificationId); - assertCompletedSpecReadyForPlanning(db, opts.specificationId, snapshot); - await runPlan({ - specificationId: opts.specificationId, - snapshot, - outDir: opts.outDir, - verbose: opts.verbose, - profile: opts.profile, - }); - } catch (error) { - const message = error instanceof Error ? error.message : String(error); - console.error(`Failed to run brunch plan: ${message}`); - process.exit(1); - } finally { - db?.$client.close(); - } + const { withCookBus } = await import('../orchestrator/src/presenter.js'); + await withCookBus('plan', (bus) => + withCompletedSpec( + 'plan', + () => parsePlanArgs(rawArgs.slice(1), launchCwd), + async (opts, { project, snapshot }) => { + await runPlan({ + specificationId: opts.specificationId, + snapshot, + outDir: opts.outDir, + verbose: opts.verbose, + profile: opts.profile, + // Brownfield detection reads the launch cwd (the user's repo); greenfield ignores it. 
+ repoDir: project.cwd, + bus, + }); + }, + ), + ); } else if (rawArgs[0] === 'agent') { const project = resolveBrunchProject(launchCwd); const db = createDb(project.dbPath); diff --git a/src/server/plan-runner.test.ts b/src/server/plan-runner.test.ts index 1c75341b4..d4344947d 100644 --- a/src/server/plan-runner.test.ts +++ b/src/server/plan-runner.test.ts @@ -15,9 +15,19 @@ import { parse as parseYaml } from 'yaml'; import type { ArchitectDraft, RunModel } from '../orchestrator/src/plan-architect.js'; import type { CompletedSpecSnapshot } from '../orchestrator/src/plan-projection.js'; +import { CookBus } from '../orchestrator/src/presenter/bus.js'; +import { PlainPresenter } from '../orchestrator/src/presenter/plain.js'; import type { Plan } from '../orchestrator/src/types.js'; import { parsePlanArgs, runPlan } from './plan-runner.js'; +/** A bus wired to a capturing PlainPresenter — the golden stderr stream. */ +function captureBus(): { bus: CookBus; lines: string[] } { + const lines: string[] = []; + const bus = new CookBus(); + bus.subscribe(new PlainPresenter({ log: (line) => lines.push(line) })); + return { bus, lines }; +} + describe('parsePlanArgs', () => { it('parses , --out=, --verbose', () => { const opts = parsePlanArgs(['2', '--out=/tmp/x', '--verbose']); @@ -130,7 +140,7 @@ describe('runPlan', () => { it('writes .brunch/cook/plan.yaml and hides synthesis events at default verbosity', async () => { const { snapshot, dir, runModel } = makeRunWithCycle(); - const stderrLines: string[] = []; + const { bus, lines: stderrLines } = captureBus(); await runPlan({ specificationId: 2, @@ -138,7 +148,7 @@ describe('runPlan', () => { outDir: dir, verbose: false, runModel, - log: (line) => stderrLines.push(line), + bus, }); const planPath = join(dir, '.brunch', 'cook', 'specs', '2', 'plan.yaml'); @@ -163,7 +173,7 @@ describe('runPlan', () => { verbose: false, profile: 'node-vitest', runModel, - log: () => {}, + bus: new CookBus(), }); const planPath = join(dir, 
'.brunch', 'cook', 'specs', '2', 'plan.yaml'); @@ -173,7 +183,7 @@ describe('runPlan', () => { it('shows synthesis events when --verbose is set', async () => { const { snapshot, dir, runModel } = makeRunWithCycle(); - const stderrLines: string[] = []; + const { bus, lines: stderrLines } = captureBus(); await runPlan({ specificationId: 2, @@ -181,7 +191,7 @@ describe('runPlan', () => { outDir: dir, verbose: true, runModel, - log: (line) => stderrLines.push(line), + bus, }); expect(stderrLines.some((line) => line.includes('cycle-break-dropped-edge'))).toBe(true); @@ -200,7 +210,7 @@ describe('runPlan', () => { const runModel: RunModel = async () => { throw new Error('llm-boom'); }; - const stderrLines: string[] = []; + const { bus, lines: stderrLines } = captureBus(); await runPlan({ specificationId: 2, @@ -208,7 +218,7 @@ describe('runPlan', () => { outDir: dir, verbose: false, runModel, - log: (line) => stderrLines.push(line), + bus, }); const planPath = join(dir, '.brunch', 'cook', 'specs', '2', 'plan.yaml'); diff --git a/src/server/plan-runner.ts b/src/server/plan-runner.ts index ef05e46bf..5cf8732d1 100644 --- a/src/server/plan-runner.ts +++ b/src/server/plan-runner.ts @@ -22,6 +22,7 @@ import { type EmitterWarning, } from '../orchestrator/src/plan-emitter.js'; import type { CompletedSpecSnapshot } from '../orchestrator/src/plan-projection.js'; +import type { CookBus } from '../orchestrator/src/presenter/bus.js'; import { parseProfileId, type ProfileId } from '../orchestrator/src/project-profile.js'; import { parseSpecId, specPlanPath } from '../orchestrator/src/spec-plan-paths.js'; @@ -73,25 +74,26 @@ export type RunPlanArgs = { verbose: boolean; /** Toolchain profile override (`--profile`); wins over the spec's profile. */ profile?: ProfileId; + /** + * Project directory the toolchain is detected from for brownfield plans + * (`brunch-detect`). The CLI passes the launch cwd; greenfield ignores it. + */ + repoDir?: string; /** Injectable LLM seam. 
Defaults to the production anthropic adapter via the emitter. */ runModel?: RunModel; - /** Injectable stderr writer. Defaults to `console.error`. */ - log?: (line: string) => void; + /** Presentation boundary. The orchestrator emits CookEvents; a presenter renders them. */ + bus: CookBus; }; export async function runPlan(args: RunPlanArgs): Promise { - const log = args.log ?? ((line: string) => console.error(line)); + const { bus } = args; - log(''); - log(' brunch plan'); - log(' ──────────────────────────────────────'); - log(` spec ${args.specificationId}`); - log(` out ${args.outDir}`); - log(''); + bus.emit({ kind: 'plan-start', specId: args.specificationId, outDir: args.outDir }); const result = await emitPlanFromSnapshot(args.snapshot, { ...(args.runModel ? { runModel: args.runModel } : {}), ...(args.profile ? { profile: args.profile } : {}), + ...(args.repoDir ? { repoDir: args.repoDir } : {}), }); // Spec-scoped output path. Each spec gets its own subdir so multiple @@ -104,21 +106,18 @@ export async function runPlan(args: RunPlanArgs): Promise { mkdirSync(dirname(planPath), { recursive: true }); writeFileSync(planPath, stringifyYaml(result.plan)); - log(` ✓ plan ${planPath}`); - log(` ${result.plan.epics.length} epics, ${result.plan.slices.length} slices`); - log(''); + bus.emit({ + kind: 'plan-written', + path: planPath, + epics: result.plan.epics.length, + slices: result.plan.slices.length, + }); // Audit-weight display: failure + transformation always; synthesis // only when --verbose. The header counts only what we print so the // number on screen matches the lines below it. const printed = result.warnings.filter((warning) => shouldPrint(warning, args.verbose)); - if (printed.length > 0) { - log(` ${printed.length} warnings:`); - for (const warning of printed) { - log(` ! 
${formatEmitterWarning(warning)}`); - } - log(''); - } + bus.emit({ kind: 'plan-warnings', messages: printed.map((warning) => formatEmitterWarning(warning)) }); } function shouldPrint(warning: EmitterWarning, verbose: boolean): boolean { diff --git a/src/server/serve-runner.test.ts b/src/server/serve-runner.test.ts new file mode 100644 index 000000000..5df7d900f --- /dev/null +++ b/src/server/serve-runner.test.ts @@ -0,0 +1,133 @@ +import { resolve } from 'node:path'; + +import { describe, expect, it } from 'vitest'; + +import type { CookOptions } from '../orchestrator/src/cook-cli.js'; +import { parseServeArgs, runServe, serveCookOptions } from './serve-runner.js'; + +describe('parseServeArgs', () => { + it('requires a positive integer specId', () => { + expect(() => parseServeArgs([])).toThrow(/Missing /); + expect(() => parseServeArgs(['0'])).toThrow(/positive integer/); + expect(() => parseServeArgs(['abc'])).toThrow(/positive integer/); + expect(parseServeArgs(['7']).specificationId).toBe(7); + }); + + it('maps the flags it owns and rejects unknown ones', () => { + const opts = parseServeArgs([ + '12', + '--out=dist', + '--force', + '--profile=node-vitest', + '--policy=parallel', + '--max-retries=5', + '--petrinaut-stream', + '--petrinaut-url=https://x/brunch', + '--petrinaut-lanes=mechanical', + '--petrinaut-fold=color', + '--no-petrinaut-open', + '--verbose', + ]); + expect(opts).toMatchObject({ + specificationId: 12, + outDir: 'dist', + force: true, + profile: 'node-vitest', + policy: 'parallel', + maxRetries: 5, + petrinautStream: true, + petrinautUrl: 'https://x/brunch', + petrinautLanes: 'mechanical', + petrinautFold: 'color', + petrinautOpen: false, + verbose: true, + }); + expect(() => parseServeArgs(['1', '--nope'])).toThrow(/Unknown flag/); + expect(() => parseServeArgs(['1', '2'])).toThrow(/Unexpected positional/); + }); + + it('rejects petrinaut companion flags unless streaming is enabled', () => { + expect(() => parseServeArgs(['1', 
'--petrinaut-url=https://x/brunch'])).toThrow( + /--petrinaut-url requires --petrinaut-stream/, + ); + expect(() => parseServeArgs(['1', '--no-petrinaut-open'])).toThrow( + /--no-petrinaut-open requires --petrinaut-stream/, + ); + }); + + it('defaults the optional flags', () => { + const opts = parseServeArgs(['3']); + expect(opts).toMatchObject({ + outDir: undefined, + force: false, + profile: undefined, + policy: 'serial', + maxRetries: 3, + petrinautStream: false, + petrinautOpen: true, + verbose: false, + }); + }); +}); + +describe('serveCookOptions', () => { + it('sets specId so cook reads the just-emitted plan, and forwards --out as the promote target', () => { + const cook = serveCookOptions( + parseServeArgs(['9', '--out=out', '--force', '--policy=parallel']), + '/proj', + ); + expect(cook.specId).toBe(9); + expect(cook.outDir).toBe(resolve('/proj', 'out')); + expect(cook.force).toBe(true); + expect(cook.policy).toBe('parallel'); + // cook reads opts.dir raw (no launch-cwd default — that's parseCookArgs only), + // so serve must thread the resolved dir the plan was written to, not ''. 
+ expect(cook.dir).toBe('/proj'); + }); + + it('leaves absolute --out paths absolute', () => { + const cook = serveCookOptions(parseServeArgs(['9', '--out=/tmp/out']), '/proj'); + expect(cook.outDir).toBe('/tmp/out'); + }); + + it('omits outDir when serve had none (brownfield promotes automatically)', () => { + const cook = serveCookOptions(parseServeArgs(['9']), '/proj'); + expect(cook.outDir).toBeUndefined(); + }); +}); + +describe('runServe', () => { + it('plans then cooks, passing the mapped cook options', async () => { + const calls: string[] = []; + let cookSaw: CookOptions | undefined; + await runServe(parseServeArgs(['4', '--out=dist']), '/proj', { + plan: async () => { + calls.push('plan'); + }, + cook: async (o) => { + calls.push('cook'); + cookSaw = o; + }, + }); + expect(calls).toEqual(['plan', 'cook']); + expect(cookSaw?.specId).toBe(4); + expect(cookSaw?.outDir).toBe(resolve('/proj', 'dist')); + // cook runs against the same dir the plan was written to. + expect(cookSaw?.dir).toBe('/proj'); + }); + + it('does not cook if planning fails', async () => { + let cooked = false; + await expect( + runServe(parseServeArgs(['4']), '/proj', { + plan: async () => { + throw new Error('plan boom'); + }, + cook: async () => { + cooked = true; + }, + }), + ).rejects.toThrow(/plan boom/); + expect(cooked).toBe(false); + }); +}); diff --git a/src/server/serve-runner.ts b/src/server/serve-runner.ts new file mode 100644 index 000000000..090798637 --- /dev/null +++ b/src/server/serve-runner.ts @@ -0,0 +1,166 @@ +// `brunch serve ` — the Arc-1 capstone: one shot from a completed spec +// to a promoted cook result, no manual steps. It is pure glue over the existing +// `brunch plan` and `brunch cook` paths: emit the plan, then cook it. The only +// real logic here is arg parsing + the flag→stage mapping (serve's `--out` is +// the *promote* target → cook; `--profile` stamps the plan), so those are the +// testable units; the db/snapshot wiring stays in `cli.ts`. 
+ +import { resolve } from 'node:path'; + +import type { CookOptions } from '../orchestrator/src/cook-cli.js'; +import { parseProfileId, type ProfileId } from '../orchestrator/src/project-profile.js'; + +export type ServeOptions = { + specificationId: number; + /** Greenfield promote target (→ cook `--out`); brownfield promotes automatically. */ + outDir?: string; + force: boolean; + /** Toolchain profile override; stamped into the emitted plan. */ + profile?: ProfileId; + verbose: boolean; + // Petrinaut + execution flags, forwarded to cook. + petrinautStream: boolean; + petrinautUrl?: string; + petrinautLanes: 'both' | 'mechanical'; + petrinautFold: 'color' | 'identity'; + petrinautOpen: boolean; + policy: 'serial' | 'parallel'; + maxRetries: number; +}; + +const USAGE = + 'Usage: brunch serve [--out=] [--force] [--profile=] [--policy=serial|parallel] [--max-retries=] [--petrinaut-stream] [--petrinaut-url=] [--petrinaut-lanes=both|mechanical] [--petrinaut-fold=color|identity] [--no-petrinaut-open] [--verbose]'; + +export function parseServeArgs(args: string[]): ServeOptions { + let specIdRaw: string | undefined; + let outDir: string | undefined; + let force = false; + let profile: ProfileId | undefined; + let verbose = false; + let petrinautStream = false; + let petrinautUrl: string | undefined; + let petrinautLanes: 'both' | 'mechanical' = 'both'; + let petrinautFold: 'color' | 'identity' = 'identity'; + let petrinautOpen = true; + let policy: 'serial' | 'parallel' = 'serial'; + let maxRetries = 3; + let sawPetrinautUrl = false; + let sawNoPetrinautOpen = false; + + for (const arg of args) { + if (arg.startsWith('--out=')) { + outDir = arg.slice('--out='.length); + } else if (arg === '--force') { + force = true; + } else if (arg.startsWith('--profile=')) { + profile = parseProfileId(arg.slice('--profile='.length)); + } else if (arg.startsWith('--policy=')) { + const val = arg.slice('--policy='.length); + if (val !== 'serial' && val !== 'parallel') + throw new 
Error(`Unknown policy: ${val}. Use serial or parallel.`); + policy = val; + } else if (arg.startsWith('--max-retries=')) { + const parsed = Number.parseInt(arg.slice('--max-retries='.length), 10); + if (!Number.isFinite(parsed) || parsed < 0) + throw new Error(`Invalid --max-retries value. Must be a non-negative integer.`); + maxRetries = parsed; + } else if (arg === '--petrinaut-stream') { + petrinautStream = true; + } else if (arg.startsWith('--petrinaut-url=')) { + petrinautUrl = arg.slice('--petrinaut-url='.length); + sawPetrinautUrl = true; + } else if (arg.startsWith('--petrinaut-lanes=')) { + const val = arg.slice('--petrinaut-lanes='.length); + if (val !== 'both' && val !== 'mechanical') + throw new Error(`Unknown --petrinaut-lanes value: ${val}. Use both or mechanical.`); + petrinautLanes = val; + } else if (arg.startsWith('--petrinaut-fold=')) { + const val = arg.slice('--petrinaut-fold='.length); + if (val !== 'color' && val !== 'identity') + throw new Error(`Unknown --petrinaut-fold value: ${val}. Use color or identity.`); + petrinautFold = val; + } else if (arg === '--no-petrinaut-open') { + petrinautOpen = false; + sawNoPetrinautOpen = true; + } else if (arg === '--verbose' || arg === '-v') { + verbose = true; + } else if (arg.startsWith('-')) { + throw new Error(`Unknown flag "${arg}". ${USAGE}`); + } else if (specIdRaw === undefined) { + specIdRaw = arg; + } else { + throw new Error(`Unexpected positional argument "${arg}". ${USAGE}`); + } + } + + if (specIdRaw === undefined) throw new Error(`Missing . ${USAGE}`); + const specificationId = Number.parseInt(specIdRaw, 10); + if (!Number.isInteger(specificationId) || specificationId <= 0) { + throw new Error(`Invalid "${specIdRaw}": expected a positive integer. 
${USAGE}`); + } + if (sawPetrinautUrl && !petrinautStream) { + throw new Error('--petrinaut-url requires --petrinaut-stream'); + } + if (sawNoPetrinautOpen && !petrinautStream) { + throw new Error('--no-petrinaut-open requires --petrinaut-stream'); + } + + return { + specificationId, + outDir, + force, + profile, + verbose, + petrinautStream, + petrinautUrl, + petrinautLanes, + petrinautFold, + petrinautOpen, + policy, + maxRetries, + }; +} + +/** + * Map serve options to the cook stage. `specId` is set so cook reads the plan + * just emitted (not an auto-picked older one); serve's `--out` becomes cook's + * greenfield promote target (brownfield promotes automatically regardless). + * + * `cookDir` is the resolved launch cwd the plan was written under. `runCook` + * reads `opts.dir` raw — the launch-cwd default lives only in `parseCookArgs`, + * which serve bypasses — so cook would otherwise resolve the plan path against + * `process.cwd()` and clone `''` for brownfield. Threading the same dir the plan + * used keeps the two stages pointed at one directory (SPEC R46). + */ +export function serveCookOptions(opts: ServeOptions, cookDir: string): CookOptions { + return { + dir: cookDir, + policy: opts.policy, + maxRetries: opts.maxRetries, + verbose: opts.verbose, + petrinautFold: opts.petrinautFold, + petrinautLanes: opts.petrinautLanes, + petrinautStream: opts.petrinautStream, + ...(opts.petrinautUrl ? { petrinautUrl: opts.petrinautUrl } : {}), + petrinautOpen: opts.petrinautOpen, + ...(opts.outDir ? { outDir: resolve(cookDir, opts.outDir) } : {}), + force: opts.force, + specId: opts.specificationId, + }; +} + +/** + * Sequence the two stages: emit the plan, then cook it. Cook only runs if + * planning succeeded — a failed plan short-circuits with nothing cooked. Both + * stages are injected so the db/snapshot/agent side effects stay in `cli.ts` + * and this orchestration is unit-testable. 
`cookDir` is the resolved launch cwd + * the plan was written under, threaded into the cook options. + */ +export async function runServe( + opts: ServeOptions, + cookDir: string, + deps: { plan: () => Promise; cook: (cookOpts: CookOptions) => Promise }, +): Promise { + await deps.plan(); + await deps.cook(serveCookOptions(opts, cookDir)); +}