diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 110e20fa..3a966077 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -31,7 +31,7 @@ Open http://localhost:5173. | Variable | Required | Description | |---|---|---| | `ANTHROPIC_API_KEY` | Yes | Anthropic API key | -| `ANTHROPIC_MODEL` | No | Interviewer model (default: `claude-sonnet-4-20250514`) | +| `ANTHROPIC_MODEL` | No | Interviewer model (default: `claude-opus-4-6`) | | `OBSERVER_MODEL` | No | Observer model (default: `claude-haiku-4-5-20251001`) | | `BRUNCH_DB` | No | Override the default project-local SQLite path for dev workflows | | `BRUNCH_PORT` | No | Backend port override | diff --git a/memory/PLAN.md b/memory/PLAN.md index 92b02668..f7fd6935 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -59,7 +59,7 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen 4. `dogfood-spike` (ln-spike) — **(done — 2026-06-16)** ran a real brownfield cook (hand-authored 2-slice plan: feature + wiring, `node:http` app) against a throwaway git repo. **Verdict:** chain works end-to-end (CoW worktree, clean-tree gate, per-slice→`__epic__` merge composed the wiring, TDD red/green, working branch untouched); the agent wired the feature reachable and **self-authored a genuine boot-and-probe** integration test (imports the real entry, `listen(0)`, `http.get('/health')`, asserts not-404). Orphan did **not** reproduce — but reachability was **agent-discretion, not enforced** → confirms the *value* of `integration-oracle`/`app-runtime-probe` (independent, unshortcuttable reachability). Two refinements surfaced: the probe should own the boot mechanism (the agent had to invent a `.js→.ts` resolve hook), and dep-install was unexercised (zero-dep app). Bonus: the `Cannot find module` TDD red was handled as a test-red (not infra) — validates FE-872 slice 1 live. 5. 
`app-runtime-probe` — **(slices 1–2 landed — FE-875, `runProbe` + `buildProbeSpec`)** build + boot + exercise the host app; the concrete reachability mechanism `integration-oracle` depends on (without it, "reachable" collapses back to "a test that imports the module"). Slice 1: boot + HTTP probe + reachable/not-reachable/infra classification + teardown. Slice 2: harness-owned `ProbeSpec` resolution — `buildProbeSpec(ProbeTarget)` allocates a free ephemeral port and assembles ready/feature URLs from boot-argv + *paths*, so a hardcoded port can't collide under parallel cook (the boot test's hand-rolled port dance is now the production primitive it dogfoods). Stays off the dispatch seam: argv + paths are inputs cook-time grounding will supply; the harness owns only the port pick + URL/env assembly (loopback-only; best-effort ephemeral port with an acknowledged TOCTOU window, no retry framework). Every probe HTTP call (readiness poll + feature request) carries a per-call `AbortSignal.timeout` so a server that accepts a connection but never responds can't hang the probe (and the cook) past the deadline; timeouts are overridable for tests. Remaining: mode-awareness, integration-oracle gating (where the `ProbeTarget` argv/paths come from = `integration-oracle` #6). 6. `integration-oracle` — **(Half A + Half B seam landed — FE-876)** oracle asserts product reachability via `app-runtime-probe`. Half A (off-seam): `Epic.probe?: ProbeTarget` folds a `runProbe` result into the `verify-epic` verdict — after slices merge into `__epic__//`, the epic is `done` only when tests pass **and** the feature is reachable; `not-reachable` is the FE-800 orphan, `infra` is a harness fault. Probe gated behind tests passing (never boot a known-broken build); absent → unchanged unit verdict; reachability rides the existing `report.passed` routing. 
Half B seam: host-blind `Epic.reachability?: ReachabilityIntent` (architect-emittable, D160-K) + an injectable `ProbeGrounder` (`createPiActions({ groundProbe })`) that cook-time-resolves intent → concrete `ProbeTarget` by reading the worktree; `verify-epic` resolves via `probe ?? ground(reachability)`, a grounder that throws is an `infra` fault (visible, not a silent pass), intent without a grounder is an inert no-op. **Remaining (dispatch seam, lands atomically with the pi-harness contract):** the production `ProbeGrounder` (an `execute`-mode agent that reads the worktree) + architect emission of `reachability` intent — deferred together so intent is enforced the moment it's emitted (avoids perturbing the 3 reference fixtures). Runs in the FE-738 semantic lane. Promotes FE-800's integration-blind follow-on to a frontier. *(grounder impl depends on `agent-extension-host`)* -7. `brownfield-promotion` — **(landed — FE-877, `promoteBrownfieldRun`)** commit a completed brownfield cook result onto the repo's own `cook/` branch as one reviewable commit; extends FE-827's greenfield promotion to brownfield and closes the cook-codebase-mode follow-on (the result no longer sits uncommitted in the worktree). Git plumbing only (`commit-tree` + CAS `update-ref`, parent = the existing `cook/` base, throwaway index + external work-tree), so the user's active branch, working tree, and index are never touched; gitignored deps don't land. Reuses `promotionSourceDir` to compose the tree across slice layouts. Auto-runs on a completed brownfield cook (no `--out` needed); merging into the working branch stays the **user's** call. Unblocks FE-872's brownfield dep-delta capture. +7. 
`brownfield-promotion` — **(landed — FE-877, `promoteBrownfieldRun`)** commit a completed brownfield cook result onto the repo's own `cook/` branch as one reviewable commit; extends FE-827's greenfield promotion to brownfield and closes the cook-codebase-mode follow-on (the result no longer sits uncommitted in the worktree). Git plumbing only (`commit-tree` + CAS `update-ref`, parent = the existing `cook/` base, throwaway index + external work-tree), so the user's active branch, working tree, and index are never touched; gitignored deps don't land. Reuses `promotionSourceDir` to compose the tree across slice layouts. Auto-runs on a completed brownfield cook (no `--out` needed); merging into the working branch stays the **user's** call. Unblocks FE-872's brownfield dep-delta capture. **Follow-on (FE-864, `landCookBranch`):** `brunch serve --land` opt-in softens that default — after promotion it merges `cook/` into the repo's active branch as serve's final step, but refuses on a dirty tree / detached HEAD and aborts on conflict (cook branch always left intact). Plain `cook` and default `serve` are unchanged, so the "never freelance into the working branch" invariant holds unless the user explicitly asks. 8. `brunch-ship` — **(landed — FE-878, `brunch serve`)** one-shot `brunch serve ` = `plan ` then `cook --spec=` (cook reads the plan just emitted), no manual steps. Pure glue, no new orchestration: serve's `--out` is the *promote* target → cook (brownfield auto-promotes via FE-877 regardless), `--profile` stamps the plan, petrinaut/policy/retry flags forward to cook, `--verbose` to both; a failed plan short-circuits (nothing cooked). Testable units `parseServeArgs` + `runServe` (stages injected); db/snapshot wiring stays in `cli.ts`. Cook's `dir` is threaded from the resolved launch cwd (the dir the plan was written to) — `runCook` reads `opts.dir` raw, so serve must supply it rather than rely on the `parseCookArgs`-only default (R46). 
**Closes Arc 1.** **Runtime umbrella + semantic substrate:** diff --git a/src/orchestrator/src/app-probe.test.ts b/src/orchestrator/src/app-probe.test.ts index 8e4a76ea..e5cd62b9 100644 --- a/src/orchestrator/src/app-probe.test.ts +++ b/src/orchestrator/src/app-probe.test.ts @@ -126,6 +126,42 @@ describe('runProbe bounds its HTTP calls so a hung app cannot hang the probe', ( }); }); +describe('runProbe bounds its HTTP calls so a hung app cannot hang the probe', () => { + // A server that accepts connections (and the HTTP request) but never sends a + // response — the case the wall-clock deadline alone can't catch, because a + // bare `await fetch` would block forever between deadline checks. + const neverResponds = (readyRoutes: Record = {}): string => + `const http = require('node:http');\n` + + `const ready = ${JSON.stringify(readyRoutes)};\n` + + `http.createServer((req, res) => {\n` + + ` if (ready[req.url] !== undefined) { res.writeHead(ready[req.url]); res.end('ok'); return; }\n` + + ` /* otherwise: never respond */\n` + + `}).listen(Number(process.env.PORT), '127.0.0.1');\n`; + + it('a ready path that accepts connections but never responds → infra within the deadline', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + const dir = sandbox(neverResponds()); + const result = await runProbe(spec, dir, { readyTimeoutMs: 600, readyAttemptMs: 150 }); + expect(result.kind).toBe('infra'); + }); + + it('a booted app whose feature endpoint never responds → infra, not a hang', async () => { + const spec = await buildProbeSpec({ + boot: ['node', 'server.js'], + readyPath: '/health', + featurePath: '/feature', + }); + const dir = sandbox(neverResponds({ '/health': 200 })); + const result = await runProbe(spec, dir, { requestTimeoutMs: 300 }); + expect(result.kind).toBe('infra'); + expect(result.output).toMatch(/feature probe request failed/); + }); +}); + describe('runProbe tears 
the boot process down', () => { it('the booted app is no longer listening after the probe returns', async () => { const { spec, dir } = await specFor({ '/health': 200, '/feature': 200 }); diff --git a/src/orchestrator/src/cook-cli.ts b/src/orchestrator/src/cook-cli.ts index 585dfe4a..dff098aa 100644 --- a/src/orchestrator/src/cook-cli.ts +++ b/src/orchestrator/src/cook-cli.ts @@ -15,7 +15,7 @@ import { createPiActions } from './pi-actions.js'; import { loadPlan } from './plan-loader.js'; import type { CookBus } from './presenter.js'; import { resolveToolchain } from './project-profile.js'; -import { promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; +import { landCookBranch, promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; import { parseSpecId, resolveLatestSpecPlanPath, specPlanPath, specsRootDir } from './spec-plan-paths.js'; import { ToolchainTestRunner } from './test-runner.js'; import type { Plan, PlanMode } from './types.js'; @@ -49,6 +49,12 @@ export type CookOptions = { outDir?: string; /** Allow promoting into a non-empty target (otherwise refused). */ force: boolean; + /** + * Brownfield only: after promotion, merge `cook/` into the repo's active + * branch as the final step. Set by `serve --land`; plain `cook` never sets it, + * keeping promotion's hands-off default intact unless the user opts in. + */ + landBranch?: boolean; /** * Explicit specification id whose emitted plan (under * `/.brunch/cook/specs//plan.yaml`) should be cooked. @@ -120,6 +126,10 @@ export function parseCookArgs(args: string[]): CookOptions { verbose = true; } else if (!arg.startsWith('-')) { dir = arg; + } else { + // Reject unknown flags instead of silently ignoring them (e.g. --spec-id + // is not a flag; the spec selector is --spec=). + throw new Error(`Unknown flag "${arg}". 
Run "brunch --help" for cook usage.`); } } @@ -426,28 +436,21 @@ export async function runCook(opts: CookOptions, bus: CookBus): Promise { cliFlag: opts.petrinautUrl, env: { PETRINAUT_URL: process.env.PETRINAUT_URL }, }); - if ('error' in resolvedUrl) { - line(resolvedUrl.error); - process.exit(1); - } + // Throw, never process.exit — the caller (withCookBus) must dispose the + // presenter (unmount Ink) before the error is printed, or the TUI hangs. + if ('error' in resolvedUrl) throw new Error(resolvedUrl.error); petrinautUrl = resolvedUrl.url; streamPort = resolvePetrinautStreamPort({ PORT: process.env.PORT }); } const resolved = resolveCookPlan(opts.dir, opts.specId); - if (resolved.kind === 'error') { - line(resolved.message); - process.exit(1); - } + if (resolved.kind === 'error') throw new Error(resolved.message); const plan = loadPlan(resolved.planPath); // Worktree strategy follows the plan's spec-derived mode, not its location. const sandbox = resolveSandboxPlan(plan.mode, resolved.sourceDir); - if (sandbox.kind === 'error') { - line(sandbox.message); - process.exit(1); - } + if (sandbox.kind === 'error') throw new Error(sandbox.message); // Single shared tree only for serial greenfield (parallel would race on it); // every other case isolates slices per-slice. @@ -483,6 +486,13 @@ export async function runCook(opts: CookOptions, bus: CookBus): Promise { // Seed the presenter's elapsed clock; per-action progress carries no // pre-formatted timing — the presenter owns it (I136-K). bus.emit({ kind: 'cook-start', runStart }); + // Seed the slice grid up front so queued work is visible before it starts. 
+ bus.emit({ + kind: 'run-shape', + epics: plan.epics.map((e) => ({ id: e.id })), + slices: plan.slices.map((s) => ({ id: s.id, epicId: s.epic_id })), + maxRetries: opts.maxRetries, + }); const actions = createPiActions({ verbose: opts.verbose, emit: (event) => bus.emit(event), @@ -511,6 +521,7 @@ export async function runCook(opts: CookOptions, bus: CookBus): Promise { reports, testRunner, policy: { maxRetries: opts.maxRetries }, + emit: (event) => bus.emit(event), sandboxMode: sandbox.kind === 'codebase' ? 'codebase' : 'fixture', sliceLayout, runId, @@ -569,9 +580,26 @@ export async function runCook(opts: CookOptions, bus: CookBus): Promise { runId, }), ); - line( - ` ✓ promoted → ${promoted.branch} @ ${promoted.commit.slice(0, 8)} (merge it into your branch when ready)`, - ); + if (opts.landBranch) { + const landed = promoting(`landing → ${promoted.branch} into the active branch`, () => + landCookBranch({ sourceDir: sandbox.sourceDir, runId }), + ); + if (landed.kind === 'landed') { + line(` ✓ promoted + landed ${promoted.branch} onto ${landed.branch} (${landed.mode})`); + } else if (landed.kind === 'refused') { + line( + ` ✓ promoted → ${promoted.branch} @ ${promoted.commit.slice(0, 8)} (not landed: working tree ${landed.reason}; merge it when ready)`, + ); + } else { + line( + ` ✓ promoted → ${promoted.branch} @ ${promoted.commit.slice(0, 8)} (not landed: merge conflict on ${landed.branch}; resolve with \`git merge ${promoted.branch}\`)`, + ); + } + } else { + line( + ` ✓ promoted → ${promoted.branch} @ ${promoted.commit.slice(0, 8)} (merge it into your branch when ready)`, + ); + } line(''); } catch (err) { line(` ✗ promotion failed: ${err instanceof Error ? err.message : String(err)}`); @@ -615,6 +643,9 @@ export async function runCook(opts: CookOptions, bus: CookBus): Promise { } } + // Run complete (after promotion) — lights the brigade's `serve` phase, or + // pins a halt summary with the reason when it did not complete. 
+ bus.emit({ kind: 'cook-done', ok, ...(result.reason ? { reason: result.reason } : {}) }); recordCookExitStatus(ok); return; } finally { diff --git a/src/orchestrator/src/cow-copy.ts b/src/orchestrator/src/cow-copy.ts index bbd90104..5b19b22b 100644 --- a/src/orchestrator/src/cow-copy.ts +++ b/src/orchestrator/src/cow-copy.ts @@ -1,5 +1,5 @@ import { spawnSync } from 'node:child_process'; -import { cpSync, existsSync, readdirSync } from 'node:fs'; +import { cpSync, existsSync, readdirSync, symlinkSync } from 'node:fs'; import { join, resolve } from 'node:path'; /** @@ -23,16 +23,24 @@ export function cowCopy(src: string, dest: string): void { /** Top-level names skipped when CoW-copying into cook sandboxes. */ export const COW_COPY_DEFAULT_EXCLUDE = new Set(['.git', '.brunch']); +const NO_SYMLINKS: ReadonlySet = new Set(); + /** - * CoW-copy top-level entries from `sourceDir` that are absent in `destDir` + * Provision top-level entries from `sourceDir` that are absent in `destDir` * (untracked/gitignored dirs like `node_modules/`, `dist/`). Skips names in * `exclude` and entries already present in the destination (typically tracked * files materialized by `git worktree add`). + * + * Names in `symlink` are linked to the source entry instead of copied — used to + * share a single read-only `node_modules/` across slice sandboxes rather than + * paying a CoW copy per slice. Everything else is CoW-copied (lazy on APFS / + * reflink filesystems, deep copy otherwise). 
*/ export function copyMissingTopLevelEntries( sourceDir: string, destDir: string, exclude: ReadonlySet = COW_COPY_DEFAULT_EXCLUDE, + symlink: ReadonlySet = NO_SYMLINKS, ): void { const source = resolve(sourceDir); const dest = resolve(destDir); @@ -40,6 +48,11 @@ export function copyMissingTopLevelEntries( if (exclude.has(entry)) continue; const destPath = join(dest, entry); if (existsSync(destPath)) continue; - cowCopy(join(source, entry), destPath); + const sourcePath = join(source, entry); + if (symlink.has(entry)) { + symlinkSync(sourcePath, destPath); + } else { + cowCopy(sourcePath, destPath); + } } } diff --git a/src/orchestrator/src/engine-contract.test.ts b/src/orchestrator/src/engine-contract.test.ts index 5dd81ad9..a097f505 100644 --- a/src/orchestrator/src/engine-contract.test.ts +++ b/src/orchestrator/src/engine-contract.test.ts @@ -17,6 +17,7 @@ import { createNetFolding } from './petrinaut-fold.js'; import type { SdcpnFile } from './petrinaut-sdcpn.js'; import { type BrunchExecutionExportFrame, createPetrinautStreamBus } from './petrinaut-stream-bus.js'; import { reduceBrunchExecutionExport } from './petrinaut-stream-export.js'; +import type { CookEvent } from './presenter/events.js'; import { InMemoryReportSink } from './report-sink.js'; import type { ActionContext, ActionHandlers, OrchestratorInput, Plan, RunCtx, TestRunner } from './types.js'; @@ -262,6 +263,24 @@ describe('Engine contract test #1 — single epic, single slice, happy path', () ]); }); + it('emits slice grid events around net-level test runs', async () => { + const fakes = createFakes(); + const events: CookEvent[] = []; + await create().run({ + plan: simplePlan, + sandboxDir: '/tmp/fake', + actions: fakes.actions, + reports: fakes.reports, + testRunner: fakes.testRunner, + policy: { maxRetries: 3 }, + emit: (event) => events.push(event), + }); + expect(events.filter((e) => e.kind === 'slice')).toEqual([ + { kind: 'slice', id: 'slice-1', epicId: 'epic-1', status: 'running', step: 
'verify' }, + { kind: 'slice', id: 'slice-1', epicId: 'epic-1', status: 'passed' }, + ]); + }); + it('report sink contains expected lines', async () => { const fakes = createFakes(); await create().run({ diff --git a/src/orchestrator/src/epic-sandbox-merge.test.ts b/src/orchestrator/src/epic-sandbox-merge.test.ts index 14cdb91c..455055de 100644 --- a/src/orchestrator/src/epic-sandbox-merge.test.ts +++ b/src/orchestrator/src/epic-sandbox-merge.test.ts @@ -1,9 +1,11 @@ import { execFileSync } from 'node:child_process'; import { existsSync, + lstatSync, mkdirSync, mkdtempSync, readFileSync, + readlinkSync, rmSync, symlinkSync, writeFileSync, @@ -14,6 +16,7 @@ import { dirname, join } from 'node:path'; import { afterEach, describe, expect, it } from 'vitest'; import { + ensureSliceWorktree, epicIdsForEpicVerifyMerge, mergeCompletedSlicesIntoTree, mergeSlicesIntoEpicSandbox, @@ -274,19 +277,31 @@ describe('seedSliceFromParentWorktree', () => { expect(readFileSync(join(sliceDir, 'src/a.ts'), 'utf8')).toBe('export const a = 1;\n'); }); - it('untracked content arrives via CoW copy from the parent', () => { + it('untracked content (other than node_modules) arrives via CoW copy from the parent', () => { const { parent, addUntracked } = makeGitParentWorktree('r2'); - // Simulate node_modules / generated artifacts present in the parent - // worktree but NOT tracked by git. - addUntracked('node_modules/dep/index.js', 'module.exports = 1;\n'); + // Simulate generated artifacts present in the parent worktree but NOT + // tracked by git. `dist/` is copied (a slice may rebuild it independently). 
addUntracked('dist/bundle.js', 'console.log("bundle");\n'); const sliceDir = seedSliceFromParentWorktree(parent, 'only', singleSlicePlan, 'r2'); - expect(readFileSync(join(sliceDir, 'node_modules/dep/index.js'), 'utf8')).toBe('module.exports = 1;\n'); + expect(lstatSync(join(sliceDir, 'dist')).isSymbolicLink()).toBe(false); expect(readFileSync(join(sliceDir, 'dist/bundle.js'), 'utf8')).toBe('console.log("bundle");\n'); }); + it('shares node_modules via a symlink to the parent rather than copying it', () => { + const { parent, addUntracked } = makeGitParentWorktree('r2b'); + addUntracked('node_modules/dep/index.js', 'module.exports = 1;\n'); + + const sliceDir = seedSliceFromParentWorktree(parent, 'only', singleSlicePlan, 'r2b'); + + const linkPath = join(sliceDir, 'node_modules'); + expect(lstatSync(linkPath).isSymbolicLink()).toBe(true); + expect(readlinkSync(linkPath)).toBe(join(parent, 'node_modules')); + // Resolves transparently for pi-actions reading deps through the link. + expect(readFileSync(join(linkPath, 'dep/index.js'), 'utf8')).toBe('module.exports = 1;\n'); + }); + it('slice worktree is checked out on a slice-level cook branch', () => { const { parent } = makeGitParentWorktree('r3'); @@ -343,6 +358,53 @@ describe('seedSliceFromParentWorktree', () => { ); }); +describe('ensureSliceWorktree', () => { + const dirs: string[] = []; + afterEach(() => { + for (const d of dirs) rmSync(d, { recursive: true, force: true }); + dirs.length = 0; + }); + + const singleSlicePlan: Plan = { + mode: 'brownfield', + epics: [{ id: 'e1', summary: '', depends_on: [], verification: [] }], + slices: [{ id: 'only', epic_id: 'e1', definition: '', depends_on: [], verification: [] }], + }; + + function makeGitParentWorktree(runId: string): string { + const source = mkdtempSync(join(tmpdir(), 'cook-source-')); + dirs.push(source); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: source }); + execFileSync('git', ['config', 'user.email', 'test@example.com'], { cwd: source 
}); + execFileSync('git', ['config', 'user.name', 'Test'], { cwd: source }); + writeFileSync(join(source, 'README.md'), '# project\n'); + execFileSync('git', ['add', '.'], { cwd: source }); + execFileSync('git', ['commit', '-q', '-m', 'initial'], { cwd: source }); + + const runDir = mkdtempSync(join(tmpdir(), 'cook-run-')); + dirs.push(runDir); + const parent = join(runDir, 'worktree'); + execFileSync('git', ['worktree', 'add', '-q', '-b', `cook/${runId}`, parent, 'HEAD'], { cwd: source }); + return parent; + } + + it( + 'creates the slice worktree on first call and is a no-op on repeat (rework-safe)', + () => { + const parent = makeGitParentWorktree('r1'); + + const first = ensureSliceWorktree(parent, 'only', singleSlicePlan, 'r1'); + expect(existsSync(join(first, 'README.md'))).toBe(true); + + // Second call must not throw (seedSliceFromParentWorktree would, via its + // path-availability assertion) and must return the same dir. + const second = ensureSliceWorktree(parent, 'only', singleSlicePlan, 'r1'); + expect(second).toBe(first); + }, + GIT_TEST_TIMEOUT_MS, + ); +}); + describe('mergeSlicesIntoEpicSandbox', () => { const dirs: string[] = []; afterEach(() => { diff --git a/src/orchestrator/src/epic-sandbox-merge.ts b/src/orchestrator/src/epic-sandbox-merge.ts index 9bd2afb0..7e5e1f31 100644 --- a/src/orchestrator/src/epic-sandbox-merge.ts +++ b/src/orchestrator/src/epic-sandbox-merge.ts @@ -251,15 +251,47 @@ export function seedSliceFromParentWorktree( ); // 2. CoW-copy whatever's in the parent worktree but NOT in the slice - // worktree yet — i.e. untracked / gitignored content (`node_modules/`, - // `dist/`, etc.) that pi-actions might need at runtime. + // worktree yet — i.e. untracked / gitignored content (`dist/`, etc.) that + // pi-actions might need at runtime. 
`node_modules/` is symlinked to the + // parent's single copy instead of duplicated per slice (see + // SHAREABLE_TOP_LEVEL_ENTRIES); `walkFiles` skips symlinks, so the shared + // tree is never re-walked during dependency seeding, merge, or promotion. const excludedNames = new Set(['.git', '.brunch', EPIC_MERGE_SEGMENT]); for (const s of plan.slices) excludedNames.add(s.id); - copyMissingTopLevelEntries(parentSandboxDir, sliceDir, excludedNames); + copyMissingTopLevelEntries(parentSandboxDir, sliceDir, excludedNames, SHAREABLE_TOP_LEVEL_ENTRIES); return sliceDir; } +/** + * Top-level gitignored entries shared across slice sandboxes via symlink rather + * than CoW-copied per slice. `node_modules/` is install output that pi-actions + * read (resolve deps, run tests/build) but do not author, so a single + * parent-owned copy linked into each slice removes N-1 redundant tree copies. + * Build caches under it (`.cache`, `.vite`) become shared too — acceptable for + * cook's transient runs; revisit if a tool needs per-slice write isolation. + */ +const SHAREABLE_TOP_LEVEL_ENTRIES: ReadonlySet = new Set(['node_modules']); + +/** + * Idempotent codebase-mode slice worktree provisioning: create the git worktree + * on first call, no-op if it already exists. Called from `resolveSliceCwd` on + * every fire (action, run-tests, assess) and across reworks, so it must tolerate + * repeats. Provisioning is synchronous (`execFileSync`), so concurrent fires of + * distinct slices under the parallel policy serialize on the JS thread — no two + * `git worktree add` invocations against the shared object store overlap. 
+ */ +export function ensureSliceWorktree( + parentSandboxDir: string, + sliceId: string, + plan: Plan, + runId: string, +): string { + const sliceDir = resolveSliceWorktreeDir(parentSandboxDir, sliceId); + if (existsSync(sliceDir)) return sliceDir; + return seedSliceFromParentWorktree(parentSandboxDir, sliceId, plan, runId); +} + /** Copy completed dependency slice worktrees into `slice`'s sandbox (plan order). */ export function seedSliceSandboxFromDeps( parentSandboxDir: string, diff --git a/src/orchestrator/src/net-compiler.ts b/src/orchestrator/src/net-compiler.ts index ef8e11cb..dac28cde 100644 --- a/src/orchestrator/src/net-compiler.ts +++ b/src/orchestrator/src/net-compiler.ts @@ -5,12 +5,9 @@ // 3. compilePlan(input, ctx) → PetriNet (convenience wrapper) // --------------------------------------------------------------------------- -import { mkdirSync } from 'node:fs'; - import { + ensureSliceWorktree, mergeSlicesIntoEpicSandbox, - resolveSliceWorktreeDir, - seedSliceFromParentWorktree, seedSliceSandboxFromDeps, sliceIdsForEpicVerifyMerge, } from './epic-sandbox-merge.js'; @@ -556,35 +553,30 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, net.addPlace(place); } - // Runtime filesystem preparation lives in wireHandlers so every action/test - // cwd exists before any transition can fire. This is the one intentional side - // effect in the wiring pass; a future prepareRunFilesystem step can split it - // out if more provisioning responsibilities accumulate. - // Per-slice dirs are parallel-safe; dependency seeding happens at fire time. - // In codebase mode, seed each slice dir with the parent worktree's contents - // (the source repo's HEAD via `git worktree add`) so pi-actions can modify - // existing code instead of writing into an empty dir. + // Per-slice sandboxes are provisioned lazily at fire time (in resolveSliceCwd), + // not eagerly here: a run that touches 2 of 8 slices pays for 2 worktrees, not + // 8. 
Each slice dir is an independent root, so concurrent fires of distinct + // slices never contend; repeat fires of the same slice (rework) are idempotent. // 'shared' (serial greenfield): all slices accrete into the run sandbox. // 'per-slice': each slice gets its own git worktree (codebase) or plain dir // (greenfield parallel), merged into __epic__ for verification. + // Fail fast on the missing-runId precondition rather than at first fire. const sliceLayout = input.sliceLayout ?? 'per-slice'; - if (input.sandboxMode === 'codebase') { - if (!input.runId) { - throw new Error('codebase mode requires input.runId (used to name slice-level git branches)'); - } - for (const slice of plan.slices) { - seedSliceFromParentWorktree(input.sandboxDir, slice.id, plan, input.runId); - } - } else if (sliceLayout === 'per-slice') { - for (const slice of plan.slices) { - mkdirSync(resolveSliceWorktreeDir(input.sandboxDir, slice.id), { recursive: true }); - } + const { runId } = input; + if (input.sandboxMode === 'codebase' && !runId) { + throw new Error('codebase mode requires input.runId (used to name slice-level git branches)'); } - const resolveSliceCwd = (slice: Slice): string => - sliceLayout === 'shared' - ? input.sandboxDir - : seedSliceSandboxFromDeps(input.sandboxDir, plan, slice, { preserveExisting: true }); + const resolveSliceCwd = (slice: Slice): string => { + if (sliceLayout === 'shared') return input.sandboxDir; + // Codebase mode: materialize the slice's git worktree (HEAD checkout + + // symlinked node_modules) on first touch so pi-actions modify existing code + // rather than an empty dir; greenfield per-slice gets a plain dir below. 
+ if (input.sandboxMode === 'codebase') { + ensureSliceWorktree(input.sandboxDir, slice.id, plan, runId!); + } + return seedSliceSandboxFromDeps(input.sandboxDir, plan, slice, { preserveExisting: true }); + }; // Register transitions with wired fire handlers for (const skel of blueprint.transitions) { @@ -715,6 +707,7 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, const deferred = (async () => { const slice = plan.slices.find((s) => s.id === sliceId)!; const sandboxDir = resolveSliceCwd(slice); + input.emit?.({ kind: 'slice', id: sliceId, epicId, status: 'running', step: 'verify' }); // Shared verification seam: same verdict rule + infra-dominates // aggregate as evaluate-done / verify-epic (FE-872 unification). const { @@ -738,6 +731,7 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, const tok: Token = { ...inputToken, reportId }; if (passed) { + input.emit?.({ kind: 'slice', id: sliceId, epicId, status: 'passed' }); return [ { place: intermediatePlace, token: tok }, { place: budgetPlace, token: { ...baseToken, retryCount: 0 } }, @@ -749,6 +743,7 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, // infra failure, name that cause — "retry exhaustion" would // misdirect the reader to the code. ctx.sliceOutcomes.set(sliceId, { sliceId, status: 'halted' }); + input.emit?.({ kind: 'slice', id: sliceId, epicId, status: 'failed' }); const haltReason = failureKind === 'infra' ? 
`Slice ${sliceId} toolchain/install failure during verification` @@ -760,6 +755,7 @@ export function wireHandlers(blueprint: NetBlueprint, input: OrchestratorInput, }, ]; } + input.emit?.({ kind: 'slice', id: sliceId, epicId, status: 'failed' }); return [ { place: intermediatePlace, token: tok }, { place: budgetPlace, token: { ...baseToken, retryCount: retryCount + 1 } }, diff --git a/src/orchestrator/src/pi-actions.test.ts b/src/orchestrator/src/pi-actions.test.ts index acc3325d..abe190bc 100644 --- a/src/orchestrator/src/pi-actions.test.ts +++ b/src/orchestrator/src/pi-actions.test.ts @@ -8,9 +8,11 @@ import { afterEach, describe, expect, it } from 'vitest'; import { createPiActions, epicVerifyTask, + instrumentToolDefinition, runPi, type SessionFactory, sliceTestTask, + toolLabel, toolsForAction, } from './pi-actions.js'; import type { CookEvent } from './presenter/events.js'; @@ -150,6 +152,282 @@ describe('evaluate-done / verify-epic share the runner seam — failureKind is v expect(events.filter((e) => e.kind === 'activity-start')).toHaveLength(1); expect(events.filter((e) => e.kind === 'activity-end')).toHaveLength(1); }); + + it('marks writer slices failed when pi throws before reporting', async () => { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const createSession = (async () => { + throw new Error('session boom'); + }) as unknown as SessionFactory; + + for (const action of ['write-tests', 'write-code'] as const) { + const events: CookEvent[] = []; + const actions = createPiActions({ createSession, emit: (e) => events.push(e) }); + + await expect(actions[action]!(ctx(new InMemoryReportSink()))).rejects.toThrow(/session boom/); + + expect(events.filter((e) => e.kind === 'slice')).toEqual([ + { + kind: 'slice', + id: 'chunk', + epicId: 'utils', + status: 'running', + step: action === 'write-tests' ? 'tests' : 'code', + }, + { + kind: 'slice', + id: 'chunk', + epicId: 'utils', + status: 'failed', + reason: action === 'write-tests' ? 
'test authoring failed' : 'code authoring failed', + }, + ]); + } + }); +}); + +describe('verify-epic integration oracle (FE-876) — reachability folds into the epic verdict', () => { + const probeDirs: string[] = []; + afterEach(() => { + for (const dir of probeDirs.splice(0)) rmSync(dir, { recursive: true, force: true }); + }); + + // A real zero-dep app that answers `routes` (path → status); 404 otherwise. + function appSandbox(routes: Record<string, number>): string { + const dir = mkdtempSync(join(tmpdir(), 'verify-epic-probe-')); + probeDirs.push(dir); + writeFileSync( + join(dir, 'server.js'), + `const http = require('node:http');\n` + + `const routes = ${JSON.stringify(routes)};\n` + + `http.createServer((req, res) => {\n` + + ` const status = routes[req.url] ?? 404;\n` + + ` res.writeHead(status); res.end(String(status));\n` + + `}).listen(Number(process.env.PORT), '127.0.0.1');\n`, + ); + return dir; + } + + function epicWithProbe(): Epic { + return { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + probe: { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }, + }; + } + + function passingActions(sandboxDir: string): { + actions: ReturnType<typeof createPiActions>; + ctx: (reports: InMemoryReportSink) => ActionContext; + } { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic = epicWithProbe(); + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + 
createSession, + }); + return { actions, ctx: (reports) => ({ slice, epic, plan, sandboxDir, reports }) }; + } + + it('tests pass + feature reachable → epic passes (reachable)', async () => { + const reports = new InMemoryReportSink(); + const { actions, ctx } = passingActions(appSandbox({ '/health': 200, '/feature': 200 })); + const id = await actions['verify-epic']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(true); + expect(payload.reachability).toBe('reachable'); + }); + + it('tests pass but feature endpoint is absent → epic fails (the FE-800 orphan)', async () => { + const reports = new InMemoryReportSink(); + // App boots and answers /health, but /feature is 404 — merged but not wired in. + const { actions, ctx } = passingActions(appSandbox({ '/health': 200 })); + const id = await actions['verify-epic']!(ctx(reports)); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(false); + expect(payload.reachability).toBe('not-reachable'); + }); + + it('failing tests short-circuit the probe — no boot, unchanged unit verdict', async () => { + const reports = new InMemoryReportSink(); + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic = epicWithProbe(); + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: false, output: 'no runner', failureKind: 'infra' }; + }, + }, + createSession, + }); + // Point at a dir with no 
server.js: if the probe booted, it would error — it + // must not run because tests failed first. + const id = await actions['verify-epic']!({ slice, epic, plan, sandboxDir: tmpdir(), reports }); + const payload = reports.getById(id)!.payload as { + passed: boolean; + failureKind?: string; + reachability?: string; + }; + expect(payload.passed).toBe(false); + expect(payload.failureKind).toBe('infra'); + expect(payload.reachability).toBeUndefined(); + }); + + it('no probe target → unit-test verdict only (unchanged behavior)', async () => { + const reports = new InMemoryReportSink(); + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const epic: Epic = { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + }; + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + createSession, + }); + const id = await actions['verify-epic']!({ slice, epic, plan, sandboxDir: tmpdir(), reports }); + const payload = reports.getById(id)!.payload as { passed: boolean; reachability?: string }; + expect(payload.passed).toBe(true); + expect(payload.reachability).toBeUndefined(); + }); + + // ---- Half B: cook-time grounding seam ----------------------------------- + + function intentEpic(extra?: Partial<Epic>): Epic { + return { + id: 'utils', + summary: 'Utilities', + depends_on: [], + verification: [{ kind: 'integration-test', target: 'tests/utils.integration.test.ts' }], + reachability: { 
feature: 'the /feature route responds' }, + ...extra, + }; + } + + function groundedVerifyEpic(opts: { + sandboxDir: string; + epic: Epic; + groundProbe?: ProbeGrounder; + }): Promise<{ passed: boolean; reachability?: string }> { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const reports = new InMemoryReportSink(); + const fake = makeFakeSession({ emit: 'wrote the integration test' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const slice: Slice = { + id: 'chunk', + epic_id: 'utils', + definition: 'Add chunk()', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/chunk.test.ts' }], + }; + const plan: Plan = { mode: 'greenfield', epics: [opts.epic], slices: [slice] }; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + createSession, + groundProbe: opts.groundProbe, + }); + return actions['verify-epic']!({ + slice, + epic: opts.epic, + plan, + sandboxDir: opts.sandboxDir, + reports, + }).then((id) => reports.getById(id)!.payload as { passed: boolean; reachability?: string }); + } + + it('grounds a reachability intent into a concrete target, then probes it', async () => { + let seenFeature = ''; + const payload = await groundedVerifyEpic({ + sandboxDir: appSandbox({ '/health': 200, '/feature': 200 }), + epic: intentEpic(), + groundProbe: async (intent) => { + seenFeature = intent.feature; + return { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }; + }, + }); + expect(seenFeature).toContain('/feature'); + expect(payload.passed).toBe(true); + expect(payload.reachability).toBe('reachable'); + }); + + it('a reachability intent with no injected grounder is a no-op (unit verdict only)', async () => { + // sandbox has no app; if grounding ran and probed, it would error/fail. 
+ const payload = await groundedVerifyEpic({ sandboxDir: tmpdir(), epic: intentEpic() }); + expect(payload.passed).toBe(true); + expect(payload.reachability).toBeUndefined(); + }); + + it('a grounder that throws is an infra fault — the epic fails, not silently passes', async () => { + const payload = await groundedVerifyEpic({ + sandboxDir: tmpdir(), + epic: intentEpic(), + groundProbe: async () => { + throw new Error('agent could not resolve wiring'); + }, + }); + expect(payload.passed).toBe(false); + expect(payload.reachability).toBe('infra'); + }); + + it('a concrete probe target wins over a reachability intent (Half A precedence)', async () => { + let grounderCalled = false; + const payload = await groundedVerifyEpic({ + sandboxDir: appSandbox({ '/health': 200, '/feature': 200 }), + epic: intentEpic({ + probe: { boot: ['node', 'server.js'], readyPath: '/health', featurePath: '/feature' }, + }), + groundProbe: async () => { + grounderCalled = true; + throw new Error('should not be called'); + }, + }); + expect(grounderCalled).toBe(false); + expect(payload.reachability).toBe('reachable'); + }); }); describe('verify-epic integration oracle (FE-876) — reachability folds into the epic verdict', () => { @@ -508,7 +786,7 @@ function makeFakeSession(behavior: { emit?: string | readonly unknown[]; hang?: describe('runPi drives an in-process pi session (no subprocess)', () => { const baseOpts = (sandboxDir: string, tools: string) => ({ label: 'tests slice-1', - model: 'claude-sonnet-4-6', + model: 'claude-opus-4-6', promptFile: join(promptsDir, 'test-writer.md'), task: 'do the thing', sandboxDir, @@ -760,7 +1038,7 @@ describe('runPi — real LLM self-containment smoke', () => { try { await runPi({ label: 'smoke', - model: 'claude-sonnet-4-6', + model: 'claude-opus-4-6', promptFile, task: 'Use the write tool to create a file named hello.txt in the current directory containing exactly: BRUNCH_SELF_CONTAINED', sandboxDir, @@ -774,3 +1052,169 @@ describe('runPi — real LLM 
self-containment smoke', () => { 120_000, ); }); + +describe('toolLabel — what the agent is doing', () => { + it('labels file tools by path, bash by command, grep/find by pattern', () => { + expect(toolLabel('edit', { path: 'src/auth/token.ts' })).toBe('edit src/auth/token.ts'); + expect(toolLabel('write', { path: 'tests/x.test.ts' })).toBe('write tests/x.test.ts'); + expect(toolLabel('bash', { command: 'bun test' })).toBe('bash bun test'); + expect(toolLabel('grep', { pattern: 'RefreshToken' })).toBe('grep RefreshToken'); + }); + + it('falls back to the bare tool name when no recognized target is present', () => { + expect(toolLabel('read', {})).toBe('read'); + expect(toolLabel('bash', undefined)).toBe('bash'); + }); + + it('truncates long labels with an ellipsis', () => { + const long = toolLabel('edit', { path: 'a/'.repeat(60) }); + expect(long.endsWith('…')).toBe(true); + expect(long.length).toBeLessThanOrEqual(56); + }); +}); + +describe('instrumentToolDefinition — observe then delegate', () => { + function fakeTool(name: string, run: (...args: unknown[]) => unknown) { + return { name, execute: run } as unknown as Parameters<typeof instrumentToolDefinition>[0]; + } + + it('emits a label from the params, then delegates with the same args and result', () => { + const seen: unknown[] = []; + const labels: string[] = []; + const def = fakeTool('edit', (...args) => { + seen.push(...args); + return 'tool-result'; + }); + + instrumentToolDefinition(def, (label) => labels.push(label)); + const out = def.execute('call-1', { path: 'src/a.ts' }, undefined, undefined, {} as never); + + expect(labels).toEqual(['edit src/a.ts']); + expect(out).toBe('tool-result'); // delegation result preserved + expect(seen).toEqual(['call-1', { path: 'src/a.ts' }, undefined, undefined, {}]); // same args + }); + + it('never lets an observation error break the tool call', () => { + const def = fakeTool('bash', () => 'ok'); + instrumentToolDefinition(def, () => { + throw new Error('observer boom'); + }); + 
expect(def.execute('id', { command: 'echo hi' }, undefined, undefined, {} as never)).toBe('ok'); + }); +}); + +describe('action handlers emit slice grid events', () => { + const slice: Slice = { + id: 'login', + epic_id: 'api', + definition: 'Login', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/login.test.ts' }], + }; + const epic: Epic = { id: 'api', summary: 'API', depends_on: [], verification: [] }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const ctx = (): ActionContext => ({ + slice, + epic, + plan, + sandboxDir: '/tmp/unused', + reports: new InMemoryReportSink(), + }); + type SliceEvent = Extract<CookEvent, { kind: 'slice' }>; + const sliceEvents = (events: CookEvent[]) => events.filter((e): e is SliceEvent => e.kind === 'slice'); + + it('evaluate-done emits running(verify) then passed for a DONE verdict', async () => { + const events: CookEvent[] = []; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: true, output: 'ok' }; + }, + }, + emit: (e) => events.push(e), + }); + await actions['evaluate-done']!(ctx()); + expect(sliceEvents(events).map((s) => [s.id, s.status, s.step])).toEqual([ + ['login', 'running', 'verify'], + ['login', 'passed', undefined], + ]); + }); + + it('evaluate-done emits failed for a NEEDS-WORK verdict', async () => { + const events: CookEvent[] = []; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: false, output: 'nope' }; + }, + }, + emit: (e) => events.push(e), + }); + await actions['evaluate-done']!(ctx()); + expect(sliceEvents(events).at(-1)).toMatchObject({ status: 'failed' }); + }); + + it('write-tests emits running(tests) keyed by the slice id', async () => { + process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session'; + const events: CookEvent[] = []; + const fake = makeFakeSession({ emit: 'wrote tests' }); + const createSession = (async () => ({ session: fake.session })) as unknown as SessionFactory; + const actions 
= createPiActions({ createSession, emit: (e) => events.push(e) }); + await actions['write-tests']!(ctx()); + expect(sliceEvents(events)[0]).toMatchObject({ + id: 'login', + epicId: 'api', + status: 'running', + step: 'tests', + }); + }); +}); + +describe('evaluate-done failure carries a reason', () => { + const slice: Slice = { + id: 'login', + epic_id: 'api', + definition: 'L', + depends_on: [], + verification: [{ kind: 'unit-test', target: 'tests/l.test.ts' }], + }; + const epic: Epic = { id: 'api', summary: 'API', depends_on: [], verification: [] }; + const plan: Plan = { mode: 'greenfield', epics: [epic], slices: [slice] }; + const ctx = (): ActionContext => ({ + slice, + epic, + plan, + sandboxDir: '/tmp/x', + reports: new InMemoryReportSink(), + }); + type SliceEvent = Extract<CookEvent, { kind: 'slice' }>; + const lastSlice = (events: CookEvent[]) => events.filter((e): e is SliceEvent => e.kind === 'slice').at(-1); + + it('maps a test failure to "tests failed"', async () => { + const events: CookEvent[] = []; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: false, output: 'fail', failureKind: 'test' }; + }, + }, + emit: (e) => events.push(e), + }); + await actions['evaluate-done']!(ctx()); + expect(lastSlice(events)).toMatchObject({ status: 'failed', reason: 'tests failed' }); + }); + + it('maps an infra failure to "infra error"', async () => { + const events: CookEvent[] = []; + const actions = createPiActions({ + testRunner: { + async run() { + return { passed: false, output: 'no runner', failureKind: 'infra' }; + }, + }, + emit: (e) => events.push(e), + }); + await actions['evaluate-done']!(ctx()); + expect(lastSlice(events)).toMatchObject({ status: 'failed', reason: 'infra error' }); + }); +}); diff --git a/src/orchestrator/src/pi-actions.ts b/src/orchestrator/src/pi-actions.ts index a114a384..bd4ffdf1 100644 --- a/src/orchestrator/src/pi-actions.ts +++ b/src/orchestrator/src/pi-actions.ts @@ -5,12 +5,20 @@ import { fileURLToPath } from 
'node:url'; import { AuthStorage, - type CreateAgentSessionOptions, createAgentSession, + createBashToolDefinition, + createEditToolDefinition, + createFindToolDefinition, + createGrepToolDefinition, + createLsToolDefinition, + createReadToolDefinition, + createWriteToolDefinition, + type CreateAgentSessionOptions, DefaultResourceLoader, ModelRegistry, SessionManager, SettingsManager, + type ToolDefinition, } from '@earendil-works/pi-coding-agent'; import { buildProbeSpec, runProbe } from './app-probe.js'; @@ -55,6 +63,79 @@ function logVerbose(output: string): void { _emit({ kind: 'verbose', text: output }); } +const HEARTBEAT_MAX = 56; + +/** The agent's most recent non-empty line, tail-truncated for a one-line wait heartbeat. */ +function latestLine(text: string): string { + const lines = text.split('\n'); + for (let i = lines.length - 1; i >= 0; i--) { + const line = lines[i]!.trim(); + if (line) return line.length > HEARTBEAT_MAX ? `…${line.slice(-HEARTBEAT_MAX)}` : line; + } + return ''; +} + +// --------------------------------------------------------------------------- +// Tool-call observability — show what the agent is *doing* (editing X, running +// bash, reading Y), not just what it's saying. We can't observe tool calls via +// session.subscribe (that stream is text/lifecycle only), so we supply the +// built-in tools ourselves and wrap their execute to emit a heartbeat. The +// createXToolDefinition builders bake in the real config (mutation queue, +// truncation defaults), so wrapping + delegating preserves behavior exactly. +// --------------------------------------------------------------------------- + +// Inferred so each builder keeps its own tool-schema generic; the heterogeneous +// list is erased to the base ToolDefinition at the single wrap point below. 
+const TOOL_DEF_BUILDERS = { + read: createReadToolDefinition, + write: createWriteToolDefinition, + edit: createEditToolDefinition, + bash: createBashToolDefinition, + grep: createGrepToolDefinition, + find: createFindToolDefinition, + ls: createLsToolDefinition, +} as const; + +/** A one-line "what the agent is doing" label from a tool name + its params. */ +export function toolLabel(name: string, params: unknown): string { + const p = (params && typeof params === 'object' ? params : {}) as Record<string, unknown>; + const target = [p.path, p.command, p.pattern].find( + (v): v is string => typeof v === 'string' && v.length > 0, + ); + const label = target ? `${name} ${target}` : name; + return label.length > HEARTBEAT_MAX ? `${label.slice(0, HEARTBEAT_MAX - 1)}…` : label; +} + +/** Wrap a tool definition's execute to emit a heartbeat, then delegate unchanged. */ +export function instrumentToolDefinition( + def: ToolDefinition, + onUse: (label: string) => void, +): ToolDefinition { + const original = def.execute.bind(def); + def.execute = ((...args: Parameters<typeof original>) => { + // Observation must never break a tool call. + try { + onUse(toolLabel(def.name, args[1])); + } catch { + /* ignore */ + } + return original(...args); + }) as typeof def.execute; + return def; +} + +function buildInstrumentedTools( + names: string[], + cwd: string, + onUse: (label: string) => void, +): ToolDefinition[] { + return names.flatMap((name) => { + const build = TOOL_DEF_BUILDERS[name as keyof typeof TOOL_DEF_BUILDERS]; + if (!build) return []; + return [instrumentToolDefinition(build(cwd) as ToolDefinition, onUse)]; + }); +} + /** Bracket a wait so it shows as a live pending activity; always closes. 
*/ async function withActivity<T>(id: string, label: string, fn: () => Promise<T>): Promise<T> { _emit({ kind: 'activity-start', id, label }); @@ -69,7 +150,7 @@ async function withActivity<T>(id: string, label: string, fn: () => Promise<T>): // Pi dispatch // --------------------------------------------------------------------------- -const PI_TIMEOUT_MS = 300_000; +const PI_TIMEOUT_MS = 600_000; // Output cap — the timeout alone won't stop a fast, chatty agent. const PI_MAX_OUTPUT = 10 * 1024 * 1024; @@ -91,6 +172,9 @@ interface RunPiOpts { task: string; sandboxDir: string; tools: string; + /** Activity id for the live wait/heartbeat. Defaults to `label`; set to the + * slice id so the heartbeat lands on that slice's grid row. */ + activityId?: string; } /** The pi SDK session factory — injectable so the drive loop is testable without a model or network. */ @@ -139,6 +223,18 @@ async function buildSessionOptions(opts: RunPiOpts, isolatedDir: string): Promis }); await resourceLoader.reload(); + // Supply the built-in tools ourselves (instrumented), instead of the `tools` + // name allowlist, so each tool call emits a "what the agent is doing" + // heartbeat into the current wait. `noTools:'builtin'` drops the default + // read/bash/edit/write so they aren't double-registered. + const toolNames = opts.tools + .split(',') + .map((t) => t.trim()) + .filter(Boolean); + const customTools = buildInstrumentedTools(toolNames, opts.sandboxDir, (label) => { + _emit({ kind: 'activity-progress', id: opts.activityId ?? 
opts.label, detail: label }); + }); + return { cwd: opts.sandboxDir, agentDir: isolatedDir, @@ -146,7 +242,9 @@ async function buildSessionOptions(opts: RunPiOpts, isolatedDir: string): Promis authStorage, modelRegistry, resourceLoader, - tools: opts.tools.split(','), + noTools: 'builtin', + tools: toolNames, + customTools, sessionManager: SessionManager.inMemory(opts.sandboxDir), settingsManager: SettingsManager.inMemory({ compaction: { enabled: false } }), }; @@ -165,8 +263,9 @@ async function runPi( const timeoutMs = deps.timeoutMs ?? PI_TIMEOUT_MS; const maxOutput = deps.maxOutput ?? PI_MAX_OUTPUT; const start = Date.now(); + const activityId = opts.activityId ?? opts.label; // Open a live wait so the (up to 5-minute) agent session isn't dead air. - _emit({ kind: 'activity-start', id: opts.label, label: opts.label }); + _emit({ kind: 'activity-start', id: activityId, label: opts.label }); let heartbeatKb = 0; const isolatedDir = createAgentDir(); @@ -223,11 +322,14 @@ async function runPi( } captured += delta; capturedBytes += deltaBytes; - // Throttled heartbeat — every 2 KB — so the spinner shows progress, not churn. + // Throttled heartbeat — every 2 KB — surface what the agent is currently + // saying (its latest line) instead of a raw byte count, so the wait reads + // as live work, not just "still going". const kb = Math.floor(capturedBytes / 1024); if (kb >= heartbeatKb + 2) { heartbeatKb = kb; - _emit({ kind: 'activity-progress', id: opts.label, detail: `${kb} KB` }); + const snippet = latestLine(captured); + if (snippet) _emit({ kind: 'activity-progress', id: activityId, detail: snippet }); } } }); @@ -244,7 +346,7 @@ async function runPi( cleanupAgentDir(); // Always close the wait — even on timeout / overflow / prompt error — so // the spinner can never hang. 
- _emit({ kind: 'activity-end', id: opts.label }); + _emit({ kind: 'activity-end', id: activityId }); } if (timedOut) throw piTimeoutError(timeoutMs); @@ -329,9 +431,10 @@ export function createPiActions(opts?: { return { 'evaluate-done': async (ctx: ActionContext) => { const label = sliceLabel(ctx.slice); + _emit({ kind: 'slice', id: ctx.slice.id, epicId: ctx.epic.id, status: 'running', step: 'verify' }); log('?', `evaluate ${label}`); const { done, failureKind, results } = await withActivity( - `verify ${label}`, + ctx.slice.id, `running tests · ${label}`, () => runVerification(ctx.slice.verification, testRunner, ctx.sandboxDir), ); @@ -340,25 +443,45 @@ export function createPiActions(opts?: { log(r.passed ? '✓' : '✗', `verify ${r.target}`); } log(done ? '●' : '○', `verdict ${label} → ${done ? 'DONE' : 'NEEDS WORK'}`); + _emit({ + kind: 'slice', + id: ctx.slice.id, + epicId: ctx.epic.id, + status: done ? 'passed' : 'failed', + ...(done ? {} : { reason: failureKind === 'infra' ? 'infra error' : 'tests failed' }), + }); return report(ctx, 'evaluator', 'eval-done', { done, failureKind, results }); }, 'write-tests': async (ctx: ActionContext) => { const label = sliceLabel(ctx.slice); + _emit({ kind: 'slice', id: ctx.slice.id, epicId: ctx.epic.id, status: 'running', step: 'tests' }); log('▸', `tests ${label}`); const task = sliceTestTask(ctx.slice, toolchain); - await runPi( - { - label: `tests ${label}`, - model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'test-writer.md'), - task, - sandboxDir: ctx.sandboxDir, - tools: toolsForAction('write-tests'), - }, - piDeps, - ); + try { + await runPi( + { + label: `tests ${label}`, + model: 'claude-opus-4-8', + promptFile: join(promptsDir, 'test-writer.md'), + task, + sandboxDir: ctx.sandboxDir, + tools: toolsForAction('write-tests'), + activityId: ctx.slice.id, + }, + piDeps, + ); + } catch (err) { + _emit({ + kind: 'slice', + id: ctx.slice.id, + epicId: ctx.epic.id, + status: 'failed', + reason: 'test authoring 
failed', + }); + throw err; + } return report(ctx, 'test-writer', 'tests-written', { sliceId: ctx.slice.id, @@ -368,20 +491,33 @@ export function createPiActions(opts?: { 'write-code': async (ctx: ActionContext) => { const label = sliceLabel(ctx.slice); + _emit({ kind: 'slice', id: ctx.slice.id, epicId: ctx.epic.id, status: 'running', step: 'code' }); log('▸', `code ${label}`); const task = `Write code to make tests pass for slice "${ctx.slice.id}": ${ctx.slice.definition}\nVerification targets: ${ctx.slice.verification.map((v) => `${v.kind}: ${v.target}`).join(', ')}\nImplement the minimum code to make all tests pass.`; - await runPi( - { - label: `code ${label}`, - model: 'claude-sonnet-4-6', - promptFile: join(promptsDir, 'code-writer.md'), - task, - sandboxDir: ctx.sandboxDir, - tools: toolsForAction('write-code'), - }, - piDeps, - ); + try { + await runPi( + { + label: `code ${label}`, + model: 'claude-opus-4-8', + promptFile: join(promptsDir, 'code-writer.md'), + task, + sandboxDir: ctx.sandboxDir, + tools: toolsForAction('write-code'), + activityId: ctx.slice.id, + }, + piDeps, + ); + } catch (err) { + _emit({ + kind: 'slice', + id: ctx.slice.id, + epicId: ctx.epic.id, + status: 'failed', + reason: 'code authoring failed', + }); + throw err; + } return report(ctx, 'code-writer', 'code-written', { sliceId: ctx.slice.id, @@ -402,7 +538,7 @@ export function createPiActions(opts?: { await runPi( { label: `verify ${ctx.epic.id} (write)`, - model: 'claude-sonnet-4-6', + model: 'claude-opus-4-8', promptFile: join(promptsDir, 'test-writer.md'), task: writeTask, sandboxDir: ctx.sandboxDir, diff --git a/src/orchestrator/src/plan-architect.test.ts b/src/orchestrator/src/plan-architect.test.ts index b55a2df1..679202e9 100644 --- a/src/orchestrator/src/plan-architect.test.ts +++ b/src/orchestrator/src/plan-architect.test.ts @@ -6,7 +6,7 @@ import { describe, expect, it } from 'vitest'; -import { architectDraftSchema, architectPlan } from './plan-architect.js'; +import { 
architectDraftSchema, architectPlan, DEFAULT_ARCHITECT_MODEL_ID } from './plan-architect.js'; import type { Plan } from './types.js'; const projected: Plan = { @@ -49,6 +49,10 @@ const wellFormed = { }; describe('architectPlan', () => { + it('defaults the production architect to the current Opus model', () => { + expect(DEFAULT_ARCHITECT_MODEL_ID).toBe('claude-opus-4-6'); + }); + it('parses a well-formed authored draft', async () => { const result = await architectPlan(projected, async () => wellFormed); expect(result.status).toBe('succeeded'); diff --git a/src/orchestrator/src/plan-architect.ts b/src/orchestrator/src/plan-architect.ts index f7082d48..463961dc 100644 --- a/src/orchestrator/src/plan-architect.ts +++ b/src/orchestrator/src/plan-architect.ts @@ -94,6 +94,8 @@ export type ArchitectResult = export type RunModel = (prompt: string) => Promise; +export const DEFAULT_ARCHITECT_MODEL_ID = 'claude-opus-4-6'; + const EMPTY_DRAFT: ArchitectDraft = { epics: [], slices: [], nonBuildableRequirementIds: [] }; /** @@ -211,7 +213,7 @@ function errorMessage(error: unknown): string { */ export const defaultArchitectRunModel: RunModel = async (prompt) => { const result = await generateText({ - model: anthropic(process.env.SPEC_TO_COOK_PLAN_MODEL || 'claude-sonnet-4-20250514'), + model: anthropic(process.env.SPEC_TO_COOK_PLAN_MODEL || DEFAULT_ARCHITECT_MODEL_ID), maxOutputTokens: 4096, prompt, output: Output.object({ schema: architectDraftSchema }), diff --git a/src/orchestrator/src/presenter/events.ts b/src/orchestrator/src/presenter/events.ts index 6be9e662..35926e76 100644 --- a/src/orchestrator/src/presenter/events.ts +++ b/src/orchestrator/src/presenter/events.ts @@ -29,7 +29,29 @@ export type CookEvent = // Updates the in-flight detail of an open activity (e.g. a pi token heartbeat). | { kind: 'activity-progress'; id: string; detail: string } // Closes the activity; the wait is over. 
- | { kind: 'activity-end'; id: string }; + | { kind: 'activity-end'; id: string } + // The run finished (emitted after promotion); `ok` = completed vs halted, + // `reason` is the halt reason when it did not complete. + | { kind: 'cook-done'; ok: boolean; reason?: string } + // --- slice grid --- + // Seeds the epic→slice progress grid up front (all slices start queued). + // `maxRetries` is the per-slice retry budget — total attempts is that + 1. + | { + kind: 'run-shape'; + epics: { id: string }[]; + slices: { id: string; epicId: string }[]; + maxRetries?: number; + } + // A slice changed state. `step` is the current sub-action while running; + // `reason` is why it failed (e.g. 'tests failed', 'infra error'). + | { + kind: 'slice'; + id: string; + epicId: string; + status: 'running' | 'passed' | 'failed'; + step?: string; + reason?: string; + }; export interface Presenter { onEvent(event: CookEvent): void; diff --git a/src/orchestrator/src/presenter/format.ts b/src/orchestrator/src/presenter/format.ts index 5dc04935..12ade6d0 100644 --- a/src/orchestrator/src/presenter/format.ts +++ b/src/orchestrator/src/presenter/format.ts @@ -36,5 +36,12 @@ export function formatCookEvent(event: CookEvent, clock: ElapsedClock): string[] case 'activity-end': // Live-only: the Ink panel reflects these; the existing completion log marks the end. return []; + case 'cook-done': + // Phase signal only (lights `serve`); the run summary already printed. + return []; + case 'run-shape': + case 'slice': + // Grid signals only — the per-action log lines already narrate plain output. + return []; } } diff --git a/src/orchestrator/src/presenter/ink/app.test.tsx b/src/orchestrator/src/presenter/ink/app.test.tsx index 667af51b..987af560 100644 --- a/src/orchestrator/src/presenter/ink/app.test.tsx +++ b/src/orchestrator/src/presenter/ink/app.test.tsx @@ -18,8 +18,8 @@ describe('Ink App', () => { await tick(); const frame = lastFrame() ?? ''; - // Wordmark header + command. 
- expect(frame).toContain('brunch'); + // Big lowercase ASCII wordmark rendered + the command label. + expect(frame).toContain('/_.___/'); expect(frame).toContain('cook'); // Brigade tracker shows every phase, with cook active (◐) once cooking. expect(frame).toContain('prep'); @@ -65,3 +65,89 @@ describe('Ink App', () => { expect(frame).not.toContain('agent writing tests'); }); }); + +describe('Ink App — slice grid', () => { + it("renders epics with per-slice status, the running slice's step/detail, and queued slices", async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render( 0} />); + + store.push({ + kind: 'run-shape', + epics: [{ id: 'api-auth' }], + slices: [ + { id: 'login', epicId: 'api-auth' }, + { id: 'refresh', epicId: 'api-auth' }, + ], + }); + store.push({ kind: 'slice', id: 'login', epicId: 'api-auth', status: 'passed' }); + store.push({ kind: 'slice', id: 'refresh', epicId: 'api-auth', status: 'running', step: 'code' }); + store.push({ kind: 'activity-progress', id: 'refresh', detail: 'edit src/token.ts' }); + await tick(); + + const frame = lastFrame() ?? ''; + expect(frame).toContain('api-auth'); // epic group header + expect(frame).toContain('✓ login'); // passed + expect(frame).toContain('refresh · code · edit src/token.ts'); // running w/ step + detail + }); +}); + +describe('Ink App — failure legibility', () => { + it('shows a failed slice reason and pins a halt summary', async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render( 0} />); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'login', epicId: 'api' }] }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'tests failed' }); + store.push({ kind: 'cook-done', ok: false, reason: 'login exhausted retries' }); + await tick(); + + const frame = lastFrame() ?? 
''; + expect(frame).toContain('login · tests failed'); + expect(frame).toContain('✗ halted · login exhausted retries'); + }); + + it('shows no halt summary for a completed run', async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render(<App store={store} now={() => 0} />); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'login', epicId: 'api' }] }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'passed' }); + store.push({ kind: 'cook-done', ok: true }); + await tick(); + + expect(lastFrame() ?? '').not.toContain('✗ halted'); + }); +}); + +describe('Ink App — attempt count', () => { + it('shows the attempt as n/max only once a slice has retried', async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render(<App store={store} now={() => 0} />); + // maxRetries 3 → total attempts 4 (the n/max denominator). + store.push({ + kind: 'run-shape', + epics: [{ id: 'api' }], + slices: [{ id: 'login', epicId: 'api' }], + maxRetries: 3, + }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + await tick(); + expect(lastFrame() ?? '').not.toContain('attempt'); // first run: no clutter + + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'tests failed' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + await tick(); + expect(lastFrame() ?? 
'').toContain('attempt 2/4'); + }); + + it('falls back to a bare attempt count when the retry budget is unknown', async () => { + const store = new RunStore('cook', () => 0); + const { lastFrame } = render(<App store={store} now={() => 0} />); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'login', epicId: 'api' }] }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'x' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + await tick(); + const frame = lastFrame() ?? ''; + expect(frame).toContain('attempt 2'); + expect(frame).not.toContain('attempt 2/'); + }); +}); diff --git a/src/orchestrator/src/presenter/ink/app.tsx b/src/orchestrator/src/presenter/ink/app.tsx index d67ba031..5d332aa7 100644 --- a/src/orchestrator/src/presenter/ink/app.tsx +++ b/src/orchestrator/src/presenter/ink/app.tsx @@ -1,31 +1,21 @@ -// The full-screen Ink view: brunch wordmark header, brigade phase tracker, and a -// bounded live activity log. A thin projection of RunStore — all folding -// lives in the store + the pure phase tracker, so this stays declarative. +// The full-screen Ink view. The wordmark + activity log stream into terminal +// scrollback via <Static> (printed once each, so the full run is preserved and +// nothing "collapses"); a live footer below shows the brigade tracker, the +// single global run timer, and the pending-wait spinner. A thin projection of +// RunStore — all folding lives in the store + the pure phase tracker. 
-import { Box, Text } from 'ink'; -import { useEffect, useState, useSyncExternalStore } from 'react'; +import { Box, Static, Text } from 'ink'; +import { useEffect, useMemo, useState, useSyncExternalStore } from 'react'; import { formatElapsed } from '../clock.js'; import { BRIGADE, type BrigadePhase } from '../phase.js'; -import type { PendingActivity, RunStore } from '../run-store.js'; -import { BRUNCH_WORDMARK } from './wordmark.js'; +import type { PendingActivity, RunState, RunStore, SliceRow } from '../run-store.js'; +import { BRUNCH_ASCII, BRUNCH_ORANGE } from './wordmark.js'; -const LOG_TAIL = 15; const SPINNER = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏']; const TICK_MS = 250; -function Header({ command }: { command: string }) { - return ( - - {BRUNCH_WORDMARK.map(({ ch, color }) => ( - - {ch} - - ))} - {command} - - ); -} +type ScrollItem = { kind: 'mark'; text: string; color: string } | { kind: 'log'; text: string }; const STATUS_ICON = { done: '✓', active: '◐', pending: '○' } as const; @@ -47,31 +37,76 @@ function Brigade({ phase }: { phase: BrigadePhase }) { ); } -function ActivityLog({ lines }: { lines: string[] }) { +const SLICE_ICON = { queued: '○', running: '', passed: '✓', failed: '✗' } as const; +const SLICE_COLOR = { queued: 'gray', running: 'cyan', passed: 'green', failed: 'red' } as const; + +function attemptLabel(attempts: number | undefined, maxAttempts: number | undefined): string | undefined { + // Only once a slice has retried (≥2), formatted n/max when the budget is known. + if (!attempts || attempts < 2) return undefined; + return maxAttempts ? `attempt ${attempts}/${maxAttempts}` : `attempt ${attempts}`; +} + +function sliceTail(row: SliceRow, maxAttempts: number | undefined): string { + // For a failed slice the store cleared step/detail, so the tail is the reason. 
+ return [row.step, attemptLabel(row.attempts, maxAttempts), row.reason, row.detail] + .filter(Boolean) + .join(' · '); +} + +const HALT_MAX = 56; + +function HaltSummary({ reason }: { reason: string }) { + const text = reason.length > HALT_MAX ? `${reason.slice(0, HALT_MAX - 1)}…` : reason; return ( - - {lines.slice(-LOG_TAIL).map((line, i) => ( - {line === '' ? ' ' : line} - ))} + + + ✗ halted · {text} + ); } -function PendingPanel({ - pending, - now, +function SliceGrid({ + epics, + slices, + maxAttempts, frame, -}: { - pending: PendingActivity[]; - now: () => number; - frame: string; -}) { - if (pending.length === 0) return null; +}: Pick & { frame: string }) { + if (slices.length === 0) return null; return ( + {epics.map((epicId) => { + const rows = slices.filter((s) => s.epicId === epicId); + if (rows.length === 0) return null; + return ( + + {epicId} + {rows.map((row) => { + const icon = row.status === 'running' ? frame : SLICE_ICON[row.status]; + const tail = sliceTail(row, maxAttempts); + return ( + + {' '} + {icon} {row.id} + {tail ? ` · ${tail}` : ''} + + ); + })} + + ); + })} + + ); +} + +function PendingPanel({ pending, frame }: { pending: PendingActivity[]; frame: string }) { + if (pending.length === 0) return null; + // One global timer lives in the footer; rows show only what's running. + return ( + {pending.map((a) => ( - {frame} {a.label} · {formatElapsed(now() - a.startedAt)} + {frame} {a.label} {a.detail ? ` · ${a.detail}` : ''} ))} @@ -82,24 +117,57 @@ function PendingPanel({ export function App({ store, now = () => Date.now() }: { store: RunStore; now?: () => number }) { const state = useSyncExternalStore(store.subscribe, store.getSnapshot, store.getSnapshot); - // Tick only while something is pending, so the spinner/elapsed advance even - // between events; the interval is torn down as soon as the waits clear. + // One ticker drives the spinner and the global elapsed clock while mounted. 
 const [tick, setTick] = useState(0); - const hasPending = state.pending.length > 0; useEffect(() => { - if (!hasPending) return; const id = setInterval(() => setTick((t) => t + 1), TICK_MS); return () => clearInterval(id); - }, [hasPending]); + }, []); + + // Wordmark (once) + the append-only log → <Static>, so they stream into + // scrollback rather than redrawing in a bounded box. + const scroll = useMemo( + () => [ + ...BRUNCH_ASCII.map((text, i) => ({ + kind: 'mark' as const, + text, + color: BRUNCH_ORANGE[i % BRUNCH_ORANGE.length]!, + })), + ...state.lines.map((text) => ({ kind: 'log' as const, text })), + ], + [state.lines], + ); return ( - -
- - + <> + + {(item, i) => + item.kind === 'mark' ? ( + + {item.text} + + ) : ( + {item.text === '' ? ' ' : item.text} + ) + } + + + + + + {' '} + {state.command} · {formatElapsed(now() - state.runStart)} + + + + + {state.haltReason ? : null} - - - + ); } diff --git a/src/orchestrator/src/presenter/ink/wordmark.ts b/src/orchestrator/src/presenter/ink/wordmark.ts index fc793939..d7404ed5 100644 --- a/src/orchestrator/src/presenter/ink/wordmark.ts +++ b/src/orchestrator/src/presenter/ink/wordmark.ts @@ -1,12 +1,15 @@ -// The "brunch" wordmark for the TUI header, tinted with the brunch.ai brand -// gradient (HASH blue → indigo → violet, from the product mark). One hex per -// letter, left to right. The plain/CI backend stays untinted. +// The "brunch" wordmark for the TUI header: a big lowercase figlet (Slant), +// tinted top-to-bottom with a warm orange theme (the kind of sunset gradient +// CLI tools tend to use). Generated once with figlet (no runtime dep). The +// plain/CI backend stays untinted and prints no banner. -export const BRUNCH_WORDMARK: readonly { ch: string; color: string }[] = [ - { ch: 'b', color: '#00BBFF' }, - { ch: 'r', color: '#0080FF' }, - { ch: 'u', color: '#0046FF' }, - { ch: 'n', color: '#3A36FF' }, - { ch: 'c', color: '#5424FF' }, - { ch: 'h', color: '#6D2BF6' }, +export const BRUNCH_ASCII: readonly string[] = [ + ' __ __ ', + ' / /_ _______ ______ _____/ /_ ', + ' / __ \\/ ___/ / / / __ \\/ ___/ __ \\', + ' / /_/ / / / /_/ / / / / /__/ / / /', + '/_.___/_/ \\__,_/_/ /_/\\___/_/ /_/ ', ]; + +// One shade per row, light amber → deep ember. 
+export const BRUNCH_ORANGE: readonly string[] = ['#FFB454', '#FFA033', '#FF8C1A', '#FF7A00', '#F26419']; diff --git a/src/orchestrator/src/presenter/phase.test.ts b/src/orchestrator/src/presenter/phase.test.ts index e3a78a81..75f1b4cd 100644 --- a/src/orchestrator/src/presenter/phase.test.ts +++ b/src/orchestrator/src/presenter/phase.test.ts @@ -13,28 +13,41 @@ describe('nextPhase', () => { expect(nextPhase('prep', { kind: 'cook-start', runStart: 0 })).toBe('cook'); }); - it('advances to taste on an epic/verify action and to plate on a promotion line', () => { - expect(nextPhase('cook', { kind: 'action', icon: '▸', message: 'verify api-auth' })).toBe('taste'); + it('lights taste on the epic verdict but NOT on per-slice verify (mid-cook)', () => { + // Per-slice verify runs during cooking — must not light taste. + expect(nextPhase('cook', { kind: 'action', icon: '▸', message: 'verify api-auth' })).toBe('cook'); + expect(nextPhase('cook', { kind: 'action', icon: '✓', message: 'verify tests/x.test.ts' })).toBe( + 'cook', + ); + // The epic-verification verdict is the real verify→taste signal. expect(nextPhase('cook', { kind: 'action', icon: '●', message: 'epic api-auth → PASS' })).toBe( 'taste', ); - expect(nextPhase('taste', { kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' })).toBe('plate'); + }); + + it('advances to plate on a promotion line and to serve on a completed run', () => { + expect(nextPhase('cook', { kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' })).toBe('plate'); + expect(nextPhase('plate', { kind: 'cook-done', ok: true })).toBe('serve'); + }); + + it('does not light serve when the run halted', () => { + expect(nextPhase('cook', { kind: 'cook-done', ok: false })).toBe('cook'); }); it('never regresses to an earlier phase', () => { - // A per-slice action after taste must not pull the tracker back to cook. 
+ expect(nextPhase('serve', { kind: 'cook-start', runStart: 0 })).toBe('serve'); expect(nextPhase('taste', { kind: 'action', icon: '▸', message: 'tests slice-2' })).toBe('taste'); - expect(nextPhase('plate', { kind: 'cook-start', runStart: 0 })).toBe('plate'); }); - it('walks a full cook run prep → cook → taste → plate', () => { + it('walks a full cook run prep → cook → taste → plate → serve', () => { expect( walk([ { kind: 'cook-start', runStart: 0 }, { kind: 'action', icon: '▸', message: 'tests slice-1' }, - { kind: 'action', icon: '▸', message: 'verify api-auth' }, + { kind: 'action', icon: '●', message: 'epic api-auth → PASS' }, { kind: 'line', text: ' ✓ promoted → cook/abc @ 1234abcd' }, + { kind: 'cook-done', ok: true }, ]), - ).toBe('plate'); + ).toBe('serve'); }); }); diff --git a/src/orchestrator/src/presenter/phase.ts b/src/orchestrator/src/presenter/phase.ts index e74b1e0b..232244db 100644 --- a/src/orchestrator/src/presenter/phase.ts +++ b/src/orchestrator/src/presenter/phase.ts @@ -26,9 +26,16 @@ function phaseFor(event: CookEvent): BrigadePhase | undefined { case 'cook-start': return 'cook'; case 'action': - return /^(verify|epic)/.test(event.message) ? 'taste' : undefined; + // verify→taste fires on the epic-verification verdict (`epic → …`), + // NOT on per-slice `verify ` lines — those run mid-cook and would + // light taste while still cooking. + return /^epic\b/.test(event.message) ? 'taste' : undefined; case 'line': return event.text.includes('promoted') ? 'plate' : undefined; + case 'cook-done': + // ship→serve: the run completed (emitted after promotion). A halted run + // does not ship, so it never lights serve. + return event.ok ? 
'serve' : undefined; default: return undefined; } diff --git a/src/orchestrator/src/presenter/run-store.test.ts b/src/orchestrator/src/presenter/run-store.test.ts index b7cda75d..c5d565e9 100644 --- a/src/orchestrator/src/presenter/run-store.test.ts +++ b/src/orchestrator/src/presenter/run-store.test.ts @@ -45,22 +45,24 @@ describe('RunStore', () => { }); it('tracks pending activities: start adds, progress updates detail, end removes', () => { - let clock = 1000; - const store = new RunStore('cook', () => clock); + const store = new RunStore('cook', () => 1000); store.push({ kind: 'activity-start', id: 'tests:slice-1', label: 'agent writing tests' }); - let pending = store.getSnapshot().pending; + const pending = store.getSnapshot().pending; expect(pending).toHaveLength(1); - expect(pending[0]).toMatchObject({ id: 'tests:slice-1', label: 'agent writing tests', startedAt: 1000 }); + expect(pending[0]).toMatchObject({ id: 'tests:slice-1', label: 'agent writing tests' }); store.push({ kind: 'activity-progress', id: 'tests:slice-1', detail: '8 KB' }); expect(store.getSnapshot().pending[0]).toMatchObject({ detail: '8 KB' }); - clock = 5000; store.push({ kind: 'activity-end', id: 'tests:slice-1' }); expect(store.getSnapshot().pending).toHaveLength(0); }); + it('stamps a run-start for the global timer at construction', () => { + expect(new RunStore('cook', () => 4242).getSnapshot().runStart).toBe(4242); + }); + it('does not put activity events into the scrolling line log', () => { const store = new RunStore('cook', () => 0); store.push({ kind: 'activity-start', id: 'a', label: 'booting app' }); @@ -68,3 +70,125 @@ describe('RunStore', () => { expect(store.getSnapshot().lines).toEqual([]); }); }); + +describe('RunStore — slice grid', () => { + function seeded(): RunStore { + const store = new RunStore('cook', () => 0); + store.push({ + kind: 'run-shape', + epics: [{ id: 'api' }, { id: 'pay' }], + slices: [ + { id: 'login', epicId: 'api' }, + { id: 'refresh', epicId: 'api' 
}, + { id: 'charge', epicId: 'pay' }, + ], + }); + return store; + } + + it('seeds every slice as queued, grouped by epic order', () => { + const { epics, slices } = seeded().getSnapshot(); + expect(epics).toEqual(['api', 'pay']); + expect(slices.map((s) => [s.id, s.status])).toEqual([ + ['login', 'queued'], + ['refresh', 'queued'], + ['charge', 'queued'], + ]); + }); + + it('flips a slice to running with a step, then passed (latest wins, detail cleared)', () => { + const store = seeded(); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'tests' }); + store.push({ kind: 'activity-progress', id: 'login', detail: 'edit src/login.ts' }); + let row = store.getSnapshot().slices.find((s) => s.id === 'login')!; + expect(row).toMatchObject({ status: 'running', step: 'tests', detail: 'edit src/login.ts' }); + + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'passed' }); + row = store.getSnapshot().slices.find((s) => s.id === 'login')!; + expect(row.status).toBe('passed'); + expect(row.step).toBeUndefined(); // in-flight label cleared once it stops running + expect(row.detail).toBeUndefined(); // heartbeat cleared once it stops running + }); + + it('routes slice-keyed activity to the grid, non-slice activity to pending', () => { + const store = seeded(); + // A slice-keyed activity must NOT create a pending entry. + store.push({ kind: 'activity-start', id: 'login', label: 'login' }); + expect(store.getSnapshot().pending).toHaveLength(0); + + // A non-slice wait (promotion) does. 
+ store.push({ kind: 'activity-start', id: 'promote', label: 'promoting → cook/abc' }); + expect(store.getSnapshot().pending.map((p) => p.id)).toEqual(['promote']); + store.push({ kind: 'activity-end', id: 'promote' }); + expect(store.getSnapshot().pending).toHaveLength(0); + }); +}); + +describe('RunStore — failure legibility', () => { + it('stores a slice failure reason', () => { + const store = new RunStore('cook', () => 0); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'login', epicId: 'api' }] }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'tests failed' }); + expect(store.getSnapshot().slices.find((s) => s.id === 'login')).toMatchObject({ + status: 'failed', + reason: 'tests failed', + }); + }); + + it('sets haltReason from cook-done(ok:false) and leaves it unset on completion', () => { + const halted = new RunStore('cook', () => 0); + halted.push({ kind: 'cook-done', ok: false, reason: 'budget exhausted' }); + expect(halted.getSnapshot().haltReason).toBe('budget exhausted'); + + const done = new RunStore('cook', () => 0); + done.push({ kind: 'cook-done', ok: true }); + expect(done.getSnapshot().haltReason).toBeUndefined(); + }); +}); + +describe('RunStore — attempt counting', () => { + function seed(): RunStore { + const store = new RunStore('cook', () => 0); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'login', epicId: 'api' }] }); + return store; + } + const attemptsOf = (store: RunStore) => store.getSnapshot().slices.find((s) => s.id === 'login')!.attempts; + + it('counts attempt 1 on first run; step changes mid-run do not bump it', () => { + const store = seed(); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'tests' }); + expect(attemptsOf(store)).toBe(1); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', 
step: 'verify' }); + expect(attemptsOf(store)).toBe(1); // running→running keeps the count + }); + + it('bumps the attempt on a retry (failed → running) and keeps it on terminal failure', () => { + const store = seed(); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'verify' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'tests failed' }); + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'running', step: 'code' }); + expect(attemptsOf(store)).toBe(2); // failed→running is attempt 2 + store.push({ kind: 'slice', id: 'login', epicId: 'api', status: 'failed', reason: 'tests failed' }); + expect(attemptsOf(store)).toBe(2); // terminal failure keeps the count + }); +}); + +describe('RunStore — retry budget', () => { + it('derives maxAttempts as retry budget + 1 from run-shape', () => { + const store = new RunStore('cook', () => 0); + store.push({ + kind: 'run-shape', + epics: [{ id: 'api' }], + slices: [{ id: 'a', epicId: 'api' }], + maxRetries: 3, + }); + expect(store.getSnapshot().maxAttempts).toBe(4); + }); + + it('leaves maxAttempts unset when run-shape omits the budget', () => { + const store = new RunStore('cook', () => 0); + store.push({ kind: 'run-shape', epics: [{ id: 'api' }], slices: [{ id: 'a', epicId: 'api' }] }); + expect(store.getSnapshot().maxAttempts).toBeUndefined(); + }); +}); diff --git a/src/orchestrator/src/presenter/run-store.ts b/src/orchestrator/src/presenter/run-store.ts index 81e9add8..466854d9 100644 --- a/src/orchestrator/src/presenter/run-store.ts +++ b/src/orchestrator/src/presenter/run-store.ts @@ -10,20 +10,44 @@ import type { CookEvent } from './events.js'; import { formatCookEvent } from './format.js'; import { type BrigadePhase, nextPhase } from './phase.js'; -const MAX_LINES = 500; - export interface PendingActivity { id: string; label: string; detail?: string; - startedAt: number; +} + +export type SliceStatus = 'queued' | 'running' | 
'passed' | 'failed'; + +export interface SliceRow { + id: string; + epicId: string; + status: SliceStatus; + /** Current sub-action while running (tests / code / verify). */ + step?: string; + /** Live heartbeat for the running slice (latest line / tool). */ + detail?: string; + /** Why the slice failed (e.g. 'tests failed', 'infra error'). */ + reason?: string; + /** Attempt count — incremented each time the slice (re)enters running. */ + attempts?: number; } export interface RunState { command: string; phase: BrigadePhase; lines: string[]; + /** Non-slice waits (worktree, promotion). Slice waits live on the grid. */ pending: PendingActivity[]; + /** Epic ids in plan order, for grouping the grid. */ + epics: string[]; + /** The slice grid — every slice, seeded queued by run-shape. */ + slices: SliceRow[]; + /** When the run started, for the single global header timer. */ + runStart: number; + /** Set when the run halted — the reason, pinned in a halt summary. */ + haltReason?: string; + /** Total attempts allowed per slice (retry budget + 1), for the n/max display. */ + maxAttempts?: number; } export class RunStore { @@ -36,31 +60,91 @@ export class RunStore { private readonly now: () => number = () => Date.now(), ) { this.clock = createElapsedClock(now); - this.state = { command, phase: 'prep', lines: [], pending: [] }; + this.state = { command, phase: 'prep', lines: [], pending: [], epics: [], slices: [], runStart: now() }; + } + + private isSlice(id: string): boolean { + return this.state.slices.some((s) => s.id === id); + } + + private updateSlice(id: string, patch: Partial<SliceRow>): SliceRow[] { + return this.state.slices.map((s) => (s.id === id ? 
{ ...s, ...patch } : s)); } push(event: CookEvent): void { - if (event.kind === 'activity-start') { + if (event.kind === 'run-shape') { this.commit({ - pending: [...this.state.pending, { id: event.id, label: event.label, startedAt: this.now() }], + epics: event.epics.map((e) => e.id), + slices: event.slices.map((s) => ({ id: s.id, epicId: s.epicId, status: 'queued' as const })), + // total attempts = retry budget + 1 (attempt 1 is the first run) + ...(event.maxRetries !== undefined ? { maxAttempts: event.maxRetries + 1 } : {}), }); return; } + if (event.kind === 'slice') { + const running = event.status === 'running'; + const prev = this.state.slices.find((s) => s.id === event.id); + // A fresh run (queued→running) is attempt 1; a retry (failed→running) bumps + // it; a step change mid-run (running→running) keeps the count. + const attempts = running + ? prev?.status === 'running' + ? prev.attempts + : (prev?.attempts ?? 0) + 1 + : prev?.attempts; + this.commit({ + slices: this.updateSlice(event.id, { + status: event.status, + ...(event.step !== undefined ? { step: event.step } : {}), + // clear the in-flight label + heartbeat once the slice stops running + ...(running ? {} : { step: undefined, detail: undefined }), + // set/clear the failure reason from the event (undefined for passed/running) + reason: event.reason, + attempts, + }), + }); + return; + } + // Slice-keyed activity detail lands on the grid row; everything else is a + // non-slice wait (worktree, promotion) and shows in the pending footer. + if (event.kind === 'activity-start') { + if (this.isSlice(event.id)) return; + this.commit({ pending: [...this.state.pending, { id: event.id, label: event.label }] }); + return; + } if (event.kind === 'activity-progress') { + if (this.isSlice(event.id)) { + this.commit({ slices: this.updateSlice(event.id, { detail: event.detail }) }); + return; + } this.commit({ pending: this.state.pending.map((a) => (a.id === event.id ? 
{ ...a, detail: event.detail } : a)), }); return; } if (event.kind === 'activity-end') { + if (this.isSlice(event.id)) { + this.commit({ slices: this.updateSlice(event.id, { detail: undefined }) }); + return; + } this.commit({ pending: this.state.pending.filter((a) => a.id !== event.id) }); return; } + if (event.kind === 'cook-done') { + // Advances the brigade to `serve` on success; pins a halt summary otherwise. + this.commit({ + phase: nextPhase(this.state.phase, event), + ...(event.ok ? {} : { haltReason: event.reason ?? 'halted' }), + }); + return; + } + const added = formatCookEvent(event, this.clock); const phase = nextPhase(this.state.phase, event); if (added.length === 0 && phase === this.state.phase) return; - this.commit({ phase, lines: [...this.state.lines, ...added].slice(-MAX_LINES) }); + // Append-only — the Ink backend streams these through <Static>, which + // assumes items only grow; the lines live in terminal scrollback. + this.commit({ phase, lines: [...this.state.lines, ...added] }); } private commit(patch: Partial<RunState>): void { diff --git a/src/orchestrator/src/project-detect.test.ts b/src/orchestrator/src/project-detect.test.ts index 0da4e0a6..7785b7bb 100644 --- a/src/orchestrator/src/project-detect.test.ts +++ b/src/orchestrator/src/project-detect.test.ts @@ -187,3 +187,50 @@ describe('detectProfile resolves the runner from workspace packages in a monorep expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); }); }); + +describe('detectProfile resolves the runner from workspace packages in a monorepo', () => { + it('finds vitest in a workspace package when the root declares no runner', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'] }), + 'packages/app/package.json': pkg({ vitest: '^2.0.0' }), + 'packages/lib/package.json': pkg({ typescript: '^5.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); + + it('finds the runner via 
a pnpm-workspace.yaml package list', () => { + const dir = repo({ + 'package.json': JSON.stringify({ name: 'root' }), + 'pnpm-workspace.yaml': "packages:\n - 'packages/*'\n", + 'packages/web/package.json': pkg({ jest: '^29.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-jest' }); + }); + + it('a root runner wins without scanning (and a workspace cannot make it ambiguous)', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'], devDependencies: { vitest: '^2.0.0' } }), + 'packages/legacy/package.json': pkg({ jest: '^29.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); + + it('workspaces collectively declaring both runners is ambiguous, not silently picked', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['packages/*'] }), + 'packages/a/package.json': pkg({ vitest: '^2.0.0' }), + 'packages/b/package.json': pkg({ jest: '^29.0.0' }), + }); + const result = detectProfile(dir); + expect(result.detected).toBe(false); + expect(!result.detected && result.reason).toMatch(/ambiguous/i); + }); + + it('a literal (non-wildcard) workspace directory is resolved', () => { + const dir = repo({ + 'package.json': JSON.stringify({ workspaces: ['apps/web'] }), + 'apps/web/package.json': pkg({ vitest: '^2.0.0' }), + }); + expect(detectProfile(dir)).toMatchObject({ detected: true, profile: 'node-vitest' }); + }); +}); diff --git a/src/orchestrator/src/project-profile.test.ts b/src/orchestrator/src/project-profile.test.ts index 63f4915e..45d615fc 100644 --- a/src/orchestrator/src/project-profile.test.ts +++ b/src/orchestrator/src/project-profile.test.ts @@ -164,3 +164,30 @@ describe('withTestDir relocates test targets while preserving the filename conve expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']); }); }); + +describe('withTestDir relocates test targets while preserving 
the filename convention', () => { + it('moves a tests/-default profile into the detected directory', () => { + const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src'); + expect(relocated.sliceTarget('req-180')).toBe('src/req-180.test.ts'); + expect(relocated.epicTarget('epic-1')).toBe('src/epic-1.integration.test.ts'); + }); + + it('relocates the root-co-located brunch profile into a directory', () => { + const relocated = withTestDir(brunchProfile.toolchain, 'src'); + expect(relocated.sliceTarget('req-180')).toBe('src/req-180.test.ts'); + }); + + it('strips a trailing slash from the directory', () => { + expect(withTestDir(bunProfile.toolchain, 'pkg/').sliceTarget('s1')).toBe('pkg/s1.test.ts'); + }); + + it('an empty or "." directory places tests at the repo root', () => { + expect(withTestDir(bunProfile.toolchain, '').sliceTarget('s1')).toBe('s1.test.ts'); + expect(withTestDir(bunProfile.toolchain, '.').sliceTarget('s1')).toBe('s1.test.ts'); + }); + + it('leaves the test command untouched (only the target path changes)', () => { + const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src'); + expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']); + }); +}); diff --git a/src/orchestrator/src/promote-run.test.ts b/src/orchestrator/src/promote-run.test.ts index e79a719e..97003125 100644 --- a/src/orchestrator/src/promote-run.test.ts +++ b/src/orchestrator/src/promote-run.test.ts @@ -5,7 +5,7 @@ import { join } from 'node:path'; import { afterEach, describe, expect, it } from 'vitest'; -import { promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; +import { landCookBranch, promoteBrownfieldRun, promoteGreenfieldRun } from './promote-run.js'; const dirs: string[] = []; const GIT_TEST_TIMEOUT_MS = 20_000; @@ -355,3 +355,53 @@ describe('promoteBrownfieldRun', () => { expect(files).not.toContain('old.ts'); }); }); + +describe('landCookBranch', () => { + const id = ['-c', 
'user.name=t', '-c', 'user.email=t@e']; + + // A user repo on `main` with one base commit and a promoted cook/r1 branch + // (the composed result already committed on top of base via promoteBrownfieldRun). + function repoWithPromotedCook(): { dir: string; baseHead: string; cookCommit: string } { + const dir = mkdtempSync(join(tmpdir(), 'cook-land-')); + dirs.push(dir); + execFileSync('git', ['init', '-q', '-b', 'main'], { cwd: dir }); + writeFileSync(join(dir, 'app.ts'), 'export const v = 1;\n'); + writeFileSync(join(dir, '.gitignore'), 'node_modules/\n'); + execFileSync('git', ['add', '.'], { cwd: dir }); + execFileSync('git', [...id, 'commit', '-q', '-m', 'base'], { cwd: dir }); + execFileSync('git', ['branch', 'cook/r1'], { cwd: dir }); + const baseHead = execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(); + + const tree = mkdtempSync(join(tmpdir(), 'cook-land-tree-')); + dirs.push(tree); + writeFileSync(join(tree, 'app.ts'), 'export const v = 2;\n'); + writeFileSync(join(tree, 'feature.ts'), 'export const f = true;\n'); + writeFileSync(join(tree, '.gitignore'), 'node_modules/\n'); + const { commit } = promoteBrownfieldRun({ sourceDir: dir, sourceTreeDir: tree, runId: 'r1' }); + return { dir, baseHead, cookCommit: commit }; + } + + function head(dir: string): string { + return execFileSync('git', ['rev-parse', 'HEAD'], { cwd: dir, encoding: 'utf8' }).trim(); + } + + it( + 'fast-forwards the active branch onto cook/ when HEAD has not moved', + () => { + const { dir, cookCommit } = repoWithPromotedCook(); + + const result = landCookBranch({ sourceDir: dir, runId: 'r1' }); + + expect(result).toEqual({ kind: 'landed', mode: 'fast-forward', branch: 'main', commit: cookCommit }); + // Active branch advanced to the cook commit; the delta is now in the working tree. 
+ expect(head(dir)).toBe(cookCommit); + expect(readFileSync(join(dir, 'app.ts'), 'utf8')).toContain('v = 2'); + expect(existsSync(join(dir, 'feature.ts'))).toBe(true); + // cook/r1 still exists for re-review. + expect(execFileSync('git', ['rev-parse', 'cook/r1'], { cwd: dir, encoding: 'utf8' }).trim()).toBe( + cookCommit, + ); + }, + GIT_TEST_TIMEOUT_MS, + ); +}); diff --git a/src/orchestrator/src/promote-run.ts b/src/orchestrator/src/promote-run.ts index fa6d410a..9eab6aa0 100644 --- a/src/orchestrator/src/promote-run.ts +++ b/src/orchestrator/src/promote-run.ts @@ -5,6 +5,17 @@ import { basename, isAbsolute, join, relative, resolve } from 'node:path'; export type PromoteResult = { target: string; branch: string; commit: string }; +export type LandResult = + | { kind: 'landed'; mode: 'fast-forward' | 'merge'; branch: string; commit: string } + | { kind: 'refused'; reason: 'dirty' | 'detached' } + | { kind: 'conflict'; branch: string }; + +export type LandOptions = { + /** The user's repo root whose active branch should receive the cook commit. */ + sourceDir: string; + runId: string; +}; + export type PromoteOptions = { sandboxDir: string; target: string; @@ -24,6 +35,15 @@ function git(args: string[], cwd: string, env?: NodeJS.ProcessEnv): string { return execFileSync('git', args, { cwd, env, encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'] }).trim(); } +function gitOk(args: string[], cwd: string): boolean { + try { + git(args, cwd); + return true; + } catch { + return false; + } +} + // Deterministic committer so promotion never depends on (or mutates) global git config. const COMMIT_IDENTITY = ['-c', 'user.name=brunch', '-c', 'user.email=cook@brunch']; @@ -181,3 +201,44 @@ export function promoteBrownfieldRun(opts: BrownfieldPromoteOptions): PromoteRes rmSync(tmp, { recursive: true, force: true }); } } + +/** + * Merge a promoted `cook/` branch into the repo's checked-out branch — the + * opt-in counterpart to brownfield promotion's hands-off default. 
Promotion + * deliberately never touches the working branch; this is the only path that does, + * and only when the caller (`serve --land`) explicitly asks. It refuses rather + * than freelance: a dirty tree or detached HEAD is left untouched, and a real + * merge that conflicts is aborted back to a clean state. On every non-landed + * outcome the `cook/` branch stays intact for manual merge/review. + */ +export function landCookBranch(opts: LandOptions): LandResult { + const sourceDir = resolve(opts.sourceDir); + const ref = `cook/${opts.runId}`; + const cookCommit = git(['rev-parse', '--verify', ref], sourceDir); + + // Refuse on a detached HEAD (no branch to advance) or a dirty tree (don't bury + // uncommitted work under a merge) — leave the repo exactly as found. + let branch: string; + try { + branch = git(['symbolic-ref', '--quiet', '--short', 'HEAD'], sourceDir); + } catch { + return { kind: 'refused', reason: 'detached' }; + } + if (git(['status', '--porcelain'], sourceDir) !== '') { + return { kind: 'refused', reason: 'dirty' }; + } + + // HEAD unmoved since the run branched → cook/ is strictly ahead, so a + // fast-forward lands the commit verbatim. Otherwise a real merge is required. 
+ if (gitOk(['merge-base', '--is-ancestor', 'HEAD', ref], sourceDir)) { + git(['merge', '--ff-only', ref], sourceDir); + return { kind: 'landed', mode: 'fast-forward', branch, commit: cookCommit }; + } + try { + git([...COMMIT_IDENTITY, 'merge', '--no-edit', ref], sourceDir); + } catch { + git(['merge', '--abort'], sourceDir); + return { kind: 'conflict', branch }; + } + return { kind: 'landed', mode: 'merge', branch, commit: git(['rev-parse', 'HEAD'], sourceDir) }; +} diff --git a/src/orchestrator/src/types.ts b/src/orchestrator/src/types.ts index e74eb851..3f8ede4a 100644 --- a/src/orchestrator/src/types.ts +++ b/src/orchestrator/src/types.ts @@ -246,6 +246,8 @@ export type OrchestratorInput = { reports: ReportSink; testRunner: TestRunner; policy: RunPolicy; + /** Ephemeral presentation events for live CLI surfaces (non-durable). */ + emit?: (event: import('./presenter/events.js').CookEvent) => void; /** * 'fixture' (default): per-slice worktrees are created empty. Greenfield. * 'codebase': per-slice worktrees are real `git worktree`s on slice-level diff --git a/src/server/app.test.ts b/src/server/app.test.ts index 099b6b04..12212985 100644 --- a/src/server/app.test.ts +++ b/src/server/app.test.ts @@ -619,6 +619,65 @@ describe('POST /api/specifications/:id/chat', () => { ); }); + it('accepts follow-up chat history containing failed (output-error) tool parts', async () => { + const projectId = await createTestSpecification(); + + await request(app) + .post(`/api/specifications/${projectId}/chat`) + .send({ + messages: [{ id: 'u1', role: 'user', parts: [{ type: 'text', text: 'hello' }] }], + }) + .expect(200); + + // Mirrors the real Opus 4.8 failure: a malformed `ask_question` lands as an + // `output-error` part carrying `rawInput` (no `input`) before the model + // retries successfully. The dead attempt must not brick the next turn. 
+ await request(app) + .post(`/api/specifications/${projectId}/chat`) + .send({ + messages: [ + { id: 'u1', role: 'user', parts: [{ type: 'text', text: 'hello' }] }, + { + id: 'a1', + role: 'assistant', + parts: [ + { + type: 'tool-ask_question', + toolCallId: 'toolu_failed', + state: 'output-error', + rawInput: { content: 'A single option', is_recommended: 'false' }, + errorText: 'Invalid input for tool ask_question: Type validation failed', + }, + { + type: 'tool-ask_question', + toolCallId: 'toolu_retry', + state: 'output-available', + input: { + question: 'What should we focus on first?', + why: 'This narrows the initial slice.', + impact: 'high', + options: [], + }, + output: { ok: true, turnId: 2, optionCount: 0 }, + }, + ], + }, + { id: 'u2', role: 'user', parts: [{ type: 'text', text: 'Focus on export flow' }] }, + ], + }) + .expect('Content-Type', /text\/event-stream/) + .expect(200); + + expect(mockStreamInterviewer).toHaveBeenLastCalledWith( + expect.anything(), + expect.anything(), + expect.any(Array), + 'Focus on export flow', + 'grounding', + undefined, + ); + }); + it('returns an AI SDK UI message stream and persists the turn', async () => { const projectId = await createTestSpecification(); diff --git a/src/server/app.ts b/src/server/app.ts index 79b9cac7..489cc37c 100644 --- a/src/server/app.ts +++ b/src/server/app.ts @@ -123,6 +123,34 @@ function parseEntityProjectionMode(rawMode: unknown): EntityProjectionMode | nul return rawMode === 'active-path' || rawMode === 'project-wide' ? rawMode : null; } +/** + * Drop failed tool-call attempts (state `output-error`) the client may echo + * back in chat history. These arise when the model emitted a malformed tool + * call and retried — they carry `rawInput` but no `input`, which the AI SDK's + * `output-error` schema rejects, bricking the whole turn during + * `validateUIMessages`. The retried call is what matters; the failed attempt + * carries no history value, so strip it before validation. 
+ */ +function stripFailedToolParts(rawMessages: unknown): unknown { + if (!Array.isArray(rawMessages)) { + return rawMessages; + } + return rawMessages.map((message) => { + if (!message || typeof message !== 'object' || !Array.isArray((message as { parts?: unknown }).parts)) { + return message; + } + const parts = (message as { parts: unknown[] }).parts.filter((part) => { + if (!part || typeof part !== 'object') { + return true; + } + const { type, state } = part as { type?: unknown; state?: unknown }; + const isToolPart = typeof type === 'string' && (type.startsWith('tool-') || type === 'dynamic-tool'); + return !(isToolPart && state === 'output-error'); + }); + return { ...message, parts }; + }); +} + function getChatRouteTransitionErrorStatus(kind: ChatRouteTransitionErrorKind): 400 | 404 | 409 { switch (kind) { case 'phase-intent-not-available': @@ -477,7 +505,7 @@ export function createApp(dbPathOrOptions?: string | AppOptions): AppServices { let messages: BrunchUIMessage[]; try { messages = await validateUIMessages({ - messages: req.body.messages ?? [], + messages: stripFailedToolParts(req.body.messages ?? []), dataSchemas: brunchDataPartSchemas, // The client may echo earlier assistant history that still contains dynamic // workspace-tool parts from a live stream (for example `list_directory`). diff --git a/src/server/cli.ts b/src/server/cli.ts index 00f133fc..28ae029e 100644 --- a/src/server/cli.ts +++ b/src/server/cli.ts @@ -141,25 +141,31 @@ exitIfAnthropicApiKeyMissing(); if (rawArgs[0] === 'cook') { const { parseCookArgs, runCook } = await import('../orchestrator/src/cook-cli.js'); const { withCookBus } = await import('../orchestrator/src/presenter.js'); - const opts = parseCookArgs(rawArgs.slice(1)); - // withCookBus disposes the bus (unmounts the Ink app) in finally so the TTY run exits. 
- await withCookBus('cook', (bus) => runCook(opts, bus)).catch((error) => { - console.error('Failed to run brunch cook:', error); + try { + // Parse before mounting the TUI; withCookBus disposes (unmounts Ink) in + // finally so a run error tears the TUI down before we print it. + const opts = parseCookArgs(rawArgs.slice(1)); + await withCookBus('cook', (bus) => runCook(opts, bus)); + } catch (error) { + console.error(`Failed to run brunch cook: ${error instanceof Error ? error.message : String(error)}`); process.exit(1); - }); + } } else if (rawArgs[0] === 'serve') { const { runPlan } = await import('./plan-runner.js'); const { runCook } = await import('../orchestrator/src/cook-cli.js'); const { parseServeArgs, runServe } = await import('./serve-runner.js'); const { withCookBus } = await import('../orchestrator/src/presenter.js'); - await withCookBus('serve', (bus) => - withCompletedSpec( - 'serve', - () => parseServeArgs(rawArgs.slice(1)), - async (opts, { project, snapshot }) => { + // Validate args + spec BEFORE mounting the TUI, so a bad specId errors plainly + // (no chrome flash). withCookBus then owns the TUI only for the actual run, and + // disposes it in finally even if the run throws. + await withCompletedSpec( + 'serve', + () => parseServeArgs(rawArgs.slice(1)), + async (opts, { project, snapshot }) => { + await withCookBus('serve', (bus) => // Cook runs against the same dir the plan was written to (launchCwd); see // serveCookOptions — runCook reads opts.dir raw, so serve must thread it. 
- await runServe(opts, launchCwd, { + runServe(opts, launchCwd, { plan: () => runPlan({ specificationId: opts.specificationId, @@ -172,19 +178,19 @@ if (rawArgs[0] === 'cook') { bus, }), cook: (cookOpts) => runCook(cookOpts, bus), - }); - }, - ), + }), + ); + }, ); } else if (rawArgs[0] === 'plan') { const { parsePlanArgs, runPlan } = await import('./plan-runner.js'); const { withCookBus } = await import('../orchestrator/src/presenter.js'); - await withCookBus('plan', (bus) => - withCompletedSpec( - 'plan', - () => parsePlanArgs(rawArgs.slice(1), launchCwd), - async (opts, { project, snapshot }) => { - await runPlan({ + await withCompletedSpec( + 'plan', + () => parsePlanArgs(rawArgs.slice(1), launchCwd), + async (opts, { project, snapshot }) => { + await withCookBus('plan', (bus) => + runPlan({ specificationId: opts.specificationId, snapshot, outDir: opts.outDir, @@ -193,9 +199,9 @@ if (rawArgs[0] === 'cook') { // Brownfield detection reads the launch cwd (the user's repo); greenfield ignores it. repoDir: project.cwd, bus, - }); - }, - ), + }), + ); + }, ); } else if (rawArgs[0] === 'agent') { const project = resolveBrunchProject(launchCwd); diff --git a/src/server/interview.ts b/src/server/interview.ts index 63a8bafd..6ab5c123 100644 --- a/src/server/interview.ts +++ b/src/server/interview.ts @@ -320,16 +320,20 @@ export function createInterviewerAgent( const instructions = getInterviewerInstructions(phase, options); return new ToolLoopAgent({ - model: anthropic(process.env.ANTHROPIC_MODEL || 'claude-sonnet-4-20250514'), + model: anthropic(process.env.ANTHROPIC_MODEL || 'claude-opus-4-8'), instructions, tools, providerOptions: { anthropic: { sendReasoning: true, - thinking: { - type: 'enabled', - budgetTokens: 10000, - }, + // Opus 4.8 controls thinking via adaptive type + effort, not the + // enabled/budgetTokens shape (which the API rejects for this model). 
+ thinking: { type: 'adaptive' }, + effort: 'medium', + // Opus 4.8 otherwise fragments a single `ask_question` into several + // parallel partial calls (one option each / leaked tool-call XML), + // which land as failed `output-error` parts. Force one call per step. + disableParallelToolUse: true, }, }, maxOutputTokens: 16000, diff --git a/src/server/secondary-chat-route.ts b/src/server/secondary-chat-route.ts index 5fd3a992..cfb8d1d2 100644 --- a/src/server/secondary-chat-route.ts +++ b/src/server/secondary-chat-route.ts @@ -428,7 +428,7 @@ export async function handleSecondaryChatMessageRequest(db: DB, req: Request, re const stream = createUIMessageStream({ async execute({ writer }) { const result = streamText({ - model: anthropic(process.env.ANTHROPIC_MODEL || 'claude-sonnet-4-20250514'), + model: anthropic(process.env.ANTHROPIC_MODEL || 'claude-opus-4-6'), system, messages: messages.map((message) => ({ role: message.role, content: message.content })), tools, diff --git a/src/server/serve-runner.test.ts b/src/server/serve-runner.test.ts index 5df7d900..da9d837d 100644 --- a/src/server/serve-runner.test.ts +++ b/src/server/serve-runner.test.ts @@ -46,6 +46,12 @@ describe('parseServeArgs', () => { expect(() => parseServeArgs(['1', '2'])).toThrow(/Unexpected positional/); }); + it('parses --land and rejects combining it with the greenfield --out target', () => { + expect(parseServeArgs(['12', '--land']).land).toBe(true); + expect(parseServeArgs(['12']).land).toBe(false); + expect(() => parseServeArgs(['12', '--land', '--out=dist'])).toThrow(/--land/); + }); + it('rejects petrinaut companion flags unless streaming is enabled', () => { expect(() => parseServeArgs(['1', '--petrinaut-url=https://x/brunch'])).toThrow( /--petrinaut-url requires --petrinaut-stream/, @@ -94,6 +100,11 @@ describe('serveCookOptions', () => { const cook = serveCookOptions(parseServeArgs(['9']), '/proj'); expect(cook.outDir).toBeUndefined(); }); + + it('forwards --land as cook landBranch (off 
by default)', () => { + expect(serveCookOptions(parseServeArgs(['9', '--land']), '/proj').landBranch).toBe(true); + expect(serveCookOptions(parseServeArgs(['9']), '/proj').landBranch).toBe(false); + }); }); describe('runServe', () => { diff --git a/src/server/serve-runner.ts b/src/server/serve-runner.ts index 09079863..0ddb3033 100644 --- a/src/server/serve-runner.ts +++ b/src/server/serve-runner.ts @@ -14,6 +14,8 @@ export type ServeOptions = { specificationId: number; /** Greenfield promote target (→ cook `--out`); brownfield promotes automatically. */ outDir?: string; + /** Merge the promoted brownfield `cook/` branch into the active branch as the final step. */ + land: boolean; force: boolean; /** Toolchain profile override; stamped into the emitted plan. */ profile?: ProfileId; @@ -29,11 +31,12 @@ export type ServeOptions = { }; const USAGE = - 'Usage: brunch serve [--out=] [--force] [--profile=] [--policy=serial|parallel] [--max-retries=] [--petrinaut-stream] [--petrinaut-url=] [--petrinaut-lanes=both|mechanical] [--petrinaut-fold=color|identity] [--no-petrinaut-open] [--verbose]'; + 'Usage: brunch serve [--out=] [--land] [--force] [--profile=] [--policy=serial|parallel] [--max-retries=] [--petrinaut-stream] [--petrinaut-url=] [--petrinaut-lanes=both|mechanical] [--petrinaut-fold=color|identity] [--no-petrinaut-open] [--verbose]'; export function parseServeArgs(args: string[]): ServeOptions { let specIdRaw: string | undefined; let outDir: string | undefined; + let land = false; let force = false; let profile: ProfileId | undefined; let verbose = false; @@ -50,6 +53,8 @@ export function parseServeArgs(args: string[]): ServeOptions { for (const arg of args) { if (arg.startsWith('--out=')) { outDir = arg.slice('--out='.length); + } else if (arg === '--land') { + land = true; } else if (arg === '--force') { force = true; } else if (arg.startsWith('--profile=')) { @@ -98,6 +103,11 @@ export function parseServeArgs(args: string[]): ServeOptions { if 
(!Number.isInteger(specificationId) || specificationId <= 0) { throw new Error(`Invalid "${specIdRaw}": expected a positive integer. ${USAGE}`); } + if (land && outDir !== undefined) { + // --out is the greenfield promote target (a separate dir); --land merges the + // brownfield result into this repo's active branch. They name different modes. + throw new Error('--land cannot be combined with --out (--out is the greenfield promote target).'); + } if (sawPetrinautUrl && !petrinautStream) { throw new Error('--petrinaut-url requires --petrinaut-stream'); } @@ -108,6 +118,7 @@ export function parseServeArgs(args: string[]): ServeOptions { return { specificationId, outDir, + land, force, profile, verbose, @@ -144,6 +155,7 @@ export function serveCookOptions(opts: ServeOptions, cookDir: string): CookOptio ...(opts.petrinautUrl ? { petrinautUrl: opts.petrinautUrl } : {}), petrinautOpen: opts.petrinautOpen, ...(opts.outDir ? { outDir: resolve(cookDir, opts.outDir) } : {}), + landBranch: opts.land, force: opts.force, specId: opts.specificationId, };
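Reviewer note: the `--land` / `--out` handling in the `serve-runner.ts` hunks above can be read as a small parse-then-forward pipeline. The sketch below is a minimal, self-contained approximation of that flow, not the actual implementation. `MiniServeOptions`, `MiniCookOptions`, `parseMiniServeArgs`, and `toMiniCookOptions` are hypothetical names introduced only for illustration, and every other flag that `parseServeArgs` accepts is left out.

```typescript
// Hypothetical, trimmed-down option shapes; the real ServeOptions/CookOptions carry more fields.
type MiniServeOptions = { specificationId: number; outDir?: string; land: boolean };
type MiniCookOptions = { specId: number; outDir?: string; landBranch: boolean };

// Parses only --out= and --land. Any other argument is treated as the
// positional spec id (an assumption of this sketch, not of the real parser).
function parseMiniServeArgs(args: string[]): MiniServeOptions {
  let specIdRaw: string | undefined;
  let outDir: string | undefined;
  let land = false;

  for (const arg of args) {
    if (arg.startsWith('--out=')) {
      outDir = arg.slice('--out='.length);
    } else if (arg === '--land') {
      land = true;
    } else if (specIdRaw === undefined) {
      specIdRaw = arg;
    } else {
      throw new Error(`Unexpected positional argument: ${arg}`);
    }
  }

  // Number(undefined) is NaN, so a missing spec id is rejected here as well.
  const specificationId = Number(specIdRaw);
  if (!Number.isInteger(specificationId) || specificationId <= 0) {
    throw new Error(`Invalid spec id "${specIdRaw}": expected a positive integer.`);
  }

  // --out names a separate greenfield target directory, while --land merges the
  // brownfield result into the active branch, so the two are mutually exclusive.
  if (land && outDir !== undefined) {
    throw new Error('--land cannot be combined with --out (--out is the greenfield promote target).');
  }

  return { specificationId, outDir, land };
}

// Forwards the parsed flag to the cook layer; landBranch is always set and
// stays false unless --land was passed.
function toMiniCookOptions(opts: MiniServeOptions): MiniCookOptions {
  return {
    specId: opts.specificationId,
    ...(opts.outDir ? { outDir: opts.outDir } : {}),
    landBranch: opts.land,
  };
}
```

Under these assumptions, `toMiniCookOptions(parseMiniServeArgs(['9', '--land']))` carries `landBranch: true`, while combining `--land` with `--out=dist` fails at parse time. That matches the intent visible in the diff: validation happens before the TUI mounts and before any run starts.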