FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec by kostandinang · Pull Request #224 · hashintel/brunch

kostandinang · 2026-06-16T14:18:21Z

Stack Context

Stacks on FE-878 and stays under the FE-864 brownfield orchestration umbrella. This PR is no longer just the timeout tweak; it collects the operational improvements needed to make brunch serve usable on real brownfield runs.

What?

Raises the cook agent action timeout to 600s.
Improves the live serve/cook heartbeat and completion signals so long runs show useful progress.
Lazily provisions per-slice cook worktrees and shares node_modules instead of copying it into every slice.
Moves orchestration defaults to claude-opus-4-6 so plan/cook/chat paths do not fall back on retired or weaker defaults.

Why?

The spec 23 run exposed this as a broader reliability issue, not a single timeout problem: eager worktree seeding copied nearly a gigabyte of node_modules per slice before execution could start, and the architect model default caused a fallback plan that amplified the slice count. This PR makes the FE-864 branch an umbrella for those concrete orchestration improvements.

cursor · 2026-06-16T14:18:31Z

PR Summary

Medium Risk
Changes brownfield sandbox provisioning and shared node_modules semantics (possible cache contention across parallel slices) plus widespread model defaults; TUI and error-path behavior affect every cook/serve run.

Overview
Improves brownfield brunch serve / cook runs that were slow or opaque on real repos: slice sandboxes are provisioned lazily when a transition fires (only touched slices pay for worktrees), and node_modules is symlinked from the parent worktree instead of CoW-copied per slice. ensureSliceWorktree makes repeat fires and rework idempotent.

The live cook presenter gains an upfront epic→slice grid (run-shape, slice events), tool-call heartbeats (instrumented pi tools + latest agent line instead of raw KB counts), a cook-done signal for the brigade serve phase, and Ink changes (Static scrollback, figlet wordmark, global timer). Brigade taste now tracks epic verdict lines, not per-slice verify.

Cook CLI rejects unknown flags, raises agent timeout to 600s, and throws (instead of process.exit) on plan/sandbox/Petrinaut preflight so withCookBus can unmount Ink before errors print. Entry points parse/validate args before mounting the TUI where applicable.

Default LLM moves to claude-opus-4-6 on cook pi-actions, plan architect, interviewer, and secondary chat (docs updated).

Reviewed by Cursor Bugbot for commit 3e02bfb. Bugbot is set up for automated code reviews on this repo.

kostandinang · 2026-06-16T14:18:41Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

Each cook agent action (write-tests, write-code, verify-epic) runs under a per-action wall-clock budget enforced in pi-actions.ts. Raise the default from 300s to 600s so Sonnet agents have headroom on larger slices and on brownfield repos where setup/discovery eats into the turn.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
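The per-action budget described in this commit can be sketched as a race between the action and a timer. This is a minimal illustration under assumptions, not the code in pi-actions.ts: `withActionTimeout` and `DEFAULT_ACTION_TIMEOUT_MS` are hypothetical names, and the real enforcement may cancel the agent session differently.

```typescript
// Hypothetical sketch of a per-action wall-clock budget.
// Names are illustrative and not taken from pi-actions.ts.
const DEFAULT_ACTION_TIMEOUT_MS = 600_000; // 600s, the new default in this commit

async function withActionTimeout<T>(
  action: string,
  run: () => Promise<T>,
  budgetMs: number = DEFAULT_ACTION_TIMEOUT_MS,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${action} exceeded its ${budgetMs / 1000}s budget`)),
      budgetMs,
    );
  });
  try {
    // Whichever settles first wins: the action's result or the timeout error.
    return await Promise.race([run(), timeout]);
  } finally {
    // Always clear the timer so a finished action does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller would wrap each action, for example `withActionTimeout("write-tests", () => runAgent())`, where `runAgent` stands in for whatever starts the agent turn.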

…eam, clean failures

Iterating on the live TUI from real-terminal feedback:
- One global run timer in the footer instead of a per-item clock on every pending row (and whole-second, no jittery decimals).
- "brunch" wordmark is now a big lowercase figlet (Slant) in a warm orange gradient, replacing the egg.
- Activity log + wordmark stream through Ink <Static> so the full run lands in scrollback instead of collapsing in a redrawn bounded box; line cap removed.
- Brigade tracker no longer lights "taste" mid-cook: per-slice verify actions fire during cooking, so taste stays unlit until a real end-of-cook signal.
- Failures throw instead of process.exit, so withCookBus disposes (unmounts Ink) before the error prints; no more frozen "prep ◐" hang.

cook validates args before mounting the TUI and rejects unknown flags (e.g. --spec-id). check + presenter/cook/pi-actions tests green; full build deferred (active graphite stack navigation).
Co-Authored-By: Claude <noreply@anthropic.com>
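The reason throwing matters here is that a `finally` block runs when an error propagates but never runs after `process.exit`. A minimal sketch of that lifecycle, assuming a hypothetical `withPresenter` wrapper and `Presenter` interface (the real withCookBus is not shown in this PR page and may differ):

```typescript
// Hypothetical stand-in for a presenter lifecycle wrapper; not the real withCookBus.
interface Presenter {
  mount(): void;
  unmount(): void;
}

async function withPresenter<T>(
  presenter: Presenter,
  run: () => Promise<T>,
): Promise<T> {
  presenter.mount();
  try {
    return await run();
  } finally {
    // Runs on success and on a thrown error. It would be skipped entirely
    // if run() called process.exit(), leaving the TUI frozen on screen.
    presenter.unmount();
  }
}
```

With this shape, a preflight failure surfaces to the caller only after the presenter has been torn down, so the error prints to a clean terminal.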

Wires the two remaining kitchen-brigade phases faithfully to the orchestrator-arcs mapping (verify→taste, ship→serve):
- taste lights on the epic-verification verdict (action `epic <id> → …`), not on per-slice `verify <target>` lines; those fire mid-cook and previously lit taste while still cooking.
- serve lights on a new `cook-done` event emitted at the end of runCook (after promotion); a halted run never ships, so it never lights serve.

phase.test covers both signals + the full prep→cook→taste→plate→serve walk; check + presenter/cook tests green.
Co-Authored-By: Claude <noreply@anthropic.com>
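The two signals in this commit can be sketched as a small classifier over presenter events. The event shapes and the `phaseSignal` name are assumptions made for illustration; only the `epic <id> → …` label form, the `verify <target>` lines, and the `cook-done` event come from the commit message.

```typescript
type BrigadePhase = "prep" | "cook" | "taste" | "plate" | "serve";

// Hypothetical event shapes; the real presenter events may carry more fields.
type CookEvent =
  | { kind: "action"; label: string }
  | { kind: "cook-done" };

// Returns the phase an event should light, or undefined if it lights none.
// Only the taste and serve signals are covered; other phases are out of scope here.
function phaseSignal(event: CookEvent): BrigadePhase | undefined {
  if (event.kind === "cook-done") return "serve";
  // Epic verdict lines have the form `epic <id> → …`.
  if (/^epic \S+ →/.test(event.label)) return "taste";
  // Per-slice `verify <target>` lines fire mid-cook and must not light taste.
  return undefined;
}
```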

Each cook pi session was a black box in the pending panel, showing just a KB count. runPi already subscribes to the session's text stream; instead of bytes, surface the agent's latest non-empty line (tail-truncated, throttled every 2 KB) as the activity-progress detail, so a wait reads as live work ("agent writing tests · …adds the RefreshToken guard") rather than "still going".

Kept headless createAgentSession (no pi InteractiveMode, no new pi API): pi's tool-call events come via an extension hook (on('tool_call')), not the subscribe stream, so a richer "editing <file> / running <tool>" heartbeat is a separate follow-up that needs the extension-registration path verified. check + pi-actions/presenter tests green.
Co-Authored-By: Claude <noreply@anthropic.com>
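A rough sketch of the heartbeat described in this commit: accumulate streamed text, and once roughly 2 KB has arrived since the last update, emit the latest non-empty line with its head truncated. `createLineHeartbeat` and its parameters are hypothetical names, and reading "tail-truncated" as "keep the end of the line" is an assumption based on the leading ellipsis in the example above.

```typescript
// Hypothetical helper; not the code in runPi.
function createLineHeartbeat(
  emit: (detail: string) => void,
  maxChars: number = 60,
  throttleBytes: number = 2048,
): (chunk: string) => void {
  let tail = ""; // bounded window over the most recent text
  let sinceEmit = 0; // characters received since the last emit

  return (chunk: string): void => {
    tail = (tail + chunk).slice(-4096);
    sinceEmit += chunk.length; // approximates bytes by UTF-16 code units
    if (sinceEmit < throttleBytes) return;
    sinceEmit = 0;

    const lines = tail
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
    const latest = lines[lines.length - 1];
    if (latest === undefined) return;

    // Keep the end of a long line and mark the cut with a leading ellipsis.
    emit(latest.length > maxChars ? "…" + latest.slice(-(maxChars - 1)) : latest);
  };
}
```

The returned function would be passed each text chunk from the session stream, with `emit` forwarding the detail to the pending panel.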

Richer "what the agent is doing" in the pending panel (the spike's Option A, full tier): instead of only the agent's latest line, show the tool calls, such as "edit src/auth/token.ts", "bash bun test", "grep RefreshToken".

pi exposes no tool-call hook on session.subscribe (text/lifecycle only), so buildSessionOptions now supplies the built-in tools itself via customTools + noTools:'builtin': each createXToolDefinition(cwd) is wrapped to emit a label from its params, then delegates unchanged. The builders bake in the real config (withFileMutationQueue, truncation defaults), so behavior is preserved (confirmed in pi's edit.js). Observation is fail-safe (emit in try/catch). toolLabel + instrumentToolDefinition are pure/unit-tested (label mapping; wrap delegates same args + result; observer error can't break a tool call).

Caveat: the customTools/noTools runtime wiring isn't covered by tests (they stub createSession, bypassing buildSessionOptions), so it needs a real cook run to confirm the agent receives the instrumented tools and they emit live. check + pi-actions tests green.
Co-Authored-By: Claude <noreply@anthropic.com>
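The wrap-and-delegate pattern from this commit might look roughly like the sketch below. The names `toolLabel` and `instrumentToolDefinition` come from the commit message, but the signatures, the `ToolDefinition` shape, and the parameter keys (`path`, `command`, `pattern`) are assumptions for illustration; pi's real tool definitions are richer.

```typescript
// Hypothetical minimal tool shape; pi's actual tool definition type differs.
interface ToolDefinition<P, R> {
  name: string;
  execute(params: P): Promise<R>;
}

// Builds a short label such as "edit src/auth/token.ts" from a tool call.
function toolLabel(name: string, params: Record<string, unknown>): string {
  const target = params["path"] ?? params["command"] ?? params["pattern"];
  return typeof target === "string" ? `${name} ${target}` : name;
}

// Wraps a tool so each call is reported to an observer, then delegates unchanged.
function instrumentToolDefinition<P extends Record<string, unknown>, R>(
  tool: ToolDefinition<P, R>,
  observe: (label: string) => void,
): ToolDefinition<P, R> {
  return {
    ...tool,
    execute(params: P): Promise<R> {
      try {
        observe(toolLabel(tool.name, params));
      } catch {
        // Observation is fail-safe: an observer error must never break a tool call.
      }
      return tool.execute(params);
    },
  };
}
```

Because the wrapper passes the same params through and returns the inner result untouched, the tool's behavior should be identical with or without an observer attached.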

Brownfield cook provisioned every slice's git worktree eagerly in wireHandlers: N `git worktree add` + N recursive node_modules CoW copies paid synchronously at startup before any slice fired.
- Move slice-worktree creation into resolveSliceCwd via idempotent ensureSliceWorktree, so a slice's worktree is materialized on first fire. A run touching 2 of 8 slices pays for 2 worktrees, not 8. Synchronous provisioning serializes concurrent fires on the JS thread, so parallel-policy worktree adds never overlap.
- Symlink each slice's node_modules to the parent worktree's single copy instead of CoW-copying per slice (SHAREABLE_TOP_LEVEL_ENTRIES). walkFiles already skips symlinks, so the shared tree is never re-walked during dep seeding, merge, or promotion. Other gitignored dirs still copy per slice.

Correctness-neutral: same worktrees/branches, just lazy; deps resolve through the symlink. npm run verify green; adds symlink + idempotency unit tests.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Cursor <cursoragent@cursor.com>
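The lazy, idempotent provisioning and the shared node_modules symlink from the worktree commit above can be sketched as follows. The names `ensureSliceWorktree` and `SHAREABLE_TOP_LEVEL_ENTRIES` appear in the commit message, but the signature and body here are assumptions; in particular, `addWorktree` is a hypothetical stand-in for `git worktree add`, injected so the sketch runs without a repository.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed contents; the commit only confirms that node_modules is shared.
const SHAREABLE_TOP_LEVEL_ENTRIES = ["node_modules"];

function ensureSliceWorktree(
  parentDir: string,
  sliceDir: string,
  addWorktree: (dir: string) => void, // stand-in for `git worktree add`
): string {
  // Materialize the worktree on first fire only; repeat fires are no-ops.
  if (!fs.existsSync(sliceDir)) addWorktree(sliceDir);

  for (const entry of SHAREABLE_TOP_LEVEL_ENTRIES) {
    const source = path.join(parentDir, entry);
    const link = path.join(sliceDir, entry);
    // lstat rather than exists, so an existing or dangling symlink counts as present.
    const linked = fs.lstatSync(link, { throwIfNoEntry: false }) !== undefined;
    if (fs.existsSync(source) && !linked) fs.symlinkSync(source, link, "dir");
  }
  return sliceDir;
}

// Usage: two fires of the same slice provision it once and share node_modules.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "cook-"));
fs.mkdirSync(path.join(root, "node_modules"));
let adds = 0;
const fakeAdd = (dir: string): void => {
  adds += 1;
  fs.mkdirSync(dir, { recursive: true });
};
const sliceDir = path.join(root, "slices", "slice-a");
ensureSliceWorktree(root, sliceDir, fakeAdd);
ensureSliceWorktree(root, sliceDir, fakeAdd);
const isLink = fs.lstatSync(path.join(sliceDir, "node_modules")).isSymbolicLink();
console.log(adds, isLink);
```

Because the function is synchronous, two transitions firing for the same slice cannot interleave inside it, which matches the serialization argument in the commit message.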

Replaces the coarse single-phase view with a live grid that reflects cook's actual shape. This is the highest-value TUI improvement from the review, and it kills the brittle string-matching.
- events: run-shape (seeds the grid from the plan, all slices queued) + slice (typed status running|passed|failed + step), emitted from cook-cli + the pi-actions handlers (write-tests/code/evaluate-done), not string-matched logs.
- run-store: a slices grid grouped by epic; slice-keyed activity heartbeat (aligned via runPi activityId = slice id) attaches to the slice's row, so "what the agent is doing" shows inline; non-slice waits stay in the pending footer.
- ink: SliceGrid renders epic groups with per-slice status icons + the running slice's step/detail + spinner; replaces the flat pending list for slices.

Retry counts deferred (a re-running slice just shows running again; latest wins). Live wiring (run-shape/slice from a real cook) is manual-verify, like the heartbeat. check + presenter/pi-actions/cook tests green (126).
Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Cursor <cursoragent@cursor.com>
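The run-shape and slice events from the grid commit above lend themselves to a small reducer. The sketch below is illustrative only: the event payloads, `SliceRow`, and `reduceGrid` are assumed shapes, not the run-store's actual types.

```typescript
type SliceStatus = "queued" | "running" | "passed" | "failed";

// Hypothetical event payloads modeled on the commit description.
type RunEvent =
  | { kind: "run-shape"; epics: { id: string; sliceIds: string[] }[] }
  | { kind: "slice"; sliceId: string; status: "running" | "passed" | "failed"; step?: string };

interface SliceRow {
  id: string;
  epicId: string;
  status: SliceStatus;
  step?: string;
}

function reduceGrid(rows: SliceRow[], event: RunEvent): SliceRow[] {
  if (event.kind === "run-shape") {
    // Seed the grid from the plan: every slice starts queued under its epic.
    const seeded: SliceRow[] = [];
    for (const epic of event.epics) {
      for (const id of epic.sliceIds) {
        seeded.push({ id, epicId: epic.id, status: "queued" });
      }
    }
    return seeded;
  }
  // Typed slice event: the latest status wins, so a re-run just shows running again.
  const { sliceId, status, step } = event;
  return rows.map((row) => (row.id === sliceId ? { ...row, status, step } : row));
}
```

Rendering then becomes a matter of grouping rows by `epicId`, with no log-line parsing involved.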

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-06-16T23:57:21Z

-  const resolveSliceCwd = (slice: Slice): string =>
-    sliceLayout === 'shared'
-      ? input.sandboxDir
-      : seedSliceSandboxFromDeps(input.sandboxDir, plan, slice, { preserveExisting: true });


Grid stale during run-tests

Medium Severity

After write-code, the slice grid keeps the code step while the net’s deferred run-tests transition runs verification. Slice progress events were added only in pi-actions, not where mechanical runVerification runs, so the TUI misstates what the slice is doing until evaluate-done fires.

Additional Locations (1)

src/orchestrator/src/pi-actions.ts#L479-L502


cursor · 2026-06-16T23:57:22Z

+          // clear the live heartbeat once the slice stops running
+          ...(running ? {} : { detail: undefined }),
+        }),
+      });


Passed slices keep step text

Low Severity

When a slice moves to passed or failed, RunStore clears detail but leaves step set. The Ink grid still appends the old sub-action (e.g. verify) next to a checkmark, implying work is in progress after the slice finished.
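One possible shape for a fix, sketched under assumptions: when a slice leaves the running state, drop both the sub-action and the heartbeat text so the grid shows only the final icon. `SliceRow` and `applySliceStatus` are hypothetical names, not the RunStore's actual API.

```typescript
type SliceStatus = "queued" | "running" | "passed" | "failed";

// Hypothetical row shape; the real RunStore row may carry more fields.
interface SliceRow {
  id: string;
  status: SliceStatus;
  step?: string; // current sub-action, e.g. "verify"
  detail?: string; // live heartbeat text
}

function applySliceStatus(row: SliceRow, status: SliceStatus, step?: string): SliceRow {
  if (status === "running") {
    // Keep the previous step if the event does not name a new one.
    return { ...row, status, step: step ?? row.step };
  }
  // Once the slice stops running, clear step and detail together.
  return { id: row.id, status };
}
```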


kostandinang mentioned this pull request Jun 16, 2026

FE-879: Lazy per-slice cook worktrees and shared node_modules for brownfield #223

Open

kostandinang changed the title ~~FE-864: raise pi action timeout to 600s~~ FE-864: Raise pi-action timeout to 600s Jun 16, 2026

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 90bb5ef to 40a9d88 Compare June 16, 2026 18:05

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 5b145b2 to 7d39fe7 Compare June 16, 2026 18:05

kostandinang changed the title ~~FE-864: Raise pi-action timeout to 600s~~ FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec Jun 16, 2026

cursor Bot reviewed Jun 16, 2026


Comment thread src/orchestrator/src/epic-sandbox-merge.ts

Comment thread src/orchestrator/src/epic-sandbox-merge.ts

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from a5bfc10 to ac4e47c Compare June 16, 2026 23:45

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 40a9d88 to 05b471a Compare June 16, 2026 23:45

cursor Bot reviewed Jun 16, 2026


Comment thread src/orchestrator/src/pi-actions.ts Outdated

kostandinang and others added 10 commits June 17, 2026 00:52

FE-879: update default Anthropic model

6850c4e

FE-864: use Opus 4.6 for orchestration defaults

230af64

Co-authored-by: Cursor <cursoragent@cursor.com>

FE-864: fail slice rows when writer pi aborts

3e02bfb

Co-authored-by: Cursor <cursoragent@cursor.com>

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 05b471a to 89e7850 Compare June 16, 2026 23:55

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from ac4e47c to 3e02bfb Compare June 16, 2026 23:55

cursor Bot reviewed Jun 16, 2026

kostandinang wants to merge 10 commits into ka/fe-878-brunch-serve from ka/fe-864-pi-timeout-600s
