Skip to content

FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224

Open
kostandinang wants to merge 50 commits into
ka/fe-878-brunch-servefrom
ka/fe-864-pi-timeout-600s
Open

FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8)#224
kostandinang wants to merge 50 commits into
ka/fe-878-brunch-servefrom
ka/fe-864-pi-timeout-600s

Conversation

@kostandinang

@kostandinang kostandinang commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Stack Context

Stacks on FE-878 (#222). FE-883 — operational improvements under the FE-864 brownfield-orchestration umbrella (planning PR #212). No longer the single timeout tweak it started as — this branch collects the operational improvements that make brunch serve usable on real brownfield runs.

What?

  • Cook agent action timeout → 600s — long real-repo actions no longer abort prematurely.
  • Orchestration on Opus 4.8 — interviewer, cook agents, and the architect default (DEFAULT_ARCHITECT_MODEL_ID) all run claude-opus-4-8; the interviewer uses adaptive thinking + effort, and a fix stops 4.8 fragmenting ask_question into failed tool parts.
  • Slice grid fidelity — fail rows when the writer pi aborts (including verify-epic), keep the grid current during verification, surface the failure reason + a pinned halt summary, show per-slice attempt count, and a "not-started" evaluate gate so a greenfield slice isn't painted ✗ before its tests exist.
  • serve --land — merge the promoted cook branch into the active branch as the final serve step; greenfield runs (no repo branch to land onto) warn instead of silently ignoring the flag.
  • Run-artifact composer — a git-merge-based run artifact (run-artifact.ts / run-refs.ts) composes the completed run.
  • Epic-merge dep resolution — the merged __epic__ tree re-links the parent's symlinked node_modules so verify-epic and probes have resolvable deps.
  • Dev-server quiet — stop the dev server reloading on .brunch cook worktrees.
  • Presentation polish — recipe banner lexicon, exact out-dir reporting, verbose warnings.

Why?

The spec-23 run exposed serve reliability as a broad issue, not a single timeout: actions timed out on real repos, the architect's model default produced weaker fallback plans that amplified slice counts, and slice failures — especially writer and epic-verify aborts — vanished from the grid with no reason. This branch is the FE-864 umbrella's implementation seam for those concrete operational fixes.

Tests

Per-seam unit coverage for the changes above: serve --land arg mapping + greenfield warning, verify-epic failed-row emission, the epic-merge node_modules re-link, the architect default model id, the run-artifact composer, and the slice-grid presenter.

@cursor

cursor Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches brownfield git promotion/landing, lazy worktree provisioning, and merge-based artifact folding—real-repo edge cases (conflicts, dirty tree, parallel slices) matter; presentation and timeout changes are lower risk but affect all cook runs.

Overview
Orchestration & models: Cook agents, architect default, and docs now target Opus (claude-opus-4-8 for writers/architect; interviewer default updated). Pi sessions use a 600s idle deadline (re-armed on any session/tool activity), instrumented built-in tools for live “what the agent is doing” heartbeats, and failed slice rows when write-tests, write-code, or verify-epic throws.

Git & promotion: Run/slice branches move from cook/* / cook-slice/* to centralized brunch/run/<id> and brunch/slice/<id>/<slice> via run-refs.ts. Brownfield promotion messaging follows the new refs; landCookBranch merges the promoted run branch into the active branch (refuses dirty/detached HEAD, aborts on conflict). serve --land sets landBranch on cook; greenfield warns that --land is ignored.

Run artifact (new): run-artifact.ts harvests completed runs by committing slice worktrees and folding them with git merge-tree in dependency order (fail-closed on conflicts; optional squash).

Sandboxes: Slice git worktrees are lazy (ensureSliceWorktree at fire time, not eager in wireHandlers). node_modules is symlinked from the parent per slice and re-linked after epic merge so verify/probes resolve deps. pruneWorktrees before worktree add; Vite dev server ignores .brunch/**.

Verification & grid: Test runner adds absent (“No test files found”) so the greenfield evaluate gate does not mark slices failed or burn attempts; strips [agent-tail] noise. Engine and pi-actions emit run-shape, slice, and cook-done events; Ink TUI shows epic/slice grid, halt summary, attempts, and scrollback wordmark; brigade phases fix verify vs epic timing.

CLI robustness: Cook rejects unknown flags; Petrinaut/plan/sandbox errors throw instead of process.exit so the Ink presenter can unmount. Test-writer prompt requires meaningful reds without implementing the slice.

Reviewed by Cursor Bugbot for commit 7a7e1e5. Bugbot is set up for automated code reviews on this repo. Configure here.

@kostandinang kostandinang changed the title FE-864: raise pi action timeout to 600s FE-864: Raise pi-action timeout to 600s Jun 16, 2026
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 90bb5ef to 40a9d88 Compare June 16, 2026 18:05
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 5b145b2 to 7d39fe7 Compare June 16, 2026 18:05
@kostandinang kostandinang changed the title FE-864: Raise pi-action timeout to 600s FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec Jun 16, 2026
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from a5bfc10 to ac4e47c Compare June 16, 2026 23:45
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 40a9d88 to 05b471a Compare June 16, 2026 23:45
Comment thread src/orchestrator/src/pi-actions.ts Outdated
@kostandinang kostandinang force-pushed the ka/fe-878-brunch-serve branch from 05b471a to 89e7850 Compare June 16, 2026 23:55
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from ac4e47c to 3e02bfb Compare June 16, 2026 23:55
Comment thread src/orchestrator/src/net-compiler.ts
Comment thread src/orchestrator/src/presenter/run-store.ts
@kostandinang kostandinang changed the title FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec FE-864: Orchestrator improvements Jun 17, 2026
kostandinang and others added 6 commits June 17, 2026 09:36
Each cook agent action (write-tests, write-code, verify-epic) runs under a
per-action wall-clock budget enforced in pi-actions.ts. Raise the default
from 300s to 600s so Sonnet agents have headroom on larger slices and on
brownfield repos where setup/discovery eats into the turn.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…eam, clean failures

Iterating on the live TUI from real-terminal feedback:

- One global run timer in the footer instead of a per-item clock on every
  pending row (and whole-second, no jittery decimals).
- "brunch" wordmark is now a big lowercase figlet (Slant) in a warm orange
  gradient, replacing the egg.
- Activity log + wordmark stream through Ink <Static> so the full run lands in
  scrollback instead of collapsing in a redrawn bounded box; line cap removed.
- Brigade tracker no longer lights "taste" mid-cook — per-slice verify actions
  fire during cooking, so taste stays unlit until a real end-of-cook signal.
- Failures throw instead of process.exit, so withCookBus disposes (unmounts
  Ink) before the error prints — no more frozen "prep ◐" hang. cook validates
  args before mounting the TUI and rejects unknown flags (e.g. --spec-id).

check + presenter/cook/pi-actions tests green; full build deferred (active
graphite stack navigation).

Co-Authored-By: Claude <noreply@anthropic.com>
Wires the two remaining kitchen-brigade phases faithfully to the
orchestrator-arcs mapping (verify→taste, ship→serve):

- taste lights on the epic-verification verdict (action `epic <id> → …`), not
  on per-slice `verify <target>` lines — those fire mid-cook and previously lit
  taste while still cooking.
- serve lights on a new `cook-done` event emitted at the end of runCook (after
  promotion); a halted run never ships, so it never lights serve.

phase.test covers both signals + the full prep→cook→taste→plate→serve walk;
check + presenter/cook tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Each cook pi session was a black box in the pending panel — just a KB count.
runPi already subscribes to the session's text stream; instead of bytes,
surface the agent's latest non-empty line (tail-truncated, throttled every
2 KB) as the activity-progress detail, so a wait reads as live work ("agent
writing tests · …adds the RefreshToken guard") rather than "still going".

Kept headless createAgentSession — no pi InteractiveMode, no new pi API: pi's
tool-call events come via an extension hook (on('tool_call')), not the
subscribe stream, so a richer "editing <file> / running <tool>" heartbeat is a
separate follow-up that needs the extension-registration path verified.

check + pi-actions/presenter tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Richer "what the agent is doing" in the pending panel (the spike's Option A,
full tier): instead of only the agent's latest line, show the tool calls —
"edit src/auth/token.ts", "bash bun test", "grep RefreshToken".

pi exposes no tool-call hook on session.subscribe (text/lifecycle only), so
buildSessionOptions now supplies the built-in tools itself via customTools +
noTools:'builtin': each createXToolDefinition(cwd) is wrapped to emit a label
from its params, then delegates unchanged. The builders bake in the real
config (withFileMutationQueue, truncation defaults), so behavior is preserved
— confirmed in pi's edit.js. Observation is fail-safe (emit in try/catch).

toolLabel + instrumentToolDefinition are pure/unit-tested (label mapping;
wrap delegates same args + result; observer error can't break a tool call).

Caveat: the customTools/noTools runtime wiring isn't covered by tests (they
stub createSession, bypassing buildSessionOptions) — needs a real cook run to
confirm the agent receives the instrumented tools and they emit live.

check + pi-actions tests green.

Co-Authored-By: Claude <noreply@anthropic.com>
Brownfield cook provisioned every slice's git worktree eagerly in
wireHandlers — N `git worktree add` + N recursive node_modules CoW copies
paid synchronously at startup before any slice fired.

- Move slice-worktree creation into resolveSliceCwd via idempotent
  ensureSliceWorktree, so a slice's worktree is materialized on first fire.
  A run touching 2 of 8 slices pays for 2 worktrees, not 8. Synchronous
  provisioning serializes concurrent fires on the JS thread, so parallel-policy
  worktree adds never overlap.
- Symlink each slice's node_modules to the parent worktree's single copy
  instead of CoW-copying per slice (SHAREABLE_TOP_LEVEL_ENTRIES). walkFiles
  already skips symlinks, so the shared tree is never re-walked during dep
  seeding, merge, or promotion. Other gitignored dirs still copy per slice.

Correctness-neutral: same worktrees/branches, just lazy; deps resolve through
the symlink. npm run verify green; adds symlink + idempotency unit tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@kostandinang kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 363cba6 to 91a281d Compare June 17, 2026 11:59
Comment thread src/orchestrator/src/cook-cli.ts
Comment thread src/orchestrator/src/net-compiler.ts
kostandinang and others added 3 commits June 17, 2026 13:04
Opus 4.8 rejects thinking.type=enabled ("Use thinking.type.adaptive and
output_config.effort"). Swap the interviewer's enabled/budgetTokens block
for { type: 'adaptive' } + effort: 'medium'. Only interview.ts sets explicit
thinking provider options; secondary-chat and plan-architect don't.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The grid made the happy path legible; this does the unhappy path.

- events: slice + cook-done gain `reason?`.
- pi-actions: failed-slice emits carry a reason — evaluate-done maps failureKind
  ('infra error' / 'tests failed'); write-tests/write-code catches use
  'test authoring failed' / 'code authoring failed'.
- cook-cli: cook-done carries result.reason on halt.
- run-store: slice rows store `reason`; RunState.haltReason set from
  cook-done(ok:false), unset on completion.
- ink: a failed slice row shows its reason (red); a HaltSummary footer pins
  "✗ halted · <reason>" (truncated) when the run halts.

Retry-attempt counts ("3/3") DEFERRED — retryCount lives on the petri-net
tokens, not the action context; that's a separate slice.

Verified in an isolated worktree off the TUI branch: check clean; presenter +
pi-actions + cook tests green (141). The one full-suite failure is
build-boundary.test (client production-chunk manifest) — unrelated to this
server-only change; a worktree/symlinked-node_modules artifact, re-verify on a
normal checkout.

Co-Authored-By: Claude <noreply@anthropic.com>
Completes the retry half of failure legibility — without net-token plumbing.
A slice loops failed→running on retry, so the attempt count is derivable from
the slice-event stream the store already folds: queued→running is attempt 1, a
step change mid-run keeps it, failed→running bumps it. The grid shows "attempt N"
only for N≥2 (no clutter on the first run).

No engine/net changes: retryCount on the petri tokens isn't needed — the
observable slice transitions carry the same information.

check + presenter tests green (46).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread src/orchestrator/src/app-probe.test.ts
The attempt count now reads "attempt 2/4" — n over the total attempts allowed
(retry budget + 1). cook-cli emits maxRetries on run-shape; the store derives
maxAttempts = maxRetries + 1; the grid shows it only once a slice has retried
(n ≥ 2), and falls back to a bare "attempt n" when the budget is unknown.

Denominator confirmed against the net: retryCount halts at `>= maxRetries`
(starts 0), so a slice gets maxRetries + 1 total attempts.

check + presenter tests green (49).

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread src/orchestrator/src/cow-copy.ts
Opus 4.8 otherwise splits a single `ask_question` into several parallel
partial calls (one option each, or leaked tool-call XML), which land as
`output-error` parts.

- interview.ts: set `disableParallelToolUse` so the model emits one tool
  call per step.
- app.ts: `stripFailedToolParts` drops echoed `output-error` tool parts
  before `validateUIMessages`. They carry `rawInput` but no `input`, so the
  output-error schema rejects them and bricks the whole follow-up turn; the
  retried call is what matters, the dead attempt carries no history value.
- app.test.ts: covers a follow-up turn whose history contains a failed
  attempt followed by a successful retry.

verify green (check + test + build).

🍳 Built with brunch
Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Comment thread src/orchestrator/src/cook-cli.ts
Comment thread src/orchestrator/src/pi-actions.ts Outdated
Comment thread src/orchestrator/src/project-detect.test.ts
Evaluate gate (phantom failure fix):
- Add `absent` TestFailureKind for vitest "No test files found" (nothing built
  yet), distinct from a genuine `test` red ("Cannot find module" stays `test`).
- runVerification aggregate precedence infra > test > absent.
- evaluate-done no longer emits `failed` for the unbuilt greenfield gate — the
  slice stays running and flows to write-tests as the SAME attempt, killing the
  phantom n/max + ✗ NEEDS WORK on every clean slice.
- Strip `[agent-tail]` harness noise from captured runner output.

Promotion ref namespace + composer:
- run-refs.ts: centralize the run git-ref namespace (brunch/run/<id>,
  brunch/slice/<id>/<sid>), replacing duplicated cook/ + cook-slice/ strings;
  prune stale worktree registrations before `git worktree add`. On-disk
  .brunch/cook/runs/ path unchanged (FE-743 compat).
- run-artifact.ts: fold per-slice branches into the run branch via
  `git merge-tree` plumbing in dependency order — real conflicts surface
  fail-closed, one merge node per slice, `squash` granularity retained. Pure
  plumbing: no working tree / index / active branch touched. Not yet wired into
  cook-cli (needs a live-run check of the dependency-seed interaction).

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
Comment thread src/orchestrator/src/epic-sandbox-merge.ts
kostandinang and others added 2 commits June 17, 2026 17:29
Cook runs write per-slice worktrees under .brunch/cook/runs/<id>/worktree/,
which Vite's default file watcher picks up — every slice edit triggered a
page reload and tsconfig-change full reload in the host dev server.

- vite-client.ts: server.watch.ignored = ['**/.brunch/**']. Merged with
  Vite's built-in ignores (.git, node_modules, cacheDir), so run artifacts
  no longer drive HMR.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
project-detect.test.ts had the "resolves the runner from workspace
packages in a monorepo" describe block verbatim twice — remove the copy.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
Comment thread src/orchestrator/src/pi-actions.ts
Comment thread src/orchestrator/src/project-profile.test.ts
kostandinang and others added 6 commits June 17, 2026 17:41
Greenfield serves accepted --land but never landed: landing only runs in
the brownfield (codebase) promotion path. Greenfield has no repo branch to
land onto, so surface the no-op instead of silently swallowing the flag.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
write-tests and write-code emit a failed slice event before rethrowing on
runPi failure; verify-epic did not, so epic-verification failures left the
slice grid without a failed row or reason. Mirror the wrapper on the epic's
representative slice.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
mergeSlicesIntoEpicSandbox builds __epic__ trees via walkFiles, which skips
symlinks, so the parent's symlinked node_modules never reached the merged
cwd. verify-epic and probes then ran without resolvable deps even though
per-slice verification passed. Re-link the shareable entries after the merge.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
The `withTestDir relocates test targets…` describe block was copied verbatim
a second time. Remove the duplicate.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Cook actions (write-tests/write-code/verify-epic) and the interview already
run on claude-opus-4-8; the architect default lagged at claude-opus-4-6.
Lift it to 4-8 so plan, cook, and interview paths share the latest Opus.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
… detail

Three serve/plan presentation fixes the dogfood run surfaced:

- Recipe banner: the header said "brunch plan" / "✓ plan" while the phase
  tracker right below it reads "recipe". Rename both to "recipe" so the
  planning surface speaks one brigade term (the CLI command stays `plan`;
  this is the phase label).
- out folder: plan-start showed the launch root (e.g. the repo dir). Emit
  the spec plan dir (.brunch/cook/specs/<id>) — the exact place plan.yaml
  lands.
- Verbose warnings: --verbose now appends a one-sentence plain-English
  account after each terse warning code (explainEmitterWarning /
  explainReconciliationWarning), so a reviewer needn't know the code
  vocabulary. Default verbosity keeps the terse line.

Also fix a pre-existing typecheck break in run-artifact.test.ts (mode
'codebase' → 'greenfield'; PlanMode is greenfield|brownfield) that was
reddening the gate.

🍳 Built with brunch

Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>
@kostandinang kostandinang changed the title FE-864: Orchestrator improvements FE-864: Orchestrator operational improvements (serve reliability + Opus 4.8) Jun 17, 2026
@kostandinang kostandinang changed the title FE-864: Orchestrator operational improvements (serve reliability + Opus 4.8) FE-864: Orchestrator operational improvements Jun 17, 2026
@kostandinang kostandinang changed the title FE-864: Orchestrator operational improvements FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8) Jun 17, 2026
kostandinang and others added 2 commits June 17, 2026 23:08
The cook dogfood run halted after the graph-canvas test-author hit the
600s PI_TIMEOUT_MS. It wasn't hung — it was continuously writing files,
reasoning, and re-running vitest right up to the wall (the heaviest slice
in the plan, the React Flow canvas composing every upstream graph slice).
A single fatal pi timeout there halted the whole run; nothing was promoted.

Reframe PI_TIMEOUT_MS as an idle deadline rather than a wall-clock cap.
The timer is re-armed (armIdleTimer) on every session event — text, tool
call, tool-execution progress, thinking — so the budget bounds dead air,
not total work. A genuinely hung session (no events for the budget) still
aborts and rejects exactly as before; a heavy-but-progressing slice no
longer trips it.

Covered by a new test that emits non-text activity every 15ms past a 40ms
budget and asserts the session survives (a wall-clock cap would abort it);
the existing hang/setup-timeout tests still confirm true dead air trips it.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
Formatter-only churn surfaced by npm run fix (oxfmt) — long call/import
lines wrapped to print width. No logic changes.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>
},
];
}
input.emit?.({ kind: 'slice', id: sliceId, epicId, status: 'failed' });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verify failures omit slice reason

Low Severity

Net-level verification emits slice failed events without a reason, while evaluate-done and writer abort paths set reasons such as tests failed or infra error. Failed rows from the test-runner transition therefore show no failure reason in the slice grid despite the presenter supporting reason.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 3f6ca8d. Configure here.

…acle

The cook dogfood showed every test-writer agent running the same dance:
write tests → confirm red → scaffold a reference implementation → confirm
green → delete it. That duplicates the code-writer that runs right after,
inside the most expensive phase (~120-250s/slice authoring dominates the
critical path), yet the prompt never asked for it — it's emergent from the
heavy 'your tests are the sole oracle' pressure, and the agents satisfied
'write tests only' by deleting the throwaway impl afterward.

Name the pattern and ban it: green-confidence comes from the documented
contract and reading the surrounding code, not from implementing the slice
to watch it pass. Red-for-the-right-reason (run the tests, see absent
behavior) stays. verify-epic shares this prompt, so the epic-integration
authoring gate benefits too.

Prompt-level nudge against emergent behavior, not a hard guarantee — the
proof is the next run's authoring times.

🍳 Built with brunch

Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7a7e1e5. Configure here.

this.clock = createElapsedClock(now);
this.state = { command, phase: 'prep', lines: [], pending: [] };
this.state = { command, phase: 'prep', lines: [], pending: [], epics: [], slices: [], runStart: now() };
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ink footer timer ignores cook-start

Medium Severity

The Ink footer elapsed clock uses RunStore’s constructor runStart, while action log lines use ElapsedClock seeded from cook-start. On brunch serve, recipe/plan time is included in the footer but not in per-action timestamps, so two timers diverge in one screen.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7a7e1e5. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant