10 changes: 6 additions & 4 deletions memory/PLAN.md
@@ -54,7 +54,7 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen
**Full cook orchestrator — Arc 1 (feature delivery; stacks on FE-843, ships without the semantic stack):**

1. `agent-extension-host` — **(contract landed — FE-867)** the pi harness as a dual-mode (`elicit`/`execute`) extension host; cook capabilities register as `execute`-mode plugins. **Bases the Arc-1 linear stack** (2026-06-15 decision): the whole arc stacks on it, coordinated with the unpublished pi-harness thread (which owns the core). Logically it only gates the dispatch-seam frontier (`integration-oracle`), so serializing the seam-independent infra (2–5) behind it is a deliberate coupling of Arc 1 to that coordination, not a hard dependency. Sits over the FE-841 core.
2. `brunch-detect` — **(slice 1 landed — FE-871, `detectProfile`)** resolve a registry profile id from manifest/lockfile evidence at plan time; brownfield-only front of the chain. Slice 2 (wire `detected` into the emitter) remains. *(seam-independent)*
2. `brunch-detect` — **(done — FE-871)** resolve a registry profile id from manifest/lockfile evidence at plan time; brownfield-only front of the chain, now wired into the emitter (slice 2). *(seam-independent)*
3. `harness-dep-install` — **dependency-delta capture + install-failure classification** (the install *action* is agent-native via `bash` + FE-843 conventions; this owns lockfile capture for promotion + the fail/infra split).
4. `dogfood-spike` (ln-spike) — run the full chain on one real brunch feature before committing `integration-oracle`; surfaces the reachability mechanism, dep-install, orientation depth, and brownfield plan-shape cheaply.
5. `app-runtime-probe` — build + boot + exercise the host app; the concrete reachability mechanism `integration-oracle` depends on (without it, "reachable" collapses back to "a test that imports the module").
@@ -406,11 +406,11 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen
- **Name:** Brunch toolchain detection — read the project toolchain from the repo
- **Linear:** FE-871 · branch `ka/fe-871-brunch-detect` (stacked on FE-867)
- **Kind:** bounded feature
- **Status:** slice 1 done (FE-871) — `detectProfile` / `project-detect.ts`: the evidence-first manifest/lockfile id check (already the cheap "which lockfile is present" check, mapping only to registry ids; known-unsupported stacks return a loud `{detected:false}`). **Slice 2 remains:** wire `detected` into the `plan-emitter` selection chain (brownfield front) + a greenfield-protection test — `detected` is **not yet** wired into `plan-emitter.ts`. Stacked on `agent-extension-host`.
- **Objective:** Resolve a registry `ProfileId` at **plan time** from the repo's manifest/lockfile evidence — the cheap "which lockfile/manifest is present" check, mapping only to ids already in the FE-843 registry. It is **not** a language-detection engine: known-unsupported stacks (`pyproject.toml`, `go.mod`, …) return a loud `{detected:false}` reason, never a guessed profile. Brownfield-only front of the selection chain (`flag ≫ detected ≫ spec ≫ architect ≫ bun`); the resolved id is stamped into `plan.yaml` so `brunch cook` runs the same toolchain. Greenfield never detects (empty worktree). Resolves toolchain **identity** only — real file paths / existing wiring / `writes` reconciliation is cook-time agent grounding, out of scope here.
- **Status:** done (FE-871). Slice 1 — `detectProfile(repoDir)` / `project-detect.ts`: a pure, evidence-first detector mapping manifests/lockfiles to a registry `ProfileId` (bun lockfile → bun; deno config → deno; `package.json` vitest/jest/none → node-vitest/node-jest/node-test). One clear supported signal resolves; ambiguous evidence (both vitest **and** jest declared) and any repo with no JS/TS evidence return a loud `{detected:false, reason}` via one catch-all rather than silently defaulting to bun — the cheap "which lockfile is present" check, not a language-detection engine (no per-stack Python/Go branches; the catch-all message is already actionable). Slice 2 — `detected` is wired into the `plan-emitter` selection chain as the brownfield front (`flag ≫ detected (brownfield) ≫ spec ≫ architect-classified ≫ bun`) via `resolveEmittedProfile`; a loud detection failure throws rather than silently falling to bun (falling through to an explicit spec/architect choice first). Greenfield (or brownfield without a `repoDir`) keeps the unchanged FE-843 chain — the greenfield no-op. `repoDir` threads CLI launch cwd → `runPlan` → `emitPlanFromSnapshot`; an injectable `detect` seam keeps the emitter tests hermetic. Slice 3 — `detectTestDir(repoDir)` co-locates generated tests where the brownfield repo already keeps its own: detection picks the *runner* (profile), this picks the *path*. A profile's default test directory (`tests/{id}.test.ts`) can fall outside a host repo whose vitest `include` is narrowed (e.g. `src/**`), so the chosen path is unrunnable — vitest reports "No test files found" for an explicitly-named file (observed in a real brownfield cook). Rather than parse the runner's executable-TS config, it samples existing `*.test.*`/`*.spec.*` files (zero-dep bounded `fs` walk, skipping `node_modules`/build dirs) and returns the dominant directory; `withTestDir(toolchain, dir)` relocates the targets while preserving the filename convention. 
Brownfield-only; `null` (no existing tests) keeps the profile default; greenfield never relocates. Slice 4 — monorepo hardening: `detectTestDir` returns the dominant *full* directory (not just the top segment) so a package-rooted include glob still covers the path; `detectProfile` widens runner detection to declared workspace packages (npm/yarn `workspaces`, pnpm `pnpm-workspace.yaml`; literal + single-level `dir/*` globs) **only when the root declares no runner**, scoped to declared workspaces so a stray nested project (docs prototype, example app) can't poison detection — a root runner still wins without scanning, and workspaces collectively declaring both vitest+jest stays loudly ambiguous. Stacked on `agent-extension-host`.
- **Objective:** Resolve a registry `ProfileId` at **plan time** from the repo's manifest/lockfile evidence — the cheap "which lockfile/manifest is present" check, mapping only to ids already in the FE-843 registry. It is **not** a language-detection engine: anything without a single clear supported signal (ambiguous JS runners, or non-JS stacks like Python/Go) returns a loud `{detected:false}` reason via one actionable catch-all, never a guessed profile. Brownfield-only front of the selection chain (`flag ≫ detected ≫ spec ≫ architect ≫ bun`); the resolved id is stamped into `plan.yaml` so `brunch cook` runs the same toolchain. Greenfield never detects (empty worktree). Resolves toolchain **identity** only — real file paths / existing wiring / `writes` reconciliation is cook-time agent grounding, out of scope here.
- **Why now / unlocks:** The "no manual steps" goal requires reading the real toolchain rather than inferring from spec prose or a `--profile` flag — and it must happen at plan time, because the deterministic test runner reads the stamped `plan.profile` with **no agent in the loop** (`cook-cli.ts`, `pi-actions.ts`), so a wrong default runs the wrong test command with no diagnostic. The cook agent's `read`/`bash` cannot substitute. FE-843 built the registry but deferred detection; this closes that gap.
- **Acceptance:** (1) detection maps a real repo to a registry profile id from manifest/lockfile evidence *(slice 1, done)*; (2) brownfield cook/plan resolves toolchain via detection at the front of the FE-843 chain (`--profile` still overrides) *(slice 2)*; (3) greenfield resolution is unchanged (no detection input); (4) ambiguous/unknown repo fails with an actionable message, not a silent default *(slice 1, done)*; (5) the 3 reference fixtures + greenfield smoke score identically before/after.
- **Verification:** detector unit tests *(slice 1, done — per-stack fixtures + loud `{detected:false}`)*; slice 2: resolution-chain precedence tests (detect vs flag vs spec) + greenfield no-op / before-after-identical test.
- **Verification:** detector unit tests *(slice 1, done — per-stack fixtures + loud `{detected:false}`)*; slice 2: resolution-chain precedence tests (detect vs flag vs spec) + greenfield no-op / before-after-identical test; slice 3: `detectTestDir` clustering/skip/null tests + `withTestDir` relocation tests + emitter tests asserting brownfield targets follow the detected dir while greenfield keeps the profile default; slice 4: full-dir/monorepo `detectTestDir` tests + workspace runner-detection tests (npm/yarn/pnpm, root-wins, literal dir, cross-workspace ambiguity).
- **Depends on:** `toolchain-profile-expansion` (FE-843).
- **Traceability:** Requirements 46–50; refines I130-K; greenfield-protecting invariant (new — record in SPEC via ln-sync). **D160-K boundary:** detection is plan-time profile-*id* resolution (an input to authoring), not architect host-introspection — D160-K constrains the architect/authoring stage, not profile resolution, so `brunch-detect` needs no D160-K amendment.
- **Design docs:** `docs/design/orchestrator.md`.
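
The slice 3/4 behaviour described in the Status entry above (sample existing test files, return the dominant full directory, relocate targets while keeping the filename) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in `project-detect.ts`: the names `dominantTestDir` and `relocateTarget`, the skipped-directory list, and the choice to take a pre-collected list of relative paths instead of walking `fs` are all hypothetical.

```typescript
// Hypothetical sketch: names, signatures, and the skip list are assumptions,
// not the real detectTestDir / withTestDir API.

/** Matches file names such as foo.test.ts or bar.spec.tsx. */
const TEST_FILE = /\.(test|spec)\.[^./]+$/;

/** Directory names ignored while sampling (assumed list). */
const SKIPPED_DIRS = ['node_modules', 'dist', 'build'];

/** Returns the full directory holding the most test files, or null if none exist. */
function dominantTestDir(relativePaths: string[]): string | null {
  const counts: Record<string, number> = {};
  for (const path of relativePaths) {
    const segments = path.split('/');
    const file = segments.pop() || '';
    if (!TEST_FILE.test(file)) continue;
    if (segments.some((segment) => SKIPPED_DIRS.indexOf(segment) !== -1)) continue;
    // Keep the full directory, not just the top segment, so a
    // package-rooted include glob still covers the chosen path.
    const dir = segments.join('/') || '.';
    counts[dir] = (counts[dir] || 0) + 1;
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const dir of Object.keys(counts)) {
    if (counts[dir] > bestCount) {
      best = dir;
      bestCount = counts[dir];
    }
  }
  return best;
}

/** Moves a verification target into `dir`, preserving its file name. */
function relocateTarget(target: string, dir: string | null): string {
  if (dir === null) return target; // no existing tests: keep the profile default
  const file = target.slice(target.lastIndexOf('/') + 1);
  return dir === '.' ? file : `${dir}/${file}`;
}
```

Working on a plain path list keeps the sketch free of filesystem access; the plan's own version is described as a bounded `fs` walk, which this does not attempt to reproduce.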
@@ -436,6 +436,7 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen
- **Kind:** structural
- **Status:** not-started (drafted 2026-06-15) — Arc 1; the concrete mechanism behind `integration-oracle`'s reachability claim. **Scope/feasibility via `dogfood-spike` first.**
- **Objective:** Provide a harness that builds the host application, boots it, and exercises the cooked feature to confirm it is actually reachable in the running app — not merely unit-test-green. Mechanism beyond the test runner: app-boot + a runtime probe (dev-server boot + HTTP/CDP/Playwright-style check), toolchain-derived from the `ProjectProfile`. Mode-aware: brownfield boots the real host; greenfield boots the self-composed epic.
- **Agent-native action vs harness-owned verification:** the frontier's value is the **independent, deterministic assertion** the cook agent cannot shortcut or self-report — not the boot action (the agent already has `bash` and can start a dev server / curl it). FE-800's orphan problem is precisely that the agent's self-report can't be trusted, so what this frontier owns is a read-only probe result outside the agent's authorship (the same discipline that keeps `evaluate-done` read-only at `pi-actions.ts:70`). The **boot mechanics may lean on agent `bash`** (start dev server, hit an endpoint) rather than a bespoke per-stack boot engine; the deterministic, unshortcuttable *check* of the result is the part the harness must own.
- **Why now / unlocks:** `integration-oracle` asserts "feature reachable in the running app," but verification today only runs the test runner in the worktree. Without an app-boot probe, "reachable" degrades to "a test imports the module" and the orphan problem (FE-800) survives. This is the load-bearing reachability mechanism; `integration-oracle` depends on it. The hidden heavy lift inside Arc 1 — validate the mechanism with `dogfood-spike` before committing.
- **Acceptance:** (1) the probe builds + boots the host app from the worktree using the resolved toolchain; (2) it exercises the cooked feature and returns a structured reachable / not-reachable result; (3) the probe result is the evidence `integration-oracle` gates on; (4) brownfield boots the real host, greenfield boots the self-composed epic; (5) infra failure (build/boot broke) is distinguishable from feature-absent (not reachable).
- **Verification:** probe-harness integration test (seeded app + cooked feature → reachable); orphan-replay test (feature module present but unwired → not-reachable, replaying the `spatial_graph_layout` regression); toolchain-derived boot tests; infra-failure-vs-not-reachable split test.
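
Acceptance items (2), (3), and (5) above imply a structured probe result with three distinguishable outcomes. One possible shape is sketched below. The plan does not specify any field names, so `ProbeResult`, `OracleVerdict`, `gateOnProbe`, and every field are assumptions made for illustration only.

```typescript
// Hypothetical shape: the plan only requires that reachable, not-reachable,
// and infra failure be distinguishable. All names here are assumptions.
type ProbeResult =
  | { kind: 'reachable'; evidence: string }
  | { kind: 'not-reachable'; reason: string }
  | { kind: 'infra-failure'; stage: 'build' | 'boot'; reason: string };

type OracleVerdict = 'pass' | 'fail' | 'infra-error';

/** Maps a probe result to the verdict an oracle could gate on. */
function gateOnProbe(result: ProbeResult): OracleVerdict {
  switch (result.kind) {
    case 'reachable':
      return 'pass';
    case 'not-reachable':
      // The feature module may exist but is not wired into the running app.
      return 'fail';
    case 'infra-failure':
      // Build or boot broke, which says nothing about the feature itself.
      return 'infra-error';
  }
}
```

A discriminated union keeps the infra-failure case from being folded into not-reachable by accident, since the compiler checks that the `switch` handles every `kind`.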
@@ -451,6 +452,7 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen
- **Status:** not-started (drafted 2026-06-15) — Arc 1; promotes the FE-800 integration-blind follow-on to a frontier.
- **Objective:** Make a cooked feature real and reachable in the host, not orphaned. Three parts: (a) the architect emits a **generic integration/wiring slice** ("wire feature into host") rather than only FE-829's per-epic integration-*test* seam; (b) **cook-time grounding** — the cook agent resolves the real wiring by reading the worktree (no host introspection at plan time, D160-K intact); (c) an **integration oracle** in the FE-738 semantic lane asserts product reachability **via `app-runtime-probe`** (build + boot + exercise the host app — not merely test-runner-green) — brownfield: feature exists/reachable in the running app; greenfield: the epic self-composes (the `__epic__` merge + integration test). Reachability definition forks on `plan.mode`.
- **Why now / unlocks:** The first brownfield cook produced orphan modules that passed criteria without existing in the running app (FE-800 follow-on, 2026-06-04). Reachability is the external reality check that turns "executes a plan" into "ships a feature." Builds on harness fidelity (FE-813 — the harness actually runs the targets) and FE-829 integration seams.
- **Agent-native action vs harness-owned verification:** the wiring *action* (part b) is agent-native — the cook agent reads the worktree and edits the wiring itself; the frontier does **not** build a wiring engine. What it owns is part (c): an **oracle the agent cannot author or shortcut**, asserting product reachability via `app-runtime-probe`'s independent result. The orphan problem is unsolvable by self-report, so the oracle's value is its independence (same read-only discipline as `evaluate-done`, `pi-actions.ts:70`), not the doing.
- **Cook-time grounding decision (settled 2026-06-15):** planning stays host-blind; the cook agent grounds against the real repo. This **softens FE-829 slice-4A `writes` single-writer ownership to *advisory in brownfield only*** (agent reconciles paths against the real layout); greenfield keeps `writes` authoritative (parallel race-safety + eval gate depend on it). Needs a **D160-K amendment + a new grounding decision** recorded in SPEC via ln-sync.
- **Acceptance:** (1) architect emits a generic wiring slice for feature epics; (2) cook agent resolves real wiring by reading the worktree; (3) integration oracle gates completion on product reachability, mode-forked (brownfield reachable-in-app / greenfield self-compose); (4) the brownfield orphan-module regression (`spatial_graph_layout`) is caught; (5) greenfield behavior unchanged — 3 reference fixtures + greenfield smoke score identically; (6) `writes` advisory in brownfield, authoritative in greenfield (contract forks on `plan.mode`); (7) the wiring agent is an `execute`-mode plugin on `agent-extension-host`, not a bespoke `pi` call.
- **Verification:** brownfield smoke asserting reachability (feature present in running app), replaying the orphan regression; greenfield self-compose oracle tests; mode-fork contract tests on `writes`/`checkPlan`; semantic-lane oracle adapter tests.
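
Acceptance item (6) above forks the `writes` contract on `plan.mode`: advisory in brownfield, authoritative in greenfield. A minimal sketch of that fork follows. Only the advisory/authoritative split comes from the plan; `checkWrite`, `WriteCheck`, and the warning text are hypothetical.

```typescript
// Hypothetical names throughout; only the mode fork itself is from the plan.
type PlanMode = 'greenfield' | 'brownfield';

interface WriteCheck {
  ok: boolean;
  warning?: string;
}

/** Checks a touched path against a slice's declared `writes`, forking on plan mode. */
function checkWrite(mode: PlanMode, declaredWrites: string[], touchedPath: string): WriteCheck {
  if (declaredWrites.indexOf(touchedPath) !== -1) return { ok: true };
  if (mode === 'brownfield') {
    // Advisory: the cook agent may reconcile paths against the real layout.
    return { ok: true, warning: `undeclared write: ${touchedPath}` };
  }
  // Authoritative: greenfield race-safety relies on single-writer ownership.
  return { ok: false };
}
```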
102 changes: 102 additions & 0 deletions src/orchestrator/src/plan-emitter.test.ts
@@ -12,6 +12,7 @@ import { emitPlanFromSnapshot, emitterWarningCategory, formatEmitterWarning } fr
import { evaluatePlanShape } from './plan-eval.js';
import { loadPlan } from './plan-loader.js';
import type { CompletedSpecSnapshot } from './plan-projection.js';
import type { ProfileDetection } from './project-detect.js';

const snapshot: CompletedSpecSnapshot = {
  requirements: [
@@ -305,6 +306,107 @@ describe('emitPlanFromSnapshot', () => {
    expect(result.plan.profile).toBe('node-vitest');
  });

  it('brownfield detection resolves the profile and beats the spec profile', async () => {
    const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' });
    const result = await emitPlanFromSnapshot(
      { ...snapshot, mode: 'brownfield', profile: 'brunch' },
      { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect },
    );
    expect(result.plan.profile).toBe('node-vitest');
  });

  it('brownfield co-locates generated tests in the repo’s own test directory', async () => {
    // node-vitest defaults to tests/{id}.test.ts, but a repo whose vitest
    // include is narrowed to src/** can’t run that path. detectTestDir reports
    // where the repo already keeps tests; the emitted targets follow it.
    const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' });
    const result = await emitPlanFromSnapshot(
      { ...snapshot, mode: 'brownfield' },
      {
        runModel: draftModel(coveringDraft()),
        repoDir: '/repo',
        detect,
        detectTestDir: () => 'src',
      },
    );
    expect(result.plan.profile).toBe('node-vitest');
    for (const slice of result.plan.slices) {
      expect(slice.verification).toEqual([{ kind: 'unit-test', target: `src/${slice.id}.test.ts` }]);
    }
  });

  it('brownfield keeps the profile default when the repo has no tests to learn from', async () => {
    const detect = (): ProfileDetection => ({ detected: true, profile: 'node-vitest', evidence: 'stub' });
    const result = await emitPlanFromSnapshot(
      { ...snapshot, mode: 'brownfield' },
      {
        runModel: draftModel(coveringDraft()),
        repoDir: '/repo',
        detect,
        detectTestDir: () => null,
      },
    );
    for (const slice of result.plan.slices) {
      expect(slice.verification).toEqual([{ kind: 'unit-test', target: `tests/${slice.id}.test.ts` }]);
    }
  });

  it('greenfield never relocates tests even with a repoDir (probes invariant)', async () => {
    const result = await emitPlanFromSnapshot(snapshot, {
      runModel: draftModel(coveringDraft()),
      profile: 'node-vitest',
      repoDir: '/repo',
      detectTestDir: () => {
        throw new Error('greenfield must not detect a test dir');
      },
    });
    for (const slice of result.plan.slices) {
      expect(slice.verification).toEqual([{ kind: 'unit-test', target: `tests/${slice.id}.test.ts` }]);
    }
  });

  it('the --profile flag beats detection and skips reading the repo', async () => {
    const detect = (): ProfileDetection => {
      throw new Error('detect should not run when --profile is set');
    };
    const result = await emitPlanFromSnapshot(
      { ...snapshot, mode: 'brownfield' },
      { runModel: draftModel(coveringDraft()), profile: 'deno', repoDir: '/repo', detect },
    );
    expect(result.plan.profile).toBe('deno');
  });

  it('a failed detection falls through to an explicit spec profile, not bun', async () => {
    const detect = (): ProfileDetection => ({ detected: false, reason: 'no recognizable manifest' });
    const result = await emitPlanFromSnapshot(
      { ...snapshot, mode: 'brownfield', profile: 'brunch' },
      { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect },
    );
    expect(result.plan.profile).toBe('brunch');
  });

  it('a failed detection with no spec/architect signal fails loudly instead of defaulting to bun', async () => {
    const detect = (): ProfileDetection => ({ detected: false, reason: 'no recognizable manifest' });
    await expect(
      emitPlanFromSnapshot(
        { ...snapshot, mode: 'brownfield' },
        { runModel: draftModel(coveringDraft()), repoDir: '/repo', detect },
      ),
    ).rejects.toThrow(/brunch detect/);
  });

  it('greenfield never detects even when a repoDir is supplied (protecting invariant)', async () => {
    const detect = (): ProfileDetection => {
      throw new Error('greenfield must not detect');
    };
    const result = await emitPlanFromSnapshot(snapshot, {
      runModel: draftModel(coveringDraft()),
      repoDir: '/repo',
      detect,
    });
    expect(result.plan.profile).toBe('bun');
  });

  it('round-trips the emitted plan (incl. writes) through loadPlan after YAML serialization', async () => {
    const result = await emitPlanFromSnapshot(snapshot, { runModel: draftModel(coveringDraft()) });
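
The precedence these tests exercise (`flag ≫ detected (brownfield) ≫ spec ≫ architect ≫ bun`, with a loud failure in place of a silent bun default) can be sketched as a single pure function. This is an illustrative reading of the tests and the plan entry, not the actual `resolveEmittedProfile`: the function name `resolveProfileSketch`, the `ResolveInput` shape, and the exact error wording are assumptions. The only known constraint on the message is that it matches `/brunch detect/`.

```typescript
// Hypothetical sketch of the selection chain; not the real resolveEmittedProfile.
type Detection =
  | { detected: true; profile: string; evidence: string }
  | { detected: false; reason: string };

interface ResolveInput {
  mode: 'greenfield' | 'brownfield';
  flagProfile?: string;
  specProfile?: string;
  architectProfile?: string;
  repoDir?: string;
  detect?: (repoDir: string) => Detection;
}

function resolveProfileSketch(input: ResolveInput): string {
  // 1. An explicit flag wins and the repo is never read.
  if (input.flagProfile) return input.flagProfile;

  // 2. Detection runs only for brownfield plans that have a repo directory.
  let failure: string | undefined;
  if (input.mode === 'brownfield' && input.repoDir && input.detect) {
    const detection = input.detect(input.repoDir);
    if (detection.detected) return detection.profile;
    failure = detection.reason;
  }

  // 3. An explicit spec or architect choice is honoured next.
  const explicit = input.specProfile || input.architectProfile;
  if (explicit) return explicit;

  // 4. A failed detection with nothing explicit throws instead of defaulting.
  if (failure !== undefined) throw new Error(`brunch detect: ${failure}`);

  // 5. Greenfield, or brownfield without a repoDir, keeps the bun default.
  return 'bun';
}
```

Passing `detect` in as a parameter mirrors the injectable seam the tests rely on, so precedence can be checked without touching a real repository.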
