16 changes: 16 additions & 0 deletions memory/PLAN.md
@@ -57,6 +57,7 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen

### Parallel / Low-conflict

- `cook-agent-confinement` — the sandbox is the agent's world: one `ConfinementPolicy` (derived readRoots/writeRoots, default-deny) compiled into file-tool guards, seatbelt/bwrap command wrapping, and the test-runner spawn; stacks on FE-841's in-process SDK seam, independent of semantic-stack work.
- `first-run-provider-setup` — provider/key UX and runtime seam can progress independently of semantic-stack work.
- `workspace-gitignore-assist` — small workspace hygiene surface with low overlap.
- `productized-web-research` — waits on prompt/context scenario substrate for probe quality, but can remain separate from semantic schema work.
@@ -190,6 +191,20 @@ The May 2026 intent-spec, multi-chat, changeset-ledger, prompt/context, and agen
- **Lexicon:** `evaluator` = read-only observer of verification results, distinct from the test-runner / code-writer; ties to `ln-oracles` "requisite variety."
- **Design docs:** `docs/design/orchestrator.md`; SPEC §Verification Design.

### cook-agent-confinement

- **Name:** Cook agent confinement — OS-level restriction of spawned agent children to the run sandbox
- **Linear:** FE-853
- **Kind:** hardening
- **Status:** building on `ka/fe-853-cook-agent-confinement` (stacked on FE-841). **FE-853 branch-complete (slices 1–5)** pending the Linux bwrap smoke. Built in isolated worktree `../brunch-fe853` (primary checkout was being thrashed by a concurrent PR-194 automation).
  - Slice 1 (file-tool guards) **done**: `sandbox-guard.ts` provides `createConfinedFileOperations` (realpath-aware containment; absolute, `../`, and symlink escapes all refused) + confined tool definitions shadowing the built-ins via `customTools`; wired in `buildSessionOptions`; real-LLM smoke green.
  - Slice 2 (seatbelt bash hook, exclude-mode first cut) **done**.
  - Slice 3 (limit-mode model) **done**: `ConfinementPolicy { sandboxRoot, readRoots, writeRoots, network }` + `deriveConfinementPolicy` (env-derived toolchain roots, root-`/` guarded) + `compileSeatbeltProfile` + TMPDIR redirected into the sandbox + `confineTestCommand` wrapping the `runTest` spawn; real-LLM smoke green under full confinement. The compiled profile default-denies **writes** globally and re-grants the write roots; it denies **read-data of the user-data zones** (`$HOME`, `/Users`, `/var/folders`, `/tmp`, `/Volumes`), re-grants sandbox + caches, and leaves OS dirs readable so the dyld loader doesn't SIGABRT.
  - SandboxGuard seam (ln-review #2 deepening) **done**: a per-run `createSandboxGuard(sandboxDir,{backend?,platform?})` → `SandboxGuard { backend, enforcing, policy, bashHook, confineTest(argv), preflight() }` over a private `ConfinementBackend { id, enforces, wrap(NormalizedRequest) }` strategy (seatbelt | none) with `selectBackend` as the sole platform resolution; `createSeatbeltSpawnHook`/`confineTestCommand` deleted, `pi-actions` callers migrated. Adding bwrap (slice 5) is now one backend factory + one `selectBackend` line, zero call-site change.
  - Slice 4 (fail-closed preflight) **done**: `decidePreflight` (pure) + `runConfinementPreflight` (runs the toolchain probe both confined and unconfined), `Toolchain.probeCommand()` (`bun --version` / `npx vitest --version`), a `--confine=on|off` cook flag (default `on`), and `runCook` bootstrap wiring that refuses to start when the toolchain runs unconfined but fails confined (misconfigured profile → exit 1, escape hatch `--confine=off`), warns + proceeds on hosts with no backend, and prints `confine <backend>` on the banner. Verified that real probes (`bun`/`vitest --version`) proceed under the seatbelt profile (no false-refusal).
  - Slice 5 (Linux bwrap backend) **code-complete**: `compileBwrapArgs` (limit-mode binds: `--ro-bind / /` + `--dev`/`--proc`/`--tmpfs /tmp`, hide `$HOME`/`/home`/`/root`/`/mnt`/`/media` behind tmpfs, re-grant toolchain `readRoots`, bind `writeRoots` writable last, `--unshare-net` only when `network:false`, `--chdir` sandbox) + a `bwrapBackend` wired into `selectBackend` (`linux → bwrap`). Argv synthesis is **unit-verified on macOS via backend injection**, but bwrap *enforcement* is **UNVERIFIED (manual debt)**: must smoke-test a real cook on a Linux host (with `bwrap` installed) before relying on it. Note: on Linux without `bwrap` installed, the fail-closed preflight will refuse (`--confine=off` to bypass) rather than run unconfined.
- **Objective:** The sandbox is the agent's world (2026-06-11 generalization: **limit, not exclude**). One `ConfinementPolicy` per run, `{ sandboxRoot, readRoots, writeRoots, network }`, is the single source of truth, compiled into each enforcement layer: (a) **file tools**: path-guarded operations, sandbox-only, pure TS, cross-platform (the tightest layer); (b) **command confinement**: the policy compiled to a platform profile wrapping every agent `bash` command (SDK `spawnHook`) *and* the `runTest` children: macOS seatbelt with **global default-deny writes** + re-grant, and **read-data denied across the user-data zones** (`$HOME`, `/Users`, `/var/folders`, `/tmp`, `/Volumes`) + re-grant of the policy roots, with OS dirs left readable so the dyld loader works (denying `/` SIGABRTs the loader); Linux bwrap follow-on (slice 5); (c) future container compilation (deferred sandcastle path) reuses the same policy. `readRoots` are **derived, not hardcoded**: a static per-OS base (`/usr`, `/bin`, `/System`, `/Library`, `/etc`, `/dev`, `/opt/homebrew`, …) plus toolchain roots resolved from the live environment (`process.execPath` install root, PATH resolution of `bun`/`npx`/`git`, cache dirs). The rest of `$HOME` (TCC folders, `~/.ssh`, everything) simply isn't granted. A **preflight probe** (slice 4) runs the toolchain's canonical commands under the compiled profile before the fleet starts and fails closed with an explicit escape hatch (`--confine=off`; the flag accepts `on|off`) rather than degrading silently.
- **Why now / unlocks:** 2026-06-11 field diagnosis: macOS TCC prompts ("Terminal wants to access Apple Music / Photos / Desktop") fired on a fresh machine during a brunch cook demo run. A fully-audited 16-agent instrumented run found **zero** sandbox escapes — agents behaved — but the guarantee is currently statistical, not structural: every agent holds unrestricted `read,write,edit,bash`. Confinement makes the guarantee structural, removes a scary first-run permission dialog from the demo/distribution path, and is the lightweight native step below the deferred sandcastle/container-isolation trigger (criterion (c) in `cook-codebase-mode`'s future-direction note).
- **Acceptance:** (1) On macOS, `runPi` and `runTest` children run under a confinement profile; a probe child attempting `ls ~/Desktop`, `find ~`, or a write outside the run sandbox fails with a permission error. (2) A full `layered-todo` greenfield cook run (serial + parallel) completes green under confinement — no functional regression from toolchain reads, `~/.pi` auth, or network. (3) Brownfield mode still works (worktree + CoW-copied `node_modules` are inside the confined write root). (4) Non-macOS: confinement degrades to a documented no-op; cook still runs. (5) Candidate invariant promoted on build: *cook agent children cannot read TCC-protected user folders or write outside their run sandbox.*
- **Verification:** Unit test on the profile/arg builder (pure); integration test spawning a confined child that attempts the three escape classes and asserting EPERM; existing cook fixture smoke green under confinement; fresh-machine "no TCC prompt" check stays a manual outer-loop oracle (note in `docs/praxis/manual-testing.md` if it earns a row).
- **Depends on:** `pi-sdk-embedding` (FE-841, PR #194) — the in-process SDK is the seam all three confinement points attach to. Touches the same `pi-actions.ts` file as FE-829's toolchain work — stack on top of FE-841.
- **Lexicon:** *confinement* = OS-level child-process restriction; *sandbox* keeps its existing meaning (the run worktree dir, `sandboxDir`).
- **Design docs:** `docs/design/orchestrator.md` (agent dispatch section).
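
The limit-mode bind order described for slice 5 can be illustrated with a minimal sketch. This is not the PR's `compileBwrapArgs`: the function name `compileBwrapArgsSketch`, the `hidden` parameter, and the example paths are hypothetical, and the sketch only synthesizes argv (it does not spawn `bwrap`, whose enforcement the plan marks as unverified). The flags themselves (`--ro-bind`, `--dev`, `--proc`, `--tmpfs`, `--bind`, `--unshare-net`, `--chdir`) are standard bubblewrap options.

```typescript
// Same shape as the ConfinementPolicy named in the plan.
type ConfinementPolicy = {
  sandboxRoot: string;
  readRoots: string[];
  writeRoots: string[];
  network: boolean;
};

// Hypothetical sketch: later mounts shadow earlier ones, so the order is
// read-only root -> hide user-data dirs -> re-grant read roots -> writable roots last.
function compileBwrapArgsSketch(policy: ConfinementPolicy, hidden: string[]): string[] {
  const args = ['--ro-bind', '/', '/', '--dev', '/dev', '--proc', '/proc', '--tmpfs', '/tmp'];
  for (const dir of hidden) args.push('--tmpfs', dir); // hide behind an empty tmpfs
  for (const root of policy.readRoots) args.push('--ro-bind', root, root); // re-grant toolchain reads
  for (const root of policy.writeRoots) args.push('--bind', root, root); // writable, bound last
  if (!policy.network) args.push('--unshare-net'); // only when network access is denied
  args.push('--chdir', policy.sandboxRoot);
  return args;
}

// Usage with made-up paths.
const bwrapArgs = compileBwrapArgsSketch(
  {
    sandboxRoot: '/home/dev/run-1',
    readRoots: ['/home/dev/.bun'],
    writeRoots: ['/home/dev/run-1'],
    network: false,
  },
  ['/home/dev', '/root'],
);
console.log(['bwrap', ...bwrapArgs, 'bun', '--version'].join(' '));
```

Binding the write roots last matters because the sandbox may live under a directory that was hidden earlier; the later `--bind` makes it visible again without exposing its siblings.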

### petri-petrinaut-semantics

- **Name:** Petri-net semantic alignment for Petrinaut visualization
@@ -735,6 +750,7 @@ agent-fixture-substrate + intent-graph-semantics
└──→ absorbs / reshapes progressive detail / recursive deflation

TRACK E — Low-conflict parallel work
cook-agent-confinement (hardening; attaches to FE-841's in-process SDK tool seam — stack atop PR #194)
first-run-provider-setup
workspace-gitignore-assist
productized-web-research
10 changes: 10 additions & 0 deletions src/orchestrator/src/cook-cli.test.ts
@@ -46,6 +46,16 @@ describe('parseCookArgs', () => {
expect(opts.maxRetries).toBe(5);
});

it('confines by default and accepts --confine=off as the escape hatch', () => {
expect(parseCookArgs(['./f']).confine).toBe('on');
expect(parseCookArgs(['./f', '--confine=off']).confine).toBe('off');
expect(parseCookArgs(['./f', '--confine=on']).confine).toBe('on');
});

it('rejects an unknown --confine value', () => {
expect(() => parseCookArgs(['./f', '--confine=loose'])).toThrow(/--confine/);
});

it('defaults dir to the launch cwd when no positional dir is given', () => {
const expected = resolve(process.env.BRUNCH_LAUNCH_CWD || process.cwd());
expect(parseCookArgs([]).dir).toBe(expected);
25 changes: 25 additions & 0 deletions src/orchestrator/src/cook-cli.ts
@@ -14,6 +14,7 @@ import { createPiActions } from './pi-actions.js';
import { loadPlan } from './plan-loader.js';
import { resolveToolchain } from './project-profile.js';
import { promoteGreenfieldRun } from './promote-run.js';
import { type ConfineMode, createSandboxGuard, runConfinementPreflight } from './sandbox-guard.js';
import { parseSpecId, resolveLatestSpecPlanPath, specPlanPath, specsRootDir } from './spec-plan-paths.js';
import { ToolchainTestRunner } from './test-runner.js';
import type { Plan, PlanMode } from './types.js';
@@ -47,6 +48,8 @@ export type CookOptions = {
outDir?: string;
/** Allow promoting into a non-empty target (otherwise refused). */
force: boolean;
/** OS-level agent confinement: `on` (default, fail-closed) or `off` (escape hatch). */
confine: ConfineMode;
/**
* Explicit specification id whose emitted plan (under
* `<dir>/.brunch/cook/specs/<id>/plan.yaml`) should be cooked.
@@ -69,6 +72,7 @@ export function parseCookArgs(args: string[]): CookOptions {
let specId: number | undefined;
let outDir: string | undefined;
let force = false;
let confine: ConfineMode = 'on';
let sawNoOpen = false;
let sawUrl = false;

@@ -114,6 +118,12 @@ export function parseCookArgs(args: string[]): CookOptions {
outDir = arg.slice('--out='.length);
} else if (arg === '--force') {
force = true;
} else if (arg.startsWith('--confine=')) {
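// Slice off the prefix (as `--out=` does) so a value such as `on=x` is rejected, not truncated to `on`.
const val = arg.slice('--confine='.length);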
if (val !== 'on' && val !== 'off') {
throw new Error(`Unknown --confine value: ${val}. Use on or off.`);
}
confine = val;
} else if (arg === '--verbose' || arg === '-v') {
verbose = true;
} else if (!arg.startsWith('-')) {
@@ -150,6 +160,7 @@ export function parseCookArgs(args: string[]): CookOptions {
petrinautUrl,
petrinautOpen,
force,
confine,
...(outDir !== undefined ? { outDir: resolve(launchCwd, outDir) } : {}),
...(specId !== undefined ? { specId } : {}),
};
@@ -465,6 +476,20 @@ export async function runCook(opts: CookOptions): Promise<void> {
const toolchain = resolveToolchain(plan.profile);
const testRunner = new ToolchainTestRunner(toolchain);

// Fail-closed agent confinement: refuse to start the fleet if the toolchain
// works unconfined but not under the sandbox profile (escape hatch: --confine=off).
const guard = createSandboxGuard(sandboxDir);
const preflight = await runConfinementPreflight(guard, toolchain.probeCommand(), opts.confine);
if (preflight.action === 'refuse') {
console.error(preflight.reason);
process.exit(1);
}
if (preflight.action === 'proceed-degraded') {
console.error(` ⚠ ${preflight.warning}`);
}
console.error(` confine ${opts.confine === 'off' ? 'off' : guard.backend}`);
console.error('');

const engine = createOrchestrator(opts.policy);

const runStart = Date.now();
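
The fail-closed wiring in this hunk branches on `preflight.action`. A minimal sketch of what a pure decision function behind it could look like follows. It is not the PR's `decidePreflight`: the `PreflightFacts` shape, the `'proceed'` action name, the message strings, and the choice to proceed when the probe fails both confined and unconfined are all assumptions; only `'refuse'`, `'proceed-degraded'`, `reason`, and `warning` appear in the diff.

```typescript
type ConfineMode = 'on' | 'off';

// Hypothetical inputs: the results of running the toolchain probe both ways.
type PreflightFacts = {
  mode: ConfineMode;
  backendEnforces: boolean; // false when the host has no confinement backend
  unconfinedOk: boolean;
  confinedOk: boolean;
};

type PreflightDecision =
  | { action: 'proceed' } // assumed name for the happy path
  | { action: 'proceed-degraded'; warning: string }
  | { action: 'refuse'; reason: string };

function decidePreflightSketch(facts: PreflightFacts): PreflightDecision {
  // Explicit escape hatch: the user opted out of confinement.
  if (facts.mode === 'off') return { action: 'proceed' };
  // No backend on this host: warn and continue unconfined.
  if (!facts.backendEnforces) {
    return { action: 'proceed-degraded', warning: 'no confinement backend on this host; running unconfined' };
  }
  // The toolchain works, but not under the profile: the profile is wrong, so fail closed.
  if (facts.unconfinedOk && !facts.confinedOk) {
    return { action: 'refuse', reason: 'toolchain probe fails under confinement; rerun with --confine=off to bypass' };
  }
  // Assumption: a probe that fails both ways is not a confinement problem.
  return { action: 'proceed' };
}

// Usage: a misconfigured profile on a host that does have a backend.
const decision = decidePreflightSketch({ mode: 'on', backendEnforces: true, unconfinedOk: true, confinedOk: false });
console.log(decision.action);
```

Keeping the decision pure leaves the two probe spawns as the only side effects, so each branch can be unit-tested without a sandbox backend.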
24 changes: 24 additions & 0 deletions src/orchestrator/src/pi-actions.test.ts
@@ -251,6 +251,30 @@ describe('runPi drives an in-process pi session (no subprocess)', () => {
}
});

it('shadows the built-in file tools with sandbox-confined definitions (FE-853)', async () => {
process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session';
const sandboxDir = mkdtempSync(join(tmpdir(), 'brunch-runpi-'));
try {
const fake = makeFakeSession({ emit: 'ok' });
let capturedCustomTools: Array<{ name: string }> | undefined;
const createSession = (async (options: { customTools?: Array<{ name: string }> }) => {
capturedCustomTools = options.customTools;
return { session: fake.session };
}) as unknown as SessionFactory;

await runPi(baseOpts(sandboxDir, 'read,write,edit,bash'), { createSession });

// Same names as the built-ins, so the SDK registry overrides them and the
// per-action allowlist (I126-K) keeps filtering both the same way. On
// macOS the bash tool is shadowed too (seatbelt spawn hook).
const expected =
process.platform === 'darwin' ? ['bash', 'edit', 'read', 'write'] : ['edit', 'read', 'write'];
expect(capturedCustomTools?.map((t) => t.name).sort()).toEqual(expected);
} finally {
rmSync(sandboxDir, { recursive: true, force: true });
}
});

it('captures agent output without writing it to process.stdout', async () => {
process.env.ANTHROPIC_API_KEY ??= 'test-key-unused-fake-session';
const sandboxDir = mkdtempSync(join(tmpdir(), 'brunch-runpi-'));
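
The confined file tools exercised above rely on a realpath-aware containment check that refuses absolute, `../`, and symlink escapes. The sketch below shows one way such a check could work. It is not the PR's `createConfinedFileOperations`; `isInsideSandbox` and `realpathAllowingMissing` are hypothetical helpers, and treating every unresolvable path as an escape is an assumption.

```typescript
import { lstatSync, mkdtempSync, realpathSync, symlinkSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { basename, dirname, isAbsolute, join, relative, resolve, sep } from 'node:path';

// Resolve symlinks on the deepest existing ancestor, then re-append the missing
// tail, so a path for a file that is about to be created still resolves.
function realpathAllowingMissing(p: string): string {
  try {
    return realpathSync(p);
  } catch {
    // A dangling symlink exists as a link but has no target: refuse rather than guess.
    if (lstatSync(p, { throwIfNoEntry: false })?.isSymbolicLink()) {
      throw new Error(`dangling symlink: ${p}`);
    }
    const parent = dirname(p);
    if (parent === p) return p; // reached the filesystem root
    return join(realpathAllowingMissing(parent), basename(p));
  }
}

// Hypothetical helper: true only when `candidate` resolves inside `sandboxRoot`.
function isInsideSandbox(sandboxRoot: string, candidate: string): boolean {
  try {
    const root = realpathSync(sandboxRoot);
    const target = realpathAllowingMissing(resolve(root, candidate));
    const rel = relative(root, target);
    return rel === '' || (rel !== '..' && !rel.startsWith(`..${sep}`) && !isAbsolute(rel));
  } catch {
    return false; // assumption: anything unresolvable is treated as an escape
  }
}

// Usage: a sandbox containing a symlink that points outside it.
const sandbox = mkdtempSync(join(tmpdir(), 'confine-sketch-'));
const outside = mkdtempSync(join(tmpdir(), 'confine-outside-'));
symlinkSync(outside, join(sandbox, 'link'));

console.log(isInsideSandbox(sandbox, 'src/new-file.ts')); // not-yet-created file inside the sandbox
console.log(isInsideSandbox(sandbox, '../elsewhere')); // parent traversal
console.log(isInsideSandbox(sandbox, outside)); // absolute path outside the sandbox
console.log(isInsideSandbox(sandbox, 'link/secret.txt')); // symlink escape
```

Comparing resolved paths rather than raw strings is what catches the symlink case: `link/secret.txt` looks contained lexically but resolves outside the root.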
10 changes: 8 additions & 2 deletions src/orchestrator/src/pi-actions.ts
@@ -16,6 +16,7 @@ import {

import { defaultToolchain, type Toolchain } from './project-profile.js';
import { createReport } from './report-helpers.js';
import { createConfinedTools, createSandboxGuard } from './sandbox-guard.js';
import { sliceLabel } from './slice-label.js';
import type { ActionContext, ActionHandlers, Epic, Slice } from './types.js';

@@ -133,6 +134,10 @@ async function buildSessionOptions(opts: RunPiOpts, isolatedDir: string): Promis
modelRegistry,
resourceLoader,
tools: opts.tools.split(','),
// Confined read/write/edit (+ seatbelt bash on macOS) shadow the built-ins
// so agent access cannot leave the sandbox (FE-853); the allowlist above
// filters both the same way.
customTools: createConfinedTools(opts.sandboxDir),
sessionManager: SessionManager.inMemory(opts.sandboxDir),
settingsManager: SettingsManager.inMemory({ compaction: { enabled: false } }),
};
@@ -263,8 +268,9 @@ export async function evaluateVerificationTargets(

async function runTest(toolchain: Toolchain, target: string, sandboxDir: string): Promise<boolean> {
return new Promise<boolean>((resolve) => {
- const [command, ...args] = toolchain.testCommand(target);
- const child = spawn(command!, args, {
+ // Same confinement as agent bash: the test runner is a spawned child too.
+ const { command, args } = createSandboxGuard(sandboxDir).confineTest(toolchain.testCommand(target));
+ const child = spawn(command, args, {
cwd: sandboxDir,
stdio: ['ignore', 'pipe', 'pipe'],
});
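
`confineTest` returns a `{ command, args }` pair that replaces the raw test argv. On macOS, one plausible shape for that wrapping is to compile the policy to a seatbelt profile string and prefix the argv with `sandbox-exec -p`. The sketch below is an assumption-laden illustration, not the PR's `compileSeatbeltProfile` or seatbelt backend: the function names, the JSON-style path quoting, the `(deny network*)` rule, and the reliance on later profile rules overriding earlier ones are all unverified here.

```typescript
type ConfinementPolicy = {
  sandboxRoot: string;
  readRoots: string[];
  writeRoots: string[];
  network: boolean;
};

// Hypothetical profile compiler: deny writes globally and re-grant the write
// roots, then deny read-data in the user-data zones and re-grant the policy roots.
function compileSeatbeltProfileSketch(policy: ConfinementPolicy, userDataZones: string[]): string {
  const subpath = (p: string) => `(subpath ${JSON.stringify(p)})`; // quoting style is an assumption
  const lines = [
    '(version 1)',
    '(allow default)',
    '(deny file-write*)',
    ...policy.writeRoots.map((r) => `(allow file-write* ${subpath(r)})`),
    ...userDataZones.map((z) => `(deny file-read-data ${subpath(z)})`),
    ...[...policy.readRoots, ...policy.writeRoots].map((r) => `(allow file-read-data ${subpath(r)})`),
  ];
  if (!policy.network) lines.push('(deny network*)');
  return lines.join('\n');
}

// Prefix the original argv so the child starts under the profile.
function wrapWithSeatbelt(profile: string, argv: string[]): { command: string; args: string[] } {
  return { command: '/usr/bin/sandbox-exec', args: ['-p', profile, ...argv] };
}

// Usage with made-up paths.
const profile = compileSeatbeltProfileSketch(
  { sandboxRoot: '/tmp/run-1', readRoots: ['/Users/dev/.bun'], writeRoots: ['/tmp/run-1'], network: true },
  ['/Users', '/var/folders', '/tmp', '/Volumes'],
);
const wrapped = wrapWithSeatbelt(profile, ['bun', 'test', 'tests/a.test.ts']);
console.log(wrapped.command, wrapped.args.slice(2).join(' '));
```

Because the call site only destructures `{ command, args }`, a different backend can return a different wrapper without `runTest` changing.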
10 changes: 10 additions & 0 deletions src/orchestrator/src/project-profile.test.ts
@@ -24,6 +24,16 @@ describe('toolchain test command', () => {
});
});

describe('toolchain confinement probe', () => {
it('bun probes with `bun --version`', () => {
expect(bunProfile.toolchain.probeCommand()).toEqual(['bun', '--version']);
});

it('brunch probes the vitest runner', () => {
expect(brunchProfile.toolchain.probeCommand()).toEqual(['npx', 'vitest', '--version']);
});
});

describe('toolchain test conventions are framework-specific', () => {
it('bun conventions mention bun:test', () => {
expect(bunProfile.toolchain.testConventions).toContain('bun:test');
8 changes: 8 additions & 0 deletions src/orchestrator/src/project-profile.ts
@@ -9,6 +9,12 @@ export interface Toolchain {
epicTarget(epicId: string): string;
/** Argv that runs a single test target in the cook sandbox. */
testCommand(target: string): string[];
/**
* A cheap argv that exercises the test runner without running tests, used by
* the confinement preflight to verify the toolchain works under the sandbox
* profile (e.g. `bun --version`).
*/
probeCommand(): string[];
/**
* Agent-facing description of the test framework + import conventions,
* injected into the cook test-writer task so prompts carry no hardcoded
@@ -28,6 +34,7 @@ export const bunProfile: ProjectProfile = {
sliceTarget: (sliceId) => `tests/${sliceId}.test.ts`,
epicTarget: (epicId) => `tests/${epicId}.integration.test.ts`,
testCommand: (target) => ['bun', 'test', target],
probeCommand: () => ['bun', '--version'],
testConventions:
'Use bun\'s test runner: `import { describe, expect, it } from "bun:test"`. The harness runs each target with `bun test <target>`.',
},
@@ -40,6 +47,7 @@ export const brunchProfile: ProjectProfile = {
sliceTarget: (sliceId) => `${sliceId}.test.ts`,
epicTarget: (epicId) => `${epicId}.integration.test.ts`,
testCommand: (target) => ['npx', 'vitest', 'run', target],
probeCommand: () => ['npx', 'vitest', '--version'],
testConventions:
'Use vitest: `import { describe, expect, it } from "vitest"`. The harness runs each target with `vitest run <target>`.',
},