feat: self-hosted OpenAI-compatible providers (LiteLLM) for Pi #416
hauserkristof wants to merge 22 commits into
Adds the brainstormed design spec for registering OpenAI-compatible self-hosted endpoints (e.g. LiteLLM) as named Pi providers in warden.toml, covering all model lanes with env-based auth. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tize Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ets ::1) The re-review premise was inverted: Node's WHATWG URL returns IPv6 hosts bracketed ([::1]), so the [::1] branch is the working one. Restore it and document the behavior; keep ::1 as a defensive fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
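The bracketed-host behavior described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: `isLoopbackBaseUrl` and `LOOPBACK_HOSTS` are hypothetical names, and the host list is taken from the loopback hosts named in the PR summary. It relies on the WHATWG `URL` parser, whose `hostname` keeps the square brackets around IPv6 literals.

```typescript
// Hypothetical helper for illustration; the PR's real function is not shown on this page.
// "[::1]" is what the WHATWG URL parser reports for an IPv6 loopback URL;
// bare "::1" is kept only as a defensive fallback, as the commit describes.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]", "::1"]);

function isLoopbackBaseUrl(baseUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(baseUrl).hostname;
  } catch {
    // An unparseable base URL is never treated as loopback.
    return false;
  }
  return LOOPBACK_HOSTS.has(hostname.toLowerCase());
}
```

A check that only compared against bare `::1` would miss `http://[::1]:4000`, which is the inverted premise this commit corrects.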
- Add providers?: ProvidersConfig to SkillRunnerOptions, AuxiliaryCallOptions,
VerifyFindingsOptions, PostProcessFindingsOptions, JsonOutputRepairOptions,
and FixJudgeRuntimeOptions
- Carry providers through every child-options construction site in analyze.ts
and post-process.ts so verify/dedup lanes do not silently lose the field
- Attach providerOptions: getRuntimeProviderOptions(..., { providers }) to every
runAuxiliary/runSynthesis call in extract.ts, dedup.ts, json-output.ts,
judge.ts, and verify.ts
- Extend findSemanticDuplicates Pick type to include providers
- Add focused integration test asserting providerOptions is forwarded
- Fix judge.runtime-options.test.ts mock to include getRuntimeProviderOptions
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Inject env into verifyCustomProviderAuthForRun so the preflight tests no longer depend on the runner's environment (a CI-set LITELLM_API_KEY would otherwise false-pass the no-key case). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
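Why injecting `env` makes the preflight testable can be sketched as below. The names `resolveProviderApiKey`, `envPrefix`, and `ProviderAuthInput` are hypothetical, and two details are assumptions rather than facts from the PR: that `<NAME>` is the provider name upper-cased with non-alphanumerics replaced by `_`, and that an unset `apiKeyEnv` variable falls through to the next candidate. Only the lookup order (`apiKeyEnv`, then `WARDEN_<NAME>_API_KEY`, then `<NAME>_API_KEY`) comes from the PR summary.

```typescript
type Env = Record<string, string | undefined>;

interface ProviderAuthInput {
  name: string;       // provider name from warden.toml, e.g. "litellm"
  apiKeyEnv?: string; // optional explicit environment variable name
}

// Assumption: how the provider name maps to the <NAME> part of the variable.
function envPrefix(name: string): string {
  return name.toUpperCase().replace(/[^A-Z0-9]/g, "_");
}

// The environment is a parameter, so a test can pass {} and never see
// a key that happens to be set on the CI runner.
function resolveProviderApiKey(provider: ProviderAuthInput, env: Env): string | undefined {
  const prefix = envPrefix(provider.name);
  const candidates = [provider.apiKeyEnv, `WARDEN_${prefix}_API_KEY`, `${prefix}_API_KEY`];
  for (const key of candidates) {
    if (key === undefined) continue;
    const value = env[key];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}
```

Production callers would pass the process environment, while the no-key test case passes an empty object and stays deterministic.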
Address final-review nits: document the intentional process.env read in getRuntimeProviderOptions (preflight resolves against the same env), and include ::1 in the loopback auth note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ry points
Address PR #1 review findings for the LiteLLM custom-provider feature:
- schedule.ts: scheduled workflows ran assertValidPiModelSelectors but skipped the assertCustomProviderAuth preflight that the trigger executor and CLI already perform. Add it for pi-runtime triggers so a missing remote provider key fails fast instead of at run time.
- main.ts (runConfigMode): the custom-provider preflight used raw trigger.runtime, ignoring the --runtime override that the adjacent Claude auth check and the actual run already honor. Thread `options.runtime ?? trigger.runtime` so preflight matches what executes.
- judge.runtime-options.test.ts: add coverage proving providerOptions flow from evaluateFix through to runAuxiliary (previously unverified).
- outline.test.ts: add a repoPath case exercising the runStructuredSkillBuilderAgent branch; provider forwarding there was untested (only the runSynthesis branch was covered).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…anes
The auxiliary lane (structured extraction, dedup, merge, fix evaluation) and the synthesis lane resolved their model only from [defaults.auxiliary] / [defaults.synthesis]. When those were unset, the lanes fell back to the runtime's built-in default model, which for the Pi runtime resolves against a different provider entirely (e.g. gemini via Google's API).

For self-hosted custom providers this is a real defect, not just a docs gap: configuring `defaults.model = "litellm/..."` kept only the agent lane on the proxy while extraction/dedup/synthesis silently escaped to an unconfigured external provider. That both breaks (auth/quota errors mid-run) and, worse, leaks model output carrying code context off the self-hosted endpoint the user deliberately chose.

Make the auxiliary and synthesis lanes fall back to the resolved global default model (defaults.agent.model -> defaults.model -> --model -> WARDEN_MODEL) when their own model is unset, in both resolution sites (CLI resolveCliDefault* and loader resolveSkillConfigs). Explicit auxiliary/synthesis models still win, and lanes still ignore skill/trigger-level overrides (agent lane only). A single configured model now drives every lane.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
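The lane fallback described in this commit amounts to "first configured value wins" over an ordered chain. The sketch below is illustrative only: `LaneInputs`, `firstDefined`, and `resolveAuxiliaryModel` are hypothetical names, and the real resolvers (`resolveCliDefault*`, `resolveSkillConfigs`) may be structured differently. The candidate order is copied from the commit message.

```typescript
// Hypothetical flattened view of the inputs the commit message lists.
interface LaneInputs {
  auxiliaryModel?: string; // [defaults.auxiliary].model
  agentModel?: string;     // defaults.agent.model
  defaultModel?: string;   // defaults.model
  cliModel?: string;       // --model
  envModel?: string;       // WARDEN_MODEL
}

// Returns the first candidate that is set and non-empty.
function firstDefined(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => c !== undefined && c !== "");
}

// The explicit auxiliary model still wins; otherwise the lane inherits the
// global default chain instead of dropping to the runtime's built-in default.
function resolveAuxiliaryModel(inputs: LaneInputs): string | undefined {
  return firstDefined(
    inputs.auxiliaryModel,
    inputs.agentModel,
    inputs.defaultModel,
    inputs.cliModel,
    inputs.envModel,
  );
}
```

With only `defaultModel: "litellm/my-model"` set (a placeholder selector), the auxiliary lane resolves to that same selector, which is the "single configured model drives every lane" outcome.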
…lane resolveWorkflowAuxiliaryOptions read only [defaults.auxiliary].model and never fell back to the global default model, unlike resolveSkillConfigs and the CLI resolvers. In the GitHub Action PR workflow, dedup, consolidation, and fix evaluation could still hit the runtime's built-in default model on another provider while `providers` pointed at a custom (e.g. self-hosted) endpoint - the same escape the lane-inheritance fix closed for the other entry points. Add the global-model fallback (agent.model -> model) after the explicit auxiliary models, keeping the existing base-first enforced-baseline precedence. Export the resolver and unit-test the inheritance and precedence. Reported by Cursor Bugbot on PR getsentry#416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…luation evaluateFix called getRuntime(runtimeOptions.runtime) - which defaults an omitted runtime to 'pi' - but resolved provider options with getRuntimeProviderOptions(runtimeOptions.runtime ?? 'claude', ...). On the omitted-runtime path the two disagreed: the call ran on Pi while provider options were built for Claude, so custom providers were never registered and fix evaluation silently escaped to a runtime default on another provider. Resolve one effective runtime (runtimeOptions.runtime ?? 'pi') and pass it to both getRuntime and getRuntimeProviderOptions so they can never diverge. Update the null-options test and add a test for the omitted-runtime forwarding path. Reported by Cursor Bugbot on PR getsentry#416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
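The fix in this commit is to resolve the runtime once and reuse it. A minimal sketch, assuming stand-in names: `providerOptionsFor` and `planFixEvaluation` are hypothetical and do not reproduce `getRuntime` or `getRuntimeProviderOptions`, and the `baseUrl` field is a placeholder. The one behavior borrowed from the review comments is that custom provider registration is built only for the Pi runtime.

```typescript
type RuntimeName = "pi" | "claude";

// Placeholder provider shape; the real ProvidersConfig is not shown on this page.
interface Providers {
  [name: string]: { baseUrl: string };
}

// Stand-in: only the Pi runtime registers custom providers.
function providerOptionsFor(runtime: RuntimeName, providers?: Providers): { providers: Providers } | undefined {
  return runtime === "pi" && providers !== undefined ? { providers } : undefined;
}

function planFixEvaluation(opts: { runtime?: RuntimeName; providers?: Providers }) {
  // Resolve the effective runtime once, so the runtime that executes the call
  // and the runtime used to build provider options cannot disagree.
  const runtime: RuntimeName = opts.runtime ?? "pi";
  return { runtime, providerOptions: providerOptionsFor(runtime, opts.providers) };
}
```

Defaulting in two places with different fallbacks (`'pi'` in one call, `'claude'` in the other) is what let the omitted-runtime path drop the providers.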
inheritRepoLayerDefaults carried the org base runtime and verification defaults into the repo layer but not defaults.providers. A repo config that only added skills therefore lost the custom providers defined by the org base config, so its resolved triggers ran without them. Inherit defaults.providers as an execution-environment default, alongside runtime. Per-skill policy defaults (model, failOn, ignorePaths, ...) still do not cross layers, matching the existing selective-inheritance contract. Reported by Cursor Bugbot on PR getsentry#416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The generated-skill build/improve command resolved its model lanes and skipped the provider preflight differently from every other entry point, so a config that points only `defaults.model` at a custom (e.g. self-hosted) provider could silently escape to a runtime default on another provider, and a keyless remote provider failed mid-build instead of failing fast.
- Synthesis model now falls back to the global default chain (synthesis -> auxiliary -> agent.model -> model -> --model -> WARDEN_MODEL) via a shared resolveDefaultModel helper, matching resolveCliDefaultSynthesisModel.
- Repair model gains the same auxiliary inheritance chain instead of reading only defaults.auxiliary.model.
- Add the assertCustomProviderAuth preflight (fail-fast with reporter.error) before synthesis, matching the CLI, executor, and workflow entry points.
Export the resolvers and unit-test the inheritance/precedence; add an integration test that build fails fast on a keyless remote provider. Reported by Cursor Bugbot on PR getsentry#416.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In runStructuredSkillBuilderAgent, when structured output fails validation and the primary repairStructuredSkillBuilderOutput path also fails, the secondary parseJsonFromOutput repair call omitted `providers`. The primary path forwards them, so on Pi with a self-hosted model the fallback repair ran without the registered custom providers and could fail or hit the wrong backend. Pass providers through to the fallback repair options (JsonOutputRepairOptions already supports and forwards them). Add a test covering the fallback path. Reported by Cursor Bugbot on PR getsentry#416. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit e31e6fe. Configure here.
    if (base?.providers !== undefined && inherited.providers === undefined) {
      inherited.providers = base.providers;
    }
Layered repo triggers miss model
High Severity
Org-level defaults.providers and defaults.model for a self-hosted lane are handled inconsistently across layers. Repo-only triggers inherit custom providers from the base config but not the base default model, so trigger execution can run auxiliary and synthesis calls without the org’s litellm/… model while providers are present. Workflow-scoped dedup and consolidate still resolve the base model via resolveWorkflowAuxiliaryOptions, so analysis and posting can disagree on which model and provider lane runs.
    model: options.model,
    maxTokens: 512,
    maxRetries: options.maxRetries,
    providerOptions: getRuntimeProviderOptions(options.runtime ?? 'claude', { providers: options.providers }),
Dedup defaults drop Pi providers
Medium Severity
Semantic dedup and batch consolidate now forward providerOptions using getRuntimeProviderOptions, but still default an omitted runtime to claude for both getRuntime and the provider-options lookup. Elsewhere in this change set the effective default runtime is pi, and getRuntimeProviderOptions only builds custom provider registration for Pi. Callers that pass providers without an explicit runtime therefore hit the Claude adapter and silently omit custom provider registration.


Summary
Adds support for self-hosted, OpenAI-compatible LLM providers (such as a LiteLLM proxy) to Warden's Pi runtime. You can now register a named provider in `warden.toml`, point any model lane at it with the usual `provider/model` selector, and run Warden entirely against your own infrastructure, with no dependency on a hosted vendor.

The provider is generic: any OpenAI-compatible endpoint works. LiteLLM is the worked example, not a hard requirement.

What's included
- `[defaults.providers.<name>]` blocks (base URL, API kind, optional headers, model list). Only `id` is required per model; Warden fills sensible defaults (`name`, `reasoning`, `input`, `contextWindow`, `maxTokens`, zero `cost`).
- … `getRuntimeProviderOptions`, and threaded through every model lane (agent, auxiliary, synthesis).
- … `apiKeyEnv` → `WARDEN_<NAME>_API_KEY` → `<NAME>_API_KEY`), never from `warden.toml`.
- … `localhost`, `127.0.0.1`, `::1`, bracketed `[::1]`) may run keyless.
- … `config/models.mdx`, plus the Pi `models.json` alternative.

Behavior decisions
One model drives every lane. The auxiliary lane (structured extraction, dedup, merge, fix evaluation) and the synthesis lane now inherit the resolved global default model (`defaults.agent.model` → `defaults.model` → `--model` → `WARDEN_MODEL`) when their own model is unset. Explicit `[defaults.auxiliary]` / `[defaults.synthesis]` still win.

This is important for self-hosted setups: previously, setting only `defaults.model = "litellm/..."` kept just the agent lane on your proxy while extraction/dedup/synthesis silently fell back to the runtime's built-in default model, which resolves against a different provider (e.g. Gemini via Google's API). That both broke runs (auth/quota errors mid-analysis) and leaked model output carrying code context off the endpoint you deliberately chose. The inheritance fix closes that gap.

Loopback exemption. Unauthenticated local endpoints are a legitimate setup, so loopback hosts are allowed without a key while every other host requires one.

Configuration example
Then: `export WARDEN_LITELLM_API_KEY=...` and run Warden as usual.

Testing
`pnpm lint`, `pnpm build`, `pnpm test`, docs build.

Notes
- … `$0` cost unless explicit costs are configured.

🤖 Generated with Claude Code
https://claude.ai/code/session_01E7ptHmVh79CqJev6WHnWNm