
feat: self-hosted OpenAI-compatible providers (LiteLLM) for Pi #416

Open
hauserkristof wants to merge 22 commits into getsentry:main from hauserkristof:feat/custom-pi-provider-litellm

Conversation

@hauserkristof


Summary

Adds support for self-hosted, OpenAI-compatible LLM providers (such as a LiteLLM proxy) to Warden's Pi runtime. You can now register a named provider in warden.toml, point any model lane at it with the usual provider/model selector, and run Warden entirely against your own infrastructure — no dependency on a hosted vendor.

The provider is generic: any OpenAI-compatible endpoint works. LiteLLM is the worked example, not a hard requirement.

What's included

  • Config schema: [defaults.providers.<name>] blocks (base URL, API kind, optional headers, model list). Only id is required per model; Warden fills in defaults for the rest (name, reasoning, input, contextWindow, maxTokens, zero cost).
  • Runtime registration: custom providers are normalized and registered in the Pi adapter, exposed through getRuntimeProviderOptions, and threaded through every model lane (agent, auxiliary, synthesis).
  • Env-only credentials: API keys are resolved from the environment (apiKeyEnv → WARDEN_<NAME>_API_KEY → <NAME>_API_KEY), never from warden.toml.
  • Fail-fast preflight: a non-loopback provider with no resolvable key fails before any analysis, with a clear message. Loopback URLs (localhost, 127.0.0.1, ::1, bracketed [::1]) may run keyless.
  • Consistent across entry points: the preflight runs in the CLI, the trigger executor, scheduled workflows, and the PR workflow.
  • Docs: new "Self-hosted / OpenAI-compatible providers" section in config/models.mdx, plus the Pi models.json alternative.
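The key lookup and the loopback exemption listed above can be sketched roughly as follows. This is an illustration only, not Warden's implementation: `ProviderConfig`, `Env`, `resolveApiKey`, `isLoopback`, and `assertProviderAuth` are hypothetical names, and the sketch assumes the provider name is upper-cased to form the variable names and that an unset `apiKeyEnv` variable falls through to the default names.

```typescript
// Hypothetical shapes; the real Warden types are not shown in this PR description.
interface ProviderConfig {
  baseUrl: string;
  apiKeyEnv?: string;
}

type Env = Record<string, string | undefined>;

// Lookup order from the description:
// apiKeyEnv, then WARDEN_<NAME>_API_KEY, then <NAME>_API_KEY.
function resolveApiKey(name: string, provider: ProviderConfig, env: Env): string | undefined {
  // Assumption: the name is upper-cased and non-alphanumerics become underscores.
  const upper = name.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  const candidates = [provider.apiKeyEnv, `WARDEN_${upper}_API_KEY`, `${upper}_API_KEY`];
  for (const variable of candidates) {
    const value = variable ? env[variable] : undefined;
    if (value) return value;
  }
  return undefined;
}

// Hosts that may run keyless. The WHATWG URL parser reports IPv6 hostnames
// bracketed, so "[::1]" is the form normally seen; "::1" is a defensive fallback.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "::1", "[::1]"]);

function isLoopback(baseUrl: string): boolean {
  try {
    return LOOPBACK_HOSTS.has(new URL(baseUrl).hostname.toLowerCase());
  } catch {
    return false; // unparseable URL: treat as remote so a key is still required
  }
}

// Throws before any analysis when a remote provider has no resolvable key.
function assertProviderAuth(name: string, provider: ProviderConfig, env: Env): void {
  if (isLoopback(provider.baseUrl)) return;
  if (resolveApiKey(name, provider, env) === undefined) {
    throw new Error(
      `Provider "${name}" at ${provider.baseUrl} requires an API key, but none was found in the environment.`,
    );
  }
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the check, keeps tests independent of whatever variables happen to be set on the machine running them.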

Behavior decisions

  • One model drives every lane. The auxiliary lane (structured extraction, dedup, merge, fix evaluation) and the synthesis lane now inherit the resolved global default model (defaults.agent.model → defaults.model → --model → WARDEN_MODEL) when their own model is unset. Explicit [defaults.auxiliary] / [defaults.synthesis] still win.

    This is important for self-hosted setups: previously, setting only defaults.model = "litellm/..." kept just the agent lane on your proxy while extraction/dedup/synthesis silently fell back to the runtime's built-in default model — which resolves against a different provider (e.g. Gemini via Google's API). That both broke runs (auth/quota errors mid-analysis) and leaked model output carrying code context off the endpoint you deliberately chose. The inheritance fix closes that gap.

  • Loopback exemption. Unauthenticated local endpoints are a legitimate setup, so loopback hosts are allowed without a key while every other host requires one.
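The lane inheritance described above amounts to a short fallback chain. A minimal sketch, assuming a simplified config shape; `Defaults`, `ModelSources`, `resolveGlobalModel`, and `resolveLaneModel` are hypothetical names and not the resolvers this PR touches:

```typescript
// Simplified, hypothetical view of the [defaults] table in warden.toml.
interface Defaults {
  model?: string;
  agent?: { model?: string };
  auxiliary?: { model?: string };
  synthesis?: { model?: string };
}

interface ModelSources {
  defaults: Defaults;
  cliModel?: string; // value of --model
  envModel?: string; // value of WARDEN_MODEL
}

// Global default chain as stated in the description:
// defaults.agent.model -> defaults.model -> --model -> WARDEN_MODEL.
function resolveGlobalModel(sources: ModelSources): string | undefined {
  return (
    sources.defaults.agent?.model ??
    sources.defaults.model ??
    sources.cliModel ??
    sources.envModel
  );
}

// An explicit lane model wins; otherwise the lane inherits the global default
// instead of falling back to a runtime built-in on another provider.
function resolveLaneModel(
  lane: "auxiliary" | "synthesis",
  sources: ModelSources,
): string | undefined {
  return sources.defaults[lane]?.model ?? resolveGlobalModel(sources);
}
```

With only `defaults.model = "litellm/my-model"` set, both `resolveLaneModel("auxiliary", ...)` and `resolveLaneModel("synthesis", ...)` return `"litellm/my-model"` in this sketch, so no lane leaves the configured proxy.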

Configuration example

[defaults]
runtime = "pi"
model = "litellm/my-model"   # provider-prefixed; drives all lanes

[defaults.providers.litellm]
baseUrl = "http://localhost:4000/v1"
api = "openai-completions"
# apiKeyEnv = "WARDEN_LITELLM_API_KEY"   # optional; overrides the default lookup

[[defaults.providers.litellm.models]]
id = "my-model"
# reasoning = true        # mark reasoning models
# maxTokens = 16384       # give reasoning models headroom

Then: export WARDEN_LITELLM_API_KEY=... and run Warden as usual.
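For illustration, a provider-prefixed selector such as `litellm/my-model` could be split and checked against the configured providers along these lines. This is a sketch under assumptions: `parseModelSelector` and `assertSelectorConfigured` are hypothetical helpers, and splitting at the first slash (so that model ids may themselves contain slashes) is a design choice of the sketch, not confirmed Warden behavior.

```typescript
interface ModelSelector {
  provider: string;
  model: string;
}

// Hypothetical, reduced view of [defaults.providers.<name>].
interface ProviderEntry {
  baseUrl: string;
  models: { id: string }[];
}

// Split "provider/model" at the first slash only.
function parseModelSelector(selector: string): ModelSelector {
  const slash = selector.indexOf("/");
  if (slash <= 0 || slash === selector.length - 1) {
    throw new Error(`Expected "provider/model", got "${selector}"`);
  }
  return { provider: selector.slice(0, slash), model: selector.slice(slash + 1) };
}

// Verify that the selector names a configured provider and one of its models.
function assertSelectorConfigured(
  selector: string,
  providers: Record<string, ProviderEntry>,
): ModelSelector {
  const parsed = parseModelSelector(selector);
  const provider = providers[parsed.provider];
  if (!provider) {
    throw new Error(`Unknown provider "${parsed.provider}" in selector "${selector}"`);
  }
  if (!provider.models.some((m) => m.id === parsed.model)) {
    throw new Error(`Provider "${parsed.provider}" does not list model "${parsed.model}"`);
  }
  return parsed;
}

// Mirrors the configuration example above.
const providers: Record<string, ProviderEntry> = {
  litellm: { baseUrl: "http://localhost:4000/v1", models: [{ id: "my-model" }] },
};
const selected = assertSelectorConfigured("litellm/my-model", providers);
```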

Testing

  • Unit + integration coverage for provider normalization, key resolution, loopback detection, the auth preflight, provider forwarding through all lanes, and the new auxiliary/synthesis model inheritance (both resolution sites).
  • Custom-provider preflight tests are hermetic (no ambient env dependence).
  • Validated end-to-end against a live LiteLLM proxy: config parsing → key resolution → preflight → registration → request → response, across agent, auxiliary, and synthesis lanes, including a reasoning model.
  • Full suite green: pnpm lint, pnpm build, pnpm test, docs build.

Notes

  • v1 is OpenAI-completions only; other API shapes are out of scope.
  • Self-hosted models report $0 cost unless explicit costs are configured.

🤖 Generated with Claude Code

https://claude.ai/code/session_01E7ptHmVh79CqJev6WHnWNm

hauserkristof and others added 17 commits June 26, 2026 23:31
Adds the brainstormed design spec for registering OpenAI-compatible
self-hosted endpoints (e.g. LiteLLM) as named Pi providers in warden.toml,
covering all model lanes with env-based auth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tize

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…ets ::1)

The re-review premise was inverted: Node's WHATWG URL returns IPv6 hosts
bracketed ([::1]), so the [::1] branch is the working one. Restore it and
document the behavior; keep ::1 as a defensive fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Add providers?: ProvidersConfig to SkillRunnerOptions, AuxiliaryCallOptions,
  VerifyFindingsOptions, PostProcessFindingsOptions, JsonOutputRepairOptions,
  and FixJudgeRuntimeOptions
- Carry providers through every child-options construction site in analyze.ts
  and post-process.ts so verify/dedup lanes do not silently lose the field
- Attach providerOptions: getRuntimeProviderOptions(..., { providers }) to every
  runAuxiliary/runSynthesis call in extract.ts, dedup.ts, json-output.ts,
  judge.ts, and verify.ts
- Extend findSemanticDuplicates Pick type to include providers
- Add focused integration test asserting providerOptions is forwarded
- Fix judge.runtime-options.test.ts mock to include getRuntimeProviderOptions

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Inject env into verifyCustomProviderAuthForRun so the preflight tests no
longer depend on the runner's environment (a CI-set LITELLM_API_KEY would
otherwise false-pass the no-key case).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Address final-review nits: document the intentional process.env read in
getRuntimeProviderOptions (preflight resolves against the same env), and
include ::1 in the loopback auth note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ry points

Address PR #1 review findings for the LiteLLM custom-provider feature:

- schedule.ts: scheduled workflows ran assertValidPiModelSelectors but
  skipped the assertCustomProviderAuth preflight that the trigger executor
  and CLI already perform. Add it for pi-runtime triggers so a missing
  remote provider key fails fast instead of at run time.
- main.ts (runConfigMode): the custom-provider preflight used raw
  trigger.runtime, ignoring the --runtime override that the adjacent Claude
  auth check and the actual run already honor. Thread
  `options.runtime ?? trigger.runtime` so preflight matches what executes.
- judge.runtime-options.test.ts: add coverage proving providerOptions flow
  from evaluateFix through to runAuxiliary (previously unverified).
- outline.test.ts: add a repoPath case exercising the
  runStructuredSkillBuilderAgent branch; provider forwarding there was
  untested (only the runSynthesis branch was covered).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…anes

The auxiliary lane (structured extraction, dedup, merge, fix evaluation) and the
synthesis lane resolved their model only from [defaults.auxiliary] /
[defaults.synthesis]. When those were unset, the lanes fell back to the runtime's
built-in default model, which for the Pi runtime resolves against a different
provider entirely (e.g. gemini via Google's API).

For self-hosted custom providers this is a real defect, not just a docs gap:
configuring `defaults.model = "litellm/..."` kept only the agent lane on the
proxy while extraction/dedup/synthesis silently escaped to an unconfigured
external provider. That both breaks (auth/quota errors mid-run) and, worse, leaks
model output carrying code context off the self-hosted endpoint the user
deliberately chose.

Make the auxiliary and synthesis lanes fall back to the resolved global default
model (defaults.agent.model -> defaults.model -> --model -> WARDEN_MODEL) when
their own model is unset, in both resolution sites (CLI resolveCliDefault* and
loader resolveSkillConfigs). Explicit auxiliary/synthesis models still win, and
lanes still ignore skill/trigger-level overrides (agent lane only). A single
configured model now drives every lane.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/warden/src/action/fix-evaluation/judge.ts Outdated
Comment thread packages/warden/src/action/workflow/pr-workflow.ts
Comment thread packages/warden/src/config/loader.ts
hauserkristof and others added 3 commits June 30, 2026 16:34
…lane

resolveWorkflowAuxiliaryOptions read only [defaults.auxiliary].model and never
fell back to the global default model, unlike resolveSkillConfigs and the CLI
resolvers. In the GitHub Action PR workflow, dedup, consolidation, and fix
evaluation could still hit the runtime's built-in default model on another
provider while `providers` pointed at a custom (e.g. self-hosted) endpoint -
the same escape the lane-inheritance fix closed for the other entry points.

Add the global-model fallback (agent.model -> model) after the explicit
auxiliary models, keeping the existing base-first enforced-baseline precedence.
Export the resolver and unit-test the inheritance and precedence.

Reported by Cursor Bugbot on PR getsentry#416.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…luation

evaluateFix called getRuntime(runtimeOptions.runtime) - which defaults an
omitted runtime to 'pi' - but resolved provider options with
getRuntimeProviderOptions(runtimeOptions.runtime ?? 'claude', ...). On the
omitted-runtime path the two disagreed: the call ran on Pi while provider
options were built for Claude, so custom providers were never registered and
fix evaluation silently escaped to a runtime default on another provider.

Resolve one effective runtime (runtimeOptions.runtime ?? 'pi') and pass it to
both getRuntime and getRuntimeProviderOptions so they can never diverge. Update
the null-options test and add a test for the omitted-runtime forwarding path.

Reported by Cursor Bugbot on PR getsentry#416.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
inheritRepoLayerDefaults carried the org base runtime and verification defaults
into the repo layer but not defaults.providers. A repo config that only added
skills therefore lost the custom providers defined by the org base config, so
its resolved triggers ran without them.

Inherit defaults.providers as an execution-environment default, alongside
runtime. Per-skill policy defaults (model, failOn, ignorePaths, ...) still do
not cross layers, matching the existing selective-inheritance contract.

Reported by Cursor Bugbot on PR getsentry#416.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/warden/src/cli/commands/build.ts
Comment thread packages/warden/src/cli/commands/build.ts
Comment thread packages/warden/src/cli/commands/build.ts
The generated-skill build/improve command resolved its model lanes and skipped
the provider preflight differently from every other entry point, so a config
that points only `defaults.model` at a custom (e.g. self-hosted) provider could
silently escape to a runtime default on another provider, and a keyless remote
provider failed mid-build instead of failing fast.

- Synthesis model now falls back to the global default chain
  (synthesis -> auxiliary -> agent.model -> model -> --model -> WARDEN_MODEL)
  via a shared resolveDefaultModel helper, matching resolveCliDefaultSynthesisModel.
- Repair model gains the same auxiliary inheritance chain instead of reading
  only defaults.auxiliary.model.
- Add the assertCustomProviderAuth preflight (fail-fast with reporter.error)
  before synthesis, matching the CLI, executor, and workflow entry points.

Export the resolvers and unit-test the inheritance/precedence; add an
integration test that build fails fast on a keyless remote provider.

Reported by Cursor Bugbot on PR getsentry#416.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/warden/src/skill-builder/agentic.ts
In runStructuredSkillBuilderAgent, when structured output fails validation and
the primary repairStructuredSkillBuilderOutput path also fails, the secondary
parseJsonFromOutput repair call omitted `providers`. The primary path forwards
them, so on Pi with a self-hosted model the fallback repair ran without the
registered custom providers and could fail or hit the wrong backend.

Pass providers through to the fallback repair options (JsonOutputRepairOptions
already supports and forwards them). Add a test covering the fallback path.

Reported by Cursor Bugbot on PR getsentry#416.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit e31e6fe.

if (base?.providers !== undefined && inherited.providers === undefined) {
  inherited.providers = base.providers;
}


Layered repo triggers miss model

High Severity

Org-level defaults.providers and defaults.model for a self-hosted lane are handled inconsistently across layers. Repo-only triggers inherit custom providers from the base config but not the base default model, so trigger execution can run auxiliary and synthesis calls without the org’s litellm/… model while providers are present. Workflow-scoped dedup and consolidate still resolve the base model via resolveWorkflowAuxiliaryOptions, so analysis and posting can disagree on which model and provider lane runs.


model: options.model,
maxTokens: 512,
maxRetries: options.maxRetries,
providerOptions: getRuntimeProviderOptions(options.runtime ?? 'claude', { providers: options.providers }),


Dedup defaults drop Pi providers

Medium Severity

Semantic dedup and batch consolidate now forward providerOptions using getRuntimeProviderOptions, but still default an omitted runtime to claude for both getRuntime and the provider-options lookup. Elsewhere in this change set the effective default runtime is pi, and getRuntimeProviderOptions only builds custom provider registration for Pi. Callers that pass providers without an explicit runtime therefore hit the Claude adapter and silently omit custom provider registration.
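One way to avoid this class of mismatch is to compute a single effective runtime and use it for both lookups, as the evaluateFix commit earlier in this PR does. The sketch below uses stand-in functions (`getRuntimeStub`, `getProviderOptionsStub`) rather than Warden's real `getRuntime` and `getRuntimeProviderOptions`, whose signatures and return types are not shown here; the `'pi'` default is taken from the comment above.

```typescript
type RuntimeName = "pi" | "claude";

interface CallOptions {
  runtime?: RuntimeName;
  providers?: Record<string, unknown>;
}

// Stand-in for the runtime lookup; only the chosen name matters for the sketch.
function getRuntimeStub(runtime: RuntimeName): { name: RuntimeName } {
  return { name: runtime };
}

// Stand-in for provider options: custom providers are only registered for Pi.
function getProviderOptionsStub(
  runtime: RuntimeName,
  providers?: Record<string, unknown>,
): { customProviders: Record<string, unknown> } | undefined {
  return runtime === "pi" && providers ? { customProviders: providers } : undefined;
}

// Resolve the runtime once so the adapter and its provider options cannot diverge.
function resolveCall(options: CallOptions) {
  const effectiveRuntime: RuntimeName = options.runtime ?? "pi";
  return {
    runtime: getRuntimeStub(effectiveRuntime),
    providerOptions: getProviderOptionsStub(effectiveRuntime, options.providers),
  };
}
```

With the runtime omitted and `providers` set, `resolveCall` selects Pi and still carries the custom providers; an explicit `runtime: "claude"` yields no custom provider registration, which matches the behavior the comment attributes to getRuntimeProviderOptions.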

