fix(subconscious): halt + demote on permanent provider rate-cap 413 (#4404) #4410
oxoxDev wants to merge 3 commits into
…humansai#4404) Recognize a direct BYO-provider 413 whose single-request token count exceeds the account's tokens-per-minute cap (groq on_demand free tier). Anchored on both "request too large" (single-request permanence) and a tokens-per-minute marker, so a transient 429 burst and context-window overflow stay in their own buckets. Single source of truth for the Sentry classifier and the subconscious circuit breaker. Verbatim-body test guards against wording drift.
…nsai#4404) TAURI-RUST-HXF: a direct BYO provider (groq on_demand free tier) rejecting a single request that exceeds the account per-minute token cap is user-config state OpenHuman cannot lift, not a product bug. Add it to is_provider_user_state_message so the domain=agent re-report demotes instead of paging. The managed-backend PAYLOAD_TOO_LARGE guard-leak still force-captures earlier, so this arm only sees direct-provider TPM rejections. Regression test pins the managed path still pages and a transient/bare 413 is not demoted.
…yhumansai#4404) TAURI-RUST-HXF: when a tick's provider config keeps rejecting with a permanent per-minute token cap (413/TPM), the loop re-fired the doomed request every 5-30 min and re-reported it — 2232 events from one user, the cron-billing-flood family (tinyhumansai#3913). Add a circuit breaker keyed on the Subconscious provider signature: on a permanent rate-cap agent error, halt the agent run; skip subsequent ticks while the same config is set; auto-clear the moment the user switches model/provider/tier. Mirrors the existing tool-capability (TAURI-RUST-ADC) permanent-failure arm. In-memory only — a restart re-probes once, then re-halts. Pure helpers unit-tested.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a35309d2af
    pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
        let lower = body.to_ascii_lowercase();
        lower.contains("request too large")
            && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
    }
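The three buckets the matcher is meant to separate can be sketched as follows. The function is repeated so the example compiles on its own; the sample bodies are hypothetical illustrations, not verbatim provider output.

```rust
/// Copy of the matcher from the diff, repeated so this sketch is self-contained.
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

// Hypothetical sample bodies (assumptions for illustration only).

/// Permanent case: one request is larger than the per-minute token cap.
pub const PERMANENT_TPM_413: &str =
    "Request too large for model on tokens per minute (TPM): Limit 8000, Requested 42000";

/// Transient case: a burst hit the rate limit, but a retry can succeed.
pub const TRANSIENT_429: &str =
    "Rate limit reached on tokens per minute (TPM), please try again in 3s";

/// Context-window overflow: a different failure, fixable by trimming.
pub const CONTEXT_OVERFLOW: &str = "maximum context length exceeded";
```

Only the first body carries both anchors, so only it matches. The second lacks "request too large" and the third lacks both markers.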
Wire the rate-cap matcher into api_error
For the affected direct-compatible provider paths I checked (compatible_provider_impl.rs calls api_error on non-2xx responses), a 413/TPM body reaches api_error; this new matcher is only used by expected_error_kind and the subconscious breaker, so api_error still falls through to should_report_provider_http_failure(status) and emits a domain=llm_provider Sentry event for status 413. That leaves the first provider-origin event/page in place for exactly the Groq scenario this PR is trying to demote; add an is_provider_rate_cap_exceeded_message(&body) branch (and/or a before-send net) before the status gate.
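A minimal sketch of the ordering this comment asks for: check the body matcher before the status gate. `ReportDecision`, `classify_api_error`, and `should_report_status` are hypothetical names, and the status policy shown is an assumption standing in for should_report_provider_http_failure, whose real rules are not shown in this PR.

```rust
/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ReportDecision {
    DemoteUserState,
    ReportToSentry,
    Ignore,
}

/// Copy of the matcher from the diff, repeated so this sketch is self-contained.
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

/// Assumed stand-in for the status gate: report 413 and all 5xx responses.
fn should_report_status(status: u16) -> bool {
    status == 413 || status >= 500
}

/// Body check first, status gate second, so a permanent TPM rejection
/// never reaches the branch that emits a provider-origin event.
pub fn classify_api_error(status: u16, body: &str) -> ReportDecision {
    if is_provider_rate_cap_exceeded_message(body) {
        return ReportDecision::DemoteUserState;
    }
    if should_report_status(status) {
        ReportDecision::ReportToSentry
    } else {
        ReportDecision::Ignore
    }
}
```

With this ordering a bare 413 without the TPM anchors still reports, which matches the PR's intent that only the recognized user-config state is demoted.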
    match resolve_subconscious_route(config) {
        SubconsciousProviderRoute::LocalOllama { model } => format!("local:{model}"),
        SubconsciousProviderRoute::OpenHumanCloud => "cloud".to_string(),
        SubconsciousProviderRoute::Other(raw) => format!("other:{raw}"),
Include credential changes in the halt signature
This signature keys only on the raw workload route. When a user fixes the TPM cap by pasting a higher-tier API key under the same slug (setCloudProviderKey stores provider:<slug> separately from Config) or by editing the provider row without changing subconscious_provider, the signature stays other:<same route>, so should_skip_for_rate_cap_halt keeps skipping and never re-probes until an app restart or a fake model/provider change. Include relevant provider-entry/credential versioning in the halt key, or clear the halt when AI provider credentials/settings are saved.
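One way to fold credentials into the halt key, sketched under assumptions: `Route` is a hypothetical stand-in that mirrors the route variants in the excerpt, and `halt_signature` with its `credential` parameter is invented for illustration. The key is hashed so the raw API key never lands in the signature string. Because the halt is in-memory only, a hash that is stable within one process is enough.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in mirroring the route variants in the excerpt.
pub enum Route {
    LocalOllama { model: String },
    OpenHumanCloud,
    Other(String),
}

/// Builds a halt key from the route plus a fingerprint of the credential.
/// Swapping the API key under the same slug therefore changes the key.
pub fn halt_signature(route: &Route, credential: Option<&str>) -> String {
    let base = match route {
        Route::LocalOllama { model } => format!("local:{model}"),
        Route::OpenHumanCloud => "cloud".to_string(),
        Route::Other(raw) => format!("other:{raw}"),
    };
    match credential {
        Some(key) => {
            // Fingerprint only; the raw key is never stored in the signature.
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            format!("{base}#cred:{:016x}", hasher.finish())
        }
        None => base,
    }
}
```

The alternative the comment mentions, clearing the halt whenever provider credentials or settings are saved, avoids hashing entirely but needs a hook in the save path.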
Summary
PAYLOAD_TOO_LARGE guard-leaks still page, unchanged.
Problem
A user pointed the Subconscious agent at a groq on_demand free-tier model (openai/gpt-oss-120b) whose cap is 8000 tokens/minute. A subconscious turn builds ~42k tokens of context, roughly 5× over the per-minute rate cap (not the context window, so trimming can't help), so groq rejects every call.
Two defects follow:
provider_chat boundary re-reports it each time (the cron-billing-flood family, fix(cron): stop cron billing-state Sentry floods — 402 credits + 400 budget (TAURI-RUST-514 / -BMW) #3913), while also burning the user's provider quota.
Solution
- is_provider_rate_cap_exceeded_message (inference/provider/ops/http_error.rs): recognizes a permanent per-request rate-cap 413, anchored on both "request too large" (single-request permanence) and a tokens-per-minute marker, so a transient 429 burst and context-window overflow stay in their own buckets. Single source of truth for the two consumers below (no wording drift).
- Sentry classifier (core::observability::is_provider_user_state_message): the direct-provider TPM rejection demotes to ProviderUserState. Ordered after the managed-backend guard-leak arm, so managed PAYLOAD_TOO_LARGE still force-captures.
- Circuit breaker (subconscious::engine): on a permanent rate-cap agent error, arm a halt keyed on the Subconscious provider signature; subsequent ticks skip the agent run entirely until the signature changes (user picks a new model/tier), then auto-resume. In-memory only: a restart re-probes once, then re-halts (one event/launch, not a flood). Mirrors the existing tool-capability (TAURI-RUST-ADC) permanent-failure arm.
State transitions and the matcher are extracted into pure/unit-tested helpers; only trivial glue remains in the async tick path.
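The breaker's state transitions can be sketched as a small pure helper. This is a minimal illustration, not the PR's code: `RateCapBreaker`, `arm`, and `should_skip` are hypothetical names, and the signature strings in the usage are made up.

```rust
/// Hypothetical in-memory breaker: remembers the provider signature that
/// produced a permanent rate-cap error. Nothing is persisted, so a restart
/// starts from the default (not halted) state and re-probes once.
#[derive(Debug, Default)]
pub struct RateCapBreaker {
    halted_signature: Option<String>,
}

impl RateCapBreaker {
    /// Arms the halt after a permanent rate-cap agent error.
    pub fn arm(&mut self, signature: &str) {
        self.halted_signature = Some(signature.to_string());
    }

    /// Returns true when the tick should skip the agent run.
    /// A signature change clears the halt, so the new config is probed.
    pub fn should_skip(&mut self, current_signature: &str) -> bool {
        let same_config = self.halted_signature.as_deref() == Some(current_signature);
        if !same_config {
            // Not halted, or the user switched model/provider/tier: auto-clear.
            self.halted_signature = None;
        }
        same_config
    }
}
```

A tick would call `should_skip` before building context, and call `arm` only when the agent error matches the permanent rate-cap matcher. Keeping both steps free of async code is what makes them unit-testable.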
Submission Checklist
- cargo llvm-cov + diff-cover --compare-branch=upstream/main: observability 100%, http_error 100%, engine 66.7% (live-agent tick glue only), total 82%.
- docs/TEST-COVERAGE-MATRIX.md
- Closes #NNN in the ## Related section
Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
- … (pnpm --filter openhuman-app format:check)
- … (pnpm typecheck)
- cargo test --lib for inference::provider::ops::http_error, core::observability::tests, subconscious::engine: all green (new: rate-cap matcher, demote + managed-still-pages regression, breaker state transitions)
- cargo fmt --check clean; cargo clippy --lib no new warnings on touched files
- app/src-tauri changes (Tauri fmt/check)
Validation Blocked
- command: N/A
- error: N/A
- impact: N/A
Behavior Changes