fix(subconscious): halt + demote on permanent provider rate-cap 413 (#4404) by oxoxDev · Pull Request #4410 · tinyhumansai/openhuman

oxoxDev · 2026-07-02T11:13:17Z

Summary

Stop the subconscious background agent from re-firing — and re-reporting — a permanently-doomed provider request every tick when the configured model rejects it with a per-minute token cap (413/TPM). Sentry TAURI-RUST-HXF: 2232 events from one user.
Add a circuit breaker that halts subconscious ticks while the offending Subconscious provider config is still set, and auto-resumes the moment the user switches model/provider/tier.
Demote a direct BYO-provider 413/TPM rejection from an unexpected Sentry crash to expected user-config state (the account's rate tier is not a lever OpenHuman controls). Managed-backend PAYLOAD_TOO_LARGE guard-leaks still page, unchanged.

Problem

A user pointed the Subconscious agent at a groq on_demand free-tier model (openai/gpt-oss-120b) whose cap is 8000 tokens/minute. A subconscious turn builds ~42k tokens of context, roughly 5× the per-minute rate cap. Because the limit being hit is the rate cap rather than the context window, trimming can't help, and groq rejects every call:

groq API error (413 Payload Too Large): Request too large for model `openai/gpt-oss-120b` …
service tier `on_demand` on tokens per minute (TPM): Limit 8000, Requested 42084

Two defects follow:

Per-tick re-report flood: the tick loop re-fires the identical, permanently doomed request every 5–30 min, and the provider_chat boundary re-reports it each time while also burning the user's provider quota. This is the same family as the cron billing flood addressed in #3913, the fix for cron billing-state Sentry floods on 402 credits and 400 budget (TAURI-RUST-514 / -BMW).
Mis-classified as a crash: a raw direct-provider 413/TPM matched no user-state/transient classifier, so it paged as an unexpected error.

Solution

New shared matcher is_provider_rate_cap_exceeded_message (inference/provider/ops/http_error.rs): recognizes a permanent per-request rate-cap 413, anchored on both "request too large" (single-request permanence) and a tokens-per-minute marker — so a transient 429 burst and context-window overflow stay in their own buckets. Single source of truth for the two consumers below (no wording drift).
Sentry demotion (core::observability::is_provider_user_state_message): the direct-provider TPM rejection demotes to ProviderUserState. Ordered after the managed-backend guard-leak arm, so managed PAYLOAD_TOO_LARGE still force-captures.
Circuit breaker (subconscious::engine): on a permanent rate-cap agent error, arm a halt keyed on the Subconscious provider signature; subsequent ticks skip the agent run entirely until the signature changes (user picks a new model/tier), then auto-resume. In-memory only — a restart re-probes once, then re-halts (one event/launch, not a flood). Mirrors the existing tool-capability (TAURI-RUST-ADC) permanent-failure arm.
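
A minimal sketch of how the shared matcher and the classifier ordering described above might fit together. The matcher body is copied from this PR's diff; `Classification`, `classify`, and the literal `PAYLOAD_TOO_LARGE` check are assumptions made for illustration and are not the actual `is_provider_user_state_message` code.

```rust
/// Matcher as shown in this PR's diff (http_error.rs).
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

/// Hypothetical classification outcome; the real type is not shown here.
#[derive(Debug, PartialEq, Eq)]
pub enum Classification {
    /// Managed-backend guard leak: still force-captured (pages).
    ManagedGuardLeak,
    /// Expected user-config state: demoted instead of paging.
    ProviderUserState,
    /// Anything else keeps its existing handling.
    Unclassified,
}

/// Assumed ordering: the managed guard-leak arm runs first, so the
/// rate-cap arm only ever sees direct-provider rejections.
pub fn classify(message: &str) -> Classification {
    // Assumption: the managed backend tags its rejection with this code.
    if message.contains("PAYLOAD_TOO_LARGE") {
        return Classification::ManagedGuardLeak;
    }
    if is_provider_rate_cap_exceeded_message(message) {
        return Classification::ProviderUserState;
    }
    Classification::Unclassified
}
```

Because the matcher requires both anchors, a bare 413 or a 429 that only mentions tokens per minute falls through to `Unclassified` in this sketch.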

State transitions and the matcher are extracted into pure/unit-tested helpers; only trivial glue remains in the async tick path.
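
The breaker's state transitions could be sketched as a small pure helper like the one below. `RateCapBreaker`, `arm`, and `should_skip` are hypothetical names, and the signature strings used with it are illustrative; the real helpers in subconscious::engine may be shaped differently.

```rust
/// Hypothetical in-memory breaker state (lost on restart by design).
#[derive(Debug, Default)]
pub struct RateCapBreaker {
    /// Provider signature that was active when the halt was armed.
    halted_signature: Option<String>,
}

impl RateCapBreaker {
    /// Arm the halt after a permanent rate-cap error under `signature`.
    pub fn arm(&mut self, signature: &str) {
        self.halted_signature = Some(signature.to_string());
    }

    /// Decide whether this tick should skip the agent run.
    /// A different signature clears the halt, so the tick probes again.
    pub fn should_skip(&mut self, current_signature: &str) -> bool {
        let same = self.halted_signature.as_deref() == Some(current_signature);
        if !same {
            self.halted_signature = None;
        }
        same
    }
}
```

A freshly constructed breaker is unarmed, which matches the "restart re-probes once, then re-halts" behaviour: the first tick after launch runs, fails, and arms the halt again.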

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — verified locally via cargo llvm-cov + diff-cover --compare-branch=upstream/main: observability 100%, http_error 100%, engine 66.7% (live-agent tick glue only), total 82%.
N/A: behaviour-only change — no feature rows added/removed/renamed in docs/TEST-COVERAGE-MATRIX.md
N/A: behaviour-only change — no matrix feature IDs apply
No new external network dependencies introduced (mock backend used per Testing Strategy)
N/A: no release-cut surface touched (background subconscious loop + Sentry classifier only)
Linked issue closed via Closes #NNN in the ## Related section

Impact

Platform: desktop (all OSes) — subconscious runs in the in-process core.
Reliability: eliminates a per-tick Sentry flood and stops burning a user's provider quota on a request that can never succeed on their tier.
Observability: the underlying condition is now expected user-config state, not a page; the user sees an actionable "pick a higher-tier model" reason in Subconscious status. No masking of real defects — the managed-backend guard-leak still pages, and a transient 429 stays retryable + Sentry-visible.
Security/migration: none. In-memory breaker state only; no schema/config change.

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A (tracked as GitHub issue #4404, "fix(subconscious): halt + demote on permanent provider rate-cap 413 (groq TPM flood)")
URL: N/A

Commit & Branch

Branch: fix/4404-subconscious-rate-cap-halt
Commit SHA: a35309d

Validation Run

N/A: no frontend changes (pnpm --filter openhuman-app format:check)
N/A: no frontend/TypeScript changes (pnpm typecheck)
Focused tests: cargo test --lib for inference::provider::ops::http_error, core::observability::tests, subconscious::engine — all green (new: rate-cap matcher, demote + managed-still-pages regression, breaker state transitions)
Rust fmt/check: cargo fmt --check clean; cargo clippy --lib no new warnings on touched files
N/A: no app/src-tauri changes (Tauri fmt/check)

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: a permanent per-minute token-cap (413/TPM) rejection from a direct BYO Subconscious provider no longer pages Sentry and no longer re-fires every tick; ticks halt until the provider config changes.
User-visible effect: Subconscious pauses with an actionable status message ("pick a higher-tier model or provider") instead of silently failing every few minutes; no functional change for correctly-provisioned providers.

…humansai#4404) Recognize a direct BYO-provider 413 whose single-request token count exceeds the account's tokens-per-minute cap (groq on_demand free tier). Anchored on both "request too large" (single-request permanence) and a tokens-per-minute marker, so a transient 429 burst and context-window overflow stay in their own buckets. Single source of truth for the Sentry classifier and the subconscious circuit breaker. Verbatim-body test guards against wording drift.

…nsai#4404) TAURI-RUST-HXF: a direct BYO provider (groq on_demand free tier) rejecting a single request that exceeds the account per-minute token cap is user-config state OpenHuman cannot lift, not a product bug. Add it to is_provider_user_state_message so the domain=agent re-report demotes instead of paging. The managed-backend PAYLOAD_TOO_LARGE guard-leak still force-captures earlier, so this arm only sees direct-provider TPM rejections. Regression test pins the managed path still pages and a transient/bare 413 is not demoted.

…yhumansai#4404) TAURI-RUST-HXF: when a tick's provider config keeps rejecting with a permanent per-minute token cap (413/TPM), the loop re-fired the doomed request every 5-30 min and re-reported it — 2232 events from one user, the cron-billing-flood family (tinyhumansai#3913). Add a circuit breaker keyed on the Subconscious provider signature: on a permanent rate-cap agent error, halt the agent run; skip subsequent ticks while the same config is set; auto-clear the moment the user switches model/provider/tier. Mirrors the existing tool-capability (TAURI-RUST-ADC) permanent-failure arm. In-memory only — a restart re-probes once, then re-halts. Pure helpers unit-tested.

coderabbitai · 2026-07-02T11:13:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cea2df89-362c-4222-8e1d-179375bdec49

📥 Commits

Reviewing files that changed from the base of the PR and between f979bfa and a35309d.

📒 Files selected for processing (5)

src/core/observability.rs
src/openhuman/inference/provider/ops/http_error.rs
src/openhuman/inference/provider/ops/mod.rs
src/openhuman/subconscious/engine.rs
src/openhuman/subconscious/engine_tests.rs

👮 Files not reviewed due to content moderation or server errors (5)

src/core/observability.rs
src/openhuman/inference/provider/ops/mod.rs
src/openhuman/subconscious/engine_tests.rs
src/openhuman/inference/provider/ops/http_error.rs
src/openhuman/subconscious/engine.rs

📝 Walkthrough

Warning: walkthrough skipped. File diffs could not be summarized.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a35309d2af

chatgpt-codex-connector · 2026-07-02T11:16:31Z

+pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
+    let lower = body.to_ascii_lowercase();
+    lower.contains("request too large")
+        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
+}


Wire the rate-cap matcher into api_error

For the affected direct-compatible provider paths I checked (compatible_provider_impl.rs calls api_error on non-2xx responses), a 413/TPM body reaches api_error. The new matcher is only used by expected_error_kind and the subconscious breaker, so api_error still falls through to should_report_provider_http_failure(status) and emits a domain=llm_provider Sentry event for status 413. That leaves the first provider-origin event/page in place for exactly the Groq scenario this PR is trying to demote. Add an is_provider_rate_cap_exceeded_message(&body) branch (and/or a before-send net) before the status gate.
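
One way to read this suggestion, as a sketch only: check the body matcher before consulting the status gate. `status_gate_reports` is a stand-in for should_report_provider_http_failure, whose real body is not shown in this thread, and `should_report_http_failure` is a hypothetical wrapper name.

```rust
/// Matcher as shown in this PR's diff (http_error.rs).
pub fn is_provider_rate_cap_exceeded_message(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("request too large")
        && (lower.contains("tokens per minute") || lower.contains("(tpm)"))
}

/// Stand-in for the existing status gate. Assumption: every non-2xx
/// status is reported; the real policy may be narrower.
fn status_gate_reports(status: u16) -> bool {
    !(200..300).contains(&status)
}

/// Hypothetical wrapper: the rate-cap branch short-circuits before the
/// status gate, so a permanent 413/TPM never emits a provider event.
pub fn should_report_http_failure(status: u16, body: &str) -> bool {
    if is_provider_rate_cap_exceeded_message(body) {
        return false;
    }
    status_gate_reports(status)
}
```

With this ordering a bare 413 without the TPM wording would still be reported, since only the body matcher suppresses the event.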

chatgpt-codex-connector · 2026-07-02T11:16:31Z

+    match resolve_subconscious_route(config) {
+        SubconsciousProviderRoute::LocalOllama { model } => format!("local:{model}"),
+        SubconsciousProviderRoute::OpenHumanCloud => "cloud".to_string(),
+        SubconsciousProviderRoute::Other(raw) => format!("other:{raw}"),


Include credential changes in the halt signature

This signature keys only on the raw workload route. When a user fixes the TPM cap by pasting a higher-tier API key under the same slug (setCloudProviderKey stores provider:<slug> separately from Config) or by editing the provider row without changing subconscious_provider, the signature stays other:<same route>, so should_skip_for_rate_cap_halt keeps skipping and never re-probes until an app restart or a fake model/provider change. Include relevant provider-entry/credential versioning in the halt key, or clear the halt when AI provider credentials/settings are saved.
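
A sketch of the first option (versioning in the halt key), under stated assumptions: `Route` mirrors the variants in the diff above but is a local stand-in, and `credential_revision` is a hypothetical counter bumped whenever provider credentials or settings are saved. Nothing in this thread confirms such a counter exists.

```rust
/// Local stand-in for the route enum shown in the diff.
pub enum Route {
    LocalOllama { model: String },
    OpenHumanCloud,
    Other(String),
}

/// Build a halt key that changes when either the route or the
/// (hypothetical) credential revision changes. Using a revision counter
/// avoids putting any secret material into the key.
pub fn halt_signature(route: &Route, credential_revision: u64) -> String {
    let base = match route {
        Route::LocalOllama { model } => format!("local:{model}"),
        Route::OpenHumanCloud => "cloud".to_string(),
        Route::Other(raw) => format!("other:{raw}"),
    };
    format!("{base}#cred:{credential_revision}")
}
```

With a key like this, pasting a new API key under the same slug would change the signature and let the next tick re-probe without a restart.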

oxoxDev added 3 commits July 2, 2026 16:16

oxoxDev requested a review from a team July 2, 2026 11:13

coderabbitai Bot approved these changes Jul 2, 2026

chatgpt-codex-connector Bot reviewed Jul 2, 2026

oxoxDev wants to merge 3 commits into tinyhumansai:main from oxoxDev:fix/4404-subconscious-rate-cap-halt
