fix(codex-executor): handle response.incomplete + raise output-token budget by sfreudenthaler · Pull Request #38 · dotCMS/ai-workflows

sfreudenthaler · 2026-06-13T14:29:32Z

Problem

~43% of recent GPT-5.5 automatic reviews in dotCMS/core posted "❌ Codex Review failed — job failed before producing output" (e.g. core PRs #36150, #36149, #36144, #36130, #36124, #36112). Those PRs got no review, not a false "clean".

Root cause

On the OpenAI Responses API, max_output_tokens caps the combined reasoning + visible-answer tokens — not just the answer. At reasoning_effort: medium, GPT-5.5 sometimes spends the entire 2048 budget reasoning, and the stream ends with status=incomplete (incomplete_details.reason=max_output_tokens) and zero output_text.delta events.

mantle_review.py captured usage only on response.completed and only logged errors on response.failed/error, so an incomplete terminal event fell through everything → empty review → sys.exit(1) → the generic failure sticky. The failure signature in the logs is Tokens: in: ? · out: ? (usage None) with no ::error:: line. It's non-deterministic by reasoning load, not diff size (a 137-line diff failed while a 208-line diff passed).
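The fall-through described above can be sketched as a small stream consumer. This is not the actual `mantle_review.py` code: events are modeled as plain dicts, and `StreamResult` and `consume_stream` are hypothetical names. Only the event type strings and the `status`, `usage`, and `incomplete_details.reason` fields come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Terminal events that carry the final response object. Handling only
# "response.completed" is what let incomplete responses slip through.
TERMINAL_EVENTS = {"response.completed", "response.incomplete", "response.failed"}


@dataclass
class StreamResult:
    """Hypothetical container for what the executor needs after the stream ends."""
    text: str = ""
    status: Optional[str] = None
    usage: Optional[dict] = None
    incomplete_reason: Optional[str] = None


def consume_stream(events: Iterable[dict]) -> StreamResult:
    """Collect answer text and terminal status/usage from dict-shaped events."""
    result = StreamResult()
    chunks = []
    for event in events:
        etype = event.get("type")
        if etype == "response.output_text.delta":
            chunks.append(event.get("delta", ""))
        elif etype in TERMINAL_EVENTS:
            response = event.get("response") or {}
            result.status = response.get("status")
            result.usage = response.get("usage")
            details = response.get("incomplete_details") or {}
            result.incomplete_reason = details.get("reason")
    result.text = "".join(chunks)
    return result
```

With this shape, a stream that ends in `response.incomplete` with no deltas yields an empty `text` but a populated `status`, `usage`, and `incomplete_reason`, so the caller can tell truncation apart from an unexplained failure.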

Fix

- Raise max_output_tokens default 2048 → 8000 so medium-effort reasoning + the answer both fit.
- Capture usage/status from response.incomplete (and response.failed), not just response.completed.
- Retry once when the answer is empty because of max_output_tokens: bump the budget (≥16000) and drop reasoning_effort to low so the visible answer fits.
- Graceful diagnostic: if still empty, post a clear truncation message and exit 0 (sticky shows the reason) instead of a generic failure. Partial answers are kept and flagged.
- Correct the stale "max_output_tokens does NOT cap reasoning tokens" comments.
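The retry and graceful-diagnostic steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not the code merged in this PR: `Attempt`, `review_with_retry`, and the injected `call_model` callable are hypothetical, the doubling rule for the retry budget is an assumption (the description only says the budget is bumped to at least 16000), and the message strings are placeholders.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Attempt(NamedTuple):
    """Hypothetical summary of one model call."""
    text: str
    status: Optional[str]
    incomplete_reason: Optional[str]


RETRY_MIN_BUDGET = 16000  # floor for the retry budget, per the fix list


def hit_token_cap(attempt: Attempt) -> bool:
    return (attempt.status == "incomplete"
            and attempt.incomplete_reason == "max_output_tokens")


def review_with_retry(
    call_model: Callable[[int, str], Attempt],
    max_output_tokens: int = 8000,
    reasoning_effort: str = "medium",
) -> Tuple[str, int]:
    """Return (comment_body, exit_code), retrying at most once."""
    attempt = call_model(max_output_tokens, reasoning_effort)

    # Retry only when the answer is empty *because* of the token cap.
    if hit_token_cap(attempt) and not attempt.text.strip():
        # Assumption: double the budget, but never go below the 16000 floor.
        retry_budget = max(RETRY_MIN_BUDGET, max_output_tokens * 2)
        attempt = call_model(retry_budget, "low")

    if not attempt.text.strip():
        if hit_token_cap(attempt):
            # Still empty after the retry: explain the truncation, exit 0.
            return ("Review truncated: output-token budget exhausted.", 0)
        # Assumption: any other empty result keeps the failing exit code.
        return ("", 1)

    if hit_token_cap(attempt):
        # Keep partial answers, but flag them.
        return (attempt.text + "\n\n(partial review: output was truncated)", 0)
    return (attempt.text, 0)
```

Injecting `call_model` keeps the retry policy testable without network access; a real executor would wrap its streaming API call in a function with that signature.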

No consumer interface change. → release as v3.1.3.

Validation

- YAML parses; embedded mantle_review.py compiles.
- E2E on dotCMS/steve-quarterly-planning (linked after the tag is cut): reproduce an incomplete/empty review, confirm v3.1.3 produces a real review.

…budget

~43% of recent GPT-5.5 reviews in dotCMS/core were posting "❌ Codex Review failed — job failed before producing output."

Root cause: on the Responses API, max_output_tokens caps the COMBINED reasoning + visible-answer tokens, not just the answer. At reasoning_effort=medium, GPT-5.5 sometimes spends the whole 2048-token budget thinking and the stream ends status=incomplete (incomplete_details.reason=max_output_tokens) with ZERO output_text.delta. The executor captured usage only on response.completed and only logged errors on response.failed/error, so an incomplete response fell through to an empty review -> sys.exit(1) -> generic failure sticky. Non-deterministic by reasoning load, not diff size (a 137-line diff failed while a 208-line diff passed).

Fix:
- Raise max_output_tokens default 2048 -> 8000 so medium-effort reasoning plus the answer both fit.
- Capture usage/status from the response.incomplete (and response.failed) terminal events, not just response.completed.
- Retry once when the answer is empty *because* of max_output_tokens: bump the budget (>=16000) and drop reasoning_effort to low so the visible answer fits.
- If still empty, post a clear truncation diagnostic and exit 0 (sticky shows the reason) instead of a generic "review failed". Keep + flag partial answers.
- Correct the stale "max_output_tokens does NOT cap reasoning tokens" comments.

No consumer interface change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-13T14:29:39Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

## Summary

Bumps the `dotCMS/ai-workflows` pin in the AI review workflows from `@v3.1.2` to **`@v3.1.4`** (orchestrator + backend reviewer). Supersedes the intermediate v3.1.3 bump — v3.1.4 includes everything in v3.1.3 plus the outcome-signaling improvements, so we go straight to it.

## What's in v3.1.3 + v3.1.4

**v3.1.3 — silent-failure fix ([ai-workflows#38](dotCMS/ai-workflows#38))**

~43% of GPT-5.5 reviews were posting "❌ Codex Review failed — job failed before producing output." Root cause: `max_output_tokens` caps reasoning+answer combined, so medium-effort GPT-5.5 sometimes spent the whole budget reasoning and returned `incomplete` with no text. Fix: budget 2048→8000, handle `response.incomplete`, retry once with a bigger budget + lighter reasoning.

**v3.1.4 — clear outcome signaling ([ai-workflows#39](dotCMS/ai-workflows#39))**

- Sticky header reflects the outcome: `🤖 Codex Review` / `⚠️ truncated` / `❌ no output` / `⏱️ canceled`
- The job **fails (red ✗ in checks)** when no review is produced — surfaces the outcome without gating merges (advisory review)
- Canceled / timed-out runs rewrite the sticky to `⏱️ Codex Review canceled` instead of leaving it stuck on `🔄 in progress`

## Validation

- v3.1.3 before/after e2e: steve-quarterly-planning dotCMS#105 (@v3.1.2 failed, @v3.1.3 recovered)
- v3.1.4 signaling e2e: steve-quarterly-planning dotCMS#106 (🤖+green confirmed; ⏱️ cancellation confirmed; retry makes the ❌ path a hardened safety net)

Closes: dotCMS#36158

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

sfreudenthaler requested review from a team as code owners June 13, 2026 14:29

sfreudenthaler merged commit 8f58de2 into main Jun 13, 2026
3 checks passed

sfreudenthaler deleted the fix/codex-incomplete-token-budget branch June 13, 2026 14:32

This was referenced Jun 13, 2026

chore(ai-reviews): bump ai-workflows pin to v3.1.4 (silent-failure fix + clear outcome signaling) dotCMS/core#36158

Open

chore(ai-reviews): bump ai-workflows pin to v3.1.4 dotCMS/core#36159

Merged

sfreudenthaler merged 1 commit into main from fix/codex-incomplete-token-budget
