feat(codex-executor): surface response.failed error reason + retry once #40
Merged
When the mantle Responses API returns a response.failed terminal event, the executor discarded response.error and posted a generic "failed before producing output" sticky, giving no clue whether it was the PR, our code, or AWS. (Seen live: gpt-5.5 on mantle 500ing on every request with server_error in both regions, while gpt-5.4 / gpt-oss-120b were healthy: an AWS-side model outage.)

Changes:
- Capture response.error (code + message) from the failed event; print it as a ::error job annotation AND show it in the ❌ sticky, with a note that this is usually an AWS/mantle service-side issue, not the PR.
- Retry once on a transient response.failed (same params). Won't rescue a full model outage, but recovers one-off 5xx blips.
- New outcome "failed" → ❌ header + reason; job fails (red ✗). The fail-job step now fires for any non-(ok|truncated) outcome, not just incomplete-empty.
- Exit 0 whenever there's a terminal response to diagnose; exit 1 only when no terminal event arrived at all (resp is None → generic failure path).

No consumer interface change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
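The capture and retry-once behaviour described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names (`last_terminal_response`, `run_with_one_retry`, `describe_error`) and the plain-dict event shape are assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

# Assumed event shape (hypothetical): plain dicts such as
# {"type": "response.failed",
#  "response": {"error": {"code": "server_error", "message": "..."}}}
Event = dict

TERMINAL_TYPES = ("response.completed", "response.incomplete", "response.failed")


def last_terminal_response(events: Iterable[Event]) -> Optional[dict]:
    """Return {"status", "error"} for the last terminal event, or None if none arrived."""
    terminal = None
    for event in events:
        kind = event.get("type", "")
        if kind in TERMINAL_TYPES:
            response = event.get("response") or {}
            terminal = {
                "status": kind.split(".", 1)[1],  # "completed" / "incomplete" / "failed"
                "error": response.get("error"),   # kept instead of being discarded
            }
    return terminal


def run_with_one_retry(call: Callable[[], Iterable[Event]]) -> Optional[dict]:
    """Run `call`; on response.failed, run it exactly once more with the same params."""
    resp = last_terminal_response(call())
    if resp is not None and resp["status"] == "failed":
        retry = last_terminal_response(call())
        # Keep the first failure if the retry produced no terminal event at all,
        # so there is still an error reason to report.
        if retry is not None:
            resp = retry
    return resp


def describe_error(resp: dict) -> str:
    """Format 'code: message' from a failed response, tolerating missing fields."""
    error = resp.get("error") or {}
    code = error.get("code") or "unknown_error"
    message = error.get("message") or "no message provided"
    return f"{code}: {message}"
```

A single retry recovers a one-off 5xx blip. During a full model outage both attempts fail and the second failure's reason is what gets reported.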
riccardoruocco pushed a commit to riccardoruocco/core that referenced this pull request Jun 16, 2026
… DeepSeek V3.2 (dotCMS#36182)

Two changes to get automatic PR reviews working again and keep them legible:

## 1. Switch automatic reviewer GPT-5.5 → DeepSeek V3.2

`openai.gpt-5.5` is failing on Bedrock Mantle in **both** us-east-1 and us-east-2 as of 2026-06-15 (`invalid_prompt: 404 Not Found: Engine not found`, AWS-side; the alias *and* its `-2026-04-23` snapshot both 404 while still listed in `/v1/models`). Raised with our AWS TAM.

`deepseek.v3.2` is healthy on **bedrock-runtime** (Converse) and is still a non-Claude reviewer, so it preserves the model-diversity rationale (Claude writes the code, a different family reviews it). `deepseek.*` routes to the existing `bedrock-generic` (Converse) executor: **no IAM or ai-workflows change** needed (`BedrockInvokeReviewModels` already allows `bedrock:Converse` on `foundation-model/deepseek.*`).

- Renamed `gpt-automatic-review` → `ai-automatic-review` (model-agnostic); dropped `reasoning_effort` (mantle-only).
- Interactive `@claude` and the backend reviewer stay on Anthropic Claude.
- Validated e2e on steve-quarterly-planning dotCMS#108 (routed to bedrock-generic, posted a real review catching all planted bugs).

## 2. Bump pin v3.1.4 → v3.1.5

[ai-workflows#40](dotCMS/ai-workflows#40): surfaces the actual failure reason on `response.failed` (e.g. `server_error` / `Engine not found`) in the sticky + job log instead of a generic "failed before producing output", and retries once on transient failures. This is exactly what made the gpt-5.5 outage diagnosable.

Switch back to a mantle model by setting `model_id` to an `openai.*` id once AWS restores gpt-5.5.

Closes: dotCMS#36181

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
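The prefix-based routing mentioned in the commit message (`deepseek.*` to `bedrock-generic`, `openai.*` to mantle) might look something like the sketch below. The `ROUTES` table, the `pick_executor` helper, and the `"mantle"` label are hypothetical names chosen for illustration; the actual routing lives in ai-workflows and may be structured differently.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: glob pattern on model_id -> executor label.
# Only the deepseek.* -> bedrock-generic pairing is stated in the commit message;
# the "mantle" label for openai.* ids is an assumed name.
ROUTES = [
    ("openai.*", "mantle"),
    ("deepseek.*", "bedrock-generic"),
]


def pick_executor(model_id: str) -> str:
    """Return the executor label for the first pattern that matches model_id."""
    for pattern, executor in ROUTES:
        if fnmatchcase(model_id, pattern):
            return executor
    raise ValueError(f"no executor configured for model id {model_id!r}")
```

Under this assumption, switching back to gpt-5.5 is a one-value change: setting `model_id` to an `openai.*` id selects the mantle path again.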
Context
While investigating a reproducible "❌ Codex Review failed — job failed before producing output" on dotCMS/core PR #35987, a direct probe showed the real cause:
`openai.gpt-5.5` on bedrock-mantle is currently 500-ing on every request (even a trivial "hello world"), in both us-east-1 and us-east-2, with `response.error = server_error`. `gpt-5.4` and `gpt-oss-120b` are healthy in the same account/region/auth, so it's an AWS-side outage of the gpt-5.5 model, not the PR, the diff, or our executor.

The executor did surface it as a red ✗ + failure sticky (thanks to v3.1.4), but the sticky said nothing about why: it discarded the `response.error`. That's the gap this fixes.

Changes
- Capture `response.error` (code + message) from the `response.failed` event: printed as a `::error` job annotation and shown in the ❌ sticky, with a note that a `server_error` is typically an AWS/mantle-side issue, not the PR.
- Retry once on a transient `response.failed` (same params). Won't rescue a full model outage (every call fails then), but recovers one-off 5xx blips.
- New `failed` outcome → `## ❌ Codex Review — model service error` header + reason; job fails (red ✗). The fail-job step now fires for any non-`(ok|truncated)` outcome.
- Exit 0 whenever there's a terminal response to diagnose; exit 1 only when no terminal event arrived at all (`resp is None`).

No consumer interface change. → release as v3.1.5.
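The reporting side of these changes (job annotation, sticky text, fail-job condition, exit code) could be approximated as below. It is a sketch under stated assumptions: the function names and the `Codex Review` annotation title are hypothetical, while the header string, the outcome names, and the exit-code rule are taken from the description above.

```python
from typing import Optional

# Outcomes that leave the job green, per the description; anything else fails it.
PASSING_OUTCOMES = {"ok", "truncated"}


def should_fail_job(outcome: str) -> bool:
    """The fail-job step fires for any outcome other than ok or truncated."""
    return outcome not in PASSING_OUTCOMES


def exit_code(resp: Optional[dict]) -> int:
    """Exit 0 when a terminal response exists to diagnose; 1 when none arrived."""
    return 1 if resp is None else 0


def error_annotation(code: str, message: str) -> str:
    """Build a single-line GitHub Actions ::error workflow command."""
    text = f"{code}: {message}"
    # Workflow command data must escape %, CR and LF to stay on one line.
    escaped = text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    return f"::error title=Codex Review::{escaped}"  # title is an assumed value


def failure_sticky(code: str, message: str) -> str:
    """Render the failure sticky: header, reason, and a hint for server_error."""
    lines = [
        "## ❌ Codex Review — model service error",
        "",
        f"`{code}`: {message}",
    ]
    if code == "server_error":
        lines += ["", "This is typically an AWS/mantle-side issue, not the PR."]
    return "\n".join(lines)
```

Keeping the exit code separate from the fail-job decision means the script can finish cleanly after posting the sticky, while a later workflow step turns the job red based on the outcome.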
Out of scope (follow-up)
Auto-fallback to a secondary model when the primary is down — tracked separately.
Validation
- `mantle_review.py` compiles.
- Expect the sticky to show `server_error: ...` and the job to go red.