feat(codex-executor): clear human-facing outcome signaling#39
Merged
Conversation
Reviewers couldn't always tell a review apart from a non-review. Three gaps: - incomplete/empty and truncated results wore the success header (🤖 Codex Review) - canceled/timed-out runs left the sticky stuck on "🔄 review in progress" (failure() does not fire on cancellation) - the GPT path is comment-only — nothing surfaced in the PR's checks list Changes: - mantle_review.py emits an outcome flag (ok | truncated | incomplete-empty | error), exported as a step output. - Sticky header now reflects the outcome: 🤖 Codex Review /⚠️ truncated / ❌ no output — status is visible without opening the body. - New "Fail job when no review was produced" step: after the diagnostic comment is posted, the job exits non-zero on incomplete-empty so a red ✗ shows in PR checks (advisory/non-blocking review, so it surfaces without gating merges). - New "Report cancellation" step (if: cancelled()) rewrites the sticky to "⏱️ Codex Review canceled" so it never stays stuck on in-progress. No consumer interface change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
This was referenced Jun 13, 2026
riccardoruocco
pushed a commit
to riccardoruocco/core
that referenced
this pull request
Jun 16, 2026
## Summary Bumps the `dotCMS/ai-workflows` pin in the AI review workflows from `@v3.1.2` to **`@v3.1.4`** (orchestrator + backend reviewer). Supersedes the intermediate v3.1.3 bump — v3.1.4 includes everything in v3.1.3 plus the outcome-signaling improvements, so we go straight to it. ## What's in v3.1.3 + v3.1.4 **v3.1.3 — silent-failure fix ([ai-workflows#38](dotCMS/ai-workflows#38 ~43% of GPT-5.5 reviews were posting "❌ Codex Review failed — job failed before producing output." Root cause: `max_output_tokens` caps reasoning+answer combined, so medium-effort GPT-5.5 sometimes spent the whole budget reasoning and returned `incomplete` with no text. Fix: budget 2048→8000, handle `response.incomplete`, retry once with a bigger budget + lighter reasoning. **v3.1.4 — clear outcome signaling ([ai-workflows#39](dotCMS/ai-workflows#39 - Sticky header reflects the outcome: `🤖 Codex Review` / `⚠️ truncated` / `❌ no output` / `⏱️ canceled` - The job **fails (red ✗ in checks)** when no review is produced — surfaces the outcome without gating merges (advisory review) - Canceled / timed-out runs rewrite the sticky to `⏱️ Codex Review canceled` instead of leaving it stuck on `🔄 in progress` ## Validation - v3.1.3 before/after e2e: steve-quarterly-planning dotCMS#105 (@v3.1.2 failed, @v3.1.3 recovered) - v3.1.4 signaling e2e: steve-quarterly-planning dotCMS#106 (🤖+green confirmed; ⏱️ cancellation confirmed; retry makes the ❌ path a hardened safety net) Closes: dotCMS#36158 --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Reviewers couldn't reliably tell a real review apart from a non-review. Three gaps in the sticky comment:
## 🤖 Codex Review) — a skimming reviewer reads the friendly header and misses that no (or only partial) review happened.🔄 review in progress— the failure step is gated onfailure(), which does not fire oncancelled().Changes
mantle_review.py(ok | truncated | incomplete-empty | error), exported as a step output.## 🤖 Codex Review(ok) ·## ⚠️ Codex Review — truncated·## ❌ Codex Review — no outputincomplete-empty— after posting the diagnostic comment, exit non-zero so a red ✗ appears in PR checks. The review is advisory/non-blocking, so this surfaces the outcome without gating merges.if: cancelled()step rewrites the sticky to## ⏱️ Codex Review canceledso it never stays stuck on in-progress.Truncated (partial review present) stays a green job with a⚠️ header — it's still a usable review. Only a genuine no-output fails the job.
No consumer interface change. → release as v3.1.4.
Validation
mantle_review.pycompiles.dotCMS/steve-quarterly-planning(linked after tag): forceincomplete-empty(tiny budget + high effort) → expect ❌ header + red ✗ job + diagnostic comment.