feat(codex-executor): clear human-facing outcome signaling by sfreudenthaler · Pull Request #39 · dotCMS/ai-workflows

sfreudenthaler · 2026-06-13T15:21:46Z

Problem

Reviewers couldn't reliably tell a real review apart from a non-review. Three gaps in the sticky comment:

Incomplete/empty and truncated results wore the success header (## 🤖 Codex Review) — a skimming reviewer reads the friendly header and misses that no (or only partial) review happened.
Canceled / timed-out runs left the sticky stuck on 🔄 review in progress — the failure step is gated on failure(), which does not fire on cancelled().
Comment-only — no top-level signal. Unlike the backend reviewer (formal PR review), the GPT path posts only a comment; nothing shows in the PR's checks list.

Changes

Outcome flag from mantle_review.py (ok | truncated | incomplete-empty | error), exported as a step output.
Outcome-driven sticky header — status is visible without opening the body:
- ## 🤖 Codex Review (ok) · ## ⚠️ Codex Review — truncated · ## ❌ Codex Review — no output
Fail the job on incomplete-empty — after posting the diagnostic comment, exit non-zero so a red ✗ appears in PR checks. The review is advisory/non-blocking, so this surfaces the outcome without gating merges.
if: cancelled() step rewrites the sticky to ## ⏱️ Codex Review canceled so it never stays stuck on in-progress.

Truncated (partial review present) stays a green job with a ⚠️ header — it's still a usable review. Only a genuine no-output fails the job.

No consumer interface change. → release as v3.1.4.

Validation

YAML parses; embedded mantle_review.py compiles.
E2E on dotCMS/steve-quarterly-planning (linked after tag): force incomplete-empty (tiny budget + high effort) → expect ❌ header + red ✗ job + diagnostic comment.

Reviewers couldn't always tell a review apart from a non-review. Three gaps: - incomplete/empty and truncated results wore the success header (🤖 Codex Review) - canceled/timed-out runs left the sticky stuck on "🔄 review in progress" (failure() does not fire on cancellation) - the GPT path is comment-only — nothing surfaced in the PR's checks list Changes: - mantle_review.py emits an outcome flag (ok | truncated | incomplete-empty | error), exported as a step output. - Sticky header now reflects the outcome: 🤖 Codex Review / ⚠️ truncated / ❌ no output — status is visible without opening the body. - New "Fail job when no review was produced" step: after the diagnostic comment is posted, the job exits non-zero on incomplete-empty so a red ✗ shows in PR checks (advisory/non-blocking review, so it surfaces without gating merges). - New "Report cancellation" step (if: cancelled()) rewrites the sticky to "⏱️ Codex Review canceled" so it never stays stuck on in-progress. No consumer interface change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chatgpt-codex-connector · 2026-06-13T15:21:50Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

## Summary Bumps the `dotCMS/ai-workflows` pin in the AI review workflows from `@v3.1.2` to **`@v3.1.4`** (orchestrator + backend reviewer). Supersedes the intermediate v3.1.3 bump — v3.1.4 includes everything in v3.1.3 plus the outcome-signaling improvements, so we go straight to it. ## What's in v3.1.3 + v3.1.4 **v3.1.3 — silent-failure fix ([ai-workflows#38](dotCMS/ai-workflows#38 ~43% of GPT-5.5 reviews were posting "❌ Codex Review failed — job failed before producing output." Root cause: `max_output_tokens` caps reasoning+answer combined, so medium-effort GPT-5.5 sometimes spent the whole budget reasoning and returned `incomplete` with no text. Fix: budget 2048→8000, handle `response.incomplete`, retry once with a bigger budget + lighter reasoning. **v3.1.4 — clear outcome signaling ([ai-workflows#39](dotCMS/ai-workflows#39 - Sticky header reflects the outcome: `🤖 Codex Review` / `⚠️ truncated` / `❌ no output` / `⏱️ canceled` - The job **fails (red ✗ in checks)** when no review is produced — surfaces the outcome without gating merges (advisory review) - Canceled / timed-out runs rewrite the sticky to `⏱️ Codex Review canceled` instead of leaving it stuck on `🔄 in progress` ## Validation - v3.1.3 before/after e2e: steve-quarterly-planning dotCMS#105 (@v3.1.2 failed, @v3.1.3 recovered) - v3.1.4 signaling e2e: steve-quarterly-planning dotCMS#106 (🤖+green confirmed; ⏱️ cancellation confirmed; retry makes the ❌ path a hardened safety net) Closes: dotCMS#36158 --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

sfreudenthaler requested review from a team as code owners June 13, 2026 15:21

sfreudenthaler merged commit d41c857 into main Jun 13, 2026
3 checks passed

sfreudenthaler deleted the fix/codex-outcome-signaling branch June 13, 2026 15:24

This was referenced Jun 13, 2026

chore(ai-reviews): bump ai-workflows pin to v3.1.4 dotCMS/core#36159

Merged

Enhancement: codex-executor auto-fallback to a secondary model when the primary is down #41

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(codex-executor): clear human-facing outcome signaling#39

feat(codex-executor): clear human-facing outcome signaling#39
sfreudenthaler merged 1 commit into
mainfrom
fix/codex-outcome-signaling

sfreudenthaler commented Jun 13, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sfreudenthaler commented Jun 13, 2026

Problem

Changes

Validation

Uh oh!

chatgpt-codex-connector Bot commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant