Skip to content

feat(codex-executor): clear human-facing outcome signaling#39

Merged
sfreudenthaler merged 1 commit into
mainfrom
fix/codex-outcome-signaling
Jun 13, 2026
Merged

feat(codex-executor): clear human-facing outcome signaling#39
sfreudenthaler merged 1 commit into
mainfrom
fix/codex-outcome-signaling

Conversation

@sfreudenthaler

Copy link
Copy Markdown
Member

Problem

Reviewers couldn't reliably tell a real review apart from a non-review. Three gaps in the sticky comment:

  1. Incomplete/empty and truncated results wore the success header (## 🤖 Codex Review) — a skimming reviewer reads the friendly header and misses that no (or only partial) review happened.
  2. Canceled / timed-out runs left the sticky stuck on 🔄 review in progress — the failure step is gated on failure(), which does not fire on cancelled().
  3. Comment-only — no top-level signal. Unlike the backend reviewer (formal PR review), the GPT path posts only a comment; nothing shows in the PR's checks list.

Changes

  • Outcome flag from mantle_review.py (ok | truncated | incomplete-empty | error), exported as a step output.
  • Outcome-driven sticky header — status is visible without opening the body:
    • ## 🤖 Codex Review (ok) · ## ⚠️ Codex Review — truncated · ## ❌ Codex Review — no output
  • Fail the job on incomplete-emptyafter posting the diagnostic comment, exit non-zero so a red ✗ appears in PR checks. The review is advisory/non-blocking, so this surfaces the outcome without gating merges.
  • if: cancelled() step rewrites the sticky to ## ⏱️ Codex Review canceled so it never stays stuck on in-progress.

Truncated (partial review present) stays a green job with a ⚠️ header — it's still a usable review. Only a genuine no-output fails the job.

No consumer interface change. → release as v3.1.4.

Validation

  • YAML parses; embedded mantle_review.py compiles.
  • E2E on dotCMS/steve-quarterly-planning (linked after tag): force incomplete-empty (tiny budget + high effort) → expect ❌ header + red ✗ job + diagnostic comment.

Reviewers couldn't always tell a review apart from a non-review. Three gaps:
- incomplete/empty and truncated results wore the success header (🤖 Codex Review)
- canceled/timed-out runs left the sticky stuck on "🔄 review in progress"
  (failure() does not fire on cancellation)
- the GPT path is comment-only — nothing surfaced in the PR's checks list

Changes:
- mantle_review.py emits an outcome flag (ok | truncated | incomplete-empty | error),
  exported as a step output.
- Sticky header now reflects the outcome: 🤖 Codex Review / ⚠️ truncated /
  ❌ no output — status is visible without opening the body.
- New "Fail job when no review was produced" step: after the diagnostic comment is
  posted, the job exits non-zero on incomplete-empty so a red ✗ shows in PR checks
  (advisory/non-blocking review, so it surfaces without gating merges).
- New "Report cancellation" step (if: cancelled()) rewrites the sticky to
  "⏱️ Codex Review canceled" so it never stays stuck on in-progress.

No consumer interface change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sfreudenthaler sfreudenthaler requested review from a team as code owners June 13, 2026 15:21
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@sfreudenthaler sfreudenthaler merged commit d41c857 into main Jun 13, 2026
3 checks passed
@sfreudenthaler sfreudenthaler deleted the fix/codex-outcome-signaling branch June 13, 2026 15:24
riccardoruocco pushed a commit to riccardoruocco/core that referenced this pull request Jun 16, 2026
## Summary

Bumps the `dotCMS/ai-workflows` pin in the AI review workflows from
`@v3.1.2` to **`@v3.1.4`** (orchestrator + backend reviewer). Supersedes
the intermediate v3.1.3 bump — v3.1.4 includes everything in v3.1.3 plus
the outcome-signaling improvements, so we go straight to it.

## What's in v3.1.3 + v3.1.4

**v3.1.3 — silent-failure fix
([ai-workflows#38](dotCMS/ai-workflows#38
~43% of GPT-5.5 reviews were posting "❌ Codex Review failed — job failed
before producing output." Root cause: `max_output_tokens` caps
reasoning+answer combined, so medium-effort GPT-5.5 sometimes spent the
whole budget reasoning and returned `incomplete` with no text. Fix:
budget 2048→8000, handle `response.incomplete`, retry once with a bigger
budget + lighter reasoning.

**v3.1.4 — clear outcome signaling
([ai-workflows#39](dotCMS/ai-workflows#39
- Sticky header reflects the outcome: `🤖 Codex Review` / `⚠️ truncated`
/ `❌ no output` / `⏱️ canceled`
- The job **fails (red ✗ in checks)** when no review is produced —
surfaces the outcome without gating merges (advisory review)
- Canceled / timed-out runs rewrite the sticky to `⏱️ Codex Review
canceled` instead of leaving it stuck on `🔄 in progress`

## Validation

- v3.1.3 before/after e2e: steve-quarterly-planning dotCMS#105 (@v3.1.2
failed, @v3.1.3 recovered)
- v3.1.4 signaling e2e: steve-quarterly-planning dotCMS#106 (🤖+green
confirmed; ⏱️ cancellation confirmed; retry makes the ❌ path a hardened
safety net)

Closes: dotCMS#36158

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant