diff --git a/.github/workflows/claude-orchestrator.yml b/.github/workflows/claude-orchestrator.yml
index 6b4240a..60ac906 100644
--- a/.github/workflows/claude-orchestrator.yml
+++ b/.github/workflows/claude-orchestrator.yml
@@ -219,9 +219,8 @@ jobs:
     with:
       model_id: ${{ inputs.model_id }}
       bedrock_role_arn: ${{ inputs.bedrock_role_arn }}
-      # aws_region defaults to us-east-1; the codex executor remaps that to us-east-2,
-      # where GPT-5.5/5.4 are served (the mantle endpoint exists in us-east-1 but the
-      # models do not yet). Consumers only set model_id.
+      # aws_region defaults to us-east-1, where GPT-5.5/5.4 are served (also us-east-2;
+      # GPT-5.4 also us-west-2). Consumers only set model_id.
       aws_region: ${{ inputs.aws_region }}
       prompt: ${{ inputs.prompt }}
       sticky_namespace: ${{ inputs.sticky_namespace }}
diff --git a/.github/workflows/codex-executor.yml b/.github/workflows/codex-executor.yml
index 985eead..aa712d1 100644
--- a/.github/workflows/codex-executor.yml
+++ b/.github/workflows/codex-executor.yml
@@ -2,10 +2,17 @@
 # Codex Executor Workflow (Reusable)
 #
 # PURPOSE: Reviews PRs using OpenAI GPT/Codex models (GPT-5.5, GPT-5.4) served by the
-# AWS **bedrock-mantle** endpoint — the OpenAI Responses API at
-# https://bedrock-mantle.{region}.api.aws/v1. These models are NOT on bedrock-runtime:
-# there is no InvokeModel/Converse, so the generic Bedrock executor cannot reach them.
-# Maintains the same auto-updating sticky comment as the other executors.
+# AWS **bedrock-mantle** endpoint via the OpenAI Responses API. These models are NOT on
+# bedrock-runtime: there is no InvokeModel/Converse, so the generic Bedrock executor
+# cannot reach them. Maintains the same auto-updating sticky comment as the other executors.
+#
+# BASE PATH (critical): bedrock-mantle serves the two OpenAI families on DIFFERENT
+# OpenAI-compatible base paths on the same host. The frontier GPT-5.x / Codex models are
+# served under **/openai/v1** (per the AWS launch docs); the open-weight gpt-oss-* models
+# are served under **/v1**. They are mutually exclusive — verified live 2026-06-11:
+# gpt-5.5/5.4 reject /v1 ("model does not support the '/v1/responses' API") and gpt-oss-120b
+# rejects /openai/v1. So we pick the path from the model id (see mantle_review.py). Sending
+# every model to /v1 (the pre-fix behavior) is why GPT-5.5/5.4 appeared "unavailable" (#34).
 #
 # AUTH: OIDC -> assumed role -> a SHORT-TERM Bedrock bearer token minted from that session
 # (aws-bedrock-token-generator `provide_token()`), passed to the OpenAI SDK. The OpenAI SDK
@@ -20,11 +27,9 @@
 # the whole response and looks like a 60-100s hang. We stream the Responses API and accumulate
 # response.output_text.delta events. max_output_tokens does NOT cap reasoning tokens.
 #
-# REGION: the bedrock-mantle ENDPOINT exists in many regions including us-east-1, BUT the
-# GPT-5.5/5.4 MODELS are currently served only in us-east-2 — verified live via the Models
-# API (us-east-1 lists gpt-oss but no gpt-5*; us-east-2 lists openai.gpt-5.5 / openai.gpt-5.4).
-# So this executor remaps the us-east-1 default to us-east-2 where the models live. GPT-5.4 is
-# also offered in us-west-2. (Re-check the Models API if AWS expands GPT-5.x to us-east-1.)
+# REGION: the requested aws_region is used as-is. GPT-5.5/5.4 are served in us-east-1 and
+# us-east-2 (GPT-5.4 also us-west-2); gpt-oss is in all of them — so the orchestrator's
+# us-east-1 default works for every model. Verified live 2026-06-11.
 #
 # DATA RETENTION: the Responses API defaults store=true, which retains input+output for 30
 # days in-region for previous_response_id chaining. Code review is single-shot, so we send
@@ -51,7 +56,7 @@ on:
         required: true
         type: string
       aws_region:
-        description: 'AWS region for the mantle endpoint. The us-east-1 default is remapped to us-east-2, where GPT-5.5/5.4 are served (the endpoint exists in us-east-1 but the models do not yet).'
+        description: 'AWS region for the mantle endpoint, used as-is. GPT-5.5/5.4 are served in us-east-1 and us-east-2 (GPT-5.4 also us-west-2); gpt-oss in all.'
         required: false
         type: string
         default: 'us-east-1'
@@ -83,6 +88,11 @@ on:
         required: false
         type: string
         default: 'medium'
+      mantle_api_path:
+        description: 'Override the mantle OpenAI-compat base path. Empty (default) auto-selects: /openai/v1 for GPT-5.x/Codex, /v1 for gpt-oss-*. Set only if AWS changes the routing.'
+        required: false
+        type: string
+        default: ''
       timeout_minutes:
         description: 'Job timeout in minutes. Reasoning models stream slowly; default is generous.'
         required: false
@@ -110,23 +120,6 @@ jobs:
       # Default uses the model id so different models naturally get different stickies.
       STICKY_MARKER: ${{ format('', inputs.sticky_namespace != '' && inputs.sticky_namespace || inputs.model_id) }}
     steps:
-      - name: Resolve mantle region
-        id: region
-        env:
-          REQUESTED_REGION: ${{ inputs.aws_region }}
-        run: |
-          set -euo pipefail
-          # The mantle endpoint exists in us-east-1, but GPT-5.5/5.4 are served only in
-          # us-east-2 (us-east-1 lists gpt-oss but no gpt-5*). Treat the us-east-1 default as
-          # "send to where the models live" so consumers only set model_id. An explicit
-          # us-west-2 (valid for GPT-5.4) is honored as-is.
-          REGION="${REQUESTED_REGION}"
-          if [ -z "${REGION}" ] || [ "${REGION}" = "us-east-1" ]; then
-            REGION="us-east-2"
-          fi
-          echo "Effective mantle region: ${REGION}"
-          echo "region=${REGION}" >> "$GITHUB_OUTPUT"
-
       - uses: actions/checkout@v4
         with:
           fetch-depth: 1
@@ -135,7 +128,7 @@ jobs:
         uses: aws-actions/configure-aws-credentials@v4
         with:
           role-to-assume: ${{ inputs.bedrock_role_arn }}
-          aws-region: ${{ steps.region.outputs.region }}
+          aws-region: ${{ inputs.aws_region }}
 
       - name: Set up uv
         uses: astral-sh/setup-uv@v6
@@ -220,7 +213,14 @@ jobs:
           # mantle region by configure-aws-credentials, so the token is signed for that region.
           token = provide_token()
 
-          client = OpenAI(base_url=f"https://bedrock-mantle.{region}.api.aws/v1", api_key=token)
+          # bedrock-mantle serves the two OpenAI families on different OpenAI-compatible base
+          # paths on the same host: frontier GPT-5.x / Codex live under /openai/v1 (per the AWS
+          # launch docs), open-weight gpt-oss-* under /v1. They reject each other's path, so
+          # pick by model id. Verified live 2026-06-11 (#34). MANTLE_API_PATH overrides if AWS
+          # ever unifies them. (Path includes the OpenAI-compat segment; host is mantle.)
+          api_path = os.environ.get("MANTLE_API_PATH") or ("/v1" if "gpt-oss" in model else "/openai/v1")
+          client = OpenAI(base_url=f"https://bedrock-mantle.{region}.api.aws{api_path}", api_key=token)
+          print(f"mantle base path: {api_path} (model: {model})", file=sys.stderr)
 
           text_parts, usage = [], None
           try:
@@ -348,7 +348,8 @@ jobs:
       - name: Invoke bedrock-mantle (OpenAI Responses API, streaming)
         id: invoke
         env:
-          MANTLE_REGION: ${{ steps.region.outputs.region }}
+          MANTLE_REGION: ${{ inputs.aws_region }}
+          MANTLE_API_PATH: ${{ inputs.mantle_api_path }}
         run: |
           set -euo pipefail
           # Dependencies (openai SDK + aws-bedrock-token-generator) are declared inline in the
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index e90e706..4d5172d 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -151,11 +151,12 @@ Uses the Bedrock Converse API, which is model-family-agnostic. Maintains its own
 
 #### 4. `codex-executor.yml` (OpenAI GPT/Codex via bedrock-mantle)
 
-For `openai.*` models (GPT-5.5, GPT-5.4), which are **not** on bedrock-runtime — there is no `InvokeModel`/`Converse`. They are served only by the separate **bedrock-mantle** endpoint exposing the OpenAI Responses API (`https://bedrock-mantle.{region}.api.aws/v1/responses`). The executor:
+For `openai.*` models (GPT-5.5, GPT-5.4), which are **not** on bedrock-runtime — there is no `InvokeModel`/`Converse`. They are served only by the separate **bedrock-mantle** endpoint exposing the OpenAI Responses API. The frontier GPT-5.x/Codex models live under `https://bedrock-mantle.{region}.api.aws/openai/v1/responses`; the open-weight `gpt-oss-*` models live under `…/v1/responses`. The executor:
 
 - Calls mantle with the **OpenAI SDK**, authenticated by a **short-term Bedrock bearer token** minted in-process from the assumed-role session via `aws-bedrock-token-generator` (`provide_token()`). The SDK can't consume SigV4 directly, but a short-term key keeps the OIDC-only posture: it's derived from the current STS credentials (no long-lived secret), inherits the role's permissions, expires with the role session (≤1h here, ≤12h cap), is **not a stored resource** (nothing to delete), and is never written to env/disk/logs. No marketplace subscription; no long-term API key.
 - **Streams** Server-Sent Events and accumulates `response.output_text.delta` chunks. Streaming is mandatory: GPT-5.x reasons before emitting, so a non-streaming call buffers and looks like a 60–100s hang.
-- Remaps the orchestrator's `us-east-1` default to **us-east-2**, where GPT-5.5/5.4 are served. The mantle *endpoint* exists in us-east-1, but the *models* are not there yet (verified via the Models API: us-east-1 lists gpt-oss but no gpt-5*). GPT-5.4 also accepts an explicit us-west-2.
+- **Selects the OpenAI-compat base path by model id**: `/openai/v1` for frontier GPT-5.x/Codex, `/v1` for `gpt-oss-*`. The two families reject each other's path (`400 validation_error: "does not support the '…/responses' API"`) — verified live 2026-06-11 (#34). Sending all models to `/v1` (the original behavior) is why GPT-5.5/5.4 looked unavailable. A `mantle_api_path` input overrides the auto-selection if AWS unifies the routing.
+- Uses the requested region as-is. GPT-5.5/5.4 are served in us-east-1 and us-east-2 (GPT-5.4 also us-west-2) and gpt-oss in all, so the orchestrator's us-east-1 default works for every model (verified live 2026-06-11 — at v3.1.0's authoring GPT-5.x were us-east-2-only, which is why an earlier revision remapped the region; that remap has been removed).
 - Sends `store: false` on each request for **zero data retention** — the Responses API otherwise defaults `store: true`, retaining input+output for 30 days in-region for `previous_response_id` chaining, which single-shot review doesn't need.
 - Reuses the same `/tmp` sticky-comment helper and `sticky_namespace` input as the generic executor. Exposes `reasoning_effort` (default `medium`). Note `max_output_tokens` caps only the visible answer, **not** reasoning tokens.
 
diff --git a/CLAUDE.md b/CLAUDE.md
index 177c758..a5f08cc 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -15,7 +15,7 @@ The repository implements a reusable workflow architecture with model-aware rout
 - **Claude Orchestrator** (`.github/workflows/claude-orchestrator.yml`): Lightweight wrapper that handles @claude mention detection AND routes to the appropriate executor based on `model_id`. Consumer repositories call this with `trigger_mode: interactive` or `trigger_mode: automatic`. Exactly one executor runs per call.
 - **Claude Executor** (`.github/workflows/claude-executor.yml`): Execution engine for Anthropic models — runs `anthropics/claude-code-action@v1` either against the direct Anthropic API (`provider: anthropic-api`, default) or via AWS Bedrock (`provider: anthropic-bedrock`, OIDC + `use_bedrock=true`).
 - **Bedrock Generic Executor** (`.github/workflows/bedrock-generic-executor.yml`): Execution engine for **any non-Anthropic Bedrock model** (Amazon Nova, Meta Llama, Mistral, Cohere, AI21). Uses the Bedrock Converse API and maintains its own sticky comment via an inlined helper (set up to `/tmp` at job start, so no cross-repo path dependency).
-- **Codex Executor** (`.github/workflows/codex-executor.yml`): Execution engine for **OpenAI GPT/Codex models** (`openai.gpt-5.5`, `openai.gpt-5.4`). These are served only by the separate **bedrock-mantle** endpoint (OpenAI Responses API), not bedrock-runtime — so it calls mantle with the **OpenAI SDK** authenticated by a **short-term Bedrock bearer token** minted in-process from the OIDC-assumed-role session (`aws-bedrock-token-generator`), and streams `response.output_text.delta` events. The token is OIDC-derived (no long-lived secret, nothing to clean up, ≤1h via the role session) and never written to env/disk/logs; IAM grants `bedrock-mantle:CallWithBearerToken` scoped to `BearerTokenType=SHORT_TERM`. Streaming is mandatory (GPT-5.x reasons before emitting). Remaps the `us-east-1` default to `us-east-2`, where GPT-5.5/5.4 are served (the mantle endpoint exists in us-east-1 but the models are not there yet — verified via the Models API). Sends `store: false` for zero data retention. Reuses the same `/tmp` sticky-comment helper. See dotCMS/Infrastructure-as-code#7836.
+- **Codex Executor** (`.github/workflows/codex-executor.yml`): Execution engine for **OpenAI GPT/Codex models** (`openai.gpt-5.5`, `openai.gpt-5.4`). These are served only by the separate **bedrock-mantle** endpoint (OpenAI Responses API), not bedrock-runtime — so it calls mantle with the **OpenAI SDK** authenticated by a **short-term Bedrock bearer token** minted in-process from the OIDC-assumed-role session (`aws-bedrock-token-generator`), and streams `response.output_text.delta` events. The token is OIDC-derived (no long-lived secret, nothing to clean up, ≤1h via the role session) and never written to env/disk/logs; IAM grants `bedrock-mantle:CallWithBearerToken` scoped to `BearerTokenType=SHORT_TERM`. Streaming is mandatory (GPT-5.x reasons before emitting). **Base path is model-dependent:** frontier GPT-5.x/Codex are served under `/openai/v1`, open-weight `gpt-oss-*` under `/v1` — the executor picks by model id (they reject each other's path; verified live 2026-06-11, #34). Uses the requested region as-is (GPT-5.5/5.4 are served in us-east-1 and us-east-2, GPT-5.4 also us-west-2). Sends `store: false` for zero data retention. Reuses the same `/tmp` sticky-comment helper. See dotCMS/Infrastructure-as-code#7836.
 - **Deployment Guard** (`.github/workflows/deployment-guard.yml`): Reusable workflow for validating deployment changes with configurable rules. Features organization-based bypass for trusted members, file allowlist validation, image-only change detection, and comprehensive image validation (format, repository, version pattern, registry existence, anti-downgrade logic).
 
 ### Multi-model Routing (v3)
@@ -27,7 +27,7 @@ The orchestrator picks the executor by inspecting `model_id`:
 | _(empty / unset)_ | `claude-executor` (`anthropic-api`)| Backward-compat default; requires `ANTHROPIC_API_KEY` secret |
 | `*.anthropic.*` (e.g. `global.anthropic.claude-sonnet-4-6`) | `claude-executor` (`anthropic-bedrock`) | Requires `bedrock_role_arn` input |
 | `anthropic.*` (bare) | `claude-executor` (`anthropic-bedrock`) | Requires `bedrock_role_arn` input |
-| `openai.*` (e.g. `openai.gpt-5.5`, `openai.gpt-5.4`) | `codex-executor` | Requires `bedrock_role_arn`; mantle path (us-east-2) |
+| `openai.*` (e.g. `openai.gpt-5.5`, `openai.gpt-5.4`) | `codex-executor` | Requires `bedrock_role_arn`; mantle `/openai/v1` (gpt-oss → `/v1`) |
 | Anything else (Nova, Llama, Mistral, …) | `bedrock-generic-executor` | Requires `bedrock_role_arn` input |
 
 The matches for the Anthropic and OpenAI families are anchored: `^([a-z]+\.)?anthropic\.` and `^([a-z]+\.)?openai\.` — so a model ID that merely contains the substring `anthropic.`/`openai.` (e.g. `us.not-anthropic.foo`) is **not** misrouted. `openai.*` is checked before the generic fallback.
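
Reviewer sketch (not part of the change): the anchored routing and the model-dependent base path described above can be exercised together in a few lines of Python. The two regexes and the path expression are copied from this diff. The function names `pick_executor` and `mantle_api_path`, the returned label strings, and the model id `openai.gpt-oss-120b` are hypothetical and used only for illustration. The orchestrator itself may implement the match differently, for example in workflow expressions or shell.

```python
import re

# Anchored family patterns, as quoted in the CLAUDE.md routing paragraph.
ANTHROPIC_RE = re.compile(r"^([a-z]+\.)?anthropic\.")
OPENAI_RE = re.compile(r"^([a-z]+\.)?openai\.")


def pick_executor(model_id: str) -> str:
    """Hypothetical helper: map a model_id to the executor named in the routing table."""
    if not model_id:
        # Empty / unset keeps the backward-compatible direct Anthropic API path.
        return "claude-executor (anthropic-api)"
    if ANTHROPIC_RE.match(model_id):
        return "claude-executor (anthropic-bedrock)"
    if OPENAI_RE.match(model_id):
        # openai.* is checked before the generic fallback.
        return "codex-executor"
    return "bedrock-generic-executor"


def mantle_api_path(model: str, override: str = "") -> str:
    """Mirror of the diff's path selection: an override wins, else pick by model id."""
    return override or ("/v1" if "gpt-oss" in model else "/openai/v1")


if __name__ == "__main__":
    for model_id in ("", "global.anthropic.claude-sonnet-4-6", "openai.gpt-5.5",
                     "openai.gpt-oss-120b", "us.not-anthropic.foo"):
        print(repr(model_id), "->", pick_executor(model_id), mantle_api_path(model_id))
```

Because the patterns are anchored at the start and allow at most one lowercase prefix segment, `us.not-anthropic.foo` falls through to the generic executor instead of matching on a substring. The path helper only matters for ids that route to `codex-executor`.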