fix: gate prompt-cache features by capability, not exact model name #7985
sumleo wants to merge 1 commit into
prompt_cache_key was gated by an exact-name set (_CACHE_KEY_MODELS =
{'gpt-5.4', 'gpt-5.4-mini'}) and prompt_cache_retention='24h' was gated
by model == 'gpt-5.1'. Both are brittle: they silently stop applying the
moment a model is renamed or a new family member ships, and they are
already inconsistent with OpenAI's support matrix (e.g. gpt-5.4 is
24h-retention-eligible but never received retention).
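To make the inconsistency concrete, here is a minimal sketch of the old gates, reconstructed from the description above rather than copied from clients.py. The function names and the model name 'gpt-5.5' (standing in for a renamed or newly shipped gpt-5 family member) are hypothetical.

```python
# Hypothetical reconstruction of the old exact-name gates (not the real clients.py code).
_CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'}


def old_gets_cache_key(model: str) -> bool:
    # prompt_cache_key routing: membership in a fixed set of names.
    return model in _CACHE_KEY_MODELS


def old_gets_retention(model: str) -> bool:
    # prompt_cache_retention='24h': a single exact-name comparison.
    return model == 'gpt-5.1'


if __name__ == '__main__':
    # 'gpt-5.5' is a hypothetical renamed/new gpt-5 family member.
    for model in ('gpt-5.1', 'gpt-5.4', 'gpt-5.4-mini', 'gpt-5.5'):
        print(model, old_gets_cache_key(model), old_gets_retention(model))
    # gpt-5.1 False True
    # gpt-5.4 True False
    # gpt-5.4-mini True False
    # gpt-5.5 False False
```

No single model gets both features, and the hypothetical new name gets neither.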
Replace the exact-name gates with capability detection by model family:
- _supports_prompt_cache_key(): gpt-4o / gpt-4.1 / gpt-5.x / o-series
- _supports_cache_retention(): gpt-5.x / o-series
prompt_cache_key routing (get_llm) and prompt_cache_retention (both the
default and BYOK OpenAI factories) now key off these helpers, so every
cache-capable model gets the feature regardless of point release.
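The wiring at the gate sites can be sketched as two small decisions, following the flowchart: the OpenAI factories add `prompt_cache_retention='24h'` to `extra_body`, and `get_llm` calls `result.bind(prompt_cache_key=cache_key)` only when a key is provided and the model qualifies. The helper names `retention_extra_body` and `cache_key_bind_kwargs` are hypothetical; they return plain dicts so the sketch runs without an LLM client.

```python
from typing import Optional

_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')
_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')


def _supports_prompt_cache_key(model: str) -> bool:
    return bool(model) and model.startswith(_CACHE_KEY_MODEL_PREFIXES)


def _supports_cache_retention(model: str) -> bool:
    return bool(model) and model.startswith(_CACHE_RETENTION_MODEL_PREFIXES)


def retention_extra_body(model: str) -> dict:
    # Hypothetical: what both the default and BYOK OpenAI factories would add
    # to extra_body. Sharing one check keeps the two sites from drifting apart.
    return {'prompt_cache_retention': '24h'} if _supports_cache_retention(model) else {}


def cache_key_bind_kwargs(model: str, cache_key: Optional[str]) -> dict:
    # Hypothetical: the kwargs get_llm would pass to result.bind(...).
    # An empty dict means "return the plain LLM".
    if cache_key and _supports_prompt_cache_key(model):
        return {'prompt_cache_key': cache_key}
    return {}


if __name__ == '__main__':
    print(retention_extra_body('gpt-5.4'))  # {'prompt_cache_retention': '24h'}
    print(retention_extra_body('gpt-4o'))  # {}
    print(cache_key_bind_kwargs('gpt-4o', 'conv-123'))  # {'prompt_cache_key': 'conv-123'}
    print(cache_key_bind_kwargs('gpt-4o', None))  # {}
```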
Add regression tests: a renamed gpt-5 family model still gets retention
and prompt_cache_key routing; non-cache-capable models (e.g. Gemini) are
unaffected; gpt-5.1 stays retention-capable. Existing source-coupled
assertions are updated to match the capability-based wiring.
Greptile Summary

This PR replaces three brittle exact-model-name gates for OpenAI prompt-cache features with two capability-based prefix helpers (
Confidence Score: 4/5

Safe to merge; the capability-based helpers are correct for all current model names, and all three gate sites are updated consistently.

The core logic is sound: existing models (gpt-5.1, gpt-5.4, gpt-5.4-mini) retain their previous behavior, gpt-5.4 correctly gains the previously missing 24h retention, and new families (gpt-4o, gpt-4.1, o-series) are onboarded as documented. The o-series prefixes (o1, o3, o4) are still listed individually rather than covered by a single root, so a future o5 or o6 model would silently miss both features until someone adds it manually. This is the same class of forward-compatibility gap the PR closes for the gpt-5 family.

File needing attention: backend/utils/llm/clients.py, specifically the o-series entries in the _CACHE_KEY_MODEL_PREFIXES and _CACHE_RETENTION_MODEL_PREFIXES tuples.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_llm / _create_byok_client / _get_or_create_openai_llm] --> B{_supports_cache_retention?}
    B -- Yes --> C[extra_body: prompt_cache_retention=24h]
    B -- No --> D[No retention header]
    A --> E{cache_key provided AND _supports_prompt_cache_key?}
    E -- Yes --> F[result.bind prompt_cache_key=cache_key]
    E -- No --> G[Return plain LLM]
    subgraph Prefixes
        H["_CACHE_RETENTION_MODEL_PREFIXES\n('gpt-5', 'o1', 'o3', 'o4')"]
        I["_CACHE_KEY_MODEL_PREFIXES\n('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')"]
    end
    B -.->|model.startswith| H
    E -.->|model.startswith| I
```
Reviews (1): Last reviewed commit: "fix: gate prompt-cache features by capab..."
```python
# Family prefixes whose models support prompt_cache_key routing.
_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')

# Family prefixes whose models support 24h prompt-cache retention.
_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')
```
O-series prefixes remain individually enumerated
The o-series entries (o1, o3, o4) are still listed one by one, which is the same pattern this PR correctly fixes for the gpt-5 family. OpenAI skipped o2 entirely and has been shipping new o-series models (o1 → o3 → o4) at a steady pace; a future o5 or o6 model would silently receive neither prompt_cache_key routing nor prompt_cache_retention until someone manually adds its prefix. Consolidating to a single 'o' prefix may be too broad, since it would also match unrelated (including non-OpenAI) model names, and narrower roots such as 'o1-'/'o3-' would not cover new releases either. A pragmatic middle ground is either to add a comment noting that each new o-series release needs a matching prefix update, or to derive the o-series check from a small set such as ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o') plus an additional digit guard.
kodjima33 left a comment
Backend prompt-cache gating by capability — approve only, Nik's LLM area
Hi @josancamon19, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!
OpenAI prompt-cache features in `backend/utils/llm/clients.py` were gated by exact model names:

- `prompt_cache_key` routing was limited to `_CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'}` (in `get_llm`).
- `prompt_cache_retention: "24h"` was gated by `model == 'gpt-5.1'` (in both `_get_or_create_openai_llm` and `_create_byok_client`).

These exact-name gates are brittle: they silently stop applying the moment a model is renamed or a new family member ships, and they are already inconsistent with OpenAI's support matrix (e.g. `gpt-5.4` is 24h-retention-eligible but never received retention under the old `gpt-5.1`-only check).

Fix
Detect the capability by model family instead of matching exact names:
- `_supports_prompt_cache_key(model)`: gpt-4o / gpt-4.1 / gpt-5.x / o-series
- `_supports_cache_retention(model)`: gpt-5.x / o-series

All three gate sites now use these helpers, so every cache-capable model gets the feature regardless of point release. Behavior for existing models is preserved (`gpt-5.1` still gets 24h retention; `gpt-5.4` / `gpt-5.4-mini` still get `prompt_cache_key`), and `gpt-5.4` now also correctly receives retention.

Tests
Added regression coverage in `tests/unit/test_prompt_cache_integration.py`:

- `test_renamed_gpt5_model_still_gets_cache_features`: a future/renamed gpt-5 family model still gets `prompt_cache_retention=24h` and is eligible for `prompt_cache_key`.
- `test_non_cache_capable_model_is_unchanged`: Gemini-style names get neither; `gpt-4.1-mini` gets routing but not 24h retention.
- `test_get_llm_binds_cache_key_for_cache_capable_models`: `get_llm` binds `prompt_cache_key` for cache-capable models.

Existing source-coupled assertions in `test_prompt_caching.py` / `test_prompt_cache_optimization.py` were updated to validate the capability-based wiring (including a check that `gpt-5.1` stays retention-capable).

Ran locally (targeted, via the existing stubbed unit-test harness):
Files formatted with `black --line-length 120 --skip-string-normalization`.