fix(desktop): enable Anthropic prompt caching for macOS chat#7951

Open
Git-on-my-level wants to merge 2 commits into
BasedHardware:mainfrom
Git-on-my-level:fix/macos-anthropic-cache-control

@Git-on-my-level

Collaborator

Summary

Enables Anthropic prompt caching in the macOS desktop Rust backend's OpenAI-compatible chat completions translator.

The translator previously sent the extracted system / developer prompt to Anthropic as a plain string:

"system": "...large prompt..."

Anthropic prompt caching requires cache control on content blocks, so this change serializes system prompts as:

"system": [
  {
    "type": "text",
    "text": "...large prompt...",
    "cache_control": { "type": "ephemeral" }
  }
]

No behavior changes when no system/developer prompt is present; the system field remains omitted.
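
For illustration only, the shape change can be sketched without serde. The struct names below mirror the ones in this PR, but `escape_json`, `render_block`, and `render_system_field` are hypothetical helpers written for this example; the actual code relies on serde derives rather than hand-written rendering.

```rust
// Std-only stand-in for the serde-derived types; illustrative, not the PR code.
struct AnthropicCacheControl {
    cache_type: String,
}

struct AnthropicSystemContentBlock {
    block_type: String,
    text: String,
    cache_control: AnthropicCacheControl,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders one system content block as a compact JSON object.
fn render_block(block: &AnthropicSystemContentBlock) -> String {
    format!(
        "{{\"type\":\"{}\",\"text\":\"{}\",\"cache_control\":{{\"type\":\"{}\"}}}}",
        escape_json(&block.block_type),
        escape_json(&block.text),
        escape_json(&block.cache_control.cache_type)
    )
}

/// Renders the `system` member, or returns None so the field is omitted entirely.
fn render_system_field(system: &Option<Vec<AnthropicSystemContentBlock>>) -> Option<String> {
    system.as_ref().map(|blocks| {
        let items: Vec<String> = blocks.iter().map(render_block).collect();
        format!("\"system\":[{}]", items.join(","))
    })
}
```

Returning `None` for an absent prompt models the "field remains omitted" behavior described above, rather than emitting `"system": null` or an empty array.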

Cost / production impact

This addresses a cost optimization finding:

  • A macOS Anthropic API key showed 0.0% prompt cache hit rate.
  • The desktop system prompt is large and repetitive, so every request paid the full uncached input-token price.
  • Current estimated macOS Anthropic spend: ~$2,623/month.
  • Estimated savings from enabling prompt caching: ~$723–$1,157/month.

User impact: neutral-to-positive (same behavior, lower cost after cache warmup, potentially lower latency on cache hits).
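
As a quick sanity check on the estimates above, the savings range works out to roughly 28% to 44% of current monthly spend. The sketch below only restates that arithmetic; the constant and function names are made up for this example.

```rust
// Figures copied from the estimate above (USD per month).
const MONTHLY_SPEND: f64 = 2623.0;
const SAVINGS_LOW: f64 = 723.0;
const SAVINGS_HIGH: f64 = 1157.0;

/// Share of monthly spend that a given monthly saving represents.
fn savings_fraction(monthly_spend: f64, monthly_savings: f64) -> f64 {
    monthly_savings / monthly_spend
}

/// Low and high ends of the savings estimate, as percentages of spend.
fn savings_range_percent() -> (f64, f64) {
    (
        100.0 * savings_fraction(MONTHLY_SPEND, SAVINGS_LOW),
        100.0 * savings_fraction(MONTHLY_SPEND, SAVINGS_HIGH),
    )
}
```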

Implementation details

  • Changes AnthropicRequest.system from Option<String> to Option<Vec<AnthropicSystemContentBlock>>.
  • Adds serializable Rust structs for AnthropicSystemContentBlock and AnthropicCacheControl.
  • Wraps extracted system / developer prompt in a single text block with cache_control: { type: "ephemeral" }.
  • Filters empty/whitespace system prompts (Anthropic rejects empty cached blocks with 400).
  • Adds tests for exact JSON serialization, developer-role extraction, and empty-prompt edge case.
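
A minimal sketch of the wrap-and-filter step described in the list above, using std only (the real structs also derive `Serialize`). `system_blocks` is a hypothetical helper name, and passing the untrimmed text through is an assumption of this sketch; the PR only states that `trim()` is used for the emptiness check.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct AnthropicCacheControl {
    cache_type: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct AnthropicSystemContentBlock {
    block_type: String,
    text: String,
    cache_control: AnthropicCacheControl,
}

/// Hypothetical helper: turns an extracted system/developer prompt into
/// a single cached text block, or None when there is nothing worth sending.
fn system_blocks(prompt: Option<String>) -> Option<Vec<AnthropicSystemContentBlock>> {
    prompt.and_then(|text| {
        // Empty or whitespace-only prompts are dropped so that no blank
        // cached block is ever sent.
        if text.trim().is_empty() {
            None
        } else {
            Some(vec![AnthropicSystemContentBlock {
                block_type: "text".to_string(),
                text, // assumption: original text is kept, not the trimmed copy
                cache_control: AnthropicCacheControl {
                    cache_type: "ephemeral".to_string(),
                },
            }])
        }
    })
}
```

Using `and_then` keeps the absent-prompt and blank-prompt cases on the same `None` path, so the `system` field is omitted in both.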

Validation

Run from desktop/macos/Backend-Rust:

cargo check
cargo test test_translate_request_ -- --nocapture
cargo test -- --nocapture

Results:

  • cargo check ✅ passed
  • targeted chat-completion translation tests ✅ 13 passed / 0 failed
  • full Rust desktop backend test suite ✅ 257 passed / 0 failed

Rollback

Safe rollback: revert these commits to return system serialization to the previous plain-string shape.

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Contributor

Greptile Summary

This PR enables Anthropic prompt caching for the macOS desktop Rust backend by changing AnthropicRequest.system from a plain Option<String> to Option<Vec<AnthropicSystemContentBlock>>, wrapping every non-empty system prompt in a content block with cache_control: { type: "ephemeral" }. An empty/whitespace guard prevents sending blank cached blocks that Anthropic would reject with a 400.

  • Model change (models/chat_completions.rs): introduces AnthropicSystemContentBlock and AnthropicCacheControl structs; both use plain String for their type discriminants rather than enums.
  • Route change (routes/chat_completions.rs): translate_request now wraps the extracted system prompt in a cached content block via and_then, with a trim() + empty-check guard; four new tests cover serialisation, the no-system case, and empty/whitespace/None edge cases.

Confidence Score: 4/5

Safe to merge — the change is narrow, well-tested, and correctly handles the empty-prompt edge case that would cause a 400 from Anthropic.

The core logic is correct: the trim + empty-check guard prevents the Anthropic 400 on blank cached blocks, the skip_serializing_if on system means the field is omitted when absent, and existing tests were updated alongside four new targeted tests. The only rough edge is that block_type and cache_type are raw strings rather than typed enums — a typo would be invisible to the compiler and only surface as a runtime API error. That is non-blocking but worth addressing before the pattern spreads to more content block types.

desktop/macos/Backend-Rust/src/models/chat_completions.rs — the two new structs use plain String discriminants.

Important Files Changed

Filename: Overview
desktop/macos/Backend-Rust/src/models/chat_completions.rs: Adds AnthropicSystemContentBlock and AnthropicCacheControl structs; changes AnthropicRequest.system from Option<String> to Option<Vec<AnthropicSystemContentBlock>>. Both new structs use plain String for the "type" discriminants rather than typed enums, giving up compile-time safety for those values.
desktop/macos/Backend-Rust/src/routes/chat_completions.rs: Updates translate_request() to wrap the system prompt in a cached content block, adds trim + empty-string guard (correct: Anthropic 400s on empty cached blocks), and adds four new tests covering serialization, the no-system case, and the empty/whitespace/None edge cases. Logic and tests look sound.

Sequence Diagram

sequenceDiagram
    participant Client as macOS Client (OpenAI API)
    participant Router as chat_completions route
    participant Translator as translate_request()
    participant Anthropic as Anthropic API

    Client->>Router: POST /v1/chat/completions (OpenAI format w/ system message)
    Router->>Translator: translate_request(req, model)
    Translator->>Translator: Extract system/developer message text
    Translator->>Translator: trim() → empty check
    alt system prompt non-empty
        Translator->>Translator: "Wrap in AnthropicSystemContentBlock { type: text, cache_control: ephemeral }"
        Translator-->>Router: "AnthropicRequest { system: Some(Vec[block]), ... }"
        Router->>Anthropic: "POST /messages system:[{ type, text, cache_control }]"
        Anthropic-->>Router: Response (cached after first call)
    else system prompt empty/whitespace/absent
        Translator-->>Router: "AnthropicRequest { system: None, ... }"
        Router->>Anthropic: POST /messages (no system field)
        Anthropic-->>Router: Response
    end
    Router-->>Client: OpenAI-format response

Reviews (1): Last reviewed commit: "fix: skip caching empty/whitespace syste..."

Comment on lines +183 to +195
#[derive(Debug, Clone, Serialize, PartialEq, Eq)]
pub struct AnthropicSystemContentBlock {
    #[serde(rename = "type")]
    pub block_type: String,
    pub text: String,
    pub cache_control: AnthropicCacheControl,
}

#[derive(Debug, Clone, Serialize, PartialEq, Eq)]
pub struct AnthropicCacheControl {
    #[serde(rename = "type")]
    pub cache_type: String,
}

P2 The block_type and cache_type fields are plain String, so a typo (e.g. "Ephemeral" or "Text") compiles cleanly but produces a 400 from Anthropic at runtime. Since these fields are discriminants with a fixed, known set of valid values, typed enums would catch mistakes at compile time with zero runtime cost.

Suggested change
- #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
- pub struct AnthropicSystemContentBlock {
-     #[serde(rename = "type")]
-     pub block_type: String,
-     pub text: String,
-     pub cache_control: AnthropicCacheControl,
- }
- #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
- pub struct AnthropicCacheControl {
-     #[serde(rename = "type")]
-     pub cache_type: String,
- }
+ #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+ #[serde(rename_all = "lowercase")]
+ pub enum AnthropicContentBlockType {
+     Text,
+ }
+ #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+ #[serde(rename_all = "lowercase")]
+ pub enum AnthropicCacheControlType {
+     Ephemeral,
+ }
+ #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
+ pub struct AnthropicSystemContentBlock {
+     #[serde(rename = "type")]
+     pub block_type: AnthropicContentBlockType,
+     pub text: String,
+     pub cache_control: AnthropicCacheControl,
+ }
+ #[derive(Debug, Clone, Serialize, PartialEq, Eq)]
+ pub struct AnthropicCacheControl {
+     #[serde(rename = "type")]
+     pub cache_type: AnthropicCacheControlType,
+ }
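
To make the compile-time argument concrete, here is a std-only sketch of the same idea. It swaps the serde `rename_all = "lowercase"` attribute for an explicit `match`, and `as_wire_str` is a hypothetical method name used only for this illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AnthropicContentBlockType {
    Text,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AnthropicCacheControlType {
    Ephemeral,
}

impl AnthropicContentBlockType {
    /// Wire value for the "type" field of a content block.
    fn as_wire_str(self) -> &'static str {
        match self {
            AnthropicContentBlockType::Text => "text",
        }
    }
}

impl AnthropicCacheControlType {
    /// Wire value for the "type" field of cache_control.
    fn as_wire_str(self) -> &'static str {
        match self {
            AnthropicCacheControlType::Ephemeral => "ephemeral",
        }
    }
}
```

With this shape, a misspelled variant such as `AnthropicCacheControlType::Ephemerl` fails to compile, whereas a misspelled string literal would only fail once the request reached the API. The `match` is also exhaustive, so adding a variant later forces its wire value to be chosen.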
