Skip to content

fix: keep current datetime out of the cached system prefix#7984

Open
sumleo wants to merge 2 commits into
BasedHardware:mainfrom
sumleo:fix/datetime-out-of-cached-prefix
Open

fix: keep current datetime out of the cached system prefix#7984
sumleo wants to merge 2 commits into
BasedHardware:mainfrom
sumleo:fix/datetime-out-of-cached-prefix

Conversation

@sumleo

@sumleo sumleo commented Jun 17, 2026

Copy link
Copy Markdown

The agentic chat system prompt is wrapped in a single Anthropic cache_control breakpoint (backend/utils/retrieval/agentic.py), but the system prompt embeds the current datetime with microsecond ISO precision near the top (_get_agentic_qa_prompt in backend/utils/llm/chat.py, via current_datetime_iso / current_datetime_str).

Because datetime.now(...).isoformat() changes on every request, the cached prefix is byte-different each time, so the Anthropic prompt cache never hits (effectively 0% cache read) for the agent loop.

Fix

Move the live datetime out of the cached system prefix and into the user turn:

  • chat.py: _get_agentic_qa_prompt no longer embeds the live time. It carries a stable placeholder that points to the user turn, so the datetime instructions still read correctly. New helpers get_current_datetime_block() and get_user_timezone() keep the same timezone resolution (and UTC fallback) as before.
  • agentic.py: the current-datetime block is injected into the latest user message — i.e. after the cache_control breakpoint — so the model still receives the current time, but the static system bytes stop changing.

This aligns the Anthropic explicit-breakpoint path with the intent already documented in chat.py for the OpenAI auto-cache path (push dynamic content to the end so the static prefix stays cacheable). The static prefix and per-user sections are otherwise unchanged.

Tests

Extended tests/unit/test_prompt_cache_integration.py with regression coverage:

  • test_system_prompt_is_time_invariant — the full system prompt is byte-identical across two calls at different wall-clock times, and no microsecond timestamp leaks into it.
  • test_current_datetime_block_carries_live_time — the live time is present in the block destined for the user turn.
  • test_datetime_injected_into_user_turn_not_system — the block is prepended to the latest user turn only.

Ran locally (targeted, via the existing stubbed unit-test harness):

pytest tests/unit/test_prompt_cache_optimization.py \
       tests/unit/test_prompt_caching.py \
       tests/unit/test_prompt_cache_integration.py
# 53 passed

Files formatted with black --line-length 120 --skip-string-normalization.

The agentic chat system prompt is wrapped in a single Anthropic
cache_control breakpoint, but it embedded the current datetime with
microsecond ISO precision near the top of the prompt. That made the
cached prefix byte-different on every request, so the prompt cache never
hit (0% cache read).

Move the live datetime out of the system prompt and into the user turn:

- chat.py: _get_agentic_qa_prompt no longer embeds the live time; it
  carries a stable placeholder that points at the user turn. Add
  get_current_datetime_block() / get_user_timezone() helpers so the
  datetime is built with the same timezone resolution as before.
- agentic.py: inject the current-datetime block into the latest user
  message (after the cache breakpoint) so the model still receives the
  current time without invalidating the cached prefix.

This aligns the Anthropic explicit-breakpoint path with the existing
intent (already documented for the OpenAI auto-cache path) of pushing
dynamic content to the end.

Add a regression test asserting the system prompt is time-invariant and
that the live datetime is delivered via the user turn.
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a prompt-cache miss caused by the current datetime (with microsecond ISO precision) being embedded directly in the Anthropic cache_control system prefix, which made the cached bytes different on every request. The fix moves the live datetime out of the system prompt and into the latest user turn via two new helpers (get_current_datetime_block, get_user_timezone) and a new _inject_current_datetime function in agentic.py.

  • chat.py: _get_agentic_qa_prompt now substitutes a stable placeholder for the live datetime; two new module-level helpers encapsulate timezone resolution and datetime block construction.
  • agentic.py: _inject_current_datetime prepends the datetime block to the last user message (after _messages_to_anthropic), keeping the cache_control system prefix byte-stable while the model still receives the current time.
  • test_prompt_cache_integration.py: three new regression tests assert system-prompt time-invariance, live-time presence in the datetime block, and correct injection into the user turn.

Confidence Score: 4/5

Safe to merge — the core caching fix is correct and well-tested; the findings are prompt-quality and efficiency concerns that do not block correctness.

The caching fix itself is sound: datetime is removed from the system prefix and injected into the user turn, with three focused regression tests covering the key invariants. The main concern is that {current_datetime_iso} still appears inside concrete worked examples in tool_datetime_rules, which now renders as a placeholder string rather than an illustrative ISO timestamp, making the example incoherent. A secondary concern is that _inject_current_datetime silently falls back to appending a bare user message when no string-content user turn is found, which could cause unexpected conversation-structure issues if message formats ever include list content. Neither of these blocks the intended cache-hit improvement.

backend/utils/llm/chat.py — the tool_datetime_rules worked examples still reference the placeholder; they should use a static illustrative date instead.

Important Files Changed

Filename Overview
backend/utils/llm/chat.py Extracts datetime logic into two new helpers (get_user_timezone, get_current_datetime_block), replaces live timestamps with CURRENT_DATETIME_PLACEHOLDER in _get_agentic_qa_prompt. The core fix is correct, but the {current_datetime_iso} placeholder now appears inside concrete worked examples in tool_datetime_rules, breaking their illustrative value. Also introduces a second notification_db.get_user_time_zone call per request.
backend/utils/retrieval/agentic.py Adds _inject_current_datetime helper and injects the datetime block into the latest user turn before passing messages to the Anthropic API. Logic is sound for the current string-content message format; the function only handles str content and silently falls back to appending a new user message if none is found.
backend/tests/unit/test_prompt_cache_integration.py Adds three new regression tests covering time-invariance of the system prompt, live-time presence in the datetime block, and correct injection into the user turn. Tests are well-structured and use the existing stub harness correctly.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant execute_agent_chat_stream
    participant _get_agentic_qa_prompt
    participant get_current_datetime_block
    participant _messages_to_anthropic
    participant _inject_current_datetime
    participant _run_anthropic_agent_stream

    Client->>execute_agent_chat_stream: request (uid, messages)
    execute_agent_chat_stream->>_get_agentic_qa_prompt: uid
    note over _get_agentic_qa_prompt: Resolves tz only.<br/>current_datetime = PLACEHOLDER.<br/>System prompt is byte-stable.
    _get_agentic_qa_prompt-->>execute_agent_chat_stream: system_prompt (stable)

    execute_agent_chat_stream->>_messages_to_anthropic: messages
    _messages_to_anthropic-->>execute_agent_chat_stream: anthropic_messages (string content)

    execute_agent_chat_stream->>get_current_datetime_block: uid
    note over get_current_datetime_block: Calls notification_db for tz,<br/>returns live datetime XML block
    get_current_datetime_block-->>execute_agent_chat_stream: datetime_block (live)

    execute_agent_chat_stream->>_inject_current_datetime: anthropic_messages, datetime_block
    note over _inject_current_datetime: Prepends datetime_block to<br/>last user message content
    _inject_current_datetime-->>execute_agent_chat_stream: anthropic_messages (datetime in user turn)

    execute_agent_chat_stream->>_run_anthropic_agent_stream: system_prompt + anthropic_messages
    note over _run_anthropic_agent_stream: system wrapped in cache_control.<br/>Cache hits because system bytes<br/>are stable. Model sees datetime<br/>via user turn.
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant execute_agent_chat_stream
    participant _get_agentic_qa_prompt
    participant get_current_datetime_block
    participant _messages_to_anthropic
    participant _inject_current_datetime
    participant _run_anthropic_agent_stream

    Client->>execute_agent_chat_stream: request (uid, messages)
    execute_agent_chat_stream->>_get_agentic_qa_prompt: uid
    note over _get_agentic_qa_prompt: Resolves tz only.<br/>current_datetime = PLACEHOLDER.<br/>System prompt is byte-stable.
    _get_agentic_qa_prompt-->>execute_agent_chat_stream: system_prompt (stable)

    execute_agent_chat_stream->>_messages_to_anthropic: messages
    _messages_to_anthropic-->>execute_agent_chat_stream: anthropic_messages (string content)

    execute_agent_chat_stream->>get_current_datetime_block: uid
    note over get_current_datetime_block: Calls notification_db for tz,<br/>returns live datetime XML block
    get_current_datetime_block-->>execute_agent_chat_stream: datetime_block (live)

    execute_agent_chat_stream->>_inject_current_datetime: anthropic_messages, datetime_block
    note over _inject_current_datetime: Prepends datetime_block to<br/>last user message content
    _inject_current_datetime-->>execute_agent_chat_stream: anthropic_messages (datetime in user turn)

    execute_agent_chat_stream->>_run_anthropic_agent_stream: system_prompt + anthropic_messages
    note over _run_anthropic_agent_stream: system wrapped in cache_control.<br/>Cache hits because system bytes<br/>are stable. Model sees datetime<br/>via user turn.
Loading

Comments Outside Diff (1)

  1. backend/utils/llm/chat.py, line 723-734 (link)

    P2 Broken concrete examples in tool_datetime_rules

    {current_datetime_iso} now expands to "(see <current_datetime> in the latest user message)" inside what is supposed to be a concrete worked example. The section at line 730–732 reads the placeholder text in place of an actual ISO timestamp, then immediately follows with a hardcoded date (2024-01-19T14:23:45-08:00) that no longer relates to the stated "current time", producing an incoherent example. The model can still resolve the actual time from the <current_datetime> block in the user turn, but the intended teaching value of the example is lost, which risks degraded tool-call accuracy for relative-time queries like "3 hours ago". The examples should use a static illustrative date string rather than the placeholder.

Reviews (1): Last reviewed commit: "fix: keep current datetime out of the ca..." | Re-trigger Greptile

Comment on lines +353 to +356
for msg in reversed(anthropic_messages):
if msg["role"] == "user" and isinstance(msg.get("content"), str):
msg["content"] = f"{datetime_block}\n\n{msg['content']}"
return anthropic_messages

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 List-content user messages silently skipped

_inject_current_datetime only injects into user messages whose content is a plain str. If _messages_to_anthropic ever produces a user message with list content (e.g., when file-attachment support is added or if the multi-turn tool-result format changes), this function silently falls through and appends a standalone user message with just the datetime block. Appending a bare user message after the real last message alters the expected user→assistant turn structure and could break the API call or cause the model to respond to only the datetime prompt instead of the real question. Adding a branch that handles list content (prepending a text block to the list) would make the function safe for that format.

Comment thread backend/utils/llm/chat.py
Comment on lines +399 to +429
def get_user_timezone(uid: str) -> str:
"""Resolve the user's timezone, falling back to UTC when missing/invalid."""
tz = notification_db.get_user_time_zone(uid)
try:
ZoneInfo(tz)
return tz
except Exception:
return "UTC"


def get_current_datetime_block(uid: str) -> str:
"""Build the current-datetime block injected into the user turn.

Kept out of the cached system prefix so the cached bytes stay stable across requests
while the model still receives the live time. Mirrors the timezone resolution used by
_get_agentic_qa_prompt.
"""
tz = get_user_timezone(uid)
try:
current_datetime_user = datetime.now(ZoneInfo(tz))
except Exception:
current_datetime_user = datetime.now(timezone.utc)
tz = "UTC"
current_datetime_str = current_datetime_user.strftime('%Y-%m-%d %H:%M:%S')
current_datetime_iso = current_datetime_user.isoformat()
return (
"<current_datetime>\n"
f"Current date time in {tz}: {current_datetime_str}\n"
f"Current date time ISO format: {current_datetime_iso}\n"
"</current_datetime>"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Double notification_db.get_user_time_zone call per request

Both _get_agentic_qa_prompt (via get_user_timezone) and get_current_datetime_block (via get_user_timezone) call notification_db.get_user_time_zone(uid) independently on every request. Since execute_agent_chat_stream calls both in sequence, every agentic request makes two round-trips to the notification database for the same immutable-per-request value. Passing the resolved timezone string into get_current_datetime_block (e.g., get_current_datetime_block(uid, tz=tz)) would eliminate the redundant lookup without changing any observable behaviour.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

The current-datetime placeholder fed into the <tool_datetime_rules>
worked example produced an incoherent example (placeholder text where a
concrete ISO timestamp was expected). Point the time reference at the
<current_datetime> block in the user turn, and use a static illustrative
timestamp in the example so it stays internally consistent.

@kodjima33 kodjima33 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend prompt-cache correctness (keep datetime out of cached prefix) — approve only, Nik's LLM area

@sumleo

sumleo commented Jun 18, 2026

Copy link
Copy Markdown
Author

Hi @josancamon19, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants