fix: keep current datetime out of the cached system prefix#7984
Conversation
The agentic chat system prompt is wrapped in a single Anthropic cache_control breakpoint, but it embedded the current datetime with microsecond ISO precision near the top of the prompt. That made the cached prefix byte-different on every request, so the prompt cache never hit (0% cache read). Move the live datetime out of the system prompt and into the user turn: - chat.py: _get_agentic_qa_prompt no longer embeds the live time; it carries a stable placeholder that points at the user turn. Add get_current_datetime_block() / get_user_timezone() helpers so the datetime is built with the same timezone resolution as before. - agentic.py: inject the current-datetime block into the latest user message (after the cache breakpoint) so the model still receives the current time without invalidating the cached prefix. This aligns the Anthropic explicit-breakpoint path with the existing intent (already documented for the OpenAI auto-cache path) of pushing dynamic content to the end. Add a regression test asserting the system prompt is time-invariant and that the live datetime is delivered via the user turn.
Greptile SummaryThis PR fixes a prompt-cache miss caused by the current datetime (with microsecond ISO precision) being embedded directly in the Anthropic
Confidence Score: 4/5Safe to merge — the core caching fix is correct and well-tested; the findings are prompt-quality and efficiency concerns that do not block correctness. The caching fix itself is sound: datetime is removed from the system prefix and injected into the user turn, with three focused regression tests covering the key invariants. The main concern is that backend/utils/llm/chat.py — the tool_datetime_rules worked examples still reference the placeholder; they should use a static illustrative date instead. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Client
participant execute_agent_chat_stream
participant _get_agentic_qa_prompt
participant get_current_datetime_block
participant _messages_to_anthropic
participant _inject_current_datetime
participant _run_anthropic_agent_stream
Client->>execute_agent_chat_stream: request (uid, messages)
execute_agent_chat_stream->>_get_agentic_qa_prompt: uid
note over _get_agentic_qa_prompt: Resolves tz only.<br/>current_datetime = PLACEHOLDER.<br/>System prompt is byte-stable.
_get_agentic_qa_prompt-->>execute_agent_chat_stream: system_prompt (stable)
execute_agent_chat_stream->>_messages_to_anthropic: messages
_messages_to_anthropic-->>execute_agent_chat_stream: anthropic_messages (string content)
execute_agent_chat_stream->>get_current_datetime_block: uid
note over get_current_datetime_block: Calls notification_db for tz,<br/>returns live datetime XML block
get_current_datetime_block-->>execute_agent_chat_stream: datetime_block (live)
execute_agent_chat_stream->>_inject_current_datetime: anthropic_messages, datetime_block
note over _inject_current_datetime: Prepends datetime_block to<br/>last user message content
_inject_current_datetime-->>execute_agent_chat_stream: anthropic_messages (datetime in user turn)
execute_agent_chat_stream->>_run_anthropic_agent_stream: system_prompt + anthropic_messages
note over _run_anthropic_agent_stream: system wrapped in cache_control.<br/>Cache hits because system bytes<br/>are stable. Model sees datetime<br/>via user turn.
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Client
participant execute_agent_chat_stream
participant _get_agentic_qa_prompt
participant get_current_datetime_block
participant _messages_to_anthropic
participant _inject_current_datetime
participant _run_anthropic_agent_stream
Client->>execute_agent_chat_stream: request (uid, messages)
execute_agent_chat_stream->>_get_agentic_qa_prompt: uid
note over _get_agentic_qa_prompt: Resolves tz only.<br/>current_datetime = PLACEHOLDER.<br/>System prompt is byte-stable.
_get_agentic_qa_prompt-->>execute_agent_chat_stream: system_prompt (stable)
execute_agent_chat_stream->>_messages_to_anthropic: messages
_messages_to_anthropic-->>execute_agent_chat_stream: anthropic_messages (string content)
execute_agent_chat_stream->>get_current_datetime_block: uid
note over get_current_datetime_block: Calls notification_db for tz,<br/>returns live datetime XML block
get_current_datetime_block-->>execute_agent_chat_stream: datetime_block (live)
execute_agent_chat_stream->>_inject_current_datetime: anthropic_messages, datetime_block
note over _inject_current_datetime: Prepends datetime_block to<br/>last user message content
_inject_current_datetime-->>execute_agent_chat_stream: anthropic_messages (datetime in user turn)
execute_agent_chat_stream->>_run_anthropic_agent_stream: system_prompt + anthropic_messages
note over _run_anthropic_agent_stream: system wrapped in cache_control.<br/>Cache hits because system bytes<br/>are stable. Model sees datetime<br/>via user turn.
|
| for msg in reversed(anthropic_messages): | ||
| if msg["role"] == "user" and isinstance(msg.get("content"), str): | ||
| msg["content"] = f"{datetime_block}\n\n{msg['content']}" | ||
| return anthropic_messages |
There was a problem hiding this comment.
List-content user messages silently skipped
_inject_current_datetime only injects into user messages whose content is a plain str. If _messages_to_anthropic ever produces a user message with list content (e.g., when file-attachment support is added or if the multi-turn tool-result format changes), this function silently falls through and appends a standalone user message with just the datetime block. Appending a bare user message after the real last message alters the expected user→assistant turn structure and could break the API call or cause the model to respond to only the datetime prompt instead of the real question. Adding a branch that handles list content (prepending a text block to the list) would make the function safe for that format.
| def get_user_timezone(uid: str) -> str: | ||
| """Resolve the user's timezone, falling back to UTC when missing/invalid.""" | ||
| tz = notification_db.get_user_time_zone(uid) | ||
| try: | ||
| ZoneInfo(tz) | ||
| return tz | ||
| except Exception: | ||
| return "UTC" | ||
|
|
||
|
|
||
| def get_current_datetime_block(uid: str) -> str: | ||
| """Build the current-datetime block injected into the user turn. | ||
|
|
||
| Kept out of the cached system prefix so the cached bytes stay stable across requests | ||
| while the model still receives the live time. Mirrors the timezone resolution used by | ||
| _get_agentic_qa_prompt. | ||
| """ | ||
| tz = get_user_timezone(uid) | ||
| try: | ||
| current_datetime_user = datetime.now(ZoneInfo(tz)) | ||
| except Exception: | ||
| current_datetime_user = datetime.now(timezone.utc) | ||
| tz = "UTC" | ||
| current_datetime_str = current_datetime_user.strftime('%Y-%m-%d %H:%M:%S') | ||
| current_datetime_iso = current_datetime_user.isoformat() | ||
| return ( | ||
| "<current_datetime>\n" | ||
| f"Current date time in {tz}: {current_datetime_str}\n" | ||
| f"Current date time ISO format: {current_datetime_iso}\n" | ||
| "</current_datetime>" | ||
| ) |
There was a problem hiding this comment.
Double
notification_db.get_user_time_zone call per request
Both _get_agentic_qa_prompt (via get_user_timezone) and get_current_datetime_block (via get_user_timezone) call notification_db.get_user_time_zone(uid) independently on every request. Since execute_agent_chat_stream calls both in sequence, every agentic request makes two round-trips to the notification database for the same immutable-per-request value. Passing the resolved timezone string into get_current_datetime_block (e.g., get_current_datetime_block(uid, tz=tz)) would eliminate the redundant lookup without changing any observable behaviour.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
The current-datetime placeholder fed into the <tool_datetime_rules> worked example produced an incoherent example (placeholder text where a concrete ISO timestamp was expected). Point the time reference at the <current_datetime> block in the user turn, and use a static illustrative timestamp in the example so it stays internally consistent.
kodjima33
left a comment
There was a problem hiding this comment.
Backend prompt-cache correctness (keep datetime out of cached prefix) — approve only, Nik's LLM area
|
Hi @josancamon19, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time! |
The agentic chat system prompt is wrapped in a single Anthropic
cache_controlbreakpoint (backend/utils/retrieval/agentic.py), but the system prompt embeds the current datetime with microsecond ISO precision near the top (_get_agentic_qa_promptinbackend/utils/llm/chat.py, viacurrent_datetime_iso/current_datetime_str).Because
datetime.now(...).isoformat()changes on every request, the cached prefix is byte-different each time, so the Anthropic prompt cache never hits (effectively 0% cache read) for the agent loop.Fix
Move the live datetime out of the cached system prefix and into the user turn:
chat.py:_get_agentic_qa_promptno longer embeds the live time. It carries a stable placeholder that points to the user turn, so the datetime instructions still read correctly. New helpersget_current_datetime_block()andget_user_timezone()keep the same timezone resolution (and UTC fallback) as before.agentic.py: the current-datetime block is injected into the latest user message — i.e. after thecache_controlbreakpoint — so the model still receives the current time, but the static system bytes stop changing.This aligns the Anthropic explicit-breakpoint path with the intent already documented in
chat.pyfor the OpenAI auto-cache path (push dynamic content to the end so the static prefix stays cacheable). The static prefix and per-user sections are otherwise unchanged.Tests
Extended
tests/unit/test_prompt_cache_integration.pywith regression coverage:test_system_prompt_is_time_invariant— the full system prompt is byte-identical across two calls at different wall-clock times, and no microsecond timestamp leaks into it.test_current_datetime_block_carries_live_time— the live time is present in the block destined for the user turn.test_datetime_injected_into_user_turn_not_system— the block is prepended to the latest user turn only.Ran locally (targeted, via the existing stubbed unit-test harness):
Files formatted with
black --line-length 120 --skip-string-normalization.