support background entry removal for events cache by gbates101 · Pull Request #10523 · temporalio/temporal

gbates101 · 2026-06-04T02:55:24Z

What changed?

Using what was contributed in #7902, this adds a background eviction goroutine + config for the events cache.

Why?

I think this PR (and the original) is useful because of how cache size is computed. The heap used by the MS/events cache reaches well beyond the byte limit set in the config (more below), which I think is partly because cache size is computed from each entry's serialized size. For events, it's historyEventCacheItemImpl.CacheSize() returning event.Size(), which is the protobuf wire size rather than the size of the unmarshalled *HistoryEvent (with its nested duration/timestamp messages, maps, etc.). The unmarshalled struct is presumably much larger than its serialized bytes, so the cache's effective upper bound is seemingly much higher than what is set in config. This lets cache entries that are past their TTL eat into the rest of the heap, potentially consuming most or all of it.
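To make the accounting argument concrete, here is a toy sketch of a byte-limited cache that admits entries based on the size each entry reports. Everything in it is hypothetical: `sizedCache` and `item` are not Temporal types, and the 4x ratio between reported and actual size is an assumed number chosen for illustration, not a measurement of *HistoryEvent.

```go
package main

import "fmt"

// item is a hypothetical cache entry with two notions of size.
type item struct {
	reportedSize int // what the entry tells the cache (e.g. its serialized size)
	actualSize   int // what it occupies on the heap (assumed value, for illustration)
}

// sizedCache admits entries until the sum of reported sizes would exceed limit.
type sizedCache struct {
	limit    int
	reported int // bytes the cache believes it holds
	actual   int // bytes it really holds
}

// add returns false when admitting the item would exceed the configured limit.
func (c *sizedCache) add(it item) bool {
	if c.reported+it.reportedSize > c.limit {
		return false
	}
	c.reported += it.reportedSize
	c.actual += it.actualSize
	return true
}

func main() {
	c := &sizedCache{limit: 1000}
	// Fill the cache with entries that report 100 bytes but occupy 400.
	for c.add(item{reportedSize: 100, actualSize: 400}) {
	}
	fmt.Printf("limit=%d reported=%d actual=%d\n", c.limit, c.reported, c.actual)
	// prints "limit=1000 reported=1000 actual=4000"
}
```

The cache never exceeds its limit by its own accounting, yet the real footprint scales with whatever the ratio between in-memory and reported size turns out to be.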

In our case, we ran a benchmark of 2500 concurrent workflows (as one workflow completes, another is scheduled to maintain this number) with a 1GB events cache limit. After ~6 hours of monotonic heap growth, all three history service pods reached the GOMEMLIMIT of 10GB, with the inuse_space memory profile showing nearly all of the heap consumed by events-related call stacks. Meanwhile, the cache_usage{cache_type="events"} metric showed only ~730MB used. Rerunning the benchmark with the change in this PR enabled stopped the monotonic heap growth once the 1h TTL was reached, since stale entries were evicted at the rate new ones were created. The same benchmark has now been running indefinitely at 5GB heap consumed and ~130MB cache usage per pod.

How did you test it?

Also benchmarked the change as mentioned above.

Potential risks

Off by default, so behavior is unchanged unless explicitly enabled, and it reuses the same background-eviction machinery already proven for the workflow cache in #7902. When enabled, the only side effect is that an evicted event is re-read from persistence if a workflow genuinely needs it again. The TTL provides the grace window for near-term reads (e.g. start/completion events read by transfer tasks shortly after close), so this should be rare. As with the workflow cache setting, it requires a service restart to take effect.

CLAassistant · 2026-06-04T14:53:45Z

All committers have signed the CLA.

add support for events cache background eviction

e2b0c12

gbates101 requested review from a team as code owners June 4, 2026 02:55

gbates101 wants to merge 1 commit into temporalio:main from gbates101:events-cache-background-eviction
