[None][perf] kv_cache_manager_v2: batch block-key SHA-256 hashing by lancelly · Pull Request #14994 · NVIDIA/TensorRT-LLM

lancelly · 2026-06-05T06:18:50Z

Description

Hasher.update in kv_cache_manager_v2/_block_radix_tree.py hashed each token of a block one at a time: a Python-level int.to_bytes(8, "little") + sha256.update() call per token. For a long warm prefix, this chained per-token hashing is the dominant cost of BlockRadixTree.match, which is invoked:

- by the attention-DP KV-cache-aware router (KVCacheAwareADPRouter.gather_prefix_matches → probe_prefix_match_length → probe_reuse) as a per-request probe on every DP rank before routing, gating the tp_allgather, and
- again by create_kv_cache for the actual reuse lookup (same _match_reuse).

This is especially hot for DeepSeek-V4 (DeepseekV4CacheManager(KVCacheManagerV2), tokens_per_block ∈ {128, 256}) on long-context agentic workloads (mean ISL ~38k).
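
The difference between the two hashing strategies can be sketched in a few lines. This is a minimal illustration, not the code from `_block_radix_tree.py`: the function names `digest_per_token` and `digest_batched` are hypothetical, and the `byteswap()` guard for big-endian hosts is an added assumption that the PR itself does not include (it relies on hosts being little-endian).

```python
import hashlib
import sys
from array import array


def digest_per_token(tokens):
    """Baseline: one to_bytes() and one sha256.update() call per token."""
    h = hashlib.sha256()
    for tok in tokens:
        h.update(tok.to_bytes(8, "little"))
    return h.digest()


def digest_batched(tokens):
    """Batched: pack the whole block once, then a single sha256.update()."""
    packed = array("Q", tokens)  # unsigned 64-bit items in native byte order
    if sys.byteorder == "big":
        # Hypothetical portability guard (not part of the PR): force
        # little-endian layout so the digest matches the per-token loop.
        packed.byteswap()
    h = hashlib.sha256()
    h.update(packed.tobytes())
    return h.digest()


block = list(range(1000, 1128))  # one synthetic 128-token block
assert digest_per_token(block) == digest_batched(block)
```

Both functions feed the same byte stream to SHA-256, so the digests match; the batched version simply replaces 128 Python-level calls with one.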

Summary by CodeRabbit

Tests

- Added comprehensive unit tests to verify KV cache block key hashing produces deterministic and correct SHA-256 values for both integer and multimodal (mixed-type) token blocks.

Refactor

- Optimized block hashing performance through efficient batch processing of integer sequences, with graceful fallback for mixed-type blocks.

lancelly · 2026-06-05T06:38:54Z

/bot run --disable-fail-fast

coderabbitai · 2026-06-05T06:44:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1f5e5e4f-dd3d-4a51-ae48-e2ea1c4637c6

📥 Commits

Reviewing files that changed from the base of the PR and between 21ffdc7 and 2ddd0f8.

📒 Files selected for processing (2)

tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py

📝 Walkthrough

This PR optimizes the Hasher.update() method in the KV cache radix tree to bulk-hash integer sequences using Python's array type, with a fallback to per-item hashing for mixed or non-integer blocks. Tests verify the optimization produces correct SHA-256 hashes.
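
The try-then-fall-back shape described here might look roughly like the sketch below. The helper name `update_block` is hypothetical, and how the real `Hasher.update()` encodes non-integer items is not shown on this page; passing `bytes` items through unchanged is an assumption made only for illustration.

```python
import hashlib
from array import array


def update_block(hasher, block):
    """Feed one block into `hasher`, batching when every item fits array("Q")."""
    try:
        # Raises TypeError for non-int items (e.g. bytes) and OverflowError
        # for ints that are negative or do not fit in 64 bits.
        packed = array("Q", block)
    except (TypeError, OverflowError):
        # Fallback: per-item path. Feeding bytes items through as-is is an
        # assumption, not something confirmed by the PR text.
        for item in block:
            if isinstance(item, int):
                hasher.update(item.to_bytes(8, "little"))
            else:
                hasher.update(item)
    else:
        # Fast path: native byte order, which equals little-endian on the
        # x86_64 and aarch64 hosts the PR targets.
        hasher.update(packed.tobytes())


# Usage: an integer-only block takes the fast path, a mixed block falls back.
h_int = hashlib.sha256()
update_block(h_int, [101, 102, 103])

h_mixed = hashlib.sha256()
update_block(h_mixed, [101, b"mm-item", 102])
```

Constructing the array before touching the hasher means a failed pack leaves the hasher state untouched, so the fallback loop starts from a clean state.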

Changes

Hasher Bulk Hashing Optimization

| Layer / File(s) | Summary |
|---|---|
| Hasher bulk hashing implementation `tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py` | Imports `array` and modifies `Hasher.update()` to attempt packing `int` items into `array("Q")` and hashing the bytes in a single call; on type or overflow error, falls back to the prior per-item hashing logic for mixed/byte-containing blocks. |
| TestBlockKeyHashing verification `tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` | Imports `hashlib` and updates conditional `_block_radix_tree` imports to include `Hasher`, then adds `TestBlockKeyHashing` with a reference SHA-256 implementation and test assertions for integer-only and multimodal (bytes + ints) blocks. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main performance optimization: batching block-key SHA-256 hashing in kv_cache_manager_v2, which is the primary change across both modified files. |
| Description check | ✅ Passed | The PR description provides a clear explanation of the performance problem, the solution, and the impact, but the description section lacks a complete PR Checklist with verification marks as required by the template. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

tensorrt-cicd · 2026-06-05T06:45:18Z

PR_Github #52282 [ run ] triggered by Bot. Commit: 2ddd0f8 Link to invocation

tensorrt-cicd · 2026-06-05T13:34:17Z

PR_Github #52282 [ run ] completed with state SUCCESS. Commit: 2ddd0f8
/LLM/main/L0_MergeRequest_PR pipeline #41592 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

- Please check the failed tests and fix your PR
- If you cannot view the failures, ask the CI triggerer to share details
- Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lancelly · 2026-06-06T03:41:19Z

/bot run --disable-fail-fast

Hasher.update hashed each token of a block with its own int.to_bytes(8, "little") + sha256.update() call. For long warm prefix matches this is the dominant cost of BlockRadixTree.match, which the attention-DP KV-cache-aware router (KVCacheAwareADPRouter) runs as a per-request probe on every DP rank before routing, and which create_kv_cache repeats for the actual reuse lookup.

Pack the whole token block into bytes once (array("Q", block).tobytes()) and do a single sha256.update(). All NVIDIA GPU host platforms (x86_64, aarch64/Grace) are little-endian, so this is byte-identical to the per-token to_bytes(8, "little") loop; block reuse / cross-run cache-hit behavior is unchanged. Multimodal blocks (which contain bytes items) fall back to the per-token loop via except (TypeError, OverflowError).

Speeds up the probe and the create-time reuse lookup equally. On a GB300 Grace node the real BlockRadixTree.match warm-prefix cost at ISL~38k drops 2.85-3.05x at tokens_per_block=128/256 (DeepseekV4CacheManager).

Adds TestBlockKeyHashing to lock in the bit-identical contract, including multimodal blocks.

Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
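
For readers who want a rough feel for the effect on their own machine, a synthetic micro-benchmark along these lines may help. It is only a sketch: it hashes each block independently rather than going through `BlockRadixTree.match`, uses made-up token IDs, and will not reproduce the 2.85-3.05x figure measured on GB300. The function names are hypothetical.

```python
import hashlib
import timeit
from array import array

TOKENS_PER_BLOCK = 128
NUM_TOKENS = 38_000  # roughly the mean ISL quoted in the PR

tokens = list(range(NUM_TOKENS))  # synthetic token IDs
blocks = [
    tokens[i:i + TOKENS_PER_BLOCK]
    for i in range(0, NUM_TOKENS, TOKENS_PER_BLOCK)
]


def hash_per_token():
    """One to_bytes() + sha256.update() call per token."""
    digests = []
    for block in blocks:
        h = hashlib.sha256()
        for tok in block:
            h.update(tok.to_bytes(8, "little"))
        digests.append(h.digest())
    return digests


def hash_batched():
    """One array pack + one SHA-256 call per block (native byte order)."""
    return [
        hashlib.sha256(array("Q", block).tobytes()).digest()
        for block in blocks
    ]


# Time a handful of full passes over the synthetic prompt.
t_loop = timeit.timeit(hash_per_token, number=5)
t_batch = timeit.timeit(hash_batched, number=5)
print(f"per-token: {t_loop:.4f}s  batched: {t_batch:.4f}s")
```

On a little-endian host both functions return the same list of digests, so any timing gap comes purely from the reduced number of Python-level calls.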

lancelly · 2026-06-06T08:38:49Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-06T08:45:10Z

PR_Github #52493 [ run ] triggered by Bot. Commit: be1309c Link to invocation

tensorrt-cicd · 2026-06-06T12:36:58Z

PR_Github #52493 [ run ] completed with state SUCCESS. Commit: be1309c
/LLM/main/L0_MergeRequest_PR pipeline #41785 completed with status: 'FAILURE'

lancelly · 2026-06-06T12:55:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-06T13:00:52Z

PR_Github #52506 [ run ] triggered by Bot. Commit: be1309c Link to invocation

tensorrt-cicd · 2026-06-06T13:54:07Z

PR_Github #52506 [ run ] completed with state SUCCESS. Commit: be1309c
/LLM/main/L0_MergeRequest_PR pipeline #41797 completed with status: 'FAILURE'

github-actions Bot assigned lancelly Jun 5, 2026

lancelly force-pushed the perf/kvcache-v2-batched-blockkey-hashing-main branch from dd25e51 to 2ddd0f8 Compare June 5, 2026 06:37

lancelly marked this pull request as ready for review June 5, 2026 06:38

lancelly force-pushed the perf/kvcache-v2-batched-blockkey-hashing-main branch from 2ddd0f8 to be1309c Compare June 6, 2026 08:29

2 participants
