[https://nvbugs/6266705][fix] Gate the FlashInfer import-time selection on `get_sm_version() == 90` (in… by tensorrt-cicd · Pull Request #14973 · NVIDIA/TensorRT-LLM

tensorrt-cicd · 2026-06-04T20:12:54Z

Summary

Root cause: gdn_mixer.py bound chunk_gated_delta_rule to the SM90-only FlashInfer GDN prefill kernel at import time with no device-arch guard, so on SM120 (Blackwell RTX PRO 6000) prefill aborted with "delta rule kernel does not support this device major version: 12" during model load.
Fix: Gate the FlashInfer import-time selection on get_sm_version() == 90 (in addition to the existing env flag); all other archs fall back to the device-agnostic Triton chunk_gated_delta_rule. Verified EXIT_CODE=0 (model loads and serves).
Automated fix generated by repair-bot
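
The gating described in the Fix line can be sketched as a small pure function. This is a minimal illustration, not the code in gdn_mixer.py: `select_gdn_prefill_backend` and its string return values are hypothetical names, and the SM version is passed in as an argument rather than read from `get_sm_version()` so the sketch runs without a GPU. The env flag name and its default of "1" come from the diff in this PR.

```python
import os


def select_gdn_prefill_backend(sm_version, env=None):
    """Return "flashinfer" only when the env flag is on and the GPU is SM90.

    Hypothetical helper for illustration; the real module binds
    chunk_gated_delta_rule at import time instead of returning a string.
    """
    env = os.environ if env is None else env
    flag_enabled = env.get("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1"
    if flag_enabled and sm_version == 90:
        return "flashinfer"
    # Every other architecture (for example SM120) takes the Triton path.
    return "triton"
```

Passing a plain dict as `env` keeps the check testable: `select_gdn_prefill_backend(120, {})` selects the Triton path, which is the behavior the fix relies on for the SM120 failure.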

Test plan

Verify fix on the same GPU type as the original failure
Check for regressions in related tests

Links

Bug: https://nvbugs/6266705

Summary by CodeRabbit

Bug Fixes
- GDN prefill kernel selection now uses the FlashInfer kernel only on SM90 GPUs and falls back to the Triton implementation on all other architectures.

coderabbitai · 2026-06-04T20:17:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 23800b9a-f680-4a7c-aab1-64c9888cedcf

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5d9e2 and d14234e.

📒 Files selected for processing (1)

tensorrt_llm/_torch/modules/mamba/gdn_mixer.py

📝 Walkthrough

Walkthrough

The GDN prefill kernel selection logic is updated to restrict FlashInfer usage to SM90 GPUs only. The change adds a get_sm_version() import and gates FlashInfer activation behind an additional SM version equality check, falling back to Triton-based chunk_gated_delta_rule for non-SM90 hardware.

Changes

GDN prefill kernel selection gating

Layer / File(s)	Summary
SM90 gating for FlashInfer GDN prefill `tensorrt_llm/_torch/modules/mamba/gdn_mixer.py`	Import `get_sm_version` and extend kernel selection condition to require GPU SM version 90 in addition to the `TLLM_USE_FLASHINFER_GDN_PREFILL` environment variable check; otherwise fall back to Triton kernel.

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The PR description provides clear context on the root cause, the fix applied, and test verification. However, it lacks the structured template sections (Description, Test Coverage, PR Checklist) that are required by the repository template.	Reformat the description to match the repository template structure with explicit Description, Test Coverage, and PR Checklist sections to ensure consistency with repository standards.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly references the fix applied to the codebase: gating FlashInfer selection on SM version 90, which directly matches the main change in the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

LarryXFly · 2026-06-05T06:58:43Z

+# device major version 9 (SM90/Hopper); on other archs it aborts at load. Gate
+# the selection on SM90 so non-Hopper GPUs (e.g. Blackwell SM120) use the
+# device-agnostic Triton path.
+if os.getenv("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1" and get_sm_version() == 90:


The FlashInfer GDN prefill kernel requires SM90 (Hopper), SM100, or SM103 (Blackwell) architecture.
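
If the reviewer's statement holds, the equality check could be widened to an allowlist. The sketch below assumes the set {90, 100, 103} taken directly from this comment; that set is not verified here, and the PR description reports the kernel rejecting device major version 12, so support for major version 10 should be confirmed against the installed FlashInfer build before adopting it. `FLASHINFER_GDN_PREFILL_SM` and `use_flashinfer_gdn_prefill` are hypothetical names.

```python
import os

# Assumption: SM versions named in the review comment, not verified
# against the FlashInfer kernel's own device check.
FLASHINFER_GDN_PREFILL_SM = frozenset({90, 100, 103})


def use_flashinfer_gdn_prefill(sm_version, env=None):
    """Return True when the env flag is on and the SM version is allowlisted."""
    env = os.environ if env is None else env
    enabled = env.get("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1"
    return enabled and sm_version in FLASHINFER_GDN_PREFILL_SM
```

Under this variant SM120 still falls through to the Triton path, so the original failure stays fixed regardless of whether SM100 and SM103 are added.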

[nvbugs/6266705][fix] Gate FlashInfer GDN prefill to SM90, fall back …

d14234e

…to Triton elsewhere Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

tensorrt-cicd requested review from a team as code owners June 4, 2026 20:12

tensorrt-cicd requested a review from tomeras91 June 4, 2026 20:12

tensorrt-cicd assigned nv-guomingz Jun 4, 2026

tensorrt-cicd requested a review from symphonylyh June 4, 2026 20:12

github-actions Bot assigned tensorrt-cicd Jun 4, 2026

LarryXFly reviewed Jun 5, 2026

tensorrt-cicd wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6266705