[https://nvbugs/6266705][fix] Gate the FlashInfer import-time selection on get_sm_version() == 90 (in… #14973

Open
tensorrt-cicd wants to merge 1 commit into NVIDIA:main from tensorrt-cicd:repair-bot-bug6266705

@tensorrt-cicd tensorrt-cicd commented Jun 4, 2026

Summary

  • Root cause: gdn_mixer.py bound chunk_gated_delta_rule to the SM90-only FlashInfer GDN prefill kernel at import time with no device-arch guard, so on SM120 (Blackwell RTX PRO 6000) prefill aborted with "delta rule kernel does not support this device major version: 12" during model load.
  • Fix: Gate the FlashInfer import-time selection on get_sm_version() == 90 (in addition to the existing env flag); all other archs fall back to the device-agnostic Triton chunk_gated_delta_rule. Verified EXIT_CODE=0 (model loads and serves).
  • Automated fix generated by repair-bot
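
The gating described above can be sketched as a small pure function. This is a minimal illustration, not the code in gdn_mixer.py: the function name `select_gdn_prefill_backend`, the `env` parameter, and the returned strings are hypothetical. Only the `TLLM_USE_FLASHINFER_GDN_PREFILL` variable, its default of "1", and the `== 90` check come from this PR.

```python
import os


def select_gdn_prefill_backend(sm_version: int, env=None) -> str:
    """Return which GDN prefill backend the gate would pick.

    Hypothetical helper for illustration only. `sm_version` stands in
    for the value returned by get_sm_version() in the real module.
    """
    env = os.environ if env is None else env
    # The env flag defaults to "1", i.e. FlashInfer is requested by default.
    flashinfer_requested = env.get("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1"
    # FlashInfer is chosen only when requested AND the GPU is SM90.
    if flashinfer_requested and sm_version == 90:
        return "flashinfer"
    # Every other case uses the device-agnostic Triton kernel.
    return "triton"
```

Under this sketch, an SM120 device selects the Triton path even with the flag left at its default, which is the behavior the fix is meant to produce.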

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Summary by CodeRabbit

  • Bug Fixes
    • GDN prefill kernel selection now uses the FlashInfer kernel only on SM90 GPUs and falls back to the Triton implementation on all other architectures, so the model no longer aborts at load on unsupported hardware.

…to Triton elsewhere

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 23800b9a-f680-4a7c-aab1-64c9888cedcf

📥 Commits

Reviewing files that changed from the base of the PR and between 8e5d9e2 and d14234e.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/modules/mamba/gdn_mixer.py

Walkthrough

The GDN prefill kernel selection logic is updated to restrict FlashInfer usage to SM90 GPUs only. The change adds a get_sm_version() import and gates FlashInfer activation behind an additional SM version equality check, falling back to Triton-based chunk_gated_delta_rule for non-SM90 hardware.
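
For context, an SM version such as 90 or 120 is conventionally the CUDA compute capability written as major * 10 + minor, which is consistent with SM120 triggering the "device major version: 12" abort quoted in the PR summary. The helper below is hypothetical and is not the `get_sm_version()` implementation from the repository; it only illustrates that arithmetic.

```python
def sm_version_from_capability(major: int, minor: int) -> int:
    """Combine a compute capability (major, minor) into an SM version.

    Hypothetical helper; assumes the usual major * 10 + minor convention.
    """
    return major * 10 + minor


def capability_major(sm_version: int) -> int:
    """Recover the device major version from an SM version."""
    return sm_version // 10
```

With this convention, capability (9, 0) maps to SM90 and capability (12, 0) maps to SM120, whose major version is 12.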

Changes

GDN prefill kernel selection gating

| Layer / File(s) | Summary |
|---|---|
| SM90 gating for FlashInfer GDN prefill (tensorrt_llm/_torch/modules/mamba/gdn_mixer.py) | Import get_sm_version and extend kernel selection condition to require GPU SM version 90 in addition to the TLLM_USE_FLASHINFER_GDN_PREFILL environment variable check; otherwise fall back to Triton kernel. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | The PR description provides clear context on the root cause, the fix applied, and test verification. However, it lacks the structured template sections (Description, Test Coverage, PR Checklist) that are required by the repository template. | Reformat the description to match the repository template structure with explicit Description, Test Coverage, and PR Checklist sections to ensure consistency with repository standards. |

✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly references the fix applied to the codebase: gating FlashInfer selection on SM version 90, which directly matches the main change in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

# device major version 9 (SM90/Hopper); on other archs it aborts at load. Gate
# the selection on SM90 so non-Hopper GPUs (e.g. Blackwell SM120) use the
# device-agnostic Triton path.
if os.getenv("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1" and get_sm_version() == 90:
The FlashInfer GDN prefill kernel requires an SM90 (Hopper), SM100, or SM103 (Blackwell) architecture.
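
If the reviewer's list is accurate, one way to express it is an allowlist rather than an equality check. The sketch below is only an illustration of that idea: the constant and function names are hypothetical, and the set {90, 100, 103} is taken directly from the comment above and has not been verified against FlashInfer here.

```python
import os

# Assumption: SM versions listed in the review comment, not verified.
FLASHINFER_GDN_PREFILL_SM_VERSIONS = frozenset({90, 100, 103})


def should_use_flashinfer_gdn_prefill(sm_version: int, env=None) -> bool:
    """Return True when the FlashInfer GDN prefill kernel would be selected.

    Hypothetical helper. `sm_version` stands in for get_sm_version().
    """
    env = os.environ if env is None else env
    requested = env.get("TLLM_USE_FLASHINFER_GDN_PREFILL", "1") == "1"
    # Membership test replaces the `== 90` equality used in the PR.
    return requested and sm_version in FLASHINFER_GDN_PREFILL_SM_VERSIONS
```

An allowlist keeps SM120 on the Triton path, as in the PR, while letting any additional architectures be enabled by editing a single constant.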
