
[None][perf] Remove redundant allreduce #14974

Open
mikeiovine wants to merge 2 commits into NVIDIA:feat/deepseek_v4 from mikeiovine:remove-redundant-allreduce


Collaborator

@mikeiovine mikeiovine commented Jun 4, 2026

Description

Skip a redundant allreduce.

  1. When attention DP is off and TP > 1, the DSV4 attention op reduces its output in o_b_proj.
  2. This reduced output then goes through fused_hc.
  3. We then do another fused allreduce sum + RMSNorm. This does not affect correctness because RMSNorm with a positive epsilon is approximately scale invariant.
  4. The extra allreduce is therefore redundant.

This PR changes the flow: the RMSNorm is folded into fused_hc, and the extra allreduce is eliminated.
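A toy illustration of the scale-invariance argument in step 3, not the actual kernel code. It assumes the second allreduce ends up summing tensors that are already identical on every rank, so its only effect is to multiply the input by the TP size. `rms_norm`, `allreduce_sum`, and the sample values are hypothetical stand-ins written in plain Python, and the learned RMSNorm weight is omitted.

```python
import math


def rms_norm(x, eps=1e-6):
    """Weightless RMSNorm: x / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def allreduce_sum(per_rank):
    """Simulated allreduce: elementwise sum over the per-rank tensors."""
    return [sum(vals) for vals in zip(*per_rank)]


tp_size = 8
# Hypothetical activation that is already reduced, so every rank holds the same values.
reduced = [0.5, -1.25, 2.0, 0.75]

# Old flow (assumed): a second allreduce sums the identical copies, then RMSNorm.
old_path = rms_norm(allreduce_sum([reduced] * tp_size))
# New flow: RMSNorm applied directly, with no extra allreduce.
new_path = rms_norm(reduced)

# The two results differ only through the epsilon term.
max_diff = max(abs(a - b) for a, b in zip(old_path, new_path))
print(f"max abs difference: {max_diff:.3e}")
```

The difference is not exactly zero because epsilon is added to the mean square before the scaling cancels, which is why the invariance is only approximate.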

Test Coverage

Existing tests.

TPS/User improves ~3% at batch size 1 (DSV4 Pro, 8xB300) in my local tests.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Mike Iovine <miovine@nvidia.com>
@mikeiovine mikeiovine force-pushed the remove-redundant-allreduce branch from 58ae3c8 to 40b9a3b Compare June 4, 2026 21:19
Signed-off-by: Mike Iovine <miovine@nvidia.com>
@mikeiovine mikeiovine marked this pull request as ready for review June 4, 2026 21:25
@mikeiovine mikeiovine requested a review from a team as a code owner June 4, 2026 21:25
@mikeiovine mikeiovine requested review from yechank-nvidia and removed request for a team June 4, 2026 21:25
2 participants