[cudax] Update lane mask inside mappings only when unit is thread by davebayer · Pull Request #9264 · NVIDIA/cccl

davebayer · 2026-06-04T20:29:53Z

A bug in our mappings made the map method update the lane mask regardless of the unit. The lane mask should be modified only when the unit is a thread.

coderabbitai · 2026-06-04T20:42:50Z

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR fixes a bug in the CUDA Experimental Group mappings implementation where the map method incorrectly updated the lane mask regardless of the unit type. The fix restricts lane mask updates to occur only when the unit is thread_level, ensuring correct behavior in hierarchical group operations.

Changes Made

Core API Update

The map method signature across all mapping implementations has been updated to accept an additional leading _Unit parameter:

Before: map(const _ParentGroup&, const _PrevMappingResult&)
After: map(const _Unit&, const _ParentGroup&, const _PrevMappingResult&)

This change propagates through:

cuda::experimental::group (base group implementation)
cuda::experimental::binary_partition
cuda::experimental::composite_mapping
cuda::experimental::group_as (both static and dynamic specializations)
cuda::experimental::group_by (both fixed-extent and dynamic-extent variants)
cuda::experimental::identity_mapping
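
The new call shape can be illustrated with a minimal plain C++17 sketch. Every type here (`thread_level`, `parent_group`, `mapping_result`, `identity_mapping_sketch`) is a hypothetical stand-in written for this illustration, not the cudax implementation; only the argument order (unit first, then parent group, then previous result) is taken from the summary above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the cudax types.
struct thread_level {};
struct parent_group {};

struct mapping_result {
  std::uint32_t lane_mask;
};

// Identity-style mapping using the three-argument shape described above:
// the unit comes first, then the parent group, then the previous result.
struct identity_mapping_sketch {
  template <class Unit, class ParentGroup, class PrevMappingResult>
  constexpr PrevMappingResult map(const Unit&,
                                  const ParentGroup&,
                                  const PrevMappingResult& prev) const noexcept {
    return prev;  // identity: forward the previous result unchanged
  }
};
```

A caller that previously wrote `mapping.map(parent, prev)` would, under this shape, write `mapping.map(unit, parent, prev)`.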

Lane Mask Computation Fix

The critical fix for the lane mask issue is in the mapping implementations (group_as and group_by). Lane mask computation is now conditional at compile time:

When _Unit is thread_level: the lane mask is derived via __make_lane_mask_for_n from the previous mapping's lane mask together with the computed lane indices and rank.
Otherwise: the previous mapping's lane mask is reused unchanged.

This ensures lane masks are only updated when performing thread-level grouping operations.
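
The compile-time branch described above could look roughly like the following plain C++17 sketch. The helper `narrow_lane_mask` is a made-up placeholder and does not reproduce the behavior of `__make_lane_mask_for_n`, whose exact semantics are not shown in this thread; the unit tags are hypothetical stand-ins as well.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical unit tags; NOT the cudax hierarchy-level types.
struct thread_level {};
struct warp_level {};

// Placeholder helper (an assumption for this sketch): keep `count` lanes
// starting at `first_lane`, restricted to lanes already set in `prev`.
// Assumes first_lane < 32.
constexpr std::uint32_t narrow_lane_mask(std::uint32_t prev,
                                         unsigned first_lane,
                                         unsigned count) {
  const std::uint32_t window =
      (count >= 32u) ? 0xFFFFFFFFu : ((1u << count) - 1u);
  return prev & (window << first_lane);
}

// The unit type selects the branch at compile time.
template <class Unit>
constexpr std::uint32_t next_lane_mask(const Unit&,
                                       std::uint32_t prev,
                                       unsigned first_lane,
                                       unsigned count) {
  if constexpr (std::is_same_v<Unit, thread_level>) {
    return narrow_lane_mask(prev, first_lane, count);  // thread unit: update
  } else {
    return prev;  // any other unit: forward the previous mask untouched
  }
}

static_assert(next_lane_mask(thread_level{}, 0xFFFFFFFFu, 8, 8) == 0x0000FF00u);
static_assert(next_lane_mask(warp_level{}, 0xFFFFFFFFu, 8, 8) == 0xFFFFFFFFu);
```

Because the branch is chosen with `if constexpr`, the non-thread instantiation never evaluates the mask helper at all.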

Supporting Updates

Group construction: The __mapping_result_ is now computed via the updated __do_mapping(__unit, ...) path, passing the group _Unit through to mapping calls
Synchronizer creation: Updated to accept and forward the provided __unit to __synchronizer.make_instance(...) instead of using a default-constructed _Unit{}
Type inference helpers: Updated __group_mapping_result_t and __group_synchronizer_instance_t in traits.cuh to use cuda::std::declval expressions instead of default-constructed temporaries for more accurate type deduction
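
One reason `declval`-based inference can be preferable to default-constructed temporaries is sketched below in plain C++17: the alias names the exact cv-qualified reference types that will be passed, and it does not require any argument type to be default-constructible. All names (`mapping_result_t`, `mapping_sketch`, `unit_without_default_ctor`) are hypothetical and only mirror the idea; they are not the helpers in traits.cuh.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical unit type that cannot be default-constructed, so an
// expression such as `Unit{}` would not compile for it.
struct unit_without_default_ctor {
  unit_without_default_ctor() = delete;
  explicit constexpr unit_without_default_ctor(int) {}
};

struct parent_group {};
struct prev_result {
  unsigned lane_mask;
};

// Hypothetical mapping that forwards the previous result.
struct mapping_sketch {
  template <class Unit, class Parent, class Prev>
  constexpr Prev map(const Unit&, const Parent&, const Prev& prev) const noexcept {
    return prev;
  }
};

// Result type of map(...) computed in an unevaluated context. declval
// produces a reference of exactly the requested type without constructing
// anything.
template <class Mapping, class Unit, class Parent, class Prev>
using mapping_result_t = decltype(std::declval<const Mapping&>().map(
    std::declval<const Unit&>(),
    std::declval<const Parent&>(),
    std::declval<const Prev&>()));

static_assert(std::is_same_v<
              mapping_result_t<mapping_sketch, unit_without_default_ctor,
                               parent_group, prev_result>,
              prev_result>);
```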

Test Updates

All mapping tests have been updated to pass cuda::gpu_thread as the first argument to map invocations:

binary_partition.cu
composite_mapping.cu
group_as.cu
group_by.cu
identity_mapping.cu
barrier_synchronizer.cu
lane_synchronizer.cu

The test assertions for result types and noexcept specifications were updated to match the new map(...) signature.

Impact

This is a breaking API change for any code that directly calls the map method on mapping objects. All callers must be updated to pass a _Unit argument as the first parameter. The fix ensures correctness in hierarchical group operations where unit type determines whether lane mask tracking should be updated.

important: Walkthrough

This PR updates the CUDA group mapping pipeline to accept and propagate a _Unit template parameter as the first argument to all mapping map(...) method signatures and to forward that unit into synchronizer factory calls; lane-mask generation is made conditional on _Unit being thread_level.

important: Changes

Unit parameter integration across group mappings

| Layer / File(s) | Summary |
|---|---|
| Group mapping invocation contract `cudax/include/cuda/experimental/__group/group.cuh`, `cudax/include/cuda/experimental/__group/traits.cuh` | Group construction, mapping result inference, and synchronizer factory `make_instance` now use `const _Unit&` in `decltype` and calls, and the constructor forwards `__unit` into mapping and synchronizer creation. |
| Identity and binary partition mappings `cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh`, `cudax/include/cuda/experimental/__group/mapping/binary_partition.cuh` | `map` now takes `const _Unit&` first; identity returns the previous result unchanged, binary_partition adds a `static_assert` requiring `_Unit` == `thread_level` while preserving partition logic. |
| Composite mapping recursive forwarding `cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh` | Recursive `__map_impl` and public `map` now accept and forward a `const _Unit&` through each mapping step, threading intermediate mapping results. |
| Conditional lane-mask mappings `cudax/include/cuda/experimental/__group/mapping/group_as.cuh`, `cudax/include/cuda/experimental/__group/mapping/group_by.cuh` | `map` signatures accept `const _Unit&`; lane-mask computation is performed only when `_Unit` is `thread_level`, otherwise prior lane_mask() is forwarded. |
| Mapping and synchronizer test suite updates `cudax/test/group/mapping/.cu`, `cudax/test/group/synchronizer/.cu` | All tests updated to pass `cuda::gpu_thread` as the first argument to `mapping.map(...)`; corresponding `__group_mapping_result` and `noexcept` static assertions were adjusted. |
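
The recursive forwarding described for the composite mapping can be pictured with the plain C++17 sketch below. It is an illustration under assumptions: `composite_mapping_sketch`, `map_impl`, and `add_step` are invented names, and an `int` stands in for the mapping result so that the chaining is easy to follow. The real `__map_impl` is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Hypothetical stand-ins; NOT the cudax types.
struct thread_level {};
struct parent_group {};

// Toy step: adds N to an int "result" so each step's effect is visible.
template <int N>
struct add_step {
  template <class Unit, class Parent>
  constexpr int map(const Unit&, const Parent&, const int& prev) const noexcept {
    return prev + N;
  }
};

template <class... Mappings>
struct composite_mapping_sketch {
  std::tuple<Mappings...> mappings;

  // Apply step I, then recurse with the intermediate result. The same unit
  // and parent are forwarded to every step.
  template <std::size_t I, class Unit, class Parent, class Prev>
  constexpr auto map_impl(const Unit& unit, const Parent& parent,
                          const Prev& prev) const {
    if constexpr (I == sizeof...(Mappings)) {
      return prev;  // no steps left: this is the final result
    } else {
      return map_impl<I + 1>(unit, parent,
                             std::get<I>(mappings).map(unit, parent, prev));
    }
  }

  template <class Unit, class Parent, class Prev>
  constexpr auto map(const Unit& unit, const Parent& parent,
                     const Prev& prev) const {
    return map_impl<0>(unit, parent, prev);
  }
};

// Small usage helper: chains +1 and +10 starting from `start`.
inline int run_two_steps(int start) {
  composite_mapping_sketch<add_step<1>, add_step<10>> composite{};
  return composite.map(thread_level{}, parent_group{}, start);
}
```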

suggestion: Possibly related PRs

NVIDIA/cccl#8894: Related changes to binary_partition mapping API introducing _Unit parameter.
NVIDIA/cccl#9140: Related work touching lane-mask storage/usage in mapping results.

suggestion: Suggested labels

cudax

suggestion: Suggested reviewers

andralex
gevtushenko
griwes

coderabbitai

🧹 Nitpick comments (2)

cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh (1)

37-39: ⚡ Quick win

suggestion: The new _Unit parameter is unconstrained here, so direct map(...) callers can pass arbitrary types and bypass group's __is_hierarchy_level_v<_Unit> contract. Add the hierarchy-level constraint on the overload itself, and mirror it on the other updated mapping map(...) signatures.

As per coding guidelines: "Use C++20 concept macros instead of SFINAE, e.g., _CCCL_TEMPLATE(...) and _CCCL_REQUIRES(...), for template constraints."
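
The effect of such a constraint can be sketched in plain C++17. The repository guideline quoted above asks for the `_CCCL_TEMPLATE(...)` / `_CCCL_REQUIRES(...)` macros; this sketch deliberately uses `std::enable_if_t` instead so that it compiles without CCCL headers. The trait `is_hierarchy_level_v` and all types are hypothetical stand-ins, not `__is_hierarchy_level_v` itself.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; NOT the cudax types or traits.
struct thread_level {};
struct warp_level {};
struct not_a_level {};

template <class T>
inline constexpr bool is_hierarchy_level_v = false;
template <>
inline constexpr bool is_hierarchy_level_v<thread_level> = true;
template <>
inline constexpr bool is_hierarchy_level_v<warp_level> = true;

struct parent_group {};
struct prev_result {
  unsigned lane_mask;
};

struct identity_mapping_sketch {
  // The overload only participates when Unit is a hierarchy level.
  template <class Unit, class Parent, class Prev,
            std::enable_if_t<is_hierarchy_level_v<Unit>, int> = 0>
  constexpr Prev map(const Unit&, const Parent&, const Prev& prev) const noexcept {
    return prev;
  }
};

// Detects whether map(unit, parent, prev) is callable for a given Unit.
template <class Unit, class = void>
struct can_map : std::false_type {};

template <class Unit>
struct can_map<Unit,
               std::void_t<decltype(std::declval<const identity_mapping_sketch&>().map(
                   std::declval<const Unit&>(),
                   std::declval<const parent_group&>(),
                   std::declval<const prev_result&>()))>> : std::true_type {};

static_assert(can_map<thread_level>::value);
static_assert(can_map<warp_level>::value);
static_assert(!can_map<not_a_level>::value);
```

With the constraint on the overload itself, a direct caller passing an arbitrary type gets a "no matching function" diagnostic at the call site rather than an error deep inside the mapping.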

cudax/include/cuda/experimental/__group/mapping/group_as.cuh (1)

166-169: ⚡ Quick win

suggestion: Add a regression that instantiates this mapping with a non-thread unit and asserts lane_mask() is forwarded unchanged. The updated test cohort for this stack only passes cuda::gpu_thread, so the new false arm can still regress silently.

Also applies to: 292-295
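
A regression check of the kind suggested here might be shaped like the plain C++17 sketch below. It is not written against the cudax test harness: `group_as_sketch`, `result_sketch`, and `block_level` are hypothetical stand-ins, and the thread-unit arm simply keeps the low `N` lanes, which is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins; NOT the cudax types.
struct thread_level {};
struct block_level {};
struct parent_group {};

struct result_sketch {
  std::uint32_t mask;
  constexpr std::uint32_t lane_mask() const noexcept { return mask; }
};

// Toy mapping: thread units keep only the low N lanes of the previous mask
// (assumed behavior for this sketch); every other unit forwards the result.
template <unsigned N>
struct group_as_sketch {
  static_assert(N >= 1 && N < 32, "sketch supports 1..31 lanes");

  template <class Unit>
  constexpr result_sketch map(const Unit&, const parent_group&,
                              const result_sketch& prev) const noexcept {
    if constexpr (std::is_same_v<Unit, thread_level>) {
      return result_sketch{prev.lane_mask() & ((1u << N) - 1u)};
    } else {
      return prev;
    }
  }
};

// Regression check for the "false arm": a non-thread unit must leave the
// lane mask exactly as it was.
constexpr bool non_thread_unit_forwards_lane_mask() {
  constexpr group_as_sketch<4> mapping{};
  constexpr result_sketch prev{0xF0F0F0F0u};
  return mapping.map(block_level{}, parent_group{}, prev).lane_mask() ==
         prev.lane_mask();
}

static_assert(non_thread_unit_forwards_lane_mask());
```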

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c8dc5c7-7b33-4549-bb6e-da4e1566714c

📥 Commits

Reviewing files that changed from the base of the PR and between 89c81d7 and 7261ef1.

📒 Files selected for processing (13)

cudax/include/cuda/experimental/__group/group.cuh
cudax/include/cuda/experimental/__group/mapping/binary_partition.cuh
cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh
cudax/include/cuda/experimental/__group/mapping/group_as.cuh
cudax/include/cuda/experimental/__group/mapping/group_by.cuh
cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh
cudax/test/group/mapping/binary_partition.cu
cudax/test/group/mapping/composite_mapping.cu
cudax/test/group/mapping/group_as.cu
cudax/test/group/mapping/group_by.cu
cudax/test/group/mapping/identity_mapping.cu
cudax/test/group/synchronizer/barrier_synchronizer.cu
cudax/test/group/synchronizer/lane_synchronizer.cu

coderabbitai

🧹 Nitpick comments (1)

cudax/include/cuda/experimental/__group/traits.cuh (1)

35-36: suggestion: Consider removing or updating __group_mapping_result_t to avoid dead, stale 2-arg map inference. The only reference to __group_mapping_result_t is its definition in cudax/include/cuda/experimental/__group/traits.cuh (lines 35-36); no traits/concepts use it elsewhere, so the 2-arg contract mismatch with the current 3-arg map(unit, parent, initial_mapping_result) interface won’t impact builds.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f50647ac-1ff9-4b69-9ab5-b79649a06662

📥 Commits

Reviewing files that changed from the base of the PR and between 7261ef1 and 6239984.

📒 Files selected for processing (14)

cudax/include/cuda/experimental/__group/group.cuh
cudax/include/cuda/experimental/__group/mapping/binary_partition.cuh
cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh
cudax/include/cuda/experimental/__group/mapping/group_as.cuh
cudax/include/cuda/experimental/__group/mapping/group_by.cuh
cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh
cudax/include/cuda/experimental/__group/traits.cuh
cudax/test/group/mapping/binary_partition.cu
cudax/test/group/mapping/composite_mapping.cu
cudax/test/group/mapping/group_as.cu
cudax/test/group/mapping/group_by.cu
cudax/test/group/mapping/identity_mapping.cu
cudax/test/group/synchronizer/barrier_synchronizer.cu
cudax/test/group/synchronizer/lane_synchronizer.cu

🚧 Files skipped from review as they are similar to previous changes (11)

cudax/test/group/synchronizer/lane_synchronizer.cu
cudax/test/group/synchronizer/barrier_synchronizer.cu
cudax/include/cuda/experimental/__group/mapping/binary_partition.cuh
cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh
cudax/test/group/mapping/group_as.cu
cudax/test/group/mapping/binary_partition.cu
cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh
cudax/test/group/mapping/identity_mapping.cu
cudax/test/group/mapping/composite_mapping.cu
cudax/test/group/mapping/group_by.cu
cudax/include/cuda/experimental/__group/mapping/group_by.cuh

github-actions · 2026-06-05T07:42:31Z

🥳 CI Workflow Results

🟩 Finished in 34m 58s: Pass: 100%/55 | Total: 8h 16m | Max: 34m 58s | Hits: 69%/47114

See results here.

davebayer requested a review from a team as a code owner June 4, 2026 20:29

davebayer requested a review from andralex June 4, 2026 20:29

github-project-automation Bot added this to CCCL Jun 4, 2026

github-project-automation Bot moved this to Todo in CCCL Jun 4, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

miscco reviewed Jun 5, 2026

Comment thread cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh Outdated

[cudax] Update lane mask inside mappings only when unit is thread

6239984

davebayer force-pushed the groups_update_lane_mask_only_for_thread_level branch from 7261ef1 to 6239984 Compare June 5, 2026 07:05

davebayer requested a review from miscco June 5, 2026 07:08

coderabbitai Bot reviewed Jun 5, 2026

miscco approved these changes Jun 5, 2026

davebayer merged commit 281a0e4 into NVIDIA:main Jun 5, 2026
78 checks passed

github-project-automation Bot moved this from In Review to Done in CCCL Jun 5, 2026

davebayer deleted the groups_update_lane_mask_only_for_thread_level branch June 5, 2026 12:11