[cudax] Update lane mask inside mappings only when unit is thread#9264
Conversation
OverviewThis PR fixes a bug in the CUDA Experimental Group mappings implementation where the Changes MadeCore API UpdateThe
This change propagates through:
Lane Mask Computation FixThe critical fix for the lane mask issue is in the mapping implementations (
This ensures lane masks are only updated when performing thread-level grouping operations. Supporting Updates
Test UpdatesAll mapping tests have been updated to pass
The test assertions for result types and ImpactThis is a breaking API change for any code that directly calls the important: Walkthrough This PR updates the CUDA group mapping pipeline to accept and propagate a important: Changes Unit parameter integration across group mappings
suggestion: Possibly related PRs
suggestion: Suggested labels
suggestion: Suggested reviewers
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh (1)
37-39: ⚡ Quick winsuggestion: The new
_Unitparameter is unconstrained here, so directmap(...)callers can pass arbitrary types and bypassgroup's__is_hierarchy_level_v<_Unit>contract. Add the hierarchy-level constraint on the overload itself, and mirror it on the other updated mappingmap(...)signatures.As per coding guidelines: "Use C++20 concept macros instead of SFINAE, e.g.,
_CCCL_TEMPLATE(...)and_CCCL_REQUIRES(...), for template constraints."cudax/include/cuda/experimental/__group/mapping/group_as.cuh (1)
166-169: ⚡ Quick winsuggestion: Add a regression that instantiates this mapping with a non-thread unit and asserts
lane_mask()is forwarded unchanged. The updated test cohort for this stack only passescuda::gpu_thread, so the newfalsearm can still regress silently.Also applies to: 292-295
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8c8dc5c7-7b33-4549-bb6e-da4e1566714c
📒 Files selected for processing (13)
cudax/include/cuda/experimental/__group/group.cuhcudax/include/cuda/experimental/__group/mapping/binary_partition.cuhcudax/include/cuda/experimental/__group/mapping/composite_mapping.cuhcudax/include/cuda/experimental/__group/mapping/group_as.cuhcudax/include/cuda/experimental/__group/mapping/group_by.cuhcudax/include/cuda/experimental/__group/mapping/identity_mapping.cuhcudax/test/group/mapping/binary_partition.cucudax/test/group/mapping/composite_mapping.cucudax/test/group/mapping/group_as.cucudax/test/group/mapping/group_by.cucudax/test/group/mapping/identity_mapping.cucudax/test/group/synchronizer/barrier_synchronizer.cucudax/test/group/synchronizer/lane_synchronizer.cu
This comment has been minimized.
This comment has been minimized.
7261ef1 to
6239984
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cudax/include/cuda/experimental/__group/traits.cuh (1)
35-36: suggestion: Consider removing or updating__group_mapping_result_tto avoid dead, stale 2-argmapinference. The only reference to__group_mapping_result_tis its definition incudax/include/cuda/experimental/__group/traits.cuh(lines 35-36); no traits/concepts use it elsewhere, so the 2-arg contract mismatch with the current 3-argmap(unit, parent, initial_mapping_result)interface won’t impact builds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f50647ac-1ff9-4b69-9ab5-b79649a06662
📒 Files selected for processing (14)
cudax/include/cuda/experimental/__group/group.cuhcudax/include/cuda/experimental/__group/mapping/binary_partition.cuhcudax/include/cuda/experimental/__group/mapping/composite_mapping.cuhcudax/include/cuda/experimental/__group/mapping/group_as.cuhcudax/include/cuda/experimental/__group/mapping/group_by.cuhcudax/include/cuda/experimental/__group/mapping/identity_mapping.cuhcudax/include/cuda/experimental/__group/traits.cuhcudax/test/group/mapping/binary_partition.cucudax/test/group/mapping/composite_mapping.cucudax/test/group/mapping/group_as.cucudax/test/group/mapping/group_by.cucudax/test/group/mapping/identity_mapping.cucudax/test/group/synchronizer/barrier_synchronizer.cucudax/test/group/synchronizer/lane_synchronizer.cu
🚧 Files skipped from review as they are similar to previous changes (11)
- cudax/test/group/synchronizer/lane_synchronizer.cu
- cudax/test/group/synchronizer/barrier_synchronizer.cu
- cudax/include/cuda/experimental/__group/mapping/binary_partition.cuh
- cudax/include/cuda/experimental/__group/mapping/identity_mapping.cuh
- cudax/test/group/mapping/group_as.cu
- cudax/test/group/mapping/binary_partition.cu
- cudax/include/cuda/experimental/__group/mapping/composite_mapping.cuh
- cudax/test/group/mapping/identity_mapping.cu
- cudax/test/group/mapping/composite_mapping.cu
- cudax/test/group/mapping/group_by.cu
- cudax/include/cuda/experimental/__group/mapping/group_by.cuh
🥳 CI Workflow Results🟩 Finished in 34m 58s: Pass: 100%/55 | Total: 8h 16m | Max: 34m 58s | Hits: 69%/47114See results here. |
There was a bug in our mappings that made the map method update the lane mask no matter what the unit is. We want to modify lane mask only when the unit is a thread.