cuda::device::warp_match_any #9243
suggestion: Walkthrough. Adds cuda::device::warp_match_any, a templated device function that serializes values into 32-bit chunks (optionally clearing padding), calls __match_any_sync per chunk on SM_70+, intersects the chunk masks, and returns the matching lane_mask. Includes tests, public header wiring, a new docs page, and related warp documentation updates.
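The chunk-and-intersect idea in the walkthrough can be illustrated with a minimal host-side sketch. This is plain C++ that emulates a 32-lane warp with an array; it is not the code from this PR. The names `kWarpSize`, `match_any_word`, and `emulated_warp_match_any` are hypothetical, `match_any_word` only stands in for what `__match_any_sync` does on one 32-bit word, and the sketch assumes `T` is trivially copyable and has no padding bits (the PR clears padding separately).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

constexpr int kWarpSize = 32;  // hypothetical name; emulated warp width

// Host-side stand-in for __match_any_sync on one 32-bit word per lane:
// returns the mask of active lanes whose word equals the word held by `lane`.
inline std::uint32_t match_any_word(const std::array<std::uint32_t, kWarpSize>& words,
                                    std::uint32_t active_mask, int lane) {
  std::uint32_t result = 0;
  for (int other = 0; other < kWarpSize; ++other) {
    const bool active = ((active_mask >> other) & 1u) != 0;
    if (active && words[other] == words[lane]) {
      result |= (std::uint32_t{1} << other);
    }
  }
  return result;
}

// Emulated match-any for an arbitrary trivially copyable T (assumed padding-free):
// split each value into 32-bit chunks, match per chunk, intersect the chunk masks.
template <class T>
std::uint32_t emulated_warp_match_any(const std::array<T, kWarpSize>& values,
                                      std::uint32_t active_mask, int lane) {
  static_assert(std::is_trivially_copyable_v<T>, "values are compared bytewise");
  assert(lane >= 0 && lane < kWarpSize);
  assert(((active_mask >> lane) & 1u) != 0);  // the querying lane must be active

  constexpr std::size_t num_chunks = (sizeof(T) + 3) / 4;
  std::uint32_t result = active_mask;
  for (std::size_t chunk = 0; chunk < num_chunks; ++chunk) {
    const std::size_t offset = chunk * 4;
    const std::size_t bytes  = (sizeof(T) - offset < 4) ? sizeof(T) - offset : 4;
    std::array<std::uint32_t, kWarpSize> words{};  // zero-filled, so a short tail chunk is zero-padded
    for (int l = 0; l < kWarpSize; ++l) {
      std::memcpy(&words[l], reinterpret_cast<const unsigned char*>(&values[l]) + offset, bytes);
    }
    result &= match_any_word(words, active_mask, lane);  // lanes must agree on every chunk
  }
  return result;
}
```

A lane ends up in the result only if it agrees with the querying lane on every chunk, which is why the per-chunk masks are combined with a bitwise AND rather than an OR.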
🧹 Nitpick comments (1)
docs/libcudacxx/extended_api/warp/warp_match_any.rst (1)
36-37 (💤 Low value) suggestion: Constraints omit the is_bitwise_comparable requirement that the sibling warp_match_all doc lists for the no-__builtin_clear_padding case. Line 37 here describes padding but doesn't reference is_bitwise_comparable, while the implementation static_asserts it. Align with warp_match_all.rst line 37 for consistency.
📒 Files selected for processing (6)
docs/libcudacxx/extended_api/warp.rst
docs/libcudacxx/extended_api/warp/warp_match_all.rst
docs/libcudacxx/extended_api/warp/warp_match_any.rst
libcudacxx/include/cuda/__warp/warp_match_any.h
libcudacxx/include/cuda/warp
libcudacxx/test/libcudacxx/cuda/warp/warp_match_any.pass.cpp
    auto __data_copy = __data;
    _CCCL_BUILTIN_CLEAR_PADDING(&__data_copy);
    const auto __data_ptr = ::cuda::std::addressof(__data_copy);
Important: This introduces a needless copy. I believe this should only copy if is_bitwise_comparable_v is false
Yes, I'm aware of this issue. However, is_bitwise_comparable_v also checks padding, so I need to introduce another (internal) trait for that. I will open a second PR.
Fine by me, although I would like to have an integer overload that does not do any of that and just forwards to the builtin
I don't think it is needed. A direct call and cuda::device::warp_match_any produce identical code, as expected: https://godbolt.org/z/sKaWz5dv1
question: Should this be
Sorry, only the title is wrong. The implementation is correct. Let me update it.
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
… warp_match_any
🥳 CI Workflow Results
🟩 Finished in 1h 12m: Pass: 100%/115 | Total: 1d 12h | Max: 1h 00m | Hits: 99%/336704
Description
The PR provides cuda::device::warp_match_any, analogous to cuda::device::warp_match_all, for completeness.
The main difference is that warp_match_any returns a mask instead of a bool value.
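The difference in return type can be sketched on the host with a plain array standing in for the 32 lanes of a warp. This is an illustrative emulation only, assuming all lanes are active and each lane holds one 32-bit value; `Warp`, `kLanes`, `match_any_mask`, and `match_all` are hypothetical names, not the library's API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 32;                          // hypothetical: emulated warp width
using Warp = std::array<std::uint32_t, kLanes>;     // one 32-bit value per lane

// "match any" flavour: each lane gets a mask of the lanes that hold its value.
inline std::uint32_t match_any_mask(const Warp& values, int lane) {
  assert(lane >= 0 && lane < kLanes);
  std::uint32_t mask = 0;
  for (int other = 0; other < kLanes; ++other) {
    if (values[other] == values[lane]) {
      mask |= (std::uint32_t{1} << other);
    }
  }
  return mask;
}

// "match all" flavour: a single yes/no answer shared by the whole warp.
inline bool match_all(const Warp& values) {
  for (int lane = 1; lane < kLanes; ++lane) {
    if (values[lane] != values[0]) {
      return false;
    }
  }
  return true;
}
```

With every lane holding the same value, `match_all` is true and every lane's mask has all 32 bits set. Once a single lane differs, `match_all` becomes false for the whole warp, while the masks still report which lanes agree with which: the odd lane sees only its own bit, and the others see every bit except that one.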