[libcu++] Make argument namespace and wrappers construction public#9251
[libcu++] Make argument namespace and wrappers construction public#9251pciolkosz wants to merge 5 commits into
Conversation
davebayer
left a comment
There was a problem hiding this comment.
I'm sorry, but I'll block this PR. We said there would be a discussion about the naming. I'm still strongly against having a separate namespace for these types.
|
Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine. Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
SummaryThis PR makes the user-facing CUDA argument wrapper API public: it adds a public header <cuda/argument> and moves the non‑underscored, public wrapper and bounds types into the cuda::args namespace. Internal inspection helpers and trait machinery remain underscored for now and are available under cuda::args::__* as internal APIs. The change resolves issue Key Changes
Codebase ImpactWidespread rename/usage updates across CUB, benchmarks, kernels, and libcudacxx tests; notable updated files:
Most edits are renames and header switches; some constructor/validation logic for bounds and wrapper construction was tightened/rewired in the argument implementation—tests were updated accordingly. API/ABI Impact
Review Notes
suggestion: Walkthrough This PR replaces internal cuda::__argument wrapper/bounds/traits names with public cuda::argument / cuda::args equivalents, adds a public forwarding header Changes CUDA argument API public-API migration
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)libcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpp:11:10: fatal error: 'cuda/argument' file not found ... [truncated 2200 characters] ... ile "src/clang/cTrans.ml", line 4782, characters 12-47 libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp:11:10: fatal error: 'cuda/argument' file not found ... [truncated 1160 characters] ... al-isystem" "/usr/local/include" "-internal-isystem" libcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpp:11:10: fatal error: 'cuda/argument' file not found ... [truncated 2200 characters] ... langFrontend__CTrans.CTrans_funct.exec_with_node_creation in file "src/clang/cTrans.ml" (inlined), line 104, characters 20-38
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4424dd10-8027-4d66-97f4-e3d43a7b6cf4
📒 Files selected for processing (20)
cub/benchmarks/bench/segmented_topk/fixed/keys.cucub/benchmarks/bench/segmented_topk/variable/keys.cucub/cub/agent/agent_batched_topk.cuhcub/cub/detail/segmented_params.cuhcub/cub/device/dispatch/dispatch_batched_topk.cuhcub/cub/device/dispatch/kernels/kernel_batched_topk.cuhcub/test/catch2_test_device_segmented_topk_keys.cucub/test/catch2_test_device_segmented_topk_pairs.culibcudacxx/include/cuda/__argument/argument.hlibcudacxx/include/cuda/__argument/argument_bounds.hlibcudacxx/include/cuda/argumentlibcudacxx/include/cuda/std/__internal/namespaces.hlibcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
Tagging @gevtushenko |
6002bc6 to
3210e3d
Compare
This comment has been minimized.
This comment has been minimized.
| auto segment_sizes = ::cuda::__argument::__constant<MaxSegmentSize>{}; | ||
| auto k = ::cuda::__argument::__constant<MaxNumSelected>{}; | ||
| auto select_direction = ::cuda::__argument::__constant<cub::detail::topk::select::max>{}; | ||
| auto segment_sizes = ::cuda::argument::constant<MaxSegmentSize>{}; |
There was a problem hiding this comment.
C++26 has constant_wrapper. It's hard to back-port fully to earlier C++ versions (though there are proposals pending review that might fix this), but for now, it's how C++ spells "compile-time constant." Does this cuda::argument::constant do the same thing? If so, then should we consider just making this an alias to constant_wrapper (or a hypothetical cuda::std::constant_wrapper)?
|
|
||
| using segment_size_val_t = typename ::cuda::__argument::__traits<SegmentSizeParameterT>::element_type; | ||
| using num_segments_val_t = typename ::cuda::__argument::__traits<NumSegmentsParameterT>::element_type; | ||
| using segment_size_val_t = typename ::cuda::argument::__traits<SegmentSizeParameterT>::element_type; |
There was a problem hiding this comment.
Is this library usable without exposing __traits to the public API? If not, then should we consider waiting until __traits is ready?
There was a problem hiding this comment.
The main user facing part is the construction of the wrappers, while our APIs will use the traits to examine them. Long term we want to expose the __traits I think, but we don't have to right now so keeping them private leaves us room to change them easily and make public later
There was a problem hiding this comment.
Thanks for explaining! That makes sense.
| //! @brief Wraps a compile-time constant argument value. | ||
| template <auto _Value> | ||
| struct __constant | ||
| struct constant |
There was a problem hiding this comment.
This looks more or less like constant_wrapper. Should we consider spelling it cuda::std::constant_wrapper?
There was a problem hiding this comment.
We considered that design, cuda::std::constant_wrapper is difficult to back port to older c++ standards and the timeline probably doesn't leave enough room to implement it. I think its fine to just interoperate with constant wrapper later on
| static_assert(__valid_static_bounds_v<__element_type, _StaticBounds>, | ||
| "static argument bounds cannot be represented by the element type"); | ||
|
|
||
| _Arg __arg_; |
There was a problem hiding this comment.
This is a public member. That means it's possible to set the value without validating the bounds. Should we instead consider making the value private, so that every constructed immediate instance has been validated?
There was a problem hiding this comment.
I could make it a private member and add an underscored __access structure that is a friend or I can just underscore the variable and have less code this way. I don't mind either way, but we need a way to access it at the end of the day.
But I think this question does apply to __unwrap, which currently has two version, const and not-const reference. I could see the non-const reference version being not necessary, but it's also not public yet, so we don't have to resolve it right now. I might as well remove it and we can add it when needed
There was a problem hiding this comment.
@pciolkosz Thanks for answering! I don't quite understand the argument why validation is separate from construction.
There was a problem hiding this comment.
Also: Why can't users access the value (by value)? Isn't this just a validated value? Why wouldn't this class just have a value() member?
There was a problem hiding this comment.
I decided to not be stubborn and changed all members of the wrappers to private and added a friend struct to access them. We might end up exposing member functions to access them, but for now there is only __unwrap free function
|
|
||
| template <class _ElementType, auto _Lowest, auto _Highest> | ||
| inline constexpr bool __valid_static_bounds_v<_ElementType, __static_bounds<_Lowest, _Highest>> = | ||
| inline constexpr bool __valid_static_bounds_v<_ElementType, static_bounds<_Lowest, _Highest>> = |
There was a problem hiding this comment.
There are runtime_bounds and static_bounds. Do we also need a representation of half-run-time, half-static bounds?
submdspan represents bounds generally as any type for which structured binding into two elements, both convertible to index_type, is well-formed. This would let users pass in bounds however they like. It also would permit a unified representation of any combination of run-time and static bounds.
There was a problem hiding this comment.
Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?
There was a problem hiding this comment.
You can tighten the static bounds with runtime bounds, so in this case you can do static_bounds<0, 1000> and runtime_bounds(100, 1000) which effectively would be a half-run-time, half-static bounds. It avoids overcomplicating the input types at the expense of this case being a bit more verbose, but I think it should be rare anyway.
Also, the template parameters' types are unconstrained here. What are the actual requirements on these types?
Good point, I added type validation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7cc00549-bb49-459c-96d8-ed4906627049
📒 Files selected for processing (22)
cub/benchmarks/bench/segmented_topk/fixed/keys.cucub/benchmarks/bench/segmented_topk/variable/keys.cucub/cub/agent/agent_batched_topk.cuhcub/cub/detail/segmented_params.cuhcub/cub/device/dispatch/dispatch_batched_topk.cuhcub/cub/device/dispatch/kernels/kernel_batched_topk.cuhcub/test/catch2_test_device_segmented_topk_keys.cucub/test/catch2_test_device_segmented_topk_pairs.culibcudacxx/include/cuda/__argument/argument.hlibcudacxx/include/cuda/__argument/argument_bounds.hlibcudacxx/include/cuda/std/__internal/namespaces.hlibcudacxx/test/libcudacxx/cuda/argument/argument_bounds.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/deferred_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpplibcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpplibcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
✅ Files skipped from review due to trivial changes (1)
- libcudacxx/test/libcudacxx/cuda/argument/static_bounds_type_mismatch.fail.cpp
🚧 Files skipped from review as they are similar to previous changes (17)
- libcudacxx/include/cuda/std/__internal/namespaces.h
- libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar.fail.cpp
- libcudacxx/test/libcudacxx/cuda/argument/deferred_sequence_scalar_traits.fail.cpp
- libcudacxx/test/libcudacxx/cuda/argument/static_bounds_conversion.fail.cpp
- libcudacxx/test/libcudacxx/cuda/argument/static_argument_bounds_type.fail.cpp
- libcudacxx/test/libcudacxx/cuda/argument/argument_traits.pass.cpp
- cub/cub/agent/agent_batched_topk.cuh
- cub/cub/detail/segmented_params.cuh
- cub/cub/device/dispatch/kernels/kernel_batched_topk.cuh
- cub/benchmarks/bench/segmented_topk/fixed/keys.cu
- cub/benchmarks/bench/segmented_topk/variable/keys.cu
- cub/test/catch2_test_device_segmented_topk_pairs.cu
- libcudacxx/test/libcudacxx/cuda/argument/usage_example.pass.cpp
- cub/test/catch2_test_device_segmented_topk_keys.cu
- libcudacxx/include/cuda/__argument/argument_bounds.h
- libcudacxx/test/libcudacxx/cuda/argument/dynamic_argument.pass.cpp
- libcudacxx/test/libcudacxx/cuda/argument/static_argument.pass.cpp
| static_assert(::cuda::args::__traits<NumSegmentsParameterT>::is_single_value, | ||
| "Only uniform segment sizes are currently supported."); |
There was a problem hiding this comment.
suggestion: Fix the static_assert message to match the checked condition.
Line 243 says “Only uniform segment sizes are currently supported.” but Line 242 checks NumSegmentsParameterT (uniform number of segments). Update the message to avoid misleading compile-time diagnostics.
|
Changes from the code review session:
One thing to consider is if we want to expose |
🥳 CI Workflow Results🟩 Finished in 2h 27m: Pass: 100%/340 | Total: 6d 16h | Max: 2h 27m | Hits: 94%/500627See results here. |
Closes #9254
This PR removes underscores from the
cuda::argumentnamespace and the user facing argument wrappers and bound objects construction.Traits and other utility functions remain underscored, those are meant mostly to be used inside CCCL APIs to examine the argument wrappers passed by the user, but they will most likely be public later on, once we are more confident with the design.