[libcu++] Adds exec::guarantee and the max_total_num_items guarantee #9278
elstehle wants to merge 4 commits into
Overview
Adds a guarantees facility to libcudacxx and the first guarantee type, max_total_num_items, enabling callers to promise an upper bound on the total number of items an algorithm will process so algorithms can choose internal types or allocate temporary storage more efficiently.

Summary
Introduces a composable, queryable guarantees API and the max_total_num_items guarantee to convey an upper bound on total items processed. The API supports static, runtime, and hybrid bounds, preserves narrow integral types via inference, integrates with existing execution query semantics (forwarding queries), and includes positive and negative tests covering the new functionality.

Walkthrough
Adds a guarantee facility (base type, query key, variadic
Files selected for processing (9)
- libcudacxx/include/cuda/__execution/guarantee.h
- libcudacxx/include/cuda/__execution/max_total_num_items.h
- libcudacxx/include/cuda/execution
- libcudacxx/include/cuda/execution.guarantee.h
- libcudacxx/include/cuda/execution.max_total_num_items.h
- libcudacxx/test/libcudacxx/cuda/execution/guarantee.fail.cpp
- libcudacxx/test/libcudacxx/cuda/execution/guarantee.pass.cpp
- libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.fail.cpp
- libcudacxx/test/libcudacxx/cuda/execution/max_total_num_items.pass.cpp
Why can't we use the argument annotation framework to put an upper bound on the num items? How does the guarantee work for problems with multiple num items, like
This bound information is not always attachable to a specific parameter. E.g., for segmented top-k there is only a parameter for the segment sizes (for which we support the argument annotation) but no parameter for the total number of items. A bound on the total number of items is an optional parameter here.
I think for algorithms that have required parameters to which bounds can be attached, the information should be attached to the specific parameters. So for single-problem algorithms, I would advise using the argument annotation in favor of For
Edit: Obviously, this is all forward-looking, in preparation for a world where num_items could be a device-accessible-only value.
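To make the segmented case concrete, here is a small host-side sketch in plain C++ showing why a bound on each segment size does not imply a bound on the total. It uses no CUB or libcu++ APIs; `sum_segment_sizes` and `fits_in_int32` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper (not a CUB/libcu++ function): sums the segment sizes in
// 64-bit so the total cannot wrap even if every single size fits in 32 bits.
inline std::uint64_t sum_segment_sizes(const std::vector<std::uint32_t>& segment_sizes)
{
  return std::accumulate(segment_sizes.begin(), segment_sizes.end(), std::uint64_t{0});
}

// Hypothetical helper: true if a signed 32-bit offset can index every item
// across all segments.
inline bool fits_in_int32(std::uint64_t total)
{
  return total <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```

Three segments of 1,000,000,000 items each individually fit in a signed 32-bit offset, yet their combined size of 3,000,000,000 does not. This is the kind of information that only a bound on the total can convey.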
Closes #9279
Description
Adds `cuda::execution::guarantee` together with its first guarantee, `cuda::execution::max_total_num_items`. Where `require` lets a caller demand properties from an algorithm, `guarantee` lets a caller promise properties of the problem that an algorithm may exploit. Guarantees are bundled with `guarantee(...)` and surfaced through a dedicated `__get_guarantees` query, mirroring `require`.

`max_total_num_items` communicates an upper bound on the total number of items processed (e.g. the combined size of all segments in `cub::DeviceBatchedTopK`), which an algorithm can use to size intermediate offset types. Since this bound information may not be attachable to a specific parameter (e.g., for `DeviceBatchedTopK` and similarly for segmented algorithms), we decided it should go into the guarantees API.

Design decisions
- `max_total_num_items` instead of a single `total_num_items` taking both lower and upper bounds: Lower bounds are presumably rare in practice, so we optimize for convenience in the common case and keep the two as separate, composable guarantees (`guarantee(max_total_num_items<N>(), min_total_num_items<M>())`), with `min_total_num_items` kept as follow-up work.
- `max_total_num_items<N>()` (static), `max_total_num_items(n)` (runtime), and `max_total_num_items<N>(n)` (static bound + runtime refinement, asserting `n <= N`).
- Bounds are not forced to `int64_t`: a 32-bit bound stays 32-bit instead of widening to 64-bit, so `max_total_num_items(1000000)` still provides an int32 static upper bound. Narrower types can be requested explicitly (`max_total_num_items<cuda::std::int16_t{1000}>()`).

Example
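Separately from the PR's own example, the three forms and the type-preservation rule from the design decisions can be mimicked with a standalone C++17 mock. This is only a sketch under stated assumptions: everything in namespace `sketch` (including `max_total_bound` and `offset_t`) is hypothetical and is not the libcu++ implementation. It also assumes a platform where `int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch // hypothetical mock, NOT cuda::execution
{
// Carries a compile-time bound in its type and a possibly tighter run-time
// bound as a value.
template <class T, T StaticMax>
struct max_total_bound
{
  using value_type = T;
  static constexpr T static_max = StaticMax;
  T runtime_max = StaticMax;
};

// Static form: bound and integral type both come from the template argument.
template <auto N>
constexpr auto max_total_num_items()
{
  return max_total_bound<decltype(N), N>{};
}

// Runtime form: only the argument's type is known statically, so the static
// bound is that type's maximum (an int argument yields an int-sized bound).
template <class T>
constexpr auto max_total_num_items(T n)
{
  return max_total_bound<T, std::numeric_limits<T>::max()>{n};
}

// Hybrid form: static bound N refined by a run-time value n, asserting n <= N.
template <auto N, class T>
constexpr auto max_total_num_items(T n)
{
  assert(static_cast<std::int64_t>(n) <= static_cast<std::int64_t>(N));
  return max_total_bound<decltype(N), N>{static_cast<decltype(N)>(n)};
}

// How an algorithm might pick an offset type from the static bound alone.
template <class Bound>
using offset_t = std::conditional_t<
  (static_cast<std::uint64_t>(Bound::static_max)
   <= static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max())),
  std::int32_t,
  std::int64_t>;

// The bound's integral type is preserved rather than widened.
static_assert(std::is_same_v<decltype(max_total_num_items<1000>())::value_type, int>);
static_assert(
  std::is_same_v<decltype(max_total_num_items<std::int16_t{1000}>())::value_type, std::int16_t>);
// An int run-time bound still allows 32-bit offsets; a 64-bit one does not.
static_assert(std::is_same_v<offset_t<decltype(max_total_num_items(1000000))>, std::int32_t>);
static_assert(
  std::is_same_v<offset_t<decltype(max_total_num_items(std::int64_t{1} << 40))>, std::int64_t>);
} // namespace sketch
```

The static bound lives in the type so that offset-type selection can happen at compile time, while the run-time value only refines it. Whether the actual facility stores its bounds this way is not stated in the PR description.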