cudax/stf: migrate stream/interfaces/ from cuda_safe_call to cuda_try #9268
andralex wants to merge 2 commits
Covers the stream-backend data interfaces (hashtable, slice, slice reduction ops).

Uses the templated cuda_try<F> form for single-function calls (cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemsetAsync, cudaStreamSynchronize, cudaFreeHost, cudaFreeAsync, cudaPointerGetAttributes). cudaHostAlloc and cudaMallocAsync stay in the runtime-status form: both are overload sets (cuda_runtime.h templated wrappers), so cuda_try<F> cannot name them.

hashtable stream_data_allocate: after the device cudaMallocAsync succeeds, the buffer is freed via SCOPE(fail) if the subsequent cudaMemsetAsync throws, so the new throw path does not leak the allocation. Adds scope_guard.cuh.

Verified locally by building cudax.test.stf.hashtable.test, cudax.test.stf.reductions.slice2d_reduction (2D cudaMemcpy2DAsync path), and cudax.test.stf.reductions.reduce_sum.
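The leak guard described for stream_data_allocate can be illustrated with a minimal, CUDA-free sketch. Everything here is hypothetical: scope_fail is a stand-in written for this example (it is not the cudax SCOPE(fail) macro), and mock_alloc, mock_free, and mock_memset only imitate the roles of cudaMallocAsync, cudaFreeAsync, and a throwing cudaMemsetAsync wrapper.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical scope-fail guard: runs its callback only when the scope is
// left because of an exception. Not the cudax SCOPE(fail) implementation.
template <class F>
class scope_fail
{
public:
  explicit scope_fail(F f)
      : f_(std::move(f))
      , pending_(std::uncaught_exceptions())
  {}
  scope_fail(const scope_fail&)            = delete;
  scope_fail& operator=(const scope_fail&) = delete;

  ~scope_fail()
  {
    // More in-flight exceptions than at construction means this scope is unwinding.
    if (std::uncaught_exceptions() > pending_)
    {
      f_();
    }
  }

private:
  F f_;
  int pending_;
};

// Mock allocation bookkeeping (assumption: stands in for device allocations).
inline int live_allocations = 0;

inline void* mock_alloc()
{
  static char buffer[16];
  ++live_allocations;
  return buffer;
}

inline void mock_free(void*)
{
  --live_allocations;
}

// Mock of a throwing memset wrapper: throws when asked to fail.
inline void mock_memset(void*, bool fail)
{
  if (fail)
  {
    throw std::runtime_error("memset failed");
  }
}

// Allocate, then initialize; free the buffer if initialization throws.
inline void* allocate_and_clear(bool fail_memset)
{
  void* p = mock_alloc();
  assert(p != nullptr);
  scope_fail guard{[&] { mock_free(p); }};
  mock_memset(p, fail_memset);
  return p; // success path: the guard does nothing and the caller owns p
}
```

On the success path the guard is inert and ownership passes to the caller; on the throwing path the allocation count returns to its previous value before the exception reaches the caller.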
/ok to test b6dd048
/ok to test 33ffac5
@andralex, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
Summary: This PR migrates CUDA runtime error handling in the data interfaces under cudax/include/cuda/experimental/__stf/stream/interfaces/ from abort-on-error …
Changes: Files modified: 3 | Sites changed: 15 | Lines changed: +26/-18
…set)
cudaMemcpyAsync is an overload set on CTK 13.2+ (cuda_runtime.h adds an alternate-spelling wrapper), so cuda_try<cudaMemcpyAsync> is ill-formed. Clang rejects it; GCC accepts it leniently, which is why local GCC builds missed it. Revert the cudaMemcpyAsync calls in the stream slice and hashtable interfaces to the runtime-status cuda_try(cudaMemcpyAsync(...)) form. The other (non-overloaded) calls keep the templated form.
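The difference between the two call forms can be sketched without CUDA. The names below (status, try_status, try_call, sync_stream, copy_async) are hypothetical and only mimic the shape of the runtime-status and templated forms; they are not the cuda_try implementation. The sketch assumes C++17 for the `auto` non-type template parameter.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical status type standing in for a runtime error code.
enum class status { ok = 0, error = 1 };

struct status_error : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Runtime-status form: the call is evaluated by the caller (with normal
// overload resolution); the wrapper only inspects the returned status.
inline void try_status(status s)
{
  if (s != status::ok)
  {
    throw status_error("call failed");
  }
}

// Templated form: the function itself is the template argument, so the
// name must designate exactly one function, not an overload set.
template <auto F, class... Args>
void try_call(Args&&... args)
{
  try_status(F(std::forward<Args>(args)...));
}

// A single, non-overloaded function: usable in both forms.
inline status sync_stream(int stream)
{
  return stream >= 0 ? status::ok : status::error;
}

// An overload set: a plain function plus a templated wrapper sharing its name.
// The mock performs no copy; it only reports a status.
inline status copy_async(void*, const void*, int n)
{
  return n >= 0 ? status::ok : status::error;
}

template <class T>
status copy_async(T* dst, const T* src, int n, int /*stream*/)
{
  return copy_async(static_cast<void*>(dst), static_cast<const void*>(src), n);
}

inline void demo()
{
  int src[4] = {1, 2, 3, 4};
  int dst[4] = {};

  try_call<sync_stream>(0); // fine: sync_stream names one function

  // try_call<copy_async>(dst, src, 4) would name an overload set as the
  // template argument; the commit above reports Clang rejecting the
  // equivalent cuda_try<cudaMemcpyAsync> spelling.
  try_status(copy_async(dst, src, 4)); // overload resolved at the call site
}
```

The status form trades the templated wrapper's knowledge of which function was called for the ability to wrap any call expression, which is why it remains available for names that a header turns into overload sets.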
/ok to test 3ae7ba1
CI Workflow Results: Finished in 58m 10s: Pass: 94%/55 | Total: 21h 49m | Max: 58m 10s | Hits: 16%/136943
/ok to test 3ae7ba1
Summary
Migrates the cudax/include/cuda/experimental/__stf/stream/interfaces/ data interfaces (hashtable, slice, slice reduction ops) from cuda_safe_call to cuda_try. Part of the ongoing STF cuda_safe_call -> cuda_try rollout; the large stream files (event_types.cuh, stream_ctx.cuh, stream_task.cuh) are handled in separate PRs.

Changes (3 files, 15 sites)
- cuda_try<F> for single-function calls: cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemsetAsync, cudaStreamSynchronize, cudaFreeHost, cudaFreeAsync, cudaPointerGetAttributes (out-param -> returned cudaPointerAttributes).
- cuda_try(...) for overload sets (cuda_runtime.h templated wrappers): cudaHostAlloc, cudaMallocAsync.
- hashtable_linearprobing.cuh leak guard: in stream_data_allocate, after the device cudaMallocAsync succeeds, the buffer is freed via SCOPE(fail) if the following cudaMemsetAsync throws, closing the leak the new throw path would otherwise introduce. Adds scope_guard.cuh.

Validation
Built locally (cpp20): cudax.test.stf.hashtable.test, cudax.test.stf.reductions.slice2d_reduction (exercises the 2D cudaMemcpy2DAsync path), cudax.test.stf.reductions.reduce_sum; all compile and link.

Test plan