[cudax] Implement cudax::coop::reduce for warp groups within a block
#9258
+252 −6