UCT/CUDA/CUDA_IPC: Separated get and put remote cache methods.#11501
UCT/CUDA/CUDA_IPC: Separated get and put remote cache methods.#11501rakhmets wants to merge 4 commits into
Conversation
8f6c42e to
3870eee
Compare
|
🤖 Starting review — findings will be posted here when done. |
svc-nvidia-pr-review
left a comment
There was a problem hiding this comment.
Code Review
2 findings — 🚨 2 blockers
2 findings posted as inline comments.
45caec2 to
17616e4
Compare
|
🤖 Starting review — findings will be posted here when done. |
svc-nvidia-pr-review
left a comment
There was a problem hiding this comment.
Code Review
No issues found. ✅
| } else { | ||
| ucs_error("unable to use cuda_ipc remote_cache hash"); | ||
| status = UCS_ERR_NO_RESOURCE; | ||
| ucs_assertv_always(ret != UCS_KH_PUT_KEY_PRESENT, "key %d:%u:%d is present", |
There was a problem hiding this comment.
Consider the case when 2 threads miss on read (line 548), then go here. The second kh_put will return UCS_KH_PUT_KEY_PRESENT -> assert fails.
cc4d7e4 to
15d51a1
Compare
|
🤖 Starting review — findings will be posted here when done. |
svc-nvidia-pr-review
left a comment
There was a problem hiding this comment.
Code Review
5 findings — 🚨 2 blockers,
5 findings posted as inline comments.
| log_level); | ||
| } | ||
|
|
||
| ucs_rw_spinlock_write_unlock(&uct_cuda_ipc_remote_cache.lock); |
There was a problem hiding this comment.
🚨 BLOCKER
this serializes all map_memhandle calls as writers — the global write lock is held across cuIpcOpenMemHandle, pgtable insert, eviction, etc. The previous code released the outer lock before doing those heavy operations. Defeats the purpose of switching to rw_spinlock. Can we mirror the read-lock-first / upgrade-to-write-only-on-hash-insert pattern used in uct_cuda_ipc_open_memhandle_mempool?
There was a problem hiding this comment.
No, we cannot mirror the pattern used in uct_cuda_ipc_open_memhandle_mempool, as we can remove a cache entry in uct_cuda_ipc_destroy_cache_by_iface_address.
What?
Separated
uct_cuda_ipc_get_remote_cacheinto get and put parts. To useuct_cuda_ipc_put_remote_cacheinuct_cuda_ipc_map_memhandle, anduct_cuda_ipc_get_remote_cacheinuct_cuda_ipc_unmap_memhandle.