AZP/RELEASE: publish UCXX in UCX release pipeline#11488
Draft
Alexey-Rivkin wants to merge 7 commits into
UCXX tests run in rapidsai/ci-conda and ci-wheel base images. Thin wrappers open /opt/conda and /pyenv so the Azure-injected step user can use them, and add gdb so ucxx's timeout_with_stack.py can capture stacks on hangs.
Pull rapidsai/ucxx as a pipeline resource and add two stages gated on Static_check: UCXX_build (conda + wheel packages, docs, devcontainer, checks) then UCXX_tests (conda C++/Python on the CPU + GPU matrix). Covers x86_64 + aarch64, CUDA 12 + 13; GPU tests on amd64/cuda13. distributed-ucxx excluded (not upstreamed).
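The stage ordering described in this commit could look roughly like the fragment below. This is a sketch, not the PR's actual YAML: the stage names `Static_check`, `UCXX_build` and `UCXX_tests` come from the commit message, while the job names, script paths and placeholder steps are assumptions made for illustration.

```yaml
stages:
  - stage: Static_check            # existing stage; real contents omitted
    jobs:
      - job: static_checks         # hypothetical job name
        steps:
          - script: echo "static checks"   # placeholder step

  - stage: UCXX_build
    dependsOn: Static_check        # by default runs only if Static_check succeeded
    jobs:
      - job: ucxx_conda_build      # hypothetical job name
        steps:
          - checkout: self
          - checkout: ucxx         # alias of the rapidsai/ucxx repository resource
          - script: ./build_ucxx.sh    # hypothetical path; real location and arguments may differ

  - stage: UCXX_tests
    dependsOn: UCXX_build          # tests wait for the build stage
    jobs:
      - job: ucxx_conda_tests      # hypothetical job name
        steps:
          - checkout: self
          - checkout: ucxx
          - script: ./test_ucxx.sh     # hypothetical path
```

With more than one `checkout` step, Azure Pipelines places each repository in its own subdirectory of the sources directory, so the real script paths would need to account for that.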
build_ucxx.sh and test_ucxx.sh wrap UCXX's ci/*.sh entrypoints for the Azure agents: stage rapids download shims, set the wheel toolchain, run the conda/wheel build, C++ gtest and Python test phases. CPU slices disable CUDA-only gtests; GPU slices force the host CUDA driver so cuInit matches the MPS daemon. test_client_shutdown is skipped (flaky teardown under MPS contention).
Each UCX PR must test a fixed UCXX revision; refs/heads/main drifts, so a green run says nothing durable. Pin to a tag and bump it deliberately as new UCXX releases are validated.
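Pinning the resource to a tag might be expressed as follows. This is a minimal sketch: the tag `v0.51.00a` is the one named elsewhere in this PR, and the `endpoint` value is a hypothetical service connection name.

```yaml
resources:
  repositories:
    - repository: ucxx               # alias referenced by `checkout: ucxx`
      type: github
      name: rapidsai/ucxx
      endpoint: github-connection    # hypothetical service connection name
      ref: refs/tags/v0.51.00a       # fixed tag instead of the drifting refs/heads/main
```

Bumping UCXX then becomes a one-line change to `ref`, reviewed like any other change.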
RAPIDS 26.06 shipped; ToT and the base images we wrap moved to 26.08.
Pin the rapidsai/ucxx resource to a specific main commit (33deb0b) rather than v0.51.00a. Alpha tags are cut at code-freeze and don't pick up ongoing main work, and an old tag drifts from RAPIDS CI updates (images, ci/ scripts) that must move in tandem. A pinned commit stays immutable/reproducible while letting us do controlled bumps. This commit already includes ucxx openucx#674, so drop the local <unistd.h> patch in build_ucxx.sh + test_ucxx.sh.
Extend the release pipeline to build UCXX conda + libucxx/ucxx wheels + docs on a UCX release tag and on a daily cron, reusing the PR pipeline's UCXX scripts and CUDA-pinned images. Upload/publish steps (anaconda.org for conda+wheels, S3 for docs) are wired but gated `condition: false` until the rapids tokens are provisioned in the Azure secret store - each step's TODO names its secret. Builds on the PR-pipeline work: per-slice wheel container (cuda12/cuda13), py3.11 wheels, ucxx pinned to v0.51.00a, and the host-driver override that the wheel image needs. PR pipeline unchanged.
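A step that is wired up but disabled until its secret exists might look like the sketch below. The display names, the upload command and the variable `UCXX_CONDA_TOKEN` are all hypothetical; only the `condition: false` gate and the TODO convention come from the commit message.

```yaml
steps:
  - script: ./build_ucxx.sh                  # hypothetical invocation of the shared build script
    displayName: Build UCXX conda packages

  - script: |
      echo "upload conda packages to anaconda.org"   # placeholder for the real upload command
    displayName: Upload conda packages (disabled)
    # TODO: enable once UCXX_CONDA_TOKEN (hypothetical name) is in the Azure secret store
    condition: false                         # step is always skipped while this is false
    env:
      CONDA_TOKEN: $(UCXX_CONDA_TOKEN)       # secret variables must be mapped into env explicitly
```

Keeping the step in place with `condition: false` means that enabling publishing later only requires replacing the condition.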
What?
Build + publish UCXX (conda, libucxx/ucxx wheels, docs) from the UCX release pipeline - on a release tag and a daily cron.
How?
- `schedules:cron` + stage condition (release tag OR Schedule).
- UCXX pinned to `v0.51.00a`.
- Scheduled runs detected via `Build.Reason`.
- Upload/publish steps gated `condition: false` until tokens land.
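The trigger logic (daily cron plus a stage that runs on a release tag or a scheduled run) could be sketched as below. The cron time, branch name, stage name and the `refs/tags/v` prefix are assumptions; `Build.Reason` and `Build.SourceBranch` are standard Azure Pipelines predefined variables.

```yaml
schedules:
  - cron: "0 2 * * *"            # hypothetical time; Azure evaluates cron in UTC
    displayName: Daily UCXX build
    branches:
      include:
        - master                 # assumed default branch
    always: true                 # run even when there are no new commits

stages:
  - stage: UCXX_release_build    # hypothetical stage name
    # Run on a release tag (assumed to start with "v") OR on a scheduled run.
    condition: or(startsWith(variables['Build.SourceBranch'], 'refs/tags/v'), eq(variables['Build.Reason'], 'Schedule'))
    jobs:
      - job: build               # hypothetical job name
        steps:
          - script: echo "build UCXX packages"   # placeholder step
```

A custom `condition` replaces the default `succeeded()` check, so if the stage also depends on earlier stages, `succeeded()` would need to be combined into the expression with `and(...)`.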