Align native UDF builds with jar dependency pins#632
Conversation
Signed-off-by: liyuan <yuali@nvidia.com>
|
verified worked in local |
Greptile SummaryThis PR replaces the previous branch-based cuDF/RMM/CCCL dependency resolution with an automatic pin-extraction flow: the build now reads the embedded
Confidence Score: 3/5The new ABI-alignment flow has a gap in CMakeLists.txt: the jar-matched package override file is resolved, validated, and logged but never fed into rapids_cpm_init(), so CCCL and RMM are compiled against the rapids-cmake defaults rather than the jar-pinned versions. The core goal of the PR is not achieved because RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE is never passed to rapids_cpm_init(). The entire pin-resolution pipeline runs and files are downloaded, but the versions actually used for compilation come from rapids-cmake defaults, not the jar-matched pins. Additionally, resolve-jni-cudf-pins.sh unconditionally calls python3 for JSON validation despite an unzip-only fallback path existing for property extraction. src/main/cpp/CMakeLists.txt needs the rapids_cpm_init() call updated to pass the override file; resolve-jni-cudf-pins.sh needs the JSON validation block guarded by a python3 availability check. Important Files Changed
Reviews (1): Last reviewed commit: "Align native UDF builds with jar depende..." | Re-trigger Greptile |
| if(RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE) | ||
| if(NOT EXISTS "${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}") | ||
| message(FATAL_ERROR | ||
| "RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE does not exist: " | ||
| "${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}") | ||
| endif() | ||
| message(STATUS "Using rapids-cmake package override: ${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}") | ||
| endif() |
There was a problem hiding this comment.
RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE validated but never applied
The variable is checked for existence and logged (lines 75–82), but rapids_cpm_init() is called later (line 281 in the SHOULD_USE_PREBUILT path) with no OVERRIDE argument. As a result, rapids_cpm_cccl() and rapids_cpm_rmm() resolve package versions from the defaults baked into the downloaded rapids-cmake, not from the jar-matched versions.json. The override file is effectively a no-op and the whole ABI-pin flow silently fails to apply its pins for CCCL and RMM.
| python3 - "$VERSIONS_FILE" <<'PY' | ||
| import json | ||
| import sys | ||
|
|
||
| versions_file = sys.argv[1] | ||
| with open(versions_file, encoding="utf-8") as fh: | ||
| data = json.load(fh) | ||
|
|
||
| packages = data.get("packages") | ||
| if not isinstance(packages, dict) or not packages: | ||
| raise SystemExit(f"ERROR: {versions_file} does not contain a non-empty packages map") | ||
|
|
||
| missing_metadata = [] | ||
| for name, package in sorted(packages.items()): | ||
| if "version" not in package: | ||
| missing_metadata.append(f"{name}: missing version") | ||
| has_git_source = "git_url" in package and "git_tag" in package | ||
| has_url_source = "url" in package and "url_hash" in package | ||
| if not (has_git_source or has_url_source): | ||
| missing_metadata.append(f"{name}: missing pinned git/url source") | ||
|
|
||
| if missing_metadata: | ||
| raise SystemExit("ERROR: invalid cudf-pins metadata:\n " + "\n ".join(missing_metadata)) | ||
|
|
||
| required = ["CCCL", "rmm"] | ||
| missing_required = [name for name in required if name not in packages] | ||
| if missing_required: | ||
| raise SystemExit("ERROR: cudf-pins missing required packages: " + ", ".join(missing_required)) | ||
| PY |
There was a problem hiding this comment.
Validation block unconditionally requires
python3 despite unzip fallback
read_property_from_jar (lines 40–63) handles environments where only unzip is installed. However, the versions.json validation inline script on line 125 calls python3 directly without any availability check. On a host where python3 is absent but unzip is present, property extraction succeeds, both files are downloaded, and then the build exits with an opaque python3: command not found (exit 127) rather than a meaningful diagnostic.
| break | ||
| PY | ||
| elif command -v unzip >/dev/null 2>&1; then | ||
| unzip -p "$JAR_PATH" "$entry" 2>/dev/null | awk -F= -v key="$property" '$1 == key {print $2; exit}' |
There was a problem hiding this comment.
awk fallback truncates property values containing =
The unzip fallback uses -F= and prints only $2, so a property value containing = is silently truncated. The url property is extracted this way; a query-string = in the JNI repository URL would produce a wrong RAW_BASE and silently download from an incorrect path.
Summary
spark-rapids-jniand cuDF revisions from the selectedrapids-4-sparkjar, then fetch the matchingcudf-pinsandrapids-cmakeentrypoint.Closes #630
Test plan
bash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/resolve-jni-cudf-pins.shbash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/clone-cudf-repo.shbash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.shgit diff --checkexamples/UDF-Examples/RAPIDS-accelerated-UDFs.spark-yuanli, with~/work/jars/v26.08/rapids-4-spark_2.12-26.08.0-SNAPSHOT-cuda12.jar, ranmvn package -Pudf-native-examples -Drapids4spark.version=26.08.0-SNAPSHOT -DskipTestsinside the native UDF build container and producedtarget/rapids-4-spark-udf-examples_2.12-26.06.0-SNAPSHOT.jar.StringWordCountcompleted withnative udf completed rows=3.