Align native UDF builds with jar dependency pins #632

Open
nvliyuan wants to merge 1 commit into NVIDIA:main from nvliyuan:fix-native-udf-jar-pins-v2

nvliyuan commented Jun 1, 2026

Summary

  • Resolve spark-rapids-jni and cuDF revisions from the selected rapids-4-spark jar, then fetch the matching cudf-pins and rapids-cmake entrypoint.
  • Build native UDF examples against jar-matched cuDF headers, RMM, CCCL, and rapids-cmake pins instead of moving branches.
  • Document the ABI alignment flow for prebuilt native UDF builds.

Closes #630

Test plan

  • bash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/resolve-jni-cudf-pins.sh
  • bash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/clone-cudf-repo.sh
  • bash -n examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh
  • git diff --check
  • Checked Cursor lints for examples/UDF-Examples/RAPIDS-accelerated-UDFs.
  • On spark-yuanli, with ~/work/jars/v26.08/rapids-4-spark_2.12-26.08.0-SNAPSHOT-cuda12.jar, ran mvn package -Pudf-native-examples -Drapids4spark.version=26.08.0-SNAPSHOT -DskipTests inside the native UDF build container and produced target/rapids-4-spark-udf-examples_2.12-26.06.0-SNAPSHOT.jar.
  • Ran a JVM smoke test using the packaged UDF jar plus the v26.08 plugin jar; native StringWordCount completed with native udf completed rows=3.

Signed-off-by: liyuan <yuali@nvidia.com>
nvliyuan requested a review from GaryShen2008 June 1, 2026 09:08

nvliyuan commented Jun 1, 2026

Verified working locally.

greptile-apps Bot commented Jun 1, 2026

Greptile Summary

This PR replaces the previous branch-based cuDF/RMM/CCCL dependency resolution with an automatic pin-extraction flow: the build now reads the embedded spark-rapids-jni and cudf-java version metadata directly from the rapids-4-spark jar, downloads the matching cudf-pins/versions.json and RAPIDS.cmake from that exact JNI commit, then checks out cuDF at the jar-recorded revision.

  • resolve-jni-cudf-pins.sh (new) extracts JNI/cuDF revisions from the jar, downloads and validates pin files, and emits a jar-native-deps.properties consumed by Maven and CMake.
  • clone-cudf-repo.sh is generalised from branch-only to arbitrary git refs using --filter=blob:none + shallow fetch + FETCH_HEAD checkout.
  • CMakeLists.txt gains RAPIDS_CMAKE_FILE / RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE cache variables; the entrypoint file is correctly applied via include(), but the versions override is validated and logged without being passed to rapids_cpm_init(), so CCCL/RMM are still resolved from the rapids-cmake defaults rather than the jar-matched pins.

Confidence Score: 3/5

The new ABI-alignment flow has a gap in CMakeLists.txt: the jar-matched package override file is resolved, validated, and logged but never fed into rapids_cpm_init(), so CCCL and RMM are compiled against the rapids-cmake defaults rather than the jar-pinned versions.

The core goal of the PR is not achieved because RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE is never passed to rapids_cpm_init(). The entire pin-resolution pipeline runs and files are downloaded, but the versions actually used for compilation come from rapids-cmake defaults, not the jar-matched pins. Additionally, resolve-jni-cudf-pins.sh unconditionally calls python3 for JSON validation despite an unzip-only fallback path existing for property extraction.

src/main/cpp/CMakeLists.txt needs the rapids_cpm_init() call updated to pass the override file; resolve-jni-cudf-pins.sh needs the JSON validation block guarded by a python3 availability check.

Important Files Changed

| Filename | Overview |
|---|---|
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/src/main/cpp/CMakeLists.txt | Adds RAPIDS_CMAKE_FILE/RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE cache variables and SHA-commit detection; RAPIDS_CMAKE_FILE is correctly applied via include(), but RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE is never passed to rapids_cpm_init(), so jar-matched pins are not applied to CCCL/RMM resolution. |
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/resolve-jni-cudf-pins.sh | New script that reads spark-rapids-jni revision/URL from the jar and downloads matching cudf-pins and RAPIDS.cmake; python3 is unconditionally required for JSON validation despite an unzip fallback elsewhere, and the awk fallback truncates values containing '='. |
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml | Adds resolve-jni-cudf-pins.sh exec + property-file load in generate-resources, re-loads properties in compile phase, and passes jar pins as -D flags to cmake; the two-phase property load pattern is correct for Ant tasks across Maven phases. |
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/clone-cudf-repo.sh | Generalises clone/update logic from branch-only to arbitrary git refs using --filter=blob:none + fetch --depth 1 + checkout FETCH_HEAD; logic is sound for commit SHAs and GitHub's uploadpack capabilities. |
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/extract-cudf-libs.sh | Adds RAPIDS_JAR_PATH env-var override, makes TARGET_DIR overridable, guards against missing pom.xml, and wires in resolve-jni-cudf-pins.sh + clone-cudf-repo.sh with a hard error on missing cudf revision. |
| examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md | Documentation updated to reflect the new automatic pin-resolution flow; accurate and consistent with the implementation. |
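
The clone-cudf-repo.sh entry describes a shallow fetch of an arbitrary ref followed by a checkout of FETCH_HEAD. A minimal sketch of that pattern is shown here for illustration. The function name `checkout_ref` and its arguments are hypothetical and not taken from the PR, and the repository is assumed to already have a remote named `origin`. Fetching a bare commit SHA also depends on the server permitting it.

```shell
# Hypothetical helper, not the PR's script: check out one ref (branch, tag, or
# commit SHA) using a shallow fetch.
checkout_ref() {
  repo_dir="$1"   # path to an existing clone with an "origin" remote
  ref="$2"        # branch name, tag, or commit SHA

  # Fetch only the requested ref, one commit deep.
  git -C "$repo_dir" fetch --quiet --depth 1 origin "$ref" || return 1

  # FETCH_HEAD names whatever was just fetched, so no local branch is needed.
  git -C "$repo_dir" checkout --quiet --detach FETCH_HEAD
}
```

Checking out FETCH_HEAD rather than a named branch is what lets the same code path serve both branch names and pinned revisions.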

Reviews (1): Last reviewed commit: "Align native UDF builds with jar depende..."

Comment on lines +75 to +82
if(RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE)
  if(NOT EXISTS "${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}")
    message(FATAL_ERROR
      "RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE does not exist: "
      "${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}")
  endif()
  message(STATUS "Using rapids-cmake package override: ${RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE}")
endif()

P1 RAPIDS_CMAKE_CPM_OVERRIDE_VERSION_FILE validated but never applied

The variable is checked for existence and logged (lines 75–82), but rapids_cpm_init() is called later (line 281 in the SHOULD_USE_PREBUILT path) with no OVERRIDE argument. As a result, rapids_cpm_cccl() and rapids_cpm_rmm() resolve package versions from the defaults baked into the downloaded rapids-cmake, not from the jar-matched versions.json. The override file is effectively a no-op and the whole ABI-pin flow silently fails to apply its pins for CCCL and RMM.

Comment on lines +125 to +153
python3 - "$VERSIONS_FILE" <<'PY'
import json
import sys

versions_file = sys.argv[1]
with open(versions_file, encoding="utf-8") as fh:
    data = json.load(fh)

packages = data.get("packages")
if not isinstance(packages, dict) or not packages:
    raise SystemExit(f"ERROR: {versions_file} does not contain a non-empty packages map")

missing_metadata = []
for name, package in sorted(packages.items()):
    if "version" not in package:
        missing_metadata.append(f"{name}: missing version")
    has_git_source = "git_url" in package and "git_tag" in package
    has_url_source = "url" in package and "url_hash" in package
    if not (has_git_source or has_url_source):
        missing_metadata.append(f"{name}: missing pinned git/url source")

if missing_metadata:
    raise SystemExit("ERROR: invalid cudf-pins metadata:\n " + "\n ".join(missing_metadata))

required = ["CCCL", "rmm"]
missing_required = [name for name in required if name not in packages]
if missing_required:
    raise SystemExit("ERROR: cudf-pins missing required packages: " + ", ".join(missing_required))
PY

P1 Validation block unconditionally requires python3 despite unzip fallback

read_property_from_jar (lines 40–63) handles environments where only unzip is installed. However, the versions.json validation inline script on line 125 calls python3 directly without any availability check. On a host where python3 is absent but unzip is present, property extraction succeeds, both files are downloaded, and then the build exits with an opaque python3: command not found (exit 127) rather than a meaningful diagnostic.
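
One way to turn the exit-127 failure into a readable diagnostic is an explicit availability check before the heredoc runs. The sketch below is only an illustration: `require_tool` is a hypothetical helper name, not something defined in the PR, and the message wording is an assumption.

```shell
# Hypothetical helper (not part of the PR): fail early with a readable
# message when a required command is not on PATH.
require_tool() {
  tool="$1"      # command name to look up, e.g. python3
  purpose="$2"   # short description used in the error message
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "ERROR: $tool is required to $purpose but was not found on PATH" >&2
    return 1
  fi
}
```

In the script under review, a call such as `require_tool python3 "validate $VERSIONS_FILE" || exit 1` placed just before the `python3 - "$VERSIONS_FILE"` block would surface the missing dependency by name. Whether to fail or to skip validation when python3 is absent is a design decision left to the author.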

break
PY
elif command -v unzip >/dev/null 2>&1; then
unzip -p "$JAR_PATH" "$entry" 2>/dev/null | awk -F= -v key="$property" '$1 == key {print $2; exit}'

P2 awk fallback truncates property values containing =

The unzip fallback uses -F= and prints only $2, so a property value containing = is silently truncated. The url property is extracted this way; a query-string = in the JNI repository URL would produce a wrong RAW_BASE and silently download from an incorrect path.
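
A possible fix is to split each line on the first `=` only, so the rest of the line is kept as the value. The following is a minimal sketch under that assumption. `read_property` is a hypothetical name, and the sketch does not handle other Java properties syntax such as `:` separators, surrounding whitespace, or line continuations.

```shell
# Hypothetical helper (not part of the PR): read key=value lines on stdin and
# print the complete value for one key, preserving any "=" inside the value.
read_property() {
  awk -v key="$1" '
    {
      pos = index($0, "=")              # position of the first "=" only
      if (pos > 0 && substr($0, 1, pos - 1) == key) {
        print substr($0, pos + 1)       # everything after the first "="
        exit
      }
    }'
}
```

With this helper, the fallback branch could read `unzip -p "$JAR_PATH" "$entry" 2>/dev/null | read_property "$property"`, and a line such as `url=https://example.com/repo?ref=abc` would yield the full URL rather than the part before the second `=`.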

Development

Successfully merging this pull request may close these issues.

Align native UDF build dependencies with rapids-4-spark jar pins
