Skip to content

CONFIGURE: added a configure option to add SONAME suffix#11483

Open
roiedanino wants to merge 8 commits into
openucx:masterfrom
roiedanino:deepbind
Open

CONFIGURE: added a configure option to add SONAME suffix#11483
roiedanino wants to merge 8 commits into
openucx:masterfrom
roiedanino:deepbind

Conversation

@roiedanino
Copy link
Copy Markdown
Contributor

What?

Adds --with-soname-suffix option in configure which when set adds:

-release SUFFIX

for example:
libucp_la_LDFLAGS = -ldl -version-info $(SOVERSION) -release example

will produce:

libucp-example.so.0.0.0
libucp-example.so.0 -> libucp-example.so.0.0.0
libucp.so -> libucp-example.so.0.0.0

Why?

This allows creating purpose-specific .so files to avoid collisions with already loaded UCX versions when binding to specific versions or binaries is required.

This by itself won't provide a full isolation without using RTLD_DEEPBIND during dlopen.

Signed-off-by: Roie Danino <rdanino@nvidia.com>
Comment thread configure.ac Outdated
AS_IF([test "x$with_soname_suffix" != xno],
[AS_IF([test "x$with_soname_suffix" = xyes],
[AC_MSG_ERROR([--with-soname-suffix requires an explicit suffix value])])
AS_IF([printf '%s\n' "$with_soname_suffix" | grep -Eq '^@<:@A-Za-z0-9@:>@@<:@A-Za-z0-9_.-@:>@*$'],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] [important] Dotted suffixes break module loading

The option accepts dots, but libtool -release 1.22 produces names like libucs-1.22.so.0; UCX's module loader derives module_ext from the first dot in the loaded libucs basename, so it would look for module files ending in .22.so.0 instead of .so.0. That means accepted values such as --with-soname-suffix=1.22 can prevent optional UCT/UCM modules from loading. Either reject dots here or fix the loader to derive the real shared-library extension, and add a small configure/runtime check for a dotted suffix.

Comment thread src/ucs/Makefile.am
-DUCX_CONFIG_DIR=\"$(ucx_config_dir)\"
libucs_la_CFLAGS = $(BASE_CFLAGS) $(LT_CFLAGS)
libucs_la_LDFLAGS = -ldl $(BFD_LDFLAGS) -version-info $(SOVERSION)
libucs_la_LDFLAGS = -ldl $(BFD_LDFLAGS) -version-info $(SOVERSION) $(UCX_LT_RELEASE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] [important] libucs_signal is left unsuffixed

The new suffix is applied to libucm/libucs/libuct/libucp, but libucs_signal is also an installed versioned UCX shared library. If the goal is avoiding collisions with already-loaded UCX DSOs, leaving this SONAME as libucs_signal.so.0 keeps one UCX entry point outside the suffix scheme. Add $(UCX_LT_RELEASE) here as well, or explicitly document that this option is limited to the four core libraries.

@roiedanino
Copy link
Copy Markdown
Contributor Author

@svcnbu-swx-hpcx

@roiedanino
Copy link
Copy Markdown
Contributor Author

@svc-nixl

@roiedanino
Copy link
Copy Markdown
Contributor Author

@svc-nvidia-pr-review

@svc-nvidia-pr-review
Copy link
Copy Markdown

🚫 @roiedanino — not authorized to trigger a review on this repo. Contact a repo admin to be added to the allowed-users list.

@NirWolfer
Copy link
Copy Markdown
Contributor

@svc-nvidia-pr-review

@svc-nvidia-pr-review
Copy link
Copy Markdown

🤖 Starting review — findings will be posted here when done.

Copy link
Copy Markdown

@svc-nvidia-pr-review svc-nvidia-pr-review left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

4 findings — 🚨 1 blocker, ⚠️ 2 minors, ℹ️ 1 info

4 findings posted as inline comments.

Comment thread configure.ac
AS_IF([printf '%s\n' "$with_soname_suffix" | grep -Eq '^@<:@A-Za-z0-9@:>@@<:@A-Za-z0-9_.-@:>@*$'],
[],
[AC_MSG_ERROR([--with-soname-suffix must contain only letters, digits, dots, underscores, and dashes])])
UCX_LT_RELEASE="-release $with_soname_suffix"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 BLOCKER

libtool's -release changes the installed library filename (e.g. libucp-SUFFIX.so.0), not just the SONAME. With this set, ucx.pc (which emits -lucp) and cmake/ucx-targets.cmake.in (which hardcodes libucp.so/libucs.so/libuct.so) will fail to locate the libs for downstream consumers. Either also template these files with the suffix, or use a mechanism that only alters the SONAME (e.g., -Wl,-soname=...) if the intent really is SONAME-only as the option name suggests.

Comment thread configure.ac Outdated
ucx_soname_suffix_summary="<disabled>"
AC_ARG_WITH([soname-suffix],
AS_HELP_STRING([--with-soname-suffix=SUFFIX],
[Append SUFFIX to UCX core shared library SONAMEs. Disabled by default. [default=NO]]),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

the option is named --with-soname-suffix but -release actually appends the suffix to the library filename (and hence SONAME), not only to the SONAME. pls clarify the help text or rename (e.g. --with-library-suffix) so users aren't surprised by the renamed .so file.

Comment thread configure.ac Outdated
AS_IF([test "x$with_soname_suffix" != xno],
[AS_IF([test "x$with_soname_suffix" = xyes],
[AC_MSG_ERROR([--with-soname-suffix requires an explicit suffix value])])
AS_IF([printf '%s\n' "$with_soname_suffix" | grep -Eq '^@<:@A-Za-z0-9@:>@@<:@A-Za-z0-9_.-@:>@*$'],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

the regex printf | grep -Eq works but is heavier than needed; a case statement (as used elsewhere in autoconf) would avoid spawning subprocesses and be more portable. minor.

Comment thread configure.ac
AS_IF([printf '%s\n' "$with_soname_suffix" | grep -Eq '^@<:@A-Za-z0-9@:>@@<:@A-Za-z0-9_.-@:>@*$'],
[],
[AC_MSG_ERROR([--with-soname-suffix must contain only letters, digits, dots, underscores, and dashes])])
UCX_LT_RELEASE="-release $with_soname_suffix"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ INFO

only the four core libs (ucm/ucp/ucs/uct) get $(UCX_LT_RELEASE); the per-transport module libs (e.g. src/uct/ib/Makefile.am, src/ucm/cuda/Makefile.am, …) still use plain -version-info $(SOVERSION). is the asymmetry intentional? if two UCX trees with different suffixes are installed in the same prefix, the modules will collide.

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino
Copy link
Copy Markdown
Contributor Author

@svc-nvidia-pr-review

@svc-nvidia-pr-review
Copy link
Copy Markdown

🤖 Starting review — findings will be posted here when done.

Copy link
Copy Markdown

@svc-nvidia-pr-review svc-nvidia-pr-review left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

4 findings⚠️ 3 minors, ℹ️ 1 info

3 findings posted as inline comments.

General Findings

ℹ️ INFO

no test coverage: nothing in test/ exercises --with-soname-suffix or ucs_module_normalize_base. Can we add a small gtest that constructs a fake module filename with the configured suffix and verifies the loader strips it, similar to existing test_module usage? Otherwise this path is only validated by hand.

Comment thread configure.ac Outdated
[Suffix appended to private UCX module library names])
ucx_soname_suffix_summary="$with_soname_suffix"])
AC_SUBST([UCX_LT_RELEASE])
UCT_MODULE_LDFLAGS='$(UCX_LT_RELEASE)'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

UCT_MODULE_LDFLAGS is just $(UCX_LT_RELEASE) and the rest of the diff already uses $(UCX_LT_RELEASE) directly in ucm/, ucp/, ucs/, perf/. Why have two names for the same thing? Can we drop UCT_MODULE_LDFLAGS and use $(UCX_LT_RELEASE) everywhere?

Comment thread configure.ac Outdated
[],
[AC_MSG_ERROR([--with-soname-suffix must contain only letters, digits, dots, underscores, and dashes])])
UCX_LT_RELEASE="-release $with_soname_suffix"
AC_DEFINE_UNQUOTED([UCX_MODULE_SONAME_SUFFIX],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

minor: the macro name UCX_MODULE_SONAME_SUFFIX is misleading — libtool -release modifies the library file name (and link/SONAME), it doesn't append to the ELF SONAME. Maybe UCX_MODULE_FILE_SUFFIX or UCX_MODULE_RELEASE_SUFFIX would be clearer.

Comment thread src/ucs/sys/module.c Outdated
size_t base_len = strlen(base);

if ((base_len > suffix_len) &&
!strcmp(base + base_len - suffix_len, UCX_MODULE_SONAME_SUFFIX)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

minor: this only strips the suffix when it appears at the very end of base. After ucs_module_filename_to_base that is what we get for libtool -release output (e.g. libuct_cuda-MYSUFFIX), so it works, but a one-line comment here describing the expected input (lib<framework>_<mod>-<suffix>) would help future readers understand why a tail-only check is sufficient.

@roiedanino
Copy link
Copy Markdown
Contributor Author

@svc-nvidia-pr-review

@svc-nvidia-pr-review
Copy link
Copy Markdown

🤖 Starting review — findings will be posted here when done.

Copy link
Copy Markdown

@svc-nvidia-pr-review svc-nvidia-pr-review left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

4 findings — 🚨 1 blocker, ⚠️ 2 minors, ℹ️ 1 info

4 findings posted as inline comments.

Comment thread configure.ac
libuct_knem_la_LIBADD = $(top_builddir)/src/ucs/libucs.la \
$(top_builddir)/src/uct/libuct.la
libuct_knem_la_LDFLAGS = -version-info $(SOVERSION) $(UCT_MODULE_LDFLAGS)
libuct_knem_la_LDFLAGS = -version-info $(SOVERSION) $(UCX_LT_RELEASE)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

this replaces $(UCT_MODULE_LDFLAGS) with $(UCX_LT_RELEASE) instead of appending — the other Makefile.am entries in this PR all kept their existing flags and only added $(UCX_LT_RELEASE). UCT_MODULE_LDFLAGS is currently unset so it's harmless, but was the drop intentional?

Comment thread src/ucs/sys/module.c Outdated
}

ucs_module_filename_to_base(entry->d_name, base, sizeof(base));
ucs_module_normalize_base(base);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ MINOR

ucs_module_normalize_base will also strip the configured suffix from any unrelated module name that happens to end with it (e.g. a third-party libuct_foo-ci.so not built with --with-soname-suffix=ci). Worth scoping on the -<suffix> separator anchored to the framework prefix, or at least a comment that such collisions are accepted.

Comment thread src/ucs/sys/module_int.h Outdated
#include <string.h>


static inline void ucs_module_normalize_base(char *base)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️ INFO

minor: a dedicated header for a single 8-line static inline helper feels heavy. It exists only so test_sys.cc can reach the function — could keep the helper static in module.c and expose a test hook, or declare it non-inline in module.h.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants