Ticket #2092 (closed defect: fixed)

Opened 7 years ago

Last modified 7 years ago

libopen-rte and libopen-pal shared library versioning issues

Reported by: jsquyres
Owned by:
Priority: critical
Milestone: Open MPI 1.4.2
Version: trunk
Keywords:


mpicc currently links all of OMPI's libraries:

-lmpi -lopen-rte -lopen-pal

(The other wrappers do the same.) When linking against shared libraries, this is both unnecessary and Bad -- the MPI application ends up explicitly depending on libopen-rte and libopen-pal rather than implicitly depending on them through libmpi. The difference is that with explicit dependencies, the MPI app is chained to the .so version numbers of libopen-rte and libopen-pal -- even though MPI apps don't directly call anything down in those libraries.

(See the Libtool .so version rules before reading further. The version triples below are Libtool current:revision:age values.)

This can be problematic -- consider:

  • OMPI version A: has libmpi 0:0:0, libopen-rte 0:0:0, libopen-pal 0:0:0
  • OMPI version B: has libmpi 0:1:0, libopen-rte 1:0:0, libopen-pal 1:0:0

An MPI app compiled against OMPI vA should be forward compatible with OMPI vB because the MPI interfaces haven't changed. But since the MPI app is explicitly dependent on libopen-rte and libopen-pal, it won't be binary compatible (even though the MPI app doesn't call anything down in libopen-rte or libopen-pal -- only libmpi does, and libmpi presumably has been adjusted for any ORTE/OPAL interface changes). This is Bad.
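
The failure can be sketched in a few lines of Python. This is only an illustration, assuming the Linux convention where Libtool turns current:revision:age into the soname lib.so.(current - age); other platforms map the triple differently. The names linux_soname, needed and missing_at_runtime are made up for this sketch, and it only models the sonames the app itself records at link time.

```python
# Version triples from the vA / vB example above (current:revision:age).
OMPI_A = {"libmpi": "0:0:0", "libopen-rte": "0:0:0", "libopen-pal": "0:0:0"}
OMPI_B = {"libmpi": "0:1:0", "libopen-rte": "1:0:0", "libopen-pal": "1:0:0"}


def linux_soname(lib, version_info):
    """Map a Libtool current:revision:age triple to a Linux soname."""
    current, _revision, age = (int(x) for x in version_info.split(":"))
    # Assumption: Linux-style naming, where the soname major is current - age.
    return f"{lib}.so.{current - age}"


def needed(link_line, install):
    """Sonames the app records for the libraries named on its link line."""
    return {linux_soname(lib, install[lib]) for lib in link_line}


def missing_at_runtime(link_line, built_against, running_on):
    """Sonames the app asks for that the runtime install does not provide."""
    provided = {linux_soname(lib, v) for lib, v in running_on.items()}
    return sorted(needed(link_line, built_against) - provided)


# What the wrappers do today: all three libraries on the link line.
explicit = ["libmpi", "libopen-rte", "libopen-pal"]
print(missing_at_runtime(explicit, OMPI_A, OMPI_B))

# What we would like for shared builds: only libmpi on the link line.
print(missing_at_runtime(["libmpi"], OMPI_A, OMPI_B))
```

With all three libraries on the link line, the app built against vA asks for the .so.0 versions of libopen-rte and libopen-pal, which vB no longer provides. With only libmpi on the link line, nothing is missing.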

Unfortunately, listing -lopen-rte and -lopen-pal in the wrappers is necessary because of the case of static linking -- where all the libs are .a's, and therefore need to be explicitly mentioned.

So -- how to fix this? We kicked around a few ideas, but none of them are good. Recording them here for posterity:

  1. Collapse libopen-rte and libopen-pal into a single libmpi. We don't like this because:
    • We like 3 libs because it prevents developers from making abstraction violations.
    • Other projects are now depending on libopen-rte and libopen-pal.
  2. Only collapse libopen-rte/libopen-pal -> libmpi in production builds; keep the 3 libs for developer builds.
    • This seems confusing, and still has the problem that other projects depend on these libraries.
  3. We could figure out in configure whether we're building static or dynamic and adjust Makefile.am-isms to build one big libmpi for static and 3 libs for dynamic -- and then just have the wrappers always use only -lmpi (not -lopen-rte, etc.).
    • But what to do when users --enable-static --enable-shared?
  4. We could only allow building static or shared -- not both simultaneously.
    • This might annoy some people...?
  5. We could add logic to the wrappers to look at the libraries in $libdir and figure out whether to list just -lmpi or also -lopen-rte, etc.
    • The wrapper would have to know what the shared library extension(s) are for that platform (and they vary). This is possible, but icky.
    • The wrapper then has to parse the compiler and linker flags passed via argv to see if static or dynamic linking is being forced. These flags vary wildly on different platforms and different compilers. It seems like the only winning move here is not to play.
  6. We could leave the libopen-rte and libopen-pal .so version numbers as 0:0:0 and avoid the issue.
    • We're doing this to get v1.3.4 out the door.
    • But we really should figure out something "better" for v1.4 -- because we're doing a disservice to projects using these libraries.
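
For reference, the decision in idea 5 might look something like the Python sketch below. It is not the wrapper code (the real wrappers are not written in Python). SHARED_SUFFIXES, wrapper_libs and force_static are hypothetical names, and the suffix list is a guess at a few common platforms rather than a complete table. The sketch also takes force_static as an input, which hides the hard part: working that out from the compiler and linker flags in argv.

```python
# Hypothetical list of shared-library suffixes; a real wrapper would need
# the correct set for the platform it was built on.
SHARED_SUFFIXES = (".so", ".dylib", ".dll")

ALL_LIBS = ["-lmpi", "-lopen-rte", "-lopen-pal"]


def wrapper_libs(libdir_files, force_static=False):
    """Choose the -l flags given the file names found in $libdir.

    libdir_files: names in $libdir, e.g. the result of os.listdir(libdir).
    force_static: True if the user's flags force static linking (detecting
                  this from argv is the part that varies per compiler).
    """
    has_shared_mpi = any(
        name == "libmpi" + suffix
        for name in libdir_files
        for suffix in SHARED_SUFFIXES
    )
    if has_shared_mpi and not force_static:
        # Shared libmpi pulls in libopen-rte / libopen-pal by itself.
        return ["-lmpi"]
    # Static archives must all be named explicitly.
    return list(ALL_LIBS)


print(wrapper_libs(["libmpi.so", "libopen-rte.so", "libopen-pal.so"]))
print(wrapper_libs(["libmpi.a", "libopen-rte.a", "libopen-pal.a"]))
print(wrapper_libs(["libmpi.so", "libmpi.a"], force_static=True))
```

Even in this toy form, both objections above are visible: the suffix table has to be maintained per platform, and force_static has to come from somewhere.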

NOTE: This issue potentially affects binary compatibility between MPI applications built against the v1.3 and v1.4 series and the upcoming v1.5 series. If we do properly version libopen-rte/pal in v1.5, apps linked against the rte/pal .so libs from the v1.3/v1.4 series may find incompatible "current" and "age" values in the v1.5 libraries.
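
To make the "current" and "age" concern concrete, here is a small Python sketch of the Libtool rule that a library at current:revision:age supports interface numbers current - age through current. The v1.5 triples used below are invented purely for illustration; no v1.5 version numbers have been decided in this ticket. supported_interfaces and can_run are hypothetical helper names.

```python
def supported_interfaces(version_info):
    """Interface numbers a library at current:revision:age still supports."""
    current, _revision, age = (int(x) for x in version_info.split(":"))
    return range(current - age, current + 1)


def can_run(linked_against, installed):
    """True if an app linked at one triple can use a library at another.

    Simplification: the app is assumed to need the newest interface
    (the "current" value) of the library it was linked against.
    """
    linked_current = int(linked_against.split(":")[0])
    return linked_current in supported_interfaces(installed)


V14_RTE = "0:0:0"  # libopen-rte as shipped in the v1.3/v1.4 series

# Hypothetical v1.5 values, for illustration only.
print(can_run(V14_RTE, "1:0:0"))  # interface changed incompatibly, age reset
print(can_run(V14_RTE, "1:0:1"))  # interface only extended, age bumped
```

In the first hypothetical, the v1.5 library supports only interface 1, so an app that recorded a dependency on interface 0 cannot use it. In the second, interfaces 0 and 1 are both supported and the old app keeps working.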

Change History

comment:1 Changed 7 years ago by jsquyres

(In [22197]) Fixes #2091.

After much back-n-forth this afternoon, the RMs have decided that in order to get v1.3.4 out in a timely manner, we are punting on the unexpectedly complex issue of versioning libopen-rte and libopen-pal in v1.3.4. Refs #2092 for more details. Hopefully we can think of a proper solution for v1.4.

Also advance the version number to 1.3.4rc4.

comment:2 Changed 7 years ago by bbenton

  • Milestone changed from Open MPI 1.4 to Open MPI 1.4.2

comment:3 Changed 7 years ago by jsquyres

  • Status changed from new to closed
  • Resolution set to fixed

Per r22691 and the RFC thread (http://www.open-mpi.org/community/lists/devel/2010/02/7447.php), this issue should now be fixed. Note that libopen-rte.so and libopen-pal.so will stay at .so version 0:0:0 for the duration of the v1.4 series. They will be properly versioned in the v1.5 series.
