Ticket #1982 (closed defect: fixed)

Opened 7 years ago

Last modified 5 years ago

Fortran MPI_IN_PLACE detection broken on OSX

Reported by: jsquyres Owned by: bosilca
Priority: major Milestone: Open MPI 1.4.6
Version: trunk Keywords:
Cc: ricardo.fonseca@…, dog@…, vanelteren@…

Description

Per the thread starting around here

http://www.open-mpi.org/community/lists/users/2009/08/10164.php

it looks like Fortran MPI_IN_PLACE detection is broken on OS X (this may also indicate that MPI_BOTTOM and other "special" Fortran constants are broken on OS X as well -- but I only tested IN_PLACE).

It works fine in OMPI 1.2.9, but is broken on the SVN trunk and OMPI v1.3.3. Here's a trivial program that tests the issue:

program inplace
  use mpi
  implicit none
  integer :: ierr, rank, rsize, bsize
  real, dimension( 2, 2 ) :: buffer, out
  integer :: rc
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, rsize, ierr)
  buffer = rank + 1
  bsize = size(buffer,1) * size(buffer,2)
  print *, buffer
  call mpi_allreduce( MPI_IN_PLACE, buffer, bsize, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr )
  if ( rank == 0 ) then 
    print *, 'Result:'
    print *, buffer
  endif
  call mpi_finalize( rc )
end program

If you run this with np=2, every element of the result should be 3 (1 + 2).

I added some opal_output() calls to allreduce_f.c of the form:

    if (OMPI_IS_FORTRAN_IN_PLACE(sendbuf)) {
        opal_output(0, "This is IN_PLACE: %p == (%p, %p, %p, %p)", sendbuf, 
                    &MPI_FORTRAN_IN_PLACE, &mpi_fortran_in_place, 
                    &mpi_fortran_in_place_, &mpi_fortran_in_place__);
    } else {
        opal_output(0, "This is NOT IN_PLACE: %p != (%p, %p, %p, %p)", sendbuf,
                    &MPI_FORTRAN_IN_PLACE, &mpi_fortran_in_place, 
                    &mpi_fortran_in_place_, &mpi_fortran_in_place__);
    }

I did this on a v1.2.9 tarball, v1.3.3 tarball, and the SVN trunk.

v1.2.9 recognizes MPI_IN_PLACE properly; trunk and v1.3 do not.

Attachments

bogus-1.0.tar.gz (280.0 KB) - added by jsquyres 7 years ago.

Change History

comment:1 Changed 7 years ago by jsquyres

  • Owner set to jsquyres
  • Status changed from new to assigned

comment:2 Changed 7 years ago by jsquyres

  • Owner changed from jsquyres to bosilca

Actually, handing to George because he'll be able to get to it quicker this week than me.

Here's the info from how I reproduced it:

  • OS X 10.5.7
  • Intel-based MBP
  • XCode 3.1.3
$ where gcc
/usr/bin/gcc
$ gcc --version
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5493)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

comment:3 Changed 7 years ago by jsquyres

As a followup, this doesn't appear to be a libtool issue -- I compiled the OMPI trunk and the v1.2 SVN branch from the same autotools (AC 2.64, AM 1.11, LT 2.2.6) on OS X 10.5.8 and got the same results: 1.2 works, trunk does not.

comment:4 Changed 7 years ago by jsquyres

So far, I'm mystified.

I have a completely repeatable case of OMPI v1.2 working just fine and OMPI SVN trunk failing. I built a much smaller/simpler example outside of MPI that shows the same behavior as the OMPI SVN trunk (i.e., it doesn't work). So OMPI v1.2 must be doing something special to make it work. But I'm darned if I can figure out what it is. :-(

I attached a trivial tarball of the small example I made.

Something must be significantly different between v1.2 and the trunk, but I can't figure out yet what it is. Here's what I've eliminated so far:

  • -fvisibility=hidden in trunk (and not in v1.2) makes no difference. We appear to have DECLSPEC'ed everything properly. And my small example doesn't use -fvisibility=hidden by default, and it still shows the broken behavior.
  • constants.h does not appear to be significantly different.
  • I used the same Autotools between v1.2 and trunk. The generated libtool was slightly different, but I copied the v1.2 libtool into the trunk tree, built with it, and still got the same broken behavior.

Changed 7 years ago by jsquyres

comment:5 Changed 7 years ago by bosilca

I did some work on this one, but didn't manage to get anywhere close to something that makes sense. In fact I was as mystified as Jeff. The only thing I found that isn't in Jeff's message is that "otool -L" shows that MPI_IN_PLACE (and all the other constants) are moved into a special data segment in 1.3, while they are in the normal one in 1.2. However, I wasn't able to find anything on the net about why this data segment behaves differently from the rest.

comment:6 Changed 7 years ago by jsquyres

  • Cc dog@… added

Dave Gunter provides some good clues (http://www.open-mpi.org/community/lists/devel/2009/09/6867.php). I don't have cycles to look at this instantly, but it seems like it provides the missing info to fix the problem.

From his mail:

I've been playing around with Jeff's "bogus" tarball and I, too, see it fail on OS X. If I make the following changes in configure[.ac], it works perfectly:

  1. replace -fno-common with -fcommon
  2. add -flat_namespace as part of the arguments for creating shared libs.

After that, things work fine:

(dog@domdechant 63%) main
Fortran MPI_BOTTOM is          93
Assigning C variables
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18, 0x2040/19, 0x602c/20)
Fortran MPI_BOTTOM is          19
Fortran MPI_BOTTOM is          32
MPI_SEND_F: This is BOTTOM: 0x2040 == (0x6020/17, 0x6024/18, 0x2040/32, 0x602c/20)
Fortran MPI_BOTTOM is          32

I still don't see what the difference between the two versions of OMPI is.

OSX 10.5.8, GCC 4.4.1, most recent libtool, autoconf, automake and m4.

comment:7 Changed 7 years ago by bosilca

  • Priority changed from blocker to major

comment:8 Changed 7 years ago by dog

I've been unable to make any progress on this and don't know what else to check. The 1.2.9 release and 1.3.3 code release are pretty much identical as far as how these variables are implemented. 1.2.9 works, 1.3.3 (and any of the 1.3.x line) fails.

The generated libtool for 1.3.x is much different than the 1.2 branch but using the libtool from 1.2.9 doesn't fix 1.3.3. There is also a lot of change in the underlying autoconf/make macros but I lack sufficient expertise there to figure out what may have changed.

comment:9 Changed 7 years ago by bosilca

  • Milestone changed from Open MPI 1.3.4 to Open MPI 1.4

comment:10 Changed 7 years ago by bbenton

  • Milestone changed from Open MPI 1.4 to Open MPI 1.4.2

comment:11 Changed 7 years ago by bbenton

  • Milestone changed from Open MPI 1.4.2 to Open MPI 1.4.3

comment:12 Changed 6 years ago by bosilca

A little bit more info on this one. I compared the libmpi of the 1.2 and the 1.5 regarding the symbols related to in_place. Here is what I have:

  • on 1.2
    00000000000d2718 S _MPI_FORTRAN_IN_PLACE
    0000000000000000 - 0c 0000  GSYM _MPI_FORTRAN_IN_PLACE
    00000000000d271c S _mpi_fortran_in_place
    0000000000000000 - 0c 0000  GSYM _mpi_fortran_in_place
    00000000000d2720 S _mpi_fortran_in_place_
    0000000000000000 - 0c 0000  GSYM _mpi_fortran_in_place_
    00000000000d2724 S _mpi_fortran_in_place__
    0000000000000000 - 0c 0000  GSYM _mpi_fortran_in_place__
    
  • on 1.5
    0000000000000000 - 0d 0000  GSYM _MPI_FORTRAN_IN_PLACE
    0000000000267d70 S _MPI_FORTRAN_IN_PLACE
    0000000000267d74 S _mpi_fortran_in_place
    0000000000000000 - 0d 0000  GSYM _mpi_fortran_in_place
    0000000000000000 - 0d 0000  GSYM _mpi_fortran_in_place_
    0000000000267d78 S _mpi_fortran_in_place_
    0000000000000000 - 0d 0000  GSYM _mpi_fortran_in_place__
    0000000000267d7c S _mpi_fortran_in_place__
    

There are two things to notice:

  • the order of the symbols is different. I wonder how the fact that the variable itself appears before the GSYM in 1.5 impacts the linking process.
  • on the 1.2 the symbols are marked as c (local common symbol), while on the 1.5 they are marked as d (local data section symbol).

comment:13 Changed 5 years ago by jsquyres

  • Cc vanelteren@… added

Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php, the magic compiler (linker) flag we need is:

-Wl,-commons,use_dylibs

I have a patch for this, but it entails an m4 change, so I'll commit it tonight (and CMR to v1.4 and v1.5).

comment:14 Changed 5 years ago by jsquyres

  • Status changed from assigned to closed
  • Resolution set to fixed

(In [25545]) Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php, to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS X, we need to use the following compiler (linker) flag:

-Wl,-commons,use_dylibs

So if we're compiling on OS X, test to see if that flag works with the compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that per a future update, we'll only have one Fortran compiler anyway).

Fixes #1982.

comment:15 Changed 5 years ago by jsquyres

  • Status changed from closed to reopened
  • Resolution fixed deleted

Gah -- r25545 was totally borked and was backed out. New commit coming shortly.

comment:16 Changed 5 years ago by jsquyres

  • Status changed from reopened to closed
  • Resolution set to fixed

(In [25547]) (this is what r25545 should have been)

Per http://www.open-mpi.org/community/lists/users/2011/11/17862.php, to make MPI_IN_PLACE (and other sentinel Fortran constants) work on OS X, we need to use the following compiler (linker) flag:

-Wl,-commons,use_dylibs

So if we're compiling on OS X, test to see if that flag works with the compiler. If so, add it to the wrapper FFLAGS and FCFLAGS (note that per a future update, we'll only have one Fortran compiler anyway).

Fixes #1982.

comment:17 Changed 5 years ago by bbenton

  • Milestone changed from Open MPI 1.4.5 to Open MPI 1.4.6

Milestone Open MPI 1.4.5 deleted
