
#4 bad sgi mkmf or template?

Status: open
Owner: nobody
Priority: 7
Updated: 2003-07-25
Created: 2003-06-12
Creator: Will Heres
Private: No

Hi,

My compilation of the Havana release keeps failing - at first I
thought there was a problem with the source code, but then I
realized that when mkmf makes the Makefile, none of the
references to LDFLAGS appear - such that the required libraries
are never referenced. This is on an SGI Origin200. Have you
folks seen this before?

Thanks - Will
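
For orientation, the template enters the build when the Makefile is
generated; a minimal sketch of that step is below. The flag names
(-t, -p, -c) and the path_names argument are assumptions based on
general mkmf usage, not something quoted in this report, so check
mkmf's own usage message before relying on them:

----------
# hypothetical mkmf invocation using the SGI template (flags assumed)
mkmf -t mkmf.template.sgi -p fms.exe -c "-Duse_libMPI -Duse_netCDF" path_names /usr/local/include
----------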

Discussion

  • V. Balaji

    V. Balaji - 2003-06-17

    Logged In: YES
    user_id=149024

    Will: the template file for all SGI platforms is
    'mkmf.template.sgi', and that does indeed set LDFLAGS to
    some appropriate value that works on most sites. Are you
    sure you have this template file?
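
    For reference, the LDFLAGS-related lines in a template of this
    kind look roughly like the sketch below; the exact flags,
    library list, and paths are assumptions pieced together from
    the compile lines quoted later in this thread, not the
    verbatim contents of mkmf.template.sgi:

    ----------
    # hypothetical excerpt in the style of mkmf.template.sgi
    # (library names and paths vary by site)
    FC = f90
    LD = f90
    FFLAGS = -d8 -i4 -r8 -64 -mips4 -O2 -OPT:Olimit=0 -woff1670
    LDFLAGS = -64 -mips4 -L/usr/local/lib -lnetcdf -ludunits -lmpi -lsma -lexc -lscs
    ----------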

     
  • Nobody/Anonymous

    Logged In: NO

    Yes, I have the SGI template. Here's a bit of output from
    run_solo_example, with the makefile generated using the
    mkmf.template.sgi template:

    -----------
    if ( 0 != 0 ) then
    make fms.exe
    f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
    -i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
    -I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp_domains.F90
    f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
    -i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
    -I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
    -I/usr/local/include
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp_io.F90
    f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
    -i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
    -I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms/fms_io.F90
    ------------

    As you can see, the LDFLAGS entry is not included. The values
    are set in the template file, but are not included at compile
    time. If I add $(LDFLAGS) to the end of the FFLAGS or CFLAGS
    entry, I see something like this:

    ----------
    if ( 0 != 0 ) then
    make fms.exe
    f90 -Duse_netCDF -Duse_libMPI -macro_expand
    -Dsgi_mipspro -I/usr/local/include -d8 -i4 -r8 -O2
    -OPT:Olimit=0 -woff1670 -expand_source -64 -mips4
    -dont_warn_unused -L/usr/local/lib -lnetcdf -ludunits -lmpi
    -lsma -lexc -lscs -c
    -I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90

    module mpp_mod
    ^
    f90-855 f90: ERROR MPP_MOD, File =
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90,
    Line = 70, Column = 8
    The compiler has detected errors in module "MPP_MOD". No
    module information file will be created for this module.

    use mpi
    ^
    f90-292 f90: ERROR MPP_MOD, File =
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90,
    Line = 207, Column = 7
    "MPI" is specified as the module name on a USE statement,
    but the compiler cannot find it.
    ----------
    In this case the library references are included, but you can
    also see I have other problems (and yes, I DO have MPI/MPT
    installed on this machine).

    Any suggestions?
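
    Worth noting: in a conventional make setup, $(LDFLAGS) is
    consumed only by the final link rule of the generated
    Makefile, not by the per-file "-c" compile lines quoted
    above, so its absence from those lines does not by itself
    mean the libraries are never referenced. A minimal sketch of
    such a rule, assuming mkmf follows the usual convention (the
    target and variable names here are illustrative):

    ----------
    # hypothetical link rule in the style of a mkmf-generated Makefile
    # (in a real Makefile the recipe line must begin with a tab)
    fms.exe: $(OBJ)
            $(LD) $(OBJ) -o fms.exe $(LDFLAGS)
    ----------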

     
  • V. Balaji

    V. Balaji - 2003-06-17
     
  • Will Heres

    Will Heres - 2003-06-17

    Logged In: YES
    user_id=800011

    OK - I set FORTRAN_SYSTEM_MODULES by hand (I don't have the
    'modules' package installed, and am having trouble finding
    it) to /usr/lib/f90modules, then compiled:

    ----------
    make fms.exe
    f90 -macro_expand -Duse_libMPI -Duse_netCDF
    -Dsgi_mipspro -I/usr/local/include -64 -mips4 -d8 -i4 -r8
    -O2 -OPT:Olimit=0 -woff1670 -expand_source -L/usr/local/lib
    -lnetcdf -ludunits -lmpi -lsma -lexc -lscs -c
    -I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
    /home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90

    ----------
    but then when it runs it dies -

    mpirun -np 4 fms.exe
    MPI: MPI_COMM_WORLD rank 0 has terminated without calling
    MPI_Finalize()
    MPI: Received signal 11

    Do you think I'm still missing something?
    Thanks, Will
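
    For concreteness, setting that variable by hand from csh
    (the path is the one mentioned above and may differ on other
    systems) would look like:

    ----------
    # hypothetical csh setting, mirroring the description above
    setenv FORTRAN_SYSTEM_MODULES /usr/lib/f90modules
    ----------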

     
  • Will Heres

    Will Heres - 2003-06-18

    Logged In: YES
    user_id=800011

    An addendum to yesterday's message... I installed the
    'modules' package, unset the FORTRAN_SYSTEM_MODULES variable
    that I had defined by hand, and recompiled accordingly. MPI
    still dies with signal 11.

    I also tried to compile with "-Duse_libSMA", and the build
    fails at link time with an unresolved "mpp_malloc_" in
    mpp_domains.o.

    It shouldn't be this hard to compile this model, should it?
    Thanks, Will

     
  • V. Balaji

    V. Balaji - 2003-06-18

    Logged In: YES
    user_id=149024

    I'm not sure why your MPI run is failing. I'd need more
    details, e.g. a traceback.

    But I do know why changing MPI->SMA fails to link correctly.
    For some reason, the file MPP.mod is not getting overwritten
    in the new compile. I've noticed this problem on some
    systems, but don't have an explanation.

    Yes, it shouldn't be this hard to make this damn thing compile!
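
    A minimal workaround sketch for the stale module file,
    assuming the cause is an old MPP.mod surviving the switch
    from -Duse_libMPI to -Duse_libSMA (the file names follow the
    compile output above; whether the generated Makefile has a
    clean target is an assumption):

    ----------
    # hypothetical cleanup before rebuilding with -Duse_libSMA
    rm -f mpp.o MPP.mod     # force mpp.F90 and its module file to rebuild
    make fms.exe            # or 'make clean' first, if the Makefile provides it
    ----------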

     
  • Will Heres

    Will Heres - 2003-07-25
    • labels: --> Infrastructure: mpp
    • priority: 5 --> 7
     
  • Will Heres

    Will Heres - 2003-07-25

    Logged In: YES
    user_id=800011

    Hi,

    I just upgraded the IRIX MPT to version 1.8, and changed some
    of the module definitions. I tried running again and it
    failed, but I received the following traceback (see below).

    If it's any easier, I have no problems trying to get this to
    run using shmem...

    Thanks - Will
    ----------
    set run = mpirun -v -np 4
    mpirun -v -np 4 fms.exe
    MPI: libxmpi.so 'SGI MPI 4.3 MPT 1.8 06/06/03 11:47:23'
    Job Limits not enabled: Job not found or not part of job
    MPI: libmpi.so 'SGI MPI 4.3 MPT 1.8 06/06/03 11:47:12
    (64_M4)'
    MPI: MPI_MSGS_MAX = 524288
    MPI: MPI_BUFS_PER_PROC= 32
    MPI: Program fms.exe, Rank 0, Process 45409 received signal
    SIGSEGV(11)

    MPI: --------stack traceback-------
    45409(9):
    0xc6c2b50[MPI_SGI_stacktraceback]
    0xc6c2f98[first_arriver_handler]
    0xc6c3228[slave_sig_handler]
    0xd83fb48[flush_]
    0x10032f0c[STDLOG.in.MPP_MOD]
    0x10032920[MPP_INIT.in.MPP_MOD]
    0x10027894[FMS_INIT.in.FMS_MOD]
    0x101837d0[MAIN__]
    0xcc4fc24[main]

    MPI: About to execute: (echo "set \$stacktracelimit=20;
    where; quit") | dbx -p 45409 | sed -e 's/^/MPI: /'
    MPI: dbx version 7.3.1 68542_Oct26 MR Oct 26 2000 17:50:34
    MPI: Process 45409 (fms.exe) stopped at [__waitsys:24
    +0x8,0x4184208]
    MPI: Source (of
    /xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/proc/waitsys.s)
    not available for Process 45409
    MPI: > 0 __waitsys(0x0, 0xb14e, 0xffffff9070, 0x3, 0x0,
    0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/proc/waitsys.s":24,
    0x4184208]
    MPI: 1 _system(0xffffff9150, 0xb14e, 0xffffff9070, 0x3,
    0x0, 0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/stdio/system.c":116,
    0x41a1080]
    MPI: 2 MPI_SGI_stacktraceback(0x0, 0xb14e, 0xffffff9070,
    0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":242,
    0xc6c2cb8]
    MPI: 3 first_arriver_handler(0xb, 0x470a300,
    0xffffff9070, 0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":445,
    0xc6c2f98]
    MPI: 4 slave_sig_handler(0xb, 0xb14e, 0xffffff9070, 0x3,
    0x0, 0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":528,
    0xc6c3228]
    MPI: 5 _sigtramp(0xffffffff8000000b, 0xb14e,
    0xffffff9a00, 0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
    ["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/signal/sigtramp.s":71,
    0x419ac7c]
    MPI: 6 flush_(0x0, 0x3fdf8, 0x51, 0x0, 0x80, 0x1285ce80,
    0x4260858, 0x1)
    ["/j7/mtibuild/v74/workarea/v7.4/libf/fio/f77wrappers.c":188,
    0xd83fb48]
    MPI: 7 STDLOG(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
    0x4260858, 0x1)
    ["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90":1215,
    0x10032f0c]
    MPI: More (n if no)? 8 MPP_INIT(0x0, 0x8a, 0x51, 0x0,
    0x80, 0x1285ce80, 0x4260858, 0x1)
    ["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90":1159,
    0x10032920]
    MPI: 9 FMS_INIT(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
    0x4260858, 0x1)
    ["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms/fms.f90":302,
    0x10027894]
    MPI: 10 atmos_model(0x0, 0x8a, 0x51, 0x0, 0x80,
    0x1285ce80, 0x4260858, 0x1)
    ["/home/bradman/will/software/models/FMS/solo_example/fms_src/atmos_solo/atmos_model.f90":94,
    0x101837d0]
    MPI: 11 main(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
    0x4260858, 0x1)
    ["/j7/mtibuild/v74/workarea/v7.4/libF77/main.c":101,
    0xcc4fc24]
    MPI: 12 __start()
    ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_64_M4/csu/crt1text.s":177,
    0x10022098]

    MPI: -----stack traceback ends-----
    MPI: Program fms.exe, Rank 0, Process 45409: Dumping core on
    signal SIGSEGV(11) into directory /tmp
    MPI: MPI_COMM_WORLD rank 0 has terminated without calling
    MPI_Finalize()
    MPI: aborting job
    MPI: Received signal 11

    if ( 1 != 0 ) then
    set cores_dumped = 0
    foreach corefile ( core* )
    echo cvdump of core file core.45409
    cvdump fms.exe core.45409
    [1] 45393
    set cores_dumped = `expr $cores_dumped + 1`
    expr 0 + 1
    end
    unset echo
    FATAL: Connection to PCS server failed.
    error reading descendent process status: Error 0
    [1] Exit 1 cvdump fms.exe core.45409 >>
    $corefile.out
    ERROR: in mpirun, core dumped: run 1,
    loop 1
    No match
    No match
    No match
    No match
    No match
    No match
    No match
    No match
    ERROR: in mpirun, core dumped: run 1,
    loop 1
    ERROR: Any output that may have been generated
    is in
    /home/bradman/will/software/models/FMS/solo_example/output_spectral_crash

     
  • V. Balaji

    V. Balaji - 2003-07-28

    Logged In: YES
    user_id=149024

    There is a known issue with the system call FLUSH() which
    appears in the traceback. As of compiler release 7.4, this
    call has a REQUIRED second argument for an error code, which
    was an optional argument in the earlier release. We have
    corrected the code but you may not yet have this fix.
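
    A sketch of the difference, with hypothetical unit-number and
    status variables (the real call sites are in the FMS mpp code
    and are not reproduced here):

    ----------
    ! pre-7.4 usage: the error-code argument could be omitted
    call flush( log_unit )
    ! 7.4 usage: the error-code argument must be supplied
    call flush( log_unit, istat )
    ----------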

    I'm guessing this is the problem since FLUSH() appears in
    your traceback.
    But it doesn't explain why the code works with SHMEM, nor
    the obscure error message:

    Job Limits not enabled: Job not found or not part of job

     
  • Will Heres

    Will Heres - 2003-07-28

    Logged In: YES
    user_id=800011

    My compilers here are still 7.3 (7.3.1.2 to be exact), the
    FMS release I'm using is Havana (Oct 2002?), and my version
    of MPI is 4.3. Does this seem like it should be a problem to
    you?

    As for the job limit message, it could be that the SGI
    implementation of MPI checks for jlimits or miser or
    something like that, which I do not use on the systems
    here...

    Thanks, Will

     
