bad sgi mkmf or template?
Status: Planning
Hi,
My compilation of the Havana release keeps failing. At first I thought there was a problem with the source code, but then I realized that when mkmf makes the Makefile, none of the references to LDFLAGS appear, so the required libraries are never referenced. This is on an SGI Origin200. Have you folks seen this before?
Thanks - Will
Logged In: YES
user_id=149024
Will: the template file for all SGI platforms is 'mkmf.template.sgi', and that does indeed set LDFLAGS to some appropriate value that works on most sites. Are you sure you have this template file?
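For reference, the LDFLAGS-related lines in that template are along these general lines (an illustrative sketch, not a verbatim copy of the shipped file):
----------
# illustrative sketch only, not the shipped mkmf.template.sgi
FC = f90
FFLAGS = -d8 -i4 -r8 -64 -mips4 -O2 -OPT:Olimit=0 -woff1670 -macro_expand
LDFLAGS = -64 -mips4 -L/usr/local/lib -lnetcdf -ludunits -lmpi -lsma -lexc -lscs
----------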
Logged In: NO
Yes, I have the SGI template. Here's a bit of output from run_solo_example, with the makefile generated using the mkmf.template.sgi template:
-----------
if ( 0 != 0 ) then
make fms.exe
f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
-i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
-I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp_domains.F90
f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
-i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
-I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
-I/usr/local/include
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp_io.F90
f90 -Duse_netCDF -Duse_libMPI -macro_expand -d8 -64
-i4 -r8 -mips4 -O2 -OPT:Olimit=0 -woff1670 -c
-I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms/fms_io.F90
------------
As you can see, the LDFLAGS entry is not included. The values are set in the template file, but are not included at compile time. If I add $(LDFLAGS) to the end of the FFLAGS or CFLAGS entry, I see something like this:
----------
if ( 0 != 0 ) then
make fms.exe
f90 -Duse_netCDF -Duse_libMPI -macro_expand
-Dsgi_mipspro -I/usr/local/include -d8 -i4 -r8 -O2
-OPT:Olimit=0 -woff1670 -expand_source -64 -mips4
-dont_warn_unused -L/usr/local/lib -lnetcdf -ludunits -lmpi
-lsma -lexc -lscs -c
-I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90
module mpp_mod
^
f90-855 f90: ERROR MPP_MOD, File =
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90,
Line = 70, Column = 8
The compiler has detected errors in module "MPP_MOD". No
module information file will be created for this module.
use mpi
^
f90-292 f90: ERROR MPP_MOD, File =
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90,
Line = 207, Column = 7
"MPI" is specified as the module name on a USE statement,
but the compiler cannot find it.
----------
In this case the library references are included, but you can also see I have other problems (and yes, I DO have MPI/MPT installed on this machine). Any suggestions?
Logged In: YES
user_id=800011
OK - I set FORTRAN_SYSTEM_MODULES by hand (I don't have the 'modules' package installed, and am having trouble finding it) to /usr/lib/f90modules, then compiled:
----------
make fms.exe
f90 -macro_expand -Duse_libMPI -Duse_netCDF
-Dsgi_mipspro -I/usr/local/include -64 -mips4 -d8 -i4 -r8
-O2 -OPT:Olimit=0 -woff1670 -expand_source -L/usr/local/lib
-lnetcdf -ludunits -lmpi -lsma -lexc -lscs -c
-I/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp
/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90
----------
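(For completeness, the by-hand setting was just a single csh line along these lines:)
----------
# set by hand because the 'modules' package is not installed here
setenv FORTRAN_SYSTEM_MODULES /usr/lib/f90modules
----------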
but then when it runs it dies:
mpirun -np 4 fms.exe
MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
MPI: Received signal 11
Do you think I'm still missing something?
Thanks, Will
Logged In: YES
user_id=800011
An addendum to yesterday's message... I installed the 'modules' package, unset the FORTRAN_SYSTEM_MODULES that I had defined by hand, and recompiled accordingly. MPI still dies with signal 11.
I also tried to compile with "-Duse_libSMA", and the build fails at link time with an unresolved "mpp_malloc_" in mpp_domains.o.
It shouldn't be this hard to compile this model, should it?
Thanks, Will
Logged In: YES
user_id=149024
I'm not sure why your MPI run is failing. I'd need more details, e.g. a traceback.
But I do know why changing MPI->SMA fails to link correctly: for some reason, the file MPP.mod is not getting overwritten in the new compile. I've noticed this problem on some systems, but don't have an explanation.
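One brute-force workaround worth trying (just a way to force the stale module file to be rebuilt, not a real fix) is to remove the old MPP.mod and object file before the SMA build, roughly:
----------
# untested sketch: force mpp.F90 to recompile so MPP.mod is regenerated
cd exec            # or wherever mkmf wrote the Makefile for this build
rm -f MPP.mod mpp.o
make fms.exe
----------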
Yes, it shouldn't be this hard to make this damn thing compile!
Logged In: YES
user_id=800011
Hi,
I just upgraded the IRIX MPT to version 1.8 and changed some of the module definitions. I tried running again; it still fails, but this time I got the following traceback (see below).
If it's any easier, I have no problem trying to get this to run using shmem...
Thanks - Will
----------
set run = mpirun -v -np 4
mpirun -v -np 4 fms.exe
MPI: libxmpi.so 'SGI MPI 4.3 MPT 1.8 06/06/03 11:47:23'
Job Limits not enabled: Job not found or not part of job
MPI: libmpi.so 'SGI MPI 4.3 MPT 1.8 06/06/03 11:47:12
(64_M4)'
MPI: MPI_MSGS_MAX = 524288
MPI: MPI_BUFS_PER_PROC= 32
MPI: Program fms.exe, Rank 0, Process 45409 received signal
SIGSEGV(11)
MPI: --------stack traceback-------
45409(9):
0xc6c2b50[MPI_SGI_stacktraceback]
0xc6c2f98[first_arriver_handler]
0xc6c3228[slave_sig_handler]
0xd83fb48[flush_]
0x10032f0c[STDLOG.in.MPP_MOD]
0x10032920[MPP_INIT.in.MPP_MOD]
0x10027894[FMS_INIT.in.FMS_MOD]
0x101837d0[MAIN__]
0xcc4fc24[main]
MPI: About to execute: (echo "set \$stacktracelimit=20;
where; quit") | dbx -p 45409 | sed -e 's/^/MPI: /'
MPI: dbx version 7.3.1 68542_Oct26 MR Oct 26 2000 17:50:34
MPI: Process 45409 (fms.exe) stopped at [__waitsys:24
+0x8,0x4184208]
MPI: Source (of
/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/proc/waitsys.s)
not available for Process 45409
MPI: > 0 __waitsys(0x0, 0xb14e, 0xffffff9070, 0x3, 0x0,
0x41a3e48, 0x41a3c9c, 0x1)
["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/proc/waitsys.s":24,
0x4184208]
MPI: 1 _system(0xffffff9150, 0xb14e, 0xffffff9070, 0x3,
0x0, 0x41a3e48, 0x41a3c9c, 0x1)
["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/stdio/system.c":116,
0x41a1080]
MPI: 2 MPI_SGI_stacktraceback(0x0, 0xb14e, 0xffffff9070,
0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":242,
0xc6c2cb8]
MPI: 3 first_arriver_handler(0xb, 0x470a300,
0xffffff9070, 0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":445,
0xc6c2f98]
MPI: 4 slave_sig_handler(0xb, 0xb14e, 0xffffff9070, 0x3,
0x0, 0x41a3e48, 0x41a3c9c, 0x1)
["/xlv4/mpt/1.8/mpi/work/4.3/lib/libmpi/libmpi_64_M4/adi/sig.c":528,
0xc6c3228]
MPI: 5 _sigtramp(0xffffffff8000000b, 0xb14e,
0xffffff9a00, 0x3, 0x0, 0x41a3e48, 0x41a3c9c, 0x1)
["/xlv47/6.5.20f/work/irix/lib/libc/libc_64_M4/signal/sigtramp.s":71,
0x419ac7c]
MPI: 6 flush_(0x0, 0x3fdf8, 0x51, 0x0, 0x80, 0x1285ce80,
0x4260858, 0x1)
["/j7/mtibuild/v74/workarea/v7.4/libf/fio/f77wrappers.c":188,
0xd83fb48]
MPI: 7 STDLOG(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
0x4260858, 0x1)
["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90":1215,
0x10032f0c]
MPI: More (n if no)? 8 MPP_INIT(0x0, 0x8a, 0x51, 0x0,
0x80, 0x1285ce80, 0x4260858, 0x1)
["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/mpp/mpp.F90":1159,
0x10032920]
MPI: 9 FMS_INIT(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
0x4260858, 0x1)
["/home/bradman/will/software/models/FMS/solo_example/fms_src/shared/fms/fms.f90":302,
0x10027894]
MPI: 10 atmos_model(0x0, 0x8a, 0x51, 0x0, 0x80,
0x1285ce80, 0x4260858, 0x1)
["/home/bradman/will/software/models/FMS/solo_example/fms_src/atmos_solo/atmos_model.f90":94,
0x101837d0]
MPI: 11 main(0x0, 0x8a, 0x51, 0x0, 0x80, 0x1285ce80,
0x4260858, 0x1)
["/j7/mtibuild/v74/workarea/v7.4/libF77/main.c":101,
0xcc4fc24]
MPI: 12 __start()
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_64_M4/csu/crt1text.s":177,
0x10022098]
MPI: -----stack traceback ends-----
MPI: Program fms.exe, Rank 0, Process 45409: Dumping core on
signal SIGSEGV(11) into directory /tmp
MPI: MPI_COMM_WORLD rank 0 has terminated without calling
MPI_Finalize()
MPI: aborting job
MPI: Received signal 11
if ( 1 != 0 ) then
set cores_dumped = 0
foreach corefile ( core* )
echo cvdump of core file core.45409
cvdump fms.exe core.45409
[1] 45393
set cores_dumped = `expr $cores_dumped + 1`
expr 0 + 1
end
unset echo
FATAL: Connection to PCS server failed.
error reading descendent process status: Error 0
[1] Exit 1 cvdump fms.exe core.45409 >>
$corefile.out
ERROR: in mpirun, core dumped: run 1,
loop 1
No match
No match
No match
No match
No match
No match
No match
No match
ERROR: in mpirun, core dumped: run 1,
loop 1
ERROR: Any output that may have been generated
is in
/home/bradman/will/software/models/FMS/solo_example/output_spectral_crash
Logged In: YES
user_id=149024
There is a known issue with the system call FLUSH(), which appears in the traceback. As of compiler release 7.4, this call has a REQUIRED second argument for an error code, which was an optional argument in earlier releases. We have corrected the code, but you may not yet have this fix. I'm guessing this is the problem, since FLUSH() appears in your traceback.
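In code terms the change amounts to something like the following (a sketch of the two call forms only, with placeholder names, not the actual FMS source):
----------
      integer :: log_unit, istat     ! placeholder names
! pre-7.4 MIPSpro accepted the one-argument form:
      call flush (log_unit)
! from compiler release 7.4 the error-status argument must be supplied:
      call flush (log_unit, istat)
----------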
But it doesn't explain why the code works with SHMEM, nor the obscure error message:
Job Limits not enabled: Job not found or not part of job
Logged In: YES
user_id=800011
My compilers here are still 7.3 (7.3.1.2 to be exact), the FMS release I'm using is Havana (Oct 2002?), and my version of MPI is 4.3. Does this seem like it should be a problem to you?
As for the job limit message, it could be that the SGI implementation of MPI checks for jlimits or miser or something like that, which I do not use on the systems here...
Thanks, Will