Problem running LMON on ANL's Cobalt-based Intrepid BGP
Brought to you by:
dongahn
Hi Dong,
I set the env var... The test got further but still failed. It seems mpirun did not go through. I'm sending nohup.out; it would be great if you could have a look at it. I can't pinpoint the problem, but I think it is partition-related.
I'm also attaching log files generated from a normal mpi hello world run.
I ran the local tests and the model checkers, and they seem to work fine, so I don't think it's an installation issue. I thought the problem might be that I'm specifying the wrong rm, but that doesn't seem to be it: I tried installing lmon with --with-rm=slurm and the configure step itself failed. I guess we're not running things correctly.
--
Thanks,
Divya
Hi Divya and Ray,
>
> I suspect that the mpirun options that fe_launch_smoketest.cxx sets are
> insufficient under the Cobalt scheduler. I'm copying an ANL collaborator
> (Ray Loy) here, as he might be able to provide some information you
> would need.
>
> Ray:
> The software piece Divya is using, LaunchMON, requires the same debug
> hook and environment as TotalView. How do your users currently use
> TotalView under Cobalt? In particular, how does TotalView interface with
> mpirun under Cobalt?
>
> Best,
> Dong
>
Divya,
Are you trying to run this here at ALCF or somewhere else with Cobalt?
And do you need to run interactively or is a scripted run sufficient?
Ray