nmon 15h segfaults when run on a Linux system that is running Hadoop terasort. I suspect this is due to the large number of processes starting and stopping during terasort. gdb shows that the crash occurs in the COUNTDELTA() macro. As this statement doesn't seem critical (for my purposes anyway), simply removing it allows nmon to run reliably. Here is the patch.
Hi,
Interesting problem but I can't see your fix as a real solution :-)
As you suggest, it may be due to the rapidly changing process list.
I would like to fix this.
Can you save an nmon file while the machine is busy so that I can investigate the issue?
Perhaps: nmon -fT -s -c 600
If you are using nmon16b, the extra CPU stats from the -U option would be helpful.
Cheers, Nigel
I'm also seeing this issue (albeit with RHEL6.7 on x86_64).
Getting it to core gives:
Core was generated by `nmon -f -t -r Test1 -s30 -c120'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000040fb86 in main (argc=<value optimized out>, argv=<value optimized out>) at lmon15h.c:6365
6365 lmon15h.c: No such file or directory.
in lmon15h.c
The workaround above is the same one I came up with. I have run your requested arguments on the same version; it also segfaulted, but managed to collect 150 samples first.
Would you like the core file as well?
I have attached the nmon output file and the entire abrt folder that contains the core dump. Thanks for looking into this!
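For anyone following the thread, here is a rough sketch of the kind of failure suspected above and one way to guard against it. It is not nmon's code: the struct, the field names, and the helpers find_pid() and utime_delta() are all hypothetical, and it assumes the crash comes from computing a per-process delta against a previous snapshot in which the process no longer (or does not yet) exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process sample; NOT nmon's real structure. */
struct proc_sample {
    int pid;
    unsigned long utime;   /* cumulative CPU ticks for this process */
};

/* Return the index of pid in snap[0..n), or -1 if the process is absent. */
static int find_pid(const struct proc_sample *snap, size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (snap[i].pid == pid)
            return (int)i;
    return -1;
}

/*
 * Delta of utime between the previous snapshot and the current sample.
 * If the process is missing from the previous snapshot (it has only just
 * started), or its counter went backwards (for example after pid reuse),
 * return 0 instead of indexing the previous array with a bad index.
 */
static unsigned long utime_delta(const struct proc_sample *prev, size_t nprev,
                                 const struct proc_sample *cur)
{
    assert(cur != NULL);
    assert(prev != NULL || nprev == 0);

    int j = find_pid(prev, nprev, cur->pid);
    if (j < 0)
        return 0;                       /* no previous sample to compare with */
    if (prev[j].utime > cur->utime)
        return 0;                       /* counter reset: avoid unsigned wrap */
    return cur->utime - prev[j].utime;
}
```

With a previous snapshot of { {100, 50}, {101, 10} }, a current sample {100, 80} gives a delta of 30, while a brand-new pid gives 0 rather than a read outside the array. The linear lookup is only there to keep the sketch short.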