nmon 15h segfault with workaround patch

2016-01-07
2016-01-13
  • Claudio Fahey

    Claudio Fahey - 2016-01-07

    Running nmon 15h on a Linux system running Hadoop terasort causes it to segfault. I suspect this is due to the large number of processes starting and stopping during terasort. gdb shows that the crash occurs in the COUNTDELTA() macro. As this statement doesn't seem critical (for my purposes anyway), simply removing it allows nmon to run reliably. Here is the patch.

    @@ -6362,7 +6362,7 @@ fprintf(fp,"VM,Paging and Virtual Memory,nr_dirty,nr_writeback,nr_unstable,nr_pa
                                                                       TIMEDELTA(pi_stime,i,j);
                                            topper[max_sorted].size =  p->procs[i].statm_resident;
                                            if(isroot)
    -                                           topper[max_sorted].io =  COUNTDELTA(read_io) + COUNTDELTA(write_io);
    +                                           topper[max_sorted].io = 0;
    
                                            max_sorted++;
                                            break;
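
For readers wondering what a less blunt workaround might look like: the sketch below shows one way to compute a per-process I/O delta that tolerates processes appearing or disappearing between samples. It is only an illustration of the idea. The struct layout and the names `proc_sample`, `snapshot`, `find_pid` and `io_delta` are hypothetical and are not taken from the nmon source, and the thread does not confirm that a missing process entry is the actual cause of the crash.

```c
#include <stddef.h>

/* Hypothetical per-process sample; NOT nmon's real data layout. */
struct proc_sample {
    int  pid;
    long read_io;
    long write_io;
};

/* Hypothetical snapshot: the process table captured at one interval. */
struct snapshot {
    const struct proc_sample *procs;
    size_t count;
};

/* Return the index of pid in the snapshot, or -1 if it is not present. */
static long find_pid(const struct snapshot *s, int pid)
{
    size_t i;

    if (s == NULL || s->procs == NULL)
        return -1;
    for (i = 0; i < s->count; i++) {
        if (s->procs[i].pid == pid)
            return (long)i;
    }
    return -1;
}

/*
 * Combined read + write I/O delta for one pid between two snapshots.
 * Returns 0 instead of dereferencing a missing entry when the process
 * is absent from either snapshot, or when a counter went backwards
 * (for example because the pid was reused by a new process).
 */
static long io_delta(const struct snapshot *prev,
                     const struct snapshot *cur, int pid)
{
    long p = find_pid(prev, pid);
    long c = find_pid(cur, pid);
    long rd, wr;

    if (p < 0 || c < 0)
        return 0;

    rd = cur->procs[c].read_io  - prev->procs[p].read_io;
    wr = cur->procs[c].write_io - prev->procs[p].write_io;
    if (rd < 0 || wr < 0)
        return 0;

    return rd + wr;
}
```

With a guard of this kind, a process that started or exited between two samples would simply report zero I/O for that interval rather than reading through an invalid index.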
    
     
  • Nigel Griffiths

    Nigel Griffiths - 2016-01-08

    Hi,
    Interesting problem, but I can't see your fix as a real solution :-)
    As you suggest, it may be due to the rapidly changing process list.
    I would like to fix this.
    Can you save an nmon file when the machine is busy to investigate the issue for me?
    Perhaps: nmon -fT -s -c 600
    If using nmon16b then the extra CPU stats from the -U option would be helpful.

    Cheers, Nigel

     
  • Darroch Royden

    Darroch Royden - 2016-01-11

    hello,

    I'm also seeing this issue (albeit with RHEL6.7 on x86_64).

    Getting it to core gives:

    Core was generated by `nmon -f -t -r Test1 -s30 -c120'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x000000000040fb86 in main (argc=<value optimized out>, argv=<value optimized out>) at lmon15h.c:6365
    6365    lmon15h.c: No such file or directory.
            in lmon15h.c
    

    The workaround above is the same one I came up with. I have run your requested arguments on the same version, which also segfaulted but managed to get 150 samples first.

    Would you like the core file as well?

     
  • Claudio Fahey

    Claudio Fahey - 2016-01-13

    I have attached the nmon output file and the entire abrt folder that contains the core dump. Thanks for looking into this!

     