16n: segfault in libc when writing to file on RHEL8 with NFSv4
I have a lab of ~50 RHEL8 VMs which all run nmon via cron to record data to files. On a few of these systems, nmon is now segfaulting. It seems to be related to NFSv4 mounts because 1) on systems with NO NFSv4 mounts, nmon runs fine, and 2) if I remove "-N" from the options it also runs fine. Oddly, even after I unmount the NFSv4 volume, nmon still crashes.
I tried compiling/running from the latest source code, which reports itself as 16n, but it still segfaults on systems which have previously mounted NFSv4 filesystems.
command-line:
/usr/local/bin/nmon -NT -s 30 -c 10 -F /var/log/nmon/$(hostname)$(date "+%F%T").nmon
example segfault message:
Feb 21 13:55:03 myhostname kernel: nmon[3145175]: segfault at 43 ip 00007fa45c8cdfd5 sp 00007ffe6b9d4b98 error 4 in libc-2.28.so[7fa45c801000+1bc000]
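For anyone reading the kernel line above, its fields can be decoded with a little arithmetic. This is only a sketch: the helper names `fault_offset` and `decode_fault_error` are made up for illustration, and the error-code decoding assumes the standard x86 page-fault bits (bit 0 = protection violation, bit 1 = write, bit 2 = user mode).

```shell
# Offset of the faulting instruction inside the mapped library:
# instruction pointer minus the mapping base shown in [base+size].
fault_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Decode the low three bits of an x86 page-fault error code.
decode_fault_error() {
    code=$1
    if [ $(( code & 1 )) -ne 0 ]; then echo "protection violation"; else echo "page not present"; fi
    if [ $(( code & 2 )) -ne 0 ]; then echo "write access"; else echo "read access"; fi
    if [ $(( code & 4 )) -ne 0 ]; then echo "user mode"; else echo "kernel mode"; fi
}

# Values taken from the segfault message above.
fault_offset 0x7fa45c8cdfd5 0x7fa45c801000   # prints 0xccfd5
decode_fault_error 4                         # page not present / read access / user mode
```

So the crash is a user-mode read of the unmapped address 0x43, at offset 0xccfd5 inside libc-2.28.so. An address that small usually points to a NULL or garbage pointer being passed into a libc routine, but that is a guess until there is a stack trace. If the glibc debuginfo package is installed, the offset can be given to `addr2line -f -e <path to libc-2.28.so> 0xccfd5` to name the libc function.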
Hi Robert,
Thanks for reporting this.
Is it possible to send me the nmon file up to the point it crashed?
Also are you mounting and unmounting NFS v4 file-systems while nmon is running?
To debug the problem code, we need a stack trace to identify the nmon code calling libc.
This assumes you have gdb available and that the core file, source code, and binary are in the current directory.
If you are compiling nmon yourself, include -g and consider turning off optimization by removing the -O option.
$ gdb nmon
GNU gdb version xxxxxxxx
GDB Information here
For help, type "help".
Reading symbols from nmon...done.
(gdb) run [[YOUR NMON COMMAND LINE HERE]]
Starting program: /home/nag/nmon xxxxxxx
. . .
. . .
Program received signal SIGSEGV, Segmentation fault.
main (argc=<optimised out>, argv=0x7fffffffe4d8) at nmon.c:2247
2247 *crashptr = 42;
(gdb) where full
#0 main (argc=<optimised out>, argv=0x7fffffffe4d8) at nmon.c:2247
. . .
- lots of details and variables here
. . .
. . .
(gdb) quit
Copy all the gdb output lines and send to me.
Thanks Nigel
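If the crash only reproduces outside gdb (for example, only from cron), the same stack trace can be taken from a core file instead of a live session. A minimal sketch, assuming gdb is installed; `core_backtrace` is a hypothetical helper name and the file names in the usage comment are placeholders:

```shell
# Print a full backtrace from a core file without an interactive gdb session.
# Arguments: path to the nmon binary built with -g, path to the core file.
core_backtrace() {
    binary=$1
    core=$2
    if [ ! -x "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace <binary> <core-file>" >&2
        return 1
    fi
    # -batch exits after running the -ex commands; "bt full" also prints locals.
    gdb -batch -ex "bt full" "$binary" "$core"
}

# Usage (placeholders):
#   ulimit -c unlimited            # allow core dumps in this shell before running nmon
#   core_backtrace ./nmon ./core.12345 > nmon-backtrace.txt
```

On hosts where systemd-coredump handles core files, the core may not land in the current directory; `coredumpctl list` should show whether one was captured for nmon.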
Odd: nmon did not crash when run under gdb. However, I recompiled with -g (and without -O3), and it still segfaulted when run normally. I then ran gdb with the binary and core dump to generate the following output.
Also: I am not mounting/unmounting while nmon is running.
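To compare an affected host with a healthy one, it may help to record both the current NFSv4 mounts and the kernel's NFS client counters. This is a sketch under one assumption: that the -N option reads the client statistics in /proc/net/rpc/nfs. That file can keep its `proc4` line after the last NFSv4 filesystem is unmounted, which would fit the crash persisting after an unmount. The function names are hypothetical.

```shell
# List NFSv4 filesystems in a mounts table (default: /proc/mounts).
nfs4_mounts() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 == "nfs4" { print $1, $2 }' "$mounts_file"
}

# Print the NFSv4 client counters line from an rpc stats file
# (default: /proc/net/rpc/nfs). Returns 2 if the file is not readable.
nfs4_client_stats() {
    stats_file=${1:-/proc/net/rpc/nfs}
    [ -r "$stats_file" ] || return 2
    grep '^proc4 ' "$stats_file"
}

# Usage on a live host:
#   nfs4_mounts
#   nfs4_client_stats
```

If a crashing host prints a `proc4` line and a healthy host does not, attaching that line to the ticket would narrow down which input triggers the fault.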
Last edit: Robert Jacobson 2023-02-21
Was that gdb output helpful? Is there any other way I can help solve this?