#22 16n: segfault in libc when writing to file on RHEL8 with NFSv4

Milestone: v1.0 (example)
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-03-27
Created: 2023-02-21
Private: No

I have a lab of ~50 RHEL8 VMs which all run nmon via cron to record data to files. On a few of these systems, nmon is now segfaulting. It seems to be related to NFSv4 mounts because 1) on systems with NO NFSv4 mounts, nmon runs fine, and 2) if I remove "-N" from the options it also runs fine. Oddly, even after I unmount the NFSv4 volume, nmon still crashes.

I tried compiling/running from the latest source code, which reports itself as 16n, but it still segfaults on systems which have previously mounted NFSv4 filesystems.

command-line:
/usr/local/bin/nmon -NT -s 30 -c 10 -F /var/log/nmon/$(hostname)$(date "+%F%T").nmon

1 Attachments

Discussion

  • Robert Jacobson

    Robert Jacobson - 2023-02-21

    example segfault message:

    Feb 21 13:55:03 myhostname kernel: nmon[3145175]: segfault at 43 ip 00007fa45c8cdfd5 sp 00007ffe6b9d4b98 error 4 in libc-2.28.so[7fa45c801000+1bc000]
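
The kernel line above can be decoded mechanically: the value after `at` is the faulting address, `ip` is the instruction pointer, and the bracketed pair is the base address and size of the mapped object. Below is a minimal C sketch of that decoding. `struct segv_info`, `parse_segv_line`, and `offset_in_object` are hypothetical names used for illustration only; they are not part of nmon.

```c
#include <stdio.h>
#include <string.h>

/* Fields of a kernel "segfault at ..." log line (hypothetical helper type). */
struct segv_info {
    unsigned long long addr;   /* faulting address */
    unsigned long long ip;     /* instruction pointer at the fault */
    unsigned long long sp;     /* stack pointer at the fault */
    unsigned long long error;  /* page-fault error code (printed in hex) */
    char object[64];           /* mapped object containing ip */
    unsigned long long base;   /* start address of that mapping */
    unsigned long long size;   /* size of that mapping */
};

/* Parse one log line. Returns 0 on success, -1 if the line does not match. */
static int parse_segv_line(const char *line, struct segv_info *out)
{
    const char *p = strstr(line, "segfault at ");
    if (p == NULL)
        return -1;

    int n = sscanf(p,
                   "segfault at %llx ip %llx sp %llx error %llx in %63[^[][%llx+%llx]",
                   &out->addr, &out->ip, &out->sp, &out->error,
                   out->object, &out->base, &out->size);
    return n == 7 ? 0 : -1;
}

/* Offset of the faulting instruction inside the mapped object. */
static unsigned long long offset_in_object(const struct segv_info *si)
{
    return si->ip - si->base;
}
```

For the line above, `offset_in_object` gives 0xccfd5, the offset of the faulting instruction inside libc-2.28.so. On x86, error code 4 (bit 2 set, bits 0 and 1 clear) indicates a user-mode read of a page that is not mapped, which fits a read through the bad address 0x43.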

     
  • Nigel Griffiths

    Nigel Griffiths - 2023-02-21

    Hi Robert,
    Thanks for reporting this.
    Is it possible to send me the nmon file up to the point it crashed?

    Also are you mounting and unmounting NFS v4 file-systems while nmon is running?

    To debug the problem code, we need a stack trace to identify the nmon code calling libc.

    This assumes you have gdb available and the core file, source code and binary in the current directory.
    If compiling yourself, include -g and maybe don't optimize (remove the -O option).

    $ gdb nmon
    GNU gdb version xxxxxxxx
    GDB Information here

    For help, type "help".
    Reading symbols from nmon...done.
    (gdb) run [[YOUR NMON COMMAND LINE HERE]]
    Starting program: /home/nag/nmon xxxxxxx
    . . .
    . . .
    Program received signal SIGSEGV, Segmentation fault.
    main (argc=<optimised out>, argv=0x7fffffffe4d8) at nmon.c:2247
    2247 *crashptr = 42;

    (gdb) where full

    #0 main (argc=<optimised out>, argv=0x7fffffffe4d8) at nmon.c:2247

    . . .
    - lots of details and variables here
    . . .
    (gdb) quit

    Copy all the gdb output lines and send to me.
    Thanks Nigel

     
  • Robert Jacobson

    Robert Jacobson - 2023-02-21

    Odd: nmon did not crash when run via gdb. However, I was able to recompile with -g (and without -O3), and it still segfaulted when run normally. I then ran gdb with the binary and core dump to generate the following output.

    Also: I am not mounting/unmounting while nmon is running.

    # gdb ./nmon_x86_rhel75 coredump
    GNU gdb (GDB) Red Hat Enterprise Linux 8.2-19.el8
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.
    
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./nmon_x86_rhel75...done.
    [New LWP 653659]
    Core was generated by `/home/teridon/nmon/nmon_x86_rhel75 -NT -s 30 -c 10 -F /var/log/nmon/gdb_test.nm'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
    62              VPCMPEQ (%rdi), %ymm0, %ymm1
    (gdb) where full
    #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
    No locals.
    #1  0x00007fae97edec0f in _IO_vfprintf_internal (s=0x18f4070, format=<optimized out>, ap=ap@entry=0x7ffc3e9d38f0) at vfprintf.c:1638
            len = <optimized out>
            string_malloced = 0
            step0_jumps = {0, 3717, 3277, 3173, 4565, 3053, 5181, 4149, 3805, 5061, 4669, 3469, 4861, 4549, 3661, 4789, 3781, 4765, 3381, 2077, 1453, 1237,
              2317, 1741, 1685, 805, 1821, 445, 449, 4957}
            space = 0
            is_short = 0
            use_outdigits = 0
            outc = <optimized out>
            step1_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 5061, 4669, 3469, 4861, 4549, 3661, 4789, 3781, 4765, 3381, 2077, 1453, 1237, 2317, 1741, 1685, 805,
              1821, 445, 449, 0}
            group = 0
            prec = <optimized out>
            step2_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4669, 3469, 4861, 4549, 3661, 4789, 3781, 4765, 3381, 2077, 1453, 1237, 2317, 1741, 1685, 805, 1821,
              445, 449, 0}
            string = 0x100 <error: Cannot access memory at address 0x100>
            left = 0
            is_long_double = <optimized out>
            width = 0
            signed_number = <optimized out>
            step3a_jumps = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3565, 0, 0, 0, 3661, 4789, 3781, 4765, 3381, 0, 0, 0, 0, 1741, 0, 0, 0, 0, 0, 0}
            alt = 0
            showsign = 0
            is_long = <optimized out>
            is_char = <optimized out>
            pad = 32 ' '
            step3b_jumps = {0 <repeats 11 times>, 4861, 0, 0, 3661, 4789, 3781, 4765, 3381, 2077, 1453, 1237, 2317, 1741, 1685, 805, 1821, 0, 0, 0}
            step4_jumps = {0 <repeats 14 times>, 3661, 4789, 3781, 4765, 3381, 2077, 1453, 1237, 2317, 1741, 1685, 805, 1821, 0, 0, 0}
            args_value = <optimized out>
            is_negative = <optimized out>
            number = <optimized out>
            base = <optimized out>
            the_arg = {pa_wchar = 0 L'\000', pa_int = 0, pa_long_int = 0, pa_long_long_int = 0, pa_u_int = 0, pa_u_long_int = 0, pa_u_long_long_int = 0,
              pa_double = 0, pa_long_double = 0, pa_string = 0x0, pa_wstring = 0x0, pa_pointer = 0x0, pa_user = 0x0}
            spec = 115 's'
            _buffer = {__routine = 0x7fae97efb1f0 <__funlockfile>, __arg = 0x18f4070, __canceltype = 26169968, __prev = 0x630aa5b0235da00}
            _avail = <optimized out>
            thousands_sep = 0x0
            grouping = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>
            done = 1
            f = 0x42446e "s"
            lead_str_end = <optimized out>
            end_of_spec = <optimized out>
            work_buffer = " \000\000\000\060\000\000\000\240\065\235>\374\177\000\000\340\064\235>\374\177\000\000\000\332\065\002[\252\060\006\220j\217\001\000\000\000\000\220j\217\001\000\000\000\000\034\064B\000\000\000\000\000{5B\000\000\000\000\000J \215\001", '\000' <repeats 12 times>, "h\r\000\000\000\000\000\000\035\000\000\000\000\000\000\000P\243\360\227\256\177\000\000\000\332\065\002[\252\060\006\034\064B\000\000\000\000\000\220j\217\001", '\000' <repeats 12 times>, "{5B\000\000\000\000\000\034\064B\000\000\000\000\000\001\000\000\000\000\000\000\000\260\345\000\000\000\000\000\000\322\333\177\000\000\252\001\000\000\000\000\000\000\354\"@", '\000' <repeats 13 times>, "\260\065"...
            workstart = 0x0
            workend = <optimized out>
            ap_save = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffc3e9d39d0, reg_save_area = 0x7ffc3e9d3910}}
            nspecs_done = <optimized out>
            save_errno = <optimized out>
            readonly_format = <optimized out>
            __PRETTY_FUNCTION__ = "_IO_vfprintf_internal"
            __result = <optimized out>
    #2  0x00007fae97ee5928 in __fprintf (stream=<optimized out>, format=<optimized out>) at fprintf.c:32
            arg = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x7ffc3e9d39d0, reg_save_area = 0x7ffc3e9d3910}}
            done = <optimized out>
    #3  0x00000000004184b3 in main (argc=8, argv=0x7ffc3e9d42f8) at lmon.c:7097
            secs = 0
            cpu_idle = 235
            cpu_user = 6
            cpu_sys = 8
            cpu_wait = 10
            cpu_steal = 0
            current_procs = 407
    --Type <RET> for more, q to quit, c to continue without paging--c
            adjusted_procs = 535
            n = 0
            i = 60
            j = 0
            k = 48
            ret = 1
            max_sorted = 0
            skipped = 0
            x = 8
            y = 0
            elapsed = 1.3074131011962891
            cpu_sum = 259
            ftmp = 6.9360819630765096e-310
            top_first_time = 1
            disk_first_time = 1
            nfs_first_time = 1
            vm_first_time = 0
            bbbr_line = 0
            cpu_busy = 0
            smp_first_time = 0
            wide_first_time = 1
            proc_first_time = 0
            first_key_pressed = 0
            childpid = 0
            ralfmode = 0
            pgrp = "\020\256\177\000\000@bØ\256\177\000\000\020\256\177\000\000\302lj\276\277\035\t"
            tim = 0x7fae9823b740 <_tmbuf>
            total_busy = 0
            total_rbytes = 0
            total_wbytes = 0
            total_xfers = 0
            uts = {sysname = "Linux", '\000' <repeats 59 times>, nodename = "gs483-cobbler", '\000' <repeats 51 times>, release = "4.18.0-425.3.1.el8.x86_64", '\000' <repeats 39 times>, version = "#1 SMP Fri Sep 30 11:45:06 EDT 2022", '\000' <repeats 29 times>, machine = "x86_64", '\000' <repeats 58 times>, __domainname = "(none)", '\000' <repeats 58 times>}
            top_disk_busy = 0
            top_disk_name = 0x427acd ""
            disk_mb = 0
            disk_total = 6.9360820715891356e-310
            disk_busy = 6.9525589178838084e-310
            disk_read = 0
            disk_read_tmp = 2.1219957904712067e-314
            disk_write = 0
            disk_write_tmp = 0
            disk_size = 0
            disk_xfers = 0
            total_disk_read = 6.9360820716345896e-310
            total_disk_write = 6.9525589107827017e-310
            total_disk_xfers = 6.9525589107819112e-310
            readers = 0
            writers = 6.9360820715891356e-310
            str_p = 0x42a828 "PROC,%s,%.0f,%.0f,%.1f,%.1f,%.1f,%.1f,%.1f,%.1f,%.1f,%.1f\n"
            varperftmp = 0
            formatstring = 0x0
            open_filename = 0x18d5620 ""
            user_filename = 0x18d22a0 "/var/log/nmon/gdb_test.nmon"
            user_filename_set = 1 '\001'
            using_stdout = 0 '\000'
            statfs_buffer = {f_type = 140387864040824, f_bsize = 140387864037936, f_blocks = 140387861752168, f_bfree = 140387861807233, f_bavail = 4, f_files = 140387863975624, f_ffree = 9, f_fsid = {__val = {0, 0}}, f_namelen = 1, f_frsize = 140387864037936, f_flags = 140721358978624, f_spare = {140721358978448, 140721358978464, 140387864040824, 0}}
            fs_size = -5.07476745e-24
            fs_bsize = -4.19411508e-24
            fs_free = 4.58028416e-41
            fs_size_used = -1.65308081e-24
            cmdstr = "NMONCMD63\000\242\230\256\177\000\000\310q\237>\374\177\000\000\240\"Ę\256\177\000\000\324<\235>\374\177\000\000\330<\235>\374\177\000\000\001\000\000\000\000\000\000\000\256\067\242\230\256\177\000\000\306\000\000\000\000\000\000\000`~藮\177\000\000\240\"Ę\256\177\000\000\330<\235>\374\177\000\000\324<\235>\374\177", '\000' <repeats 27 times>, "\240\177\000\000hi\241\230\256\177\000\000_\232\177g\000\000\000\000\360d\241\230\256\177\000\000`~ 藮\177\000\000i\376\235\001\000\000\000\000\240=\235>\374\177\000\000\220=\235>\374\177\000\000\320\256\177\000\000\a", '\000' <repeats 23 times>...
            updays = 0
            uphours = 0
            upmins = 140387861884855
            v2c_total = 4.58028416e-41
            v2s_total = -5.07534174e-24
            v3c_total = 4.58028416e-41
            v3s_total = -5.07570461e-24
            v4c_total = 4.59121429e-41
            v4s_total = 0.307145596
            errors = 0
            nmon_start = 0x0
            nmon_end = 0x0
            nmon_snap = 0x0
            nmon_tmp = 0x0
            nmon_one_in = 1
            time_stamp_type = 0
            ticks = 100
            pagesize = 4096
            average = 4.58028416e-41
            nmon_tv = {tv_sec = 140721358978148, tv_usec = 0}
            nmon_start_time = 0
            nmon_end_time = 0
            nmon_run_time = -1
            seconds_over = 0
            mhz = 4.59121429e-41
            min_mhz = 0
            max_mhz = 5.49330522e-38
            avg_mhz = 0
            topsize = 140721358979584
            topsize_ch = 0 '\000'
            toprset = 140387851160903
            toprset_ch = 0 '\000'
            toptrs = 0
            toptrs_ch = 0 '\000'
            topdrs = 0
            topdrs_ch = 0 '\000'
            toplrs = 140721358979224
            toplrs_ch = 0 '\000'
            topshare = 140721359123200
            topshare_ch = 0 '\000'
            toprio = 140387861804974
            toprio_ch = 0 '\000'
            topwio = 0
            topwio_ch = 0 '\000'
            tmpslab = 136143
            slabstr = 0x423abb "nr_slab_reclaimable"
            truncated_command = "\370\256\177\000\000{(Ǘ\256\177\000\000\060\256\177\000\000\000\000\000\000\000\000\000\000\020;\235>\374\177\000\000\000\004\000\000\000\000\000\000\240\"Ę\256\177\000\000\060\256\177", '\000' <repeats 122 times>, "\274\063\242\230\256\177\000\000\000\000\000\000\000\000\000\000\300\256\177"...
    (gdb)
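
In the backtrace above, frame #1 shows a `%s` conversion (`spec = 115 's'`) whose argument is the invalid pointer `0x100`, reached from the `fprintf` call at lmon.c:7097. That is consistent with a format string and an argument list that disagree, although the trace alone does not prove it. The following is a minimal sketch of two general defensive habits in C. It is not a fix for nmon and is not taken from its source. `PRINTF_LIKE`, `format_checked`, and `str_or_placeholder` are hypothetical names, and the `format` attribute is a GCC/Clang extension.

```c
#include <stdarg.h>
#include <stdio.h>

/* Ask GCC/Clang to check callers' arguments against the format string. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical printf-style wrapper; with the attribute, a call such as
 * format_checked(buf, len, "%s", 256) draws a -Wformat warning. */
static int format_checked(char *buf, size_t len, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int format_checked(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, len, fmt, ap);  /* never writes past len bytes */
    va_end(ap);
    return n;
}

/* Substitute a placeholder so that %s never receives a NULL pointer.
 * This only guards against NULL; it cannot detect a garbage pointer
 * such as 0x100. */
static const char *str_or_placeholder(const char *s)
{
    return s != NULL ? s : "(unknown)";
}
```

A call such as `format_checked(buf, sizeof buf, "NFS,%s,%d", str_or_placeholder(name), 4)` then compiles cleanly only when the arguments match the conversions, which is the kind of mismatch a build with `-Wall` (which enables `-Wformat`) would flag.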
    
     

    Last edit: Robert Jacobson 2023-02-21
  • Robert Jacobson

    Robert Jacobson - 2023-03-27

    Was that gdb output helpful? Is there any other way I can help solve this?

     
