From: John K. <Joh...@be...> - 2020-01-30 21:26:16
Hi,

I am receiving a valgrind internal error upon startup of a multi-threaded application. It appears to be related to the --track-origins option, as I was not seeing the issue without it. I am using the following valgrind options: --trace-children=yes and --track-origins=yes.

If you have any ideas on how I can work around this issue or fix it, I would appreciate it. According to valgrind, the 'impossible' happened. See below for the valgrind output associated with this error.

Thanks,
John Knight
Joh...@be...

Here is the valgrind output showing the internal error:

==24393== Memcheck, a memory error detector
==24393== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24393== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==24393== Command: /bin/touch /var/tmp/tr069paready
==24393==
--24393:0: aspacem <<< SHOW_SEGMENTS: out_of_memory (36 segments)
--24393:0: aspacem 4 segment names in 4 slots
--24393:0: aspacem freelist is empty
--24393:0: aspacem (0,4,5) /lib/valgrind/memcheck-arm-linux
--24393:0: aspacem (1,41,3) /bin/busybox
--24393:0: aspacem (2,58,3) /lib/ld-uClibc-1.0.14.so
--24393:0: aspacem (3,87,1) /tmp/vgdb-pipe-shared-mem-vgdb-24393-by-root-on-???
--24393:0: aspacem 0: RSVN 0000000000-000000ffff 65536 ----- SmFixed
--24393:0: aspacem 1: file 0000010000-00000a5fff 614400 r-x-- d=0x00b i=2042 o=0 (1,41)
--24393:0: aspacem 2: RSVN 00000a6000-00000b5fff 65536 ----- SmFixed
--24393:0: aspacem 3: file 00000b6000-00000b6fff 4096 rw--- d=0x00b i=2042 o=614400 (1,41)
--24393:0: aspacem 4: anon 00000b7000-00000b7fff 4096 rw---
--24393:0: aspacem 5: RSVN 00000b8000-0003ffffff 63m ----- SmFixed
--24393:0: aspacem 6: file 0004000000-0004005fff 24576 r-xT- d=0x00b i=942 o=0 (2,58)
--24393:0: aspacem 7: 0004006000-0004015fff 65536
--24393:0: aspacem 8: file 0004016000-0004017fff 8192 rw--- d=0x00b i=942 o=24576 (2,58)
--24393:0: aspacem 9: anon 0004018000-0004018fff 4096 rwx--
--24393:0: aspacem 10: RSVN 0004019000-0004817fff 8384512 ----- SmLower
--24393:0: aspacem 11: 0004818000-0057ffffff 1335m
--24393:0: aspacem 12: FILE 0058000000-005809afff 634880 r-x-- d=0x00b i=1150 o=0 (0,4)
--24393:0: aspacem 13: file 005809b000-005809cfff 8192 r-x-- d=0x00b i=1150 o=634880 (0,4)
--24393:0: aspacem 14: FILE 005809d000-00581bbfff 1175552 r-x-- d=0x00b i=1150 o=643072 (0,4)
--24393:0: aspacem 15: 00581bc000-00581cafff 61440
--24393:0: aspacem 16: FILE 00581cb000-00581ccfff 8192 rw--- d=0x00b i=1150 o=1814528 (0,4)
--24393:0: aspacem 17: ANON 00581cd000-0058b38fff 9879552 rw---
--24393:0: aspacem 18: 0058b39000-00617a8fff 140m
--24393:0: aspacem 19: RSVN 00617a9000-00617a9fff 4096 ----- SmFixed
--24393:0: aspacem 20: ANON 00617aa000-0067950fff 97m rwx--
--24393:0: aspacem 21: ANON 0067951000-0067952fff 8192 -----
--24393:0: aspacem 22: ANON 0067953000-0067a52fff 1048576 rwx--
--24393:0: aspacem 23: ANON 0067a53000-0067a54fff 8192 -----
--24393:0: aspacem 24: 0067a55000-0067a57fff 12288
--24393:0: aspacem 25: FILE 0067a58000-0067a58fff 4096 rw--- d=0x00e i=69865 o=0 (3,87)
--24393:0: aspacem 26: 0067a59000-00b6f0efff 1268m
--24393:0: aspacem 27: ANON 00b6f0f000-00b6f0ffff 4096 r-x--
--24393:0: aspacem 28: 00b6f10000-00bd752fff 104m
--24393:0: aspacem 29: RSVN 00bd753000-00bdf51fff 8384512 ----- SmUpper
--24393:0: aspacem 30: anon 00bdf52000-00bdf52fff 4096 rw---
--24393:0: aspacem 31: 00bdf53000-00bef31fff 15m
--24393:0: aspacem 32: ANON 00bef32000-00bef52fff 135168 rw---
--24393:0: aspacem 33: RSVN 00bef53000-00fffeffff 1040m ----- SmFixed
--24393:0: aspacem 34: anon 00ffff0000-00ffff0fff 4096 r-x--
--24393:0: aspacem 35: RSVN 00ffff1000-00ffffffff 61440 ----- SmFixed
--24393:0: aspacem >>>
--24393-- core : 8,388,608/ 8,388,608 max/curr mmap'd, 0/0 unsplit/split sb unmmap'd, 2,915,072/ 2,909,656 max/curr, 913/ 2957424 totalloc-blocks/bytes, 908 searches 4 rzB
--24393-- dinfo : 1,048,576/ 1,048,576 max/curr mmap'd, 0/0 unsplit/split sb unmmap'd, 378,840/ 104,992 max/curr, 1020/ 592440 totalloc-blocks/bytes, 1026 searches 4 rzB
--24393-- client : 0/ 0 max/curr mmap'd, 0/0 unsplit/split sb unmmap'd, 0/ 0 max/curr, 0/ 0 totalloc-blocks/bytes, 0 searches 20 rzB
--24393-- demangle: 65,536/ 65,536 max/curr mmap'd, 0/0 unsplit/split sb unmmap'd, 120/ 88 max/curr, 8/ 200 totalloc-blocks/bytes, 7 searches 4 rzB
--24393-- ttaux : 0/ 0 max/curr mmap'd, 0/0 unsplit/split sb unmmap'd, 0/ 0 max/curr, 0/ 0 totalloc-blocks/bytes, 0 searches 4 rzB
--24393-- translate: fast new/die SP updates identified: 0 (0.0%)/0 (0.0%)
--24393-- translate: generic_known new/die SP updates identified: 2 (100.0%)/0 (0.0%)
--24393-- translate: generic_unknown SP updates identified: 0 (0.0%)
--24393-- translate: PX: SPonly 0, UnwRegs 1, AllRegs 0, AllRegsAllInsns 0
--24393-- tt/tc: 2 tt lookups requiring 0 probes
--24393-- tt/tc: 0 fast-cache updates, 1 flushes
--24393-- transtab: new 1 (68 -> 1,204; ratio 17.7) [0 scs] avg tce size 1204
--24393-- transtab: dumped 0 (0 -> ??) (sectors recycled 0)
--24393-- transtab: discarded 0 (0 -> ??)
--24393-- scheduler: 0 event checks.
--24393-- scheduler: 0 indir transfers, 0 misses (1 in 0) ..
--24393-- scheduler: .. of which: 0 hit0, 0 hit1, 0 hit2, 0 hit3, 0 missed
--24393-- scheduler: 0/1 major/minor sched events.
--24393-- sanity: 0 cheap, 0 expensive checks.
--24393-- exectx: 769 lists, 4 contexts (avg 0.01 per list) (avg 1.00 IP per context)
--24393-- exectx: 5 searches, 1 full compares (200 per 1000)
--24393-- exectx: 0 cmp2, 0 cmp4, 0 cmpAll
--24393-- errormgr: 0 supplist searches, 0 comparisons during search
--24393-- errormgr: 0 errlist searches, 0 comparisons during search
--24393-- memcheck: freelist: vol 0 length 0
--24393-- memcheck: sanity checks: 0 cheap, 1 expensive
--24393-- memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use
--24393-- memcheck: auxmaps_L1: 0 searches, 0 cmps, ratio 0:10
--24393-- memcheck: auxmaps_L2: 0 searches, 0 nodes
--24393-- memcheck: SMs: n_issued = 7 (112k, 0M)
--24393-- memcheck: SMs: n_deissued = 0 (0k, 0M)
--24393-- memcheck: SMs: max_noaccess = 65535 (1048560k, 1023M)
--24393-- memcheck: SMs: max_undefined = 0 (0k, 0M)
--24393-- memcheck: SMs: max_defined = 9 (144k, 0M)
--24393-- memcheck: SMs: max_non_DSM = 7 (112k, 0M)
--24393-- memcheck: max sec V bit nodes: 0 (0k, 0M)
--24393-- memcheck: set_sec_vbits8 calls: 0 (new: 0, updates: 0)
--24393-- memcheck: max shadow mem size: 416k, 0M
--24393-- ocacheL1: 171,981 refs 21,120 misses (0 lossage)
--24393-- ocacheL1: 150,861 at 0 0 at 1
--24393-- ocacheL1: 0 at 2+ 21,120 move-fwds
--24393-- ocacheL1: 92,274,688 sizeB 67,108,864 useful
--24393-- ocacheL2: 21,120 refs 21,120 misses
--24393-- ocacheL2: 0 max nodes 0 curr nodes
--24393-- niacache: 0 refs 0 misses

host stacktrace:
==24393== at 0x5803684C: ??? (in /lib/valgrind/memcheck-arm-linux)

sched status:
  running_tid=1

--24393-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--24393-- si_code=1; Faulting address: 0xA0; sp: 0x67a529a0

valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==24393== at 0x58080344: ??? (in /lib/valgrind/memcheck-arm-linux)

sched status:
  running_tid=1

Segmentation fault
/bin/busybox: can't resolve symbol '__libc_freeres'

__________________________________________________________________
Confidential
This e-mail and any files transmitted with it are the property of Belkin International, Inc. and/or its affiliates, are confidential, and are intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipients or otherwise have reason to believe that you have received this e-mail in error, please notify the sender and delete this message immediately from your computer. Any other use, retention, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited.
For the French version: http://www.belkin.com/email-notice/French.html
For the German translation: http://www.belkin.com/email-notice/German.html
__________________________________________________________________
From: John R. <jr...@bi...> - 2020-01-31 00:29:27
> I am receiving a valgrind Internal error upon startup of a multi-threaded application. It appears to be related to the option track_origins as I was not seeing the issue without the track_origins option. I am using the following valgrind options: --trace-children=yes and --track-origins=yes.

Please tell us the following essential information:

0. You claim "a multi-threaded application", yet valgrind says "Command: /bin/touch ...". Explain. [Is it implemented via busybox?]
1. Confirm that the machine architecture is 32-bit ARM. (The string "memcheck-arm-linux" appears in the output.)
2. What is the make and model of the machine? Is it a virtual machine, or real hardware?
3. How much physical RAM? How much swap space? (The string "out_of_memory" appears in the output.)
4. Was valgrind essentially the only process running?
5. How many threads?
6. Run without --trace-children=yes, then copy+paste here the HEAP SUMMARY and LEAK SUMMARY paragraphs.
7. What is the approximate allocation profile? A zillion little blocks, or a few very large blocks, etc.?
8. Which linux distribution and version?
9. Run "readelf --segments the_executable" and copy+paste here the output.
10. When the internal error happens (or shortly before), what does "ps -l" say about the process?
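For question 3, even a minimal BusyBox system can report RAM and swap without top, because the kernel publishes both in /proc/meminfo. The sketch below assumes awk is available (BusyBox usually provides one, but that is an assumption about this particular router); the function name meminfo_summary is hypothetical.

```shell
#!/bin/sh
# Print total/free RAM and swap, in kB, from a /proc/meminfo-format file.
# The kernel reports "SwapTotal: 0 kB" when no swap is configured.
# Hypothetical helper; the optional argument lets it read a saved copy.
meminfo_summary() {
    awk '/^(MemTotal|MemFree|SwapTotal|SwapFree):/ { sub(":", "", $1); print $1, $2 }' \
        "${1:-/proc/meminfo}"
}
```

On the router, running meminfo_summary with no argument reads /proc/meminfo directly and prints four lines such as "SwapTotal 0"; a zero SwapTotal means the box has no swap at all.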
From: John K. <Joh...@be...> - 2020-01-31 02:58:39
Hi John,
My comments inline. See lines with JK.
John
From: John Reiser <jr...@bi...>
Sent: Thursday, January 30, 2020 4:29 PM
To: val...@li...
Subject: Re: [Valgrind-users] Valgrind Internal Error: Valgrind received a signal 11 (SIGSEGV) - exiting
> I am receiving a valgrind Internal error upon startup of a multi-threaded application. It appears to be related to the option track_origins as I was not seeing the issue without the track_origins option. I am using the following valgrind options: --trace-children=yes and --track-origins=yes.
Please tell us the following essential information:
0. You claim "a multi-threaded application" yet valgrind says "Command: /bin/touch ...". Explain. [Is it implemented via busybox?]
JK - The application is a TR-69PA, which runs 4 threads interfacing to DBUS. During startup, it calls system() to run /bin/touch. The vast bulk of the program is written in C. It runs in an embedded Linux environment on a router, and busybox provides many of the standard Linux commands, such as ps and ls.
1. Confirm that the machine architecture is 32-bit ARM. (The string "memcheck-arm-linux" appears in the output.)
JK - Confirmed: this is 32-bit ARM Linux.
We use the toolchain-arm_cortex-a7_gcc-4.8-linaro_uClibc-1.0.14_eabi.tar.xz.
2. What is the make and model of the machine? Is it a virtual machine, or real hardware?
JK - Real hardware.
3. How much physical RAM? How much swap space? (The string "out_of_memory" appears in the output.)
JK - [ 0.000000] Memory: 495376K/507904K available (5078K kernel code, 420K rwdata, 1704K rodata, 208K init, 328K bss, 12528K reserved, 0K highmem)
I am not sure how to check the swap space.
4. Was valgrind essentially the only process running?
JK - Definitely not. The TR069 consists of a whole family of CCSP processes that communicate with each other via DBUS. However, the TR069PA was the only process running under valgrind.
5. How many threads?
JK - 4 threads per CCSP process, including the one running valgrind.
6. Run without --trace-children=yes, then copy+paste here the HEAP SUMMARY and LEAK SUMMARY paragraphs.
JK - I will have to get back to you on this one. Won't be until Monday.
7. What is the approximate allocation profile? A zillion little blocks, or a few very large blocks, etc?
JK - All sizes... some very small, and some very large. Control structures are small ones, while some objects transferred across the DBUS are quite large, containing arrays of objects/parameters. Note that the overall application is a router... it has a lot of other processes running. I would be hard pressed to characterize overall usage of memory.
8. Which linux distribution and version?
JK - This is embedded Linux, kernel version 3.14.77. Here is the info displayed at boot time:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 3.14.77 (jknight@gotrocks) (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r35193) ) #1 SMP PREEMPT Mon Dec 23 12:54:41 PST 2019
[ 0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] Machine model: Qualcomm Technologies, Inc. IPQ40xx/AP-DK07.1-C1
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] PERCPU: Embedded 8 pages/cpu @dfbc7000 s8448 r8192 d16128 u32768
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 125952
[ 0.000000] Kernel command line: init=/sbin/init rootfstype=ubifs ubi.mtd=alt_rootfs root=ubi0:ubifs rootwait rw clk_ignore_unused
[ 0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.000000] Memory: 495376K/507904K available (5078K kernel code, 420K rwdata, 1704K rodata, 208K init, 328K bss, 12528K reserved, 0K highmem)
9. Run "readelf --segments the_executable" and copy+paste here the output.
JK -
jknight@gotrocks:~/projects/nodes_dev_tb_tr69/skyfall_v2_tr181/nfsroot/rootfs/usr/sbin/tr069$ readelf --segments CcspTr069PaSsp
Elf file type is EXEC (Executable file)
Entry point 0x19780
There are 7 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
EXIDX 0x0e0c38 0x000f0c38 0x000f0c38 0x012f8 0x012f8 R 0x4
PHDR 0x000034 0x00010034 0x00010034 0x000e0 0x000e0 R E 0x4
INTERP 0x000114 0x00010114 0x00010114 0x00014 0x00014 R 0x1
[Requesting program interpreter: /lib/ld-uClibc.so.0]
LOAD 0x000000 0x00010000 0x00010000 0xe1f34 0xe1f34 R E 0x10000
LOAD 0x0e2000 0x00102000 0x00102000 0x01368 0x03ef0 RW 0x10000
DYNAMIC 0x0e2008 0x00102008 0x00102008 0x00188 0x00188 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .ARM.exidx
01
02 .interp
03 .interp .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .ARM.extab .ARM.exidx .eh_frame
04 .init_array .fini_array .dynamic .got .data .bss
05 .dynamic
06
10. When the internal error happens (or shortly before), what does "ps -l" say about the process?
JK - This is somewhat difficult to capture, because the error occurs while the system auto-boots, and I am generally not even logged into the console yet. I will try to get this on Monday and get back to you.
I hope this helps. I will get back to you on Monday for the two items I could not answer today.
John
_______________________________________________
Valgrind-users mailing list
Val...@li...
https://lists.sourceforge.net/lists/listinfo/valgrind-users
From: John R. <jr...@bi...> - 2020-01-31 05:56:34
> 3. How much physical RAM? How much swap space? (The string "out_of_memory" appears in the output.)
>
> JK - [ 0.000000] Memory: 495376K/507904K available (5078K kernel code, 420K rwdata, 1704K rodata, 208K init, 328K bss, 12528K reserved, 0K highmem)
>
> Not sure how to get swap space.

Run /usr/bin/top, which gives other useful statistics about resources, too. On a 32-bit ARM Raspberry Pi model 2 with 1GB RAM the output might begin like this for an "idle" machine:

=====
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 924.1 total, 582.1 free, 66.0 used, 276.0 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 830.5 avail Mem
=====

> 4. Was valgrind essentially the only process running?
>
> JK - Definitely not. The TR069 consists of a whole family of CCSP processes that communicate to each other via DBUS. The TR069PA was however the only process running via valgrind
>
> 5. How many threads?
>
> JK - 4 threads per CCSP process, including the one running valgrind.

So there are many CCSP processes, plus other processes, running on a box with 512 megabytes of RAM and no swap space. Remember that a process running valgrind (memcheck) requires about 2 to 3 times the memory of a non-checked process. You have exhausted the available RAM. The "out of memory" string was a clue.

Reduce the number and size of simultaneous CCSP processes. Reduce the number and size of non-CCSP processes. Run valgrind (memcheck) only on the one CCSP process that really interests you. Do not use --trace-children=yes. Instead, make CcspTr069PaSsp into an executable "wrapper" shell script which identifies the correct instance (perhaps by count of invocations), then runs valgrind on a saved copy of the original CcspTr069PaSsp; else just runs the saved copy without valgrind.
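John Reiser's wrapper suggestion could look roughly like the POSIX sh sketch below, which BusyBox ash should be able to run. It is an illustration under stated assumptions, not code from the thread: the saved-copy name CcspTr069PaSsp.real, the counter and log paths, and the variables TARGET, REAL, COUNT_FILE, VALGRIND and VG_LOG are all hypothetical, and the install directory /usr/sbin/tr069 is inferred from the readelf prompt earlier in the thread. The counter file is assumed to be cleared at each boot (for example, if /var/tmp lives on a tmpfs).

```shell
#!/bin/sh
# Hypothetical wrapper installed in place of /usr/sbin/tr069/CcspTr069PaSsp.
# Assumed setup (not from the thread): the original binary was moved to
# CcspTr069PaSsp.real, and COUNT_FILE starts out absent at each boot.
REAL="${REAL:-/usr/sbin/tr069/CcspTr069PaSsp.real}"
COUNT_FILE="${COUNT_FILE:-/var/tmp/CcspTr069PaSsp.count}"
TARGET="${TARGET:-1}"          # which start (1st, 2nd, ...) runs under memcheck
VALGRIND="${VALGRIND:-valgrind}"
VG_LOG="${VG_LOG:-/var/tmp/CcspTr069PaSsp.vg.%p.log}"  # valgrind expands %p to the PID

# Increment the persistent invocation counter and print the new value.
next_invocation() {
    n=0
    if [ -f "$COUNT_FILE" ]; then
        n=$(cat "$COUNT_FILE")
    fi
    n=$((n + 1))
    echo "$n" > "$COUNT_FILE"
    echo "$n"
}

wrapper_main() {
    n=$(next_invocation)
    if [ "$n" -eq "$TARGET" ]; then
        # No --trace-children=yes: children such as the system("/bin/touch ...")
        # call run natively instead of under a second memcheck instance.
        exec "$VALGRIND" --track-origins=yes --log-file="$VG_LOG" "$REAL" "$@"
    fi
    # Every other start runs the saved original without valgrind.
    exec "$REAL" "$@"
}

# Dispatch only when invoked under the daemon's own name, so the functions
# above can also be sourced and exercised on a development host.
case "${0##*/}" in
    CcspTr069PaSsp) wrapper_main "$@" ;;
esac
```

Installing it would mean moving the original binary to the .real name, copying this script into its place, and making it executable. Using exec keeps the wrapper's PID, so whatever started the daemon still tracks the same process. The --log-file option sends valgrind's report to a file, which matters here because the failure happens during boot, before anyone is logged in on the console.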
From: John K. <Joh...@be...> - 2020-02-03 19:51:46
Hi John,

Thanks for the thoughtful analysis and suggestions on this issue. Unfortunately, I cannot run one CCSP app by itself; all of the CCSP apps are necessary and work together. I was running valgrind on only one CCSP app, but when asking valgrind to track origins, I guess it needed a lot more memory. I am hoping to get a different router with more memory, now that I know what the issue is.

Thanks for your help!
John