From: Grant S. <mat...@gm...> - 2022-09-26 20:14:10
|
So I noticed something in my code that looked wrong to me, but valgrind didn't report anything. I made a small example of it, and still no findings. I'm sure this code is reading/writing past its array, but valgrind doesn't say anything. Am I not understanding something, or is this a bug?

Using: valgrind-3.19.0, gcc 4.8.5, CentOS 7
I also tried valgrind-3.19.0, gcc 7.3.1, Amazon Linux 2

Here is the code.
------
#include <string.h>
#include <stdio.h>

int main()
{
    char retStr[32];

    // this is bad right? 40 bytes when above was 32?
    memset(retStr, 'F', 40);

    // These are "writing" past the allocated memory?
    retStr[32] = 'A';
    retStr[33] = 'B';

    // These should be fine
    printf("*********** retStr is %c\n", retStr[30]);
    printf("*********** retStr is %c\n", retStr[31]);

    // These are reading past allocated memory?
    printf("*********** retStr is %c\n", retStr[32]);
    printf("*********** retStr is %c\n", retStr[33]);

    return 0;
}
---
Compiled: "gcc filename.cxx"
Ran via this command "valgrind ./a.out"
|
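For comparison, here is a minimal sketch of a heap-based variant of the example above. It relies on a documented Memcheck limitation: Memcheck tracks the bounds of heap blocks but does not bounds-check arrays that live on the stack or in static storage, which is consistent with the silent run described in the message. The function name `heap_demo` and its `overrun` flag are hypothetical and are not part of the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-based variant of the posted example.
 * With overrun == 0 every access stays inside the 32-byte block.
 * With overrun != 0 the memset writes 8 bytes past the end of the block;
 * that is undefined behaviour and exists only so a checker can flag it. */
int heap_demo(int overrun)
{
    char *retStr = malloc(32);
    if (retStr == NULL)
        return -1;

    /* 40 bytes into a 32-byte block is the deliberate error */
    memset(retStr, 'F', overrun ? 40 : 32);

    int last = retStr[31];   /* last valid element of the block */
    free(retStr);
    return last;
}
```

Calling `heap_demo(1)` from a small `main` and running the result under `valgrind` should make Memcheck report an invalid write past the end of the 32-byte block. The exact wording of the report depends on the valgrind version.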
From: John R. <jr...@bi...> - 2022-09-23 00:11:25
|
> I think valgrind experienced a division by zero.
>
> readdwarf.c:
>
>       if (op_code >= info.li_opcode_base) {
>          op_code -= info.li_opcode_base;
>          Word adv = (op_code / info.li_line_range)   <--- line 831
>                         * info.li_min_insn_length;
>          Int advAddr = adv;
>          state_machine_regs.address += adv;

If you can re-build valgrind, then a quick-and-dirty work-around might be

>          Word adv = (op_code / (info.li_line_range ?: 1))
>                         * info.li_min_insn_length;

where "x ?: y" is a deprecated-but-useful slang for "x ? x : y".

Also, one probable reason for the bug reporting system rejecting your first submission is the many consecutive lines that begin with "00:08.54 GECKO(319869) ". The work-around for this is to put the text of the valgrind complaint into an attachment, and say "See the attachment for the full text of the valgrind complaint."
|
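The work-around quoted above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code in readdwarf.c: the struct `LineInfoSketch` and the function `special_opcode_advance` are hypothetical names, and only the three header fields that appear in the quoted snippet are modelled. The `x ?: y` form is a GNU C extension, so the sketch spells the test out with the portable `x ? x : y`.

```c
/* Hypothetical stand-in for the line-program header fields used in the
 * quoted snippet; the real structure in readdwarf.c has more members. */
typedef struct {
    unsigned char li_opcode_base;
    unsigned char li_line_range;
    unsigned char li_min_insn_length;
} LineInfoSketch;

/* Address advance for a special opcode (op_code >= li_opcode_base).
 * A zero li_line_range is treated as 1, mirroring the quoted work-around,
 * so the division can no longer raise SIGFPE. */
long special_opcode_advance(const LineInfoSketch *info, unsigned op_code)
{
    unsigned range = info->li_line_range ? info->li_line_range : 1u;
    unsigned adjusted = op_code - info->li_opcode_base;
    return (long)(adjusted / range) * info->li_min_insn_length;
}
```

Substituting 1 only avoids the crash. A header with a zero line range is malformed, so a more careful reader might prefer to reject that line table instead of decoding it.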
From: ISHIKAWA,chiaki <ish...@yk...> - 2022-09-22 08:30:35
|
Hi,

I thought SIGFPE is a bit odd. But now I know why.
I think valgrind experienced a division by zero.

readdwarf.c:

      if (op_code >= info.li_opcode_base) {
         op_code -= info.li_opcode_base;
         Word adv = (op_code / info.li_line_range)   <--- line 831
                        * info.li_min_insn_length;
         Int advAddr = adv;
         state_machine_regs.address += adv;

On 2022/09/22 13:39, ISHIKAWA,chiaki wrote:
> Hi,
> I could not post this bug report to the kde bugzilla due to the
> following error message.
>
> ==========
> Your comment has been automatically blocked as it is believed to
> contain spam. Please contact Sysadmin if you believe this to be
> incorrect.
> ==========
>
> Well, it does not, and the bugzilla web does not list sysadmin address.
>
> So here it is.
>
> I can send the full log on request.
> Also, libxul.so when zipped is about 120MB.
> I can send it via webtransfer or something to the interested party.
>
> I forgot to mention, I recreated TB each day after a minor fix, and so
> the change in the binary toolchain is likely the culprit.
>
> Thank you for making the great package available to programming community.
>
> Chiaki
>
>
>
> =======================
> valgrind 3.20GIT crashes while trying to run Thunderbird mail client.
> ------------------------
>
> SUMMARY
>
> I am trying to run thunderbird mail client (TB for short) under valgrind.
> TB is created from the so-called comm-central source tree.
> My local source was synced with the public source tree about a week
> ago.
>
> Well, I could run TB under valgrind on August 15.
> Also, I believe I could run it about 10 days ago.
> However, in the last few days, when I tried to run TB under valgrind,
> valgrind crashed.
> > valgrind said: > 00:08.54 GECKO(319869) --319873-- WARNING: Serious error when reading > debug info > 00:08.54 GECKO(319869) --319873-- When reading debug info from > /NEW-SSD/moz-obj-dir/objdir-tb3/toolkit/library/build/libxul.so: > 00:08.54 GECKO(319869) --319873-- Only DWARF version 2, 3, 4 and 5 > line info is currently supported. > 00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25 > 00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25 > 00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25 > 00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x1b > 00:08.54 GECKO(319869) --319873-- VALGRIND INTERNAL ERROR: Valgrind > received a signal 8 (SIGFPE) - exiting > 00:08.54 GECKO(319869) --319873-- si_code=1; Faulting address: > 0x580C4519; sp: 0x100307c820 > 00:08.54 GECKO(319869) valgrind: the 'impossible' happened: > 00:08.54 GECKO(319869) Killed by fatal signal > 00:08.54 GECKO(319869) host stacktrace: > 00:08.54 GECKO(319869) ==319873== at 0x580C4519: > read_dwarf2_lineblock (readdwarf.c:831) > 00:08.54 GECKO(319869) ==319873== by 0x580C770F: > vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:1380) > 00:08.54 GECKO(319869) ==319873== by 0x58081053: > vgModuleLocal_read_elf_debug_info (readelf.c:3489) > 00:08.54 GECKO(319869) ==319873== by 0x5806F0DB: > di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:969) > 00:08.54 GECKO(319869) ==319873== by 0x5806F0DB: > vgPlain_di_notify_mmap (debuginfo.c:1326) > 00:08.54 GECKO(319869) ==319873== by 0x5809EDBF: > vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2466) > 00:08.54 GECKO(319869) ==319873== by 0x580AA5FF: > vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:413) > 00:08.54 GECKO(319869) ==319873== by 0x5809B275: > vgPlain_client_syscall (syswrap-main.c:2234) > 00:08.54 GECKO(319869) ==319873== by 0x58096D5A: handle_syscall > (scheduler.c:1211) > 00:08.54 GECKO(319869) ==319873== by 0x58098D67: vgPlain_scheduler > (scheduler.c:1529) > 00:08.54 
GECKO(319869) ==319873== by 0x580E170C: thread_wrapper > (syswrap-linux.c:101) > 00:08.54 GECKO(319869) ==319873== by 0x580E170C: > run_a_thread_NORETURN (syswrap-linux.c:154) > 00:08.54 GECKO(319869) sched status: > 00:08.54 GECKO(319869) running_tid=1 > 00:08.54 GECKO(319869) Thread 1: status = VgTs_Runnable syscall 9 > (lwpid 319873) > 00:08.54 GECKO(319869) ==319873== at 0x4020B82: __mmap64 (mmap64.c:59) > 00:08.54 GECKO(319869) ==319873== by 0x4020B82: mmap (mmap64.c:47) > 00:08.54 GECKO(319869) ==319873== by 0x400615B: _dl_map_segments > (dl-map-segments.h:94) > 00:08.54 GECKO(319869) ==319873== by 0x400615B: > _dl_map_object_from_fd (dl-load.c:1250) > 00:08.55 GECKO(319869) ==319873== by 0x40074E6: _dl_map_object > (dl-load.c:2301) > 00:08.55 GECKO(319869) ==319873== by 0x400BB57: > dl_open_worker_begin (dl-open.c:533) > 00:08.55 GECKO(319869) ==319873== by 0x4BFF09F: _dl_catch_exception > (dl-error-skeleton.c:208) > 00:08.55 GECKO(319869) ==319873== by 0x400B325: dl_open_worker > (dl-open.c:777) > 00:08.55 GECKO(319869) ==319873== by 0x4BFF09F: _dl_catch_exception > (dl-error-skeleton.c:208) > 00:08.55 GECKO(319869) ==319873== by 0x400B70A: _dl_open > (dl-open.c:878) > 00:08.55 GECKO(319869) ==319873== by 0x4B32D77: dlopen_doit > (dlopen.c:56) > 00:08.55 GECKO(319869) ==319873== by 0x4BFF09F: _dl_catch_exception > (dl-error-skeleton.c:208) > 00:08.55 GECKO(319869) ==319873== by 0x4BFF15E: _dl_catch_error > (dl-error-skeleton.c:227) > 00:08.55 GECKO(319869) ==319873== by 0x4B32855: _dlerror_run > (dlerror.c:138) > 00:08.55 GECKO(319869) ==319873== by 0x4B32E30: > dlopen_implementation (dlopen.c:71) > 00:08.55 GECKO(319869) ==319873== by 0x4B32E30: dlopen@@GLIBC_2.34 > (dlopen.c:81) > 00:08.55 GECKO(319869) ==319873== by 0x118474: GetLibHandle(char > const*) (nsXPCOMGlue.cpp:89) > 00:08.55 GECKO(319869) ==319873== by 0x1185F5: ReadDependentCB(char > const*, mozilla::LibLoadingStrategy) (nsXPCOMGlue.cpp:144) > 00:08.55 GECKO(319869) ==319873== by 0x119252: 
XPCOMGlueLoad(char > const*, mozilla::LibLoadingStrategy) (nsXPCOMGlue.cpp:323) > 00:08.55 GECKO(319869) ==319873== by 0x1193D4: > mozilla::GetBootstrap(char const*, mozilla::LibLoadingStrategy) > (nsXPCOMGlue.cpp:405) > 00:08.55 GECKO(319869) ==319873== by 0x115D9B: > InitXPCOMGlue(mozilla::LibLoadingStrategy) (nsMailApp.cpp:244) > 00:08.55 GECKO(319869) ==319873== by 0x11682F: main (nsMailApp.cpp:375) > 00:08.55 GECKO(319869) client stack range: [0x1FFEFFA000 0x1FFF000FFF] > client SP: 0x1FFEFFB768 > 00:08.55 GECKO(319869) valgrind stack range: [0x1002F7E000 > 0x100307DFFF] top usage: 18984 of 1048576 > 00:08.55 GECKO(319869) Note: see also the FAQ in the source distribution. > 00:08.55 GECKO(319869) It contains workarounds to several common problems. > 00:08.55 GECKO(319869) In particular, if Valgrind aborted or crashed after > 00:08.55 GECKO(319869) identifying problems in your program, there's a > good chance > 00:08.55 GECKO(319869) that fixing those problems will prevent > Valgrind aborting or > 00:08.55 GECKO(319869) crashing, especially if it happened in > m_mallocfree.c. > 00:08.55 GECKO(319869) If that doesn't help, please report this bug > to: www.valgrind.org > 00:08.55 GECKO(319869) In the bug report, send all the above text, the > valgrind > 00:08.55 GECKO(319869) version, and what OS and version you are > using. Thanks. > > > I suspect something has changed in the TB's binary toolchain in the > last 10 days or so. > > Valgrind version > Valgrind-3.20.0.GIT-90763ca763-20220522X and LibVEX > > OS: > Linux version 5.18.0-4-amd64 (deb...@li...) (gcc-11 > (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) > 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10) > > It seems the binary toolchain for TB seemed to have generated debug > information that > valgrind could not grok (?) > > I am uploading the relvant part of the log from the test run when the > fatal error occurred. 
> > When valgrind failed, TB was created using ordinary -g flag of gcc-10. > > On a hunch, I re-created TB using -gsplit-dwarf flag to gcc.. > Then, with the newly created version of TB, valgrind did not crash > although it did print > "### unhandled dwarf2 ..." warnings. > > So something in the debug information in the libxul.so is not quite > right when ordinary non-split dwarf information is in the object.. > > (I will attching the gzipped libxul if that big file is accepted.) > > > > STEPS TO REPRODUCE > 1. run valgrind to check the memory usage of thunderbird mail client > when it runs a test. > The command line is dumped in the attached log. > 2. Wait for the completion of a test run. > 3. valgrind crashes. > > OBSERVED RESULT > 00:08.54 GECKO(319869) --319873-- VALGRIND INTERNAL ERROR: Valgrind > received a signal 8 (SIGFPE) - exiting > 00:08.54 GECKO(319869) --319873-- si_code=1; Faulting address: > 0x580C4519; sp: 0x100307c820 > 00:08.54 GECKO(319869) valgrind: the 'impossible' happened: > 00:08.54 GECKO(319869) Killed by fatal signal > 00:08.54 GECKO(319869) host stacktrace: > > > EXPECTED RESULT > valgrind ought to finish the execution of TB running its test > successfully. > > SOFTWARE/OS VERSIONS > Debian GNU/Linux. > Linux version 5.18.0-4-amd64 (deb...@li...) (gcc-11 > (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) > 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10) > > Linux/KDE Plasma: > (available in About System) <--- not sure about this. My Debian XFCE4 > desltop does not ahve "About System" anywhere. > I don't believe the GUI middleware has anything to do with the bug > reported here. > KDE Plasma Version: > KDE Frameworks Version: > Qt Version: > > ADDITIONAL INFORMATION > As I noted above, if I recompile TB using -gsplit-dwarf flag to > gcc-10, then valgrind prints warnings about > unknown dwarf2 symbol, but it runs TB running its test to its completion. > I pass "-gdwarf-4 " to gcc. 
> So if something is generating dwarf2 info, it is not gcc, I think. > Maybe rust compiler being used? > rustc --version > rustc 1.63.0 (4b91a6ea7 2022-08-08) > > [end of memo] > |
From: ISHIKAWA,chiaki <ish...@yk...> - 2022-09-22 04:56:39
|
Hi,
I could not post this bug report to the kde bugzilla due to the following error message.

==========
Your comment has been automatically blocked as it is believed to contain spam. Please contact Sysadmin if you believe this to be incorrect.
==========

Well, it does not, and the bugzilla web does not list sysadmin address.

So here it is.

I can send the full log on request.
Also, libxul.so when zipped is about 120MB.
I can send it via webtransfer or something to the interested party.

I forgot to mention, I recreated TB each day after a minor fix, and so the change in the binary toolchain is likely the culprit.

Thank you for making the great package available to programming community.

Chiaki

=======================
valgrind 3.20GIT crashes while trying to run Thunderbird mail client.
------------------------

SUMMARY

I am trying to run thunderbird mail client (TB for short) under valgrind.
TB is created from the so-called comm-central source tree.
My local source was synced with the public source tree about a week ago.

Well, I could run TB under valgrind on August 15.
Also, I believe I could run it about 10 days ago.
However, in the last few days, when I tried to run TB under valgrind, valgrind crashed.

valgrind said:
00:08.54 GECKO(319869) --319873-- WARNING: Serious error when reading debug info
00:08.54 GECKO(319869) --319873-- When reading debug info from /NEW-SSD/moz-obj-dir/objdir-tb3/toolkit/library/build/libxul.so:
00:08.54 GECKO(319869) --319873-- Only DWARF version 2, 3, 4 and 5 line info is currently supported.
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x25
00:08.54 GECKO(319869) ### unhandled dwarf2 abbrev form code 0x1b
00:08.54 GECKO(319869) --319873-- VALGRIND INTERNAL ERROR: Valgrind received a signal 8 (SIGFPE) - exiting
00:08.54 GECKO(319869) --319873-- si_code=1; Faulting address: 0x580C4519; sp: 0x100307c820
00:08.54 GECKO(319869) valgrind: the 'impossible' happened:
00:08.54 GECKO(319869) Killed by fatal signal
00:08.54 GECKO(319869) host stacktrace:
00:08.54 GECKO(319869) ==319873==    at 0x580C4519: read_dwarf2_lineblock (readdwarf.c:831)
00:08.54 GECKO(319869) ==319873==    by 0x580C770F: vgModuleLocal_read_debuginfo_dwarf3 (readdwarf.c:1380)
00:08.54 GECKO(319869) ==319873==    by 0x58081053: vgModuleLocal_read_elf_debug_info (readelf.c:3489)
00:08.54 GECKO(319869) ==319873==    by 0x5806F0DB: di_notify_ACHIEVE_ACCEPT_STATE (debuginfo.c:969)
00:08.54 GECKO(319869) ==319873==    by 0x5806F0DB: vgPlain_di_notify_mmap (debuginfo.c:1326)
00:08.54 GECKO(319869) ==319873==    by 0x5809EDBF: vgModuleLocal_generic_PRE_sys_mmap (syswrap-generic.c:2466)
00:08.54 GECKO(319869) ==319873==    by 0x580AA5FF: vgSysWrap_amd64_linux_sys_mmap_before (syswrap-amd64-linux.c:413)
00:08.54 GECKO(319869) ==319873==    by 0x5809B275: vgPlain_client_syscall (syswrap-main.c:2234)
00:08.54 GECKO(319869) ==319873==    by 0x58096D5A: handle_syscall (scheduler.c:1211)
00:08.54 GECKO(319869) ==319873==    by 0x58098D67: vgPlain_scheduler (scheduler.c:1529)
00:08.54 GECKO(319869) ==319873==    by 0x580E170C: thread_wrapper (syswrap-linux.c:101)
00:08.54 GECKO(319869) ==319873==    by 0x580E170C: run_a_thread_NORETURN (syswrap-linux.c:154)
00:08.54 GECKO(319869) sched status:
00:08.54 GECKO(319869) running_tid=1
00:08.54 GECKO(319869) Thread 1: status = VgTs_Runnable syscall 9 (lwpid 319873)
00:08.54 GECKO(319869) ==319873==    at 0x4020B82: __mmap64 (mmap64.c:59)
00:08.54 GECKO(319869) ==319873==    by 0x4020B82: mmap (mmap64.c:47)
00:08.54 GECKO(319869) ==319873==    by 0x400615B: _dl_map_segments (dl-map-segments.h:94)
00:08.54 GECKO(319869) ==319873==    by 0x400615B: _dl_map_object_from_fd (dl-load.c:1250)
00:08.55 GECKO(319869) ==319873==    by 0x40074E6: _dl_map_object (dl-load.c:2301)
00:08.55 GECKO(319869) ==319873==    by 0x400BB57: dl_open_worker_begin (dl-open.c:533)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x400B325: dl_open_worker (dl-open.c:777)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x400B70A: _dl_open (dl-open.c:878)
00:08.55 GECKO(319869) ==319873==    by 0x4B32D77: dlopen_doit (dlopen.c:56)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF09F: _dl_catch_exception (dl-error-skeleton.c:208)
00:08.55 GECKO(319869) ==319873==    by 0x4BFF15E: _dl_catch_error (dl-error-skeleton.c:227)
00:08.55 GECKO(319869) ==319873==    by 0x4B32855: _dlerror_run (dlerror.c:138)
00:08.55 GECKO(319869) ==319873==    by 0x4B32E30: dlopen_implementation (dlopen.c:71)
00:08.55 GECKO(319869) ==319873==    by 0x4B32E30: dlopen@@GLIBC_2.34 (dlopen.c:81)
00:08.55 GECKO(319869) ==319873==    by 0x118474: GetLibHandle(char const*) (nsXPCOMGlue.cpp:89)
00:08.55 GECKO(319869) ==319873==    by 0x1185F5: ReadDependentCB(char const*, mozilla::LibLoadingStrategy) (nsXPCOMGlue.cpp:144)
00:08.55 GECKO(319869) ==319873==    by 0x119252: XPCOMGlueLoad(char const*, mozilla::LibLoadingStrategy) (nsXPCOMGlue.cpp:323)
00:08.55 GECKO(319869) ==319873==    by 0x1193D4: mozilla::GetBootstrap(char const*, mozilla::LibLoadingStrategy) (nsXPCOMGlue.cpp:405)
00:08.55 GECKO(319869) ==319873==    by 0x115D9B: InitXPCOMGlue(mozilla::LibLoadingStrategy) (nsMailApp.cpp:244)
00:08.55 GECKO(319869) ==319873==    by 0x11682F: main (nsMailApp.cpp:375)
00:08.55 GECKO(319869) client stack range: [0x1FFEFFA000 0x1FFF000FFF] client SP: 0x1FFEFFB768
00:08.55 GECKO(319869) valgrind stack range: [0x1002F7E000 0x100307DFFF] top usage: 18984 of 1048576
00:08.55 GECKO(319869) Note: see also the FAQ in the source distribution.
00:08.55 GECKO(319869) It contains workarounds to several common problems.
00:08.55 GECKO(319869) In particular, if Valgrind aborted or crashed after
00:08.55 GECKO(319869) identifying problems in your program, there's a good chance
00:08.55 GECKO(319869) that fixing those problems will prevent Valgrind aborting or
00:08.55 GECKO(319869) crashing, especially if it happened in m_mallocfree.c.
00:08.55 GECKO(319869) If that doesn't help, please report this bug to: www.valgrind.org
00:08.55 GECKO(319869) In the bug report, send all the above text, the valgrind
00:08.55 GECKO(319869) version, and what OS and version you are using. Thanks.

I suspect something has changed in the TB's binary toolchain in the last 10 days or so.

Valgrind version
Valgrind-3.20.0.GIT-90763ca763-20220522X and LibVEX

OS:
Linux version 5.18.0-4-amd64 (deb...@li...) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)

It seems the binary toolchain for TB has generated debug information that valgrind could not grok (?)

I am uploading the relevant part of the log from the test run when the fatal error occurred.

When valgrind failed, TB was created using the ordinary -g flag of gcc-10.

On a hunch, I re-created TB using the -gsplit-dwarf flag to gcc.
Then, with the newly created version of TB, valgrind did not crash although it did print "### unhandled dwarf2 ..." warnings.

So something in the debug information in libxul.so is not quite right when ordinary non-split dwarf information is in the object.

(I will attach the gzipped libxul if that big file is accepted.)

STEPS TO REPRODUCE
1. Run valgrind to check the memory usage of thunderbird mail client when it runs a test.
   The command line is dumped in the attached log.
2. Wait for the completion of a test run.
3. valgrind crashes.

OBSERVED RESULT
00:08.54 GECKO(319869) --319873-- VALGRIND INTERNAL ERROR: Valgrind received a signal 8 (SIGFPE) - exiting
00:08.54 GECKO(319869) --319873-- si_code=1; Faulting address: 0x580C4519; sp: 0x100307c820
00:08.54 GECKO(319869) valgrind: the 'impossible' happened:
00:08.54 GECKO(319869) Killed by fatal signal
00:08.54 GECKO(319869) host stacktrace:

EXPECTED RESULT
valgrind ought to finish the execution of TB running its test successfully.

SOFTWARE/OS VERSIONS
Debian GNU/Linux.
Linux version 5.18.0-4-amd64 (deb...@li...) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)

Linux/KDE Plasma:
(available in About System) <--- not sure about this. My Debian XFCE4 desktop does not have "About System" anywhere.
I don't believe the GUI middleware has anything to do with the bug reported here.
KDE Plasma Version:
KDE Frameworks Version:
Qt Version:

ADDITIONAL INFORMATION
As I noted above, if I recompile TB using the -gsplit-dwarf flag to gcc-10, then valgrind prints warnings about unknown dwarf2 symbol, but it runs TB running its test to its completion.
I pass "-gdwarf-4 " to gcc.
So if something is generating dwarf2 info, it is not gcc, I think.
Maybe the rust compiler being used?
rustc --version
rustc 1.63.0 (4b91a6ea7 2022-08-08)

[end of memo]
|
From: Paul F. <pj...@wa...> - 2022-09-14 07:45:57
|
Can you try https://github.com/oracle/solaris-userland/tree/master/components/valgrind One day I’ll get round to contacting the Oracle support team that produced this with a view to merging it upstream. A+ Paul |
From: David A. <da...@li...> - 2022-09-12 19:43:02
|
Here is the bug as a url. https://bugs.kde.org/show_bug.cgi?id=459031 DavidA |
From: David A. <da...@li...> - 2022-09-12 19:39:22
|
I just filed a bug on kde.org about the missing words related to --error-exitcode= and suggested a sentence for the on-line html doc to resolve the issue. Bug 459031 on bugs.kde.org DavidA |
From: John R. <jr...@bi...> - 2022-09-12 17:57:08
|
On 9/12/22 10:04, David Anderson wrote:
> The html documentation
> http://www.valgrind.org/docs/manual/index.html
> explains what --error-exitcode=9
> does in case valgrind notes an error.
> [[snip]]
> It says nothing explicitly about what exit code is returned
> if valgrind does not find an error. AFAICT.
>
> In my tests, it appears that when valgrind
> finds no problem valgrind returns the exit code
> of the application under test if --error-exitcode is set non-zero.
> I suggest that the web page should say that clearly.

The best way to handle this is to file a bug report:
https://valgrind.org/support/bug_reports.html
and then post here the URL of the bug report.
A bug report alerts multiple developers, and will not get lost or forgotten.
|
From: David A. <da...@li...> - 2022-09-12 17:17:20
|
The html documentation
http://www.valgrind.org/docs/manual/index.html
explains what --error-exitcode=9
does in case valgrind notes an error.

--error-exitcode=<number>
    Specifies an alternative exit code to return if Valgrind reported any errors in the run. When set to the default value (zero), the return value from Valgrind will always be the return value of the process being simulated. When set to a nonzero value, that value is returned instead, if Valgrind detects any errors. This is useful for using Valgrind as part of an automated test suite, since it makes it easy to detect test cases for which Valgrind has reported errors, just by inspecting return codes.

It says nothing explicitly about what exit code is returned if valgrind does not find an error. AFAICT.

In my tests, it appears that when valgrind finds no problem valgrind returns the exit code of the application under test if --error-exitcode is set non-zero.
I suggest that the web page should say that clearly.

DavidA
|
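The rule described in the quoted documentation, combined with the behaviour observed in the tests above, can be written down as a small model. This is a sketch of the rule as stated in this message, not Valgrind source code; the function name `reported_exit_status` and its parameters are hypothetical.

```c
/* Hypothetical model of the exit status seen by the caller of valgrind.
 *   error_exitcode : value given to --error-exitcode (0 is the default)
 *   errors_found   : number of errors Valgrind reported in the run
 *   client_status  : exit status of the program under test
 */
int reported_exit_status(int error_exitcode, int errors_found, int client_status)
{
    /* Only a nonzero --error-exitcode together with at least one
     * reported error overrides the client's own exit status. */
    if (error_exitcode != 0 && errors_found > 0)
        return error_exitcode;

    /* Default setting, or no errors: the client's status passes through. */
    return client_status;
}
```

Under this model a clean run with `--error-exitcode=9` of a program that exits with status 3 yields 3, which is the case the message asks the manual to spell out.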
From: John R. <jr...@bi...> - 2022-09-12 15:58:53
|
> OK, but why would that break core files only with valgrind? Because when run directly, the core files work perfectly fine.

[Rhetorical] Why are there bugs?

[Practical] The operating system itself is the writer of ordinary core files, which contain process state: register values, copies of Writable pages, partial information from Read-only pages, etc. Valgrind is an in-process emulator. As far as the OS is concerned, the process is valgrind, not postgresql. The register values are those of the valgrind emulator internal code, not of the target program that valgrind is emulating. In order for the core file to look like it was generated for postgresql, valgrind must write the core file.

The spec for the layout of a core file (the C-language 'struct' that corresponds to the sequence of bytes in the file) is rife with opportunities for bugs. First, the spec is hard to find, or may refer to other documents that are hard to access. (What _exactly_ is the entire programmer-visible register state?) Then the spec is not executable (directly compilable). Often the spec or the C-language 'struct' is not updated in a timely manner when the hardware or the OS changes. In practice it is very easy for there to be a discrepancy involving the presence, order, width, or alignment of various fields, especially for condition codes, processor modes (32 or 64 bit?), and optional register files or accelerators (floating point, SIMD, vector units, etc.)

> ... attached is a simple .c file, with a trivial example (3 functions) and segfault (or abort). ...
> The core file produced without valgrind is perfectly fine:
>
> $ gdb ./a.out core
> ...
> Core was generated by `./a.out'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000005594350734 in f3 () at valgrind-core-test.c:6
> 6           *ptr = 'a';
> (gdb) bt
> #0 0x0000005594350734 in f3 () at valgrind-core-test.c:6
> #1 0x0000005594350750 in f2 () at valgrind-core-test.c:13
> #2 0x0000005594350768 in f1 () at valgrind-core-test.c:18
> #3 0x0000005594350780 in main () at valgrind-core-test.c:23
>
> but when run under valgrind it looks like this:
>
> $ gdb ./a.out vgcore.1395835
> ...
> Core was generated by `'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000000000108734 in ?? ()
> (gdb) bt
> #0 0x0000000000108734 in ?? ()
> #1 0x0000000000108780 in ?? ()
> #2 0x0000000000108644 in ?? ()
>
> However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1) it works just fine and I get the same backtrace.
>
> So perhaps this is specific to (either) gcc 10.2, or aarch64 platform.

Bingo! You now have the raw material for a very good bug report: "valgrind-generated core files lose debugging info on aarch64". Please file a bug report; see
https://valgrind.org/support/bug_reports.html

(Also note that several values for program counter in the two tracebacks agree in the lowest 12 bits (3 hex digits). So this may be some confusion about the placement ("relocation") of [groups of] whole pages in the address space.)
|
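The closing observation about the lowest 12 bits can be checked mechanically. The sketch below assumes 4 KiB pages, so the page offset is the low 12 bits of an address; the helper names `page_offset` and `same_page_offset` are hypothetical and are used only for this illustration.

```c
#include <stdint.h>

/* Low 12 bits of an address: the offset inside a 4 KiB page (assumed size). */
#define PAGE_OFFSET_MASK 0xfffu

unsigned page_offset(uint64_t addr)
{
    return (unsigned)(addr & PAGE_OFFSET_MASK);
}

/* Nonzero when two addresses sit at the same offset within their pages,
 * which is what one would expect if only whole pages were moved. */
int same_page_offset(uint64_t a, uint64_t b)
{
    return page_offset(a) == page_offset(b);
}
```

With the addresses from the two tracebacks, `0x5594350734` and `0x108734` share the offset `0x734`, and `0x5594350780` and `0x108780` share `0x780`. The remaining frame in the valgrind core, `0x108644`, has no counterpart with the same offset in the native traceback.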
From: Tomas V. <tv...@fu...> - 2022-09-12 10:08:32
|
On 9/9/22 18:20, John Reiser wrote:
> [[Aggressive snipping, but relevant details preserved.]]
>
>> No threading is used. Postgres is multi-process, and uses shared
>> memory for the shared cache (through shm_open etc.).
>
> Multi-process plus shm_open() IS THREADING! Not pthreads, but multiple
> execution contexts that read and write the same memory, which is
> subject to the same types of synchronization errors as pthreads.
> Perhaps --tool=drd and --tool=helgrind can help.
>

OK, but why would that break core files only with valgrind? Because when run directly, the core files work perfectly fine.

>
> [[Another topic]]
>> Sure, but that's more of a workaround - it does not make the core file
>> useful, it provides alternative way to get to the same result. Plus it
>> requires additional tooling/scripting, and I'd prefer keeping the
>> tooling as simple as possible.
>
> I made a specific suggestion that takes less than one hour: build a
> small test case that performs a short chain of subroutine calls, with
> the last routine generating a deliberate SIGABRT. Run the test case
> under valgrind, get a core file from valgrind, and see if gdb gives the
> correct traceback from that core file. The objective is to provide a
> strong clue about whether *every* core file generated by valgrind
> (in your environment) fails to work well with gdb. Perhaps solving the
> problem that involves your larger and more-complex case can be subsumed
> by analyzing something that is much simpler.
>
> Please perform that experiment and report the results here.
>

I did this experiment - attached is a simple .c file, with a trivial example (3 functions) and segfault (or abort). When built like this:

    $ gcc valgrind-core-test.c -O0 -g

then the core file produced without valgrind is perfectly fine:

    $ gdb ./a.out core
    ...
    Core was generated by `./a.out'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 0x0000005594350734 in f3 () at valgrind-core-test.c:6
    6           *ptr = 'a';
    (gdb) bt
    #0 0x0000005594350734 in f3 () at valgrind-core-test.c:6
    #1 0x0000005594350750 in f2 () at valgrind-core-test.c:13
    #2 0x0000005594350768 in f1 () at valgrind-core-test.c:18
    #3 0x0000005594350780 in main () at valgrind-core-test.c:23

but when run under valgrind it looks like this:

    $ gdb ./a.out vgcore.1395835
    ...
    Core was generated by `'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 0x0000000000108734 in ?? ()
    (gdb) bt
    #0 0x0000000000108734 in ?? ()
    #1 0x0000000000108780 in ?? ()
    #2 0x0000000000108644 in ?? ()

However, when I do this on x86 (Fedora 34, gcc 11.3.1, valgrind 3.18.1) it works just fine and I get the same backtrace.

So perhaps this is specific to (either) gcc 10.2, or aarch64 platform.

regards
Tomas
|
From: John R. <jr...@bi...> - 2022-09-09 16:20:24
|
[[Aggressive snipping, but relevant details preserved.]] > No threading is used. Postgres is multi-process, and uses shared memory for the shared cache (through shm_open etc.). Multi-process plus shm_open() IS THREADING! Not pthreads, but multiple execution contexts that read and write the same memory, which is subject to the same types of synchronization errors as pthreads. Perhaps --tool=drd and --tool=helgrind can help. [[Another topic]] > Sure, but that's more of a workaround - it does not make the core file useful, it provides alternative way to get to the same result. Plus it requires additional tooling/scripting, and I'd prefer keeping the tooling as simple as possible. I made a specific suggestion that takes less than one hour: build a small test case that performs a short chain of subroutine calls, with the last routine generating a deliberate SIGABRT. Run the test case under valgrind, get a core file from valgrind, and see if gdb gives the correct traceback from that core file. The objective is to provide a strong clue about whether *every* core file generated by valgrind (in your environment) fails to work well with gdb. Perhaps solving the problem that involves your larger and more-complex case can be subsumed by analyzing something that is much simpler. Please perform that experiment and report the results here. |
From: Tomas V. <tv...@fu...> - 2022-09-09 14:26:28
|
On 9/9/22 04:58, John Reiser wrote: >>> 1. Describe the environment completely. > > Also: Any kind of threading (pthreads, or shm_open, or > mmap(,,,MAP_SHARED,,)) > must be mentioned explicitly. Multiple execution contexts which access > the same address space instance are a significant complicating factor. > > If threading is involved, then try using "valgrind --tool=drd ..." > or --tool=helgrind, because those tools specifically target detecting > race conditions and other synchronization errors, much like --tool=memcheck > [the default tool when no --tool= is mentioned] targets errors involving > malloc() and free(), uninitialized variables, etc. > No threading is used. Postgres is multi-process, and uses shared memory for the shared cache (through shm_open etc.). FWIW, as I mentioned before, this works perfectly fine when the core is not generated by valgrind. >>> 4. Walk before attempting to run. >>> Did you try a simple example? Write a half-page program with 5 >>> subroutines, >>> each of which calls the next one, and the last one sends SIGABRT to >>> the process. > >>> Does the .core file when run under valgrind give the correct >>> traceback using gdb? > > Specifically: apply valgrind to the small program which causes a > deliberate SIGABRT, > and get a core file. Does gdb give the correct traceback for that core > file? > If not, then you have an ideal test case for filing a bug report against > valgrind > because even the simple core file is bad. If gdb does give a correct > traceback > for the simple core file, then you have to keep looking for the source > of the > problem on your larger program. > I'll try this once I have access to the machine early next week. > >>> 5. (Learn and) Use the built-in tools where possible. >>> Run the process interactively, invoking valgrind with "--vgdb-error=0", >>> and giving the debugger command "(gdb) continue" after establishing >>> connectivity between vgdb and the process. 
>>> See the valgrind manual, section 3.2.9 "vgdb command line options". >>> When the SIGABRT happens, then vgdb will allow you to use all the >>> ordinary >>> gdb commands to get a backtrace, go up and down the stack, examine >>> variables and other memory, run >>> (gdb) info proc >>> (gdb) shell cat /proc/$PID/maps >>> to see exactly the layout of process memory, etc. >>> There are also special commands to access valgrind functionality >>> interactively, such as checking for memory leaks. >>> >> >> I already explained why I don't want / can't use the interactive gdb. >> I'm aware of the option, I've used it before, but in this case it's >> not very practical. > > The gdb process does not *have* to be run interactively, it just takes > more work > and patience to run non-interactively. Run "valgrind --vgdb-error=0 ..." > and notice the last part of the printed instructions: > > and then give GDB the following command > ==215935== target remote | > /path/to/libexec/valgrind/../../bin/vgdb --pid=215935 > ==215935== --pid is optional if only one valgrind process is running > > So if there is only one valgrind process, then you do not need to know > the pid. > Thus you can run gdb with re-directed stdin/stdout/stderr, or perhaps > use the -x > command-line option. This allows a static, pre-scripted list of gdb > commands; > it may require a few iterations to get a good debug script. (Try the > commands > using the trivial SIGABRT case!) Also get the full gdb manual (more > than 800 pages) > and look at the "thread apply all ..." and "frame apply all ..." commands. > Sure, but that's more of a workaround - it does not make the core file useful, it provides alternative way to get to the same result. Plus it requires additional tooling/scripting, and I'd prefer keeping the tooling as simple as possible. Postgres is a multi-process system, that runs a bunch of management processes, and client processes (1:1 to connections). 
We don't know in which one an issue might happen, so we'd have to attach a script to each of them. Furthermore, there's the question of performance - we run these tests on many machines (although only some of them run them under valgrind), the valgrind makes it fairly slow already - if this vgdb thing makes even slower, that'd be an issue. But I haven't measured it, so maybe it's not as bad as I'm afraid. > It may be possible to perform some interactive "reconnaisance" to suggest > good things for the script to try. Using --vgdb-error=0, put a breakpoint > on a likely location for the error (or shortly before the error), > and look around. In the logged traceback: > > TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: > "reorderbuffer.c", Line: 902, PID: 536049) > (ExceptionalCondition+0x98)[0x8f5cec] > (+0x57a574)[0x682574] > (+0x579edc)[0x681edc] > (ReorderBufferAddNewTupleCids+0x60)[0x6864dc] > (SnapBuildProcessNewCid+0x94)[0x68b6a4] > > any of those named locations, or shortly before them, might be a good spot. > When execution stops at any one of the breakpoints, then look around > and see if you can find clues about "prev_first_lsn < cur_txn->first_lsn" > even though the error has not yet occurred. Perhaps this will help > identify location(s) that might be closer to the actual error > when it does happen. This might suggest commands for the non-interactive > gdb debugging script. > This does not work, I'm afraid. The issue is a (rare) race condition, and we run the assert thousands of times and it's fine 99.999% of the time. The breakpoint & interactive reconnaissance is unlikely to find anything 99% of the time, and it can easily make the race condition go away by changing the timing. That's kinda the interesting thing - this is not an issue valgrind is meant to discover, it's just that it seems to change the timing just enough to increase the probability. regards Tomas |
From: John R. <jr...@bi...> - 2022-09-09 02:59:02
|
>> 1. Describe the environment completely. Also: Any kind of threading (pthreads, or shm_open, or mmap(,,,MAP_SHARED,,)) must be mentioned explicitly. Multiple execution contexts which access the same address space instance are a significant complicating factor. If threading is involved, then try using "valgrind --tool=drd ..." or --tool=helgrind, because those tools specifically target detecting race conditions and other synchronization errors, much like --tool=memcheck [the default tool when no --tool= is mentioned] targets errors involving malloc() and free(), uninitialized variables, etc. >> 4. Walk before attempting to run. >> Did you try a simple example? Write a half-page program with 5 subroutines, >> each of which calls the next one, and the last one sends SIGABRT to the process. >> Does the .core file when run under valgrind give the correct traceback using gdb? Specifically: apply valgrind to the small program which causes a deliberate SIGABRT, and get a core file. Does gdb give the correct traceback for that core file? If not, then you have an ideal test case for filing a bug report against valgrind because even the simple core file is bad. If gdb does give a correct traceback for the simple core file, then you have to keep looking for the source of the problem on your larger program. >> 5. (Learn and) Use the built-in tools where possible. >> Run the process interactively, invoking valgrind with "--vgdb-error=0", >> and giving the debugger command "(gdb) continue" after establishing >> connectivity between vgdb and the process. >> See the valgrind manual, section 3.2.9 "vgdb command line options". >> When the SIGABRT happens, then vgdb will allow you to use all the ordinary >> gdb commands to get a backtrace, go up and down the stack, examine >> variables and other memory, run >> (gdb) info proc >> (gdb) shell cat /proc/$PID/maps >> to see exactly the layout of process memory, etc. 
>> There are also special commands to access valgrind functionality >> interactively, such as checking for memory leaks. >> > > I already explained why I don't want / can't use the interactive gdb. I'm aware of the option, I've used it before, but in this case it's not very practical. The gdb process does not *have* to be run interactively, it just takes more work and patience to run non-interactively. Run "valgrind --vgdb-error=0 ..." and notice the last part of the printed instructions: and then give GDB the following command ==215935== target remote | /path/to/libexec/valgrind/../../bin/vgdb --pid=215935 ==215935== --pid is optional if only one valgrind process is running So if there is only one valgrind process, then you do not need to know the pid. Thus you can run gdb with re-directed stdin/stdout/stderr, or perhaps use the -x command-line option. This allows a static, pre-scripted list of gdb commands; it may require a few iterations to get a good debug script. (Try the commands using the trivial SIGABRT case!) Also get the full gdb manual (more than 800 pages) and look at the "thread apply all ..." and "frame apply all ..." commands. It may be possible to perform some interactive "reconnaisance" to suggest good things for the script to try. Using --vgdb-error=0, put a breakpoint on a likely location for the error (or shortly before the error), and look around. In the logged traceback: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 902, PID: 536049) (ExceptionalCondition+0x98)[0x8f5cec] (+0x57a574)[0x682574] (+0x579edc)[0x681edc] (ReorderBufferAddNewTupleCids+0x60)[0x6864dc] (SnapBuildProcessNewCid+0x94)[0x68b6a4] any of those named locations, or shortly before them, might be a good spot. When execution stops at any one of the breakpoints, then look around and see if you can find clues about "prev_first_lsn < cur_txn->first_lsn" even though the error has not yet occurred. 
Perhaps this will help identify location(s) that might be closer to the actual error when it does happen. This might suggest commands for the non-interactive gdb debugging script. |
From: Paul F. <pj...@wa...> - 2022-09-08 21:25:20
|
On 8 Sept 2022 15:27, Shane Bishop <sha...@ou...> wrote:
> Hi,
>
> I am trying to compile Valgrind 3.19.0 on Solaris
>
> Is there an earlier release of Valgrind that is known to successfully
> compile on Solaris 11 that I could try building instead?

If you have Oracle support I believe that they have a version available. Otherwise I think that there is an Oracle GitHub repo with either a fork or patches.

I can probably provide more info next week.

A+
Paul |
From: Tomas V. <tv...@fu...> - 2022-09-08 17:33:42
|
On 9/4/22 04:16, John Reiser wrote: >> Any ideas what I might be doing wrong? Or how do I load the core file? > > Why does use of valgrind cause programmers to forget general debugging > technique? > > 1. Describe the environment completely. > The report does not say which compilers and compiler versions were used, > or if the compiler commands contained any directives about debugging > format. > Such information is necessary to help understand what might be happening > with regard to debugging and tracebacks. > Yes, I should have included this information - I don't have access to the machine at the moment, but I'll share the detailed info early next week. However, it's running current RasberryPI OS 64-bit version, which is based on Debian 11. So it should have the same version of gcc etc. > 2. Get debugging information whenever invoking a compiler. > Traceback lines such as "(+0x57a574)[0x682574]" which lack the name > of a symbol or file, suggest that "-g" debugging info was not requested > for *all* compilations. Start over ("make clean; rm -rf '*.[oa]'") > then re-compile every source file, making be sure to specify "-g" > and no variant of "-O" or "-On", except possibly "-O0". > This is a bit puzzling. I'm always running valgrind tests with "-O0" and possibly with -fno-omit-frame-pointer, as that gives me the most reliable results etc. "-g" should be enabled too (thanks to the postgres specific --enable-debug configure switch). > 3. Optimizing for speed comes after achieving correct execution. > If 'inline' is used anywhere, then re-compile with the compile-time > argument > "-Dinline=/*empty*/" in order to #define 'inline' as a one-word comment. > If the behavior of the program changes (any difference at all, excepting > only slower execution), then there is a *design error* in the source code. > Fix that first. > If I was optimizing for speed, I wouldn't be running with "-O0". 
I'm not sure what's causing the missing symbols, but it certainly is not inline functions - we do have a couple of those, but definitely not this high in the stack. The other thing is that when loading the core file into gdb, the backtrace is entirely different (and bogus) from what was written into the server log (which comes from "backtrace()" - maybe the missing symbol names are due to some limitation in this). > 4. Walk before attempting to run. > Did you try a simple example? Write a half-page program with 5 > subroutines, > each of which calls the next one, and the last one sends SIGABRT to the > process. I've inspected *thousands* of core files in the last couple years, both as part of development and supporting all kinds of systems. And most of the time it either works just fine or it's clear why it's not working. Except when running under valgrind, in which case I have no idea why it doesn't work (with the same compile options and all that). > Does the .core file when run under valgrind give the correct traceback > using gdb? > I'm not sure I understand the questions. In my initial post I showed two backtraces - one I extracted from the .core file using gdb, and another one that the application itself (postgres) writes into the server log (after using backtrace() etc.). The logged backtrace has a couple missing symbols, but seems reasonable otherwise. The backtrace extracted from the .core file is clearly bogus. > 5. (Learn and) Use the built-in tools where possible. > Run the process interactively, invoking valgrind with "--vgdb-error=0", > and giving the debugger command "(gdb) continue" after establishing > connectivity between vgdb and the process. > See the valgrind manual, section 3.2.9 "vgdb command line options". 
> When the SIGABRT happens, then vgdb will allow you to use all the ordinary > gdb commands to get a backtrace, go up and down the stack, examine > variables and other memory, run > (gdb) info proc > (gdb) shell cat /proc/$PID/maps > to see exactly the layout of process memory, etc. > There are also special commands to access valgrind functionality > interactively, such as checking for memory leaks. > I already explained why I don't want / can't use the interactive gdb. I'm aware of the option, I've used it before, but in this case it's not very practical. regards Tomas |
From: Shane B. <sha...@ou...> - 2022-09-08 15:01:53
|
Hi,

I am trying to compile Valgrind 3.19.0 on Solaris 11. The compilation fails.

Here are the commands I have run:

  wget --no-check-certificate https://sourceware.org/pub/valgrind/valgrind-3.19.0.tar.bz2
  tar xjf valgrind-3.19.0.tar.bz2
  cd valgrind-3.19.0
  MAKE=gmake ./configure --enable-only64bit
  gmake

Once I run "gmake" then the build runs for a while, and then it fails:

  gcc -m64 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-signedness -Wformat-security -Wignored-qualifiers -Wmissing-parameter-type -Wlogical-op -Wimplicit-fallthrough=2 -Wold-style-declaration -finline-functions -fno-stack-protector -fno-strict-aliasing -fno-builtin -O -g -fno-omit-frame-pointer -fno-strict-aliasing -fpic -fno-builtin -fno-ipa-icf -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst -Wl,-M,../solaris/vgpreload-solaris.mapfile -m64 -Wl,-soname -Wl,vgpreload_core.so.0 -o vgpreload_core-amd64-solaris.so vgpreload_core_amd64_solaris_so-vg_preloaded.o
  Undefined first referenced
   symbol in file
  __xpg4 ../solaris/vgpreload-solaris.mapfile (symbol scope specifies local binding)
  __xpg6 ../solaris/vgpreload-solaris.mapfile (symbol scope specifies local binding)
  ld: fatal: symbol referencing errors
  collect2: error: ld returned 1 exit status
  gmake[3]: *** [Makefile:3469: vgpreload_core-amd64-solaris.so] Error 1
  gmake[3]: Leaving directory '/root/Downloads/valgrind-3.19.0/coregrind'
  gmake[2]: *** [Makefile:2475: all] Error 2
  gmake[2]: Leaving directory '/root/Downloads/valgrind-3.19.0/coregrind'
  gmake[1]: *** [Makefile:896: all-recursive] Error 1
  gmake[1]: Leaving directory '/root/Downloads/valgrind-3.19.0'
  gmake: *** [Makefile:759: all] Error 2

I have gcc version 7.3.0 and gmake version 4.2.1.

Please let me know any other details I should share to help identify why my build is failing.
Is it possible I am missing some headers or libs I need to install to build successfully? Is there an earlier release of Valgrind that is known to successfully compile on Solaris 11 that I could try building instead? Thanks, Shane P.S. I apologize if my email client inserts any HTML formatting, I don't know how to prevent this. |
From: Tomas V. <tv...@fu...> - 2022-09-04 19:17:41
|
On 9/4/22 13:18, Philippe Waroquiers wrote: > On Sun, 2022-09-04 at 00:14 +0200, Tomas Vondra wrote: >> >> Clearly, this is not an issue valgrind is meant to detect (like invalid >> memory access, etc.) but an application issue. I've tried reproducing it >> without valgrind, but it only ever happens with valgrind - my theory is >> it's some sort of race condition, and valgrind changes the timing in a >> way that makes it much more likely to hit. I need to analyze the core to >> inspect the state more closely, etc. >> >> Any ideas what I might be doing wrong? Or how do I load the core file? > > Rather than have the core dump and analyse it, you might interactively debug > your program under valgrind. > E.g. you might put a breakpoint on the assert or at some interesting points > before the assert. > > See https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver > for more info. > I know, and I've used vgdbserver before. But sometimes that's not very practical for a number of reasons: 1) Our tests run mostly unattended, possibly even on CI machines that we don't have access to. And we don't want the machine to just sit there and wait for someone to debug it interactively, it's better to report the failure. Being able to inspect the core later would be helpful, though. 2) The error may be quite rare and/or hard to trigger - we regularly see race conditions that happen 1 in 1000 runs. True, I could automate that using a gdb script. 3) I'd bet it's not so simple in multi-process system that forks various processes that can trigger the issue. I'd have to do attach a gdb to each of those. 4) It's already pretty slow under valgrind, I'd bet it'll be even worse with gdb, but maybe it's not that bad. rpi4 is very constrained, though. 5) Race conditions are often very sensitive to change in timing. For example I've never seen this particular issue without valgrind. 
I can easily imagine gdb changing the timing just enough for the race condition to not happen. regards Tomas |
From: Philippe W. <phi...@sk...> - 2022-09-04 11:18:16
|
On Sun, 2022-09-04 at 00:14 +0200, Tomas Vondra wrote: > > Clearly, this is not an issue valgrind is meant to detect (like invalid > memory access, etc.) but an application issue. I've tried reproducing it > without valgrind, but it only ever happens with valgrind - my theory is > it's some sort of race condition, and valgrind changes the timing in a > way that makes it much more likely to hit. I need to analyze the core to > inspect the state more closely, etc. > > Any ideas what I might be doing wrong? Or how do I load the core file? Rather than have the core dump and analyse it, you might interactively debug your program under valgrind. E.g. you might put a breakpoint on the assert or at some interesting points before the assert. See https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver for more info. Philippe |
From: John R. <jr...@bi...> - 2022-09-04 02:17:10
|
> Any ideas what I might be doing wrong? Or how do I load the core file?

Why does use of valgrind cause programmers to forget general debugging technique?

1. Describe the environment completely.
   The report does not say which compilers and compiler versions were used, or if the compiler commands contained any directives about debugging format. Such information is necessary to help understand what might be happening with regard to debugging and tracebacks.

2. Get debugging information whenever invoking a compiler.
   Traceback lines such as "(+0x57a574)[0x682574]" which lack the name of a symbol or file, suggest that "-g" debugging info was not requested for *all* compilations. Start over ("make clean; rm -rf '*.[oa]'") then re-compile every source file, making sure to specify "-g" and no variant of "-O" or "-On", except possibly "-O0".

3. Optimizing for speed comes after achieving correct execution.
   If 'inline' is used anywhere, then re-compile with the compile-time argument "-Dinline=/*empty*/" in order to #define 'inline' as a one-word comment. If the behavior of the program changes (any difference at all, excepting only slower execution), then there is a *design error* in the source code. Fix that first.

4. Walk before attempting to run.
   Did you try a simple example? Write a half-page program with 5 subroutines, each of which calls the next one, and the last one sends SIGABRT to the process. Does the .core file when run under valgrind give the correct traceback using gdb?

5. (Learn and) Use the built-in tools where possible.
   Run the process interactively, invoking valgrind with "--vgdb-error=0", and giving the debugger command "(gdb) continue" after establishing connectivity between vgdb and the process. See the valgrind manual, section 3.2.9 "vgdb command line options". When the SIGABRT happens, then vgdb will allow you to use all the ordinary gdb commands to get a backtrace, go up and down the stack, examine variables and other memory, run
     (gdb) info proc
     (gdb) shell cat /proc/$PID/maps
   to see exactly the layout of process memory, etc. There are also special commands to access valgrind functionality interactively, such as checking for memory leaks. |
From: Tomas V. <tv...@fu...> - 2022-09-03 22:38:50
|
Hi,

I'm having some issues with analyzing cores generated from valgrind. I do get the core file, but when I try opening it in gdb it just shows some entirely bogus information / backtrace etc.

This is a rpi4 machine, with 64-bit debian, running a local build of valgrind 3.19.0 (built from sources, not a package).

This is how I run the program (postgres binary)

  valgrind --quiet --trace-children=yes --track-origins=yes \
    --read-var-info=yes --num-callers=20 --leak-check=no \
    --gen-suppressions=all --error-limit=no \
    --log-file=/tmp/valgrind.543917.log postgres \
    -D /home/debian/postgres /contrib/test_decoding/tmp_check_iso/data \
    -F -c listen_addresses= -k /tmp/pg_regress-n7HodE

I get a ~200MB core file in /tmp, which I try loading like this:

  gdb src/backend/postgres /tmp/valgrind.542299.log.core.542391

but all I get is this:

  Reading symbols from src/backend/postgres...
  [New LWP 542391]
  Cannot access memory at address 0xcc10cc00cbf0cc6
  Cannot access memory at address 0xcc10cc00cbf0cbe
  Core was generated by `'.
  Program terminated with signal SIGABRT, Aborted.
  #0 0x00000000049d42ac in ?? ()
  (gdb) bt
  #0 0x00000000049d42ac in ?? ()
  #1 0x0000000000400000 in dshash_dump (hash_table=0x0) at dshash.c:782
  #2 0x0000000000400000 in dshash_dump (hash_table=0x49c0e44) at dshash.c:782
  #3 0x0000000000000000 in ?? ()
  Backtrace stopped: previous frame identical to this frame (corrupt stack?)

So the stack might be corrupt, for some reason? The first part looks entirely bogus too, though. The file size seems about right - with 128MB shared buffers, 200MB might be about right.
The core is triggered by an "assert" in the source, and we even log a backtrace into the log - and that seems much more plausible:

  TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 902, PID: 536049)
  (ExceptionalCondition+0x98)[0x8f5cec]
  (+0x57a574)[0x682574]
  (+0x579edc)[0x681edc]
  (ReorderBufferAddNewTupleCids+0x60)[0x6864dc]
  (SnapBuildProcessNewCid+0x94)[0x68b6a4]
  (heap2_decode+0x17c)[0x671584]
  (LogicalDecodingProcessRecord+0xbc)[0x670cd0]
  (+0x570f88)[0x678f88]
  (pg_logical_slot_get_changes+0x1c)[0x6790fc]
  (ExecMakeTableFunctionResult+0x29c)[0x4a92c0]
  (+0x3be638)[0x4c6638]
  (+0x3a2c14)[0x4aac14]
  (ExecScan+0x8c)[0x4aaca8]
  (+0x3bea14)[0x4c6a14]
  (+0x39ea60)[0x4a6a60]
  (+0x392378)[0x49a378]
  (+0x39520c)[0x49d20c]
  (standard_ExecutorRun+0x214)[0x49aad8]
  (ExecutorRun+0x64)[0x49a8b8]
  (+0x62e2ac)[0x7362ac]
  (PortalRun+0x27c)[0x735f08]
  (+0x626be8)[0x72ebe8]
  (PostgresMain+0x9a0)[0x733e9c]
  (+0x547be8)[0x64fbe8]
  (+0x547540)[0x64f540]
  (+0x542d30)[0x64ad30]
  (PostmasterMain+0x1460)[0x64a574]
  (+0x418888)[0x520888]

Clearly, this is not an issue valgrind is meant to detect (like invalid memory access, etc.) but an application issue. I've tried reproducing it without valgrind, but it only ever happens with valgrind - my theory is it's some sort of race condition, and valgrind changes the timing in a way that makes it much more likely to hit. I need to analyze the core to inspect the state more closely, etc.

Any ideas what I might be doing wrong? Or how do I load the core file?

thanks
Tomas |
From: John R. <jr...@bi...> - 2022-09-03 11:42:31
|
> ==123254== HEAP SUMMARY:
> ==123254== in use at exit: 0 bytes in 0 blocks
> ==123254== total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated

"2,084 bytes allocated" is the sum of all 6 arguments that were passed to malloc(), calloc() [possibly by calling malloc()], realloc() [at least the increase], etc. |
From: jian he <jia...@gm...> - 2022-09-03 07:25:42
|
helloc$valgrind ./a.out
==123254== Memcheck, a memory error detector
==123254== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==123254== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==123254== Command: ./a.out
==123254==
enter string (EOF) to quit): test1
enter string (EOF) to quit): test2
enter string (EOF) to quit): (all done)

test1
test2
==123254==
==123254== HEAP SUMMARY:
==123254== in use at exit: 0 bytes in 0 blocks
==123254== total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated
==123254==
==123254== All heap blocks were freed -- no leaks are possible
==123254==
==123254== For lists of detected and suppressed errors, rerun with: -s
==123254== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
-----------------------------------------------------------------------------------------------------------------
simple c program source code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAXC 1024

int main(void)
{
    char buf[MAXC], **arr = NULL;
    size_t nstr = 0;    /* counter for number of strings stored */

    for (;;) {
        size_t len;     /* var to hold length of string after \n removal */

        fputs("enter string (EOF) to quit): ", stdout);
        if (!fgets(buf, MAXC, stdin)) {
            puts("(all done)\n");
            break;
        }
        buf[len = strcspn(buf, "\r\n")] = 0;

        /* always realloc using temp pointer to avoid mem-leak on realloc failure */
        void *tmp = realloc(arr, (nstr + 1) * sizeof *arr);
        if (!tmp) {
            perror("realloc-tmp");
            break;
        }
        arr = tmp;

        if (!(arr[nstr] = malloc(len + 1))) {
            perror("malloc-arr[str]");
            break;
        }
        memcpy(arr[nstr++], buf, len + 1);
    }

    for (size_t i = 0; i < nstr; i++) {
        puts(arr[i]);
        free(arr[i]);
    }
    free(arr);
    return 0;
}
---------------------------------------------
New to C, I am not sure about the following:

  total heap usage: 6 allocs, 6 frees, 2,084 bytes allocated

I guess 6 allocs is 3 times mallocs called plus 3 times puts function called? But I don't know where 2084 comes from.
-- I recommend David Deutsch's <<The Beginning of Infinity>> Jian |
From: Tom H. <to...@co...> - 2022-09-01 06:05:59
|
On 01/09/2022 01:03, Bresalier, Rob (Nokia - US/Murray Hill) wrote:

> Don't understand why strace log has exit(0) without the underscore, I know for a fact that it was with the underscore.

Because exit() and _exit() are C library functions but both call the SYS_exit system call and that is what strace shows.

The difference is that _exit doesn't run atexit() handlers or do any other cleanup before calling SYS_exit.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/ |
From: Bresalier, R. (N. - US/M. Hill) <rob...@no...> - 2022-09-01 00:18:20
|
> Normally, if it is the OOM that kills a process, you should find a trace of this in the system logs.

I looked in every system log I could find, there was no indication of OOM killing it in any system log.

> I do not understand what you mean by reducing the nr of callers from 12 to 6.
> What are these callers ? Is that some threads of the process you are running
> under valgrind ?
>

I mean the --num-callers core option to valgrind. By default this is 12, and I didn't specify it. I tried using --num-callers=6 to reduce memory consumption. From the valgrind manual this means "Specifies the maximum number of entries shown in stack traces that identify program locations.". By reducing it to 6 I was hoping to reduce valgrind memory consumption in case it really was OOM killer, which I really doubt now.

> And just in case: are you using the last version of Valgrind ?

Yes I used the last version of valgrind and many earlier versions.

> You might use "strace" on valgrind to see what is going on at the time
> _exit(0) is called.

I did use 'strace' and dmesg. Neither indicated it was OOM killer. I did happen to save the strace log when the SIGKILL happened. Here is the part around the _exit(0):

  read(2040, "R", 1) = 1
  gettid() = 3332
  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
  rt_sigprocmask(SIG_SETMASK, ~[], ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], 8) = 0
  rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP BUS FPE KILL SEGV STOP SYS], NULL, 8) = 0
  gettid() = 3332
  write(2041, "S", 1) = 1
  exit(0) = ?
  +++ killed by SIGKILL +++

Don't understand why strace log has exit(0) without the underscore, I know for a fact that it was with the underscore.

The strace log doesn't indicate anything special happening around the _exit(0). When I removed it the SIGKILL went away.

> You might also start valgrind with some debug trace e.g. -d -d -d -d -v -v -v -v

Was not aware of this and didn't try it. Don't have time to try it now.

Regards,
Rob |