From: Paul F. <pj...@wa...> - 2020-05-26 16:05:50

> Message of 26/05/20 13:19
> From: "John Reiser"
> the ratio is about 1:50. So right away, that's a hardware slowdown of 4X.

Maybe more. The machine has 12Mbyte of cache according to cpuinfo.

> Valgrind runs every tool single-threaded. So if your app averages 5 active threads,
> then that is a slowdown of 5X.

I was running the application in single thread mode.

> Valgrind's JIT (Just-In-Time) instruction emulator has a slowdown. Assume 10X (or measure nulgrind.)

Yes, this is what I see with nulgrind: about an 11x slowdown. However, this will also account for a large part of the cache overhead.

> Finally we get to "useful work": the slowdown of the tool DHAT. Assume 3X.
> So (4 * 5 * 10 * 3) is a slowdown of 600X, which turns 10 minutes into 100 hours.

What I'm seeing is a DHAT-only slowdown that is much more than that.

A+
Paul
From: Paul F. <pj...@wa...> - 2020-05-26 14:07:46

> That doesn't sound right. I use DHAT extensively and expect a slowdown of
> perhaps 50:1, maybe less. What you're describing is a slowdown factor of
> at least several thousand.
>
> Bear in mind though that (1) V sequentialises thread execution, which will
> make a big difference if the program is heavily multithreaded, and (2)
> I suspect dhat's scheme of looking up all memory accesses in an AVL tree
> (of malloc'd blocks) doesn't scale all that well if you have tens of
> millions of blocks.
>
> Can you run it on a smaller workload?

Hi

I'll try on something smaller and also get some info on the number of blocks of memory allocated.

A+
Paul
From: John R. <jr...@bi...> - 2020-05-26 11:18:42

> The server has 48Gbytes of RAM and only about 6Gbytes is being used.
>
> The executable is quite big
>
>     text    data      bss      dec     hex filename
> 57369168  417156 20903108 78689432 4b0b498 [snip]
>
> The run under DHAT is using about 2Gbytes virtual and 1.5Gbytes resident according to htop. Running standalone those are about 750M and 350M respectively.

Some hardware cache+memory delays:
   L1 hit    3 cycles ( 32KB size)
   L2 hit   11 cycles (256KB size)
   L3 hit   25 cycles (4MB to 40MB size)
   miss    180 cycles

The dynamic RAM chips commonly used for main memory have stayed the same speed for over 25 years: 60 nanoseconds from CAS (Column Address Strobe) to DataOut. If the CPU runs at 3GHz, then a cache miss costs at least 180 cycles.

A quick estimate of (largest and slowest) cache size is given by the "cache size" line from /proc/cpuinfo:
-----
$ sed 9q < /proc/cpuinfo  # on an 8-year-old consumer-grade machine
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 42
model name      : Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
stepping        : 7
microcode       : 0x2f
cpu MHz         : 1599.982
cache size      : 6144 KB
-----

Assume that your "big" server has 30MB of L3 cache. For a resident set size of 350MB then that is a ratio of about 1:12. For a resident set size of 1.5GB then the ratio is about 1:50. So right away, that's a hardware slowdown of 4X.

Valgrind runs every tool single-threaded. So if your app averages 5 active threads, then that is a slowdown of 5X.

Valgrind's JIT (Just-In-Time) instruction emulator has a slowdown. Assume 10X (or measure nulgrind.)

Finally we get to "useful work": the slowdown of the tool DHAT. Assume 3X.

So (4 * 5 * 10 * 3) is a slowdown of 600X, which turns 10 minutes into 100 hours.
From: Paul F. <pj...@wa...> - 2020-05-26 10:05:58

On 5/26/20, Paul FLOYD wrote:
> > I'm running DHAT on what I consider to be a relatively small example. Standalone the executable runs in a bit under 10 minutes. Based on the CPU time that we print after every 10% of progress, under DHAT the same executable is going to take about 422 hours - about two and a half weeks.
> >
> > Does anyone have any ideas what could be causing it to be so slow?
> Please tell us about your environment.
> (EVERY query should supply this information!)

I just wanted to have an idea of the general expectation for DHAT overhead.

> Which version of DHAT? Which hardware? Which operating system and version?
> Which compiler and version built DHAT? Which compiler(s) built the program under test?

Red Hat Enterprise Linux Workstation release 6.5 (Santiago)
gcc (GCC) 5.3.0 for the test executable.
gcc version 9.2.0 (GCC) for building Valgrind, built from git HEAD.

The server has 48Gbytes of RAM and only about 6Gbytes is being used.

The executable is quite big

    text    data      bss      dec     hex filename
57369168  417156 20903108 78689432 4b0b498 [snip]

The run under DHAT is using about 2Gbytes virtual and 1.5Gbytes resident according to htop. Running standalone those are about 750M and 350M respectively.

I'll have to find a much smaller test before I consider doing any profiling of DHAT. I'll also try nulgrind to see if it really is the DHAT instrumentation overhead that is slowing things down.

A+
Paul
From: Julian S. <js...@ac...> - 2020-05-26 09:38:25

That doesn't sound right. I use DHAT extensively and expect a slowdown of perhaps 50:1, maybe less. What you're describing is a slowdown factor of at least several thousand.

Bear in mind though that (1) V sequentialises thread execution, which will make a big difference if the program is heavily multithreaded, and (2) I suspect dhat's scheme of looking up all memory accesses in an AVL tree (of malloc'd blocks) doesn't scale all that well if you have tens of millions of blocks.

Can you run it on a smaller workload?

J

On 26/05/2020 09:21, Paul FLOYD wrote:
> Hi
>
> I'm running DHAT on what I consider to be a relatively small example. Standalone the executable runs in a bit under 10 minutes. Based on the CPU time that we print after every 10% of progress, under DHAT the same executable is going to take about 422 hours - about two and a half weeks.
>
> Does anyone have any ideas what could be causing it to be so slow? Indeed, is this the sort of slowdown that I should be expecting with DHAT? The executable is intensive in both memory and floating point. Probably not helping matters, the data structures that I want to look at are over 1kB in size so I tweaked HISTOGRAM_SIZE_LIMIT to bump it up to 2kB.
>
> On the DHAT side, I have thought of trying to use some macro hackery to try to inline the AVL comparator function calls. Otherwise I don't have much in the way of other ideas, and DHAT doesn't have any CLI options to tweak things.
>
> A+
> Paul
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
From: Tom H. <to...@co...> - 2020-05-26 09:23:37

That's correct: AVX512 is not currently supported in valgrind, so you will need a version that doesn't use it for valgrind use. Progress on adding AVX512 support is being tracked here:

https://bugs.kde.org/show_bug.cgi?id=383010

Tom

On 26/05/2020 10:07, Patrick Bégou wrote:
> Thanks all for these precisions.
> I have deployed OpenMPI myself. So I have to build it again, disabling
> AVX512 optimizations. This level of optimization is used in all our CFD
> codes and libraries as it improves the global code performance.
>
> Patrick
>
> On 26/05/2020 at 10:45, Tom Hughes wrote:
>> Sorry, I misunderstood what you meant by "as previous version" there.
>>
>> I thought you meant the previous version worked but you actually
>> meant that it failed.
>>
>> As Julian says there is no easy fix - you have a library installed
>> that has been compiled to assume certain instructions are available
>> that are not in fact available under valgrind at the moment.
>>
>> Tom
>>
>> On 26/05/2020 09:26, Patrick Bégou wrote:
>>> Hi Tom,
>>>
>>> I'm a new user of Valgrind. I was needing it to check a large mpi code.
>>> So I downloaded the 3.15 version but even if hardware and software are 2 to
>>> 3 years old, valgrind doesn't work for me.
>>> Nor gcc7, nor OpenMPI, nor my application (even the small test) used
>>> specific options when they were built.
>>>
>>> If this unsupported instruction (I do not know what an EVEX prefix is,
>>> sorry) is the problem, how can I avoid it to use valgrind?
>>>
>>> I was just thinking that 3.16 could solve my problem....
>>>
>>> Patrick
>>>
>>> On 26/05/2020 at 10:12, Tom Hughes wrote:
>>>> On 26/05/2020 09:06, Patrick Bégou wrote:
>>>>
>>>>> valgrind-3.16.0.RC2 doesn't work for me (as previous version on this
>>>>> server).
>>>>
>>>> Are you saying that it fails on a binary that worked before?
>>>>
>>>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>>>> 0x5 0x25 0xA8 0x18 0x0
>>>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>>>> ==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
>>>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>>>> 0x5 0x25 0xA8 0x18 0x0
>>>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>>>
>>>> Because this is an instruction with an EVEX prefix that is not
>>>> supported by any version of valgrind ever so I don't see how this
>>>> binary can have worked with the previous version of valgrind.
>>>>
>>>> I suspect that you have in fact recompiled the program with
>>>> a different compiler or different optimization settings since
>>>> the time when it worked?
>>>>
>>>> Tom

-- 
Tom Hughes (to...@co...) http://compton.nu/
From: John R. <jr...@bi...> - 2020-05-26 09:13:57

On 5/26/20, Paul FLOYD wrote:
> I'm running DHAT on what I consider to be a relatively small example. Standalone the executable runs in a bit under 10 minutes. Based on the CPU time that we print after every 10% of progress, under DHAT the same executable is going to take about 422 hours - about two and a half weeks.
>
> Does anyone have any ideas what could be causing it to be so slow?

Please tell us about your environment. (EVERY query should supply this information!)

Which version of DHAT? Which hardware? Which operating system and version? Which compiler and version built DHAT? Which compiler(s) built the program under test?

How big is the program? On a Linux system, run "size ./my_app; ldd ./my_app" for static size, and "ps alx | grep my_app" for total VIRTual and RESident size. How much RAM is available for use? How much demand paging is occurring?

There are standard performance tools that apply to EVERY executable program, not just DHAT. On a Linux system with a DHAT that was built with symbol information retained ("-g" command-line option to gcc or clang, and no symbol stripping during static binding by /usr/bin/ld or /usr/bin/strip), what does 'perf' say?

    perf record valgrind --tool=dhat <<valgrind_args>> ./my_app <<my_app_args>>
    perf report > perf_output.txt

A smart user would supply such information in the first post.
From: Patrick B. <Pat...@le...> - 2020-05-26 09:07:54

Thanks all for these precisions. I have deployed OpenMPI myself, so I have to build it again, disabling AVX512 optimizations. This level of optimization is used in all our CFD codes and libraries as it improves the global code performance.

Patrick

On 26/05/2020 at 10:45, Tom Hughes wrote:
> Sorry, I misunderstood what you meant by "as previous version" there.
>
> I thought you meant the previous version worked but you actually
> meant that it failed.
>
> As Julian says there is no easy fix - you have a library installed
> that has been compiled to assume certain instructions are available
> that are not in fact available under valgrind at the moment.
>
> Tom
>
> On 26/05/2020 09:26, Patrick Bégou wrote:
>> Hi Tom,
>>
>> I'm a new user of Valgrind. I was needing it to check a large mpi code.
>> So I downloaded the 3.15 version but even if hardware and software are 2 to
>> 3 years old, valgrind doesn't work for me.
>> Nor gcc7, nor OpenMPI, nor my application (even the small test) used
>> specific options when they were built.
>>
>> If this unsupported instruction (I do not know what an EVEX prefix is,
>> sorry) is the problem, how can I avoid it to use valgrind?
>>
>> I was just thinking that 3.16 could solve my problem....
>>
>> Patrick
>>
>> On 26/05/2020 at 10:12, Tom Hughes wrote:
>>> On 26/05/2020 09:06, Patrick Bégou wrote:
>>>
>>>> valgrind-3.16.0.RC2 doesn't work for me (as previous version on this
>>>> server).
>>>
>>> Are you saying that it fails on a binary that worked before?
>>>
>>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>>> 0x5 0x25 0xA8 0x18 0x0
>>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>>> ==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
>>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>>> 0x5 0x25 0xA8 0x18 0x0
>>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>>
>>> Because this is an instruction with an EVEX prefix that is not
>>> supported by any version of valgrind ever so I don't see how this
>>> binary can have worked with the previous version of valgrind.
>>>
>>> I suspect that you have in fact recompiled the program with
>>> a different compiler or different optimization settings since
>>> the time when it worked?
>>>
>>> Tom
From: Tom H. <to...@co...> - 2020-05-26 08:46:20

Sorry, I misunderstood what you meant by "as previous version" there.

I thought you meant the previous version worked but you actually meant that it failed.

As Julian says there is no easy fix - you have a library installed that has been compiled to assume certain instructions are available that are not in fact available under valgrind at the moment.

Tom

On 26/05/2020 09:26, Patrick Bégou wrote:
> Hi Tom,
>
> I'm a new user of Valgrind. I was needing it to check a large mpi code.
> So I downloaded 3.15 version but even if hardware and software are 2 to
> 3 years old, valgrind does'nt work for me.
> Nor gcc7, nor OpenMPI, nor my application (even the small test) used
> specific option when they were built.
>
> If this unsupported instruction (I do not know what is an EVEX prefix,
> sorry) is the problem, how can I avoid it to use valgrind ?
>
> I was just thinking that 3.16 could solve my problem....
>
> Patrick
>
> On 26/05/2020 at 10:12, Tom Hughes wrote:
>> On 26/05/2020 09:06, Patrick Bégou wrote:
>>
>>> valgrind-3.16.0.RC2 doesn't work for me (as previous version on this
>>> server).
>>
>> Are you saying that it fails on a binary that worked before?
>>
>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>> 0x5 0x25 0xA8 0x18 0x0
>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>> ==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
>>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>>> 0x5 0x25 0xA8 0x18 0x0
>>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>>
>> Because this is an instruction with an EVEX prefix that is not
>> supported by any version of valgrind ever so I don't see how this
>> binary can have worked with the previous version of valgrind.
>>
>> I suspect that you have in fact recompiled the program with
>> a different compiler or different optimization settings since
>> the time when it worked?
>>
>> Tom

-- 
Tom Hughes (to...@co...) http://compton.nu/
From: Julian S. <js...@ac...> - 2020-05-26 08:38:41

You can't easily avoid this problem, because it occurs in a system library, not in your own code:

==306851== valgrind: Unrecognised instruction at address 0x6ddf581.
==306851==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)

Possibly your least-worst option is to talk with the people who built/installed OpenMPI on the machine, to see if you can get a build that doesn't use AVX512 instructions.

J
From: Patrick B. <Pat...@le...> - 2020-05-26 08:26:26

Hi Tom,

I'm a new user of Valgrind. I was needing it to check a large MPI code. So I downloaded the 3.15 version, but even if hardware and software are 2 to 3 years old, valgrind doesn't work for me. Nor gcc7, nor OpenMPI, nor my application (even the small test) used specific options when they were built.

If this unsupported instruction (I do not know what an EVEX prefix is, sorry) is the problem, how can I avoid it to use valgrind?

I was just thinking that 3.16 could solve my problem....

Patrick

On 26/05/2020 at 10:12, Tom Hughes wrote:
> On 26/05/2020 09:06, Patrick Bégou wrote:
>
>> valgrind-3.16.0.RC2 doesn't work for me (as previous version on this
>> server).
>
> Are you saying that it fails on a binary that worked before?
>
>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>> 0x5 0x25 0xA8 0x18 0x0
>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>> ==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
>> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F
>> 0x5 0x25 0xA8 0x18 0x0
>> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
>> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
>> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
>
> Because this is an instruction with an EVEX prefix that is not
> supported by any version of valgrind ever so I don't see how this
> binary can have worked with the previous version of valgrind.
>
> I suspect that you have in fact recompiled the program with
> a different compiler or different optimization settings since
> the time when it worked?
>
> Tom
From: Tom H. <to...@co...> - 2020-05-26 08:12:57

On 26/05/2020 09:06, Patrick Bégou wrote:
> valgrind-3.16.0.RC2 doesn't work for me (as previous version on this
> server).

Are you saying that it fails on a binary that worked before?

> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5
> 0x25 0xA8 0x18 0x0
> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
> ==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5
> 0x25 0xA8 0x18 0x0
> vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0

Because this is an instruction with an EVEX prefix that is not supported by any version of valgrind ever so I don't see how this binary can have worked with the previous version of valgrind.

I suspect that you have in fact recompiled the program with a different compiler or different optimization settings since the time when it worked?

Tom

-- 
Tom Hughes (to...@co...) http://compton.nu/
From: Patrick B. <Pat...@le...> - 2020-05-26 08:07:02

On 19/05/2020 at 14:32, Julian Seward wrote:
>
> Greetings.
>
> A first release candidate for 3.16.0 is available at
> https://sourceware.org/pub/valgrind/valgrind-3.16.0.RC2.tar.bz2
> (md5 = 21ac87434ed32bcfe5ea86a0978440ba)
>
> Please give it a try on platforms that are important for you. If no serious
> issues are reported, the 3.16.0 final release will happen on 25 May, that is,
> next Monday.
>
> J

Hi all,

valgrind-3.16.0.RC2 doesn't work for me (as previous version on this server).

My fortran test program (error prune I think) is as simple as:

PROGRAM reduce
  USE mpi
  IMPLICIT NONE
  INTEGER :: me, ncpus, ierr
  REAL :: buff, resu=0

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,me,ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,ncpus,ierr)
  buff=1
  CALL MPI_ALLREDUCE(buff,resu,1,MPI_REAL,MPI_SUM,MPI_COMM_WORLD,ierr)
  if (me == 0 ) WRITE(6,'(a,i0,2(a,f14.6))') 'On ',me,' I have ',buff,' and got ',resu
  CALL MPI_FINALIZE(ierr)
END PROGRAM reduce

Compilation with:

mpifort reduce.F90 -o reduce

mpifort --show
/opt/GCC73/bin/gfortran -I/opt/openmpi-GCC73/v3.1.x-20181010/include -pthread -I/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,-rpath -Wl,/opt/openmpi-GCC73/v3.1.x-20181010/lib -Wl,--enable-new-dtags -L/opt/openmpi-GCC73/v3.1.x-20181010/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi

OS is CentOS Linux release 7.7.1908 (Core)

Valgrind compiled with gcc7.3; configure options are:

./configure --enable-only64bit --with-mpicc=$(which mpicc) --prefix=/robin/data/begou/VALGRIND/valgrind-binaries

Hardware is:
Dell Poweredge R940
4 x Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz (launched Q3 2017)
1.5 TB of RAM

Compiler is gcc (GCC) 7.3.0 (january 2018)

OpenMPI is v3.1.x from the git repo 2018/10/10 (because a patch was needed at this time), compiled with gcc7.3.0. Configure options are:

'--prefix=/opt/openmpi-GCC73/v3.1.x-20181010' '--enable-mpirun-prefix-by-default' '--disable-dlopen' '--enable-mca-no-build=openib' '--without-verbs' '--enable-mpi-cxx' '--without-slurm' '--enable-mpi-thread-multiple'

Error is:

[begou@grivola TESTS]$ valgrind --version
valgrind-3.16.0.RC2
[begou@grivola TESTS]$ mpirun -np 2 valgrind ./reduce
==306850== Memcheck, a memory error detector
==306850== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306850== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306850== Command: ./reduce
==306850==
==306851== Memcheck, a memory error detector
==306851== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==306851== Using Valgrind-3.16.0.RC2 and LibVEX; rerun with -h for copyright info
==306851== Command: ./reduce
==306851==
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==306850== valgrind: Unrecognised instruction at address 0x6ddf581.
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x5 0x25 0xA8 0x18 0x0
vex amd64->IR: REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR: VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR: PFX.66=0 PFX.F2=0 PFX.F3=0
==306851== valgrind: Unrecognised instruction at address 0x6ddf581.
==306851==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851== Your program just tried to execute an instruction that Valgrind
==306851== did not recognise. There are two possible reasons for this.
==306851== 1. Your program has a bug and erroneously jumped to a non-code
==306851==    location. If you are running Memcheck and you just saw a
==306851==    warning about a bad jump, it's probably your program's fault.
==306851== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306851==    i.e. it's Valgrind's fault. If you think this is the case or
==306851==    you are not sure, please let us know and we'll try to fix it.
==306851== Either way, Valgrind will now raise a SIGILL signal which will
==306851== probably kill your program.
==306850==    at 0x6DDF581: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306850== Your program just tried to execute an instruction that Valgrind
==306850== did not recognise. There are two possible reasons for this.
==306850== 1. Your program has a bug and erroneously jumped to a non-code
==306850==    location. If you are running Memcheck and you just saw a
==306850==    warning about a bad jump, it's probably your program's fault.
==306850== 2. The instruction is legitimate but Valgrind doesn't handle it,
==306850==    i.e. it's Valgrind's fault. If you think this is the case or
==306850==    you are not sure, please let us know and we'll try to fix it.
==306850== Either way, Valgrind will now raise a SIGILL signal which will
==306850== probably kill your program.

Program received signal SIGILL: Illegal instruction.

Program received signal SIGILL: Illegal instruction.

Backtrace for this error:

Backtrace for this error:
#0  0x66ce3af in ???
#0  0x66ce3af in ???
#1  0x6ddf581 in ???
#2  0x6e01a78 in ???
#3  0x6de3e39 in ???
#1  0x6ddf581 in ???
#4  0x552ed60 in ???
#5  0x555f9ed in ???
#6  0x52abbb7 in ???
#2  0x6e01a78 in ???
#7  0x400cdd in ???
#8  0x400e8c in ???
#9  0x66ba504 in ???
#10  0x400c18 in ???
#3  0x6de3e39 in ???
#11  0xffffffffffffffff in ???
#4  0x552ed60 in ???
#5  0x555f9ed in ???
#6  0x52abbb7 in ???
#7  0x400cdd in ???
#8  0x400e8c in ???
==306851==
==306851== Process terminating with default action of signal 4 (SIGILL)
#9  0x66ba504 in ???
==306851==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306851==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306851==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306851==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306851==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306851==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306851==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
#10  0x400c18 in ???
#11  0xffffffffffffffff in ???
==306850==
==306850== Process terminating with default action of signal 4 (SIGILL)
==306850==    at 0x648B4BB: raise (in /usr/lib64/libpthread-2.17.so)
==306850==    by 0x66CE3AF: ??? (in /usr/lib64/libc-2.17.so)
==306850==    by 0x6DDF580: opal_pointer_array_construct (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6E01A78: mca_base_var_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x6DE3E39: opal_init_util (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libopen-pal.so.40.10.3)
==306850==    by 0x552ED60: ompi_mpi_init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x555F9ED: PMPI_Init (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi.so.40.10.3)
==306850==    by 0x52ABBB7: PMPI_INIT (in /opt/openmpi-GCC73/v3.1.x-20181010/lib/libmpi_mpifh.so.40.11.2)
==306850==    by 0x400CDD: MAIN__ (in /HA/sources/begou/TESTS/reduce)
==306850==    by 0x400E8C: main (in /HA/sources/begou/TESTS/reduce)
==306851==
==306851== HEAP SUMMARY:
==306851==     in use at exit: 8,830 bytes in 65 blocks
==306851==   total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306851==
==306850==
==306850== HEAP SUMMARY:
==306850==     in use at exit: 8,830 bytes in 65 blocks
==306850==   total heap usage: 123 allocs, 58 frees, 90,778 bytes allocated
==306850==
==306851== LEAK SUMMARY:
==306851==    definitely lost: 0 bytes in 0 blocks
==306851==    indirectly lost: 0 bytes in 0 blocks
==306851==      possibly lost: 0 bytes in 0 blocks
==306851==    still reachable: 8,830 bytes in 65 blocks
==306851==         suppressed: 0 bytes in 0 blocks
==306851== Rerun with --leak-check=full to see details of leaked memory
==306851==
==306851== For lists of detected and suppressed errors, rerun with: -s
==306851== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==306850== LEAK SUMMARY:
==306850==    definitely lost: 0 bytes in 0 blocks
==306850==    indirectly lost: 0 bytes in 0 blocks
==306850==      possibly lost: 0 bytes in 0 blocks
==306850==    still reachable: 8,830 bytes in 65 blocks
==306850==         suppressed: 0 bytes in 0 blocks
==306850== Rerun with --leak-check=full to see details of leaked memory
==306850==
==306850== For lists of detected and suppressed errors, rerun with: -s
==306850== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node grivola exited on signal 4 (Illegal instruction).
--------------------------------------------------------------------------

Thanks

Patrick
From: Paul F. <pj...@wa...> - 2020-05-26 07:21:15
|
Hi

I'm running DHAT on what I consider to be a relatively small example. Standalone, the executable runs in a bit under 10 minutes. Based on the CPU time that we print after every 10% of progress, under DHAT the same executable is going to take about 422 hours - about two and a half weeks. Does anyone have any ideas what could be causing it to be so slow? Indeed, is this the sort of slowdown that I should be expecting with DHAT?

The executable is intensive in both memory and floating point. Probably not helping matters, the data structures that I want to look at are over 1 kB in size, so I tweaked HISTOGRAM_SIZE_LIMIT to bump it up to 2 kB.

On the DHAT side, I have thought of trying to use some macro hackery to inline the AVL comparator function calls. Otherwise I don't have many other ideas, and DHAT doesn't have any command-line options to tweak things.

A+

Paul |
From: John R. <jr...@bi...> - 2020-05-22 14:10:15
|
> What would help a lot is to have a VALGRIND request, like > VALGRIND_DO_CLIENT_REQUEST_STMT, that we could use in our signal > handler to turn off leak checking. In valgrind-3.16 (to be released in 3 days; RC2 available today) see valgrind --help-dyn-options |
From: Philippe W. <phi...@sk...> - 2020-05-22 13:47:06
|
On Fri, 2020-05-22 at 15:22 +0300, Michael Widenius wrote:
> Hi!
> I have searched documentation, internet and header files like
> memcheck.h, but not found a solution:
>
> When running the MariaDB test suite under valgrind, we sometimes may
> get a core dump. In this case, the leaked memory report can be very
> long and will be totally useless.
>
> What would help a lot is to have a VALGRIND request, like
> VALGRIND_DO_CLIENT_REQUEST_STMT, that we could use in our signal
> handler to turn off leak checking.
>
> Is that possible and if not, is that something that could get
> implemented in the future?
> Is this something that anyone else has ever requested ?
>
> Regards,
> Monty

Hello,

The next version of valgrind is almost ready (Release Candidate was produced a few days ago). This release contains a feature to dynamically change many options. You can obtain the list of dynamically changeable options doing:

valgrind --help-dyn-options

For memcheck, this gives the below help. Based on this, you should be able to obtain what you need.

Hope this helps

Philippe

valgrind --help-dyn-options
Some command line settings are "dynamic", meaning they can be changed
while Valgrind is running, like this:
    From the shell, using vgdb. Example:
      $ vgdb "v.clo --trace-children=yes --child-silent-after-fork=no"
    From a gdb attached to the valgrind gdbserver. Example:
      (gdb) monitor v.clo --trace-children=yes --child-silent-after-fork=no
    From your program, using a client request. Example:
      #include <valgrind/valgrind.h>
      VALGRIND_CLO_CHANGE("--trace-children=yes");
      VALGRIND_CLO_CHANGE("--child-silent-after-fork=no");

 dynamically changeable options:
    -v --verbose -q --quiet -d --stats --vgdb=no --vgdb=yes --vgdb=full
    --vgdb-poll --vgdb-error --vgdb-stop-at --error-markers
    --show-error-list -s --show-below-main --time-stamp
    --trace-children --child-silent-after-fork --trace-sched
    --trace-signals --trace-symtab --trace-cfi --debug-dump=syms
    --debug-dump=line --debug-dump=frames --trace-redir
    --trace-syscalls --sym-offsets --progress-interval
    --merge-recursive-frames --vex-iropt-verbosity --suppressions
    --trace-flags --trace-notbelow --trace-notabove --profile-flags
    --gen-suppressions=no --gen-suppressions=yes --gen-suppressions=all
    --errors-for-leak-kinds --show-leak-kinds --leak-check-heuristics
    --show-reachable --show-possibly-lost --freelist-vol
    --freelist-big-blocks --leak-check=no --leak-check=summary
    --leak-check=yes --leak-check=full --ignore-ranges
    --ignore-range-below-sp --show-mismatched-frees

valgrind: Use --help for more information. |
From: Derrick M. <der...@gm...> - 2020-05-22 13:21:56
|
I am not too familiar with memcheck, but there are client requests to enable/disable checks within a memory range using VG_USERREQ__ENABLE_ADDR_ERROR_REPORTING_IN_RANGE and VG_USERREQ__DISABLE_ADDR_ERROR_REPORTING_IN_RANGE. On Fri, May 22, 2020 at 8:24 AM Michael Widenius <mic...@gm...> wrote: > > Hi! > I have searched documentation, internet and header files like > memcheck.h, but not found a solution: > > When running the MariaDB test suite under valgrind, we sometimes may > get a core dump. In this case, the leaked memory report can be very > long and will be totally useless. > > What would help a lot is to have a VALGRIND request, like > VALGRIND_DO_CLIENT_REQUEST_STMT, that we could use in our signal > handler to turn off leak checking. > > Is that possible and if not, is that something that could get > implemented in the future? > Is this something that anyone else has ever requested ? > > Regards, > Monty > > > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users -- Derrick McKee Phone: (703) 957-9362 Email: der...@gm... |
From: Michael W. <mic...@gm...> - 2020-05-22 12:22:49
|
Hi! I have searched documentation, internet and header files like memcheck.h, but not found a solution: When running the MariaDB test suite under valgrind, we sometimes may get a core dump. In this case, the leaked memory report can be very long and will be totally useless. What would help a lot is to have a VALGRIND request, like VALGRIND_DO_CLIENT_REQUEST_STMT, that we could use in our signal handler to turn off leak checking. Is that possible and if not, is that something that could get implemented in the future? Is this something that anyone else has ever requested ? Regards, Monty |
From: James R. <jam...@gm...> - 2020-05-20 16:31:13
|
On Wed, May 20, 2020 at 5:04 PM Tom Hughes <to...@co...> wrote:

> On 20/05/2020 17:01, James Read wrote:
> >
> > On Wed, May 20, 2020 at 2:31 PM Tom Hughes <to...@co...> wrote:
> >
> >     On 20/05/2020 14:23, James Read wrote:
> >
> >     > I'm trying to use valgrind to track down a memory leak in my web
> >     > crawling application. The problem is my application runs just fine
> >     > without valgrind but when I run it under valgrind the program crashes
> >     > before it has a chance to crawl any websites. Any ideas why this
> >     > behaviour could happen?
> >
> >     On the basis of the information supplied I'd say it was
> >     caused by excess neutron flux in the discombobulator.
> >
> >     Seriously, if you want anybody to actually try and answer
> >     your question then you'll have to provide some actual
> >     information like, what exactly it says...
> >
> > A typical run of my program gives the following output:
> >
> > Redis server: :0
> > Mongo server: 127.0.0.1:27017
> > URL file: links/links-2
> > Max connections: 1000
> > Selected JUST CRAWLER MODE
> >
> > Parsed sites: 132 ^C
> > Crawler thread exiting.
> > Exiting.
> >
> > But with valgrind ./crawler -c I get the following output:
> >
> > ==415433== Memcheck, a memory error detector
> > ==415433== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> > ==415433== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
> > ==415433== Command: ./crawler -c
> > ==415433==
> > Redis server: :0
> > Mongo server: 127.0.0.1:27017
> > URL file: links/links-2
> > Max connections: 1000
> > Selected JUST CRAWLER MODE
> > ==415433== Warning: ignored attempt to set SIGKILL handler in sigaction();
> > ==415433==          the SIGKILL signal is uncatchable
> > setrlimit() failed
> > ==415433==
> > ==415433== HEAP SUMMARY:
> > ==415433==     in use at exit: 37,773 bytes in 92 blocks
> > ==415433==   total heap usage: 6,112 allocs, 6,020 frees, 460,106 bytes allocated
> > ==415433==
> > ==415433== LEAK SUMMARY:
> > ==415433==    definitely lost: 0 bytes in 0 blocks
> > ==415433==    indirectly lost: 0 bytes in 0 blocks
> > ==415433==      possibly lost: 0 bytes in 0 blocks
> > ==415433==    still reachable: 37,773 bytes in 92 blocks
> > ==415433==         suppressed: 0 bytes in 0 blocks
> > ==415433== Rerun with --leak-check=full to see details of leaked memory
> > ==415433==
> > ==415433== For lists of detected and suppressed errors, rerun with: -s
> > ==415433== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> >
> > As you can see, there is no "Parsed sites:" message; it just crashes and burns.
>
> I don't see any crash there, just a program that has chosen to exit.
>
> Does your code exit when setrlimit fails? What sort of limit is it
> trying to set? My guess is that it's trying to play with RLIMIT_NOFILE
> in a way that would encroach on valgrind's reserved descriptors, so
> valgrind is refusing the request and your program is then choosing
> to exit rather than continue.

Good guess. Will comment out that code and try to run without.

James Read

> Tom
>
> --
> Tom Hughes (to...@co...)
> http://compton.nu/ |
From: Tom H. <to...@co...> - 2020-05-20 16:05:12
|
On 20/05/2020 17:01, James Read wrote:
>
> On Wed, May 20, 2020 at 2:31 PM Tom Hughes <to...@co...> wrote:
>
> > On 20/05/2020 14:23, James Read wrote:
> >
> > > I'm trying to use valgrind to track down a memory leak in my web
> > > crawling application. The problem is my application runs just fine
> > > without valgrind but when I run it under valgrind the program crashes
> > > before it has a chance to crawl any websites. Any ideas why this
> > > behaviour could happen?
> >
> > On the basis of the information supplied I'd say it was
> > caused by excess neutron flux in the discombobulator.
> >
> > Seriously, if you want anybody to actually try and answer
> > your question then you'll have to provide some actual
> > information like, what exactly it says...
>
> A typical run of my program gives the following output:
>
> Redis server: :0
> Mongo server: 127.0.0.1:27017
> URL file: links/links-2
> Max connections: 1000
> Selected JUST CRAWLER MODE
>
> Parsed sites: 132 ^C
> Crawler thread exiting.
> Exiting.
>
> But with valgrind ./crawler -c I get the following output:
>
> ==415433== Memcheck, a memory error detector
> ==415433== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==415433== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
> ==415433== Command: ./crawler -c
> ==415433==
> Redis server: :0
> Mongo server: 127.0.0.1:27017
> URL file: links/links-2
> Max connections: 1000
> Selected JUST CRAWLER MODE
> ==415433== Warning: ignored attempt to set SIGKILL handler in sigaction();
> ==415433==          the SIGKILL signal is uncatchable
> setrlimit() failed
> ==415433==
> ==415433== HEAP SUMMARY:
> ==415433==     in use at exit: 37,773 bytes in 92 blocks
> ==415433==   total heap usage: 6,112 allocs, 6,020 frees, 460,106 bytes allocated
> ==415433==
> ==415433== LEAK SUMMARY:
> ==415433==    definitely lost: 0 bytes in 0 blocks
> ==415433==    indirectly lost: 0 bytes in 0 blocks
> ==415433==      possibly lost: 0 bytes in 0 blocks
> ==415433==    still reachable: 37,773 bytes in 92 blocks
> ==415433==         suppressed: 0 bytes in 0 blocks
> ==415433== Rerun with --leak-check=full to see details of leaked memory
> ==415433==
> ==415433== For lists of detected and suppressed errors, rerun with: -s
> ==415433== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
>
> As you can see, there is no "Parsed sites:" message; it just crashes and burns.

I don't see any crash there, just a program that has chosen to exit.

Does your code exit when setrlimit fails? What sort of limit is it
trying to set? My guess is that it's trying to play with RLIMIT_NOFILE
in a way that would encroach on valgrind's reserved descriptors, so
valgrind is refusing the request and your program is then choosing
to exit rather than continue.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/ |
From: James R. <jam...@gm...> - 2020-05-20 16:01:41
|
On Wed, May 20, 2020 at 2:31 PM Tom Hughes <to...@co...> wrote:

> On 20/05/2020 14:23, James Read wrote:
>
> > I'm trying to use valgrind to track down a memory leak in my web
> > crawling application. The problem is my application runs just fine
> > without valgrind but when I run it under valgrind the program crashes
> > before it has a chance to crawl any websites. Any ideas why this
> > behaviour could happen?
>
> On the basis of the information supplied I'd say it was
> caused by excess neutron flux in the discombobulator.
>
> Seriously, if you want anybody to actually try and answer
> your question then you'll have to provide some actual
> information like, what exactly it says...

A typical run of my program gives the following output:

Redis server: :0
Mongo server: 127.0.0.1:27017
URL file: links/links-2
Max connections: 1000
Selected JUST CRAWLER MODE

Parsed sites: 132 ^C
Crawler thread exiting.
Exiting.

But with valgrind ./crawler -c I get the following output:

==415433== Memcheck, a memory error detector
==415433== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==415433== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==415433== Command: ./crawler -c
==415433==
Redis server: :0
Mongo server: 127.0.0.1:27017
URL file: links/links-2
Max connections: 1000
Selected JUST CRAWLER MODE
==415433== Warning: ignored attempt to set SIGKILL handler in sigaction();
==415433==          the SIGKILL signal is uncatchable
setrlimit() failed
==415433==
==415433== HEAP SUMMARY:
==415433==     in use at exit: 37,773 bytes in 92 blocks
==415433==   total heap usage: 6,112 allocs, 6,020 frees, 460,106 bytes allocated
==415433==
==415433== LEAK SUMMARY:
==415433==    definitely lost: 0 bytes in 0 blocks
==415433==    indirectly lost: 0 bytes in 0 blocks
==415433==      possibly lost: 0 bytes in 0 blocks
==415433==    still reachable: 37,773 bytes in 92 blocks
==415433==         suppressed: 0 bytes in 0 blocks
==415433== Rerun with --leak-check=full to see details of leaked memory
==415433==
==415433== For lists of detected and suppressed errors, rerun with: -s
==415433== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

As you can see, there is no "Parsed sites:" message; it just crashes and burns.

James Read

> Tom
>
> --
> Tom Hughes (to...@co...)
> http://compton.nu/ |
From: Tom H. <to...@co...> - 2020-05-20 13:32:08
|
On 20/05/2020 14:23, James Read wrote: > I'm trying to use valgrind to track down a memory leak in my web > crawling application. The problem is my application runs just fine > without valgrind but when I run it under valgrind the program crashes > before it has a chance to crawl any websites. Any ideas why this > behaviour could happen? On the basis of the information supplied I'd say it was caused by excess neutron flux in the discombobulator. Seriously, if you want anybody to actually try and answer your question then you'll have to provide some actual information like, what exactly it says... Tom -- Tom Hughes (to...@co...) http://compton.nu/ |
From: James R. <jam...@gm...> - 2020-05-20 13:24:17
|
Hi, I'm trying to use valgrind to track down a memory leak in my web crawling application. The problem is my application runs just fine without valgrind but when I run it under valgrind the program crashes before it has a chance to crawl any websites. Any ideas why this behaviour could happen? thanks James Read |
From: Mark W. <ma...@kl...> - 2020-05-20 10:33:56
|
Hi, On Tue, 2020-05-19 at 14:32 +0200, Julian Seward wrote: > A first release candidate for 3.16.0 is available at > https://sourceware.org/pub/valgrind/valgrind-3.16.0.RC2.tar.bz2 > (md5 = 21ac87434ed32bcfe5ea86a0978440ba) > > Please give it a try on platforms that are important for you. If no serious > issues are reported, the 3.16.0 final release will happen on 25 May, that is, > next Monday. Looks good! In case people want binaries to test, I made Fedora Rawhide test packages: (aarch64, armv7hl, i686, ppc64le, s390x, x86_64) https://bodhi.fedoraproject.org/updates/FEDORA-2020-31868cd970 And packages for stable Fedora and CentOS releases: https://copr.fedorainfracloud.org/coprs/mjw/valgrind-3.16.0/ (Epel for CentOS 7, aarch64 and x86_64 Epel for CentOS 8, aarch64 and x86_64 Fedora 30, aarch64, i386 and x86_64 Fedora 31, aarch64 and x86_64 Fedora 32, aarch64 and x86_64) Note to packagers who run make check and/or make regtest (which is optional, but recommended to check the sanity of the binaries), this now validates the docbookx xml documentation, so you'll need to have xmllint (from libxml2) and the docbook-dtds catalog installed. Cheers, Mark |
From: Julian S. <js...@ac...> - 2020-05-19 12:32:41
|
Greetings. A first release candidate for 3.16.0 is available at https://sourceware.org/pub/valgrind/valgrind-3.16.0.RC2.tar.bz2 (md5 = 21ac87434ed32bcfe5ea86a0978440ba) Please give it a try on platforms that are important for you. If no serious issues are reported, the 3.16.0 final release will happen on 25 May, that is, next Monday. J |