From: ISHIKAWA,chiaki <ish...@yk...> - 2022-05-20 15:00:38
|
Dear Paul,

On 2022/05/20 16:58, Floyd, Paul wrote:
> Hi Chiaki
>
> Debugging redirection issues isn't normally too slow. Redirection is
> done when Valgrind loads the guest executable and libraries.
>
> Run Valgrind with --trace-redir=yes and you should see Valgrind
> printing what it finds in
>
> * ld.so, the link loader
> * the client executable
> * the valgrind tool
> * the valgrind shared lib preloads (core and tool)
> * any client shared libraries
>
> libc falls under the last category, though there are a small number of
> C functions in the link loader (memcpy, strcmp etc).
>
> You should see things like
>
> --830-- ld-linux-x86-64.so.2 strcmp RL-> (2016.0) 0x040343b0
> --830-- libc.so* __strcmp_sse42 RL-> (2016.0) 0x04034370
> --830-- libc.so* __strcmp_sse2 RL-> (2016.0) 0x04034330
> --830-- libc.so* __GI_strcmp RL-> (2016.0) 0x040342f0
>
> If you don't see any symbols being redirected then you have a problem.
>
> A+
>
> Paul

I collected the version number info and have been running the TB test suite under valgrind since this morning. That was before I read this e-mail.

I will give the version numbers below first and then see if I can run valgrind to obtain the redirection information. (The thing is, the valgrind+thunderbird run already in progress is stretching my 16GB-memory linux image, and I am not sure whether I can start another instance of valgrind+thunderbird or whether I need to bite the bullet and cancel the current run. I am afraid the test takes close to a full day...)

Anyway, let me first send this version info, and I will check whether I can obtain the redirection info easily.

Obviously, I don't seem to have the redirected symbol for strncmp in the trace. That is for sure. I do see redirection for malloc.

279:13.66 GECKO(392456) ==392459== at 0x483F7B5: malloc (vg_replace_malloc.c:381)

--- version info ---

Hi,

Before I can figure out how to create a short reproducer, here is the version info I collected.

[] Debian Version

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ uname -a
Linux ip030 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux

[gcc-10] Used compiler.

I just re-compiled the source tree using this compiler and still get the same error (trace attached at the end). Maybe I should use a newer version, but the thunderbird mail client relies heavily on mozilla source code, and a newer version may run into compiler issues (warnings or worse).

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ gcc-10 --version
gcc-10 (Debian 10.3.0-15) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[glibc-a] As for glibc: I was not sure how to check for the version, but here it is. I never realized that we could use ldd --version or run libc.so as a program (!)

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird
linux-vdso.so.1 (0x00007fffa31ae000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4403b64000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4403d5d000)

[glibc-b] ldd --version reports:

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd --version
ldd (Debian GLIBC 2.33-7) 2.33
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

[glibc-c] I did not realize that we can "run" the GLIBC libc.so file this way to obtain the glibc version number. The above info all points to Debian GLIBC 2.33-7.

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.33-7) release release version 2.33.
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 10.3.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

[] Version of valgrind:

valgrind --version
valgrind-3.18.1

(Well, I was quite upset when I initially realized I was using valgrind-3.18.0.GIT, which I installed last September, but I then verified that the bug appears with the current release, too.)

[Source code] The mozilla comm-central source version is as follows. I have added a few local mods but they don't touch the affected version.

changeset: 35764:90328ce5bee2
tag: qparent
fxtree: comm
user: John Bieling <jo...@th...>
date: Wed May 18 13:13:33 2022 +0300
summary: Bug 1732554 - Make GenericSendMessage async. r=mkmelin

changeset: 35763:74a4091d1c27

[Source code] The mozilla mozilla-central source version is as follows. Again, I have added several local patches, but I did not touch the affected part (famous last words).

changeset: 618071:b113470be0ad
tag: qparent
fxtree: central
user: Mozilla Releng Treescript <rel...@mo...>
date: Wed May 18 19:04:24 2022 +0000
summary: no bug - Bumping Firefox l10n changesets r=release a=l10n-bump DONTBUILD

Yes, the thunderbird mail client uses both the browser-derived code (M-C) and the mail-specific code (C-C).

[] The full traceback of the reported problem.

I have been running valgrind+thunderbird for quite some time since this morning. I took the last stack trace from what are basically the same errors repeated during the test suite.

(Wait, I see "279:13.65 GECKO(392456) ==392459== by 0x488D2D3: dlopen@@GLIBC_2.2.5 (dlopen.c:87)". Version 2.2.5 is not the same as the version reported for glibc. Hmm?
) 279:13.65 GECKO(392456) ==392459== Invalid read of size 8 279:13.65 GECKO(392456) ==392459== at 0x4021BF4: strncmp (strcmp.S:175) 279:13.65 GECKO(392456) ==392459== by 0x400655D: is_dst (dl-load.c:214) 279:13.65 GECKO(392456) ==392459== by 0x400771E: _dl_dst_substitute (dl-load.c:293) 279:13.65 GECKO(392456) ==392459== by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465) 279:13.65 GECKO(392456) ==392459== by 0x4007CC2: decompose_rpath (dl-load.c:636) 279:13.65 GECKO(392456) ==392459== by 0x4009E9D: cache_rpath (dl-load.c:678) 279:13.65 GECKO(392456) ==392459== by 0x4009E9D: cache_rpath (dl-load.c:659) 279:13.65 GECKO(392456) ==392459== by 0x4009E9D: _dl_map_object (dl-load.c:2174) 279:13.65 GECKO(392456) ==392459== by 0x400E4B0: openaux (dl-deps.c:64) 279:13.65 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.65 GECKO(392456) ==392459== by 0x400E838: _dl_map_object_deps (dl-deps.c:248) 279:13.65 GECKO(392456) ==392459== by 0x40140DF: dl_open_worker (dl-open.c:584) 279:13.65 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.65 GECKO(392456) ==392459== by 0x4013BF9: _dl_open (dl-open.c:858) 279:13.65 GECKO(392456) ==392459== by 0x488D247: dlopen_doit (dlopen.c:66) 279:13.65 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.65 GECKO(392456) ==392459== by 0x4C036EE: _dl_catch_error (dl-error-skeleton.c:227) 279:13.65 GECKO(392456) ==392459== by 0x488DA58: _dlerror_run (dlerror.c:171) 279:13.65 GECKO(392456) ==392459== by 0x488D2D3: dlopen@@GLIBC_2.2.5 (dlopen.c:87) 279:13.65 GECKO(392456) ==392459== by 0x29995389: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x29995478: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x299744D0: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x29975DDE: ??? 
(in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x2997C347: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x29978C20: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.65 GECKO(392456) ==392459== by 0x9A29B23: fQueryServerString (GLXLibrary.h:131) 279:13.65 GECKO(392456) ==392459== by 0x9A29B23: mozilla::gl::GLXLibrary::EnsureInitialized(_XDisplay*) (GLContextProviderGLX.cpp:188) 279:13.65 GECKO(392456) ==392459== by 0x9A29DCC: mozilla::gl::GLXLibrary::SupportsVideoSync(_XDisplay*) (GLContextProviderGLX.cpp:237) 279:13.65 GECKO(392456) ==392459== by 0x9D1350C: gfxPlatformGtk::CreateGlobalHardwareVsyncSource() (gfxPlatformGtk.cpp:982) 279:13.65 GECKO(392456) ==392459== by 0x9D02ABF: gfxPlatform::GetGlobalHardwareVsyncSource() (gfxPlatform.cpp:2997) 279:13.65 GECKO(392456) ==392459== by 0x9D100AC: gfxPlatform::Init() (gfxPlatform.cpp:937) 279:13.65 GECKO(392456) ==392459== by 0x9D105AF: gfxPlatform::GetPlatform() (gfxPlatform.cpp:466) 279:13.65 GECKO(392456) ==392459== by 0xD07EF73: mozilla::widget::GfxInfoBase::GetContentBackend(nsTSubstring<char16_t>&) (GfxInfoBase.cpp:1871) 279:13.65 GECKO(392456) ==392459== by 0x8BBB4B5: ??? 
(xptcinvoke_asm_x86_64_unix.S:101) 279:13.65 GECKO(392456) ==392459== by 0x96F0C2C: Invoke (XPCWrappedNative.cpp:1626) 279:13.65 GECKO(392456) ==392459== by 0x96F0C2C: CallMethodHelper::Call() (XPCWrappedNative.cpp:1179) 279:13.65 GECKO(392456) ==392459== by 0x96F1225: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:1125) 279:13.65 GECKO(392456) ==392459== by 0x970AE31: GetAttribute (xpcprivate.h:1470) 279:13.65 GECKO(392456) ==392459== by 0x970AE31: XPC_WN_GetterSetter(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:1003) 279:13.65 GECKO(392456) ==392459== by 0x1048D4DB: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (Interpreter.cpp:420) 279:13.65 GECKO(392456) ==392459== by 0x104A0E8C: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:507) 279:13.65 GECKO(392456) ==392459== by 0x104A1676: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:574) 279:13.65 GECKO(392456) ==392459== by 0x104A188C: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (Interpreter.cpp:605) 279:13.65 GECKO(392456) ==392459== by 0x104A1C96: js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter.cpp:731) 279:13.65 GECKO(392456) ==392459== by 0xF5FB359: CallGetter(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, js::PropertyInfoBase<unsigned int>, JS::MutableHandle<JS::Value>) (NativeObject.cpp:1983) 279:13.65 GECKO(392456) ==392459== by 0xF5FB674: bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, 
js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) (NativeObject.cpp:2011) 279:13.65 GECKO(392456) ==392459== by 0xF609DB6: bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) (NativeObject.cpp:2153) 279:13.65 GECKO(392456) ==392459== by 0xF60A448: js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) (NativeObject.cpp:2184) 279:13.65 GECKO(392456) ==392459== by 0xF4000DE: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) (ObjectOperations-inl.h:120) 279:13.65 GECKO(392456) ==392459== by 0x1048B9BB: js::GetObjectElementOperation(JSContext*, JSOp, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter-inl.h:412) 279:13.66 GECKO(392456) ==392459== by 0x10490F5E: js::GetElementOperationWithStackIndex(JSContext*, JS::Handle<JS::Value>, int, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter-inl.h:509) 279:13.66 GECKO(392456) ==392459== by 0x1049900D: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:3119) 279:13.66 GECKO(392456) ==392459== by 0x104A064B: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:389) 279:13.66 GECKO(392456) ==392459== by 0x104A1385: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:539) 279:13.66 GECKO(392456) ==392459== by 0x104A1676: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:574) 279:13.66 GECKO(392456) ==392459== Address 0x29f24919 is 9 bytes inside a block of size 15 alloc'd 
279:13.66 GECKO(392456) ==392459== at 0x483F7B5: malloc (vg_replace_malloc.c:381) 279:13.66 GECKO(392456) ==392459== by 0x402074B: malloc (rtld-malloc.h:56) 279:13.66 GECKO(392456) ==392459== by 0x402074B: strdup (strdup.c:42) 279:13.66 GECKO(392456) ==392459== by 0x4007C54: decompose_rpath (dl-load.c:611) 279:13.66 GECKO(392456) ==392459== by 0x4009E9D: cache_rpath (dl-load.c:678) 279:13.66 GECKO(392456) ==392459== by 0x4009E9D: cache_rpath (dl-load.c:659) 279:13.66 GECKO(392456) ==392459== by 0x4009E9D: _dl_map_object (dl-load.c:2174) 279:13.66 GECKO(392456) ==392459== by 0x400E4B0: openaux (dl-deps.c:64) 279:13.66 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.66 GECKO(392456) ==392459== by 0x400E838: _dl_map_object_deps (dl-deps.c:248) 279:13.66 GECKO(392456) ==392459== by 0x40140DF: dl_open_worker (dl-open.c:584) 279:13.66 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.66 GECKO(392456) ==392459== by 0x4013BF9: _dl_open (dl-open.c:858) 279:13.66 GECKO(392456) ==392459== by 0x488D247: dlopen_doit (dlopen.c:66) 279:13.66 GECKO(392456) ==392459== by 0x4C0362F: _dl_catch_exception (dl-error-skeleton.c:208) 279:13.66 GECKO(392456) ==392459== by 0x4C036EE: _dl_catch_error (dl-error-skeleton.c:227) 279:13.66 GECKO(392456) ==392459== by 0x488DA58: _dlerror_run (dlerror.c:171) 279:13.66 GECKO(392456) ==392459== by 0x488D2D3: dlopen@@GLIBC_2.2.5 (dlopen.c:87) 279:13.66 GECKO(392456) ==392459== by 0x29995389: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x29995478: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x299744D0: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x29975DDE: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x2997C347: ??? 
(in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x29978C20: ??? (in /usr/lib/x86_64-linux-gnu/libGLX_mesa.so.0.0.0) 279:13.66 GECKO(392456) ==392459== by 0x9A29B23: fQueryServerString (GLXLibrary.h:131) 279:13.66 GECKO(392456) ==392459== by 0x9A29B23: mozilla::gl::GLXLibrary::EnsureInitialized(_XDisplay*) (GLContextProviderGLX.cpp:188) 279:13.66 GECKO(392456) ==392459== by 0x9A29DCC: mozilla::gl::GLXLibrary::SupportsVideoSync(_XDisplay*) (GLContextProviderGLX.cpp:237) 279:13.66 GECKO(392456) ==392459== by 0x9D1350C: gfxPlatformGtk::CreateGlobalHardwareVsyncSource() (gfxPlatformGtk.cpp:982) 279:13.66 GECKO(392456) ==392459== by 0x9D02ABF: gfxPlatform::GetGlobalHardwareVsyncSource() (gfxPlatform.cpp:2997) 279:13.66 GECKO(392456) ==392459== by 0x9D100AC: gfxPlatform::Init() (gfxPlatform.cpp:937) 279:13.66 GECKO(392456) ==392459== by 0x9D105AF: gfxPlatform::GetPlatform() (gfxPlatform.cpp:466) 279:13.66 GECKO(392456) ==392459== by 0xD07EF73: mozilla::widget::GfxInfoBase::GetContentBackend(nsTSubstring<char16_t>&) (GfxInfoBase.cpp:1871) 279:13.66 GECKO(392456) ==392459== by 0x8BBB4B5: ??? 
(xptcinvoke_asm_x86_64_unix.S:101) 279:13.66 GECKO(392456) ==392459== by 0x96F0C2C: Invoke (XPCWrappedNative.cpp:1626) 279:13.66 GECKO(392456) ==392459== by 0x96F0C2C: CallMethodHelper::Call() (XPCWrappedNative.cpp:1179) 279:13.66 GECKO(392456) ==392459== by 0x96F1225: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:1125) 279:13.66 GECKO(392456) ==392459== by 0x970AE31: GetAttribute (xpcprivate.h:1470) 279:13.66 GECKO(392456) ==392459== by 0x970AE31: XPC_WN_GetterSetter(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:1003) 279:13.66 GECKO(392456) ==392459== by 0x1048D4DB: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) (Interpreter.cpp:420) 279:13.66 GECKO(392456) ==392459== by 0x104A0E8C: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:507) 279:13.66 GECKO(392456) ==392459== by 0x104A1676: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:574) 279:13.66 GECKO(392456) ==392459== by 0x104A188C: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (Interpreter.cpp:605) 279:13.66 GECKO(392456) ==392459== by 0x104A1C96: js::CallGetter(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter.cpp:731) 279:13.66 GECKO(392456) ==392459== by 0xF5FB359: CallGetter(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, js::PropertyInfoBase<unsigned int>, JS::MutableHandle<JS::Value>) (NativeObject.cpp:1983) 279:13.66 GECKO(392456) ==392459== by 0xF5FB674: bool GetExistingProperty<(js::AllowGC)1>(JSContext*, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, 
js::PropertyInfoBase<unsigned int>, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) (NativeObject.cpp:2011) 279:13.66 GECKO(392456) ==392459== by 0xF609DB6: bool NativeGetPropertyInline<(js::AllowGC)1>(JSContext*, js::MaybeRooted<js::NativeObject*, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::Value, (js::AllowGC)1>::HandleType, js::MaybeRooted<JS::PropertyKey, (js::AllowGC)1>::HandleType, IsNameLookup, js::MaybeRooted<JS::Value, (js::AllowGC)1>::MutableHandleType) (NativeObject.cpp:2153) 279:13.67 GECKO(392456) ==392459== by 0xF60A448: js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) (NativeObject.cpp:2184) 279:13.67 GECKO(392456) ==392459== by 0xF4000DE: js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::Value>) (ObjectOperations-inl.h:120) 279:13.67 GECKO(392456) ==392459== by 0x1048B9BB: js::GetObjectElementOperation(JSContext*, JSOp, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter-inl.h:412) 279:13.67 GECKO(392456) ==392459== by 0x10490F5E: js::GetElementOperationWithStackIndex(JSContext*, JS::Handle<JS::Value>, int, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) (Interpreter-inl.h:509) 279:13.67 GECKO(392456) ==392459== by 0x1049900D: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:3119) 279:13.67 GECKO(392456) ==392459== by 0x104A064B: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:389) 279:13.67 GECKO(392456) ==392459== by 0x104A1385: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (Interpreter.cpp:539) 279:13.67 GECKO(392456) ==392459== by 0x104A1676: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) (Interpreter.cpp:574) 279:13.67 GECKO(392456) ==392459== by 0x104A188C: js::Call(JSContext*, JS::Handle<JS::Value>, 
JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) (Interpreter.cpp:605) 279:13.67 GECKO(392456) ==392459== by 0xF497661: JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (CallAndConstruct.cpp:53) [end of memo] |
From: Floyd, P. <pj...@wa...> - 2022-05-20 07:58:21
|
Hi Chiaki

Debugging redirection issues isn't normally too slow. Redirection is done when Valgrind loads the guest executable and libraries.

Run Valgrind with --trace-redir=yes and you should see Valgrind printing what it finds in

* ld.so, the link loader
* the client executable
* the valgrind tool
* the valgrind shared lib preloads (core and tool)
* any client shared libraries

libc falls under the last category, though there are a small number of C functions in the link loader (memcpy, strcmp etc).

You should see things like

--830-- ld-linux-x86-64.so.2 strcmp RL-> (2016.0) 0x040343b0
--830-- libc.so* __strcmp_sse42 RL-> (2016.0) 0x04034370
--830-- libc.so* __strcmp_sse2 RL-> (2016.0) 0x04034330
--830-- libc.so* __GI_strcmp RL-> (2016.0) 0x040342f0

If you don't see any symbols being redirected then you have a problem.

A+

Paul |
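A saved --trace-redir=yes log can be checked for a given symbol mechanically. The following is only a minimal sketch in Python, assuming the log lines have the same shape as the four sample lines above (the layout may differ between Valgrind versions); the helper names `redirected_symbols` and `is_redirected` are hypothetical and not part of Valgrind.

```python
import re

# Assumed line shape, taken from the sample lines quoted in the message:
#   --830-- ld-linux-x86-64.so.2 strcmp RL-> (2016.0) 0x040343b0
REDIR_LINE = re.compile(r"^--\d+--\s+(?P<soname>\S+)\s+(?P<symbol>\S+)\s+RL->")


def redirected_symbols(log_lines):
    """Collect (soname, symbol) pairs from lines that look like redirections."""
    found = set()
    for line in log_lines:
        match = REDIR_LINE.match(line.strip())
        if match:
            found.add((match.group("soname"), match.group("symbol")))
    return found


def is_redirected(log_lines, name):
    """True if any redirected symbol name contains `name`, e.g. 'strncmp'."""
    return any(name in symbol for _, symbol in redirected_symbols(log_lines))


# The sample lines from the message; a real log would be read from a file.
sample = [
    "--830-- ld-linux-x86-64.so.2 strcmp RL-> (2016.0) 0x040343b0",
    "--830-- libc.so* __strcmp_sse42 RL-> (2016.0) 0x04034370",
    "--830-- libc.so* __strcmp_sse2 RL-> (2016.0) 0x04034330",
    "--830-- libc.so* __GI_strcmp RL-> (2016.0) 0x040342f0",
]

print(is_redirected(sample, "strcmp"))   # strcmp variants are listed
print(is_redirected(sample, "strncmp"))  # no strncmp entry in this sample
```

With a real run, the Valgrind output could be written to a file with --log-file=<file> and the file's lines passed to `is_redirected`.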
From: ISHIKAWA,chiaki <ish...@yk...> - 2022-05-20 07:02:08
|
Dear Paul,

Thank you for your e-mail and the lucid explanation. I am sorry that I could not write to you earlier. There was something wrong with my PC hardware and it took me quite a while to re-install the many software products I regularly use.

I will try to create a short sample. (The whole thunderbird software is a gigantic problem.) But it may be difficult: the source code is large, and if the compiler's code generation is history-sensitive, the problem may not be easy to re-create.

I will also check the versions of the tools that were used when the problem was noticed. Let me have a couple of hours to check the versions.

BTW, I now vaguely recall that there was an issue with the DL-library released many years ago by Debian regarding the symbols for strcpy and friends. I can't recall the details now, but in that instance, the lack of proper debug symbols made the redirection difficult(?) If my hazy memory is correct, today's case may be influenced by a similar issue, but I had better collect the versions so that someone in the know can experiment on their end. Back then, I think I created a wrapper that introduced the symbols for strcmp and friends. But that was many years ago.

TIA

Chiaki

PS: For those curious about the hardware issue: I wanted to replace my Ryzen 1700 CPU (16MB of L3 cache) with a Ryzen 3700x (32MB of L3 cache), solely because I learned that the larger the cache, the better valgrind fares when running a big program like the thunderbird mail client. After a few years of using the 1700, I suspect the CPU is the limiting factor. I like it: it uses much less power than many other modern CPUs, so it runs cool and the PC is very silent without noisy fans.

Unfortunately, when I replaced the CPU, after carefully checking the BIOS version, etc. to make sure the CPU would run on the motherboard (yes, it did; it runs linux without any issue at all), somehow the Windows 10 Pro that hosts my virtualbox running linux did not boot any more after the replacement, and it trashed my boot environment. Aargh.

In the end, I figured out it was a faulty AMD SATA driver, which probably got installed in the last couple of years when I installed AMD's chip driver. It did not cause a problem with the Ryzen 1700 for the last few years, but with the 3700x the boot fails because of it. After the boot failure, even safe mode fails to boot. Ugh.

I had to re-install Windows and so had to re-install many applications and such that I use for work and hobby. Oh, such is life.

But I am a happy camper now with the second-hand Ryzen 3700x and hope to run and find more of these valgrind issues of TB soon. The whole build time from scratch got shortened from about 90+ minutes to 60+ minutes. Not bad.

I have yet to figure out how much TB's test suite execution time is shortened. I am hampered by strange errors that I did not notice a few months ago. Maybe these are newly introduced errors, including the one I reported, and I am analyzing whether I can simply suppress them or need to investigate in detail.

On 2022/05/11 16:54, Floyd, Paul wrote:
> Hi
>
> Can you give us
>
> the source of the small reproducer
>
> the versions of Valgrind, Debian, GCC and glibc?
>
> As you mention, functions like strncmp are often optimized to work on
> multiple bytes at a time and to take advantage of the fact that memory
> will always be allocated in a multiple of say 8 or 16 bytes. And what
> happens sometimes is that a function like strncmp will be replaced by
> the compiler with something like __strncmp_avx128 or something like
> that. If Valgrind doesn't recognize this it can't redirect it and do
> error checking on it.
>
> I would expect that the error message contain the name of the Valgrind
> redirect, for instance
>
> ==22489== at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)
>
> So it seems to me that you have a redirection problem. For some reason
> Valgrind is not seeing your strncmp when the client libc gets loaded
> into memory.
>
> A+
>
> Paul
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users |
From: Floyd, P. <pj...@wa...> - 2022-05-11 07:54:22
|
Hi

Can you give us

* the source of the small reproducer
* the versions of Valgrind, Debian, GCC and glibc?

As you mention, functions like strncmp are often optimized to work on multiple bytes at a time and to take advantage of the fact that memory will always be allocated in a multiple of say 8 or 16 bytes. And what happens sometimes is that a function like strncmp will be replaced by the compiler with something like __strncmp_avx128 or something like that. If Valgrind doesn't recognize this it can't redirect it and do error checking on it.

I would expect the error message to contain the name of the Valgrind redirect, for instance

==22489== at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)

So it seems to me that you have a redirection problem. For some reason Valgrind is not seeing your strncmp when the client libc gets loaded into memory.

A+

Paul |
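The check described above (does the top frame of the error point into a vg_replace_* source file?) can be applied to a saved log. This is a rough Python sketch under the assumption that top frames look like the "at 0x...: name (file:line)" lines quoted in this thread; the helpers `top_frame` and `hits_replacement` are hypothetical names, and frames whose function name contains spaces (full C++ signatures) are not handled.

```python
import re

# Assumed shape of the top frame of a Valgrind error, as quoted in this thread:
#   ==22489== at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)
TOP_FRAME = re.compile(
    r"==\d+==\s+at 0x[0-9A-Fa-f]+: (?P<func>\S+) \((?P<file>[^:()]+):(?P<line>\d+)\)"
)


def top_frame(line):
    """Return (function, source file) for an 'at 0x...' frame, or None."""
    match = TOP_FRAME.search(line)
    if match is None:
        return None
    return match.group("func"), match.group("file")


def hits_replacement(line):
    """True if the frame's source file is one of Valgrind's vg_replace_* files."""
    frame = top_frame(line)
    return frame is not None and frame[1].startswith("vg_replace_")


expected = "==22489== at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)"
reported = "143:39.64 GECKO(115765) ==115769== at 0x4021BF4: strncmp (strcmp.S:175)"

print(hits_replacement(expected))  # frame is inside a Valgrind replacement
print(hits_replacement(reported))  # frame is in strcmp.S, not a replacement
```

The second line is the top frame from the original report; its source file is strcmp.S, which is consistent with the redirection problem suspected here.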
From: ISHIKAWA,chiaki <ish...@yk...> - 2022-05-11 03:10:24
|
Hi, I have been analyzing thunderbird mail client under valgrind for sometime. memcheck has been so useful for me to find memory-related errors. Thank you for releasing this great tool. Recently, I noticed an invalid read of 8 bytes warning, which should be familiar to all of us. Interestingly, the initial part of the stack trace is found in a report in Qt bug database. It comes from dynamic loading library support. https://bugreports.qt.io/browse/QTBUG-90374 It was filed last year. My system is Debian GNU/Linux and I used gcc to compile thunderbird. The report was done by someone who uses clang. I believe the issue lies in a certain version of dl-library, glibc OR valgrind? The reason I say valgrind might be to blame, too, is as follows. (Debian is known to release toolchains very conservatively. I think that is why I did not see this issue last year.) Actually, mine has line numbers slight off due to version differences I suspect. 143:39.43 GECKO(115765) ==115769== Invalid read of size 8 143:39.64 GECKO(115765) ==115769== at 0x4021BF4: strncmp (strcmp.S:175) 143:39.64 GECKO(115765) ==115769== by 0x400655D: is_dst (dl-load.c:214) 143:39.64 GECKO(115765) ==115769== by 0x4007666: _dl_dst_count (dl-load.c:251) 143:39.64 GECKO(115765) ==115769== by 0x4007857: expand_dynamic_string_token (dl-load.c:393) 143:39.64 GECKO(115765) ==115769== by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465) 143:39.68 GECKO(115765) ==115769== by 0x4007CC2: decompose_rpath (dl-load.c:636) 143:39.68 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:678) 143:39.68 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:659) ... [omitted] ... My local valgrind dump tells me where the address was allocated. 
143:40.60 GECKO(115765) ==115769== Address 0x27ba3819 is 9 bytes inside a block of size 15 alloc'd 143:40.65 GECKO(115765) ==115769== at 0x483CF9B: malloc (vg_replace_malloc.c:380) 143:40.65 GECKO(115765) ==115769== by 0x402074B: malloc (rtld-malloc.h:56) 143:40.65 GECKO(115765) ==115769== by 0x402074B: strdup (strdup.c:42) 143:40.65 GECKO(115765) ==115769== by 0x4007C54: decompose_rpath (dl-load.c:611) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:678) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:659) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: _dl_map_object (dl-load.c:2174) 143:40.65 GECKO(115765) ==115769== by 0x400E4B0: openaux (dl-deps.c:64) ... [omission] ... I *think* this is a valid error case of large-sized READ used in strncmp reading beyond the allocated memory boundary. (strcmp.S shows 8 octets read instead of one octet at a time.) I think such a usage of strdup/str{n}cmp combination is abound in C source codes. So I thought maybe valgrind was reporting something different. Otherwise, many application programs have to create suppression for this type of issue. That is what I thought initially. A different type of error I thought initially was, say, for example, 9 bytes inside a block of size 15 might mean somehow the data contains uninitialized data in the string area in that position. However, come to think of it, if so, strdup would have triggered a valgrind warning before this. There is no warning from valgrind for strdup. Also, I created a test program and realized that in that case, valgrind prints ==120076== Conditional jump or move depends on uninitialised value(s) ==120076== at 0x4843172: strncmp (vg_replace_strmem.c:663) ==120076== by 0x108778: main (in /home/ishikawa/Dropbox/TB-DIR/a.out) So the original problem must be the read beyond malloc'ed area boundary. Now, is dl-library to blame? 
I think dl-library is used literally hundreds of millions of times or more daily, and it is hard to think that there is a bug there. (Famous last words.) Dl-library does not have control over how long each path string is (I think it is trying to record the path components of a loading path), and thus cannot control valgrind messages generated by an 8-char read going beyond the end of the malloc'ed memory. (So probably people have to create suppressions after all, if the particular version has this issue.)

As for valgrind, can valgrind somehow be more intelligent in this case? Maybe by creating a substitute strcmp? (I know comparing a single char at a time would be slower than comparing 8 characters at a time when appropriate.) But at least this type of surprise warning would be reduced.

However, we may have a problem here for glibc. If this read beyond the malloc'ed region is for real, we have a problem. I have no idea how this behavior is constrained or sanctioned by the C standard, the C library standard or the POSIX standard, but the 8-octet reads in strcmp.S can possibly lead to a real issue unless malloc() allocates memory chunks uniformly in units of 8 or larger, or unless glibc makes sure that there is a guard area between the malloc area and the end of the user virtual space.

I once had an experience where a bitblt-like CPU instruction expected us to create a bitmap with a horizontal bit length that is a multiple of 16 (or 8?), even if the screen size really used is less than that. So we had to round it up to a multiple of 16 (or 8?). I got a bit stingy on memory use and once created bitmap data whose raster lines were not padded with these extra octets to make their length a multiple of 16 (or 8). Kaboom.

I created this memory area using the C runtime library of the CPU/computer maker's OS. When the CPU's bitblt-like instruction accessed the last raster line's data, it fetched data 16 (or 8?) octets at a time, and at the end it accessed beyond the malloc'ed area.
And it was BEYOND the user memory space allocated by the OS. (The 8-byte reads for intermediate raster lines ended up reading the next allocated raster line area, and so they were OK.) So the program crashed due to a memory violation.

It took me a couple of weeks to figure this out, since the bitblt-like instruction did not offer any clue regarding where the address violation occurred. Also, only one of the screen bitmaps created this way was at the end of the user virtual space, and it was difficult to realize why the instruction crashed in a seemingly random manner when it handled other bitmaps without a problem. The CPU vendor intended the instruction to be used only for the main display screen of its workstation, and in that case the memory is preallocated in neat multiples of 16 horizontally. I tried to use the bitblt-like instruction for arbitrary user-defined virtual screens.

So my message here is that there *can* be a grave consequence of this behavior of calling malloc and then reading a larger chunk than originally assumed, but I am not sure where to report this and alert the developers.

Yeah, if malloc() always allocates 8- or 16-byte chunks, it should be OK [and we are better off if it is built this way due to some standard, glibc manifest, or whatever published document which won't change overnight]. Even in this age of PC users having GBs of memory, I hate to think of programs which allocate memory using 3- or 4-octet lengths...

Chiaki
|
From: John R. <jr...@bi...> - 2022-04-21 17:06:18
|
On 4/20/22 05:18, Yang Zhong wrote:

> The AMX is the NEW feature in Intel new platform and from host, we can
> find below cpu flags:
>
> amx_bf16, amx_tile, amx_int8
>
> The SPEC can be found in:
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
>
> The issue I mentioned should be related with AMX features missed in
> valgrind emulated CPU. If someone will implement this feature on valgrind,
> I can help verify. Thanks!

If you really want to help, then start today by collecting and/or writing actual code that emulates the hardware that implements the feature. Collect (or find, or write) the code from Chapter 3, "INTEL® AMX INSTRUCTION SET REFERENCE, A-Z", of that .pdf. Create actual subroutines and data declarations, and *test* it against your apps. Put the code into a public repository such as GitHub. The top-level function should be something like

    unsigned char const *emulate_amx(  // returns next instruction pointer
        unsigned char const *ip,  // pointer to first byte of instruction stream
        unsigned long *general_registers[16],  // hardware state
        unsigned long long *zmm_registers[16],  // zmm (ymm, xmm) registers
        struct Xsave *xsave_area,  // tile registers etc.
        ...
    }

which if successful returns a pointer to the next instruction, else an error code which is the negative of a small positive integer.

Such code will go a long way towards getting AMX supported by valgrind, because it will enable valgrind-developers to focus on implementing valgrind instead of on finding, de-ciphering, and mentally interpreting documentation.
|
From: Tom H. <to...@co...> - 2022-04-20 12:45:03
|
On 20/04/2022 13:41, Tom Hughes via Valgrind-users wrote:

> Again until we know what "AMX features" are it's impossible to comment
> in any detail.

So apparently AMX is this:

https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

So not only is it new instructions, it is new two dimensional registers so it's likely to be a huge task to add support.

I think we're still trying to get the AVX512 support merged so that might give you some idea of the timelines on this sort of change.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Tom H. <to...@co...> - 2022-04-20 12:42:01
|
On 20/04/2022 13:18, Yang Zhong wrote:
> On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
>> On 20/04/2022 09:01, Yang Zhong wrote:
>>
>>> So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
>>> with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
>>> read correct register value. thanks!
>>
>> When running under valgrind you are running on an emulated CPU not
>> the real CPU and the results of cpuid will reflect the capabilities
>> of that emulated CPU rather than the real CPU.
>>
>> Do the bits that you are trying to check reflect something (like new
>> instructions) that valgrind will need to be concerned about?
>>
>
> Thanks Tom for your quickly response!
>
> The AMX is the NEW feature in Intel new platform and from host, we can
> find below cpu flags:
>
> amx_bf16, amx_tile, amx_int8

That tells me nothing.

> The SPEC can be found in:
> https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

No I'm not going to spend my day digging through thousands of pages of the latest instruction set reference trying to figure out what exactly this feature is...

> The issue I mentioned should be related with AMX features missed in
> valgrind emulated CPU. If someone will implement this feature on valgrind,
> I can help verify. Thanks!

Again until we know what "AMX features" are it's impossible to comment in any detail.

If AMX features involved new instructions then yes it will definitely need somebody to do the work to add support for them.

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Yang Z. <yan...@in...> - 2022-04-20 12:34:35
|
On Wed, Apr 20, 2022 at 09:37:17AM +0100, Tom Hughes wrote:
> On 20/04/2022 09:01, Yang Zhong wrote:
>
> >So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
> >with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
> >read correct register value. thanks!
>
> When running under valgrind you are running on an emulated CPU not
> the real CPU and the results of cpuid will reflect the capabilities
> of that emulated CPU rather than the real CPU.
>
> Do the bits that you are trying to check reflect something (like new
> instructions) that valgrind will need to be concerned about?
>

Thanks Tom for your quickly response!

The AMX is the NEW feature in Intel new platform and from host, we can find below cpu flags:

amx_bf16, amx_tile, amx_int8

The SPEC can be found in:
https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

The issue I mentioned should be related with AMX features missed in valgrind emulated CPU. If someone will implement this feature on valgrind, I can help verify. Thanks!

Yang

> Tom
>
> --
> Tom Hughes (to...@co...)
> http://compton.nu/
|
From: Tom H. <to...@co...> - 2022-04-20 09:31:07
|
On 20/04/2022 09:01, Yang Zhong wrote:

> So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible
> with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't
> read correct register value. thanks!

When running under valgrind you are running on an emulated CPU not the real CPU and the results of cpuid will reflect the capabilities of that emulated CPU rather than the real CPU.

Do the bits that you are trying to check reflect something (like new instructions) that valgrind will need to be concerned about?

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
From: Yang Z. <yan...@in...> - 2022-04-20 08:18:22
|
Hello all,

Recently our QAs used the valgrind tool (valgrind-3.20.0.GIT) to check for memory leaks with the Qemu release on Intel's new Sapphire Rapids platform. They only focused on Sapphire Rapids' new features which are merged into the Linux and Qemu releases.

#The command
/usr/local/bin/valgrind --log-file=/root/valgrind.log --leak-check=full -v \
    ./qemu-system-x86_64 \
    ......

#Qemu will report below issue with valgrind tool:
qemu-system-x86_64: warning: prctl(ARCH_REQ_XCOMP_GUEST_PERM) failure for feature bit 18
qemu-system-x86_64: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Operation not permitted

Without the valgrind tool, AMX works normally in the latest Linux and Qemu releases.

I checked the Qemu code; the syscall below fails (rc=-1) with the valgrind tool:

    int rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_GUEST_PERM, bit);

Notice: bit=18, which is the AMX feature in the xstate area.

I also used the same valgrind command to check the amx selftest tool on the Sapphire Rapids platform.

/usr/local/bin/valgrind --log-file=/root/valgrind.log --leak-check=full -v linux/tools/testing/selftests/x86/amx_64
amx_64: [FAIL] xstate cpuid: invalid tile data size/offset: 0/0: Success

from linux/tools/testing/selftests/x86/amx.c:

    eax = CPUID_LEAF_XSTATE;
    ecx = XFEATURE_XTILEDATA;
    cpuid(&eax, &ebx, &ecx, &edx);
    /*
     * eax: XTILEDATA state component size
     * ebx: XTILEDATA state component offset in user buffer
     */
    if (!eax || !ebx)
        fatal_error("xstate cpuid: invalid tile data size/offset: %d/%d",
                    eax, ebx);

The above code only reads the AMX xtilestate's offset and size from the xstate buffer via cpuid. But from the error information, with the valgrind tool, the amx selftest tool in Linux can't read the correct cpuid registers of the Sapphire Rapids platform.

I also tried this same command without the valgrind tool on an older Intel platform (without this feature); we get the same error information, but there this should be normal behavior.
root@icx:~/yangzhon/projects/amx/linux# tools/testing/selftests/x86/amx_64
amx_64: [FAIL] xstate cpuid: invalid tile data size/offset: 0/0: Success

So, from above issue in Intel new platform, the valgrind need do some enablings to be compatible with on new platform? Seems valgrind tool can't identify the real HW platform because cpuid can't read correct register value. thanks!

Regards,

Yang
|
From: Aaron B. <bo...@pr...> - 2022-03-25 17:10:27
|
Some further info : the .in_place folder is empty, which explains why ln command fails.

------- Original Message -------
On Friday, March 25th, 2022 at 13:05, Aaron Boxer via Valgrind-users <val...@li...> wrote:

> Hello!
>
> I am trying to build valgrind 3.18 on a petalinux system (embedded Xilinx board)
> Even though I am logged in as root., I get this error about symbolic link :
>
> mkdir -p ../.in_place; \
> for f in vgpreload_core-arm64-linux.so ; do \
>   rm -f ../.in_place/$f; \
>   ln -f -s ../coregrind/$f ../.in_place; \
> done
> mkdir -p ../.in_place; \
> for f in ; do \
>   rm -f ../.in_place/$f.dSYM; \
>   ln -f -s ../coregrind/$f.dSYM ../.in_place; \
> done
> ln: failed to create symbolic link '../.in_place/vgpreload_core-arm64-linux.so': Operation not permitted
>
> Any insight here would be greatly appreciated.
> Thanks!
> Aaron
>
> Sent with [ProtonMail](https://protonmail.com/) secure email.
|
From: Aaron B. <bo...@pr...> - 2022-03-25 17:05:54
|
Hello!

I am trying to build valgrind 3.18 on a petalinux system (embedded Xilinx board).
Even though I am logged in as root, I get this error about a symbolic link:

mkdir -p ../.in_place; \
for f in vgpreload_core-arm64-linux.so ; do \
  rm -f ../.in_place/$f; \
  ln -f -s ../coregrind/$f ../.in_place; \
done
mkdir -p ../.in_place; \
for f in ; do \
  rm -f ../.in_place/$f.dSYM; \
  ln -f -s ../coregrind/$f.dSYM ../.in_place; \
done
ln: failed to create symbolic link '../.in_place/vgpreload_core-arm64-linux.so': Operation not permitted

Any insight here would be greatly appreciated.
Thanks!
Aaron

Sent with [ProtonMail](https://protonmail.com/) secure email.
|
From: Narayanan I. <na...@yo...> - 2022-03-16 14:35:58
|
Hi Philippe,

Thank you for your reply.

1) I am using valgrind-3.17.0 on an Ubuntu 21.10 box (sorry, I had incorrectly mentioned Ubuntu 20.04 in my original report). Not sure if this is the latest release or not.

2) I did not try it with --tool=none. It takes some time of testing for the failure to happen even with memcheck, so I am not sure the none tool would help.

3) Yes, this is the first error that I encounter. And I had tried the approach that you suggest with --vgdb-error=1 and did attach to the process through gdb at exactly the point when the error is issued. But I did not know what more to do then. The application stack trace looks good, just like I would expect.

4) I am not sure how to make use of the debug switches in further analyzing this.

I have found a workaround for my issue, and that is to remove a 1Mb allocation from the stack and move it to the heap. For reasons not yet known, that made the error disappear. So I have decided to move on for now.

To me, the symptoms I have seen so far make it seem like a valgrind issue and not an application issue. If there is anything that you or other valgrind experts would like to know from the failing case, I can try to work with you.

Thanks again.
Narayanan.
|
From: Philippe W. <phi...@sk...> - 2022-03-16 00:58:26
|
If you are not using the last release of valgrind, you might try with the last release.

Wondering if the problem also happens with other tools (e.g. --tool=none).

Otherwise, you could try to debug your application when running under valgrind when it encounters the problem. E.g. use the arguments

  --vgdb=full --vgdb-error=1 --vgdb-stop-at=exit,valgrindabexit

(assuming the below is the first error you encounter. If not, you should first fix your code to solve the errors previously reported by valgrind).

You could also compare the valgrind trace between a successful run and an unsuccessful run, with e.g. the valgrind debug switches

  -v -v -v -d -d -d --trace-signals=yes

and see if you detect a difference between the 2 runs. Note that with the above switches, you should see some debug log of the signal handling and of the stack extension mechanism.

Hope this helps

Philippe

On Mon, 2022-03-14 at 11:30 -0400, Narayanan Iyer via Valgrind-users wrote:
> One correction (not sure it matters). I believe the application uses 1.25Mb of stack space at the time of the failure (not .25 as I had originally mentioned).
>
> Narayanan.
>
> -----Original Message-----
> From: Narayanan Iyer [mailto:na...@yo...]
> Sent: Monday, March 14, 2022 11:27 AM
> To: val...@li...
> Cc: 'Narayanan Iyer' <na...@yo...>
> Subject: Can't extend stack during signal delivery : too small or bad protection modes
>
> Hi,
>
> While running the automated test suite (which has hundreds of tests) for my application with valgrind, I occasionally see failures like the following in some of the tests.
>
> ==29753== Can't extend stack to 0x1ffeec7948 during signal delivery for thread 1:
> ==29753==   too small or bad protection modes
> ==29753==
> ==29753== Process terminating with default action of signal 11 (SIGSEGV): dumping core
> ==29753==  Access not within mapped region at address 0x1FFEEC7948
> ==29753==    at 0x4849FD8: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==29753==    by 0x489AE7C: cli_get_sub_quals (sr_unix/cli_parse.c:593)
> ==29753==    by 0x489ABC3: parse_arg (sr_unix/cli_parse.c:0)
> ==29753==    by 0x489BD6E: parse_triggerfile_cmd (sr_unix/cli_parse.c:1128)
> ==29753==    by 0x4BD2377: trigger_parse (sr_unix/trigger_parse.c:1416)
> ==29753==    by 0x4B12152: trigger_update_rec (sr_unix/trigger_update.c:1386)
> ==29753==    by 0x4B16171: trigger_update_rec_helper (sr_unix/trigger_update.c:2171)
> ==29753==    by 0x4B163B9: trigger_update (sr_unix/trigger_update.c:2224)
> ==29753==    by 0x4B86385: op_fnztrigger (sr_port/op_fnztrigger.c:248)
> ==29753==    by 0x5ABA384: _ydboctoplanhelpers (in YDBOcto/build/src/_ydbocto.so)
> ==29753==    by 0x1774F1EF: ???
> ==29753==    by 0xAAAAAAAAAAAAAAA9: ???
> ==29753==  If you believe this happened as a result of a stack
> ==29753==  overflow in your program's main thread (unlikely but
> ==29753==  possible), you can try to increase the size of the
> ==29753==  main thread stack using the --main-stacksize= flag.
> ==29753==  The main thread stack size used in this run was 268435456.
> ==29753== Invalid write of size 8
> ==29753==    at 0x483A124: _vgnU_freeres (in /usr/libexec/valgrind/vgpreload_core-amd64-linux.so)
> ==29753==  Address 0x1ffeec8808 is on thread 1's stack
>
> If I rerun just the failing test, it passes fine. Every time the list of tests that fail keeps changing. If I run the test without valgrind, it passes all the time.
>
> Originally I got a failure with the --main-stacksize set to 16Mb so I bumped it to 256Mb. And I still keep getting this failure at different tests. I also set the ulimit for stacksize to 256Mb just in case and I still see the failures.
>
> The application is a single-threaded application and I know for sure it does not use anywhere near 256Mb of stack space. The stack trace shown above keeps changing across the many random failures but in all of those stack traces, I believe only around .25Mb of stack space would be used at the most.
>
> In this application, a SIGALRM signal would happen every 1 second or so. The application does not set up any alternate stack (i.e. no sigaltstack() call). Not sure if that can be related to the random failure or not.
>
> This is on a Ubuntu 20.04 system. And my application was compiled with gcc.
>
> Not sure how to debug this further. Any help in this regard is appreciated.
>
> Thanks,
> Narayanan.
>
>
>
>
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
|
From: Narayanan I. <na...@yo...> - 2022-03-14 15:30:23
|
One correction (not sure it matters). I believe the application uses 1.25Mb of stack space at the time of the failure (not .25 as I had originally mentioned).

Narayanan.

-----Original Message-----
From: Narayanan Iyer [mailto:na...@yo...]
Sent: Monday, March 14, 2022 11:27 AM
To: val...@li...
Cc: 'Narayanan Iyer' <na...@yo...>
Subject: Can't extend stack during signal delivery : too small or bad protection modes

Hi,

While running the automated test suite (which has hundreds of tests) for my application with valgrind, I occasionally see failures like the following in some of the tests.

==29753== Can't extend stack to 0x1ffeec7948 during signal delivery for thread 1:
==29753==   too small or bad protection modes
==29753==
==29753== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==29753==  Access not within mapped region at address 0x1FFEEC7948
==29753==    at 0x4849FD8: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==29753==    by 0x489AE7C: cli_get_sub_quals (sr_unix/cli_parse.c:593)
==29753==    by 0x489ABC3: parse_arg (sr_unix/cli_parse.c:0)
==29753==    by 0x489BD6E: parse_triggerfile_cmd (sr_unix/cli_parse.c:1128)
==29753==    by 0x4BD2377: trigger_parse (sr_unix/trigger_parse.c:1416)
==29753==    by 0x4B12152: trigger_update_rec (sr_unix/trigger_update.c:1386)
==29753==    by 0x4B16171: trigger_update_rec_helper (sr_unix/trigger_update.c:2171)
==29753==    by 0x4B163B9: trigger_update (sr_unix/trigger_update.c:2224)
==29753==    by 0x4B86385: op_fnztrigger (sr_port/op_fnztrigger.c:248)
==29753==    by 0x5ABA384: _ydboctoplanhelpers (in YDBOcto/build/src/_ydbocto.so)
==29753==    by 0x1774F1EF: ???
==29753==    by 0xAAAAAAAAAAAAAAA9: ???
==29753==  If you believe this happened as a result of a stack
==29753==  overflow in your program's main thread (unlikely but
==29753==  possible), you can try to increase the size of the
==29753==  main thread stack using the --main-stacksize= flag.
==29753==  The main thread stack size used in this run was 268435456.
==29753== Invalid write of size 8
==29753==    at 0x483A124: _vgnU_freeres (in /usr/libexec/valgrind/vgpreload_core-amd64-linux.so)
==29753==  Address 0x1ffeec8808 is on thread 1's stack

If I rerun just the failing test, it passes fine. Every time the list of tests that fail keeps changing. If I run the test without valgrind, it passes all the time.

Originally I got a failure with the --main-stacksize set to 16Mb so I bumped it to 256Mb. And I still keep getting this failure at different tests. I also set the ulimit for stacksize to 256Mb just in case and I still see the failures.

The application is a single-threaded application and I know for sure it does not use anywhere near 256Mb of stack space. The stack trace shown above keeps changing across the many random failures but in all of those stack traces, I believe only around .25Mb of stack space would be used at the most.

In this application, a SIGALRM signal would happen every 1 second or so. The application does not set up any alternate stack (i.e. no sigaltstack() call). Not sure if that can be related to the random failure or not.

This is on a Ubuntu 20.04 system. And my application was compiled with gcc.

Not sure how to debug this further. Any help in this regard is appreciated.

Thanks,
Narayanan.
|
From: Narayanan I. <na...@yo...> - 2022-03-14 15:27:30
|
Hi,

While running the automated test suite (which has hundreds of tests) for my application with valgrind, I occasionally see failures like the following in some of the tests.

==29753== Can't extend stack to 0x1ffeec7948 during signal delivery for thread 1:
==29753==   too small or bad protection modes
==29753==
==29753== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==29753==  Access not within mapped region at address 0x1FFEEC7948
==29753==    at 0x4849FD8: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==29753==    by 0x489AE7C: cli_get_sub_quals (sr_unix/cli_parse.c:593)
==29753==    by 0x489ABC3: parse_arg (sr_unix/cli_parse.c:0)
==29753==    by 0x489BD6E: parse_triggerfile_cmd (sr_unix/cli_parse.c:1128)
==29753==    by 0x4BD2377: trigger_parse (sr_unix/trigger_parse.c:1416)
==29753==    by 0x4B12152: trigger_update_rec (sr_unix/trigger_update.c:1386)
==29753==    by 0x4B16171: trigger_update_rec_helper (sr_unix/trigger_update.c:2171)
==29753==    by 0x4B163B9: trigger_update (sr_unix/trigger_update.c:2224)
==29753==    by 0x4B86385: op_fnztrigger (sr_port/op_fnztrigger.c:248)
==29753==    by 0x5ABA384: _ydboctoplanhelpers (in YDBOcto/build/src/_ydbocto.so)
==29753==    by 0x1774F1EF: ???
==29753==    by 0xAAAAAAAAAAAAAAA9: ???
==29753==  If you believe this happened as a result of a stack
==29753==  overflow in your program's main thread (unlikely but
==29753==  possible), you can try to increase the size of the
==29753==  main thread stack using the --main-stacksize= flag.
==29753==  The main thread stack size used in this run was 268435456.
==29753== Invalid write of size 8
==29753==    at 0x483A124: _vgnU_freeres (in /usr/libexec/valgrind/vgpreload_core-amd64-linux.so)
==29753==  Address 0x1ffeec8808 is on thread 1's stack

If I rerun just the failing test, it passes fine. Every time the list of tests that fail keeps changing. If I run the test without valgrind, it passes all the time.

Originally I got a failure with --main-stacksize set to 16Mb, so I bumped it to 256Mb. And I still keep getting this failure in different tests. I also set the ulimit for stack size to 256Mb just in case, and I still see the failures.

The application is a single-threaded application and I know for sure it does not use anywhere near 256Mb of stack space. The stack trace shown above keeps changing across the many random failures, but in all of those stack traces, I believe only around .25Mb of stack space would be used at the most.

In this application, a SIGALRM signal would happen every 1 second or so. The application does not set up any alternate stack (i.e. no sigaltstack() call). Not sure if that can be related to the random failure or not.

This is on an Ubuntu 20.04 system. And my application was compiled with gcc.

Not sure how to debug this further. Any help in this regard is appreciated.

Thanks,
Narayanan.
|
From: Floyd, P. <pj...@wa...> - 2022-03-01 09:58:28
|
On 2022-02-22 17:28, Norbert Reher wrote:
>
> So I was searching the mailing list for a possible work around to reduce
> the number of false/positives reported and found a mail (with identical
> subject) from Adriaan Schmidt dated 2016-11-28 14:42:43.
>

Not an answer, but just for reference the thread in question can be found here

https://sourceforge.net/p/valgrind/mailman/valgrind-users/thread/1C789EC78A01A643ACB88798B277DCED0523382A%40DENBGAT9EL4MSX.ww902.siemens.net/#msg35518667

A+
Paul
|
From: Stefano A. <san...@al...> - 2022-03-01 07:09:24
|
On Mon, 2022-02-28 at 13:11 -0800, Stefano Antonelli wrote:
> When the leak occurs, the number of 4k pages owned by this render
> device increases. Running valgrind leak-test (or gdb with malloc
> breakpoint), the number of 4k pages owned by this render device _do
> not_ increase.

When I run callgrind, the same behaviour happens.

valgrind --tool=callgrind --instr-atstart=no ./qml-mwe3

Once the application is up and running, I use:

:~# callgrind_control --instr=on 4120
PID 4120: ./qml-mwe3
sending command instrumentation on to pid 4120
  OK.
:~# callgrind_control -s 4120
PID 4120: ./qml-mwe3
sending command status internal to pid 4120
  Number of running threads: 4, thread IDs: 1 2 3 4
  Events collected: Ir
  Functions: 1,104 (executed 1,289, contexts 1,104)
  Basic blocks: 1,816 (executed 1,120,586,516, call sites 452)
:~# callgrind_control --instr=off 4120
PID 4120: ./qml-mwe3
sending command instrumentation off to pid 4120
  OK.

After I turn on the instrumentation, the display stops updating completely (it's running an animation). Once I turn off the instrumentation, the display resumes.

I would expect the animation to run dreadfully slowly, but not to freeze in place for the whole time instrumentation was on. I don't know exactly how long this was, but if I had to guess I'd say around 10 minutes.

I should also add that the processor is armhf, and I noticed that with callgrind I get a couple of warnings:

==4120== Callgrind, a call-graph generating cache profiler
==4120== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==4120== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==4120== Command: ./qml-mwe3
==4120==
==4120== For interactive control, run 'callgrind_control -h'.
==4120== Warning: noted but unhandled ioctl 0x6443 with no size/direction hints.
==4120==    This could cause spurious value errors to appear.
==4120==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4120== Warning: noted but unhandled ioctl 0x4b51 with no size/direction hints.
==4120==    This could cause spurious value errors to appear.
==4120==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
disInstr(thumb): unhandled instruction: 0xDEFF 0xF8DD

I don't know if that's breaking things or not. Nor do I know what these IOCTLs are, but I am trying to parse README_MISSING_SYSCALL_OR_IOCTL as I go along here.

-Stef
|
From: Stefano A. <san...@al...> - 2022-02-28 21:12:15
|
On Mon, 2022-02-28 at 09:45 -0800, Stefano Antonelli wrote:
> On Mon, 2022-02-28 at 18:00 +0100, Floyd, Paul wrote:
> > On 2022-02-28 17:28, Stefano Antonelli wrote:
>
> Using a memory pool doesn't explain why the leak doesn't get bigger
> when running valgrind though does it?

So I think I understand the problem. QML sits on top of EGL. The EGL layer is supplied in binary form. In pmap I see a growing number of pages owned by /dev/dri/renderD129 which is a device file.

crw-rw---- 1 root render 226, 129 Feb 23 15:48 /dev/dri/renderD129

When the leak occurs, the number of 4k pages owned by this render device increases. Running valgrind leak-test (or gdb with malloc breakpoint), the number of 4k pages owned by this render device _do not_ increase.

I believe that QML creates "a graphic thing" from this render device and probably also deletes it, but I don't think the library controlling the render device is freeing the memory.

I *think* this library (libglapi.so) is using malloc, but I don't have any source code. ldd shows me:

12: 00000000 0 FUNC GLOBAL DEFAULT UND malloc@GLIBC_2.4 (3)

So something real is happening when valgrind leak-test is used that's affecting this graphics library in a good way.

Can anyone explain what valgrind does? I know it replaces malloc with vg_malloc, but does anything else happen?

Thanks,
Stef
|
From: Stefano A. <san...@al...> - 2022-02-28 18:01:02
|
On Mon, 2022-02-28 at 18:00 +0100, Floyd, Paul wrote:
> On 2022-02-28 17:28, Stefano Antonelli wrote:
> I don't know much about Qml. Does this use any sort of memory manager
> or
> garbage collector?

Qml is built on top of javascript. There is definitely a garbage collector.

> One thing that you could try is to run your application with
> --tool=massif and then repeat the run but add --pages-as-heap=yes. If
> there is a big difference in the two then it means that your
> application
> is using a memory pool not based on malloc. Use ms_print (or
> massif-visualizer) to see the massif profiles.

There is a huge difference. 1MB vs 90MB.

Using a memory pool doesn't explain why the leak doesn't get bigger when running valgrind though does it?

And thanks for the help!

-Stef
|
From: Floyd, P. <pj...@wa...> - 2022-02-28 17:00:52
|
On 2022-02-28 17:28, Stefano Antonelli wrote:
> Hello list,
>
> I have a Qml application that leaks memory when run normally. I've
> reduced it to a very simple Qml app and it still leaks. And the leak
> is visible with pmap almost immediately. The application runs on an
> embedded device (armhf).

Hi Stefano

I don't know much about Qml. Does this use any sort of memory manager or garbage collector?

One thing that you could try is to run your application with --tool=massif and then repeat the run but add --pages-as-heap=yes. If there is a big difference in the two then it means that your application is using a memory pool not based on malloc. Use ms_print (or massif-visualizer) to see the massif profiles.

A+
Paul
|
From: Stefano A. <san...@al...> - 2022-02-28 16:43:25
|
Hello list,

I have a Qml application that leaks memory when run normally. I've reduced it to a very simple Qml app and it still leaks. And the leak is visible with pmap almost immediately. The application runs on an embedded device (armhf).

When I run it with valgrind:

/usr/bin/valgrind.bin --tool=memcheck --error-limit=no \
    --log-file=vg-220214.log --leak-check=full \
    --leak-resolution=high --show-reachable=yes ./qml-mwe3

There is no leak. I've tried running it for a solid day and pmap doesn't show any leak.

If I run the same application in gdb (just letting it run), it leaks. However, if I put a breakpoint on malloc:

gdb ./qml-mwe3
(gdb) run
--- some time passes ---
^C
(gdb) break malloc
(gdb) commands
> continue
> end
(gdb) run

Then the leak stops. No memory is freed, but it absolutely stops increasing.

Can anyone offer any guesses as to what's going on? I would appreciate any ideas to debug this further.

Thanks,
Stef
|
From: Norbert R. <Nor...@gm...> - 2022-02-22 16:28:45
|
Dear all,

I am analyzing a program with Helgrind for possible data races. Thread-safe static initialization is used to implement singletons inside the code. As expected, several false positives are reported.

gcc version: 10.2.1
compile options: -std=c++17 -O3 -m64 -g ...
Valgrind version: 3.16.1

So I was searching the mailing list for a possible workaround to reduce the number of false positives reported and found a mail (with identical subject) from Adriaan Schmidt dated 2016-11-28 14:42:43. He used the Helgrind annotations ANNOTATE_HAPPENS_AFTER and ANNOTATE_HAPPENS_BEFORE to make Helgrind warnings disappear for singletons using thread-safe static initialization.

I haven't found any reply to his final question of whether the solution found could break the analysis. If I missed the reply, please let me know.

Thank you in advance. |
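To make the idea concrete, here is a minimal sketch of how ANNOTATE_HAPPENS_BEFORE and ANNOTATE_HAPPENS_AFTER from valgrind/helgrind.h might be placed around a function-local static. The placement is an assumption for illustration and is not necessarily what the 2016 mail did. The class Registry and the helper all_threads_agree are hypothetical names. When the Valgrind headers are not installed, the macros fall back to no-ops so the code still builds. Whether such annotations can hide real races is exactly the open question above, and this sketch does not settle it.

```cpp
// Sketch: happens-before annotations around a function-local static.
// Falls back to no-op macros when the Valgrind headers are not installed.
#include <thread>
#include <vector>

#if defined(__has_include)
#  if __has_include(<valgrind/helgrind.h>)
#    include <valgrind/helgrind.h>
#  endif
#endif
#ifndef ANNOTATE_HAPPENS_BEFORE
#  define ANNOTATE_HAPPENS_BEFORE(obj) ((void)(obj))
#  define ANNOTATE_HAPPENS_AFTER(obj)  ((void)(obj))
#endif

class Registry {  // hypothetical singleton
public:
    static Registry& instance() {
        static Registry inst;           // thread-safe static initialization
        ANNOTATE_HAPPENS_AFTER(&inst);  // reader side: construction is visible here
        return inst;
    }
    int value() const { return value_; }

private:
    Registry() : value_(42) {
        ANNOTATE_HAPPENS_BEFORE(this);  // writer side: construction has finished
    }
    int value_;
};

// Calls instance() from n threads and reports whether every thread saw the
// same, fully constructed object.
bool all_threads_agree(int n) {
    std::vector<const Registry*> seen(n, nullptr);
    std::vector<int> values(n, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, &values, i] {
            seen[i] = &Registry::instance();
            values[i] = seen[i]->value();
        });
    for (auto& t : threads) t.join();
    for (int i = 0; i < n; ++i)
        if (seen[i] != seen[0] || values[i] != 42) return false;
    return true;
}
```

Both annotations use the address of the singleton as the synchronization tag, so they only relate the constructor's writes to later reads obtained through instance(). Any other shared state the singleton touches is still checked as usual.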
From: Nikolaus R. <Nik...@ra...> - 2022-02-22 12:31:02
|
Hello,

I'm debugging a C++ program with Helgrind. The program is memcheck clean.

I am getting the following error (trimmed the stacktraces to the relevant parts):

==1910679== Thread #4: lock order "0x4EB0948 before 0x18B920" violated
==1910679==
==1910679== Observed (incorrect) order is: acquisition of lock at 0x18B920
==1910679==    by 0x1405CB: unique_lock (unique_lock.h:68)
==1910679==    by 0x1405CB: steamfs::do_lookup(unsigned long, char const*, fuse_entry_param*) (callbacks.cpp:410)
==1910679==    by 0x1491CA: create (callbacks.cpp:1429)
==1910679==    by 0x1491CA: operator() (callbacks.cpp:1447)
==1910679==
==1910679==  followed by a later acquisition of lock at 0x4EB0948
==1910679==    by 0x14060D: unique_lock (unique_lock.h:68)
==1910679==    by 0x14060D: steamfs::do_lookup(unsigned long, char const*, fuse_entry_param*) (callbacks.cpp:418)
==1910679==    by 0x1491CA: create (callbacks.cpp:1429)
==1910679==    by 0x1491CA: operator() (callbacks.cpp:1447)
==1910679==
==1910679== Required order was established by acquisition of lock at 0x4EB0948
==1910679==    by 0x141648: lock_guard (std_mutex.h:159)
==1910679==    by 0x141648: steamfs::do_readdir(fuse_req*, unsigned long, unsigned long, long, fuse_file_info const*, int) (callbacks.cpp:600)
==1910679==    by 0x14240F: readdirplus (callbacks.cpp:1384)
==1910679==    by 0x14240F: operator() (callbacks.cpp:1389)
==1910679==
==1910679==  followed by a later acquisition of lock at 0x18B920
==1910679==    by 0x1405CB: unique_lock (unique_lock.h:68)
==1910679==    by 0x1405CB: steamfs::do_lookup(unsigned long, char const*, fuse_entry_param*) (callbacks.cpp:410)
==1910679==    by 0x141839: steamfs::do_readdir(fuse_req*, unsigned long, unsigned long, long, fuse_file_info const*, int) (callbacks.cpp:677)
==1910679==    by 0x14240F: readdirplus (callbacks.cpp:1384)
==1910679==    by 0x14240F: operator() (callbacks.cpp:1389)

The problem is that the code that accesses the second lock (0x4EB0948) is not accessing the same variable.

Now, it could of course be that there is a bug somewhere that results in both variables pointing to the same address, but it is much more likely that this is coincidence (since both variables are part of structures that are dynamically allocated and freed).

Is it possible that Helgrind is not recognizing that these are two *different* locks with different lifetimes that just happen to be allocated at the same address?

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.« |
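The coincidence described above (two distinct locks that happen to live at the same address at different times) is easy to reproduce in isolation. The sketch below is only an illustration of address reuse by the allocator. It is not the steamfs code, and it says nothing about how Helgrind tracks lock lifetimes. The names Entry and count_address_reuse are hypothetical.

```cpp
// Sketch: heap objects that each own a std::mutex can end up at the same
// address once the previous object has been freed.
#include <cstdint>
#include <memory>
#include <mutex>

struct Entry {  // hypothetical heap object that owns a lock
    std::mutex lock;
    int payload = 0;
};

// Creates and destroys `rounds` Entry objects one after another and counts how
// often a new object's mutex sits at the address of the one just freed.
int count_address_reuse(int rounds) {
    int reused = 0;
    std::uintptr_t previous = 0;
    for (int i = 0; i < rounds; ++i) {
        auto e = std::make_unique<Entry>();
        {
            std::lock_guard<std::mutex> guard(e->lock);
            e->payload = i;
        }
        const auto addr = reinterpret_cast<std::uintptr_t>(&e->lock);
        if (addr == previous) ++reused;
        previous = addr;
        // e goes out of scope here, so the allocator may hand the same
        // block out again on the next iteration.
    }
    return reused;
}
```

How often reuse happens depends entirely on the allocator in use, so the count may be high, low, or zero. A tool that identifies locks purely by address would see each reuse as one long-lived lock.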