|
From: Rainer G. <rge...@gm...> - 2008-09-22 15:11:22
|
Hi there, I am currently diagnosing a potential race condition. Unfortuantely, this does not happen in my lab, so I can not report first hand. But I got a helgrind log from one of my users. What I see there puzzles me much, probably I am misunderstanding something. First of all, these are the versions used: ==31831== Helgrind, a thread error detector. ==31831== Copyright (C) 2007-2007, and GNU GPL'd, by OpenWorks LLP et al. ==31831== Using LibVEX rev 1854, a library for dynamic binary translation. ==31831== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP. ==31831== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework. ==31831== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al. I see a lot of messages like these: ==31831== Possible data race during read of size 4 at 0x6622E8 ==31831== at 0x50464FE: pthread_mutex_lock (pthread_mutex_lock.c:87) ==31831== by 0x4C2217B: pthread_mutex_lock (hg_intercepts.c:400) ==31831== by 0x420AFC: dbgGetThrdInfo (debug.c:604) ==31831== by 0x42139C: dbgExitFunc (debug.c:1041) ==31831== by 0x4304AF: wtpAdviseMaxWorkers (wtp.c:540) ==31831== by 0x439498: queueAdviseMaxWorkers (queue.c:118) ==31831== by 0x43AB54: queueStart (queue.c:1840) ==31831== by 0x4404F2: actionConstructFinalize (action.c:296) ==31831== by 0x440D4C: addAction (action.c:878) ==31831== by 0x416246: cfline (conf.c:1065) ==31831== by 0x416B6A: processConfFile (conf.c:450) ==31831== by 0x417288: doIncludeLine (conf.c:225) ==31831== Old state: shared-modified by threads #1, #2 ==31831== New state: shared-modified by threads #1, #2 ==31831== Reason: this thread, #1, holds no consistent locks ==31831== Last consistently used lock for 0x6622E8 was first observed ==31831== at 0x4C2463E: pthread_mutex_init (hg_intercepts.c:346) ==31831== by 0x42F366: wtpInitialize (wtp.c:78) ==31831== by 0x42F44E: wtpConstruct (wtp.c:88) ==31831== by 0x43A9C0: queueStart (queue.c:1799) ==31831== by 0x4404F2: actionConstructFinalize (action.c:296) ==31831== by 0x440D4C: addAction (action.c:878) ==31831== by 0x416246: cfline (conf.c:1065) ==31831== by 0x416B6A: processConfFile (conf.c:450) ==31831== by 0x417288: doIncludeLine (conf.c:225) ==31831== by 0x43DD25: doCustomHdlr (cfsysline.c:101) ==31831== by 0x43E180: processCfSysLineCommand (cfsysline.c:719) ==31831== by 0x415E2E: cfsysline (conf.c:362) When I interpret them correctly, they point me to exactly the location where I try to lock the mutex (code line in question is "pthread_mutex_lock(&mutCallStack);"). I have never before seen this in my (short) experience with helgrind. I would appreciate if someone could explain what's going on there. Any help is deeply appreciated. Thanks, Rainer |
|
From: Konstantin S. <kon...@gm...> - 2008-09-22 18:26:36
|
Hi Rainer, helgrind is under active development now in two separate branches (HGDEV & YARD). The version in trunk is outdated. Please try YARD first: svn co svn://svn.valgrind.org/valgrind/branches/YARD YARD cd YARD && \ ./autogen.sh && \ ./configure --prefix=`pwd`/Inst && \ make && \ make install The race report you observe should be suppressed by the suppression files that comes with helgrind. If you'll see this race with YARD, we will have to update the suppression files. --kcc On Mon, Sep 22, 2008 at 7:10 PM, Rainer Gerhards <rge...@gm...> wrote: > Hi there, > > I am currently diagnosing a potential race condition. Unfortuantely, > this does not happen in my lab, so I can not report first hand. But I > got a helgrind log from one of my users. What I see there puzzles me > much, probably I am misunderstanding something. > > First of all, these are the versions used: > > ==31831== Helgrind, a thread error detector. > ==31831== Copyright (C) 2007-2007, and GNU GPL'd, by OpenWorks LLP et al. > ==31831== Using LibVEX rev 1854, a library for dynamic binary translation. > ==31831== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP. > ==31831== Using valgrind-3.3.1-Debian, a dynamic binary > instrumentation framework. > ==31831== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al. > > I see a lot of messages like these: > > ==31831== Possible data race during read of size 4 at 0x6622E8 > ==31831== at 0x50464FE: pthread_mutex_lock (pthread_mutex_lock.c:87) > ==31831== by 0x4C2217B: pthread_mutex_lock (hg_intercepts.c:400) > ==31831== by 0x420AFC: dbgGetThrdInfo (debug.c:604) > ==31831== by 0x42139C: dbgExitFunc (debug.c:1041) > ==31831== by 0x4304AF: wtpAdviseMaxWorkers (wtp.c:540) > ==31831== by 0x439498: queueAdviseMaxWorkers (queue.c:118) > ==31831== by 0x43AB54: queueStart (queue.c:1840) > ==31831== by 0x4404F2: actionConstructFinalize (action.c:296) > ==31831== by 0x440D4C: addAction (action.c:878) > ==31831== by 0x416246: cfline (conf.c:1065) > ==31831== by 0x416B6A: processConfFile (conf.c:450) > ==31831== by 0x417288: doIncludeLine (conf.c:225) > ==31831== Old state: shared-modified by threads #1, #2 > ==31831== New state: shared-modified by threads #1, #2 > ==31831== Reason: this thread, #1, holds no consistent locks > ==31831== Last consistently used lock for 0x6622E8 was first observed > ==31831== at 0x4C2463E: pthread_mutex_init (hg_intercepts.c:346) > ==31831== by 0x42F366: wtpInitialize (wtp.c:78) > ==31831== by 0x42F44E: wtpConstruct (wtp.c:88) > ==31831== by 0x43A9C0: queueStart (queue.c:1799) > ==31831== by 0x4404F2: actionConstructFinalize (action.c:296) > ==31831== by 0x440D4C: addAction (action.c:878) > ==31831== by 0x416246: cfline (conf.c:1065) > ==31831== by 0x416B6A: processConfFile (conf.c:450) > ==31831== by 0x417288: doIncludeLine (conf.c:225) > ==31831== by 0x43DD25: doCustomHdlr (cfsysline.c:101) > ==31831== by 0x43E180: processCfSysLineCommand (cfsysline.c:719) > ==31831== by 0x415E2E: cfsysline (conf.c:362) > > When I interpret them correctly, they point me to exactly the location > where I try to lock the mutex (code line in question is > "pthread_mutex_lock(&mutCallStack);"). I have never before seen this > in my (short) experience with helgrind. > > I would appreciate if someone could explain what's going on there. Any > help is deeply appreciated. > > Thanks, > Rainer > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Valgrind-users mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-users > |
|
From: Julian S. <js...@ac...> - 2008-09-22 20:19:36
|
On Monday 22 September 2008, Konstantin Serebryany wrote: > helgrind is under active development now in two separate branches > (HGDEV & YARD). > The version in trunk is outdated. Please try YARD first: > svn co svn://svn.valgrind.org/valgrind/branches/YARD YARD Yes, do try YARD; you might like it more than the 3.3.1 helgrind. Nevertheless, the error you show: > > ==31831== Possible data race during read of size 4 at 0x6622E8 > > ==31831== at 0x50464FE: pthread_mutex_lock (pthread_mutex_lock.c:87) > > ==31831== by 0x4C2217B: pthread_mutex_lock (hg_intercepts.c:400) > > ==31831== by 0x420AFC: dbgGetThrdInfo (debug.c:604) > [...] should have been suppressed, so you should not see it. It is due to an internal implementation limitation which is hard to avoid, so in some cases we suppress errors which we know cannot be from end-user code. I think the relevant suppression in glibc-2.34567-NPTL-helgrind.supp is: { helgrind-glibc25-101 Helgrind:Race obj:/lib*/libpthread-2.5.*so fun:pthread_* } (or corresponding other versions for glibc > 2.5, in the same file) One possible reason it did not suppress the error is that you have debug information on/for your libpthread, which is an unusual setup that Helgrind has not been properly tested against. But I am not 100% sure about that. * if you uninstall the libc debuginfo rpm/deb, does that change anything? * what glibc version are you using? J |
|
From: Rainer G. <rge...@gm...> - 2008-09-23 07:44:52
|
Thanks for all the information, this is very helpful. But please let me re-iterate that I do not see these messages in my labs but got them in a report back from a user. So I need to talk to him and see if he is willing to do the extra work. Anyhow, your pointers are also helpful for me directly. helgrind was a great aid in sorting out some issues with the code. I'll now try the alternates in my lab and see if they bring up different results. I am pretty sure there is a race condition in my code, I just have not been able to find it yet. Maybe the new tools help me get a grip at them. Once again many thanks for your advise and for the great and extremely useful tools. They really help make open source a superior platform by enabling programmers to spot hard-to-find bugs quite easily. Thanks again, Rainer On Mon, Sep 22, 2008 at 8:54 PM, Julian Seward <js...@ac...> wrote: > > On Monday 22 September 2008, Konstantin Serebryany wrote: > >> helgrind is under active development now in two separate branches >> (HGDEV & YARD). >> The version in trunk is outdated. Please try YARD first: >> svn co svn://svn.valgrind.org/valgrind/branches/YARD YARD > > Yes, do try YARD; you might like it more than the 3.3.1 helgrind. > > Nevertheless, the error you show: > >> > ==31831== Possible data race during read of size 4 at 0x6622E8 >> > ==31831== at 0x50464FE: pthread_mutex_lock (pthread_mutex_lock.c:87) >> > ==31831== by 0x4C2217B: pthread_mutex_lock (hg_intercepts.c:400) >> > ==31831== by 0x420AFC: dbgGetThrdInfo (debug.c:604) >> [...] > > should have been suppressed, so you should not see it. It is due to an > internal implementation limitation which is hard to avoid, so in some > cases we suppress errors which we know cannot be from end-user code. > > I think the relevant suppression in glibc-2.34567-NPTL-helgrind.supp is: > > { > helgrind-glibc25-101 > Helgrind:Race > obj:/lib*/libpthread-2.5.*so > fun:pthread_* > } > > (or corresponding other versions for glibc > 2.5, in the same file) > > One possible reason it did not suppress the error is that you have > debug information on/for your libpthread, which is an unusual setup that > Helgrind has not been properly tested against. > > But I am not 100% sure about that. > > * if you uninstall the libc debuginfo rpm/deb, does that change anything? > > * what glibc version are you using? > > J > |