|
From: <sv...@va...> - 2006-03-19 18:19:22
|
Author: sewardj
Date: 2006-03-19 18:19:11 +0000 (Sun, 19 Mar 2006)
New Revision: 5778
Log:
Yet another essay: document the MPI wrapper library.
Modified:
trunk/docs/xml/manual-core.xml
Modified: trunk/docs/xml/manual-core.xml
===================================================================
--- trunk/docs/xml/manual-core.xml 2006-03-16 11:36:23 UTC (rev 5777)
+++ trunk/docs/xml/manual-core.xml 2006-03-19 18:19:11 UTC (rev 5778)
@@ -2375,4 +2375,361 @@

 </sect1>

+
+<sect1 id="manual-core.mpiwrap" xreflabel="MPI Wrappers">
+<title>Debugging MPI Parallel Programs with Valgrind</title>
+
+<para> Valgrind supports debugging of distributed-memory applications
+which use the MPI message passing standard. This support consists of a
+library of wrapper functions for the
+<computeroutput>PMPI_*</computeroutput> interface. When incorporated
+into the application's address space, either by direct linking or by
+<computeroutput>LD_PRELOAD</computeroutput>, the wrappers intercept
+calls to <computeroutput>PMPI_Send</computeroutput>,
+<computeroutput>PMPI_Recv</computeroutput>, etc. They then
+use client requests to inform Valgrind of memory state changes caused
+by the function being wrapped. This reduces the number of false
+positives that Memcheck otherwise typically reports for MPI
+applications.</para>
+
+<para>The wrappers also take the opportunity to carefully check
+size and definedness of buffers passed as arguments to MPI functions, hence
+detecting errors such as passing undefined data to
+<computeroutput>PMPI_Send</computeroutput>, or receiving data into a
+buffer which is too small.</para>
+
+
+<sect2 id="manual-core.mpiwrap.build" xreflabel="Building MPI Wrappers">
+<title>Building and installing the wrappers</title>
+
+<para> The wrapper library will be built automatically if possible.
+Valgrind's configure script will look for a suitable
+<computeroutput>mpicc</computeroutput> to build it with. This must be
+the same <computeroutput>mpicc</computeroutput> you use to build the
+MPI application you want to debug. By default, Valgrind tries
+<computeroutput>mpicc</computeroutput>, but you can specify a
+different one by using the configure-time flag
+<computeroutput>--with-mpicc=</computeroutput>. Currently the
+wrappers are only buildable with
+<computeroutput>mpicc</computeroutput>s which are based on GNU
+<computeroutput>gcc</computeroutput> or Intel's
+<computeroutput>icc</computeroutput>.</para>
+
+<para>Check that the configure script prints a line like this:</para>
+
+<programlisting><![CDATA[
+checking for usable MPI2-compliant mpicc and mpi.h... yes, mpicc
+]]></programlisting>
+
+<para>If it says <computeroutput>... no</computeroutput>, your
+<computeroutput>mpicc</computeroutput> has failed to compile and link
+a test MPI2 program.</para>
+
+<para>If the configure test succeeds, continue in the usual way with
+<computeroutput>make</computeroutput> and <computeroutput>make
+install</computeroutput>. The final install tree should then contain
+<computeroutput>libmpiwrap.so</computeroutput>.
+</para>
+
+<para>Compile up a test MPI program (eg, MPI hello-world) and try
+this:</para>
+
+<programlisting><![CDATA[
+LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
+ mpirun [args] $prefix/bin/valgrind ./hello
+]]></programlisting>
+
+<para>You should see something similar to the following:</para>
+
+<programlisting><![CDATA[
+valgrind MPI wrappers 31901: Active for pid 31901
+valgrind MPI wrappers 31901: Try MPIWRAP_DEBUG=help for possible options
+]]></programlisting>
+
+<para>repeated for every process in the group. If you do not see
+these, there is a build/installation problem of some kind.</para>
+
+<para> The MPI functions to be wrapped are assumed to be in an ELF
+shared object with soname matching
+<computeroutput>libmpi.so*</computeroutput>. This is known to be
+correct at least for Open MPI and Quadrics MPI, and can easily be
+changed if required.</para>
+</sect2>
+
+
+<sect2 id="manual-core.mpiwrap.gettingstarted"
+ xreflabel="Getting started with MPI Wrappers">
+<title>Getting started</title>
+
+<para>Compile your MPI application as usual, taking care to link it
+using the same <computeroutput>mpicc</computeroutput> that your
+Valgrind build was configured with.</para>
+
+<para>
+Use the following basic scheme to run your application on Valgrind with
+the wrappers engaged:</para>
+
+<programlisting><![CDATA[
+MPIWRAP_DEBUG=[wrapper-args] \
+ LD_PRELOAD=$prefix/lib/valgrind/<platform>/libmpiwrap.so \
+ mpirun [mpirun-args] \
+ $prefix/bin/valgrind [valgrind-args] \
+ [application] [app-args]
+]]></programlisting>
+
+<para>As an alternative to
+<computeroutput>LD_PRELOAD</computeroutput>ing
+<computeroutput>libmpiwrap.so</computeroutput>, you can simply link it
+to your application if desired. This should not disturb native
+behaviour of your application in any way.</para>
+</sect2>
+
+
+<sect2 id="manual-core.mpiwrap.controlling"
+ xreflabel="Controlling the MPI Wrappers">
+<title>Controlling the wrapper library</title>
+
+<para>Environment variable
+<computeroutput>MPIWRAP_DEBUG</computeroutput> is consulted at
+startup. The default behaviour is to print a starting banner</para>
+
+<programlisting><![CDATA[
+valgrind MPI wrappers 16386: Active for pid 16386
+valgrind MPI wrappers 16386: Try MPIWRAP_DEBUG=help for possible options
+]]></programlisting>
+
+<para> and then be relatively quiet.</para>
+
+<para>You can give a list of comma-separated options in
+<computeroutput>MPIWRAP_DEBUG</computeroutput>. These are</para>
+
+<itemizedlist>
+ <listitem>
+ <para><computeroutput>verbose</computeroutput>:
+ show entries/exits of all wrappers. Also show extra
+ debugging info, such as the status of outstanding
+ <computeroutput>MPI_Request</computeroutput>s resulting
+ from uncompleted <computeroutput>MPI_Irecv</computeroutput>s.</para>
+ </listitem>
+ <listitem>
+ <para><computeroutput>quiet</computeroutput>:
+ opposite of <computeroutput>verbose</computeroutput>, only print
+ anything when the wrappers want
+ to report a detected programming error, or in case of catastrophic
+ failure of the wrappers.</para>
+ </listitem>
+ <listitem>
+ <para><computeroutput>warn</computeroutput>:
+ by default, functions which lack proper wrappers
+ are not commented on, just silently
+ ignored. This causes a warning to be printed for each unwrapped
+ function used, up to a maximum of three warnings per function.</para>
+ </listitem>
+ <listitem>
+ <para><computeroutput>strict</computeroutput>:
+ print an error message and abort the program if
+ a function lacking a wrapper is used.</para>
+ </listitem>
+</itemizedlist>
+
+<para> If you want to use Valgrind's XML output facility
+(<computeroutput>--xml=yes</computeroutput>), you should pass
+<computeroutput>quiet</computeroutput> in
+<computeroutput>MPIWRAP_DEBUG</computeroutput> so as to get rid of any
+extraneous printing from the wrappers.</para>
+
+</sect2>
+
+
+<sect2 id="manual-core.mpiwrap.limitations"
+ xreflabel="Abilities and Limitations of MPI Wrappers">
+<title>Abilities and limitations</title>
+
+<sect3>
+<title>Functions</title>
+
+<para>All MPI2 functions except
+<computeroutput>MPI_Wtick</computeroutput>,
+<computeroutput>MPI_Wtime</computeroutput> and
+<computeroutput>MPI_Pcontrol</computeroutput> have wrappers. The
+first two are not wrapped because they return a
+<computeroutput>double</computeroutput>, and Valgrind's
+function-wrap mechanism cannot handle that (it could easily enough be
+extended to). <computeroutput>MPI_Pcontrol</computeroutput> cannot be
+wrapped as it has variable arity:
+<computeroutput>int MPI_Pcontrol(const int level, ...)</computeroutput></para>
+
+<para>Most functions are wrapped with a default wrapper which does
+nothing except complain or abort if it is called, depending on
+settings in <computeroutput>MPIWRAP_DEBUG</computeroutput> listed
+above. The following functions have "real", do-something-useful
+wrappers:</para>
+
+<programlisting><![CDATA[
+PMPI_Send PMPI_Bsend PMPI_Ssend PMPI_Rsend
+
+PMPI_Recv PMPI_Get_count
+
+PMPI_Isend PMPI_Ibsend PMPI_Issend PMPI_Irsend
+
+PMPI_Irecv
+PMPI_Wait PMPI_Waitall
+PMPI_Test PMPI_Testall
+
+PMPI_Iprobe PMPI_Probe
+
+PMPI_Cancel
+
+PMPI_Sendrecv
+
+PMPI_Type_commit PMPI_Type_free
+
+PMPI_Bcast PMPI_Gather PMPI_Scatter PMPI_Alltoall
+PMPI_Reduce PMPI_Allreduce PMPI_Op_create
+
+PMPI_Comm_create PMPI_Comm_dup PMPI_Comm_free PMPI_Comm_rank PMPI_Comm_size
+
+PMPI_Error_string
+PMPI_Init PMPI_Initialized PMPI_Finalize
+]]></programlisting>
+
+<para> A few functions such as
+<computeroutput>PMPI_Address</computeroutput> are listed as
+<computeroutput>HAS_NO_WRAPPER</computeroutput>. They have no wrapper
+at all as there is nothing worth checking, and giving a no-op wrapper
+would reduce performance for no reason.</para>
+
+<para> Note that the wrapper library can itself generate large
+numbers of calls to the MPI implementation, especially when walking
+complex types. The most common functions called are
+<computeroutput>PMPI_Extent</computeroutput>,
+<computeroutput>PMPI_Type_get_envelope</computeroutput>,
+<computeroutput>PMPI_Type_get_contents</computeroutput>, and
+<computeroutput>PMPI_Type_free</computeroutput>. </para>
+</sect3>
+
+<sect3>
+<title>Types</title>
+
+<para> MPI-1.1 structured types are supported, and walked exactly.
+The currently supported combiners are
+<computeroutput>MPI_COMBINER_NAMED</computeroutput>,
+<computeroutput>MPI_COMBINER_CONTIGUOUS</computeroutput>,
+<computeroutput>MPI_COMBINER_VECTOR</computeroutput>,
+<computeroutput>MPI_COMBINER_HVECTOR</computeroutput>,
+<computeroutput>MPI_COMBINER_INDEXED</computeroutput>,
+<computeroutput>MPI_COMBINER_HINDEXED</computeroutput> and
+<computeroutput>MPI_COMBINER_STRUCT</computeroutput>. This should
+cover all MPI-1.1 types. The mechanism (function
+<computeroutput>walk_type</computeroutput>) should extend easily to
+cover MPI2 combiners.</para>
+
+<para>MPI defines some named structured types
+(<computeroutput>MPI_FLOAT_INT</computeroutput>,
+<computeroutput>MPI_DOUBLE_INT</computeroutput>,
+<computeroutput>MPI_LONG_INT</computeroutput>,
+<computeroutput>MPI_2INT</computeroutput>,
+<computeroutput>MPI_SHORT_INT</computeroutput>,
+<computeroutput>MPI_LONG_DOUBLE_INT</computeroutput>) which are pairs
+of some basic type and a C <computeroutput>int</computeroutput>.
+Unfortunately the MPI specification makes it impossible to look inside
+these types and see where the fields are. Therefore these wrappers
+assume the types are laid out as <computeroutput>struct { float val;
+int loc; }</computeroutput> (for
+<computeroutput>MPI_FLOAT_INT</computeroutput>), etc, and act
+accordingly. This appears to be correct at least for Open MPI 1.0.2
+and for Quadrics MPI.</para>
+
+<para>If <computeroutput>strict</computeroutput> is an option specified
+in <computeroutput>MPIWRAP_DEBUG</computeroutput>, the application
+will abort if an unhandled type is encountered. Otherwise, the
+application will print a warning message and continue.</para>
+
+<para>Some effort is made to mark/check memory ranges corresponding to
+arrays of values in a single pass. This is important for performance
+since asking Valgrind to mark/check any range, no matter how small,
+carries quite a large constant cost. This optimisation is applied to
+arrays of primitive types (<computeroutput>double</computeroutput>,
+<computeroutput>float</computeroutput>,
+<computeroutput>int</computeroutput>,
+<computeroutput>long</computeroutput>, <computeroutput>long
+long</computeroutput>, <computeroutput>short</computeroutput>,
+<computeroutput>char</computeroutput>, and <computeroutput>long
+double</computeroutput> on platforms where <computeroutput>sizeof(long
+double) == 8</computeroutput>). For arrays of all other types, the
+wrappers handle each element individually and so there can be a very
+large performance cost.</para>
+
+</sect3>
+
+</sect2>
+
+
+<sect2 id="manual-core.mpiwrap.writingwrappers"
+ xreflabel="Writing new MPI Wrappers">
+<title>Writing new wrappers</title>
+
+<para>
+For the most part the wrappers are straightforward. The only
+significant complexity arises with nonblocking receives.</para>
+
+<para>The issue is that <computeroutput>MPI_Irecv</computeroutput>
+states the recv buffer and returns immediately, giving a handle
+(<computeroutput>MPI_Request</computeroutput>) for the transaction.
+Later the user will have to poll for completion with
+<computeroutput>MPI_Wait</computeroutput> etc, and when the
+transaction completes successfully, the wrappers have to paint the
+recv buffer. But the recv buffer details are not presented to
+<computeroutput>MPI_Wait</computeroutput> -- only the handle is. The
+library therefore maintains a shadow table which associates
+uncompleted <computeroutput>MPI_Request</computeroutput>s with the
+corresponding buffer address/count/type. When an operation completes,
+the table is searched for the associated address/count/type info, and
+memory is marked accordingly.</para>
+
+<para>Access to the table is guarded by a (POSIX pthreads) lock, so as
+to make the library thread-safe.</para>
+
+<para>The table is allocated with
+<computeroutput>malloc</computeroutput> and never
+<computeroutput>free</computeroutput>d, so it will show up in leak
+checks.</para>
+
+<para>Writing new wrappers should be fairly easy. The source file is
+<computeroutput>auxprogs/libmpiwrap.c</computeroutput>. If possible,
+find an existing wrapper for a function of similar behaviour to the
+one you want to wrap, and use it as a starting point. The wrappers
+are organised in sections in the same order as the MPI 1.1 spec, to
+aid navigation. When adding a wrapper, remember to comment out the
+definition of the default wrapper in the long list of defaults at the
+bottom of the file (do not remove it, just comment it out).</para>
+</sect2>
+
+<sect2 id="manual-core.mpiwrap.whattoexpect"
+ xreflabel="What to expect with MPI Wrappers">
+<title>What to expect when using the wrappers</title>
+
+<para>The wrappers should reduce Memcheck's false-error rate on MPI
+applications. Because the wrapping is done at the MPI interface,
+there will still potentially be a large number of errors reported in
+the MPI implementation below the interface. The best you can do is
+try to suppress them.</para>
+
+<para>You may also find that the input-side (buffer
+length/definedness) checks find errors in your MPI use, for example
+passing too short a buffer to
+<computeroutput>MPI_Recv</computeroutput>.</para>
+
+<para>Functions which are not wrapped may increase the false
+error rate. A possible approach is to run with
+<computeroutput>MPIWRAP_DEBUG</computeroutput> containing
+<computeroutput>warn</computeroutput>. This will show you functions
+which lack proper wrappers but which are nevertheless used. You can
+then write wrappers for them.
+</para>
+
+</sect2>
+
+</sect1>
+
</chapter>
|
|
From: Bart V. A. <bar...@gm...> - 2006-03-19 18:15:55
|
> > > No. I was trying to say that wrapping is thread safe - each thread's
> > > wrapping activities are completely independent. If a wrapper function
> > > accesses global data then perhaps it does need locking, but that's
> > > just standard requirements for multithreaded programming.
> >
> > But Valgrind forces programs to run single-threaded, right? So no locking
> > should be necessary in wrapper functions provided by tools?
>
> That doesn't sound right to me, valgrind only forces processes to run
> single-threaded in the same way as running them on a UP box does. It's
> not possible to predict when or prevent another thread from starting
> hence locking of global state will be needed.

I will try to rephrase my question. Suppose that a tool is being used that
instruments memory accesses, and suppose that the following happens:
- In the thread where the instrumented code is running, a memory access
  occurs and a tool function is called, e.g. drd_trace_load(), such that the
  tool knows about the memory access and can update its state information.
- In any other thread pthread_mutex_lock() is called. Suppose that
  pthread_mutex_lock() is redirected to a function in vg_preloaded.c, and
  that this redirected function (still client code) invokes the client
  request VG_USERREQ__PRE_PTHREAD_MUTEX_LOCK. This will invoke the scheduler
  (coregrind/m_scheduler/scheduler.c). Suppose that this scheduler calls
  VG_TRACK(pre_mutex_lock).
- From the previous discussion I conclude that drd_trace_load() and
  drd_track_pre_mutex_lock() can run "simultaneously" and hence have to
  arbitrate access to shared state information via locking. However, it is
  not acceptable for performance reasons to introduce locking each time
  drd_trace_load() is called.

Is there a way to ensure that drd_trace_load() and
drd_track_pre_mutex_lock() do not run simultaneously, e.g. by putting the
semaphore run_sema down before drd_track_pre_mutex_lock() is called?
|
From: Bart V. A. <bar...@gm...> - 2006-03-19 17:54:35
|
On 3/18/06, Julian Seward <js...@ac...> wrote:
> On Tuesday 14 March 2006 18:01, Bart Van Assche wrote:
> > I'm using Valgrind 3.1.0 with MontaVista Linux Professional 3.1
> > and it works great.
>
> Did you try 3.1.1 ? I'm hoping I didn't break anything in the
> 3.1.0 -> 3.1.1 transition, and there are some worthwhile ppc32
> fixes in 3.1.1.

We are migrating from MontaVista Linux Professional 3.1 to 4.0 (2.6 kernel).
Maybe there will be some time to deploy Valgrind 3.1.1 after that transition.
|
From: Bart V. A. <bar...@gm...> - 2006-03-19 17:48:25
|
On 3/19/06, Julian Seward <js...@ac...> wrote:
> Good. Is it possible to see also the contents of drd, enough
> that we can actually run the system? Do you also have some
> small test programs which demonstrate races etc that drd can find?

There is still some work to do - I'd like to realize the first three items
before I send you the tool:
- Implement segment merging and detection of obsolete segments (see also the
  DIOTA paper) -- the memory use of the tool keeps increasing each time a
  pthread function is called.
- Test the tool with nontrivial applications.
- Implement support for reuse of thread-ID's - currently it is assumed that
  thread-ID's are not reused.
- Free the memory allocated (inside the tool) for mutex state information if
  a mutex is no longer in use.
- Error reporting: call stacks of both conflicting accesses.
- Writing test programs for the drd tool.

> > Some difficulties I encountered:
> > - I need the thread ID of the joined thread for
> >   track_post_pthread_join(). There are two difficulties:
> >   * There are applications that call pthread_join with zero as the first
> >     argument (i.e. thread not specified).
>
> Are you sure? POSIX doesn't appear to say anything about that:
> http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_join.html
> The impression I get from that URL and also the Linux man page is
> that the joined thread must be specified.

Thanks for the link -- I won't try to support pthread_join(0). By the way,
you reminded me of the fact that pthread_t does not have to be a scalar
datatype -- even assuming that (pthread_t)0 is not a valid thread ID is
nonportable.

> Precisely what information do you need to handle pthread_join
> correctly? Getting the tid is tricky because there could be
> a race condition (V reallocates the same lwp_tid to a new
> thread before you get the required info from the scheduler)
> so we will have to be careful with that.

I think I have a solution for this race condition: I removed the assignment
"tst->status = VgTs_Empty;" from run_a_thread_NORETURN() and moved it to
scheduler.c, just after the point where VG_TRACK(post_thread_join) is called
from the pthread_join wrapper. But I'm afraid this will break calls to
clone() that do not originate from pthread_create()?

Note: it's no problem for the drd tool that pthread_t thread ID's are reused
before track_post_thread_join() is called, it only must be ensured that V's
ThreadId is not reused after pthread_join() finishes and before
track_post_thread_join() is called.

> > - For each mutex, I need the following information: recursion count
> >   (depth of recursive locking) and at the time pthread_mutex_lock() is
> >   called, the thread ID of the last thread that called
> >   pthread_mutex_unlock(). This information is now stored in my tool. Is
> >   this the right place, or should this information be managed by the
> >   Valgrind core such that it is also accessible by Helgrind ?
>
> Maybe this stuff should be in m_pthreadmodel.c. I think you
> will have to make friends with that module to be really successful
> with drd.

By the way, this state information is already managed by the drd tool. This
state information is updated inside the mutex tracking functions. It would be
nice if that stuff would be in m_pthreadmodel.c, but for me it's not
essential.
|
From: <js...@ac...> - 2006-03-19 11:23:53
|
Nightly build on minnie ( SuSE 10.0, ppc32 ) started at 2006-03-19 02:00:02 GMT
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 194 tests, 11 stderr failures, 5 stdout failures =================
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/leakotron (stdout)
memcheck/tests/mempool (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/sigaltstack (stderr)
memcheck/tests/stack_changes (stdout)
memcheck/tests/stack_changes (stderr)
memcheck/tests/xml1 (stderr)
none/tests/faultstatus (stderr)
none/tests/mremap (stderr)
none/tests/ppc32/jm-fp (stdout)
none/tests/ppc32/jm-fp (stderr)
none/tests/ppc32/test_fx (stdout)
none/tests/ppc32/test_fx (stderr)
none/tests/ppc32/test_gx (stdout)
|
From: Julian S. <js...@ac...> - 2006-03-19 11:14:39
|
> I think it should be comparing against VG_(client_rlimit_data).rlim_max
> to make sure you aren't trying to raise either limit above the current
> hard limit.

I'm not quite clear what should be compared against what. Perhaps better
if you fix it.

J
|
From: Tom H. <to...@co...> - 2006-03-19 08:37:15
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
> syswrap-generic.c, function PRE(sys_setrlimit), has these:
>
> ((struct vki_rlimit *)ARG2)->rlim_max > ((struct vki_rlimit *)ARG2)->rlim_max)
>
> ((struct vki_rlimit *)ARG2)->rlim_max > ((struct vki_rlimit *)ARG2)->rlim_max)
>
> (lines 5074/5084 respectively). x > x is always False.
>
> Anybody have any idea why there are here / what they should be?
> 'svn ann' says I was the last person to edit them, but I have
> no memory of messing with this code :-)
It was actually me that put that in (revision 345677 in the old
repository) and it does look wrong.
> It looks like it might be some kind of typo, but I'm not
> sure what.
I think it should be comparing against VG_(client_rlimit_data).rlim_max
to make sure you aren't trying to raise either limit above the current
hard limit.
The stack limit code below it has the same problem.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: <js...@ac...> - 2006-03-19 04:49:00
|
Nightly build on phoenix ( SuSE 10.0 ) started at 2006-03-19 03:30:01 GMT
Checking out vex source tree ... done
Building vex ... done
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 225 tests, 6 stderr failures, 0 stdout failures =================
memcheck/tests/leak-tree (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
|
From: <js...@ac...> - 2006-03-19 03:55:02
|
Nightly build on g5 ( YDL 4.0, ppc970 ) started at 2006-03-19 04:40:00 CET
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 199 tests, 6 stderr failures, 2 stdout failures =================
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/leakotron (stdout)
memcheck/tests/pointer-trace (stderr)
none/tests/faultstatus (stderr)
none/tests/fdleak_fcntl (stderr)
none/tests/mremap (stderr)
none/tests/ppc32/mftocrf (stdout)
|
From: Tom H. <to...@co...> - 2006-03-19 03:44:01
|
Nightly build on dunsmere ( athlon, Fedora Core 4 ) started at 2006-03-19 03:30:07 GMT
Results differ from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 227 tests, 8 stderr failures, 1 stdout failure =================
memcheck/tests/leak-tree (stderr)
memcheck/tests/mempool (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
=================================================
== Results from 24 hours ago ==
=================================================
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 227 tests, 7 stderr failures, 1 stdout failure =================
memcheck/tests/leak-tree (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
=================================================
== Difference between 24 hours ago and now ==
=================================================
*** old.short Sun Mar 19 03:37:11 2006
--- new.short Sun Mar 19 03:43:56 2006
***************
*** 8,11 ****
! == 227 tests, 7 stderr failures, 1 stdout failure =================
  memcheck/tests/leak-tree (stderr)
  memcheck/tests/pointer-trace (stderr)
--- 8,12 ----
! == 227 tests, 8 stderr failures, 1 stdout failure =================
  memcheck/tests/leak-tree (stderr)
+ memcheck/tests/mempool (stderr)
  memcheck/tests/pointer-trace (stderr)
|
From: Tom H. <th...@cy...> - 2006-03-19 03:37:57
|
Nightly build on gill ( x86_64, Fedora Core 2 ) started at 2006-03-19 03:00:03 GMT
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 249 tests, 7 stderr failures, 2 stdout failures =================
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/amd64/faultstatus (stderr)
none/tests/fdleak_fcntl (stderr)
none/tests/tls (stdout)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
|
From: Tom H. <th...@cy...> - 2006-03-19 03:33:01
|
Nightly build on alvis ( i686, Red Hat 7.3 ) started at 2006-03-19 03:15:04 GMT
Results unchanged from 24 hours ago
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed
Regression test results follow
== 226 tests, 21 stderr failures, 1 stdout failure =================
memcheck/tests/addressable (stderr)
memcheck/tests/badjump (stderr)
memcheck/tests/describe-block (stderr)
memcheck/tests/erringfds (stderr)
memcheck/tests/leak-0 (stderr)
memcheck/tests/leak-cycle (stderr)
memcheck/tests/leak-regroot (stderr)
memcheck/tests/leak-tree (stderr)
memcheck/tests/match-overrun (stderr)
memcheck/tests/mempool (stderr)
memcheck/tests/partial_load_dflt (stderr)
memcheck/tests/partial_load_ok (stderr)
memcheck/tests/partiallydefinedeq (stderr)
memcheck/tests/pointer-trace (stderr)
memcheck/tests/sigkill (stderr)
memcheck/tests/stack_changes (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
memcheck/tests/xml1 (stderr)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
From: Tom H. <th...@cy...> - 2006-03-19 03:25:40
Nightly build on dellow ( x86_64, Fedora Core 4 ) started at 2006-03-19 03:10:07 GMT
Results differ from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed

Regression test results follow

== 249 tests, 5 stderr failures, 1 stdout failure =================
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/amd64/faultstatus (stderr)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)

=================================================
== Results from 24 hours ago ==
=================================================
Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed

Regression test results follow

== 249 tests, 6 stderr failures, 1 stdout failure =================
memcheck/tests/pointer-trace (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/amd64/faultstatus (stderr)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)

=================================================
== Difference between 24 hours ago and now ==
=================================================
*** old.short	Sun Mar 19 03:18:28 2006
--- new.short	Sun Mar 19 03:25:29 2006
***************
*** 8,11 ****
! == 249 tests, 6 stderr failures, 1 stdout failure =================
! memcheck/tests/pointer-trace (stderr)
  memcheck/tests/x86/scalar (stderr)
--- 8,10 ----
! == 249 tests, 5 stderr failures, 1 stdout failure =================
  memcheck/tests/x86/scalar (stderr)
From: Tom H. <th...@cy...> - 2006-03-19 03:24:46
Nightly build on aston ( x86_64, Fedora Core 3 ) started at 2006-03-19 03:05:09 GMT
Results unchanged from 24 hours ago

Checking out valgrind source tree ... done
Configuring valgrind ... done
Building valgrind ... done
Running regression tests ... failed

Regression test results follow

== 249 tests, 6 stderr failures, 1 stdout failure =================
memcheck/tests/stack_switch (stderr)
memcheck/tests/x86/scalar (stderr)
memcheck/tests/x86/scalar_supp (stderr)
memcheck/tests/x86/sse1_memory (stdout)
none/tests/amd64/faultstatus (stderr)
none/tests/x86/faultstatus (stderr)
none/tests/x86/int (stderr)
From: Julian S. <js...@ac...> - 2006-03-19 03:03:37
> I am now one step further: my drd tool is now notified about
> thread creation, thread termination and mutex locking / unlocking. The
> list of conflicting accesses shortened significantly. I have attached
> the svn diffs against version 1594:5748M.

Good. Is it possible to see also the contents of drd, enough that we
can actually run the system? Do you also have some small test programs
which demonstrate races etc that drd can find?

> Can someone please review/comment on the changes I made ?

They seem plausible. My view is, this first phase is to construct a
proof-of-concept patch which we can play with a bit, to see how well
it works. If that looks good then the next stage is to consider the
cleanest way to integrate it. As a result, in this first stage it
doesn't matter much if there are ugly infrastructure hacks.

> Some difficulties I encountered:
> - I need the thread ID of the joined thread for
> track_post_pthread_join(). There are two difficulties:
> * There are applications that call pthread_join with zero as the first
> argument (i.e. thread not specified).

Are you sure? POSIX doesn't appear to say anything about that:

  http://www.opengroup.org/onlinepubs/007908799/xsh/pthread_join.html

The impression I get from that URL and also the Linux man page is that
the joined thread must be specified.

> * Even if the first argument of pthread_join() is nonzero,
> VG_(get_lwp_tid)() cannot be called since this information is cleaned
> up as soon as the thread stops.

I guess that's part of what m_pthreadmodel.c is supposed to do:

  /* [...] One tricky problem we need to solve is the mapping between
     pthread_t identifiers and internal thread identifiers. */

Precisely what information do you need to handle pthread_join
correctly? Getting the tid is tricky because there could be a race
condition (V reallocates the same lwp_tid to a new thread before you
get the required info from the scheduler) so we will have to be
careful with that.
> - For each mutex, I need the following information: recursion count
> (depth of recursive locking) and at the time pthread_mutex_lock() is
> called, the thread ID of the last thread that called
> pthread_mutex_unlock(). This information is now stored in my tool. Is
> this the right place, or should this information be managed by the
> Valgrind core such that it is also accessible by Helgrind ?

Maybe this stuff should be in m_pthreadmodel.c. I think you will have
to make friends with that module to be really successful with drd.

> - To be implemented: a notification when either
> pthread_mutex_destroy() is called or the mutex memory is freed (POSIX
> mutexes do not have to be initialized / destroyed via
> pthread_mutex_init() / pthread_mutex_destroy()). I have to investigate
> this further.

Could be expensive to check all frees etc to know when mutex memory is
destroyed.

J
From: Julian S. <js...@ac...> - 2006-03-19 00:50:37
syswrap-generic.c, function PRE(sys_setrlimit), has these:
  ((struct vki_rlimit *)ARG2)->rlim_max > ((struct vki_rlimit *)ARG2)->rlim_max)
  ((struct vki_rlimit *)ARG2)->rlim_max > ((struct vki_rlimit *)ARG2)->rlim_max)
(lines 5074/5084 respectively). x > x is always False.
Anybody have any idea why they are here / what they should be?
'svn ann' says I was the last person to edit them, but I have
no memory of messing with this code :-)
It looks like it might be some kind of typo, but I'm not
sure what.
J