|
From: Ashley P. <as...@qu...> - 2007-03-09 19:16:01
|
Hi,
I've been thinking about this patch for a week or two now because of
problems with the a couple of hand-rolled memcpy() routines we have, a
lot of our code is clean on 64 bit platforms when the
--partial-loads-ok=yes flag is used but fails with short buffer overruns
on i686, this patch was an attempt to fix that.
Unfortunatly in practice it only seems to fix a small % of errors for us
because our copy routines only use dword copies for larger messages and
word copies for smaller messages, I'm still seeing errors of the form
shown if anybody can recommend a way of suppressing them properly. The
key thing is it's a four byte aligned/eight byte misaligned word read
immediately after a chunk of addressable memory and the suppressions
format doesn't let me specify this.
RMS_RANK 0
Invalid read of size 4
at 0x4305C19: elan_tportTxStart (tportTx.c:299)
by 0x4245705: MPID_ELAN_SendContig (adi2send.c:70)
by 0x424566F: MPID_SendContig (adi2send.c:139)
by 0x4242044: MPID_SendDatatype (adi2hsend.c:66)
by 0x426A595: PMPI_Send (send.c:91)
by 0x8048F46: main (sendvector.c:94)
Address 0x5f0f41c is 0 bytes after a block of size 20 alloc'd
at 0x401B742: malloc (vg_replace_malloc.c:207)
by 0x42443D7: MPID_PackMessage (adi2mpack.c:37)
by 0x424200C: MPID_SendDatatype (adi2hsend.c:62)
by 0x426A595: PMPI_Send (send.c:91)
by 0x8048F46: main (sendvector.c:94)
Also the comment in mc_include.h where it says "default: YES" appears
incorrect.
Ashley,
$ svn diff memcheck/mc_main.c memcheck/mc_include.h include/pub_tool_debuginfo.h
Index: memcheck/mc_main.c
===================================================================
--- memcheck/mc_main.c (revision 6637)
+++ memcheck/mc_main.c (working copy)
@@ -1160,6 +1160,7 @@
SizeT n_addrs_bad = 0;
Addr ai;
Bool partial_load_exemption_applies;
+ Bool partial_dword_load_exemption_applies;
UChar vbits8;
Bool ok;
@@ -1225,8 +1226,13 @@
= MC_(clo_partial_loads_ok) && szB == VG_WORDSIZE
&& VG_IS_WORD_ALIGNED(a)
&& n_addrs_bad < VG_WORDSIZE;
-
- if (n_addrs_bad > 0 && !partial_load_exemption_applies)
+
+ partial_dword_load_exemption_applies
+ = MC_(clo_partial_dword_loads_ok) && szB == 8
+ && VG_IS_8_ALIGNED(a)
+ && n_addrs_bad < 8;
+
+ if (n_addrs_bad > 0 && !partial_load_exemption_applies && !partial_dword_load_exemption_applies)
mc_record_address_error( VG_(get_running_tid)(), a, szB, False );
return vbits64;
@@ -4367,6 +4373,7 @@
/*------------------------------------------------------------*/
Bool MC_(clo_partial_loads_ok) = False;
+Bool MC_(clo_partial_dword_loads_ok) = False;
Int MC_(clo_freelist_vol) = 5000000;
LeakCheckMode MC_(clo_leak_check) = LC_Summary;
VgRes MC_(clo_leak_resolution) = Vg_LowRes;
@@ -4377,6 +4384,7 @@
static Bool mc_process_cmd_line_options(Char* arg)
{
VG_BOOL_CLO(arg, "--partial-loads-ok", MC_(clo_partial_loads_ok))
+ else VG_BOOL_CLO(arg, "--partial-dword-loads-ok",MC_(clo_partial_dword_loads_ok))
else VG_BOOL_CLO(arg, "--show-reachable", MC_(clo_show_reachable))
else VG_BOOL_CLO(arg, "--workaround-gcc296-bugs",MC_(clo_workaround_gcc296_bugs))
Index: memcheck/mc_include.h
===================================================================
--- memcheck/mc_include.h (revision 6637)
+++ memcheck/mc_include.h (working copy)
@@ -254,6 +254,9 @@
/* Allow loads from partially-valid addresses? default: YES */
extern Bool MC_(clo_partial_loads_ok);
+/* Allow dword loads from partially-valid addresses? default: YES */
+extern Bool MC_(clo_partial_dword_loads_ok);
+
/* Max volume of the freed blocks queue. */
extern Int MC_(clo_freelist_vol);
Index: include/pub_tool_debuginfo.h
===================================================================
--- include/pub_tool_debuginfo.h (revision 6637)
+++ include/pub_tool_debuginfo.h (working copy)
@@ -77,6 +77,8 @@
/* Succeeds if the address is within a shared object or the main executable.
It doesn't matter if debug info is present or not. */
extern Bool VG_(get_objname) ( Addr a, Char* objname, Int n_objname );
+/* Returns the mapped base of the object for a given eip */
+extern Bool VG_(get_objbase) ( Addr a, Addr* base);
/* Puts into 'buf' info about the code address %eip: the address, function
name (if known) and filename/line number (if known), like this:
|
|
From: Julian S. <js...@ac...> - 2007-03-10 12:26:40
|
[Marginally OT ..] On Friday 09 March 2007 19:15, Ashley Pittman wrote: > Hi, > > I've been thinking about this patch for a week or two now because of > problems with the a couple of hand-rolled memcpy() routines we have, a > lot of our code is clean on 64 bit platforms when the > --partial-loads-ok=3Dyes flag is used but fails with short buffer overruns > on i686, this patch was an attempt to fix that. > [...] I wonder if eventually we will have to deal with a more general version of this problem or something closely related, as a result of gcc-4.3. =46rom http://gcc.gnu.org/gcc-4.3/changes.html: Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. Th= is results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints. [...] The current strategy (which is also used by Ashley's patch) is: * With --partial-loads-ok=3Dno [the default] for a partially or completely invalid read, report a bad-address error and return 'defined' for the definedness (V) bits corresponding to the invalid addresses. That sounds strange but avoids error cascades. * With --partial-loads-ok=3Dyes, act as above, but don't emit an error under certain circumstances as defined by partial_load_exemption_applies in the code. Problem with this is, since the V bits from the unaddressible areas are set to 'defined', if the program goes on to use data read from such areas (eg beyond block ends) there will be no warning. Perhaps a better fix is to set those V bits to 'undefined' in that case. iow, for a read which partially overlaps the end of a block: =2D With --partial-loads-ok=3Dno [the default] give an addressing error, and set the corresponding V bits to 'defined' =2D With --partial-loads-ok=3Dyes do not give an addressing error, and set the corresponding V bits to 'undefined', so we yelp later if anybody uses them If this sounds all a bit arcane, that's because it is :-) The last part of Sec 2.7 of http://www.valgrind.org/docs/memcheck2005.pdf might help. J |
|
From: Ashley P. <as...@qu...> - 2007-03-12 12:18:39
|
On Sat, 2007-03-10 at 12:24 +0000, Julian Seward wrote: > [Marginally OT ..] I wouldn't have said so. > Perhaps a better fix is to set those V bits to 'undefined' in that case. > > iow, for a read which partially overlaps the end of a block: > > - With --partial-loads-ok=no [the default] > give an addressing error, and set the corresponding V bits to 'defined' > > - With --partial-loads-ok=yes > do not give an addressing error, and set the corresponding V bits > to 'undefined', so we yelp later if anybody uses them > > If this sounds all a bit arcane, that's because it is :-) The last part > of Sec 2.7 of http://www.valgrind.org/docs/memcheck2005.pdf might help. I had to read that two or three times to understand it but yes I think you are right, it's probably "OK" to set the Vbits to defined if a error was raised when they were read but if the error was suppressed then they should be set to undefined. This highlights the sensible but not always obvious approach that when diagnosing programs with memcheck errors the errors should be tackled in chronological order rather than in order of severity, in particular earlier "invalid read" errors could have the effect of hiding later "undefined behaviour" errors. Ashley, |
|
From: Julian S. <js...@ac...> - 2007-03-10 12:37:51
|
I presume you're meaning 32-bit for 'word' and 64-bit for 'dword' here, and this is on a 32-bit target. > Unfortunatly in practice it only seems to fix a small % of errors for us > because our copy routines only use dword copies for larger messages and > word copies for smaller messages, I'm still seeing errors of the form > shown if anybody can recommend a way of suppressing them properly. The > key thing is it's a four byte aligned/eight byte misaligned word read > immediately after a chunk of addressable memory and the suppressions > format doesn't let me specify this. > > RMS_RANK 0 > Invalid read of size 4 > at 0x4305C19: elan_tportTxStart (tportTx.c:299) > by 0x4245705: MPID_ELAN_SendContig (adi2send.c:70) > by 0x424566F: MPID_SendContig (adi2send.c:139) > by 0x4242044: MPID_SendDatatype (adi2hsend.c:66) > by 0x426A595: PMPI_Send (send.c:91) > by 0x8048F46: main (sendvector.c:94) > Address 0x5f0f41c is 0 bytes after a block of size 20 alloc'd > at 0x401B742: malloc (vg_replace_malloc.c:207) > by 0x42443D7: MPID_PackMessage (adi2mpack.c:37) > by 0x424200C: MPID_SendDatatype (adi2hsend.c:62) > by 0x426A595: PMPI_Send (send.c:91) > by 0x8048F46: main (sendvector.c:94) Um, this is a bit hard to think about without a specific example to prod at. Could you make one? J |
|
From: Ashley P. <as...@qu...> - 2007-03-12 18:14:56
|
On Sat, 2007-03-10 at 12:35 +0000, Julian Seward wrote:
> I presume you're meaning 32-bit for 'word' and 64-bit for 'dword' here,
> and this is on a 32-bit target.
Yes, word=4 bytes, d(ouble)word=8 bytes.
> > Unfortunatly in practice it only seems to fix a small % of errors for us
> > because our copy routines only use dword copies for larger messages and
> > word copies for smaller messages, I'm still seeing errors of the form
> > shown if anybody can recommend a way of suppressing them properly. The
> > key thing is it's a four byte aligned/eight byte misaligned word read
> > immediately after a chunk of addressable memory and the suppressions
> > format doesn't let me specify this.
> >
> > RMS_RANK 0
> > Invalid read of size 4
> > at 0x4305C19: elan_tportTxStart (tportTx.c:299)
> > by 0x4245705: MPID_ELAN_SendContig (adi2send.c:70)
> > by 0x424566F: MPID_SendContig (adi2send.c:139)
> > by 0x4242044: MPID_SendDatatype (adi2hsend.c:66)
> > by 0x426A595: PMPI_Send (send.c:91)
> > by 0x8048F46: main (sendvector.c:94)
> > Address 0x5f0f41c is 0 bytes after a block of size 20 alloc'd
> > at 0x401B742: malloc (vg_replace_malloc.c:207)
> > by 0x42443D7: MPID_PackMessage (adi2mpack.c:37)
> > by 0x424200C: MPID_SendDatatype (adi2hsend.c:62)
> > by 0x426A595: PMPI_Send (send.c:91)
> > by 0x8048F46: main (sendvector.c:94)
>
> Um, this is a bit hard to think about without a specific example
> to prod at. Could you make one?
Assume that shm is a shared memory fifo and the process copying out
knows to read the size from the correct slot and only loads the relevant
data. In our experiance the use of a copy routine like this is
significantly faster than memcpy() in the cases we care about.
The code complains on i686 and x86_64 normally, --partial-loads-ok=yes
allows it to run cleanly on x86_64 but it still reports errors on i686.
Originally I had suppressions for the warning but then I realised that
it suppressed the case where invalid addresses were being passed to
send_data()
#include <inttypes.h>
#define ELAN_ALIGNUP(x,a) ((typeof(x))(((uintptr_t)(x) + ((a)-1)) &
(-(a)))) /* 'a' power of 2 */
uint64_t *shm;
void copy (void *dst, void *src, int size) {
if ( (int)src & 7 ) {
memcpy(dst,src,size);
return;
}
int ndwords = (ELAN_ALIGNUP(size, 8) >> 3);
int i,idx;
uint64_t *s = src;
uint64_t *d = dst;
idx = 0;
for ( i = 0 ; i < ndwords ; i++ )
d[idx++] = s[i];
}
int send_data (int idx, void *base, int size) {
shm[0] = idx;
shm[1] = size;
copy(&shm[2],base,size);
}
int main () {
shm = malloc(1024*64);
int i;
for ( i = 0 ; i < 32 ; i++ ) {
char *buf = malloc(i);
send_data(42,buf,i);
free(buf);
}
return 0;
}
|