|
From: Duncan S. <bal...@fr...> - 2005-05-03 11:23:59
|
From the manual:

"3.3.6 Overlapping source and destination blocks

The following C library functions copy some data from one memory block to
another (or something similar): memcpy(), strcpy(), strncpy(), strcat(),
strncat(). The blocks pointed to by their src and dst pointers aren't allowed
to overlap. Memcheck checks for this.

For example:

==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492==    at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492==    by 0x804865A: main (overlap.c:40)
==27492==    by 0x40246335: __libc_start_main (../sysdeps/generic/libc-start.c:129)
==27492==    by 0x8048470: (within /auto/homes/njn25/grind/head6/memcheck/tests/overlap)
==27492==

You don't want the two blocks to overlap because one of them could get
partially trashed by the copying."

Maybe it's worth giving more of an explanation here. I've noticed while
floating around on the internet that people think valgrind is just being
pedantic when it gives this warning and target < source. There are two
problems:

(1) if copying is done from largest address to smallest address. I don't
know of any memcpy that does this.

(2) if memcpy zeroes out the target before copying. This has been shown to
improve the performance of memcpy on some Intel architectures, due to cache
effects. Of course it is fatal if there is any overlap between source and
target. Most memcpy implementations don't do this kind of trick, but it's
worth keeping in mind.

All the best,

Duncan.
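[Editorial sketch, not part of the original message.] Problem (1) can be reproduced in a few lines of C. The helper names below (ranges_overlap, copy_high_to_low, strict_memcpy) are hypothetical and written only for illustration; they are not Memcheck's or any libc's code. The sketch assumes a byte-wise copy that walks from the highest address to the lowest, and an overlap test similar in spirit to the one the manual describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [dst, dst+n) and [src, src+n) share a byte. */
int ranges_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if (n == 0)
        return 0;
    return d < s ? (s - d) < n : (d - s) < n;
}

/* Naive copy that walks from the highest address down to the lowest,
   i.e. the direction described in problem (1). Illustration only. */
void copy_high_to_low(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n > 0) {
        n--;
        dst[n] = src[n];
    }
}

/* Hypothetical wrapper: refuse overlapping blocks, then defer to memcpy. */
void *strict_memcpy(void *dst, const void *src, size_t n)
{
    assert(!ranges_overlap(dst, src, n));
    return memcpy(dst, src, n);
}
```

With buf holding "abcdefgh", copy_high_to_low(buf, buf + 2, 6) has target < source and leaves "ghghghgh" in the buffer rather than the intended "cdefghgh", because the tail of the source is overwritten before it is read. In the manual's example the two addresses are 20 bytes apart and the length is 21, so ranges_overlap would report a one-byte overlap there.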
|
From: Jeremy F. <je...@go...> - 2005-05-06 09:11:59
|
Duncan Sands wrote:
>Maybe it's worth giving more of an explanation here. I've noticed while floating around on the internet
>that people think valgrind is just being pedantic when giving this warning and target < source. There are
>two problems:
>
>(1) if copying is done from largest address to smallest address. I don't know of any memcpy that does this.
>

Some optimisation guide (AMD? Via?) recommends it for some cases.

>(2) if memcpy zeroes out the target before copying. This has been shown to improve the performance of memcpy
>on some intel architectures, due to cache effects. Of course it is fatal if there is any overlap between
>source and target. Most memcpy's don't do this kind of trick, but it's worth keeping in mind.
>

I think it's common on the PPC, because the dcbz instruction zeros a
cache line in preparation for the destination write (this is based on
5-year-old PPC experience, so I don't know what's best on a modern
implementation).

J
|
From: Nicholas N. <nj...@cs...> - 2005-07-19 00:48:35
|
On Thu, 5 May 2005, Jeremy Fitzhardinge wrote:

>> Maybe it's worth giving more of an explanation here. I've noticed while floating around on the internet
>> that people think valgrind is just being pedantic when giving this warning and target < source. There are
>> two problems:
>>
>> (1) if copying is done from largest address to smallest address. I don't know of any memcpy that does this.
>
> Some optimisation guide (AMD? Via?) recommends it for some cases.
>
>> (2) if memcpy zeroes out the target before copying. This has been shown to improve the performance of memcpy
>> on some intel architectures, due to cache effects. Of course it is fatal if there is any overlap between
>> source and target. Most memcpy's don't do this kind of trick, but it's worth keeping in mind.
>
> I think it's common on the PPC, because the dcbz instruction zeros a
> cache line in preparation for the destination write (this is based on 5
> year old PPC experience, so I don't know what's best on a modern
> implementation).

I've updated the 3.0 docs for this.

N
|
From: Julian S. <js...@ac...> - 2005-07-19 07:45:11
|
> >> (2) if memcpy zeroes out the target before copying. This has been shown
> >> to improve the performance of memcpy on some intel architectures, due to
> >> cache effects. Of course it is fatal if there is any overlap between
> >> source and target. Most memcpy's don't do this kind of trick, but it's
> >> worth keeping in mind.
> >
> > I think it's common on the PPC, because the dcbz instruction zeros a
> > cache line in preparation for the destination write (this is based on 5
> > year old PPC experience, so I don't know what's best on a modern
> > implementation).

Yeh, it allocates the line in L2 without having to fetch it from memory
since we know it's just about to be completely overwritten. Certainly
memset in glibc uses dcbz; not sure about memcpy.

J
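[Editorial sketch, not part of the original message.] Problem (2) can be mimicked in portable C. The function below is a hypothetical stand-in for a dcbz-style copy and is not taken from glibc or any other libc: it clears each destination block and then fills it from the source. The 32-byte LINE value is an assumption chosen for illustration, and the alignment handling a real implementation would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE 32 /* assumed cache-line size, for illustration only */

/* Hypothetical copy that zeroes each destination block before writing it,
   imitating what a dcbz-based routine does to the target cache line. */
void copy_zeroing_dst_first(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    size_t off = 0;

    assert(dst != NULL && src != NULL);
    while (off < n) {
        size_t chunk = (n - off < LINE) ? (n - off) : LINE;

        /* Stand-in for dcbz: wipe the destination block first. */
        memset(dst + off, 0, chunk);

        /* Then copy into it; any source bytes that lived inside the
           block just wiped are already gone. */
        for (size_t i = 0; i < chunk; i++)
            dst[off + i] = src[off + i];

        off += chunk;
    }
}
```

If a 64-byte buffer holds the values 1 to 64 and the call is copy_zeroing_dst_first(buf, buf + 8, 40), so that target < source, the first 24 destination bytes come out as zero: the source bytes at buf + 8 onwards were wiped before they were read. With non-overlapping blocks the result matches an ordinary copy.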