From: ozgun e. <ku...@ho...> - 2004-06-13 03:22:59
>I suppose ba_value is called 15 million times, and you wonder why this
>only makes it to 4.6 million L2 misses? Perhaps the access order to the
>array is not as random as you thought?

I was testing Cachegrind on an executable file (60 MB) constructed from /usr/bin. Trying it on a random file, I get around 99.2% cache misses. This is really amazing! Thanks to Cachegrind, I realized there were some fundamental problems with my assumptions.

Ozgun.
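The gap between the executable file and the random file comes down to access order. The toy model below is a minimal sketch, not Cachegrind's own simulator. It models a set-associative cache with LRU replacement. All parameters are hypothetical: 512 KiB capacity, 64-byte lines, 8 ways, 4-byte words, and a 32 MiB region chosen to echo the size Ozgun mentions. The model compares a sequential walk with uniformly random probes.

```python
import random
from collections import OrderedDict


class SetAssociativeCache:
    """Toy LRU set-associative cache (hypothetical parameters, not Cachegrind's)."""

    def __init__(self, size_bytes=512 * 1024, line_bytes=64, ways=8):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        self.accesses += 1
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)  # hit: mark this line as most recently used
            return
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # set is full: evict the least recently used line
        s[tag] = True

    def miss_rate(self):
        return self.misses / self.accesses if self.accesses else 0.0


def sequential_miss_rate(n_words, word_bytes=4):
    """Touch consecutive words; each 64-byte line misses only on its first word."""
    cache = SetAssociativeCache()
    for i in range(n_words):
        cache.access(i * word_bytes)
    return cache.miss_rate()


def random_miss_rate(n_accesses, region_bytes=32 * 1024 * 1024, word_bytes=4, seed=0):
    """Probe uniformly random words in a region much larger than the cache."""
    rng = random.Random(seed)
    cache = SetAssociativeCache()
    n_words = region_bytes // word_bytes
    for _ in range(n_accesses):
        cache.access(rng.randrange(n_words) * word_bytes)
    return cache.miss_rate()


if __name__ == "__main__":
    print(f"sequential: {sequential_miss_rate(100_000):.2%}")
    print(f"random:     {random_miss_rate(100_000):.2%}")
```

Sequential word accesses miss once per line, so 1 access in 16 misses. Random probes over a region 64 times the size of the modelled cache find their line cached only about 1.6% of the time. That puts the miss rate near the figure reported above.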
From: ozgun e. <ku...@ho...> - 2004-06-14 20:54:52
>Only 27%!? That's really bad, especially if it's L2 -- remember that an L2
>miss can cost 100s of cycles (206 in the worst case on my Athlon).

Is there a website (paper) that summarizes the cost of cache misses? I've read a paper that says the cost of accessing memory is around 15 CPU cycles...

>99%? That is amazing. Either your program is pathologically bad, or
>Cachegrind is doing something wrong.

I don't think Cachegrind is doing anything wrong. You could say I'm (basically) probing random memory that is 32 MB long.

Thanks,
Ozgun.
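A standard way to weigh a miss rate against a miss cost is the average memory access time:

AMAT = hit time + miss rate × miss penalty

The sketch below applies this formula under two assumptions. First, it treats the "27%" as the L2 miss rate. Second, it uses a hypothetical 20-cycle L2 hit time. It then compares the 15-cycle figure from the old paper with the 206-cycle worst case quoted above.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time; hit_time and miss_penalty in CPU cycles."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    L2_HIT_CYCLES = 20   # hypothetical L2 hit latency
    L2_MISS_RATE = 0.27  # assumed to be the "27%" from the thread
    for penalty in (15, 206):  # old-paper figure vs. worst case on the Athlon
        print(f"penalty {penalty:3d} cycles -> AMAT {amat(L2_HIT_CYCLES, L2_MISS_RATE, penalty):.2f} cycles")
```

With a 15-cycle penalty, the misses add about 4 cycles per access. With 206 cycles, they add about 56. This is why a 27% miss rate hurts far more than the older figure suggests.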
From: Nicholas N. <nj...@ca...> - 2004-06-15 08:25:10
On Mon, 14 Jun 2004, ozgun erdogan wrote:

> Is there a website (paper) that summarizes the cost of cache misses? I've
> read a paper that says the cost of accessing the memory is around 15 CPU
> cycles...

There are many. Google for e.g. "cache miss cost" or "cache optimizations".

> I don't think Cachegrind is doing anything wrong. You could say I'm
> (basically) probing random memory that is 32 MB long.

Right, in that case it sounds like Cachegrind is giving the right answers. Can you rethink your data structures? It could make an *enormous* speed difference.

N
From: Sebastian K. <Seb...@so...> - 2004-06-15 10:19:42
ozgun erdogan wrote:
>
>> Only 27%!? That's really bad, especially if it's L2 -- remember that
>> an L2 miss can cost 100s of cycles (206 in the worst case on my Athlon).
>
> Is there a website (paper) that summarizes the cost of cache misses?

You can measure it yourself. For example, this little tool (made by a bunch of benchmark guys, including me), http://www.sf.net/projects/later, will tell you memory latency in ns (translating nanoseconds to CPU cycles is easy) with about 5-10% accuracy. Run it as a user who has the right to raise the process priority.

> I've read a paper that says the cost of accessing the memory is around
> 15 CPU cycles...

It must be 15-20 years old. Today it's around 100-120 cycles minimum on Athlon 64 and about 200-300 on Pentium 4. Those are the good-to-best cases. In the bad case, when a TLB miss occurs, multiply those numbers by 3.

--
Sebastian Kaliszewski
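The nanoseconds-to-cycles conversion Sebastian mentions is cycles = latency in ns × clock rate in GHz. This works because a 1 GHz clock ticks once per nanosecond. The minimal sketch below uses hypothetical inputs. It also applies his rough "multiply by 3" rule of thumb for a TLB miss, which is an estimate from the thread rather than a hardware constant.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to CPU cycles at the given clock rate."""
    # A clock of f GHz completes f cycles per nanosecond.
    return latency_ns * clock_ghz


def with_tlb_miss(cycles: float, factor: float = 3.0) -> float:
    """Apply the rough 'multiply by 3' rule of thumb for a TLB miss."""
    return cycles * factor


if __name__ == "__main__":
    # Hypothetical example: 60 ns measured latency on a 2.0 GHz CPU.
    base = ns_to_cycles(60, 2.0)
    print(base, with_tlb_miss(base))  # 120.0 360.0
```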
From: <mv...@cs...> - 2004-06-19 15:55:59
> You can measure it yourself. For example, this little tool (made by a bunch
> of benchmark guys, including me), http://www.sf.net/projects/later, will
> tell you memory latency in ns (translating nanoseconds to CPU cycles is
> easy) with about 5-10% accuracy. Run it as a user who has the right to
> raise the process priority.

Dear Sebastian,

on Windows, the "later" utility gives me the same time with and without L1. What does this mean? Yes, it is not directly related to Valgrind, but I'm curious.

Thanks!
mario