From: Matthias D. <mat...@we...> - 2001-10-30 18:04:00
Hey there...

]) powered by LINUX, kernel 2.4.13-ac4 ([

I have played all day with the fast_memcpy probing. I realized that it
doesn't produce reliable test results, at least on my machine (PIII 500).

The first time I run probe_fast_memcpy(...), the results are just fine on
my PIII here (mostly SSE or MMXEXT are the fastest). But if I run
probe_fast_memcpy(...) a few more times after that, it always produces
totally different (and a lot worse) results (mostly the linux kernel memcpy
is the fastest). If I try again 30 minutes later, after the machine has
been used a bit, the results are back to those of the first run
(SSE/MMXEXT best).

I have the following lines in probe_fast_memcpy(...) under suspicion:

    t = rdtsc();
    for(j=0;j<100;j++)
      memcpy_method[i].function(buf1,buf2,BUFSIZE);
    t = rdtsc() - t;

Actually just that memcpy line. Earlier in the source we did a...

    /* make sure buffers are present on physical memory */
    memcpy(buf1,buf2,BUFSIZE);

...and now we are repeating that process 100 times with every memcpy
version we have. This might trigger some caching mechanism in the
processor or some optimization in gas... I really don't know. Either way,
we are moving the same data from and to the same place every time.

Changing the lines to...

    t = rdtsc();
    for(j=0;j<50;j++) {
      memcpy_method[i].function(buf2,buf1,BUFSIZE);
      memcpy_method[i].function(buf1,buf2,BUFSIZE);
    }
    t = rdtsc() - t;

...does solve the problem. Like I said, I have no explanation why, but it
works like a charm that way. Now we also really move the data back and
forth, which is closer to reality AFAIK; at least we (usually) won't copy
identical data to the same place many times over within xine.

Please note that I have also changed j<100 to j<50: we now call memcpy
twice per iteration, so 50 iterations give the same 100 copies.

I didn't commit the change to CVS because this is your area Miguel :) and
I'm not 100% sure if I am really right. So what do you think...? :-)

So long,
Matt (who is hoping he wasn't totally off the road *grin*)

-- 
]) mat...@we..., GPG 0x51FA41C6, matthew2k on JABBER.org, ICQ# 89464954 ([
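
For readers who want to try the idea outside xine, here is a minimal standalone sketch of the alternating-direction timing loop proposed above. It is not the xine source: `copy_fn`, `bytewise_copy`, `now_ns` and `time_alternating` are hypothetical names made up for this illustration, and `clock_gettime(CLOCK_MONOTONIC)` is swapped in for `rdtsc()` so the sketch is not tied to x86. It only shows the shape of the loop and makes no claim about why the one-directional version misbehaved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every copy routine under test
 * (same shape as memcpy, so memcpy itself can be passed in). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Naive byte-wise copy; present only so there is a second method to time. */
static void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Monotonic time in nanoseconds. Assumption: used here in place of
 * rdtsc() purely for portability; the thread itself counts TSC ticks. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `copies` calls of fn, alternating the copy direction:
 * buf1 -> buf2, then buf2 -> buf1, for copies / 2 round trips.
 * `copies` must be positive and even so the total stays exact. */
static uint64_t time_alternating(copy_fn fn, void *buf1, void *buf2,
                                 size_t size, int copies)
{
    assert(copies > 0 && copies % 2 == 0);

    uint64_t t = now_ns();
    for (int j = 0; j < copies / 2; j++) {
        fn(buf2, buf1, size);   /* buf1 -> buf2 */
        fn(buf1, buf2, size);   /* buf2 -> buf1 */
    }
    return now_ns() - t;
}
```

Calling `time_alternating(memcpy, buf1, buf2, size, 100)` and then `time_alternating(bytewise_copy, buf1, buf2, size, 100)` on two non-overlapping buffers gives one elapsed time per method, each covering 100 copies, which mirrors the j<50 loop with two calls per iteration. Absolute numbers will differ from TSC counts, so only the ranking between methods is comparable.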
From: Miguel F. <mi...@ce...> - 2001-10-30 18:26:09
Hi Matthias,

Matthias Dahl wrote:
> I didn't commit the change to CVS because this is your area Miguel :) and I'm
> not 100% sure if I am really right. So what do you think...? :-)

Well, I also had some small variation with the old method, but nothing as
dramatic as you said (most likely due to other threads on my system).

If it makes the probe better for your system then commit it! I can't think
of any side effect; it should be as effective as or better than the
original...

Regards,
Miguel
From: Matthias D. <mat...@we...> - 2001-10-30 19:59:57
On Tue, Oct 30, 2001 at 04:29:49PM -0200, Miguel Freitas wrote:
> Hi Matthias,

Hey Miguel...

> Well, I also had some small variation with the old method, but nothing
> as dramatic as you said (most likely due to other threads on my system).

Well, to be sure, I killed all non-vital processes on my system when I
played with the probing. The only explanation I can think of is that maybe
gas or the processor itself is doing some optimization or caching here...
Which processor have you tested the probing on? I am using a P3, so maybe
it is a specific feature/problem of it...?! Oh man... :-)

> If it makes the probe better for your system then commit it!

Done. :-)

So long,
Matt.

-- 
]) mat...@we..., GPG 0x51FA41C6, matthew2k on JABBER.org, ICQ# 89464954 ([
No snowflake in an avalanche ever feels responsible.