From: Stuart M. <stu...@st...> - 2002-11-19 21:01:57
Hi Paul

On Mon, 18 Nov 2002 13:09:06 -0500 pau...@ti... wrote:

> Stuart,
>
> Looks good. Do you have any performance numbers showing how this
> compares to the old handler?

I did some performance testing while writing the code, but thought I'd
better go back and get some figures from unmodified code for comparison.

This is all on a 200MHz ST40GX1 + 112MB RAM

1. Build util-linux from clean:
   2.4.17       14m37
   2.4.18       14m08
   2.4.18+tlb    8m55
   All files accessed over NFS

2. Quake (timedemo demo1)
   2.4.18       17.1 FPS
   2.4.18+tlb   19.1 FPS

3. LMBench
   LMBench figures are less impressive. Most are unchanged except those
   relating to local communication, context switching (especially for
   processes with a big footprint) and memory latency (second line is
   with the TLB patch):

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
iptest102 Linux 2.4.18_  18.0  108.6  324.9  152.2  494.8   182.8   486.3
iptest102 Linux 2.4.18+  16.0  106.9  314.0  109.4  354.4   112.0   348.9

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe   AF   UDP  RPC/   TCP  RPC/  TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
iptest102 Linux 2.4.18_  18.0  60.5 128. 241.7 539.4 371.0 722.8 1116
iptest102 Linux 2.4.18+  16.0  54.2 126. 261.1 548.9 390.6 753.4 1144

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                 OS Pipe   AF  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
iptest102 Linux 2.4.18_ 21.0 23.7 39.2   28.8  120.1   48.1   45.5 115.  93.9
iptest102 Linux 2.4.18+ 23.5 24.0 39.9   28.1  136.0   53.0   49.6 136. 100.4

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS  Mhz  L1 $   L2 $ Main mem Guesses
--------- ------------- ---- ----- ------ -------- -------
iptest102 Linux 2.4.18_  200  10.0  182.9    295.7
iptest102 Linux 2.4.18+  200  10.0  183.0    189.4 No L2 cache?

The most interesting result is actually the graph which LMBench produces
showing memory latency. Previously this had a massive shoulder as soon as
the data set became larger than 256K, i.e. as soon as the data set became
larger than the TLB size. Worst case we saw 200nS access times degrade to
1000uS. That's now down to a more reasonable 235nS.

> This looks like a good candidate to merge into the restructure branch as
> well. Can you roll a patch against that? (There's also some ST40 stuff
> in cache-sh4.c that wants your attention).

I'll give it a try, however there are a raft of other patches I'll need
to roll forwards to 2.5 first for basic board support. Most should apply
pretty cleanly, so it shouldn't take too long, and it's something I've
been putting off doing, so this will be a good opportunity.

Stuart

> On Mon, 2002-11-18 at 06:51, Stuart Menefy wrote:
> > Attached is my second attempt at an improved TLB miss handler.
> >
> > Unfortunately the previous patch was corrupted, so I've double checked this
> > one, and it applies cleanly to 2.4 branch HEAD (2.4.20-pre10) and almost
> > cleanly to 2.4.18.
> >
> > This should build for both SH3 and SH4, although it has only been tested
> > on SH4.
> >
> > Regards,
>
> --
> Paul Mundt
> pau...@ti...
> TimeSys Corporation
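
For anyone wanting the relative changes rather than the raw numbers, here is a minimal sketch that works them out from a few of the figures quoted in this message (2.4.18 versus 2.4.18 with the TLB patch). The helper names `to_seconds` and `percent_change` are made up for illustration and are not part of LMBench or the patch, and the choice of which rows to include is arbitrary.

```python
# Relative change between unmodified 2.4.18 and 2.4.18 with the TLB patch,
# using figures copied from the message above. Helper names are hypothetical.

def to_seconds(minutes, seconds):
    """Convert a wall-clock time such as 14m08 into seconds."""
    return minutes * 60 + seconds


def percent_change(before, after):
    """Signed change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


# (before, after) pairs: unmodified 2.4.18 first, patched kernel second.
results = {
    "util-linux build time (s)": (to_seconds(14, 8), to_seconds(8, 55)),
    "Quake timedemo (FPS)": (17.1, 19.1),
    "ctxsw 16p/64K (us)": (486.3, 348.9),
    "main memory latency (ns)": (295.7, 189.4),
}

for name, (before, after) in results.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.1f}%)")
```

A negative percentage is an improvement for the time and latency rows, and a positive one is an improvement for the FPS row.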