Re: [linuxsh-dev] Improved TLB miss handler (second attempt)

Hi Paul

On Mon, 18 Nov 2002 13:09:06 -0500 pau...@ti... wrote:

> Stuart,
> 
> Looks good. Do you have any performance numbers showing how this
> compares to the old handler?

I did some performance testing while writing the code, but thought I'd
better go back and get some figures from unmodified code for comparison.

This is all on a 200MHz ST40GX1 + 112MB RAM

1. Build util-linux from clean:

2.4.17		14m37
2.4.18		14m08
2.4.18+tlb	 8m55

All files accessed over NFS
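As a rough illustration of what those build times amount to, the sketch below converts them to seconds and computes the relative reduction against plain 2.4.18. The helper names are made up for this example and are not part of any tool used in the test.

```python
def to_seconds(mmss: str) -> int:
    """Convert a time written as '14m08' into whole seconds."""
    minutes, seconds = mmss.split("m")
    return int(minutes) * 60 + int(seconds)


def reduction_percent(before: str, after: str) -> float:
    """Percentage by which the elapsed time dropped from before to after."""
    b, a = to_seconds(before), to_seconds(after)
    return 100.0 * (b - a) / b


# Figures taken from the util-linux build above.
print(round(reduction_percent("14m08", "8m55"), 1))  # -> 36.9
```

So the patched kernel finishes the build in a little under two thirds of the original time.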

2. Quake (timedemo demo1)

2.4.18		17.1 FPS
2.4.18+tlb	19.1 FPS

3. LMBench

LMBench figures are less impressive. Most are unchanged except those relating
to local communication, context switching (especially for processes with a
big footprint) and memory latency (second line is with the TLB patch):

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
iptest102 Linux 2.4.18_  18.0  108.6  324.9  152.2  494.8   182.8   486.3
iptest102 Linux 2.4.18+  16.0  106.9  314.0  109.4  354.4   112.0   348.9
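To make the context switching rows easier to compare, here is a small sketch that computes the per-column reduction from the two lines above. The numbers are copied from the table; the function and variable names are illustrative only.

```python
LABELS = ["2p/0K", "2p/16K", "2p/64K", "8p/16K", "8p/64K", "16p/16K", "16p/64K"]
BEFORE = [18.0, 108.6, 324.9, 152.2, 494.8, 182.8, 486.3]  # 2.4.18
AFTER = [16.0, 106.9, 314.0, 109.4, 354.4, 112.0, 348.9]   # 2.4.18 + TLB patch


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop in context switch time (positive means faster)."""
    return 100.0 * (before - after) / before


def reductions() -> dict:
    """Map each LMBench column label to its percentage reduction."""
    return {
        label: percent_reduction(b, a)
        for label, b, a in zip(LABELS, BEFORE, AFTER)
    }


for label, pct in reductions().items():
    print(f"{label:>8}: {pct:5.1f}% lower")
```

The 2p/16K and 2p/64K cases move by only a few percent, while the 8 and 16 process cases drop by roughly 28% to 39%, with 16p/16K showing the largest gain.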

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
iptest102 Linux 2.4.18_  18.0  60.5 128. 241.7 539.4 371.0 722.8 1116
iptest102 Linux 2.4.18+  16.0  54.2 126. 261.1 548.9 390.6 753.4 1144

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
iptest102 Linux 2.4.18_ 21.0 23.7 39.2   28.8  120.1   48.1   45.5 115.  93.9
iptest102 Linux 2.4.18+ 23.5 24.0 39.9   28.1  136.0   53.0   49.6 136. 100.4

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
iptest102 Linux 2.4.18_   200  10.0  182.9       295.7
iptest102 Linux 2.4.18+   200  10.0  183.0       189.4    No L2 cache?

The most interesting result is actually the graph which LMBench produces showing
memory latency. Previously this had a massive shoulder as soon as the data
set became larger than 256K, i.e. as soon as the data set became larger
than the TLB size. Worst case we saw 200nS access times degrade to 1000uS.
That's now down to a more reasonable 235nS.
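The 256K figure can be reproduced with a quick calculation. This is a sketch under two assumptions that are not stated above: a 64-entry unified TLB (as on the SH-4 core) and 4 KiB pages. All names in it are made up for the example.

```python
UTLB_ENTRIES = 64       # assumption: SH-4 unified TLB entry count
PAGE_SIZE = 4 * 1024    # assumption: 4 KiB pages


def tlb_reach(entries: int = UTLB_ENTRIES, page_size: int = PAGE_SIZE) -> int:
    """Bytes of memory that can be mapped by the TLB at one time."""
    return entries * page_size


def pages_touched(working_set: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to cover a working set (ceiling division)."""
    return -(-working_set // page_size)


def fits_in_tlb(working_set: int) -> bool:
    """True if every page of the working set can hold a TLB entry at once."""
    return pages_touched(working_set) <= UTLB_ENTRIES


print(tlb_reach() // 1024)  # -> 256 (KiB)
```

Once a benchmark's data set needs more pages than there are TLB entries, accesses start taking TLB misses, so the cost of the miss handler shows up directly in the measured latency.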

> This looks like a good candidate to merge into the restructure branch as
> well. Can you roll a patch against that? (There's also some ST40 stuff
> in cache-sh4.c that wants your attention).

I'll give it a try. However, there are a raft of other patches I'll
need to roll forwards to 2.5 first for basic board support. Most should
apply pretty cleanly, so it shouldn't take too long, and it's something I've
been putting off doing, so this will be a good opportunity.

Stuart

> On Mon, 2002-11-18 at 06:51, Stuart Menefy wrote:
> > Attached is my second attempt at an improved TLB miss handler.
> > 
> > Unfortunately the previous patch was corrupted, so I've double checked this
> > one, and it applies cleanly to 2.4 branch HEAD (2.4.20-pre10) and almost
> > cleanly to 2.4.18.
> > 
> > This should build for both SH3 and SH4, although it has only been tested
> > on SH4.
> > 
> 
> Regards,
> 
> -- 
> Paul Mundt
> pau...@ti...
> TimeSys Corporation
>