From: Laurent V. <Lau...@wa...> - 2004-06-13 21:50:06
On Sat 12/06/2004 at 16:16, BlaisorBlade wrote:
> I've decided to run benchmarks to check how much SYSEMU saves in a benchmark
> which also accesses memory (memLoop.c), and how much the 0 context switch
> idea could save (provided that segmentation has low cost).
>
> First, about the benchmark on the Laurent Vivier page: I think that the "60 %"
> number is meaningless - I guess it is calculated with "real time", which
> is not very meaningful IMHO - that is the time from when the process starts to
> when it ends, and it counts even time spent executing other processes. A more
> meaningful difference is obtained with the sum of user+system time:
>
> average time (user+system):
> - without SYSEMU
>   64.910
> - with SYSEMU
>   51.321
>
> SYSEMU saves (64.910 - 51.321) / 64.910 * 100 % = 20.9 % of the time without
> SYSEMU, in this benchmark.

Hello Paolo,

thank you for your comments.

The real question is: how accurate is the "time" command under UML?

I chose "real" time for several reasons:
- what the user feels is the most important (how long does he wait?)
- I don't really know how "sys" time is computed under UML: is it host +
  guest "sys" time? How does "time" take into account the time of the process
  "ptracing" the user process and, thus, the sys time of the guest kernel?

I made my measurements on an 8-CPU Xeon server with several gigabytes of
memory, with no load and only one user: me.

Host:
real               0m7.920s
user+sys           0m7.930s
real - (user+sys) -0m0.010s
(mmhhh, a negative value, there is really no load ;-) )

So I didn't really explain the measurements I had:

w/o SYSEMU:
real      6m16.956s  6m17.126s  6m16.461s
user+sys  1m03.712s  1m06.577s  1m04.442s

w/ SYSEMU:
real      3m55.052s  3m56.964s  3m54.179s
user+sys  0m52.347s  0m48.481s  0m53.135s

Could you explain where we lose this time:

w/o SYSEMU
real - (user+sys)  5m13.144s  5m10.549s  5m12.002s

w/ SYSEMU
real - (user+sys)  3m02.705s  3m08.483s  3m01.044s

In the TLB flushes?
In the "ptracing" process?
In other processes?

IMHO, I thought it's in the guest kernel, so "real" is more significant than
"user+sys". BUT I think you're the real specialist of UML and I'm not...

> I've re-benchmarked UML with SYSEMU using memLoop.c, which tries to measure the
> effects of accessing memory: it accesses one byte per page, thus causing the
> CPU to reload into the TLB the page table entry (PTE) for that page. IMHO, this
> benchmark shows that most of the gap vs the host is in the 2 remaining context
> switches (CS) per syscall: the 2 we save with SYSEMU account for about 25% of
> the getpid execution, and most of the gap is still there.
>
> In the attached files NPAGES = 64 (see source), but I also posted results with
> NPAGES = 512. Also, please, don't look at the "elapsed" time: it's
> meaningless.
>
> In fact getpidLoop measures only the cost of TLB flushes, while memLoop also
> measures the cost of TLB misses after the TLB flush, which can be compared
> against memLoopPure, which runs no syscall and thus never flushes the TLBs.
>
> To see this, I must be sure that memLoopPure has no TLB fault, i.e. that the
> PTEs for all pages fit in the TLB; this happens when NPAGES = 64, not when
> NPAGES = 512. In the two cases, we have working sets of 64 * PAGE_SIZE = 256k
> and of 512 * PAGE_SIZE = 2 M.
>
> On the host, memLoop and memLoopPure have similar user time, since there is
> never a TLB flush. When NPAGES = 512, each page access causes a TLB miss, so
> the user time is always similar, both on the host and the guest, and both
> with and without syscalls.
>
> But when NPAGES = 64, on the host the TLB is never flushed (except when
> another process is executing): it is filled only once and then used.
>
> On the guest, instead, with NPAGES = 64 the user time of memLoop is double
> that of memLoopPure.
> And since 0.40 s are for the getpid() calls,
> touch_mem() uses 0.40 s in memLoopPure and 1.20 s in memLoop: 3 times the old
> time.
> --------
> HOST:
>
> host $ time ./getpidLoop 1000000
>
> 0.27user 0.21system 0:00.55elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (70major+11minor)pagefaults 0swaps
> --------
> With NPAGES = 64:
>
> host $ time ./memLoop 1000000
>
> 1.11user 0.23system 0:01.46elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (79major+75minor)pagefaults 0swaps
> ----
> host $ time ./memLoopPure 1000000
> 0.88user 0.00system 0:00.97elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (78major+75minor)pagefaults 0swaps
> --------
> With NPAGES = 512:
>
> host $ time ./memLoop 1000000
>
> 8.93user 0.24system 0:09.84elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (79major+523minor)pagefaults 0swaps
> ----
> host $ time ./memLoopPure 1000000
>
> 8.71user 0.01system 0:09.43elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (78major+523minor)pagefaults 0swaps
>
> ------------
> On the guest, with SYSEMU:
>
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/getpidLoop 1000000
>
> 0.42user 3.87system 0:16.09elapsed 26%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+76minor)pagefaults 0swaps
> --------
> With NPAGES = 64:
> ----
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoop 1000000
>
> 1.60user 4.00system 0:18.02elapsed 31%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+146minor)pagefaults 0swaps
> ----
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoopPure 1000000
>
> 0.85user 0.05system 0:01.01elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+146minor)pagefaults 0swaps
> --------
> With NPAGES = 512:
>
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoop 1000000
>
> 9.09user 4.18system 0:28.37elapsed 46%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+594minor)pagefaults 0swaps
> ----
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoopPure 1000000
>
> 8.76user 0.07system 0:11.57elapsed 76%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+594minor)pagefaults 0swaps
>
> ----------------
> On the guest, without SYSEMU:
> (we always see about a 25% increase in system time vs SYSEMU, except for
> memLoopPure, but equal user time: we don't save the TLB misses)
>
> # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/getpidLoop 1000000
> 0.42user 5.01system 0:21.08elapsed 25%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+76minor)pagefaults 0swaps
> ----
> With NPAGES = 64:
>
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoopPure 1000000
> (about the same, as expected)
>
> 0.86user 0.02system 0:00.94elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+146minor)pagefaults 0swaps
> ----
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoop 1000000
> (about a 25% increase in system time, equal user time: we don't save the TLB
> misses)
> 1.62user 5.00system 0:26.73elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+146minor)pagefaults 0swaps
>
> --------
>
> With NPAGES = 512:
>
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoopPure 1000000
>
> 8.84user 0.02system 0:10.86elapsed 81%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+594minor)pagefaults 0swaps
>
> ----
>
> guest # /usr/bin/time /mnt/host/home/paolo/Dati/Sorgenti/Varie/C-C++/memLoop 1000000
> 9.15user 5.06system 0:36.66elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+594minor)pagefaults 0swaps

-- 
Laurent Vivier
+------------------------------------------------+
"Any sufficiently advanced technology is
indistinguishable from magic." -- Arthur C. Clarke
"Aller les Bleus" - France 2 - 1 Angleterre
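
For reference, the two ways of reading the SYSEMU timings in this message
(averaged user+sys versus averaged "real" time) can be reproduced with a
short sketch. It is not part of the original benchmarks: the names
parse_time, saving_percent, summarise and the RUNS layout are made up for
illustration, and the only data used are the six "real" and six "user+sys"
timings quoted above.

```python
import re
from statistics import mean

# Timings copied from this message; the dictionary layout is illustrative.
RUNS = {
    "without SYSEMU": {
        "real": ["6m16.956s", "6m17.126s", "6m16.461s"],
        "user+sys": ["1m03.712s", "1m06.577s", "1m04.442s"],
    },
    "with SYSEMU": {
        "real": ["3m55.052s", "3m56.964s", "3m54.179s"],
        "user+sys": ["0m52.347s", "0m48.481s", "0m53.135s"],
    },
}


def parse_time(text):
    """Convert a duration such as '6m16.956s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def saving_percent(before, after):
    """Share of `before` that is saved when the cost drops to `after`."""
    return (before - after) / before * 100


def summarise(runs):
    """Average each metric of each configuration over its runs, in seconds."""
    return {
        config: {metric: mean(parse_time(t) for t in times)
                 for metric, times in metrics.items()}
        for config, metrics in runs.items()
    }


if __name__ == "__main__":
    averages = summarise(RUNS)
    for metric in ("user+sys", "real"):
        before = averages["without SYSEMU"][metric]
        after = averages["with SYSEMU"][metric]
        print(f"{metric}: {before:.3f} s -> {after:.3f} s, "
              f"saving {saving_percent(before, after):.1f} %")
```

On the user+sys averages this gives 64.910 s and 51.321 s, i.e. the 20.9 %
saving computed in the quoted text; on the "real" averages the same formula
gives a saving of about 37.5 %. Which of the two is the right metric is
exactly the open question of this thread, so the sketch only makes the
arithmetic explicit.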