From: Gwenole B. <gb...@di...> - 2004-05-22 05:37:12
Hi,

> It did the trick. Now SheepShaver runs smoothly and reasonably fast,
> especially when color depth of the host display is set to 16 bit (32
> bit is somehow slower).

This is understandable because we may need to do color conversion as
well. The 16-bit version can be processed faster than the 32-bit
version simply because more pixels can be processed at a time.

However, since you indicate a noticeable difference, it may be worth
writing MMX/SSE blitters to bring 32-bit blitting up to current 16-bit
blitting speeds.

> This trick doesn't work on the latest CVS version (compilation fails
> after modification).

Please post the error message and the date of your CVS version. I have
committed a hopefully fixed version of testandset(), namely the one
from LinuxThreads, so it should really work.

However, please only grab the sysdeps.h (testandset) change, as the
rest could break too. That is, I sometimes get a crash when a *lot* of
ADB events are coming in (e.g. mouse).

> Now if only SheepShaver ran OS X... I wish it did, but I am aware
> that it would require adding MMU support, which is a major speed
> bottleneck in PearPC.

Actually, I think we can get rid of address translation, or at least
minimize it a lot, even on 32-bit systems. However, we will still need
memory protection, but this can be trivially achieved in a fast way
with a JIT compiler. An interpreter would require explicit recovery
points.

> Nonetheless, it's still fun to play with OS 8.6 on my PC. Speedometer
> indicates that on P4 1.4 SheepShaver is faster than real PowerMac
> 80 MHz. Impressive.

And using an Athlon 64 or Opteron at the same clock speed could easily
bring up to a 2x increase. That's because the current (portable) JIT1
engine makes heavy use of loads/stores. The next one could speed things
up by at least a factor of two. Imagine that the current JIT already
runs at less than 1/8 of native speed. ;-)

> I will play with the crashes with -O2 later (following your advice).

Another user reported on the forum that setting DISABLE_DBC to 1, with
the patch I sent you earlier, fixed his problems. That's why I am
working on more complete direct block chaining code that should fix
corner cases of the previous method. However, it currently breaks on
kpxrun test/test-powerpc, where test-powerpc is a statically linked ppc
binary with the same emulator. It is rather slow and, since I am an
unlucky guy, it breaks only after several minutes...

Bye,
Gwenolé.
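
For readers unfamiliar with the testandset() primitive discussed above, here is a minimal sketch of a test-and-set spinlock. It is not the code from sysdeps.h or LinuxThreads: it swaps in portable C11 atomics, all names ending in _sketch are hypothetical, and the return convention (nonzero means the lock was already held) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type; not the SheepShaver or LinuxThreads definition. */
typedef struct {
    atomic_flag held;
} spinlock_sketch;

#define SPINLOCK_SKETCH_INIT { ATOMIC_FLAG_INIT }

/* Atomically set the flag and return its previous state:
 * 0 if the lock was free (we now own it), nonzero if it was already held. */
static inline int testandset_sketch(spinlock_sketch *lock)
{
    return atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire);
}

/* Busy-wait until the test-and-set observes a free lock. */
static inline void acquire_sketch(spinlock_sketch *lock)
{
    while (testandset_sketch(lock)) {
        /* spin */
    }
}

/* Clear the flag so that another thread's test-and-set can succeed. */
static inline void release_sketch(spinlock_sketch *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Single-threaded check of the assumed return-value convention. */
static void selfcheck_sketch(void)
{
    spinlock_sketch lock = SPINLOCK_SKETCH_INIT;
    assert(testandset_sketch(&lock) == 0);  /* lock was free */
    assert(testandset_sketch(&lock) != 0);  /* lock is now held */
    release_sketch(&lock);
    assert(testandset_sketch(&lock) == 0);  /* free again after release */
}
```

A hand-written version of this primitive usually relies on per-architecture inline assembly, which is one reason a subtly wrong variant can fail only under heavy event traffic; the C11 form above leaves the choice of instruction to the compiler.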