It is understandable that XDarwin under MacOSX, having to swim in the syrup-like Quartz, is slower than XFree86 under LinuxPPC, but why is it even slower now than it used to be? And why is the fullscreen mode slower than the rootless mode?
I did some tests with x11perf -scroll500 (yes, I know..., but this *is* real). On my iBook 466MHz (aty128 card), I had a rootless-patched (pre-fb) XF-4.1 that used to give rates of 30/s (rootless) and 6/s (fullscreen). And I think, but I am not sure, that the same speed was still there in the CVS version as of a couple of days ago. The new version now gives 6.3/s (rootless) and 1.3/s (fullscreen). For comparison, the same machine under LinuxPPC and XFree-4.1.99 gives 300/s (!).
Can Quartz really be made responsible for a slowdown of a factor of more than 200?
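For reference, the ratios behind that question can be worked out from the rates quoted above. This is only a small arithmetic sketch; the dictionary keys and the `slowdown` helper are names made up here for illustration, and the numbers are the scroll rates from the post.

```python
# Scroll rates (operations per second) as quoted in the post above.
rates = {
    "old rootless": 30.0,
    "old fullscreen": 6.0,
    "new rootless": 6.3,
    "new fullscreen": 1.3,
}
linuxppc_rate = 300.0  # same machine under LinuxPPC, as quoted


def slowdown(reference, measured):
    """Return how many times slower `measured` is than `reference`."""
    return reference / measured


# Slowdown of each XDarwin configuration relative to LinuxPPC.
factors = {name: slowdown(linuxppc_rate, rate) for name, rate in rates.items()}

for name, factor in factors.items():
    print(f"{name}: about {factor:.0f}x slower than LinuxPPC")
```

Only the new fullscreen figure gives a factor above 200; the new rootless figure works out to a bit under 50.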
XDarwin is slower than LinuxPPC for two reasons. First of all, XDarwin is absolutely not hardware accelerated. I assume most of the difference is here. Second, we are sitting on top of Cocoa and Quartz, which are doing lots of image processing of their own.
Pre-4.1 rootless is faster than the current rootless because the drawing code has since been switched from Carbon to Cocoa, which added an extra copy in the middle. The current version crashes a lot less but runs slower. This should change eventually, if Cocoa exposes the lower-level buffer or some of the private CoreGraphics headers are released.
(Aside: I've been absolutely drooling over the CoreGraphics headers. I may implement a CoreGraphics version and then see if I can convince somebody to release the functions I use.)
I don't know why rootless is faster than full-screen. Perhaps the direct display mode is slow unless we use it in a particular way. Looks like it would be faster to implement full-screen in a big window. Hmm.
The remaining difference between pre-4.1 and 4.1 is probably fb itself. fb is easier to hardware-accelerate, and it allows all sorts of image compositing operations, but it may be less highly tuned than cfb, or maybe it's just fundamentally slower. Or maybe we forgot to turn optimizations on when we compiled the binaries...
I did a bunch of speed comparisons when 4.1 came out by running the entire x11perf suite. I didn't post any of them because I wasn't really sure I believed the results. In any case, here are some general comments:
1. You really should also run tests in console mode for comparison. Except for very rare cases, console mode is currently the fastest.
2. In my tests the 4.0.3 console mode (based on cfb) was generally far slower than 4.1 console mode (based on fb). In some tests it was over 10 times slower.
3. Some important differences between fb and the cfb/mfb combo:
- fb is much smaller
- Cfb/mfb was designed to minimize instructions per primitive, while fb minimizes memory references. Fb also tries to minimize memory reads, as they are typically much slower than writes.
- Thus fb executes more instructions but touches memory less.
- According to Keith Packard, the tradeoff is somewhere between 10 and 40 MHz CPU speed, where slower CPUs will do better with cfb.
4. The 4.1 full screen Quartz mode seemed to come in at about the same speed as the old 4.0.3 console code. I wouldn't be surprised if it was slower than console mode, but it seemed like a suspicious coincidence that it was close to the 4.0.3 console results. I never had time to look into this more closely, so I dropped it.
Clearly in the future we will need to spend more time improving performance. As Greg mentioned, the single biggest improvement we can make here is hardware acceleration. This is a high priority right after rootless support for the 4.2 release.
In any case, careful performance comparisons by users between various modes and versions are very welcome. It could be that there was just something strange about the CVS code when I built the XDarwin1.0a1 snapshot, as improvements are ongoing in all other areas of the X server while we are working on rootless.
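For anyone who wants to collect such comparisons, here is a rough sketch of pulling the rate out of x11perf output so that runs in different modes can be tabulated. It assumes result lines of the form `N reps @ T msec (R/sec): label`; the sample line below is made up to illustrate that format, and `parse_rate` and `run_scroll_test` are hypothetical helper names, not part of x11perf.

```python
import re
import subprocess

# Assumed shape of an x11perf result line:
#   "   1500 reps @   3.3333 msec (   300.0/sec): Scroll 500x500 pixels"
RESULT_RE = re.compile(r"\(\s*([\d,]+(?:\.\d+)?)/sec\):\s*(.+)")


def parse_rate(line):
    """Return (rate_per_sec, test_label) for a result line, or None."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    # Strip thousands separators in case the rate is printed with commas.
    rate = float(match.group(1).replace(",", ""))
    return rate, match.group(2).strip()


def run_scroll_test():
    """Run `x11perf -scroll500` against the current DISPLAY and parse it.

    Needs a running X server; not called automatically in this sketch.
    """
    result = subprocess.run(
        ["x11perf", "-scroll500"], capture_output=True, text=True
    )
    parsed = (parse_rate(line) for line in result.stdout.splitlines())
    return [entry for entry in parsed if entry is not None]


# Illustrative line only, not a real measurement.
sample = "   1500 reps @   3.3333 msec (   300.0/sec): Scroll 500x500 pixels"
print(parse_rate(sample))
```

Running `run_scroll_test()` once per mode (console, full screen, rootless) on the same machine and build would give directly comparable numbers.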
Yeah, something is strange all right... I tried doing some tests on standard XFree 4.1, patched XFree 4.1 in rootless and patched XFree 4.1 in full screen.
In my case, rootless is by far the slowest. Full screen mode actually sped up some after patching - not a lot, but it was definitely not slower. Machine is a 1st gen iMac DV/400, Rage 128 graphics.