From: Allen A. <ak...@ar...> - 2009-12-22 19:39:04
On Mon, Dec 21, 2009 at 12:05:50PM -0700, tom fogal wrote:
| Allen Akin <ak...@ar...> writes:
| > The only reliable way to solve the problem is to benchmark the
| > operations you want to use, in precisely the conditions you intend
| > to use them, and then make a judgement as to whether the performance
| > is good enough.
|
| That's not nearly good enough either.

Well, as far as I know it's the only existing technique that's
guaranteed to work, so I wouldn't dismiss it too casually.

I sympathize, though.  I've been down this path and I know it isn't an
easy one.

| Rendering a 128^3 dataset in a 400x400 window (which is *tiny* for us)
| on some Apple platforms (I think some Intels, older 10.5 with some ATI
| cards) can take upwards of ten minutes, during which the application
| is hung. ...

So for your app, you can't afford to benchmark every time -- you have
to do it once (during testing, at installation, or the first run after
installation, for example) and save the results for later reuse.

One good way to do this is to provide a separate "tuning" app or
script that runs benchmarks and saves configuration information.
Users can run it whenever they install/upgrade your app or their
hardware.

| ... On Linux systems with really old nvidia drivers, it crashes
| the system. On Linux systems with old nvidia drivers, it crashes the
| X server. On Linux systems with new nvidia drivers, a watchdog timer
| kicks in after 4 seconds and the screen visibly blinks, corruption is
| observed in other apps such as firefox, etc.

Those are genuine problems, and I don't mean to minimize them, but
they're outside the scope of the performance-characterization
question.  If the systems are that unstable, your app is going to have
problems even when you're not benchmarking.

| Further, drivers can do whatever they want on just about any GL call.
| We recently added such a `benchmark' feature to our application,
| choosing the LoD dynamically based on the time to render the previous
| few frames. Many drivers will take, e.g. ~50ms for the majority of
| frames, and then spike with e.g. a 300ms frame before dropping back
| down to 50.

Yes, that's one of the many problems that make performance
"guarantees" less useful than you'd like.  I don't know what's
happening in the particular case you've described (maybe some
time-consuming memory-management operation that's only triggered
occasionally), but it's hard for me to see how you could boil down
this sort of complex behavior into a caps bit that simply says
"accelerated" or "not accelerated".

| This `solution' is like asking app developers to benchmark JITted code;
| there's far too much variability for it to actually work. ...

Experience suggests that's not true in general.  Nearly everyone tests
their code on some number of systems and tunes it accordingly.  What
I'm suggesting is just formalizing that process a bit.  Your case may
be different, of course, but I don't know enough about it to speak to
it directly.

| ... I would much
| rather deal with a state explosion.

Let's assume for the moment that you can come up with a good scheme
for encoding the operations you want and the state that applies at the
time the operations are executed (and this is by no means easy).
Don't forget that you have to handle sequences of operations, and that
execution order matters.
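For concreteness, here is a rough sketch of the kind of one-time
tuning pass I mean.  It assumes a GL context is already current
(created with GLUT, SDL, or whatever you use), and the case encoding
and helper names (BenchCase, time_case, run_tuning_pass) are purely
illustrative, not a prescription:

  // One-time "tuning" pass: encode each case the app cares about,
  // time it, and save the results for the app to read back later.
  // Assumes a current GL context; all names here are placeholders.
  #include <GL/gl.h>
  #include <chrono>
  #include <fstream>
  #include <functional>
  #include <string>
  #include <vector>

  struct BenchCase {
      std::string name;           // e.g. "tex3d_128^3_rgba8_400x400"
      std::function<void()> run;  // issues the GL calls for one iteration
  };

  // Flush outstanding work, run the case 'iters' times, flush again so
  // completed GPU work is counted, and return milliseconds per iteration.
  static double time_case(const BenchCase& c, int iters = 20)
  {
      glFinish();
      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < iters; ++i)
          c.run();
      glFinish();
      auto t1 = std::chrono::steady_clock::now();
      return std::chrono::duration<double, std::milli>(t1 - t0).count()
             / iters;
  }

  // Run at install time, after an upgrade, or on first launch, and save
  // the results so the app never benchmarks during normal use.
  void run_tuning_pass(const std::vector<BenchCase>& cases,
                       const std::string& path)
  {
      std::ofstream out(path.c_str());
      out << "renderer=" << (const char*)glGetString(GL_RENDERER) << "\n"
          << "version="  << (const char*)glGetString(GL_VERSION)  << "\n";
      for (const auto& c : cases)
          out << c.name << "=" << time_case(c) << "\n";
  }

Recording the renderer and version strings alongside the timings is
one way for the app to notice later that the driver or hardware has
changed and the tuning pass should be re-run.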
You'll have to deal with typically tens of GL state variables that can
materially affect the performance of each operation, with behavior
that's data-dependent (size limits, powers of two, etc.), and with
state that's outside the GL (e.g. characteristics of the drawing
surfaces that are determined by the window system).  If you try to
handle this as a straightforward database problem, I think you're
talking billions of records.  On the other hand, there may be only
tens to hundreds of cases that are relevant to your app, and your
benchmarks for checking them can be concise.

This is a genuinely hard problem.  The industry uses a mix of caps
bits (or equivalent, like extension advertisements), minimum
requirements for hardware, and pipeline "validation" schemes.  As long
as the underlying functionality keeps changing, no single one of these
approaches will be ideal for long.  At least performance-testing is
robust.

| > I haven't tracked this subject since I left the ARB, but maybe the
| > Khronos folks have done some more work on it.
|
| What is the appropriate forum for this?

I'm sorry, I don't know; I've been away from it for too long.  One of
the other folks here might be able to make a suggestion.  If not, I'd
check at opengl.org.

| As I argue initially and above, since it is the only possibility at
| this point, I'm going to argue strongly that extension advertisement
| implies some sort of standard on performance. ...

It was once (maybe still is) common policy that the presence of an
extension implies some acceleration.  But as I mentioned before, even
if there's some acceleration, it's risky to assume the system is fast
enough to use in your particular case.

Allen
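P.S.  To make that last point concrete: the kind of check I'd rely on
treats the extension string only as permission to *try* a path, and a
measured trial as the thing that decides whether to actually use it.
The helper names and the frame-time budget below are made-up
placeholders:

  // Extension presence gates trying a rendering path; a quick timed
  // trial gates the decision to use it.  Assumes a current GL context;
  // draw_one_frame_with() and the 30 ms budget stand in for whatever
  // the application really does.
  #include <GL/gl.h>
  #include <chrono>
  #include <sstream>
  #include <string>

  static bool has_extension(const char* name)
  {
      // Match whole space-separated tokens so one extension name
      // doesn't falsely match as a prefix of another.
      const char* exts = (const char*)glGetString(GL_EXTENSIONS);
      if (!exts)
          return false;
      std::istringstream ss(exts);
      std::string tok;
      while (ss >> tok)
          if (tok == name)
              return true;
      return false;
  }

  static bool path_is_usable(void (*draw_one_frame_with)(const char*),
                             const char* path_name,
                             double budget_ms = 30.0)
  {
      glFinish();
      auto t0 = std::chrono::steady_clock::now();
      draw_one_frame_with(path_name);   // one frame with the candidate path
      glFinish();
      auto t1 = std::chrono::steady_clock::now();
      double ms =
          std::chrono::duration<double, std::milli>(t1 - t0).count();
      return ms <= budget_ms;
  }

  // Advertised is not the same as fast enough, so measure before
  // committing:
  //   if (has_extension("GL_EXT_texture3D") &&
  //       path_is_usable(draw_one_frame_with, "tex3d"))
  //       use_3d_texture_path();
  //   else
  //       use_fallback_path();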