From: Allen A. <ak...@ar...> - 2009-12-22 19:39:04
On Mon, Dec 21, 2009 at 12:05:50PM -0700, tom fogal wrote:
| Allen Akin <ak...@ar...> writes:
| > The only reliable way to solve the problem is to benchmark the
| > operations you want to use, in precisely the conditions you intend
| > to use them, and then make a judgement as to whether the performance
| > is good enough.
|
| That's not nearly good enough either.

Well, as far as I know it's the only existing technique that's
guaranteed to work, so I wouldn't dismiss it too casually.

I sympathize, though.  I've been down this path and I know it isn't an
easy one.

| Rendering a 128^3 dataset in a 400x400 window (which is *tiny* for us)
| on some Apple platforms (I think some Intels, older 10.5 with some ATI
| cards) can take upwards of ten minutes, during which the application
| is hung. ...

So for your app, you can't afford to benchmark every time -- you have
to do it once (during testing, at installation, or the first run after
installation, for example) and save the results for later reuse.

One good way to do this is to provide a separate "tuning" app or
script that runs benchmarks and saves configuration information.
Users can run it whenever they install/upgrade your app or their
hardware.

| ... On Linux systems with really old nvidia drivers, it crashes
| the system. On Linux systems with old nvidia drivers, it crashes the
| X server. On Linux systems with new nvidia drivers, a watchdog timer
| kicks in after 4 seconds and the screen visibly blinks, corruption is
| observed in other apps such as firefox, etc.

Those are genuine problems, and I don't mean to minimize them, but
they're outside the scope of the performance-characterization
question.  If the systems are that unstable, your app is going to have
problems even when you're not benchmarking.

| Further, drivers can do whatever they want on just about any GL call.
| We recently added such a `benchmark' feature to our application,
| choosing the LoD dynamically based on the time to render the previous
| few frames. Many drivers will take, e.g. ~50ms for the majority of
| frames, and then spike with e.g. a 300ms frame before dropping back
| down to 50.

Yes, that's one of the many problems that make performance
"guarantees" less useful than you'd like.  I don't know what's
happening in the particular case you've described (maybe some
time-consuming memory-management operation that's only triggered
occasionally), but it's hard for me to see how you could boil down
this sort of complex behavior into a caps bit that simply says
"accelerated" or "not accelerated".

| This `solution' is like asking app developers to benchmark JITted code;
| there's far too much variability for it to actually work. ...

Experience suggests that's not true in general.  Nearly everyone tests
their code on some number of systems and tunes it accordingly.  What
I'm suggesting is just formalizing that process a bit.  Your case may
be different, of course, but I don't know enough about it to speak to
it directly.

| ... I would much
| rather deal with a state explosion.

Let's assume for the moment that you can come up with a good scheme
for encoding the operations you want and the state that applies at the
time the operations are executed (and this is by no means easy).
Don't forget that you have to handle sequences of operations, and that
execution order matters.
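For concreteness, here is a rough sketch of the kind of one-time
tuning pass I mean.  It assumes a GL context is already current
(created with GLUT, SDL, or whatever you use), and the case encoding
and helper names (BenchCase, time_case, run_tuning_pass) are purely
illustrative, not a prescription:

  // One-time "tuning" pass: encode each case the app cares about,
  // time it, and save the results for the app to read back later.
  // Assumes a current GL context; all names here are placeholders.
  #include <GL/gl.h>
  #include <chrono>
  #include <fstream>
  #include <functional>
  #include <string>
  #include <vector>

  struct BenchCase {
      std::string name;           // e.g. "tex3d_128^3_rgba8_400x400"
      std::function<void()> run;  // issues the GL calls for one iteration
  };

  // Flush outstanding work, run the case 'iters' times, flush again so
  // completed GPU work is counted, and return milliseconds per iteration.
  static double time_case(const BenchCase& c, int iters = 20)
  {
      glFinish();
      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < iters; ++i)
          c.run();
      glFinish();
      auto t1 = std::chrono::steady_clock::now();
      return std::chrono::duration<double, std::milli>(t1 - t0).count()
             / iters;
  }

  // Run at install time, after an upgrade, or on first launch, and save
  // the results so the app never benchmarks during normal use.
  void run_tuning_pass(const std::vector<BenchCase>& cases,
                       const std::string& path)
  {
      std::ofstream out(path.c_str());
      out << "renderer=" << (const char*)glGetString(GL_RENDERER) << "\n"
          << "version="  << (const char*)glGetString(GL_VERSION)  << "\n";
      for (const auto& c : cases)
          out << c.name << "=" << time_case(c) << "\n";
  }

Recording the renderer and version strings alongside the timings is
one way for the app to notice later that the driver or hardware has
changed and the tuning pass should be re-run.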
You'll have to deal with typically tens of GL state variables that can
materially affect the performance of each operation, with behavior
that's data-dependent (size limits, powers of two, etc.), and with
state that's outside the GL (e.g. characteristics of the drawing
surfaces that are determined by the window system).  If you try to
handle this as a straightforward database problem, I think you're
talking billions of records.  On the other hand, there may be only
tens to hundreds of cases that are relevant to your app, and your
benchmarks for checking them can be concise.

This is a genuinely hard problem.  The industry uses a mix of caps
bits (or equivalent, like extension advertisements), minimum
requirements for hardware, and pipeline "validation" schemes.  As long
as the underlying functionality keeps changing, no single one of these
approaches will be ideal for long.  At least performance-testing is
robust.

| > I haven't tracked this subject since I left the ARB, but maybe the
| > Khronos folks have done some more work on it.
|
| What is the appropriate forum for this?

I'm sorry, I don't know; I've been away from it for too long.  One of
the other folks here might be able to make a suggestion.  If not, I'd
check at opengl.org.

| As I argue initially and above, since it is the only possibility at
| this point, I'm going to argue strongly that extension advertisement
| implies some sort of standard on performance. ...

It was once (maybe still is) common policy that the presence of an
extension implies some acceleration.  But as I mentioned before, even
if there's some acceleration, it's risky to assume the system is fast
enough to use in your particular case.

Allen
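P.S.  To make that last point concrete: the kind of check I'd rely on
treats the extension string only as permission to *try* a path, and a
measured trial as the thing that decides whether to actually use it.
The helper names and the frame-time budget below are made-up
placeholders:

  // Extension presence gates trying a rendering path; a quick timed
  // trial gates the decision to use it.  Assumes a current GL context;
  // draw_one_frame_with() and the 30 ms budget stand in for whatever
  // the application really does.
  #include <GL/gl.h>
  #include <chrono>
  #include <sstream>
  #include <string>

  static bool has_extension(const char* name)
  {
      // Match whole space-separated tokens so one extension name
      // doesn't falsely match as a prefix of another.
      const char* exts = (const char*)glGetString(GL_EXTENSIONS);
      if (!exts)
          return false;
      std::istringstream ss(exts);
      std::string tok;
      while (ss >> tok)
          if (tok == name)
              return true;
      return false;
  }

  static bool path_is_usable(void (*draw_one_frame_with)(const char*),
                             const char* path_name,
                             double budget_ms = 30.0)
  {
      glFinish();
      auto t0 = std::chrono::steady_clock::now();
      draw_one_frame_with(path_name);   // one frame with the candidate path
      glFinish();
      auto t1 = std::chrono::steady_clock::now();
      double ms =
          std::chrono::duration<double, std::milli>(t1 - t0).count();
      return ms <= budget_ms;
  }

  // Advertised is not the same as fast enough, so measure before
  // committing:
  //   if (has_extension("GL_EXT_texture3D") &&
  //       path_is_usable(draw_one_frame_with, "tex3d"))
  //       use_3d_texture_path();
  //   else
  //       use_fallback_path();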