From: Owen T. <ot...@re...> - 2004-11-05 22:08:34
Attachments:
texprof.h
|
Over the last few days, I've been working on a tool to allow watching a live view of the texture usage of a DRI application. I'm not going to go into a lot of detail here - there is a more complete description at:

http://fishsoup.net/software/texturetop/

And an even more complete README file linked to from there, but the 10-second description of the change to the DRI source code is this:

If LIBGL_PROFILE_SERVER is set to [HOST]:PORT, then a separate thread is created that connects to the server and waits for the server to request updates. In response to each UPDATE command, it dumps all the creations/deletions/modifications of the state back to the server via a text-based protocol.

I've attached the header file for the new internal 'texprof.h' to the mail. It's pretty clean and simple to use ... the changes to the Savage driver to support texture profiling aren't large. That's the only card I've written support for, since I wanted to concentrate on getting the framework working. It may be possible to get a lot of the other cards working at once by adding support to texmem.c.

As far as I know, other than adding support for additional cards, this change is reasonably complete and bug-free. There's a lot of other information that could be reported over the profiling interface, but what is there now seems to work well. (One limitation is that only 2D textures are supported at the moment.)

My questions would be:

- Is this interesting to anybody but me? It's been quite useful to me so far in just the couple of hours I've had it working...
- Does this duplicate something else that is out there already?
- What's the chance of getting this integrated into the main sources?

Thanks for any feedback,
Owen |
From: Keith W. <ke...@tu...> - 2004-11-06 18:11:56
|
Owen Taylor wrote:
> Over the last few days, I've been working on a tool to allow watching a live view of the texture usage of a DRI application. I'm not going to go into a lot of detail here - there is a more complete description at:
>
> http://fishsoup.net/software/texturetop/
>
> And an even more complete README file linked to from there, but the 10-second description of the change to the DRI source code is this:
>
> If LIBGL_PROFILE_SERVER is set to [HOST]:PORT, then a separate thread is created that connects to the server and waits for the server to request updates. In response to each UPDATE command, it dumps all the creations/deletions/modifications of the state back to the server via a text-based protocol.
>
> I've attached the header file for the new internal 'texprof.h' to the mail. It's pretty clean and simple to use ... the changes to the Savage driver to support texture profiling aren't large. That's the only card I've written support for, since I wanted to concentrate on getting the framework working. It may be possible to get a lot of the other cards working at once by adding support to texmem.c.
>
> As far as I know, other than adding support for additional cards, this change is reasonably complete and bug-free. There's a lot of other information that could be reported over the profiling interface, but what is there now seems to work well. (One limitation is that only 2D textures are supported at the moment.)
>
> My questions would be:
>
> - Is this interesting to anybody but me? It's been quite useful to me so far in just the couple of hours I've had it working...

It's definitely interesting. Have you been using it to tune an application, or from the point of view of tuning driver behaviour?

> - Does this duplicate something else that is out there already?

Not to my knowledge.

> - What's the chance of getting this integrated into the main sources?

I think pretty good. Mesa wraps some things that are called directly by the patch. I'm not sure what the gotchas with libraries creating threads without the application's knowledge are, but if it's only done in response to an environment flag (and maybe a compile option), I can't see that being a problem.

It'd be good to let people play around with it a little first, which for me means early in the week, most likely.

Keith |
From: Owen T. <ot...@re...> - 2004-11-06 20:42:19
|
On Sat, 2004-11-06 at 18:11 +0000, Keith Whitwell wrote:
> > My questions would be:
> >
> > - Is this interesting to anybody but me? It's been quite useful to me so far in just the couple of hours I've had it working...
>
> It's definitely interesting. Have you been using it to tune an application, or from the point of view of tuning driver behaviour?

Well, some of both. I was seeing some bad performance on the app that I was blaming on the driver... I thought I was running out of texture RAM and the LRU swap algorithm was causing problems. But once I had texturetop up and running, it was clear that I was just doing something stupid in my app :-)

> > - Does this duplicate something else that is out there already?
>
> Not to my knowledge.

Cool.

> > - What's the chance of getting this integrated into the main sources?
>
> I think pretty good. Mesa wraps some things that are called directly by the patch, I'm not sure what the gotchas with libraries creating threads without the application's knowledge are, but if it's only done in response to an environment flag & maybe a compile option, I can't see that being a problem.

One of the reasons I used a separate thread was that it is pretty much invisible to the application - requiring application main loop integration or even forking off a child process would be a lot more intrusive. (Forking off child processes gets you in trouble with SIGCHLD.)

Given a POSIX-compliant thread implementation (like NPTL), the main possible area of difficulty I can think of is signal delivery - POSIX specifies that signals are delivered to an arbitrary thread not blocking that signal, so if the app is, say, counting on a select() call in its main thread being woken up when a signal comes in, that might not happen. Calling pthread_sigmask() with sigfillset() when the thread starts would help that.

Another thing that occurs to me in this area is that the patch probably should set CLOEXEC on its file descriptors.

But these are small details, and as you say, if it's something that has to be turned on explicitly, it's not a big deal if it causes a particular app to misbehave.

> It'd be good to let people play around with it a little first, which for me means early in the week, most likely.

Sounds good,
Owen |
From: Dave A. <ai...@li...> - 2004-11-08 01:20:44
|
> > - What's the chance of getting this integrated into the main sources?

I've just patched this into the radeon driver (and texmem.c):

http://freedesktop.org/~airlied/patches/dri/radeon_texturetop.patch

It contains pieces of the original patch and is against my working tree, which is a few weeks old, so it may not apply cleanly... I might get time later to move it up to HEAD....

Adding support for other cards that use texmem should be simple enough work...

Owen, I also had to change the length check in your texturetop app's main.c from an == to a >, or else it was telling me it was getting an oversize app..

now to see what my app does...

Dave.

-- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied at skynet.ie pam_smb / Linux DECstation / Linux VAX / ILUG person |
From: Dave A. <ai...@li...> - 2004-11-08 01:23:34
|
> I've just patched this into the radeon driver (and texmem.c)
> http://freedesktop.org/~airlied/patches/dri/radeon_texturetop.patch

Oh, and this patch may not work completely... it looks like it should, but I might have missed something - all my heights/widths turn up as 0..

Dave.

> It contains pieces of the original patch and is against my working tree
> which is a few weeks old so it may not apply cleanly... I might get time
> later to move it up to HEAD....
>
> Adding support for other cards that use texmem should be simple enough
> work...
>
> Owen, I also had to change the length check in your texturetop app's main.c
> from an == to a > or else it was telling me it was getting an oversize
> app..
>
> now to see what my app does...
>
> Dave.
>
> -- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied at skynet.ie pam_smb / Linux DECstation / Linux VAX / ILUG person |
From: Dave A. <ai...@li...> - 2004-11-08 02:44:00
|
> > I've just patched this into the radeon driver (and texmem.c)
> > http://freedesktop.org/~airlied/patches/dri/radeon_texturetop.patch

Okay, the patch above should now work on the radeon driver; I had forgotten a call to update the texture....

Just some comments and criticisms from a quick look over it:

The protocol-processing code in texturetop (server code - it is called client code, but it listens on a socket, which makes it a server in my mind) is fragile; starting manytex with -n 99 makes it crap out on my system. I think the processing of the buffer from the client isn't robust enough - it should probably use a proper circular buffer or a state-machine-based decode..

I also wonder whether a packet protocol might not be better suited to a protocol that goes between two processes where no human is going to be looking at it; it certainly would be more efficient:

<pkttype><length><info>

string processing in C is always ugly.....

I might get a chance to mess more with it later; my main application uses between 10s and 1000s of textures!! so it is a great way to stress it... also a sorting option like top would be nice ;-)

I would like to see something like this make its way into the DRI at some stage; it could be built in at compile time if someone was worried about the overhead it might bring..

Regards,
Dave.

-- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied at skynet.ie pam_smb / Linux DECstation / Linux VAX / ILUG person |
From: Owen T. <ot...@re...> - 2004-11-08 04:59:54
|
On Mon, 2004-11-08 at 02:43 +0000, Dave Airlie wrote:
> > > I've just patched this into the radeon driver (and texmem.c)
> > > http://freedesktop.org/~airlied/patches/dri/radeon_texturetop.patch
>
> Okay the patch above should now work on the radeon driver, I had forgotten a call to update the texture....
>
> Just some comments and criticisms from a quick look over it:
>
> The protocol processing code in the texturetop (server code - it is called client code but it listens on a socket which makes it a server in my mind)

I reversed the client/server setup at one point in the development, and it is still the "server" that sends commands and the client that responds. :-) Yes, that file was confusingly named, and I've renamed it to texprof-lib.[ch]. I tried to say "application" and "profiler" in most places in the docs.

> , is fragile, starting manytex with -n 99 makes it crap out on my system,

Ah, in addition to the bug you pointed out with it incorrectly reporting overlong commands, there was another bug when a \r occurred right at the end of the buffer. I also fixed an obvious problem with manytex -n 99 - it wasn't truncating the list of textures to fit into the screen. All three problems are fixed in the texturetop-1.1 I just put up on the texturetop web page.

> I think the processing of the buffer from the client isn't robust enough, it should probably use a proper circular buffer or a state machine based decode..

I wrote it pretty carefully... I'm not saying that there aren't more bugs, but it should be fundamentally pretty sound. (It is a simple state machine over the lines if you look inside texprof-lib.c. Each line has a fixed format.)

> I also would wonder if a packet protocol might not be better placed for a protocol that goes between two processes where no human is going to be looking at it, it certainly would be more efficient,
>
> <pkttype><length><info>
>
> string processing in C is always ugly.....

A binary protocol would certainly be more efficient. I chose a text-based protocol mostly because it's a whole lot easier to debug - if an error occurs, just print out the line that caused the error. And despite the poor string-processing facilities of C, it's still probably a bit easier to write than a binary protocol. But if efficiency turns out to be a problem, I'd certainly welcome someone reworking it to use a binary protocol.

> I might get a chance to mess more with it later, my main application uses between 10s and 1000s of textures!! so it is a great way to stress it... also a sorting option like top would be nice ;-)

That would be nice. The main difficulty is actually not the sorting but taking keystrokes; so far texturetop has the most simplistic output routines you can imagine and doesn't do input at all - not that it is that hard to figure out the necessary tcsetattr() invocation. (At some point, using curses rather than hardcoding VT100 escape sequences would be a win, but I wanted to avoid dragging in an extra dependency...)

Anyway, many thanks for trying this out and for the feedback.

Regards,
Owen |
From: Ian R. <id...@us...> - 2004-11-08 19:32:37
|
Owen Taylor wrote:
> Over the last few days, I've been working on a tool to allow watching a live view of the texture usage of a DRI application. I'm not going to go into a lot of detail here - there is a more complete description at:
>
> http://fishsoup.net/software/texturetop/
>
> And an even more complete README file linked to from there, but the 10-second description of the change to the DRI source code is this:

This is something I've been thinking about ever since I saw the profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of information that would be useful to get out of the driver about performance:

- texture use / abuse
- number of primitives in each begin / end block
- operations causing stalls in the driver (e.g., places where the driver has to explicitly wait on the hardware)
- time to render each frame
- etc.

We could emit this data as a per-frame (where a frame ends with a SwapBuffers call on a double-buffered visual or glFinish on a single-buffered visual) XML document. Each context would have a named pipe where some app could read the data.

I hadn't really thought it through enough (or had time to work on it) to make it worth posting to the list about it. It sounds like your texturetop interface is in a similar vein. |
From: Allen A. <ak...@po...> - 2004-11-08 21:49:49
|
On Mon, Nov 08, 2004 at 11:32:24AM -0800, Ian Romanick wrote:
| This is something I've been thinking about ever since I saw the
| profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of
| information that would be useful to get out of the driver about performance

Have you taken a look at the SGIX_instruments extension? It provides a framework that's intended for gathering profiling information asynchronously. The idea was that you'd add separate extensions that defined the actual instrumentation (SGIX_ir_instrument1 was an early example).

I searched my archives for things I'd written on this subject in the past. The following is probably the most comprehensive summary. Some of it's out-of-date now, or has implications for hardware design that's out of our control, but some of it still looks useful.

Allen

Purposes of Instrumentation

  Tuning: Analyzing the app or database to improve overall performance and/or rendering quality. Typically done during the development phase. Examples: determining what percentage of triangles are clipped, or how well texture memory is utilized.

  Load Monitoring: Gathering information to modify the behavior of the app or the structure of the database dynamically, to maintain a constant frame rate. Typically done in real time by production apps. Examples: determining how much time is spent in geometric processing and how much time in pixel fill, in order to choose object level-of-detail.

  Debugging/Testing: Graphics systems are extremely complex, and their behavior isn't always predictable. We can anticipate a need for machine-specific instrumentation in order to understand surprisingly high or low performance of an application, or for use during driver development.

Infrastructure

  The SGIX_instruments extension provides scaffolding for pipeline instrumentation. The framework allows the app to:

  - Specify a buffer into which measurements will be delivered (asynchronously) by the pipe.
  - Enable/disable an arbitrary collection of instruments.
  - Start/stop/snapshot measurements by the currently-enabled set of instruments.
  - Label a measurement with a user-selectable marker.
  - Poll or wait for completion of a particular measurement.

  We must write one or more new extensions to define instruments that fit into the SGIX_instruments framework. This outline sketches some of the instruments that might be appropriate.

  Since some measurements are performed by real-time apps, it's important to keep the overhead low. The asynchronous delivery scheme helps with this, but it's also desirable to keep other issues in mind (for example, avoid flushing the pipe if at all possible).

Suggested Instruments

  Rendering Statistics

    - Number of bytes of data sent to pipe
    - Number of bytes of data sent from pipe
    These are used to identify data transfer bottlenecks arising from geometry-path commands, pixel-path commands, and texture management.

    - Number of geometric primitives sent to pipe
    - Number of geometric primitives trivially accepted or rejected
    - Number of geometric primitives subjected to 3D clipping
    - Number of geometric primitives resulting from 3D clipping
    - Number of geometric primitives face-culled
    - Number of matrix ops sent to pipe
    These measure culling effectiveness and determine the cause of geometry-processing bottlenecks (e.g., too many vertices, too much clipping, or too many attribute changes).

    - Number of DrawPixels commands sent to pipe
    - Number of Bitmap commands sent to pipe
    - Number of ReadPixels commands sent to pipe
    - Number of CopyPixels commands sent to the pipe
    Together with the data transfer statistics, these help determine whether pixel-oriented apps are running into data transfer or pixel operation setup bottlenecks.

    - Number of MakeCurrent/MakeCurrentRead commands executed
    This should help determine when apps are using more than the optimal number of contexts, and thus causing an inordinate number of context switches.

    - Number of fragments generated, for each rasterizer
    - Number of fragments passing depth test, for each rasterizer
    Together with other statistics, these help estimate average triangle size, depth complexity, and effectiveness of depth sorting.

    Open issues: Is there a way to track the number of bytes processed by CopyPixels-style operations? These aren't accounted for by the transfers to and from the pipe.

  Texture Statistics

    - Number of texture binds performed
    Pinpoints an important attribute-change bottleneck.

    - Number of TexImage/TexSubImage commands
    - Number of CopyTexImage/CopyTexSubImage commands
    - Number of texture downloads initiated by texture manager
    - Number of GetTexImage commands
    - Number of texture uploads initiated by texture manager
    Together with other stats, determines cost of texture management operations.

    - Texture memory utilization
    Initial/max/min/final fraction of texture memory in use over the measurement interval.

    Open issues: Number of texture fetches, per rasterizer?

  Timing Measurements

    Return these times for all commands appearing between two ``bracketing'' commands issued by the app:

    - Host CPU time (usecs)
    - Geometry (total for vector and scalar units) processing time (usecs)
    - Rasterization (for each rasterizer) processing time (usecs)
    - Wall clock time (usecs)

    Note that the above measurements should reflect the ``useful work'' performed by the associated pipe stages; they should be repeatable no matter what is in the pipe before the first bracketing command is issued and no matter what is placed in the pipe after the second bracketing command is issued. (Thus, counting FIFO full/empty states isn't sufficient.)

Instruments NOT Recommended

  - Number of FIFO high-water interrupts
  Not sure this is needed. Provided we do a good job of accounting for time spent in each stage of the pipe, that accounting should be of more use than the raw number of interrupts, and interpreting it should involve less system-dependent code.

  - Number of graphics context switches
  Superseded by recording the number of MakeCurrent commands (which should be more useful on a per-context basis than the global number of context switches per pipe).

  - Number of geometric primitives scissored
  See note under Issues/Resolutions below.

  - Number of bytes transferred due to DrawPixels/Bitmap commands
  - Number of bytes transferred due to ReadPixels commands
  - Number of bytes transferred due to CopyPixels commands
  - Number of bytes of texture data transferred as a result of TexImage, CopyTexImage, GetTexImage, etc.
  These seem reasonable, but I suspect we'll get adequate bang-for-the-buck just by counting the number of bytes transferred to/from the pipe. (Tracking bytes transferred for Copy* operations is an open issue.)

  - Coarse Z-culling stats of some kind?
  My current guess is that if we can provide statistics on the number of fragments generated and the number of fragments passing the depth test, it's unlikely we'll need more stats on coarse Z-culling.

Issues/Resolutions

  - In principle, the application can handle some of the measurements described above (counting the number of times a given command is executed, for example). Should we bother implementing instruments to capture such measurements?

    I believe we should. Although it makes good design sense to avoid duplicating what's easily accomplished in the apps, there are two problems with requiring users to make measurements on their own: (1) Doing so could require wholesale changes to source code. (Consider what would be needed to handle display lists correctly.) It's unlikely many users would do this. (2) Users typically don't have access to the source code for high-level libraries that issue OpenGL commands, so requiring source code changes makes it impractical for them to measure the commands executed by those libraries.

  - Why not use a library like GLS or a utility like ogldebug to trace OpenGL commands and make such measurements?

    Good arguments have been made for this, but I'm not completely convinced. In some cases, using GLS or ogldebug mitigates the problems mentioned above. For example, it would be easier to maintain counts of the number of times a command is executed, since no access to source code is needed. (Handling display lists correctly seems possible, though it would require a good bit of work, especially for shared dlists.) There are problems merging the results of counts from the tracing utilities with timing measurements made by other instruments: the tracing utilities would need to interpret the instrumentation commands to know when to start and stop counting, and the counts wouldn't be available to the application under test, so it couldn't make on-the-fly decisions based on them. Also, in many cases I suspect it's more work to put this functionality into the tracing utilities than it is to fold the functionality into the instruments. Counting pixel and texture commands might be accomplished with just a few lines of microcode, for example.

  - It's difficult to measure the number of scissored geometric primitives, because a primitive may be scissored in one rasterizer but not in others. Determining which primitives have been scissored essentially requires tagging each primitive so that the status from all rasterizers can be combined meaningfully.

    Good point. That statistic has been dropped from the current proposal.

  - It would be worthwhile to consider instruments that would help debug performance problems, but would not necessarily be exposed for general use. (A count of the number of cycles for which each type of memory request [texture, video, command fifo, etc.] stalls, for example.)

    Yes. The proposal now mentions a ``Debug/Test'' category of instruments.

  - Beware of adding readable hardware counters, particularly when they affect multiple blocks of logic and software (consider testability, new special command packets that would be required, context switching, etc.).

    True.

  - Not all of these instruments will be practical. For multiple geometry engines, some measurements will need to be maintained on a per-GE basis. The extension spec must reflect this (as it must reflect the existence of multiple rasterizers). |
From: Ian R. <id...@us...> - 2004-11-09 00:56:36
|
Allen Akin wrote:
> On Mon, Nov 08, 2004 at 11:32:24AM -0800, Ian Romanick wrote:
> | This is something I've been thinking about ever since I saw the
> | profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of
> | information that would be useful to get out of the driver about performance
>
> Have you taken a look at the SGIX_instruments extension? It provides a framework that's intended for gathering profiling information asynchronously. The idea was that you'd add separate extensions that defined the actual instrumentation (SGIX_ir_instrument1 was an early example).

I looked at those extensions once a long time ago. My impression then, and even now on re-examining them, is: Yuck! I have a number of "issues" with those extensions in particular and with similar profiling methods in general. I'll stick to the general issues here. ;)

The biggest problem with any profiling technique like this is that it is very, very intrusive to the source code of the application. The application has to be coded to measure itself, and it has to know what and how to measure. This would be like putting code in an app to do RDTSC (or other MSR reading) to do x86 instruction profiling. It's tough to change what is being measured (e.g., instruction timings, cache misses, etc.), and you end up with a bunch of crap, which quickly becomes outdated, in the source code. People do it, but usually only in limited circumstances (e.g., they measure the performance of one specific routine under controlled inputs). People much prefer to use tools like oprofile or VTune.

The other really big problem with that type of profiling for GL is that only the application writer can use it. Presumably all the instrumentation code is removed from the release binaries, so driver writers can't hook profiling information to improve the drivers. Sure, we could put another mechanism in the driver, but then we have two mechanisms that we have to maintain.

Looking at GL profiling from an oprofile point of view, the ideal situation would be to have some sort of GUI (that could be run on a different machine) where the user could select the information to gather, then start recording. The user could then run whatever test they wanted and stop recording when the test was done. Then a graph of per-frame statistics could be presented. An obvious choice would be the per-frame time. The user could then select specific frames (or ranges of frames) to view more detailed statistics for that frame. Obviously, this is a MUCH bigger project than something like texturetop. |
From: Allen A. <ak...@po...> - 2004-11-09 18:56:35
|
On Mon, Nov 08, 2004 at 04:56:15PM -0800, Ian Romanick wrote:
| The biggest problem with any profiling technique like this is that it is
| very, very intrusive to the source code of the application. ...

Well, there are several classes of apps that need immediate performance feedback to determine how to behave. Those include games and simulators, of course, but they also include non-realtime apps that want to measure performance to decide which of several rendering paths to use. (It seems likely to me that some desktop libraries will fall into the latter class.)

For those apps, instrumentation isn't intrusive; it's a fundamental part of how they work.

| ... This would be like putting code in an app to do
| RDTSC (or other MSR reading) to do x86 instruction profiling. ...
| ... People much prefer to use tools like oprofile
| or VTune.

You need hardware-friendly low-level mechanisms to implement user-friendly high-level tools that will do something correct and useful.

I thought the SGIX_instruments stuff would be helpful because it makes suggestions about design (e.g. measurement intervals, labelling, asynchronous data delivery, accuracy) and things that are worth measuring (e.g. command counts, data transfer sizes). But maybe not.

| The other really big problem with that type of profiling for GL is that
| only the application writer can use it. ...

SGIX_instruments restricts queries to the current context. But if you're willing to do the necessary locking in the driver and/or hardware, you can make queries on behalf of other contexts (or of performance across all contexts). That's a problem you have to solve for external profilers, too, so I don't think it's a make-or-break issue.

Allen |
From: Ian R. <id...@us...> - 2004-11-12 22:55:25
|
Allen Akin wrote:
> On Mon, Nov 08, 2004 at 04:56:15PM -0800, Ian Romanick wrote:
> | The biggest problem with any profiling technique like this is that it is
> | very, very intrusive to the source code of the application. ...
>
> Well, there are several classes of apps that need immediate performance feedback to determine how to behave. Those include games and simulators, of course, but they also include non-realtime apps that want to measure performance to decide which of several rendering paths to use. (It seems likely to me that some desktop libraries will fall into the latter class.)
>
> For those apps, instrumentation isn't intrusive; it's a fundamental part of how they work.

That's a good point. The timing was somewhat ironic, too. The same day you sent this message, I was at a GLSL seminar in Portland by Randi Rost (3Dlabs). During one of the breaks, the issue of "time" in OpenGL came up. We had a bit of a side discussion about apps that need to be real-time (i.e., hit a target framerate no matter what). Other than WGL_I3D_swap_frame_usage / GLX_MESA_swap_frame_usage, there isn't anything in OpenGL to help applications with this problem. This is the "Load Monitoring" problem from your original e-mail. My mind is pretty well stuck in the "Tuning" problem. In truth, there really isn't anything in OpenGL to help with either problem. :(

http://freedesktop.org/cgi-bin/viewcvs.cgi/mesa/Mesa/docs/MESA_swap_frame_usage.spec?view=markup
http://oss.sgi.com/projects/ogl-sample/registry/I3D/wgl_swap_frame_usage.txt

> | ... This would be like putting code in an app to do
> | RDTSC (or other MSR reading) to do x86 instruction profiling. ...
> | ... People much prefer to use tools like oprofile
> | or VTune.
>
> You need hardware-friendly low-level mechanisms to implement user-friendly high-level tools that will do something correct and useful.
>
> I thought the SGIX_instruments stuff would be helpful because it makes suggestions about design (e.g. measurement intervals, labelling, asynchronous data delivery, accuracy) and things that are worth measuring (e.g. command counts, data transfer sizes).
>
> But maybe not.

I think this is one of the main differences between solving the "Load Monitoring" problem and the "Tuning" problem. The former requires real-time data collection and on-line analysis; the latter does not require real-time data collection and prefers off-line analysis. SGIX_instruments would work for on-line analysis, and a driver streaming XML to a pipe / file / network connection would work for off-line.

> | The other really big problem with that type of profiling for GL is that
> | only the application writer can use it. ...
>
> SGIX_instruments restricts queries to the current context. But if you're willing to do the necessary locking in the driver and/or hardware, you can make queries on behalf of other contexts (or of performance across all contexts). That's a problem you have to solve for external profilers, too, so I don't think it's a make-or-break issue. |
From: Allen A. <ak...@po...> - 2004-11-13 00:08:24
|
On Fri, Nov 12, 2004 at 02:55:12PM -0800, Ian Romanick wrote:
| ... We had a bit of a side discussion about apps that need to be
| real-time (i.e., hit a target framerate no matter what). Other than
| WGL_I3D_swap_frame_usage / GLX_MESA_swap_frame_usage, there isn't
| anything in OpenGL to help applications with this problem.

The Performer group experimented with SGIX_instruments, but the instrumentation available on the old machines was pretty crude, and I don't remember if Performer ever used it in production.

I've talked to the 3Dlabs guys about timing in the past. They proposed some functionality for GL2, but there wasn't enough interest from the rest of the ARB to standardize it. Too bad; it's a fun problem, and a good solution would have lots of uses.

| I think this is one of the main differences between solving the "Load
| Monitoring" problem and the "Tuning" problem. The former problem
| requires real-time data collection and on-line analysis. The latter does
| not require real-time data collection and prefers off-line analysis.

I think there's more overlap between the two because of the "Heisenbug" problem. That is, if data collection perturbs the timing of the app too much, then you can't use the data you collected to draw valid conclusions about the normal performance of the app. So I'd be inclined to shoot for one real-time data-collection mechanism that's used for both on-line and off-line analysis.

I can see where you'd feel otherwise if you wanted to collect some really detailed data (e.g. the number of fragments generated for each polygon). But basic useful stuff like command counts, primitive counts, data transfer amounts, memory utilization, and time intervals could all be done in real time, if the collection mechanism is built that way.

Allen |
From: Owen T. <ot...@re...> - 2004-11-09 00:57:43
|
On Mon, 2004-11-08 at 11:32 -0800, Ian Romanick wrote:
> Owen Taylor wrote:
> > Over the last few days, I've been working on a tool to allow watching a live view of the texture usage of a DRI application. I'm not going to go into a lot of detail here - there is a more complete description at:
> >
> > http://fishsoup.net/software/texturetop/
> >
> > And an even more complete README file linked to from there, but the 10-second description of the change to the DRI source code is this:
>
> This is something I've been thinking about ever since I saw the profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of information that would be useful to get out of the driver about performance:
>
> - texture use / abuse
> - number of primitives in each begin / end block
> - operations causing stalls in the driver (e.g., places where the driver has to explicitly wait on the hardware)
> - time to render each frame
> - etc.
>
> We could emit this data as a per-frame (where a frame ends with a SwapBuffers call on a double-buffered visual or glFinish on a single-buffered visual) XML document. Each context would have a named pipe where some app could read the data.
>
> I hadn't really thought it through enough (or had time to work on it) to make it worth posting to the list about it. It sounds like your texturetop interface is in a similar vein.

Yes, it's very much along those lines, though if you think of it as:

app =(A)=> profile thread =(B)=> profiler

I've designed things so that lots of data gets pushed across (A), the profiler thread aggregates, and (B) is low bandwidth. The profiler can poll as infrequently as it likes.

So you wouldn't want to accumulate per-frame data, since it could grow without bound. But I don't think that's a big problem. You can report data when polled only for the last frame, or you can report min/max/average data for frames since the profiler last polled.

Regards,
Owen |