From: martin k. <blu...@gm...> - 2009-04-26 15:59:50
|
hello. I have been experiencing difficulties trying to get fixed-pipeline TCL and ARBVP1 hardware support for an ATI 9250 PCI board (chip id 0x5960) under Xorg 1.5.3 and FOSS ATI DRI edge 6.10, running on a ppc station. Is there such a known issue, or does my problem present an isolated case?

for the record, glxinfo -l reports nothing wrong on the subject:

$ glxinfo -l
<snip>
OpenGL vendor string: Tungsten Graphics, Inc.
OpenGL renderer string: Mesa DRI R200 20060602 TCL
OpenGL version string: 1.3 Mesa 7.2
<snip>
OpenGL limits:
<snip>
    GL_MAX_ELEMENTS_VERTICES = 3000
    GL_MAX_ELEMENTS_INDICES = 3000
<snip>
  GL_VERTEX_PROGRAM_ARB:
    GL_MAX_PROGRAM_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_TEMPORARIES_ARB = 128
    GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB = 12
    GL_MAX_PROGRAM_PARAMETERS_ARB = 128
    GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB = 192
    GL_MAX_PROGRAM_ATTRIBS_ARB = 16
    GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB = 12
    GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB = 1
    GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB = 1
    GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB = 128
    GL_MAX_PROGRAM_ENV_PARAMETERS_ARB = 128
    GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB = 128
    GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB = 128
    GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB = 128

but the exhibited performance degradation of more than a decimal order of magnitude, compared to the same test case on a quite similar hw configuration just under osx, suggests software emulation.
here's the complete components version info as seen in Xorg.0.log:

$ grep -A 6 -e "LoadModule: \"dri\"" -e "LoadModule: \"ati\"" -e "LoadModule: \"radeon\"" /var/log/Xorg.0.log
(II) LoadModule: "dri"
(II) Loading /usr/lib/xorg/modules/extensions//libdri.so
(II) Module dri: vendor="X.Org Foundation"
	compiled for 1.5.3, module version = 1.0.0
	ABI class: X.Org Server Extension, version 1.1
(II) Loading extension XFree86-DRI
(II) LoadModule: "ati"
(II) Loading /usr/lib/xorg/modules/drivers//ati_drv.so
(II) Module ati: vendor="X.Org Foundation"
	compiled for 1.5.3, module version = 6.10.0
	Module class: X.Org Video Driver
	ABI class: X.Org Video Driver, version 4.1
(II) LoadModule: "radeon"
(II) Loading /usr/lib/xorg/modules/drivers//radeon_drv.so
(II) Module radeon: vendor="X.Org Foundation"
	compiled for 1.5.3, module version = 6.10.0
	Module class: X.Org Video Driver
	ABI class: X.Org Video Driver, version 4.1

thank you for your time,
martin |
From: Roland S. <sr...@vm...> - 2009-04-27 16:59:23
|
On 26.04.2009 17:59, martin krastev wrote:
> hello.
>
> I have been experiencing difficulties trying to get fixed-pipeline TCL
> and ARBVP1 hardware support for an ATI 9250 PCI board (chip id 0x5960)
> under Xorg 1.5.3 and FOSS ATI DRI edge 6.10, running on a ppc station.
> Is there such a known issue or does my problem present an isolated
> case?
>
> for the record, glxinfo -l reports nothing wrong on the subject:
>
> $ glxinfo -l
> <snip>
> OpenGL vendor string: Tungsten Graphics, Inc.
> OpenGL renderer string: Mesa DRI R200 20060602 TCL
> OpenGL version string: 1.3 Mesa 7.2
> <snip>
>
> but the exhibited performance degradation of more than a decimal order
> of magnitude compared to the same test case on a quite similar hw
> configuration, just under osx, suggests software emulation.
>
> here's the complete components version info as seen in Xorg.0.log:

I'm not aware of any problems with hw TCL or ARB_vp on any r200 chip (and it shouldn't matter if pci or agp for this). You can use the R200_DEBUG=fall env var to see if the code hits any fallbacks.
Note that there are, however, cases the code is not optimized for and might get excessively slow (like doing CopyTexImage, which isn't hardware accelerated).

Roland |
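For reference, Roland's R200_DEBUG suggestion is applied by setting the variable in the environment of the GL application under test; the binary name below is a hypothetical placeholder, not one from this thread:

```shell
# Log software fallbacks hit by the r200 driver, if any:
R200_DEBUG=fall ./gl_testcase

# Log everything the driver does (including DMA region handling),
# redirecting the rather verbose output to a file:
R200_DEBUG=all ./gl_testcase 2> r200-trace.log
```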
From: martin k. <blu...@gm...> - 2009-04-28 17:46:58
|
Thank you, Roland, for the prompt reply.

'R200_DEBUG=fall' did not reveal any sw fallbacks, so my initial assumption of a potential TCL fallback occurring appears to be wrong. On the other hand, 'R200_DEBUG=all' showed something interesting. For a sequence of identical STATIC_DRAW-kind VBO-carried draws, of the form:

    for (unsigned i = 0; i < draw_iterations; i++)
        glDrawElements(GL_TRIANGLES, num_indices, GL_UNSIGNED_SHORT, 0);

where the vertex/index buffers have been created as:

    glGenBuffersARB(2, vboId);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboId[0]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, vert_arr_size, vert_arr, GL_STATIC_DRAW_ARB);

    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboId[1]);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, idx_arr_size, idx_arr, GL_STATIC_DRAW_ARB);

    glVertexPointer(3, GL_FLOAT, sizeof_vertex, offs_pos);
    glNormalPointer(   GL_FLOAT, sizeof_vertex, offs_nrm);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_NORMAL_ARRAY);

'R200_DEBUG=all' reports that the number of r200AllocDmaRegion() invocations scales with the value of draw_iterations from the draw snippet.

Now, given that we are dealing with a single, supposedly static vertex buffer here, i find this number-of-draws-proportional DMA handling quite curious. What am i missing?

Best regards,
martin

ps: since you mentioned the possibility of potential unoptimized paths in the DRI edge: the performance issues i've been referring to are strictly TCL-related. In all other aspects this DRI edge performs up to the hw specs.

On Mon, Apr 27, 2009 at 12:58 PM, Roland Scheidegger <sr...@vm...> wrote:
> I'm not aware of any problems with hw TCL or ARB_vp on any r200 chip
> (and it shouldn't matter if pci or agp for this). You can use
> R200_DEBUG=fall env var to see if the code hits any fallbacks.
> Note that there are however cases the code is not optimized for and
> might get excessively slow (like doing CopyTexImage which isn't hardware
> accelerated).
>
> Roland
|
From: Roland S. <sr...@vm...> - 2009-04-29 00:43:09
|
On 28.04.2009 19:46, martin krastev wrote:
> Thank you, Roland, for the prompt reply.
>
> 'R200_DEBUG=fall' did not reveal any sw fallbacks, so my initial
> assumption of a potential TCL fallback occurring appears to be wrong. On
> the other hand, 'R200_DEBUG=all' showed something interesting.
> 'R200_DEBUG=all' reports that the number of r200AllocDmaRegion()
> invocations scales with the value of draw_iterations from the draw
> snippet.
>
> Now, given that we are dealing with a single, supposedly static vertex
> buffer here, i find this number-of-draws-proportional DMA handling quite
> curious. What am i missing?

Nothing... The problem is that the driver isn't really supporting VBOs. Those are pretty much fake, hence vertices have to be retransmitted. This has to do with the memory management (there is none). Should hopefully get better some day.
I think though this is indeed the problem. I sort of forgot about this (and can't remember all the details), but R200 PCI indeed seemed to perform quite badly due to slow dma for some odd reason (so switching agp mode with an agp card doesn't make much of a difference, but as soon as you use pci gart it takes a large hit), in scenarios using lots of vertices. So those fake vbos aren't too bad with agp (though presumably not ideal either) but really suboptimal with pci. A 10 times hit sounds huge, but I guess if you're really completely limited by vertex throughput it isn't out of the question.

Roland |
From: martin k. <blu...@gm...> - 2009-04-29 14:12:41
|
Thank you, Roland, that explains it all. When i said the code was running that much faster on almost the same configuration - that other config is AGP 4x, with a native OSX driver edge for the rv280 (it's a mac). And yes, the vertex arrays are large. So in one case we have large vertex arrays perpetually traversing a (66MHz) PCI bus, in the other - proper VBOs over an AGP 4x bus - and that could easily account for an order of magnitude, and then some, for this task. And this given that fillrate alone is better on the PCI ATI (later model, faster clocks, wider memory bus, etc).

let's say i decided to hack a rudimentary allocate-only/no-deallocate memory manager for STATIC_DRAW VBOs into the edge - not for upstreaming, just for my purposes here - where would you advise me to start looking? Also, should i be able to do with just the present sources, no further NDA'd specs required?

Thank you again,
martin |
From: Roland S. <sr...@vm...> - 2009-04-30 13:05:18
|
On 29.04.2009 16:12, martin krastev wrote:
> Thank you, Roland, that explains it all. When i said the code was
> running that much faster on almost the same configuration - that other
> config is AGP 4x, with a native OSX driver edge for the rv280 (it's a
> mac). And yes, the vertex arrays are large. So in one case we have
> large vertex arrays perpetually traversing a (66MHz) PCI bus, in the
> other - proper VBOs over an AGP 4x bus - and that could easily account
> for an order of magnitude, and then some, for this task. And this given
> that fillrate alone is better on the PCI ATI (later model, faster
> clocks, wider memory bus, etc).
>
> let's say i decided to hack a rudimentary allocate-only/no-deallocate
> memory manager for STATIC_DRAW VBOs into the edge - not for
> upstreaming, just for my purposes here - where would you advise me to
> start looking? Also, should i be able to do with just the present
> sources, no further NDA'd specs required?

I think you wouldn't need any NDA'd specs. If you want to make the index buffer also reusable, you need to use the INDX_BUFFER packet, which isn't used in the r200 driver. However, r300 works just the same there (and does use it).
I dunno where to start, though. The r300 driver once upon a time actually used (now defunct and ifdefed out) USER_BUFFERS code to implement vbos without real memory management. Maybe that's a start; r200 and r300 don't really differ there. Otherwise, it would probably be easier to use the radeon-rewrite branch along with the appropriate drm bits and just implement it for real...

Roland |
From: martin k. <blu...@gm...> - 2009-05-01 02:48:33
|
ugh, i feel foolish for having overlooked the radeon-rewrite. it's a commendable effort on David Airlie's part, and my time would be much better spent trying to contribute toward the finalization of that code, to the best of my abilities.

thank you for the guidance,
martin |