From: Adam J. <aj...@nw...> - 2005-10-14 20:47:25
On Thursday 13 October 2005 07:51, Michel Dänzer wrote:
> There's no question that the override is useful for developers, the
> question is whether it isn't more harm- than useful for users.

I've often thought it'd be nice to have the VideoRAM option in the config file
be clamped to the max(user specified, driver probed), with some magic value
the driver could specify to say it has no real idea how much vram is
available.

> > And, the driver also limits texture memory to only be useable up to
> > 128MB, and I think this is not necessary (as textures are always blitted
> > using the gpu and the memory used by them never touched directly by the
> > cpu) or is it?
>
> Indeed, that memory would probably be useful for textures for now, but
> maybe CPU access to textures in the framebuffer will be necessary in the
> future?

I don't think so.

For fixed function cards, the numbers I've been getting while playing with
accelerating XGetImage and XPutImage in EXA suggest that even for fairly
small updates to offscreen images (about an 8x8 tile update or so), it's
faster to download the subimage you're interested in, modify it in host RAM,
and re-upload it, than it is to do CPU-driven access directly.  XGetImage of
XYPixmaps is a good example, where DMAing the pixmap down from the
framebuffer and then converting ZPixmap to XYPixmap in host memory is between
3 and 12 times faster than the normal software path.

For cards with useful fragment shaders, it'd be really really hot to see the
server's fb layer implemented in fragment shaders and do even core X
rendering entirely on-card.  This is basically the Quartz 2D Extreme model.
Again, you need to get this data off the card sometimes for things like
glReadPixels or XGetImage, but that should really be done with DMA, or a
proper memcpy at minimum.

Think of it as manual cache management.  Block transfers are fairly quick, and
modifying data within a memory domain is really fast, but single-word updates
between domains are just painful.

So I guess to answer your question, memory outside the BAR is fine to only use
for textures, because if the host really wants to modify them it should do so
only between DFS and UTS pairs, and presumably the GPU can use its entire
address space for DMA sources and targets rather than only the range
visible through the PCI bus aperture.

- ajax
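
For anyone who wants the download/modify/upload pattern spelled out, here is a
minimal sketch in C.  Everything in it is an assumption for illustration: the
array fake_vram stands in for card memory, and download_from_screen() and
upload_to_screen() are hypothetical helpers that use memcpy where a real
driver would program a DMA engine.  They are only loosely modelled on the idea
of a DFS/UTS pair (which I read as EXA's DownloadFromScreen and UploadToScreen
hooks) and are not the EXA signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FB_W = 64, FB_H = 64, TILE = 8 };

/* Stand-in for card memory; on real hardware this would be VRAM. */
static uint32_t fake_vram[FB_W * FB_H];

/* Hypothetical "DFS": copy a w x h rectangle out of fake_vram, one
 * block transfer per row.  A real driver would use DMA here. */
static void download_from_screen(int x, int y, int w, int h,
                                 uint32_t *dst, int dst_stride)
{
    assert(x >= 0 && y >= 0 && x + w <= FB_W && y + h <= FB_H);
    for (int row = 0; row < h; row++)
        memcpy(dst + row * dst_stride,
               fake_vram + (y + row) * FB_W + x,
               (size_t)w * sizeof(uint32_t));
}

/* Hypothetical "UTS": copy a w x h rectangle back into fake_vram. */
static void upload_to_screen(int x, int y, int w, int h,
                             const uint32_t *src, int src_stride)
{
    assert(x >= 0 && y >= 0 && x + w <= FB_W && y + h <= FB_H);
    for (int row = 0; row < h; row++)
        memcpy(fake_vram + (y + row) * FB_W + x,
               src + row * src_stride,
               (size_t)w * sizeof(uint32_t));
}

/* Invert one 8x8 tile: a block read, edits entirely in host RAM,
 * then a block write.  No single-word access crosses domains. */
static void invert_tile(int x, int y)
{
    uint32_t tile[TILE * TILE];

    download_from_screen(x, y, TILE, TILE, tile, TILE);
    for (int i = 0; i < TILE * TILE; i++)
        tile[i] = ~tile[i];
    upload_to_screen(x, y, TILE, TILE, tile, TILE);
}
```

Calling invert_tile(16, 16) touches only the 8x8 rectangle at (16,16) and
leaves the rest of fake_vram alone.  The sketch says nothing about timing;
it only shows the shape of the access pattern, with all per-pixel work kept
on the host side between the two block copies.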