Re: My experience with the r300 driver

On Thursday 13 October 2005 07:51, Michel Dänzer wrote:
> There's no question that the override is useful for developers, the
> question is whether it isn't more harm- than useful for users.

I've often thought it'd be nice to have the VideoRAM option in the
config file be clamped to the max(user specified, driver probed), with
some magic value the driver could specify to say it has no real idea
how much vram is available.
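
A minimal sketch of one literal reading of that rule, in Python purely for illustration. The function name, the kB units and the `VRAM_UNKNOWN` sentinel are all assumptions; nothing here is taken from the actual server or driver code.

```python
# Hypothetical sentinel: what a driver would report when it cannot probe VRAM.
VRAM_UNKNOWN = 0


def effective_videoram_kb(user_kb, probed_kb):
    """Pick the VideoRAM size (in kB) to use.

    user_kb   -- value from the config file, or None if the option is absent
    probed_kb -- value the driver probed, or VRAM_UNKNOWN
    """
    if probed_kb == VRAM_UNKNOWN:
        # The driver has no real idea, so the user's value is all we have.
        if user_kb is None:
            raise ValueError("VideoRAM neither probed nor configured")
        return user_kb
    if user_kb is None:
        return probed_kb
    # Literal reading of "max(user specified, driver probed)".
    return max(user_kb, probed_kb)
```

With this reading, a user value below the probed size is raised to the probed size, and the user value is trusted outright only when the driver reports the sentinel.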

> > And, the driver also limits texture memory to only be useable up to
> > 128MB, and I think this is not necessary (as textures are always blitted
> > using the gpu and the memory used by them never touched directly by the
> > cpu) or is it?
>
> Indeed, that memory would probably be useful for textures for now, but
> maybe CPU access to textures in the framebuffer will be necessary in the
> future?

I don't think so.

For fixed function cards, the numbers I've been getting while playing
with accelerating XGetImage and XPutImage in EXA suggest that even for
fairly small updates to offscreen images (about an 8x8 tile update or
so), it's faster to download the subimage you're interested in, modify
it in host RAM, and re-upload it, than it is to do CPU-driven access
directly.  XGetImage of XYPixmaps is a good example, where DMAing the
pixmap down from the framebuffer and then converting ZPixmap to
XYPixmap in host memory is between 3 and 12 times faster than the
normal software path.
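
For readers unfamiliar with the two image formats, here is a rough sketch of the host-side conversion step, in Python for illustration only. It assumes the image has already been downloaded as one packed integer per pixel (ZPixmap-like) and splits it into one bitplane per bit of depth, most significant plane first (XYPixmap-like). The function name is hypothetical, and scanline padding and bit/byte order are deliberately ignored.

```python
def zpixmap_to_xy_planes(pixels, width, height, depth):
    """Split packed pixel values into `depth` bitplanes.

    pixels -- row-major list of width*height integers, each < 2**depth
    Returns a list of planes, most significant bit first; each plane is a
    row-major list of 0/1 values (not packed into bytes).
    """
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    planes = []
    for bit in range(depth - 1, -1, -1):
        # Extract the same bit from every pixel to form one plane.
        planes.append([(p >> bit) & 1 for p in pixels])
    return planes
```

The point of doing this in host memory is that every pixel is read `depth` times, which is cheap in system RAM and expensive when each read goes to the framebuffer.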

For cards with useful fragment shaders, it'd be really really hot to
see the server's fb layer implemented in fragment shaders and do even
core X rendering entirely on-card.  This is basically the Quartz 2D
Extreme model.  Again, you need to get this data off the card sometimes
for things like glReadPixels or XGetImage, but that should really be
done with DMA, or a proper memcpy at minimum.

Think of it as manual cache management.  Block transfers are fairly
quick, and modifying data within a memory domain is really fast, but
single-word updates between domains are just painful.
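
The pattern can be illustrated with a toy model, again in Python and with entirely hypothetical names. `ToyVram` merely counts how many times the host/card boundary is crossed; it says nothing about real timings, and the class is a stand-in rather than any actual EXA interface.

```python
class ToyVram:
    """Toy stand-in for card memory that counts host<->card crossings."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.data = [0] * (width * height)
        self.crossings = 0

    def poke(self, x, y, value):
        # Single-word CPU write across the bus.
        self.crossings += 1
        self.data[y * self.width + x] = value

    def download(self, x, y, w, h):
        # One block transfer, card to host (DMA stand-in).
        self.crossings += 1
        rows = []
        for j in range(h):
            start = (y + j) * self.width + x
            rows.append(self.data[start:start + w])
        return rows

    def upload(self, x, y, tile):
        # One block transfer, host to card.
        self.crossings += 1
        for j, row in enumerate(tile):
            start = (y + j) * self.width + x
            self.data[start:start + len(row)] = row


def update_tile_blockwise(vram, x, y, w, h, fn):
    """Download a tile, modify it in host RAM, upload it again."""
    tile = vram.download(x, y, w, h)
    tile = [[fn(v) for v in row] for row in tile]
    vram.upload(x, y, tile)
```

Updating an 8x8 tile through `update_tile_blockwise` crosses the boundary twice, while writing the same 64 pixels with `poke` crosses it 64 times.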

So I guess to answer your question, memory outside the BAR is fine to
use only for textures, because if the host really wants to modify them
it should do so only between DFS and UTS pairs, and presumably the GPU
can use its entire address space for DMA sources and targets rather
than only the range visible through the PCI bus aperture.

- ajax