From: Eric A. <er...@an...> - 2008-05-14 18:31:44
On Wed, 2008-05-14 at 02:33 +0200, Thomas Hellström wrote:
> > The real question is whether TTM suits the driver writers for use in Linux
> > desktop and embedded environments, and I think so far I'm not seeing
> > enough positive feedback from the desktop side.
> >
> I actually haven't seen much feedback at all. At least not on the
> mailing lists.
> Anyway we need to look at the alternatives, which currently is GEM.
>
> GEM, while still in development, basically brings us back to the
> functionality of TTM 0.1, with added paging support but without
> fine-grained locking and caching policy support.
>
> I might have misunderstood things, but quickly browsing the code raises
> some obvious questions:
>
> 1) Some AGP chipsets don't support page addresses > 32 bits. GEM objects
> use GFP_HIGHUSER, and it's hardcoded into the Linux swap code.

The obvious solution here is what many DMA APIs do for IOMMUs that can't
address all of memory: keep a pool of pages within the addressable range
and bounce data through them. I think the Linux kernel even has
interfaces to support us in this. Since it's not going to be a very
common case, we may not care about the performance. If we do find that
we care about the performance, we should first attempt to get what we
need into the Linux kernel so we don't have to duplicate code, and only
if that fails do the duplication.

I'm pretty sure the danger of AGP chipsets versus >32-bit pages has been
overstated, though. Besides the fact that you would need to load one of
these older machines with a full 4GB of memory (well, theoretically
3.5GB, but how often can you even boot a system with a 2GB, 1GB, 0.5GB
combo?), you also need a chipset that does >32-bit addressing. None of
the AMD and Intel chipsets appear to have this problem in the survey I
did last night: they've either got a >32-bit chipset and a >32-bit GART,
or a 32-bit chipset and a 32-bit GART. Basically all I'm worried about
is ATI PCI[E]GART at this point.
http://dri.freedesktop.org/wiki/GARTAddressingLimits

<snip bits that have been covered in other mails>

> 5) What's protecting i915 GEM object privates and lists in a
> multi-threaded environment?

Nothing at the moment. That's my current project. dev->struct_mutex is
the plan -- I don't want to see finer-grained locking until we show that
contention on that lock is an issue. Fine-grained locking takes
significant care, and there are a lot more important performance
improvements to work on before then.

> 6) Isn't do_mmap() strictly forbidden in new drivers? I remember seeing
> some severe ranting about it on the lkml?

We've talked it over with Arjan, and until we can use real fds as our
handles to objects, he thought it sounded OK. But apparently Al Viro is
working on making it feasible for us to allocate a thousand fds. At that
point the mmap/pread/pwrite/close ioctls could be replaced with the
syscalls they were named for, and the kernel guys would love us.

> TTM is designed to cope with most hardware quirks I've come across with
> different chipsets so far, including Intel UMA, Unichrome, Poulsbo, and
> some other ones. GEM basically leaves it up to the driver writer to
> reinvent the wheel.

The problem with TTM is that it's designed to expose one general API for
all hardware, when that's not what our drivers want. The GPU-GPU cache
handling for Intel, for example, mapped the hardware so poorly that
every batch just flushed everything. Bolting on the clflush-based
CPU-GPU caching management for our platform recovered a lot of
performance, but we're still having to reuse buffers in userland at a
memory cost, because allocating buffers is overly expensive in the
general supports-everybody (but oops, it's not swappable!) object
allocator.

We're trying to come at it from the other direction: implement one
driver well. When someone else implements another driver and finds that
there's code that should be common, make it into a support library and
share it.
I actually would have liked the whole interface to userland to be
driver-specific, with a support library for the parts we think other
people would want, but DRI2 wants to use buffer objects for its shared
memory transport and I didn't want to rock its boat too hard, so the
ioctls that should be supportable for everyone got moved to generic. If
the implementation of those ioctls in generic code doesn't work for some
drivers (say, early shmfs object creation turns out to be a bad idea for
VRAM drivers), I'll happily push it out to the driver.

--
Eric Anholt
anholt@FreeBSD.org   er...@an...   eri...@in...