From: Eric A. <er...@an...> - 2008-05-14 18:31:44
On Wed, 2008-05-14 at 02:33 +0200, Thomas Hellström wrote:
> > The real question is whether TTM suits the driver writers for use in Linux
> > desktop and embedded environments, and I think so far I'm not seeing
> > enough positive feedback from the desktop side.
> >
> I actually haven't seen much feedback at all. At least not on the
> mailing lists.
> Anyway we need to look at the alternatives, which currently is GEM.
>
> GEM, while still in development, basically brings us back to the
> functionality of TTM 0.1, with added paging support but without
> fine-grained locking and caching policy support.
>
> I might have misunderstood things, but quickly browsing the code raises
> some obvious questions:
>
> 1) Some AGP chipsets don't support page addresses > 32 bits. GEM objects
> use GFP_HIGHUSER, and it's hardcoded into the Linux swap code.

The obvious solution here is what many DMA APIs do for IOMMUs that can't
address all of memory: keep a pool of pages within the addressable range
and bounce data through them. I think the Linux kernel even has
interfaces to support us in this. Since it's not going to be a very
common case, we may not care about the performance. If we do find that
we care about the performance, we should first attempt to get what we
need into the Linux kernel so we don't have to duplicate code, and only
if that fails do the duplication.

I'm pretty sure the danger of AGP chipsets versus >32-bit pages has been
overstated, though. Besides the fact that you would need to load one of
these older machines with a full 4GB of memory (well, theoretically
3.5GB, but how often can you even boot a system with a 2GB, 1GB, 0.5GB
combo?), you also need a chipset that does >32-bit addressing. None of
the AMD and Intel chipsets appear to have this problem in the survey I
did last night: they've either got a >32-bit chipset and a >32-bit GART,
or a 32-bit chipset and a 32-bit GART. Basically all I'm worried about
is ATI PCI[E]GART at this point.
http://dri.freedesktop.org/wiki/GARTAddressingLimits

<snip bits that have been covered in other mails>

> 5) What's protecting i915 GEM object privates and lists in a
> multi-threaded environment?

Nothing at the moment. That's my current project. dev->struct_mutex is
the plan -- I don't want to see finer-grained locking until we show that
contention on that lock is an issue. Fine-grained locking takes
significant care, and there are a lot more important performance
improvements to work on before then.

> 6) Isn't do_mmap() strictly forbidden in new drivers? I remember seeing
> some severe ranting about it on the lkml?

We've talked it over with Arjan, and until we can use real fds as our
handles to objects, he thought it sounded OK. But apparently Al Viro is
working on making it feasible for us to allocate a thousand fds. At that
point the mmap/pread/pwrite/close ioctls could be replaced with the
syscalls they were named for, and the kernel guys would love us.

> TTM is designed to cope with most hardware quirks I've come across with
> different chipsets so far, including Intel UMA, Unichrome, Poulsbo, and
> some other ones. GEM basically leaves it up to the driver writer to
> reinvent the wheel.

The problem with TTM is that it's designed to expose one general API for
all hardware, when that's not what our drivers want. The GPU-GPU cache
handling for Intel, for example, mapped the hardware so poorly that
every batch just flushed everything. Bolting on the clflush-based
CPU-GPU caching management for our platform recovered a lot of
performance, but we're still having to reuse buffers in userland at a
memory cost, because allocating buffers is overly expensive in the
general supports-everybody (but oops, it's not swappable!) object
allocator.

We're trying to come at it from the other direction: implement one
driver well. When someone else implements another driver and finds that
there's code that should be common, make it into a support library and
share it.
I actually would have liked the whole interface to userland to be
driver-specific, with a support library for the parts we think other
people would want, but DRI2 wants to use buffer objects for its shared
memory transport and I didn't want to rock its boat too hard, so the
ioctls that should be supportable for everyone got moved to generic. If
the implementation of those ioctls in generic code doesn't work for some
drivers (say, early shmfs object creation turns out to be a bad idea for
VRAM drivers), I'll happily push it out to the driver.

--
Eric Anholt
anholt@FreeBSD.org   er...@an...   eri...@in...