From: Filip S. <fs...@st...> - 2002-05-15 14:56:56
On Wed, 15 May 2002, Nicolas Souchu wrote:
> On Tue, May 14, 2002 at 04:26:09PM -0400, Filip Spacek wrote:
> > I should probably note that my foremost goal here is to create some
> > sort of allocation interface that the kernel could actually enforce.
> > I wanted the kernel to ensure fair and protected utilization of video
> > card resources by multiple processes by means of (possibly somewhat
> > limited) virtualization of these resources. Only once I started
> > thinking about it some more did I figure that it would be really cool
> > if the kernel could do all the swapping in and out of different
> > regions/types of memory.
>
> Yes, supposing that you have processes/threads using more video memory
> than is physically present on the board. Or is it just for protection,
> as you suggest later (which is not evident with SMP)?
>
> But why use more video RAM than available? For hidden applications maybe.

Protection is certainly one main issue. The other is virtualization. The
kernel needs to be able to fairly distribute the available video RAM
between all the programs using the graphics accelerator at the same time
(e.g. in the case of two windowed applications using some sort of direct
rendering).

> > So assuming that KGI exposes the above memory regions, let's look at
> > the generic auxiliary pixbufs. How would an application allocate these
> > buffers? Would the kernel simply hand out video RAM until there's none
> > left? This would allow a process to monopolize all of the video card
> > resources. Furthermore, how would the kernel protect this memory from
> > other processes? Most PC cards can render pretty much anywhere into
> > the framebuffer, and I think that it is highly undesirable to allow
> > one process to draw a line across another process's vertex data stored
> > in the video memory.
> > The issue that comes up over and over is that the video card does not
> > have virtual memory, and so giving an application access to the
> > accelerator pretty much gives it access to the whole video memory. The
> > only way to protect the memory and ensure that it is fairly used (by
> > swapping the contents of the video memory in and out) is to monitor
> > the accelerator command stream (which has to be done on many graphics
> > cards anyway for security reasons) and software page fault on any
> > offending command.
>
> With the CPU MMU, the issue is overlapping of video regions by 4kb pages.
>
> One could imagine different levels of protection and different APIs to
> reach the graphics resources. Consider the framebuffer. Some threads
> need very fast execution, so they have complete access to the region
> they need by groups of 4kb pages. This area should never overlap the
> region of a process of the same or higher priority (a general rule for
> the windowing system). If the area overlaps a region of lower priority,
> then why not consider that the lower thread accesses the HW through an
> API, possibly hidden by the same API provided to the higher thread,
> that would be executed in the context of the higher thread?

That is an interesting point. Since we would implement the protection in
software, some other scheme might be advantageous in some situations.
Unfortunately, in most cases we don't really have a choice and have to
follow exactly what the CPU does. The reason is that writing directly to
the video memory (framebuffer) is nonsense: most drawing operations can
be done using the accelerator, and even the stuff that needs to be drawn
by hand should not be drawn pixel by pixel, because the local bus is
usually way too slow. The fastest way is to DMA whole chunks at a time.
So the application can't just be handed pieces of the video RAM, because
it wouldn't know what to do with them.
The app cannot initiate a DMA transfer from userspace, and just
memcpying stuff into the video RAM is a terrible waste of system
resources. Instead, when an app asks for video RAM, it gets system RAM.
It can happily use the CPU to load up (for example) a texture from a
file into this memory. When KGI detects that the application tried to
use this memory for texturing, it will DMA it to the video RAM and set
the system memory pages read-only (this way, when the app writes to the
texture, we get a page fault and will know that the copy present in the
video RAM is no longer valid). The fact that the application operates on
system memory effectively restricts the software video memory management
to the same scheme the CPU uses (4k pages). It might seem that this is
wasteful and that the data is uselessly present in two places, but in
reality the scheme isn't worse than any alternative. Imagine that the
application is handed video RAM directly, and only up to the amount
available. If the app has more textures than video RAM available, it
will have to keep all its textures in system RAM and selectively upload
them to the video RAM, so the duplication is present in this scheme too.

> [...]
> > > I feel that this is too generic for in-kernel support. Keep in
> > > mind that for all the hairy strangeness we are dealing with here,
> > > we _are_ still only discussing memory mapping, typing, caching, and
> > > allocating. Bins and allocs with enough granularity should suffice
> > > for pretty much all video cards PCI and newer.
> >
> > I agree that it exposes an awful lot about the use of the memory.
> > However, I see no other way. The problem is that since the graphics
> > card will not inform the CPU (in some hardware way) about the memory
> > it uses, there is no way to load the graphics data on demand.
> > Ideally, the application would just allocate a big chunk of memory
> > where it would store its textures (for example), and when the GPU
> > actually tries to texture using this data, it would cause a page
> > fault of sorts which would make the kernel load the appropriate data
> > into the video RAM. Unfortunately, the only information the kernel
> > can gather is that the GPU is going to be using data starting at a
> > certain location, so it becomes necessary for it to also know the
> > size of the data object stored there, so that it can ensure that the
> > object is present in its entirety at the appropriate place.
>
> But suppose you have a bi-CPU machine, which are widely available
> nowadays. If you dedicate a CPU to graphics processing (in addition to
> the graphics one), don't you get the expected facilities?
>
> --
> Nicholas Souchu - ns...@fr... - nsouch@FreeBSD.org

-Filip