From: Daryll S. <da...@va...> - 2000-05-22 14:47:18
I've been thinking about this and talking to a few people about it, and the more I think about it the more I think getting rid of the X server is the wrong answer.

Keith Packard has been working on a project to shrink the X server. It is included in the XFree86 work. I'm told that it is 600k on the Itsy. (The Itsy is Compaq's prototype StrongARM PDA, which has 32MB.) It seems that's reasonable for even embedded devices. He does it by using a much simpler intermediate layer in the X server.

Therefore it seems to me to be better to use that as a base, and adapt that as needed. It gets you all of X, but you don't need to use that if you don't want to. You will have it there if you do.

- |Daryll
From: Michael M. M. <mor...@hy...> - 2000-05-22 22:16:34
Daryll Strauss wrote:
>
> I've been thinking about this and talking to a few people about it, and
> the more I think about it the more I think getting rid of the X server
> is the wrong answer.
>
> Keith Packard has been working on a project to shrink the X server. It
> is included in the XFree86 work. I'm told that it is 600k on the Itsy. (The
> Itsy is Compaq's prototype StrongARM PDA, which has 32MB.) It seems
> that's reasonable for even embedded devices. He does it by using a much
> simpler intermediate layer in the X server.
>
> Therefore it seems to me to be better to use that as a base, and adapt
> that as needed. It gets you all of X, but you don't need to use that if
> you don't want to. You will have it there if you do.
>

Off the top of my head, here are the main reasons we would be interested in a non-X version of DRI:

1) ** KEY ISSUE ** Eliminate any performance bottlenecks the X server may be causing. Since we are 3D only, any extraneous locking/unlocking, periodic refreshes of the (hidden) 2D portion of the display, etc., will cause unexpected slowdowns.

2) Eliminate wasted system memory requirements.

3) Eliminate on-card font/pixmap/surface/etc. caches that just waste memory.

4) Eliminate the need for extra peripherals, such as mice.

5) Reduce the amount of software that has to be installed and maintained on a customer's system. Certainly none of my customers would have been able to install XFree 4.0 on their own.

There are probably more. In the end, we want the fastest possible pipeline from our app to the 3D graphics hardware, with the wholesale elimination of those things that are not required, or stand in the way of performance. Maybe tinyX will achieve the above... do you have a URL for that project?

What is not clear to me is what the impact of the X server is on the 3D graphics pipeline. If it is "none", then there is little need for further work (in our case).

---
Michael M. Morrison
VP/Chief Technical Officer
Hyperion Technologies, Inc.
From: Daryll S. <da...@va...> - 2000-05-22 23:37:09
On Mon, May 22, 2000 at 04:34:01PM -0600, Michael M. Morrison wrote:
> Off the top of my head, here are the main reasons we would be interested
> in a non-X version of DRI:
>
> 1) ** KEY ISSUE ** Eliminate any performance bottlenecks the XServer may
> be causing. Since we are 3D only, any extraneous locking/unlocking,
> periodic refreshes of the (hidden) 2D portion of the display, etc., will
> cause unexpected slowdowns.

If the X server never does any drawing then the overhead is minimal. Locking is done around Glide operations. A lock check is a single Check And Set (CAS) instruction. Assuming your 3D window covers the X root, then there are no 2D portions to redisplay.

> 2) Eliminate wasted system memory requirements.

Yes, there will be some resources from the X server, but I think not much.

> 3) Eliminate on-card font/pixmap/surface/etc caches that just waste
> memory.

If you don't use them they aren't taking any resources. Right now, there is a small pixmap cache that's statically allocated to 2D. Removing that is a trivial code change. Making it dynamic (3D steals it away from 2D) is not too tough and a better solution than any static allocation.

> 4) Eliminate the need for extra peripherals, such as mice.

Allowing operations without a mouse should be trivial if it isn't a configuration option already.

> 5) Reduction in the amount of software necessary to install/maintain on
> a customer's system. Certainly none of my customers would have been
> able to install XFree 4.0 on their own.

XFree 4.0 installs with appropriate packaging are trivial. What you're saying is that no one has done the packaging work for you, and that's correct. If you create your own stripped DRI version you'll be in for a lot more packaging work on your own.

> There are probably more. In the end, we want the fastest possible
> pipeline from our app to the 3D graphics hardware, with the wholesale
> elimination of those things that are not required, or stand in the way
> of performance. Maybe tinyX will achieve the above... do you have a
> URL for that project?

I tried to track one down, but couldn't find it. I know I've seen it posted recently. Maybe someone else here can point us at it.

> What is not clear to me, is what the impact of the Xserver is on the 3D
> graphics pipeline. If it is "none", then there is little need for
> further work (in our case).

The answer is very little. Essentially none in the 3D pipeline. Some resources, but I think not much. Realize the CAS is in the driver, so you're looking at creating a custom version of that as well. I think the effort of avoiding the CAS, creating your own window system binding for GL, and moving the DRI functionality out of the X server would be much better spent optimizing Mesa and the driver instead. You have to focus resources where they provide the biggest payoff.

- |Daryll
From: Doug R. <df...@ca...> - 2000-05-23 10:47:39
On Mon, 22 May 2000, Daryll Strauss wrote:

> > There are probably more. In the end, we want the fastest possible
> > pipeline from our app to the 3D graphics hardware, with the wholesale
> > elimination of those things that are not required, or stand in the way
> > of performance. Maybe tinyX will achieve the above... do you have a
> > URL for that project?
>
> I tried to track one down, but couldn't find it. I know I've seen it
> posted recently. Maybe someone else here can point us at it.

Isn't this something to do with the directory programs/hw/Xserver/hw/kdrive? That has directories for itsy and seems to be a self-contained X server using fb for rendering (fb is a replacement for cfb).

--
Doug Rabson                     Mail:  df...@ca...
Technical Director, Qube Software Ltd.
Phone: +44 20 7431 9995
From: Michael M. M. <mor...@hy...> - 2000-05-26 15:25:18
> If the X server never does any drawing then the overhead is
> minimal. Locking is done around Glide operations. A lock check is a
> single Check And Set (CAS) instruction. Assuming your 3D window covers
> the X root, then there are no 2D portions to redisplay.

How often are checks done to see if things need to be clipped/redrawn/redisplayed? Hopefully only if windows are moved/resized, etc.

> > 2) Eliminate wasted system memory requirements.
>
> Yes, there will be some resources from the X server, but I think not much.

Especially if we can take advantage of the smaller server.

> > 3) Eliminate on-card font/pixmap/surface/etc caches that just waste
> > memory.
>
> If you don't use them they aren't taking any resources. Right now, there
> is a small pixmap cache that's statically allocated to 2D. Removing that is a
> trivial code change. Making it dynamic (3D steals it away from 2D) is
> not too tough and a better solution than any static allocation.

Agreed. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

> Allowing operations without a mouse should be trivial if it isn't a
> configuration option already.

I guess I was unaware that X could run without an InputDevice being defined. I guess I did not check the docs.

> XFree 4.0 installs with appropriate packaging are trivial. What you're
> saying is that no one has done the packaging work for you, and that's
> correct. If you create your own stripped DRI version you'll be in for a
> lot more packaging work on your own.

"Appropriate" and "trivial" certainly depend on the skill set of the installer and of those who created the packages. My install of XFree 4.0 using the packages from rawhide did not go smoothly during the upgrade. I guess I should have uninstalled the previous version before I started, but I did not want to be left with more messing around if it didn't work.

It seems to me that only having to package my app + the kernel driver + dri module + fakeX module + glx(fakeglx) module + /lib/libGL*.so is pretty trivial. Am I missing something? My customers will not be compiling anything on their machines.

> The answer is very little. Essentially none in the 3D pipeline. Some
> resources, but I think not much. Realize the CAS is in the driver, so
> you're looking at creating a custom version of that as well. I think
> effort spent avoiding the CAS, creating your own window system binding
> for GL, and moving the DRI functionality out of the X server would be
> much better spent optimizing Mesa and the driver instead. You have to
> focus resources where they provide the biggest payoff.

I am in total agreement here. With regard to this, is anyone working on adding SSE support to the transform/lighting code in Mesa?

---
Michael M. Morrison
VP/Chief Technical Officer
Hyperion Technologies, Inc.
From: Gareth H. <ga...@pr...> - 2000-05-26 15:41:48
"Michael M. Morrison" wrote:
>
> I am in total agreement here. With regards to this, is anyone working
> on adding SSE support to the transform/lighting code in Mesa?

Yes. There is already SSE transform code in Mesa, and I've just submitted a patch to the 2.4.0-test1 kernel that enables user-space apps to use SSE. That is, get the next 2.4.0-test kernel and you'll be able to use the SSE-optimized asm routines in Mesa.

On the other hand, this will only give a limited benefit (lighting is another matter), as most of the time is spent in the triangle setup and rendering functions. I'm working on SSE code for that at the moment, which is what prompted the kernel work.

-- Gareth
From: Daryll S. <da...@va...> - 2000-05-26 16:08:50
On Fri, May 26, 2000 at 09:43:43AM -0600, Michael M. Morrison wrote:
> How often are checks done to see if things need
> clipped/redrawn/redisplayed? Hopefully only if windows are
> moved/resized, etc.

The locking system is designed to be highly efficient. It is based on a two-tiered lock. Basically it works like this:

The client wants the lock. They use the CAS. (I was corrected that the instruction is compare and swap; I knew that was the functionality, but I got the name wrong.) If the client was the last application to hold the lock, you're done and you move on.

If it wasn't the last one, then we use an IOCTL to the kernel to arbitrate the lock.

In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get the new window location and clip rects. This only happens on a window move. Assuming your clip rects/window position haven't changed, the redisplay happens entirely in the client.

The client may have other state to restore as well. In the case of the tdfx driver we have three more flags: command fifo invalid, 3D state invalid, and textures invalid. If those are set the corresponding state is restored.

So, if the X server wakes up to process input, it currently grabs the lock but doesn't invalidate any state. I'm actually fixing this now so that it doesn't grab the lock for input processing. If the X server draws, it grabs the lock and invalidates the command fifo. If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command fifo. If another 3D app runs, it grabs the lock, invalidates the command fifo, invalidates the 3D state and possibly invalidates the texture state.

> Especially if we can take advantage of the smaller server.

Right.

> Agreed. How about the main X drawing surface? Are 2 extra "window
> sized" buffers allocated for primary and secondary buffers in a
> page-flipping configuration?

Right now, we don't do page flipping at all. Everything is a blit from back to front. The biggest problem with page flipping is detecting when you're in full screen mode, since OpenGL doesn't really have a concept of full screen mode. We want a solution that works for existing games, so we've been designing one. It should get implemented fairly soon since we need it for antialiasing on the V5.

In the current implementation the X front buffer is the 3D front buffer. When we do page flipping we'll continue to do the same thing. Since you have an X window that covers the screen it is safe for us to use the X surface's memory. Then we'll do page flipping. The only issue will be falling back to blitting if the window is ever moved from covering the whole screen.

> I guess I was unaware that X could run without an InputDevice being
> defined. I guess I did not check the docs.

I'm not sure it will, but it does seem like a reasonable thing to want. If that doesn't work, it could be added.

> "Appropriate" and "trivial" are certainly based on the skillset of the
> installer and those that created the packages. My install of XFree4.0
> using the packages from rawhide did not go smoothly during the upgrade.
> I guess I should have uninstalled the previous version before I
> started, but I did not want to be left with more messing around if it
> didn't work.

Again, that depends a lot on who packaged them and how.

> It seems to me that only having to package my app + the kernel driver +
> dri module + fakeX module + glx(fakeglx) module + /lib/libGL*.so is
> pretty trivial. Am I missing something? My customers will not be
> compiling anything on their machines.

First, the kernel drivers are included in the stock 2.3.xx (soon to be 2.4) kernels. So hopefully users just get them by default.

Second, whether or not users have to compile anything is a support issue. Kernel changes will require compiling by someone to keep them current with the kernel. As a distributor of a kernel module you can compile it for all the versions of Linux you support and/or you can provide source. Someone has to do the compile; the question is whether you do it or they do, and how much flexibility you want to leave them in their system configuration.

So, the final solution is that distributions roll in XFree 4.0 and the 2.4 kernel and everything is hunky-dory. The user doesn't have to do anything to get 3D. As the drivers progress, pieces can be individually updated. Even in the non-ideal world (where the distribution hasn't done everything) the installation should just be rpm -Uvh XFree(a few of them) GLU GLUT KernelDriver.

Your solution means YOU have to package and maintain all those pieces you mentioned. They will not be rolled into the common base, because they are specific to your solution. My comment about packaging is that your packaging requirement goes up because you can't rely on the community to do more of the work for you. Also, your solution includes "kernel driver". How do you avoid having them compile it? If you're willing to restrict the user's choice of kernel/distribution/etc. (which isn't unreasonable in some cases) then distributing a binary is easy in either case.

> I am in total agreement here. With regards to this, is anyone working
> on adding SSE support to the transform/lighting code in Mesa?

SSE stuff was somewhat broken in the kernels until recently. In fact, we (Gareth Hughes to be precise) just submitted a big kernel patch that should fully support SSE. I don't know if anyone is working on them for Mesa; I haven't seen much in that area lately.

I'd start with profiling your app against the current Mesa base, to decide where the optimization effort should go. I'm not convinced SSE is the next right step. There may be more fundamental optimizations to do first. We haven't spent much time on optimizing it.

- |Daryll
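
[Editor's sketch] The two-tiered lock described in this message can be illustrated roughly as follows. This is only a single-process sketch under stated assumptions, not the DRI or tdfx code: the `shared_area` layout, the `LOCK_HELD` bit, and every function name here are hypothetical, and the kernel IOCTL is replaced by a stand-in that simply takes the lock. The fast path is one compare-and-swap that succeeds only when this client was the last holder; otherwise the client arbitrates, then checks the stamp and the invalidation flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCK_HELD 0x80000000u   /* hypothetical "held" bit in the lock word */

/* Hypothetical layout of the area shared by the server and all clients. */
struct shared_area {
    atomic_uint lock;           /* LOCK_HELD bit | id of current or last holder */
    unsigned    stamp;          /* bumped by the server on window operations */
    bool        fifo_invalid;   /* command fifo must be resynchronized */
    bool        state3d_invalid;
    bool        textures_invalid;
};

struct client {
    unsigned ctx;               /* this client's id */
    unsigned last_stamp;        /* stamp seen when clip rects were last fetched */
    int slow_paths, cliprect_fetches, restores;  /* counters for the demo */
};

/* Stand-in for the ioctl that asks the kernel to arbitrate the lock. */
static void arbitrate_lock(struct shared_area *s, struct client *c)
{
    atomic_store(&s->lock, LOCK_HELD | c->ctx);
    c->slow_paths++;
}

static void take_lock(struct shared_area *s, struct client *c)
{
    unsigned expected = c->ctx; /* lock is free and we were the last holder */
    if (atomic_compare_exchange_strong(&s->lock, &expected, LOCK_HELD | c->ctx))
        return;                 /* fast path: nobody else touched the card */

    arbitrate_lock(s, c);       /* slow path: someone else held it in between */
    if (s->stamp != c->last_stamp) {   /* windows changed: refetch clip rects */
        c->last_stamp = s->stamp;
        c->cliprect_fetches++;
    }
    if (s->fifo_invalid)     { s->fifo_invalid = false;     c->restores++; }
    if (s->state3d_invalid)  { s->state3d_invalid = false;  c->restores++; }
    if (s->textures_invalid) { s->textures_invalid = false; c->restores++; }
}

static void drop_lock(struct shared_area *s, struct client *c)
{
    assert(atomic_load(&s->lock) == (LOCK_HELD | c->ctx));
    atomic_store(&s->lock, c->ctx);    /* clear held bit, remember last holder */
}

/* Simulates the server taking the lock to draw or to move a window. */
static void server_touch(struct shared_area *s, unsigned server_ctx, bool moved_window)
{
    atomic_store(&s->lock, LOCK_HELD | server_ctx);
    if (moved_window)
        s->stamp++;
    s->fifo_invalid = true;
    atomic_store(&s->lock, server_ctx);
}
```

In this sketch a client that is never interrupted pays only the compare-and-swap on each `take_lock`; the stamp check and state restores run only after some other party has held the lock.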
From: Michael M. M. <mor...@hy...> - 2000-05-26 18:04:30
Daryll Strauss wrote:
>
> The locking system is designed to be highly efficient. It is based on a
> two tiered lock. Basically it works like this:
>
> The client wants the lock. They use the CAS (I was corrected that the
> instruction is compare and swap, I knew that was the functionality, but
> I got the name wrong) If the client was the last application to hold the
> lock, you're done you move on.

Excellent. So since our app will be the only X app running (we hopefully won't need a window manager), after initial window creation and placement we will always have the lock.

> If it wasn't the last one, then we use an IOCTL to the kernel to
> arbitrate the lock.
>
> In this case some or all of the state on the card may have
> changed. The shared memory carries a stamp number for the X server. When
> the X server does a window operation it increments the stamp. If the
> client sees that the stamp has changed, it uses a DRI X protocol request
> to get new window location and clip rects. This only happens on a window
> move. Assuming your clip rects/window position hasn't changed, the
> redisplay happens entirely in the client.
>
> The client may have other state to restore as well. In the case of
> the tdfx driver we have three more flags for command fifo invalid, 3D
> state invalid, textures invalid. If those are set the corresponding
> state is restored.
>
> So, if the X server wakes up to process input, it currently grabs the lock
> but doesn't invalidate any state. I'm actually fixing this now so that
> it doesn't grab the lock for input processing.

Once that is complete, it appears there will be no other reason for the X server to grab the lock. Cool.

> In the current implementation the X front buffer is the 3D front
> buffer. When we do page flipping we'll continue to do the same
> thing. Since you have an X window that covers the screen it is safe for
> us to use the X surface's memory. Then we'll do page flipping. The only
> issue will be falling back to blitting if the window is ever moved from
> covering the whole screen.

Excellent. Since we will never move the window, we should never have a problem.

> > I guess I was unaware that X could run without an InputDevice being
> > defined. I guess I did not check the docs.
>
> I'm not sure it will, but it does seem like a reasonable thing to
> want. If that doesn't work, it could be added.

I'll look into it. If it is not there, I'll attempt to add it.

> First, the kernel drivers are included in the stock 2.3.xx (soon to be
> 2.4) kernels. So hopefully users just get them by default.
>
> Second, whether or not users have to compile anything is a support
> issue. Kernel changes will require compiling by someone to keep them
> current with the kernel. As a distributor of a kernel module you can
> compile it for all the versions of Linux you support and/or you can
> provide source. Someone has to do the compile the question is whether
> you do it or they do and how much flexibility you want to leave them in
> their system configuration.

I am in agreement for 99% of the world. However, our systems are turn-key. We use a particular system configuration and software installation, we install all the software, and then ship it to the customer. All our users know (and really care to know) is that they have a rack of black PCs in the corner that draw the graphics. If they get in and try to fart around with the configuration, we take their warranty away... :) Our operator's console software connects via a socket to the remote rack, sends databases, runs the simulation, and retrieves collected data. The users don't have to log into these machines at all. It is all voodoo to them (Voodoo2, actually).

> So, the final solution is that distributions roll in XFree 4.0 and the
> 2.4 kernel and everything is hunky-dory. The user doesn't have to do
> anything to get 3D. As the drivers progress pieces can be individually
> updated. Even in the non-ideal world (where the distribution hasn't done
> everything) the installation should just be rpm -Uvh XFree(a few of
> them) GLU GLUT KernelDriver.

And, again, this is more important on our end, since we will be doing the installations, etc. But it will certainly be nice when a distribution rolls everything in...

> Your solution means YOU have to package and maintain all those pieces
> you mentioned. They will not be rolled into the common base, because
> they are specific to your solution. My comment about packaging is that
> your packaging requirement goes up because you can't rely on the
> community to do more of the work for you. Also, your solution includes

That is where I was hoping for a "following" of people who thought removing the X server was necessary. That way there would be something official, instead of Mike Morrison doing it all himself (which, in the end, he probably wouldn't if he was by himself :) Regardless, it appears to be a moot point, since you have convinced me that it is not necessary anyway.

> "kernel driver" How do you avoid having them compile it? If you're
> willing to restrict the users choice of kernel/distribution/etc (which
> isn't unreasonable in some cases) then distributing a binary is easy in
> either case.

Once the machine ships, it is a HUGE deal to have to upgrade the kernel, and we usually send a new hard drive rather than risk a customer messing up an upgrade. In the past 6 months, this has only been required once. Again, we maintain everything here, and just send the deltas that are required.

> > I am in total agreement here. With regards to this, is anyone working
> > on adding SSE support to the transform/lighting code in Mesa?
>
> SSE stuff was somewhat broken in the kernels until recently. In fact, we
> (Gareth Hughes to be precise) just submitted a big kernel patch that
> should fully support SSE. I don't know if anyone is working on them for
> Mesa, I haven't seen much in that area lately.
>
> I'd start with profiling your app against the current Mesa base, to
> decide where the optimization effort should go. I'm not convinced SSE is
> the next right step. There may be more fundamental optimizations to do
> first. We haven't spent much time on optimizing it.
>

I guess I didn't mean to imply that SSE was the "next step". The guys from Gemini (OpenGVS crew) said that it has provided one of the single largest performance increases to date in their SimGL driver (which rides on top of Glide or D3D). We have actually been working to add SSE support to SimGL under Linux. However, since GCC does not currently support the Intel "SSE Intrinsics" (which GVS used), nor does it seem to properly support __attribute__ ((aligned)) for our typedef of __m128, we decided to proceed down the path of NASM and assembly, where possible. We may end up bailing on this anyway, since other solutions have presented themselves.

---
Michael M. Morrison
VP/Chief Technical Officer
Hyperion Technologies, Inc.
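
[Editor's sketch] The alignment issue mentioned above matters because aligned 128-bit SSE loads and stores (movaps) require their memory operands to sit on a 16-byte boundary. The following C sketch shows the kind of declaration involved, using GCC's `aligned` type attribute as it works in current GCC and Clang. The `vec4` type and both function names are hypothetical, this is not the SimGL or Mesa code, and the transform is deliberately plain scalar C (the sort of routine an SSE version would replace).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-float vector type, forced onto a 16-byte boundary. */
typedef struct { float v[4]; } __attribute__((aligned(16))) vec4;

/* Returns nonzero when p is on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Scalar reference transform: column-major 4x4 matrix times a vector. */
static vec4 transform(const float m[16], vec4 in)
{
    vec4 out;
    assert(is_aligned16(&out));  /* check the compiler honored the attribute */
    for (int row = 0; row < 4; row++) {
        out.v[row] = m[row]      * in.v[0]
                   + m[4 + row]  * in.v[1]
                   + m[8 + row]  * in.v[2]
                   + m[12 + row] * in.v[3];
    }
    return out;
}
```

Keeping a scalar version like this around also gives a reference result to compare any hand-written assembly against.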
From: David D. <da...@xf...> - 2000-05-26 20:21:42
On Fri, May 26, 2000 at 09:43:43AM -0600, Michael M. Morrison wrote:

>> Allowing operations without a mouse should be trivial if it isn't a
>> configuration option already.
>
>I guess I was unaware that X could run without an InputDevice being
>defined. I guess I did not check the docs.

There's a "void" input device in 4.0 that can be used as the core pointer and/or keyboard. Its purpose is to allow the server to operate without one or both core input devices.

David
--
David Dawes                     Email: dawes@XFree86.org
Co-founder/President, The XFree86 Project, Inc
Phone: +1 813 789 6919
http://www.xfree86.org/         Fax: +61 2 9897 3755
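
[Editor's sketch] A configuration along the following lines is how the "void" driver would typically be wired into an XF86Config file. Treat it as an assumption-laden fragment rather than a tested setup: the identifiers "NoPointer", "NoKeyboard" and "Headless3D" are made up for illustration, and the Screen line and all other sections a real ServerLayout needs are omitted.

```
# Hypothetical fragment: both core devices backed by the "void" driver.
Section "InputDevice"
    Identifier "NoPointer"
    Driver     "void"
EndSection

Section "InputDevice"
    Identifier "NoKeyboard"
    Driver     "void"
EndSection

Section "ServerLayout"
    Identifier  "Headless3D"
    # Screen line omitted here; a real layout needs one.
    InputDevice "NoPointer"  "CorePointer"
    InputDevice "NoKeyboard" "CoreKeyboard"
EndSection
```

With a layout like this the server should start with no physical mouse or keyboard attached, which is the turn-key case discussed earlier in the thread.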