Re: [VirtualGL-Devel] Get pbuffer in turbovnc when VGL_READBACK=0
From: DRC <dco...@us...> - 2013-07-17 21:10:19
The problem with what you're proposing is that, if the pixels are not read back by VirtualGL, then the VNC X server will not have a copy of the pixels in its virtual framebuffer. The lossless refresh, ALR, and interframe comparison features wouldn't work properly, since those features require that the pixels in the virtual FB be up to date. Further, if the window manager or another app attempted to obtain the 3D pixels via XGetImage() or XCopyArea(), they would get back bogus data.

Further, since TurboVNC's encoder is out of process from VirtualGL, synchronization would be an issue. Ideally, you'd have VirtualGL create a PBO, then you would copy the pixels into the PBO and pass the PBO handle to TurboVNC, letting it read/compress the pixels out of the PBO while the rendering thread has moved on to the next frame. You'd still need synchronization, however, because you don't want VGL to create more PBOs than necessary. You want it to create a pool of 2 or 3 of them and reuse them, which requires blocking until TurboVNC has finished with a particular PBO handle.

If I were designing such a system, I'd do it as follows:

(1) Develop a custom X extension that works similarly to MIT-SHM but uses GPU memory (via a PBO) instead of POSIX shared memory. You would need an equivalent of XShmPutImage() ("XGPUPutImage()" or "XPBOPutImage()" or whatnot.) Within the body of this PutImage() function, TurboVNC would synchronize the pixels between a specified PBO and the VNC virtual framebuffer, then it would compress/send the given pixels. It would be easiest if the compression took place within the body of the XPBOPutImage() function. That way, we wouldn't have to have separate synchronization functions to allow VGL and TVNC to lock the PBO, nor would we have to track the PBO region separately within Xvnc so that it could be handled by a different codec whenever the deferred updates are processed.
(2) Develop an image transport extension in VGL that calls the hypothetical X extension. Since the proposed PutImage() function would be synchronous, the image transport extension would have to work like the existing X11 transport, calling the PutImage() function in a separate thread.

(3) Extend TurboVNC such that it can compress a specific region of the virtual framebuffer from a PBO source instead of from the virtual FB. Note that this doesn't eliminate the need to synchronize the pixels from the PBO to the virtual FB, but it eliminates the need to copy the pixels back down to the GPU for compression.

This solution is still more of a hack than I would prefer. Essentially all you're eliminating is a single buffer copy, and that overhead may not be very much if you used the GPU from within libjpeg-turbo instead of at a higher level. The idea is that, if you are using the GPU at the low level, you are copying data to it in very small chunks, and you can hide that transfer time behind the time taken to do other operations. For instance, there is a proposed patch for libjpeg-turbo that uses OpenCL for doing certain decode operations, and the patched code is able to pipeline the GPU operations with Huffman decoding (Huffman is best left on the CPU due to the algorithm's lack of parallelism.)

In short, the foremost question on my mind is whether, despite the fact that you would incur an additional buffer copy, it might still be better from a performance point of view to do GPU compression at a lower level, within libjpeg-turbo. It would certainly make things tons easier, since none of the hacks proposed above would be necessary.

The other thing is-- JPEG is not always the most appropriate compression algorithm. TurboVNC only uses it for areas of the display that have high numbers of unique colors.
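To make the unique-colors point concrete, here is a rough sketch of that kind of per-tile decision. It is not TurboVNC's actual code: the function name, the labels it returns, and the palette cutoff of 24 colors are all assumptions made up for illustration.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]

# Hypothetical cutoff: tiles with more distinct colors than this go to JPEG.
MAX_PALETTE_COLORS = 24


def choose_subencoding(pixels: Iterable[Pixel],
                       max_palette: int = MAX_PALETTE_COLORS) -> str:
    """Pick a subencoding for one tile from its number of unique colors."""
    seen = set()
    for px in pixels:
        seen.add(px)
        # Stop counting as soon as the palette limit is exceeded.
        if len(seen) > max_palette:
            return "jpeg"
    if len(seen) <= 1:
        return "solid"
    return "indexed"


# A flat CAD-like tile (two colors) versus a gradient tile (64 colors).
cad_tile = [(255, 255, 255)] * 60 + [(0, 0, 0)] * 4
gradient_tile = [(i, i, 255 - i) for i in range(64)]
print(choose_subencoding(cad_tile), choose_subencoding(gradient_tile))
```

Because the decision is made per tile before any codec runs, a GPU-accelerated JPEG codec could sit behind the "jpeg" branch without touching the other branches.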
If you are using a CAD app or something else that doesn't generate images with a lot of unique colors, then a good portion of what you're sending to the client may actually be indexed color rather than JPEG. This is another reason why implementing GPU compression at the codec level may be a better idea-- it wouldn't interfere with the existing encoding method selection mechanism in TurboVNC.

On 6/21/13 12:34 AM, Bharatkumar Sharma wrote:
> The final goal is to have the JPEG compression that TurboVNC does using
> libjpeg run on the GPU and not on the CPU. Our setup is described below.
> We have a setup where VirtualGL intercepts all GLX calls and renders
> offscreen in a pbuffer on an NVIDIA GPU. In the normal setup, VirtualGL
> reads back this rendered image, and then the image is taken by TurboVNC.
> TurboVNC compresses the image using libjpeg and sends it over to the
> client side. Now we want to speed up the compression of the image
> using CUDA on an NVIDIA GPU.
> In order to do that, I need to send this image to the GPU, run the
> parallel compression algorithm, and get the compressed image back to the
> CPU to be sent to the client. To save the extra effort of transferring
> data back and forth between the CPU and the GPU, we thought VirtualGL
> should not read back the pbuffer and TurboVNC should take this pbuffer
> directly to do the compression on the GPU.
> This saves 2 copies to the GPU. So I set VGL_READBACK=0 so that
> VirtualGL does not read back the pbuffer. Now the question is, how is
> this pbuffer made accessible to TurboVNC?
> As I said, I am new to VirtualGL and TurboVNC, so kindly suggest the
> appropriate way of doing this.
>
>
> On Thu, Jun 20, 2013 at 8:16 PM, DRC <dco...@us...> wrote:
>
> Please explain what you're trying to accomplish.
>
> On Jun 20, 2013, at 1:54 AM, Bharatkumar Sharma <bha...@gm...>
> wrote:
>
> > Hi,
> >
> > I am new to VirtualGL and TurboVNC.
> > We have a requirement that VirtualGL should not read back the
> > rendered image, and that TurboVNC should get a handle to this pbuffer
> > before compressing the rendered image.
> > After reading the VirtualGL guide, I see that setting VGL_READBACK
> > will solve the first part of the problem, where the rendered image is
> > not read back.
> > But how to get a handle to the pbuffer in TurboVNC, before compression
> > and sending to the client, is not very clear to me.
> >
> > To my knowledge, a pbuffer cannot be shared across processes. I saw a
> > similar approach used by ParaView, where they create a wrapper around
> > swapbuffer, but I am not sure how to implement this.
> >
> > Regards,
> > Bharat
> >
> > ------------------------------------------------------------------------------
> > This SF.net email is sponsored by Windows:
> >
> > Build for Windows Store.
> >
> > http://p.sf.net/sfu/windows-dev2dev
> > _______________________________________________
> > VirtualGL-Devel mailing list
> > Vir...@li...
> > https://lists.sourceforge.net/lists/listinfo/virtualgl-devel