Re: [VirtualGL-Devel] Get pbuffer in turbovnc when VGL_READBACK=0

The problem with what you're proposing is that, if the pixels are not 
read back by VirtualGL, then the VNC X server will not have a copy of 
the pixels in its virtual framebuffer.  The lossless refresh, ALR, and 
interframe comparison features wouldn't work properly, since those 
features require that the pixels in the virtual FB be up to date. 
Further, if the window manager or another app attempted to obtain the 3D 
pixels via XGetImage() or XCopyArea(), they would get back bogus data.

Further, since TurboVNC's encoder is out of process from VirtualGL, 
synchronization would be an issue.  Ideally, you'd have VirtualGL create 
a PBO, then you would copy the pixels into the PBO and pass the PBO 
handle to TurboVNC, letting it read/compress the pixels out of the PBO 
while the rendering thread has moved on to the next frame.  You'd still 
need synchronization, however, because you don't want VGL to create more 
PBOs than necessary.  You want it to create a pool of 2 or 3 of them and 
reuse them, which requires blocking until TurboVNC has finished with a 
particular PBO handle.
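
To make the pooling idea concrete, here is a minimal sketch of a fixed-size pool whose acquire call blocks until the consumer has released a handle.  It is only an illustration: the class name PboPool, the integer handles, and the demo() helper are all hypothetical, and plain Python objects stand in for real PBOs and for the VGL/TurboVNC processes.

```python
import queue


class PboPool:
    """Fixed-size pool of buffer handles (hypothetical stand-ins for PBOs)."""

    def __init__(self, size=3):
        self._free = queue.Queue()
        for handle in range(size):
            self._free.put(handle)

    def acquire(self, timeout=None):
        # Blocks until a handle is free; raises queue.Empty on timeout.
        return self._free.get(timeout=timeout)

    def release(self, handle):
        # Called by the consumer once it has finished with the handle.
        self._free.put(handle)


def demo():
    pool = PboPool(size=2)
    a = pool.acquire()
    b = pool.acquire()
    try:
        # Both handles are in use, so the producer would have to wait here.
        pool.acquire(timeout=0.05)
        blocked = False
    except queue.Empty:
        blocked = True
    pool.release(a)  # consumer finishes with the first handle
    c = pool.acquire(timeout=0.05)  # the same handle is reused
    return a, b, blocked, c
```

The point of the sketch is that the producer never allocates a third buffer; it waits for one of the existing ones to come back.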

If I were designing such a system, I'd do it as follows:

(1) Develop a custom X extension that works similarly to MIT-SHM but 
uses GPU memory (via a PBO) instead of POSIX shared memory.  You would 
need an equivalent of XShmPutImage() ("XGPUPutImage()" or 
"XPBOPutImage()" or whatnot.)  Within the body of this PutImage() 
function, TurboVNC would synchronize the pixels between a specified PBO 
and the VNC virtual framebuffer, then it would compress/send the given 
pixels.  It would be easiest if the compression took place within the 
body of the XPBOPutImage() function.  That way, we wouldn't have to have 
separate synchronization functions to allow VGL and TVNC to lock the 
PBO, nor would we have to track the PBO region separately within Xvnc so 
that it could be handled by a different codec whenever the deferred 
updates are processed.

(2) Develop an image transport extension in VGL that calls the 
hypothetical X extension.  Since the proposed PutImage() function would 
be synchronous, the image transport extension would have to work like 
the existing X11 transport, calling the PutImage() function in a 
separate thread.

(3) Extend TurboVNC such that it can compress a specific region of the 
virtual framebuffer from a PBO source instead of from the virtual FB. 
Note that this doesn't eliminate the need to synchronize the pixels from 
the PBO to the virtual FB, but it eliminates the need to copy the pixels 
back down to the GPU for compression.
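
As a rough illustration of step (2), the sketch below runs a blocking "put image" call in a separate transport thread so that the rendering loop is not stalled by it.  Everything here is hypothetical: put_image_stub() merely stands in for the proposed synchronous PutImage() function, and integers stand in for frames.

```python
import queue
import threading


def put_image_stub(frame, sent):
    """Stand-in for the hypothetical synchronous PutImage() call.

    It returns only after the frame has been "compressed and sent".
    """
    sent.append(frame)


def transport_thread(frames, sent):
    # Pull frames one at a time and hand each to the synchronous call.
    while True:
        frame = frames.get()
        if frame is None:  # sentinel: the renderer has finished
            break
        put_image_stub(frame, sent)


def run_demo(n_frames=5):
    frames = queue.Queue(maxsize=1)  # renderer may run at most one frame ahead
    sent = []
    worker = threading.Thread(target=transport_thread, args=(frames, sent))
    worker.start()
    for i in range(n_frames):
        frames.put(i)  # blocks only if the transport thread has fallen behind
    frames.put(None)
    worker.join()
    return sent
```

The bounded queue is the design choice that matters: it lets the renderer move on to the next frame while limiting how far ahead of the transport it can get.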

This solution is still more of a hack than I would prefer.  Essentially 
all you're eliminating is a single buffer copy, and that overhead may 
not be very much if you used the GPU from within libjpeg-turbo instead 
of at a higher level.  The idea is that, if you are using the GPU at the 
low level, you are copying data to it in very small chunks, and you can 
hide that transfer time behind the time taken to do other operations. 
For instance, there is a proposed patch for libjpeg-turbo that uses 
OpenCL for doing certain decode operations, and the patched code is able 
to pipeline the GPU operations with Huffman decoding.  (Huffman is best 
left on the CPU due to the algorithm's lack of parallelism.)
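
A back-of-the-envelope model shows why small chunks help.  This is an idealized two-stage pipeline with constant per-chunk costs in arbitrary units; the function names and the numbers in the usage note are made up for illustration and are not measurements of libjpeg-turbo or any patch.

```python
def sequential_time(n_chunks, transfer, compute):
    """Each chunk is transferred, then processed, with no overlap."""
    return n_chunks * (transfer + compute)


def pipelined_time(n_chunks, transfer, compute):
    """Transfer of chunk i+1 overlaps with processing of chunk i.

    Only the first transfer and the last compute step are fully exposed;
    every other step costs whichever of the two stages is slower.
    """
    if n_chunks == 0:
        return 0
    return transfer + compute + (n_chunks - 1) * max(transfer, compute)
```

With 100 chunks, a transfer cost of 1, and a compute cost of 4, sequential_time() gives 500 while pipelined_time() gives 401, which is nearly the 400 units of pure compute.  In this model the transfer cost all but disappears.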

In short, the foremost question on my mind is whether, despite the fact 
that you would incur an additional buffer copy, it might still be better 
from a performance point of view to do GPU compression at a lower level, 
within libjpeg-turbo.  It would certainly make things tons easier, since 
none of the hacks proposed above would be necessary.

The other thing is-- JPEG is not always the most appropriate compression 
algorithm.  TurboVNC only uses it for areas of the display that have 
high numbers of unique colors.  If you are using a CAD app or something 
else that doesn't generate images with a lot of unique colors, then a 
good portion of what you're sending to the client may actually be 
indexed color rather than JPEG.  This is another reason why implementing 
GPU compression at the codec level may be a better idea-- it wouldn't 
interfere with the existing encoding method selection mechanism in TurboVNC.
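
For illustration only, the selection logic described above can be sketched as a unique-color count with a cutoff.  The function names and the palette_limit value are hypothetical; this is not TurboVNC's actual code or threshold.

```python
def count_unique_colors(pixels, limit):
    """Count distinct pixel values, stopping early once limit is exceeded."""
    seen = set()
    for pixel in pixels:
        seen.add(pixel)
        if len(seen) > limit:
            break
    return len(seen)


def choose_encoding(pixels, palette_limit=256):
    """Pick indexed color for low-color regions and JPEG otherwise.

    palette_limit is an arbitrary illustrative cutoff.
    """
    if count_unique_colors(pixels, palette_limit) <= palette_limit:
        return "indexed"
    return "jpeg"
```

A two-color, CAD-like tile would come back as "indexed", whereas a smooth gradient with over a thousand distinct colors would come back as "jpeg".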

On 6/21/13 12:34 AM, Bharatkumar Sharma wrote:
> The final goal is to have the JPEG compression that TurboVNC performs
> using libjpeg run on the GPU rather than on the CPU.  Our setup is as
> follows.  VirtualGL intercepts all GLX calls and renders off-screen
> into a pbuffer on an NVIDIA GPU.  In the normal setup, VirtualGL reads
> back this rendered image, and the image is then taken by TurboVNC.
> TurboVNC compresses the image using libjpeg and sends it over to the
> client side.  We now want to speed up the process of compressing the
> image by using CUDA on the NVIDIA GPU.
> To do that, I need to send the image to the GPU, run the parallel
> compression algorithm, and get the compressed image back to the CPU
> to be sent to the client.  To avoid the extra effort of transferring
> data back and forth between the CPU and GPU, we thought that VirtualGL
> should not read back the pbuffer and that TurboVNC should take the
> pbuffer directly and do the compression on the GPU.
> This saves 2 copies to the GPU.  So I set VGL_READBACK=0 so that
> VirtualGL does not read back the pbuffer.  Now the question is: how is
> this pbuffer made accessible to TurboVNC?
> As I said, I am new to VirtualGL and TurboVNC, so kindly suggest an
> appropriate way of doing this.
>
>
> On Thu, Jun 20, 2013 at 8:16 PM, DRC <dco...@us...
> <mailto:dco...@us...>> wrote:
>
>     Please explain what you're trying to accomplish.
>
>     On Jun 20, 2013, at 1:54 AM, Bharatkumar Sharma
>     <bha...@gm... <mailto:bha...@gm...>>
>     wrote:
>
>     > Hi,
>     >
>     > I am new to VirtualGL and TurboVNC. We have a requirement that VirtualGL should not read back the rendered image and that TurboVNC should get a handle to this pbuffer before compressing the rendered image.
>     > After reading the VirtualGL guide, I see that setting VGL_READBACK will solve the first part of the problem, so that the rendered image is not read back.
>     > But how to get a handle to the pbuffer in TurboVNC before compressing and sending to the client is not very clear to me.
>     >
>     > To my knowledge, a pbuffer cannot be shared across processes. I saw a similar approach used by ParaView, where they create a wrapper around swapbuffer, but I am not sure how to implement this.
>     >
>     > Regards,
>     > Bharat
>     > _______________________________________________
>     > VirtualGL-Devel mailing list
>     >Vir...@li...
>     <mailto:Vir...@li...>
>     >https://lists.sourceforge.net/lists/listinfo/virtualgl-devel
