Thread: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers
From: Göbbert, J. H. <goe...@vr...> - 2014-10-02 18:11:45
Hi VirtualGL,

I have been using VirtualGL for some years now to provide 3D-accelerated remote visualization via TurboVNC on the front-end nodes of our compute cluster.

Just recently I was asked why a remote 3D-accelerated desktop scenario would not also be possible with multiple 3D-accelerated X servers + VNC, each dedicated to a single user. I cannot answer this question as well as I would like to, because the approach seems to run fine: we tested running multiple 3D-accelerated X servers on the same machine with a single GPU without any problems. glxgears showed 600 frames per second on both at the same time -> both X servers were 3D-accelerated.

Why shouldn't I go for multiple 3D X servers (one per user) and send their framebuffers to the workstations via VNC instead of using VirtualGL?

Best,
Jens Henrik

--
Dipl.-Ing. Jens Henrik Göbbert
IT Center - Computational Science & Engineering
Computer Science Department - Virtual Reality Group
Jülich Aachen Research Alliance - JARA-HPC
IT Center, RWTH Aachen University
Seffenter Weg 23, 52074 Aachen, Germany
Phone: +49 241 80-24381
goe...@vr...
www.vr.rwth-aachen.de
http://www.jara.org
From: Robert G. <ra...@rd...> - 2014-10-02 19:33:33
Someone please correct me if I am wrong here, but isn't 600 fps low for 3D? I get that same performance from software GLX in TigerVNC. When I run the same tests on an actual hardware X server with 3D acceleration, I get numbers in the thousands. The 600 fps is what I get from using CPU resources to simulate 3D operations instead of a GPU actually doing the rendering.

On Thu, Oct 2, 2014 at 1:36 PM, Göbbert, Jens Henrik <goe...@vr...> wrote:
> glxgears showed 600 frames per second on both at the same time -> both
> X servers were 3D-accelerated.
From: DRC <dco...@us...> - 2014-10-02 20:26:55
For starters, GLXgears is not a GPU benchmark. It is a CPU benchmark, because its geometry and window size are so small that its frame rate is almost entirely dependent upon CPU overhead. Please (and I'm saying this to the entire 3D application community, not just you) stop quoting GLXgears frame rates as if they have any relevance to GPU performance.

GLXspheres (which is provided with VirtualGL) is a much better solution if you need something quick & dirty, but you also have to understand what it is you're benchmarking. GLXspheres is designed primarily as an image benchmark for remote display systems, so it is meant to be limited by the drawing speed of the remote display solution, not by the 3D rendering speed. On the K5000 that nVidia was kind enough to send me for testing, I can literally max out the geometry size on GLXspheres-- over a billion polys-- and it keeps chugging along at 300 fps, because it's using display lists by default (and thus, once the geometry is downloaded to the GPU, subsequent frames just instruct the GPU to reuse that same geometry.)

Not every app uses display lists, though, so if you want to use GLXspheres as a quick & dirty OpenGL pipeline benchmark, then I suggest boosting its geometry size to 500,000 or a million polygons and enabling immediate mode (-m -p 500000). This will give a better sense of what a "busy" immediate-mode OpenGL app might do.

When your benchmark is running at hundreds of frames per second, that's a clue that it isn't testing anything resembling a real-world use case. In the real world, you're never going to see more than 60 fps because of your monitor's refresh rate, and most humans can't perceive any difference beyond 25-30 fps. In real-world visualization scenarios, if things get too fast, then the engineers will just start using larger (more accurate) models. :)

So why would you use VirtualGL? Several reasons:

(1) The approach you're describing, in which multiple 3D X servers are served up with VNC, requires screen scraping. Screen scraping periodically reads the pixels in the framebuffer and compares them against a snapshot of the pixels taken earlier. There are some solutions-- the RealVNC/TigerVNC X.org module and x11vnc, for instance-- that are a little more sophisticated than a plain screen scraper. They use the X Damage extension and other techniques to get hints as to which parts of the display to read back, but these techniques don't work well (or sometimes at all) with hardware-accelerated 3D. Either the OpenGL pixels don't update at all, or OpenGL drawing is out of sync with the delivery of pixels to the client (and thus you get tearing artifacts.)

I personally tested the version of x11vnc that ships with libvncserver 0.9.9 (libvncserver 0.9.9 has the TurboVNC extensions, so at the library level at least, it's a fast solution.) I observed bad tearing artifacts for a few seconds, and then it would hang because the X server got too busy processing the 3D drawing and couldn't spare any cycles for x11vnc (X servers are single-threaded.) Turning off X Damage support in x11vnc made the solution at least usable, but without X Damage support, x11vnc is mainly just polling the display, so it will incur a lot of overhead. This became particularly evident when using interactive apps (glxspheres -i). I couldn't get the TigerVNC 1.3.1 X.org module to work at all, and the TigerVNC 1.1.0 X.org module (the one that ships with RHEL 6) did not display any pixels from the OpenGL app.

(2) The ability to share a GPU among multiple users. VirtualGL installations often have dozens of users sharing the GPU, because not all of them will be using it simultaneously, and even when they are, they might only need to process a small model that uses 1% of the GPU's capacity. Like I said above, a K5000 pipe can process billions of polys/second. That's the equivalent performance of at least a handful of desktop GPUs (if not more) combined. It's a lot more cost-effective to buy beefy servers with really beefy multi-pipe GPU configurations and provision the servers to handle 40 or 50 users. You can't do that if each user has a dedicated GPU, because you can't install 40 or 50 dedicated GPUs into a single machine.

(3) Efficiency and cost. VirtualGL and TurboVNC only take up resources when they are running. A full-blown X server has a much larger footprint. The screen scraper will eat up CPU cycles even if the 3D application is sitting there doing nothing, because the screen scraper has to poll for changes in the pixels. TurboVNC/VirtualGL, on the other hand, will not take up CPU cycles unless the 3D application is actually drawing something. Furthermore, if the user goes to lunch, their dedicated GPU is sitting completely idle. If the user only needs to process a 50,000-polygon model, then their dedicated GPU is being grossly underutilized.

On 10/2/14 12:36 PM, Göbbert, Jens Henrik wrote:
> Why shouldn't I go for multiple 3D X servers (one for each user)
> and send their framebuffers via VNC to the workstations
> instead of using VirtualGL?
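[Editorial note: the polling screen scraper described in (1) boils down to something like the following sketch. This is illustrative only, not the code of any real VNC server; grab_framebuffer and send_rect are hypothetical callbacks.]

```python
def changed_rows(prev, cur):
    """Diff two framebuffer snapshots (sequences of per-scanline bytes)
    and return the indices of rows that must be re-sent to the client."""
    return [y for y, (a, b) in enumerate(zip(prev, cur)) if a != b]

def scrape(grab_framebuffer, send_rect, iterations):
    """Polling loop: snapshot, diff, send.  The diff runs on every poll,
    which is why a scraper burns CPU even when nothing on screen changes."""
    prev = grab_framebuffer()
    for _ in range(iterations):
        cur = grab_framebuffer()
        for y in changed_rows(prev, cur):
            send_rect(y, cur[y])
        prev = cur
```

Damage-based scrapers replace the blind diff with hints from the X server about which rows changed, which is exactly the part that breaks down with hardware-accelerated 3D.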
From: Nathan K. <nat...@sp...> - 2014-10-02 22:14:00
On 02/10/14 04:26 PM, DRC wrote:
> On the K5000 that nVidia was kind enough to send me for testing, I can
> literally max out the geometry size on GLXspheres-- over a billion
> polys-- and it keeps chugging along at 300 fps, because it's using
> display lists by default.

FYI, I was recently testing the theoretical limit on a card and went down the path of:

`glxspheres -p 1000000` "no difference"
`glxspheres -p 10000000` "hmmm, not breaking a sweat"
`glxspheres -p 1000000000` "wow"

Then I took a trace and found out that the number of actual ROPs was no different between 10 million and 1 billion. gluSphere() apparently hits a limit on how much geometry it produces and won't go higher (increasing the window size didn't do anything; I didn't read the GLU source).

Bottom line: `glxspheres -p 3500000` (which equates to a little over 14 million ROPs per frame) is the highest load the stock glxspheres/libGLU will produce.

-Nathan

--
Nathan Kidd, Software Developer
OpenText Connectivity Solutions
nk...@op...
http://connectivity.opentext.com
+1 905-762-6001
From: DRC <dco...@us...> - 2014-10-03 01:41:56
Well, gee, that makes a lot more sense. Dean, if you're reading this, the last column in that chart I sent you a year ago should read "3.5 million" and not "10 million." :|

I made a note about that in the usage screen of GLXspheres. I think I should further modify it so that it allows the sphere count to be adjusted; it was kept static in order to guarantee that the same image was always generated. Note that there are actually 61 spheres in the default configuration (20 per ring * 3 rings + the 1 in the center), so apparently the polygon limit is around 60,000 per sphere. It might simply be that the polygon count is clamped to a 16-bit value or something.

On 10/2/14 5:13 PM, Nathan Kidd wrote:
> Bottom line: `glxspheres -p 3500000` (which equates to a little over 14
> million ROPs per frame) is the highest load the stock glxspheres/libGLU
> will produce.
From: Nathan K. <nat...@sp...> - 2014-10-03 03:38:44
On 02/10/14 09:16 PM, DRC wrote:
> Note that there are actually 61 spheres in the default configuration (20
> per ring + the 1 in the center), so apparently the polygon limit is
> around 60,000 per sphere. It might simply be that the polygon count is
> clamped to a 16-bit value or something.
Ok, I won't be lazy. `apt-get source libglu1-mesa` and a little
LibreOffice Calc later, the restriction is quite plain:
quad.c:
#define CACHE_SIZE 240
...
gluSphere(GLUquadric *qobj, GLdouble radius, GLint slices, GLint stacks)
if (slices >= CACHE_SIZE) slices = CACHE_SIZE-1;
if (stacks >= CACHE_SIZE) stacks = CACHE_SIZE-1;
And the solid drawing path is roughly:
(glVertex + glNormal) * (slices*2 + 2 + (stacks - 2)*slices*2)
Thus max ROPs/sphere = 227532, * 61 spheres = 13879452 total ROPs.
Looks like the easiest path to increased poly count is to increase the
number of spheres.
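[Editorial note: the clamp arithmetic above can be checked directly. A sketch in Python mirroring the quad.c logic; the factor of 2 counts the glVertex + glNormal pair issued per vertex.]

```python
CACHE_SIZE = 240  # from Mesa libGLU quad.c

def sphere_rops(slices, stacks):
    """ROPs issued per gluSphere() on the solid drawing path, per the
    formula above, after libGLU's slices/stacks clamp."""
    if slices >= CACHE_SIZE:
        slices = CACHE_SIZE - 1
    if stacks >= CACHE_SIZE:
        stacks = CACHE_SIZE - 1
    return 2 * (slices * 2 + 2 + (stacks - 2) * slices * 2)

# Requesting arbitrarily fine spheres hits the cap:
max_per_sphere = sphere_rops(10**6, 10**6)  # 227532
total = max_per_sphere * 61                 # 13879452, matching the post
```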
-Nathan
--
Nathan Kidd, Software Developer
OpenText Connectivity Solutions
nk...@op...
http://connectivity.opentext.com
+1 905-762-6001
From: DRC <dco...@us...> - 2014-10-03 04:45:25
We crossed the streams. I discovered exactly the same thing (refer to my previous message.)

On 10/2/14 10:38 PM, Nathan Kidd wrote:
> Ok, I won't be lazy. `apt-get source libglu1-mesa` and a little
> LibreOffice Calc later, the restriction is quite plain: [...]
From: DRC <dco...@us...> - 2014-10-03 04:44:35
Confirmed in the libGLU source-- slices and stacks are clamped at 240, so 57600 polys per sphere is the max-- which fits your observation very nicely (57600 * 61 spheres = 3513600 polys.)

I modified GLXspheres so that it calculates whether this limit will be reached, warns the user, and prints the actual polygon count, taking the limit into account. I also added an option (-n) for increasing the sphere count, which enables polygon counts higher than 3.5 million. For instance:

> glxspheres -n 240 -p 10000000
Polygons in scene: 10029456 (241 spheres * 41616 polys/spheres)
Visual ID of window: 0x2c
Context is Direct
OpenGL Renderer: Quadro K5000/PCIe/SSE2
92.948421 frames/sec - 103.730438 Mpixels/sec
95.584804 frames/sec - 106.672641 Mpixels/sec

> glxspheres -n 2400 -p 100000000
Polygons in scene: 99920016 (2401 spheres * 41616 polys/spheres)
Visual ID of window: 0x2c
Context is Direct
OpenGL Renderer: Quadro K5000/PCIe/SSE2
10.136359 frames/sec - 11.312176 Mpixels/sec
9.982275 frames/sec - 11.140219 Mpixels/sec

This is much more along the lines of what I would expect from the K5000 (about a billion quads/second; the press usually reports it as 1.8 billion tris/sec, so that number makes sense.)

To pop the stack on the original poster's question: at the OpenGL level, you can get linear or even super-linear scaling of the GPU resource among multiple users. If I run 5 sessions of GLXspheres at a time, each will perform at about 200 million quads/second. If I run 10, each will perform at about 100 million quads/second. If each user is working with a 1-million-polygon model, then that's over 30 users at 30 frames/second. Obviously there will be other constraints on this in a real-world environment-- VirtualGL and TurboVNC have some CPU overhead to compress and deliver the 3D images to the client, users might be dealing with larger models, applications that use a lot of textures won't scale as well because they'll exhaust GPU memory, etc.

However, you're also not going to have all 30 users banging away all the time. Some of them will be down the hall, some will be reading e-mail, some won't even be in the office, and some will be staring at the model and making small changes rather than manipulating the entire scene.

On 10/2/14 5:13 PM, Nathan Kidd wrote:
> Bottom line: `glxspheres -p 3500000` (which equates to a little over 14
> million ROPs per frame) is the highest load the stock glxspheres/libGLU
> will produce.
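[Editorial note: the two polygon counts reported above are consistent with a simple model: split the requested budget evenly across the spheres, pick equal slices and stacks, and apply the libGLU clamp. The sketch below is a back-of-the-envelope reconstruction by the editor, not GLXspheres' actual source.]

```python
import math

GLU_CLAMP = 239  # libGLU caps slices/stacks at CACHE_SIZE - 1

def scene_polys(requested_polys, n_spheres):
    """Guess at GLXspheres' reported polygon count: equal slices/stacks
    sized so slices*stacks approximates the per-sphere budget, clamped."""
    side = min(round(math.sqrt(requested_polys / n_spheres)), GLU_CLAMP)
    return n_spheres * side * side

# Matches the two outputs quoted above:
scene_polys(10_000_000, 241)    # -> 10029456
scene_polys(100_000_000, 2401)  # -> 99920016
```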
|
On 10/2/14 11:44 PM, DRC wrote:
> To pop the stack on the original poster's question: at the OpenGL
> level, you can get linear or even super-linear scaling of the GPU
> resource among multiple users.

I should also mention that another constraint you'll have in a real-world environment is reading back the pixels; you may exhaust your bus bandwidth before you actually exhaust your GPU processing power. But my point is: people throw quite a few users onto their GPUs. Santos, one of our largest (if not our largest) installations, provisions about 13-16 users per high-end nVidia pipe, although the user workloads vary greatly (oil & gas apps run the gamut from straight 2D X11 all the way to monster 3D visualization.)
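[Editorial note: the linear-scaling claim reduces to simple provisioning arithmetic. An illustrative sketch using the throughput figures quoted in this thread, not a sizing tool.]

```python
def max_users(gpu_polys_per_sec, model_polys, target_fps=30):
    """With roughly linear GPU sharing, each concurrent user consumes
    model_polys * target_fps polys/sec of the pipe's throughput."""
    return gpu_polys_per_sec // (model_polys * target_fps)

# A ~1-billion-quad/sec K5000 pipe, 1-million-poly models at 30 fps:
max_users(1_000_000_000, 1_000_000)  # -> 33 ("over 30 users")
```

Real deployments budget higher (e.g. the 13-16 users/pipe cited above for mixed workloads) because not all users render simultaneously.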
From: Göbbert, J. H. <goe...@vr...> - 2014-10-03 05:14:00
Hi VirtualGL,

Thanks for your detailed answer - we searched but could not find a good explanation like this:

> Efficiency and cost. VirtualGL and TurboVNC are only going to take
> up resources when they are running. A full-blown X server has a much
> larger footprint. The screen scraper will eat up CPU cycles even if the
> 3D application is sitting there doing nothing, because the screen
> scraper is having to poll for changes in the pixels.

Best,
Jens Henrik

P.S.: You might want to mention the 'screen scraper' approach in the documentation/introduction/wiki and compare it with VirtualGL.
From: DRC <dco...@us...> - 2014-10-03 05:45:40
|
A lot of what I posted here is just a more nuts-and-bolts version of the same information that is provided in the background article: http://www.virtualgl.org/About/Background The nuts and bolts change over time, and I don't really want to have to re-write that article every time they do. The basic 10,000-foot managerial explanation has not changed in 10 years, and it's basically this: VirtualGL enables multi-user GPU sharing/load balancing, whereas the screen scraping/dedicated GPU approach doesn't. On 10/3/14 12:13 AM, Göbbert, Jens Henrik wrote: > Hi VirtualGL, > > thanks for your detailed answer - we searched, but could not find a good explanaition like this: > >> Efficiency and cost. VirtualGL and TurboVNC are only going to take >> up resources when they are running. A full-blown X server has a much >> larger footprint. The screen scraper will eat up CPU cycles even if the >> 3D application is sitting there doing nothing, because the screen >> scraper is having to poll for changes in the pixels. > > best, > Jens Henrik > > P.S.: You might want mention the 'screen scraper'-approach in the documentation/introduction/wiki and compare it with VirtualGL. > > ________________________________________ > From: DRC [dco...@us...] > Sent: Thursday, October 02, 2014 10:26 PM > To: vir...@li... > Subject: Re: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers > > For starters, GLXgears is not a GPU benchmark. It is a CPU benchmark, > because its geometry and window size are so small that its frame rate is > almost entirely dependent upon CPU overhead. Please (and I'm saying > this to the entire 3D application community, not just you) stop quoting > GLXgears frame rates as if they have any relevance to GPU performance. > > GLXspheres (which is provided with VirtualGL) is a much better solution > if you need something quick & dirty, but you also have to understand > what it is you're benchmarking. 
GLXspheres is designed primarily as an > image benchmark for remote display systems, so it is meant to be limited > by the drawing speed of the remote display solution, not by the 3D > rendering speed. On the K5000 that nVidia was kind enough to send me > for testing, I can literally max out the geometry size on GLXspheres-- > over a billion polys-- and it keeps chugging along at 300 fps, because > it's using display lists by default (and thus, once the geometry is > downloaded once to the GPU, subsequent frames just instruct the GPU to > reuse that same geometry.) > > Not every app uses display lists, though, so if you want to use > GLXspheres as a quick & dirty OpenGL pipeline benchmark, then I suggest > boosting its geometry size to 500,000 or a million polygons and enabling > immediate mode (-m -p 500000). This will give a better sense of what a > "busy" immediate-mode OpenGL app might do. > > When your benchmark is running at hundreds of frames per second, that's > a clue that it isn't testing anything resembling a real-world use case. > In the real world, you're never going to see more than 60 fps because > of your monitor's refresh rate, and most humans can't perceive any > difference after 25-30 fps. In real-world visualization scenarios, if > things get too fast, then the engineers will just start using larger > (more accurate) models. :) > > So why would you use VirtualGL? Several reasons: > > (1) The approach you're describing, in which multiple 3D X servers are > served up with VNC, requires screen scraping. Screen scraping > periodically reads the pixels on the framebuffer and compares them > against a snapshot of the pixels taken earlier. There are some > solutions-- the RealVNC/TigerVNC X.org module and x11vnc, for instance-- > that are a little more sophisticated than just a plain screen scraper. 
> They use the X Damage extension and other techniques to get hints as to > which part of the display to read back, but these techniques don't work > well (or sometimes at all) with hardware-accelerated 3D. Either the > OpenGL pixels don't update at all, or OpenGL drawing is out of sync with > the delivery of pixels to the client (and thus you get tearing artifacts.) > > I personally tested the version of x11vnc that ships with libvncserver > 0.9.9 (libvncserver 0.9.9 has the TurboVNC extensions, so at the library > level at least, it's a fast solution.) I observed bad tearing artifacts > for a few seconds, and then it would hang because the X server got too > busy processing the 3D drawing and couldn't spare any cycles for x11vnc > (X servers are single-threaded.) Turning off X Damage support in x11vnc > made the solution at least usable, but without X Damage support, x11vnc > is mainly just polling the display, so it will incur a lot of overhead. > This became particularly evident when using interactive apps (e.g. glxspheres -i). > > I couldn't get the TigerVNC 1.3.1 X.org module to work at all, and the > TigerVNC 1.1.0 X.org module (the one that ships with RHEL 6) did not > display any pixels from the OpenGL app. > > (2) The ability to share a GPU among multiple users. VirtualGL > installations often have dozens of users sharing the GPU, because not > all of them will be using it simultaneously, and even when they are, > they might only need to process a small model that uses 1% of the GPU's > capacity. Like I said above, a K5000 pipe can process billions of > polys/second. That's the equivalent performance of at least a handful > of desktop GPUs (if not more) combined. It's a lot more cost-effective > to buy beefy servers with really beefy multi-pipe GPU configurations and > provision the servers to handle 40 or 50 users. You can't do that if > each user has a dedicated GPU, because you can't install 40 or 50 > dedicated GPUs into a single machine. 
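The sharing argument in (2) can be made concrete with some back-of-envelope arithmetic. The numbers below are assumptions loosely taken from the figures quoted in this thread (a billion polys/second per pipe, the 500,000-poly model suggested for GLXspheres), not measurements:

```python
# Rough capacity sketch for GPU sharing -- all numbers are assumptions.
# A K5000-class pipe is quoted above as sustaining on the order of a
# billion polys/second; suppose a typical interactive session renders a
# 500,000-poly model at 30 fps (about the limit of human perception).
gpu_polys_per_sec = 1_000_000_000   # assumed aggregate GPU throughput
session_polys = 500_000             # assumed per-user model size
fps = 30                            # target interactive frame rate

per_session = session_polys * fps / gpu_polys_per_sec
max_sessions = int(1 / per_session)
print(f"each session uses {per_session:.1%} of the GPU; "
      f"~{max_sessions} sessions could share it")
# -> each session uses 1.5% of the GPU; ~66 sessions could share it
```

Real GPU contention is not this linear, of course, but it illustrates why provisioning one beefy server for 40 or 50 users is plausible, whereas installing 40 or 50 dedicated GPUs in one machine is not.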
> > (3) Efficiency and cost. VirtualGL and TurboVNC are only going to take > up resources when they are running. A full-blown X server has a much > larger footprint. The screen scraper will eat up CPU cycles even if the > 3D application is sitting there doing nothing, because the screen > scraper is having to poll for changes in the pixels. > TurboVNC/VirtualGL, on the other hand, will not take up CPU cycles > unless the 3D application is actually drawing something. Furthermore, > if the user goes to lunch, their GPU is now sitting completely idle. If > the user only needs to process a 50,000-polygon model, then their > dedicated GPU is being grossly underutilized. > > > On 10/2/14 12:36 PM, Göbbert, Jens Henrik wrote: >> Hi VirtualGL, >> >> I have been using VirtualGL for some years now to get 3D-accelerated remote >> visualization via TurboVNC on the front nodes of our compute cluster. >> >> Just recently I was asked why a remote 3D-accelerated desktop scenario is >> not possible with multiple 3D-accelerated X servers + VNC, each dedicated >> to a single user. >> >> I cannot answer this question as I would like to, as it seems to run fine: >> >> We tested running multiple 3D-accelerated X servers on the same machine >> with a single GPU without any problems. >> >> glxgears showed 600 frames per second on both at the same time -> both >> X servers were 3D-accelerated. >> >> Why shouldn't I go for multiple 3D X servers (one for each user) >> >> and send their framebuffers via VNC to the workstations >> >> instead of using VirtualGL? >> >> Best, >> >> Jens Henrik > |
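The polling cost described in points (1) and (3) can be sketched in a few lines. This is purely an illustrative model of what a plain screen scraper must do on every tick, not x11vnc's actual code; the band size and framebuffer layout are invented for the example:

```python
def changed_tiles(prev, cur, tile=4):
    """Compare two framebuffer snapshots (lists of row bytes) band-wise.

    Returns the indices of row bands that differ. Note that the
    comparison itself touches every pixel on every poll -- even when
    nothing has changed -- which is why an idle desktop still costs
    CPU under a plain screen scraper.
    """
    bands = []
    for i in range(0, len(prev), tile):
        if prev[i:i + tile] != cur[i:i + tile]:
            bands.append(i // tile)
    return bands

# Two tiny 8-row "framebuffers"; only rows 4-7 differ.
fb_old = [b"\x00" * 16 for _ in range(8)]
fb_new = [b"\x00" * 16 for _ in range(4)] + [b"\xff" * 16 for _ in range(4)]
print(changed_tiles(fb_old, fb_new))  # -> [1]
```

The point is that the diff does work proportional to the framebuffer size on every poll, drawn-to or not; VirtualGL/TurboVNC, by contrast, only do work when the application actually renders a frame.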
|
From: Göbbert, J. H. <goe...@vr...> - 2014-10-03 05:53:54
|
Yes, that link gives exactly the answer. Thanks. I was searching the internet with the wrong keywords, like "virtualGL vs. multiple remote 3D X-servers" or similar. best, Jens Henrik ________________________________________ From: DRC [dco...@us...] Sent: Friday, October 03, 2014 7:45 AM To: vir...@li... Subject: Re: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers A lot of what I posted here is just a more nuts-and-bolts version of the same information that is provided in the background article: http://www.virtualgl.org/About/Background The nuts and bolts change over time, and I don't really want to have to re-write that article every time they do. The basic 10,000-foot managerial explanation has not changed in 10 years, and it's basically this: VirtualGL enables multi-user GPU sharing/load balancing, whereas the screen scraping/dedicated GPU approach doesn't. [...] _______________________________________________ VirtualGL-Users mailing list Vir...@li... https://lists.sourceforge.net/lists/listinfo/virtualgl-users |