Thread: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers
From: Göbbert, J. H. <goe...@vr...> - 2014-10-02 18:11:45
Hi VirtualGL,

I have been using VirtualGL for some years now to provide 3D-accelerated remote visualization via TurboVNC on the front-end nodes of our compute cluster.

Just recently I was asked why a remote 3D-accelerated desktop scenario would not also be possible with multiple 3D-accelerated X servers + VNC, each dedicated to a single user. I cannot answer this question as well as I would like to, because the approach seems to run fine: we tested running multiple 3D-accelerated X servers on the same machine with a single GPU without any problems. glxgears showed 600 frames per second on both at the same time -> both X servers were 3D-accelerated.

Why shouldn't I go for multiple 3D X servers (one per user) and send their framebuffers to the workstations via VNC instead of using VirtualGL?

Best,
Jens Henrik

--
Dipl.-Ing. Jens Henrik Göbbert
IT Center - Computational Science & Engineering
Computer Science Department - Virtual Reality Group
Jülich Aachen Research Alliance - JARA-HPC
IT Center, RWTH Aachen University
Seffenter Weg 23, 52074 Aachen, Germany
Phone: +49 241 80-24381
goe...@vr...
www.vr.rwth-aachen.de
http://www.jara.org
From: Robert G. <ra...@rd...> - 2014-10-02 19:33:33
Someone please correct me if I am wrong here, but isn't 600 fps low for 3D? I get that same performance from software GLX in TigerVNC. When I run the same tests on an actual hardware X server with 3D acceleration, I get numbers in the thousands. The 600 fps is what I get from using CPU resources to simulate 3D operations instead of a GPU actually doing the rendering.

On Thu, Oct 2, 2014 at 1:36 PM, Göbbert, Jens Henrik <goe...@vr...> wrote:
> glxgears showed 600 frames per second on both at the same time -> both
> X servers were 3D-accelerated.
From: DRC <dco...@us...> - 2014-10-02 20:26:55
For starters, GLXgears is not a GPU benchmark. It is a CPU benchmark, because its geometry and window size are so small that its frame rate is almost entirely dependent upon CPU overhead. Please (and I'm saying this to the entire 3D application community, not just you) stop quoting GLXgears frame rates as if they have any relevance to GPU performance.

GLXspheres (which is provided with VirtualGL) is a much better solution if you need something quick & dirty, but you also have to understand what it is you're benchmarking. GLXspheres is designed primarily as an image benchmark for remote display systems, so it is meant to be limited by the drawing speed of the remote display solution, not by the 3D rendering speed. On the K5000 that nVidia was kind enough to send me for testing, I can literally max out the geometry size on GLXspheres-- over a billion polys-- and it keeps chugging along at 300 fps, because it's using display lists by default (and thus, once the geometry is downloaded to the GPU, subsequent frames just instruct the GPU to reuse that same geometry.)

Not every app uses display lists, though, so if you want to use GLXspheres as a quick & dirty OpenGL pipeline benchmark, then I suggest boosting its geometry size to 500,000 or a million polygons and enabling immediate mode (-m -p 500000). This will give a better sense of what a "busy" immediate-mode OpenGL app might do.

When your benchmark is running at hundreds of frames per second, that's a clue that it isn't testing anything resembling a real-world use case. In the real world, you're never going to see more than 60 fps because of your monitor's refresh rate, and most humans can't perceive any difference beyond 25-30 fps. In real-world visualization scenarios, if things get too fast, then the engineers will just start using larger (more accurate) models. :)

So why would you use VirtualGL? Several reasons:

(1) The approach you're describing, in which multiple 3D X servers are served up with VNC, requires screen scraping. Screen scraping periodically reads the pixels in the framebuffer and compares them against a snapshot of the pixels taken earlier. There are some solutions-- the RealVNC/TigerVNC X.org module and x11vnc, for instance-- that are a little more sophisticated than a plain screen scraper. They use the X Damage extension and other techniques to get hints as to which parts of the display to read back, but these techniques don't work well (or sometimes at all) with hardware-accelerated 3D. Either the OpenGL pixels don't update at all, or OpenGL drawing is out of sync with the delivery of pixels to the client (and thus you get tearing artifacts.)

I personally tested the version of x11vnc that ships with libvncserver 0.9.9 (libvncserver 0.9.9 has the TurboVNC extensions, so at the library level at least, it's a fast solution.) I observed bad tearing artifacts for a few seconds, and then it would hang because the X server got too busy processing the 3D drawing and couldn't spare any cycles for x11vnc (X servers are single-threaded.) Turning off X Damage support in x11vnc made the solution at least usable, but without X Damage support, x11vnc is mainly just polling the display, so it will incur a lot of overhead. This became particularly evident when using interactive apps (glxspheres -i). I couldn't get the TigerVNC 1.3.1 X.org module to work at all, and the TigerVNC 1.1.0 X.org module (the one that ships with RHEL 6) did not display any pixels from the OpenGL app.

(2) The ability to share a GPU among multiple users. VirtualGL installations often have dozens of users sharing the GPU, because not all of them will be using it simultaneously, and even when they are, they might only need to process a small model that uses 1% of the GPU's capacity. Like I said above, a K5000 pipe can process billions of polys/second. That's the equivalent performance of at least a handful of desktop GPUs (if not more) combined. It's a lot more cost-effective to buy beefy servers with really beefy multi-pipe GPU configurations and provision the servers to handle 40 or 50 users. You can't do that if each user has a dedicated GPU, because you can't install 40 or 50 dedicated GPUs into a single machine.

(3) Efficiency and cost. VirtualGL and TurboVNC only take up resources when they are running. A full-blown X server has a much larger footprint. The screen scraper will eat up CPU cycles even if the 3D application is sitting there doing nothing, because the screen scraper has to poll for changes in the pixels. TurboVNC/VirtualGL, on the other hand, will not take up CPU cycles unless the 3D application is actually drawing something. Furthermore, if the user goes to lunch, their dedicated GPU is sitting completely idle. If the user only needs to process a 50,000-polygon model, then their dedicated GPU is being grossly underutilized.

On 10/2/14 12:36 PM, Göbbert, Jens Henrik wrote:
> Why shouldn't I go for multiple 3D X servers (one for each user)
> and send their framebuffers via VNC to the workstations
> instead of using VirtualGL?
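[Editorial note: the polling screen scraper described in (1) boils down to something like the following sketch. This is illustrative only, not the code of any real VNC server; grab_framebuffer and send_rect are hypothetical callbacks.]

```python
def changed_rows(prev, cur):
    """Diff two framebuffer snapshots (sequences of per-scanline bytes)
    and return the indices of rows that must be re-sent to the client."""
    return [y for y, (a, b) in enumerate(zip(prev, cur)) if a != b]

def scrape(grab_framebuffer, send_rect, iterations):
    """Polling loop: snapshot, diff, send.  The diff runs on every poll,
    which is why a scraper burns CPU even when nothing on screen changes."""
    prev = grab_framebuffer()
    for _ in range(iterations):
        cur = grab_framebuffer()
        for y in changed_rows(prev, cur):
            send_rect(y, cur[y])
        prev = cur
```

Damage-based scrapers replace the blind diff with hints from the X server about which rows changed, which is exactly the part that breaks down with hardware-accelerated 3D.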
From: Nathan K. <nat...@sp...> - 2014-10-02 22:14:00
On 02/10/14 04:26 PM, DRC wrote:
> On the K5000 that nVidia was kind enough to send me for testing, I can
> literally max out the geometry size on GLXspheres-- over a billion
> polys-- and it keeps chugging along at 300 fps, because it's using
> display lists by default.

FYI, I was recently testing the theoretical limit on a card and went down the path of:

`glxspheres -p 1000000` "no difference"
`glxspheres -p 10000000` "hmmm, not breaking a sweat"
`glxspheres -p 1000000000` "wow"

Then I took a trace and found out that the number of actual ROPs was no different between 10 million and 1 billion. gluSphere() apparently hits a limit on how much geometry it produces and won't go higher (increasing the window size didn't do anything; I didn't read the GLU source).

Bottom line: `glxspheres -p 3500000` (which equates to a little over 14 million ROPs per frame) is the highest load the stock glxspheres/libGLU will produce.

-Nathan

--
Nathan Kidd, Software Developer
OpenText Connectivity Solutions
nk...@op...
http://connectivity.opentext.com
+1 905-762-6001
From: DRC <dco...@us...> - 2014-10-03 01:41:56
Well, gee, that makes a lot more sense. Dean, if you're reading this, the last column in that chart I sent you a year ago should read "3.5 million" and not "10 million." :|

I made a note about that in the usage screen of GLXspheres. I think I should further modify it so that it allows the sphere count to be adjusted; it was kept static in order to guarantee that the same image was always generated. Note that there are actually 61 spheres in the default configuration (20 per ring * 3 rings + the 1 in the center), so apparently the polygon limit is around 60,000 per sphere. It might simply be that the polygon count is clamped to a 16-bit value or something.

On 10/2/14 5:13 PM, Nathan Kidd wrote:
> Bottom line: `glxspheres -p 3500000` (which equates to a little over 14
> million ROPs per frame) is the highest load the stock glxspheres/libGLU
> will produce.
From: Nathan K. <nat...@sp...> - 2014-10-03 03:38:44
On 02/10/14 09:16 PM, DRC wrote:
> Note that there are actually 61 spheres in the default configuration (20
> per ring + the 1 in the center), so apparently the polygon limit is
> around 60,000 per sphere. It might simply be that the polygon count is
> clamped to a 16-bit value or something.
Ok, I won't be lazy. `apt-get source libglu1-mesa` and a little
LibreOffice Calc later, the restriction is quite plain:
quad.c:
#define CACHE_SIZE 240
...
gluSphere(GLUquadric *qobj, GLdouble radius, GLint slices, GLint stacks)
if (slices >= CACHE_SIZE) slices = CACHE_SIZE-1;
if (stacks >= CACHE_SIZE) stacks = CACHE_SIZE-1;
And the solid drawing path is roughly:
(glVertex + glNormal) * (slices*2 + 2 + (stacks - 2)*slices*2)
Thus max ROPs/sphere = 227532, * 61 spheres = 13879452 total ROPs.
Looks like the easiest path to increased poly count is to increase the
number of spheres.
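[Editorial note: the clamp arithmetic above can be checked directly. A sketch in Python mirroring the quad.c logic; the factor of 2 counts the glVertex + glNormal pair issued per vertex.]

```python
CACHE_SIZE = 240  # from Mesa libGLU quad.c

def sphere_rops(slices, stacks):
    """ROPs issued per gluSphere() on the solid drawing path, per the
    formula above, after libGLU's slices/stacks clamp."""
    if slices >= CACHE_SIZE:
        slices = CACHE_SIZE - 1
    if stacks >= CACHE_SIZE:
        stacks = CACHE_SIZE - 1
    return 2 * (slices * 2 + 2 + (stacks - 2) * slices * 2)

# Requesting arbitrarily fine spheres hits the cap:
max_per_sphere = sphere_rops(10**6, 10**6)  # 227532
total = max_per_sphere * 61                 # 13879452, matching the post
```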
-Nathan
--
Nathan Kidd, Software Developer
OpenText Connectivity Solutions
nk...@op...
http://connectivity.opentext.com
+1 905-762-6001
From: DRC <dco...@us...> - 2014-10-03 04:45:25
We crossed the streams. I discovered exactly the same thing (refer to my previous message.)

On 10/2/14 10:38 PM, Nathan Kidd wrote:
> Ok, I won't be lazy. `apt-get source libglu1-mesa` and a little
> LibreOffice Calc later, the restriction is quite plain: [...]
From: DRC <dco...@us...> - 2014-10-03 04:44:35
Confirmed in the libGLU source-- slices and stacks are clamped at 240, so 57600 polys per sphere is the max-- which fits your observation very nicely (57600 * 61 spheres = 3513600 polys.)

I modified GLXspheres so that it calculates whether this limit will be reached, warns the user, and prints the actual polygon count, taking the limit into account. I also added an option (-n) for increasing the sphere count, which enables polygon counts higher than 3.5 million. For instance:

> glxspheres -n 240 -p 10000000
Polygons in scene: 10029456 (241 spheres * 41616 polys/spheres)
Visual ID of window: 0x2c
Context is Direct
OpenGL Renderer: Quadro K5000/PCIe/SSE2
92.948421 frames/sec - 103.730438 Mpixels/sec
95.584804 frames/sec - 106.672641 Mpixels/sec

> glxspheres -n 2400 -p 100000000
Polygons in scene: 99920016 (2401 spheres * 41616 polys/spheres)
Visual ID of window: 0x2c
Context is Direct
OpenGL Renderer: Quadro K5000/PCIe/SSE2
10.136359 frames/sec - 11.312176 Mpixels/sec
9.982275 frames/sec - 11.140219 Mpixels/sec

This is much more along the lines of what I would expect from the K5000 (about a billion quads/second; the press usually reports it as 1.8 billion tris/sec, so that number makes sense.)

To pop the stack on the original poster's question: at the OpenGL level, you can get linear or even super-linear scaling of the GPU resource among multiple users. If I run 5 sessions of GLXspheres at a time, each will perform at about 200 million quads/second. If I run 10, each will perform at about 100 million quads/second. If each user is working with a 1-million-polygon model, then that's over 30 users at 30 frames/second. Obviously there will be other constraints on this in a real-world environment-- VirtualGL and TurboVNC have some CPU overhead to compress and deliver the 3D images to the client, users might be dealing with larger models, applications that use a lot of textures won't scale as well because they'll exhaust GPU memory, etc.

However, you're also not going to have all 30 users banging away all the time. Some of them will be down the hall, some will be reading e-mail, some won't even be in the office, and some will be staring at the model and making small changes rather than manipulating the entire scene.

On 10/2/14 5:13 PM, Nathan Kidd wrote:
> Bottom line: `glxspheres -p 3500000` (which equates to a little over 14
> million ROPs per frame) is the highest load the stock glxspheres/libGLU
> will produce.
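[Editorial note: the two polygon counts reported above are consistent with a simple model: split the requested budget evenly across the spheres, pick equal slices and stacks, and apply the libGLU clamp. The sketch below is a back-of-the-envelope reconstruction by the editor, not GLXspheres' actual source.]

```python
import math

GLU_CLAMP = 239  # libGLU caps slices/stacks at CACHE_SIZE - 1

def scene_polys(requested_polys, n_spheres):
    """Guess at GLXspheres' reported polygon count: equal slices/stacks
    sized so slices*stacks approximates the per-sphere budget, clamped."""
    side = min(round(math.sqrt(requested_polys / n_spheres)), GLU_CLAMP)
    return n_spheres * side * side

# Matches the two outputs quoted above:
scene_polys(10_000_000, 241)    # -> 10029456
scene_polys(100_000_000, 2401)  # -> 99920016
```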
|
On 10/2/14 11:44 PM, DRC wrote:
> To pop the stack on the original poster's question: at the OpenGL
> level, you can get linear or even super-linear scaling of the GPU
> resource among multiple users.

I should also mention that another constraint you'll have in a real-world environment is reading back the pixels; you may exhaust your bus bandwidth before you actually exhaust your GPU processing power. But my point is: people throw quite a few users onto their GPUs. Santos, one of our largest (if not our largest) installations, provisions about 13-16 users per high-end nVidia pipe, although the user workloads vary greatly (oil & gas apps run the gamut from straight 2D X11 all the way to monster 3D visualization.)
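[Editorial note: the linear-scaling claim reduces to simple provisioning arithmetic. An illustrative sketch using the throughput figures quoted in this thread, not a sizing tool.]

```python
def max_users(gpu_polys_per_sec, model_polys, target_fps=30):
    """With roughly linear GPU sharing, each concurrent user consumes
    model_polys * target_fps polys/sec of the pipe's throughput."""
    return gpu_polys_per_sec // (model_polys * target_fps)

# A ~1-billion-quad/sec K5000 pipe, 1-million-poly models at 30 fps:
max_users(1_000_000_000, 1_000_000)  # -> 33 ("over 30 users")
```

Real deployments budget higher (e.g. the 13-16 users/pipe cited above for mixed workloads) because not all users render simultaneously.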
From: Göbbert, J. H. <goe...@vr...> - 2014-10-03 05:14:00
Hi VirtualGL,

Thanks for your detailed answer - we searched but could not find a good explanation like this:

> Efficiency and cost. VirtualGL and TurboVNC are only going to take
> up resources when they are running. A full-blown X server has a much
> larger footprint. The screen scraper will eat up CPU cycles even if the
> 3D application is sitting there doing nothing, because the screen
> scraper is having to poll for changes in the pixels.

Best,
Jens Henrik

P.S.: You might want to mention the 'screen scraper' approach in the documentation/introduction/wiki and compare it with VirtualGL.
From: DRC <dco...@us...> - 2014-10-03 05:45:40
|
A lot of what I posted here is just a more nuts-and-bolts version of the same information that is provided in the background article: http://www.virtualgl.org/About/Background The nuts and bolts change over time, and I don't really want to have to re-write that article every time they do. The basic 10,000-foot managerial explanation has not changed in 10 years, and it's basically this: VirtualGL enables multi-user GPU sharing/load balancing, whereas the screen scraping/dedicated GPU approach doesn't. On 10/3/14 12:13 AM, Göbbert, Jens Henrik wrote: > Hi VirtualGL, > > thanks for your detailed answer - we searched, but could not find a good explanaition like this: > >> Efficiency and cost. VirtualGL and TurboVNC are only going to take >> up resources when they are running. A full-blown X server has a much >> larger footprint. The screen scraper will eat up CPU cycles even if the >> 3D application is sitting there doing nothing, because the screen >> scraper is having to poll for changes in the pixels. > > best, > Jens Henrik > > P.S.: You might want mention the 'screen scraper'-approach in the documentation/introduction/wiki and compare it with VirtualGL. > > ________________________________________ > From: DRC [dco...@us...] > Sent: Thursday, October 02, 2014 10:26 PM > To: vir...@li... > Subject: Re: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers > > For starters, GLXgears is not a GPU benchmark. It is a CPU benchmark, > because its geometry and window size are so small that its frame rate is > almost entirely dependent upon CPU overhead. Please (and I'm saying > this to the entire 3D application community, not just you) stop quoting > GLXgears frame rates as if they have any relevance to GPU performance. > > GLXspheres (which is provided with VirtualGL) is a much better solution > if you need something quick & dirty, but you also have to understand > what it is you're benchmarking. 
GLXspheres is designed primarily as an > image benchmark for remote display systems, so it is meant to be limited > by the drawing speed of the remote display solution, not by the 3D > rendering speed. On the K5000 that nVidia was kind enough to send me > for testing, I can literally max out the geometry size on GLXspheres-- > over a billion polys-- and it keeps chugging along at 300 fps, because > it's using display lists by default (and thus, once the geometry is > downloaded once to the GPU, subsequent frames just instruct the GPU to > reuse that same geometry.) > > Not every app uses display lists, though, so if you want to use > GLXspheres as a quick & dirty OpenGL pipeline benchmark, then I suggest > boosting its geometry size to 500,000 or a million polygons and enabling > immediate mode (-m -p 500000). This will give a better sense of what a > "busy" immediate-mode OpenGL app might do. > > When your benchmark is running at hundreds of frames per second, that's > a clue that it isn't testing anything resembling a real-world use case. > In the real world, you're never going to see more than 60 fps because > of your monitor's refresh rate, and most humans can't perceive any > difference after 25-30 fps. In real-world visualization scenarios, if > things get too fast, then the engineers will just start using larger > (more accurate) models. :) > > So why would you use VirtualGL? Several reasons: > > (1) The approach you're describing, in which multiple 3D X servers are > served up with VNC, requires screen scraping. Screen scraping > periodically reads the pixels on the framebuffer and compares them > against a snapshot of the pixels taken earlier. There are some > solutions-- the RealVNC/TigerVNC X.org module and x11vnc, for instance-- > that are a little more sophisticated than just a plain screen scraper. 
> They use the X Damage extension and other techniques to get hints as to > which part of the display to read back, but these techniques don't work > well (or sometimes at all) with hardware-accelerated 3D. Either the > OpenGL pixels don't update at all, or OpenGL drawing is out of sync with > the delivery of pixels to the client (and thus you get tearing artifacts.) > > I personally tested the version of x11vnc that ships with libvncserver > 0.9.9 (libvncserver 0.9.9 has the TurboVNC extensions, so at the library > level at least, it's a fast solution.) I observed bad tearing artifacts > for a few seconds, and then it would hang because the X server got too > busy processing the 3D drawing and couldn't spare any cycles for x11vnc > (X servers are single-threaded.) Turning off X Damage support in x11vnc > made the solution at least usable, but without X Damage support, x11vnc > is mainly just polling the display, so it will incur a lot of overhead. > This became particularly evident when using interactive apps (e.g. glxspheres -i). > > I couldn't get the TigerVNC 1.3.1 X.org module to work at all, and the > TigerVNC 1.1.0 X.org module (the one that ships with RHEL 6) did not > display any pixels from the OpenGL app. > > (2) The ability to share a GPU among multiple users. VirtualGL > installations often have dozens of users sharing the GPU, because not > all of them will be using it simultaneously, and even when they are, > they might only need to process a small model that uses 1% of the GPU's > capacity. Like I said above, a K5000 pipe can process billions of > polys/second. That's the equivalent performance of at least a handful > of desktop GPUs (if not more) combined. It's a lot more cost-effective > to buy beefy servers with really beefy multi-pipe GPU configurations and > provision the servers to handle 40 or 50 users. You can't do that if > each user has a dedicated GPU, because you can't install 40 or 50 > dedicated GPUs into a single machine. 
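The sharing argument in (2) can be made concrete with some back-of-envelope arithmetic. The numbers below are assumptions loosely taken from the figures quoted in this thread (a billion polys/second per pipe, the 500,000-poly model suggested for GLXspheres), not measurements:

```python
# Rough capacity sketch for GPU sharing -- all numbers are assumptions.
# A K5000-class pipe is quoted above as sustaining on the order of a
# billion polys/second; suppose a typical interactive session renders a
# 500,000-poly model at 30 fps (about the limit of human perception).
gpu_polys_per_sec = 1_000_000_000   # assumed aggregate GPU throughput
session_polys = 500_000             # assumed per-user model size
fps = 30                            # target interactive frame rate

per_session = session_polys * fps / gpu_polys_per_sec
max_sessions = int(1 / per_session)
print(f"each session uses {per_session:.1%} of the GPU; "
      f"~{max_sessions} sessions could share it")
# -> each session uses 1.5% of the GPU; ~66 sessions could share it
```

Real GPU contention is not this linear, of course, but it illustrates why provisioning one beefy server for 40 or 50 users is plausible, whereas installing 40 or 50 dedicated GPUs in one machine is not.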
> > (3) Efficiency and cost. VirtualGL and TurboVNC are only going to take > up resources when they are running. A full-blown X server has a much > larger footprint. The screen scraper will eat up CPU cycles even if the > 3D application is sitting there doing nothing, because the screen > scraper is having to poll for changes in the pixels. > TurboVNC/VirtualGL, on the other hand, will not take up CPU cycles > unless the 3D application is actually drawing something. Furthermore, > if the user goes to lunch, their GPU is now sitting completely idle. If > the user only needs to process a 50,000-polygon model, then their > dedicated GPU is being grossly underutilized. > > > On 10/2/14 12:36 PM, Göbbert, Jens Henrik wrote: >> Hi VirtualGL, >> >> I have been using VirtualGL for some years now to get 3D-accelerated remote >> visualization via TurboVNC on the front nodes of our compute cluster. >> >> Just recently I was asked why a remote 3D-accelerated desktop scenario is >> not possible with multiple 3D-accelerated X servers + VNC, each dedicated >> to a single user. >> >> I cannot answer this question as I would like to, as it seems to run fine: >> >> We tested running multiple 3D-accelerated X servers on the same machine >> with a single GPU without any problems. >> >> glxgears showed 600 frames per second on both at the same time -> both >> X servers were 3D-accelerated. >> >> Why shouldn't I go for multiple 3D X servers (one for each user) >> >> and send their framebuffers via VNC to the workstations >> >> instead of using VirtualGL? >> >> Best, >> >> Jens Henrik > |
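The polling cost described in points (1) and (3) can be sketched in a few lines. This is purely an illustrative model of what a plain screen scraper must do on every tick, not x11vnc's actual code; the band size and framebuffer layout are invented for the example:

```python
def changed_tiles(prev, cur, tile=4):
    """Compare two framebuffer snapshots (lists of row bytes) band-wise.

    Returns the indices of row bands that differ. Note that the
    comparison itself touches every pixel on every poll -- even when
    nothing has changed -- which is why an idle desktop still costs
    CPU under a plain screen scraper.
    """
    bands = []
    for i in range(0, len(prev), tile):
        if prev[i:i + tile] != cur[i:i + tile]:
            bands.append(i // tile)
    return bands

# Two tiny 8-row "framebuffers"; only rows 4-7 differ.
fb_old = [b"\x00" * 16 for _ in range(8)]
fb_new = [b"\x00" * 16 for _ in range(4)] + [b"\xff" * 16 for _ in range(4)]
print(changed_tiles(fb_old, fb_new))  # -> [1]
```

The point is that the diff does work proportional to the framebuffer size on every poll, drawn-to or not; VirtualGL/TurboVNC, by contrast, only do work when the application actually renders a frame.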
|
From: Göbbert, J. H. <goe...@vr...> - 2014-10-03 05:53:54
|
Yes, that link gives exactly the answer. Thanks. I was searching the internet with the wrong keywords, like "virtualGL vs. multiple remote 3D X-servers" or similar. best, Jens Henrik ________________________________________ From: DRC [dco...@us...] Sent: Friday, October 03, 2014 7:45 AM To: vir...@li... Subject: Re: [VirtualGL-Users] virtualGL vs. multiple remote 3D X-servers A lot of what I posted here is just a more nuts-and-bolts version of the same information that is provided in the background article: http://www.virtualgl.org/About/Background The nuts and bolts change over time, and I don't really want to have to re-write that article every time they do. The basic 10,000-foot managerial explanation has not changed in 10 years, and it's basically this: VirtualGL enables multi-user GPU sharing/load balancing, whereas the screen scraping/dedicated GPU approach doesn't. [...] _______________________________________________ VirtualGL-Users mailing list Vir...@li... https://lists.sourceforge.net/lists/listinfo/virtualgl-users |