#31 Java TurboVNC Viewer drawing performance very slow on Mac under Java 1.7+ but not Java 1.6

Status: closed-fixed
Owner: DRC
Labels: None
Priority: 5
Updated: 2015-06-11
Created: 2014-09-21
Creator: DRC
Private: No

When I run the Java TurboVNC Viewer under Java 6 (from the Apple Java Developer package), it draws at about 130 Mpixels/sec. When I run the same app under either Oracle JDK 7 or my own build of OpenJDK 7 (from the macosx-port-dev Mercurial branch), it draws at about 20 Mpixels/sec.

You can build enough of the app to demonstrate this issue by doing:

svn co svn://svn.code.sf.net/p/turbovnc/code/trunk/java java
cd java
javac com/turbovnc/vncviewer/ImageDrawTest.java
java -Dsun.java2d.trace=count -cp . com.turbovnc.vncviewer.ImageDrawTest

Results from Java 6:

Graphics device supports HW acceleration.
Window size: 1240 x 900
129.607164 Mpixels/sec
130.089516 Mpixels/sec
4768 calls to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntArgbPre)

Results from Java 7:

Graphics device supports HW acceleration.
Window size: 1240 x 900
19.109939 Mpixels/sec
20.604652 Mpixels/sec
19.849651 Mpixels/sec
316 calls to sun.java2d.opengl.OGLSwToSurfaceBlit::Blit(IntRgb, AnyAlpha, "OpenGL Surface")
2 calls to sun.java2d.loops.Blit::Blit(IntRgb, SrcNoEa, IntRgb)
317 calls to sun.java2d.opengl.OGLRTTSurfaceToSurfaceBlit::Blit("OpenGL Surface (render-to-texture)", AnyAlpha, "OpenGL Surface")
6 calls to sun.java2d.loops.FillRect::FillRect(AnyColor, SrcNoEa, AnyInt)
641 total calls to 4 different primitives

It seems clear that Java 7 is using OpenGL by default to implement Java 2D drawing, and this path is apparently slow. I've reproduced this using a Mac Mini with nVidia GeForce 9400 and OS X 10.8.5, and a MacBook Pro with Intel HD Graphics 3000 and OS X 10.9.4. (The latter performed a bit better under Java 7, but it was still half as fast as under Java 6.)

There appears to be no way to get back the faster loops that were used in Java 6. I'd love to be able to just do -Dsun.java2d.opengl=false, but no luck there. If I could figure out how to make the OpenGL drawing faster, that would be acceptable as well.

Discussion

  • DRC

    DRC - 2015-01-29

    After extensive research, it was discovered that this is a legitimate issue with Java 1.7 & 1.8. Apparently the accelerated 2D drawing path (which was part of Apple's Java 1.6 but not Oracle Java 1.6) did not make it into Java 1.7, so Java 1.7 only supports OpenGL drawing on Macs.

    I was, however, able to make the OpenGL drawing significantly faster by using a BufferedImage of TYPE_INT_ARGB_PRE instead of TYPE_INT_RGB. The OpenGL blit loops in Java2D detect whether the source pixel format is alpha-enabled, and if it isn't, they will call

    glPixelTransferf(GL_ALPHA_SCALE, 0.0f);
    glPixelTransferf(GL_ALPHA_BIAS, 1.0f);

    prior to calling glDrawPixels(). This instructs glDrawPixels() to individually zero out all of the alpha components and set them to 255, which is very slow on some GPUs. Instead, the TurboVNC Viewer can now work with an ARGB_PRE (which is really BGRA on little-endian systems) back buffer and set the alpha values to opaque on its own.

    The code in trunk (2.0 evolving) will use a BGRA back buffer automatically if it detects a Mac running Java 1.7 or 1.8, and it will also use BGRA if it detects that -Dsun.java2d.opengl=true was passed on the command line (BGRA also accelerates OpenGL drawing on other platforms besides Mac.) A new undocumented parameter, ForceAlpha, can be used to override the default behavior.
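A minimal sketch of the back-buffer idea described above, not the viewer's actual code: the class and method names (`OpaqueBackBuffer`, `create`, `setRgb`) are hypothetical. It allocates a `TYPE_INT_ARGB_PRE` image and forces alpha to 255 on every write, so the application, rather than the OpenGL pixel-transfer stage, makes the pixels opaque. With alpha fixed at 255, premultiplied and non-premultiplied color values are identical, so no extra scaling is needed.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import java.util.Arrays;

class OpaqueBackBuffer {
    /** Allocates a premultiplied-ARGB back buffer filled with opaque black. */
    static BufferedImage create(int width, int height) {
        BufferedImage img =
            new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB_PRE);
        // Direct access to the packed int pixels (one int per pixel, 0xAARRGGBB).
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        Arrays.fill(pixels, 0xFF000000);
        return img;
    }

    /** Stores a 0xRRGGBB value at (x, y), forcing the alpha byte to 255. */
    static void setRgb(BufferedImage img, int x, int y, int rgb) {
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
        pixels[y * img.getWidth() + x] = 0xFF000000 | (rgb & 0x00FFFFFF);
    }
}
```

A decoder would call `setRgb` (or write whole scanlines the same way) instead of writing into a `TYPE_INT_RGB` image.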

     
  • DRC

    DRC - 2015-01-29
    • status: open-accepted --> closed-fixed
     
  • DRC

    DRC - 2015-01-29

    NOTE: This message seems to indicate that this is being addressed in Java 1.9:

    http://mail.openjdk.java.net/pipermail/2d-dev/2014-October/004870.html

     
  • DRC

    DRC - 2015-02-20

    Further clarifying remarks:

    There are basically two components to this issue:
    (1) raw blitting performance (which affects large image updates)
    (2) the high overhead of the OpenGL blitter (which affects small image updates)

    Without going into too many gory details about the architecture of Java 2D (which, despite poring over the code, I'm still quite fuzzy about), it appears that, in general terms, the throughput of the Quartz-based blitter in Apple Java 1.6 was related to the number of pixels being drawn, but the performance of the OpenGL-based blitter in Java 1.7+ is related to the number of frames being drawn. Thus, whether you are redrawing a whole remote desktop image or updating a single pixel in it, the OpenGL blitter always performs as if you are redrawing the whole image. As you can imagine, this has more of a detrimental effect on 2D applications than 3D applications, and low-level benchmarks conducted with the TurboVNC Viewer's built-in benchmark feature confirm this. Although using ARGB_PRE BufferedImages improves the situation dramatically, particularly with older Macs, the performance under Java 1.8 is still as much as 2x slower than Apple Java 1.6 for 3D applications and as much as an order of magnitude slower for 2D apps (because of the overhead mentioned above).

    Also, in further discussions with the author of the post linked to above, the changes in Java 1.9 will speed up scaled image drawing (which will benefit the TurboVNC Viewer when desktop scaling is enabled), but they will not affect the default mode of operation of the TurboVNC Viewer. The issue still exists in the Java 1.9 pre-release builds. I engaged Sergey in a conversation about it and sent him the necessary benchmarks. He has reproduced the issue and is investigating.

    https://www.mail-archive.com/macosx-port-dev%40openjdk.java.net/msg00695.html

     

    Last edit: DRC 2015-02-20
  • DRC

    DRC - 2015-02-20
    • status: closed-fixed --> open
     
  • DRC

    DRC - 2015-06-08

    Ugh. Well, after upgrading my Mac Mini to a newer machine (late 2014 model with 3.0 GHz Intel Core i7 and Intel Iris graphics, whereas the old model was 2009 w/ Intel Core 2 Duo and nVidia GeForce 9400), the situation has unfortunately changed. It now seems that Apple Java 1.6 on this platform is not hardware-accelerating Java 2D for some reason, so the performance is abysmal: even worse than it was with OpenGL on the old Mac before I implemented the turbovnc.forcealpha workaround. Neither using an alpha channel nor attempting to enable OpenGL (-Dsun.java2d.opengl=True) has any effect.

     
  • DRC

    DRC - 2015-06-11

    Breakthrough:

    I was able to work around the Oracle Java 7+/OpenGL blitter performance issues under OS X by taking advantage of the Graphics2D.getClipBounds() method from within DesktopWindow.paintImmediately(). This allowed me to determine which region was passed to repaint() so I could blit only the relevant portion of the back buffer image. The Quartz 2D blitter apparently does this automatically, but the OpenGL blitter apparently doesn't. The performance improvements on the canonical TurboVNC datasets were stunning.
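A minimal sketch of that clip-bounds technique, assuming a back buffer whose origin coincides with the component's origin. The class and method names (`ClippedBlit`, `blit`) are hypothetical, and this is not the actual `DesktopWindow` code. It asks the graphics context for its clip rectangle and copies only that sub-rectangle of the back buffer, rather than handing the whole image to the blitter.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class ClippedBlit {
    /**
     * Draws only the part of backBuffer that lies inside g's clip bounds.
     * Returns the rectangle that was actually blitted (empty if nothing was drawn).
     */
    static Rectangle blit(Graphics2D g, BufferedImage backBuffer) {
        Rectangle full =
            new Rectangle(0, 0, backBuffer.getWidth(), backBuffer.getHeight());
        Rectangle clip = g.getClipBounds();   // null when no clip has been set
        Rectangle r = (clip == null) ? full : clip.intersection(full);
        if (r.isEmpty()) {
            return new Rectangle();
        }
        // Source and destination rectangles are identical, so no scaling occurs.
        g.drawImage(backBuffer,
                    r.x, r.y, r.x + r.width, r.y + r.height,   // destination
                    r.x, r.y, r.x + r.width, r.y + r.height,   // source
                    null);
        return r;
    }
}
```

In a Swing component, this would typically be called from the paint path with the `Graphics` object cast to `Graphics2D`, after `repaint(x, y, w, h)` has been requested for the updated region.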

    2009 Mac Mini with nVidia GeForce 9400, Apple Java 6u65:
    * TurboVNC 2.0 beta1 (before):  2D total = 16.4s, 3D total = 34.2s
    * TurboVNC 2.0 (after):         2D total = 10.9s, 3D total = 27.9s
    
    2009 Mac Mini with nVidia GeForce 9400, Oracle Java 8u45:
    * TurboVNC 2.0 beta1 (before):  2D total = 201s,  3D total = 40.9s
    * TurboVNC 2.0 (after):         2D total = 51.0s, 3D total = 29.0s
    
    2011 MacBook Pro with Intel HD Graphics 3000, Apple Java 6u65:
    * TurboVNC 2.0 beta1 (before):  2D total = 9.63s, 3D total = 19.3s
    * TurboVNC 2.0 (after):         2D total = 7.29s, 3D total = 17.2s
    
    2011 MacBook Pro with Intel HD Graphics 3000, Oracle Java 8u45:
    * TurboVNC 2.0 beta1 (before):  2D total = 241s,  3D total = 47.8s
    * TurboVNC 2.0 (after):         2D total = 75.0s, 3D total = 32.1s
    
    2015 Mac Mini with Intel Iris, Apple Java 6u65:
    * TurboVNC 2.0 beta1 (before):  2D total = 219s,  3D total = 59.5s
    * TurboVNC 2.0 (after):         2D total = 234s,  3D total = 55.1s
    
    2015 Mac Mini with Intel Iris, Oracle Java 8u45:
    * TurboVNC 2.0 beta1 (before):  2D total = 133s,  3D total = 26.0s
    * TurboVNC 2.0 (after):         2D total = 21.0s, 3D total = 19.8s
    

    On the older Mini, 2D application performance under Java 8 was improved by a factor of 4, whereas 3D performance is now basically at parity with Apple Java 6. On the MacBook, 2D performance under Java 8 was improved by a factor of 3, and 3D performance was improved by 50%, although Apple Java 6 still performs much better. On the new Mini, 2D performance under Java 8 was improved by a factor of more than 6, and 3D performance was improved by about 30%. Performance in all cases is very poor under Apple Java 6 on this machine, due to the apparent lack of hardware acceleration in Java 2D for the Iris GPU.
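The improvement factors can be rechecked from the Oracle Java 8u45 totals listed above by dividing the "before" time by the "after" time. The small program below is only a worked-arithmetic sketch; the class name `Speedup` is hypothetical, and the numbers are copied from the benchmark list.

```java
class Speedup {
    /** Ratio of elapsed times: values above 1.0 mean the "after" run was faster. */
    static double factor(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        String[] machines = {
            "2009 Mac Mini (GeForce 9400)",
            "2011 MacBook Pro (HD 3000)",
            "2015 Mac Mini (Iris)"
        };
        // Oracle Java 8u45 totals in seconds: {2D before, 2D after, 3D before, 3D after}
        double[][] totals = {
            {201.0, 51.0, 40.9, 29.0},
            {241.0, 75.0, 47.8, 32.1},
            {133.0, 21.0, 26.0, 19.8}
        };
        for (int i = 0; i < machines.length; i++) {
            double[] t = totals[i];
            System.out.printf("%-30s 2D x%.2f  3D x%.2f%n",
                              machines[i], factor(t[0], t[1]), factor(t[2], t[3]));
        }
    }
}
```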

    The remaining performance disparity between OpenGL and Quartz 2D seems to track the disparity in ImageDrawTest performance and thus now seems to be explained mostly by pixel throughput. Since that throughput is largely a function of the underlying graphics pipeline and is out of TurboVNC's control, I am closing this as "Fixed." I don't think there's anything further we can do to improve this situation.

     
  • DRC

    DRC - 2015-06-11
    • status: open --> closed-fixed
     
