From: Jeff A. <ja...@fa...> - 2021-01-06 16:29:22
On 06/01/2021 02:33, Thad Guidry wrote:
> Hi Jeff!
>
> I'm from the OpenRefine team where we are constantly watching the
> future of Jython since we use it as an expression language within
> OpenRefine, along with Clojure.
> We've talked on the mailing list I think in the past, perhaps not.
>

I think we have. Thanks for your continued interest in Jython.

> Regarding the microbenchmarks and your analysis...and some of the
> anomalies you found...
> I'm wondering if you verified that SIMD, SSE, etc. intrinsics were
> being used or not sometimes?
> https://www.amd.com/system/files/TechDocs/25112.PDF#G14.232935
>

Yes, I found similar information: that's what led to my conclusions
about the quartic test. I'm impressed HotSpot is able to use them.

> And to see if intrinsic methods are being utilized or not and where in
> compiled code, you can add:
> -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions
> -XX:+PrintInlining
>

Unlock has to come first, it seems. I've experimented with those options
and found what they produced was pretty incomprehensible. I never made
the disassembly option work.

Going back and trying a little harder, thanks to your suggestion, I got
further this morning. The output remains too complex for me to follow
(so many jumps!), but a superficial inspection supports the conjectures
I made based only on timing. In particular, of the three fixtures, only
for Jython 2 does the JVM manage to in-line the floating point
arithmetic into quartic().
It contains this in what I assume is the fast path:

  0x00000202187735df: movapd xmm3,xmm0
  0x00000202187735e3: addsd xmm3,xmm2
  0x00000202187735e7: subsd xmm2,xmm0   ;*dsub {reexecute=0 rethrow=0 return_oop=0}
      ; - org.python.core.PyFloat::float___sub__@23 (line 486)
      ; - org.python.core.PyFloat::__sub__@2 (line 477)
      ; - org.python.core.PyObject::_basic_sub@2 (line 2192)
      ; - org.python.core.PyObject::_sub@31 (line 2177)
      ; - uk.co.farowl.jy2bm.PyFloatBinary::quartic@33 (line 86)
  0x00000202187735eb: mulsd xmm3,xmm1

I added a task to the Gradle scripts that dumps the compiled code (if
one has the hsdis-amd64 plug-in) as I'm sure to forget how I did this.
https://github.com/jeff5/very-slow-jython/blob/333f61d54787f7499ec8141eafe6b8c5c04f0cea/jy2bm/jy2bm.gradle#L74

> You might also be thought provoked with some extra information within
> this JEP:
> https://bugs.openjdk.java.net/browse/JDK-8205637
>
> Some Java JVM compilers & many of Java's robust libraries completely
> miss the point of sometimes using Intrinsic functions as often as
> possible. For example: SSE 4.2
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=898,2862,2861,2860,2863,2864,2865&techs=SSE4_2
> and the reason why Azul's Zing JVM is fast, is because it DOES use
> intrinsic functions as much as possible. Kris Mok (Azul Systems) did a
> great presentation of this back in 2013
> https://www.slideshare.net/RednaxelaFX/green-teajug-hotspotintrinsics02232013
>

One of the nice things about a JVM language is that it gets better when
other people do clever things.

Jeff
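
For reference, a minimal sketch of a command line that combines the options discussed in this message, with the unlock flag placed first as noted above. Two things here are assumptions, not taken from the thread: `-XX:+PrintAssembly` is assumed to be the "disassembly option" referred to (it is a HotSpot diagnostic flag that needs the hsdis plug-in on the JVM's library path), and `bench.jar` is a hypothetical placeholder for whatever benchmark is being run. The linked Gradle task is the actual method used and may differ from this.

```shell
# -XX:+UnlockDiagnosticVMOptions goes first: the diagnostic flags after it
# (PrintInlining, PrintAssembly) are rejected unless it precedes them.
JVM_DIAG_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining -XX:+PrintAssembly"

# bench.jar is a hypothetical placeholder, so this only prints the command
# rather than running it. Drop the leading "echo" to run it for real.
echo java $JVM_DIAG_OPTS -jar bench.jar
```

The output of a real run is large, so it is usually worth redirecting it to a file and searching for the method of interest (here, quartic).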