Re: [Jython-dev] [pypy-dev] What can Cython do for PyPy?

Compatibility first, for certain. We can address performance later.

Roughly speaking, there are three types of extension modules out there:

   - Providing access to native libraries. Maybe this is the case for most
   of the SWIG code out there.
   - Fine grained, such as PyAMF 0.6's use of Cython for
   serialization/deserialization, or decimals and datetimes.
   - Bulk grained, such as NumPy.

Because str/unicode are immutable (unlike Ruby's strings) we won't have to
sync on an upcall, so there's just an initial copy. (Here I'm assuming this
would be done in an implementation of the buffer/memoryview protocol.)
Mutable types, like dict, will have to sync, but this is on a per-item basis.
Given this, here are some of my thoughts.

*Access libraries* don't really have performance considerations, so I will
just ignore them.

*Fine-grained libraries* basically don't make sense here from a performance
perspective, and likely will run more slowly than pure Python equivalents
(if they are available). On the other hand, they probably are good for
testing the compatibility of our C ext api support. In particular, I would
think PyAMF would work as an excellent test suite, including ref counting.

*Bulk grained modules* are in the sweet spot. They have a limited number of
downcalls/upcalls and generally manage their own memory allocations for
their objects, as opposed to using Python objects, so there should be
relatively little time spent in Jython's C ext api layer.
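
To make the fine-grained versus bulk-grained distinction concrete, here is a minimal sketch that only counts boundary crossings. The `Boundary` class and both helper functions are hypothetical stand-ins, not part of Jython or any C ext api layer; the real per-crossing cost would depend on the implementation.

```python
class Boundary:
    """Counts hypothetical crossings from the interpreter into native code."""

    def __init__(self):
        self.crossings = 0

    def call_native(self, func, *args):
        self.crossings += 1  # one downcall per invocation
        return func(*args)


def add(a, b):
    return a + b


def fine_grained_sum(boundary, values):
    # One downcall per element, as with a per-item serializer or decimal op.
    total = 0
    for v in values:
        total = boundary.call_native(add, total, v)
    return total


def bulk_sum(boundary, values):
    # A single downcall that processes the whole collection "natively".
    return boundary.call_native(sum, values)


values = list(range(1000))
fine, bulk = Boundary(), Boundary()
fine_result = fine_grained_sum(fine, values)
bulk_result = bulk_sum(bulk, values)
```

Both calls compute the same sum, but the fine-grained version crosses the boundary once per element while the bulk version crosses it once in total.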

The existence of the GIL is another reason that extension code is often
used. In particular, I've had good experience using lxml concurrently.
However, any existing such code needs to be very carefully partitioned so as
not to make upcalls (to access a given Python object), since each upcall
would have to reacquire the GIL. This further minimizes these performance
issues.

Such code would be able to take advantage of Java's excellent support for
thread management. In particular, I'm thinking of the fork-join execution
model, as supported by work stealing, that is now available in JSR166y.
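
As a rough illustration of the split/compute/combine shape of fork-join, here is a sketch using Python's standard `concurrent.futures`. This is only an analogue: it is not JSR166y, a plain thread pool does no work stealing, and the function names and chunk size are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    # Fork step: split the input into independent slices.
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def sum_of_squares(chunk):
    # Stand-in for a bulk native call that needs no upcalls while it runs.
    return sum(x * x for x in chunk)


def fork_join_sum_of_squares(data, chunk_size=250, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunked(data, chunk_size))
        # Join step: combine the partial results.
        return sum(partials)


data = list(range(1000))
result = fork_join_sum_of_squares(data)
```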

But I think it can get better. We probably can do some work to specifically
expose NumPy arrays directly in Jython, through NIO's ByteBuffers. The
advantage here is to write NumPy ufuncs in either Python or Java (the most
important case) that don't incur upcall costs, including GIL access. This
means such ufuncs can run concurrently.
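
The idea of operating on array storage directly, without copying, can be sketched with Python's own buffer protocol. This is an analogy only: a `bytearray` stands in for the NIO ByteBuffer, and `scale_in_place` is a hypothetical ufunc-like helper, not a NumPy or Jython API.

```python
import struct


def make_double_buffer(values):
    # Pack floats into raw native-order bytes, standing in for shared storage.
    return bytearray(struct.pack("%dd" % len(values), *values))


def scale_in_place(raw, factor):
    # View the same bytes as doubles; no copy of the underlying data is made.
    with memoryview(raw).cast("d") as view:
        for i in range(len(view)):
            view[i] = view[i] * factor


raw = make_double_buffer([1.0, 2.0, 3.0])
scale_in_place(raw, 2.0)
result = list(struct.unpack("3d", raw))
```

The writes through the view land in the original buffer, so whoever else holds that storage sees the scaled values without any marshaling step.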

Similar special support might be extended to use Cython's nogil annotation,
so that entry points into C code don't have to acquire the GIL (before
entry) only to have it immediately released in an upcall.

So when can we start?

Wayne, we would definitely appreciate your help bootstrapping on this
project. Now's a good time to start planning, with that coding weekend
sometime mid fall (so basically after the RC process has been in place for
2.5.2). I would assume it can leverage the work you've already done on the
Jython jffi module (for others here - this is used by our partial support
for ctypes, now in trunk, but which won't be complete in time for 2.5.2
release).

- Jim

On Sun, Aug 15, 2010 at 8:17 PM, Wayne Meissner <wme...@gm...> wrote:

> I'll throw in my two bits from what I've garnered from working on the
> jruby cextension code (vs the jruby C FFI and jython's ctypes impls).
>
> As Tal says, compatibility is much more important than performance.
> The performance of jruby's cext layer is approx 50 to 100 times slower
> than the equivalent ruby or even ruby-ffi code.  A lot of the overhead
> is due to the extra book-keeping jruby needs to do to map between
> CRuby's GC system and the JVM's GC system.
>
> A lot of the performance hit comes from places you don't initially
> foresee.  e.g. in ruby (and I think python), you can get direct access
> to the char backing array of a ruby String.  Once that occurs,
> every time you upcall to the JRuby interpreter (e.g. to call a ruby
> method), the contents of the C char array have to be synced with the
> ruby version - both ways - the C contents are copied to the ruby
> version on the way up, and the ruby contents have to be re-synced to
> the C version on return from the upcall.  That's a pretty nasty
> performance hit.
>
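
The two-way sync described above can be modeled in a few lines. `SyncedString` is a hypothetical model, not JRuby's implementation: one `bytearray` plays the interpreter-side string, another plays the C char array, and `copies` counts full-buffer copies.

```python
class SyncedString:
    """Hypothetical model of a mutable string mirrored in native memory."""

    def __init__(self, text):
        self.managed = bytearray(text)  # interpreter-side contents
        self.native = bytearray(text)   # stand-in for the C char array
        self.copies = 1                 # the initial copy out to native memory

    def upcall(self, func):
        # Native -> managed on the way up...
        self.managed[:] = self.native
        self.copies += 1
        result = func(self.managed)
        # ...and managed -> native again on return.
        self.native[:] = self.managed
        self.copies += 1
        return result


def append_bang(buf):
    buf.extend(b"!")
    return len(buf)


s = SyncedString(b"abc")
s.native[0:1] = b"X"  # native code writes through its pointer
length = s.upcall(append_bang)
```

Every upcall costs two full copies in this model, regardless of whether either side actually changed the contents. An immutable string would need only the initial copy.
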
> Upcalls from native code to java code are pretty slow - approx an
> order of magnitude slower than a java->native call.  For extensions
> that interact with the ruby interpreter a lot (e.g. calling rb_*()
> functions, or methods on ruby objects), the C extension ends up being
> a lot slower than straight ruby code.
>
> The third major performance hit is from allocating native handles for
> each java object that gets passed to native code, and wiring up the
> corresponding WeakReference to clean them up when the java object is
> garbage collected.  In Ruby at least, extensions expect to get the
> same C pointer for a ruby object when it is passed into C code, so we
> needed to have a constant mapping from ruby object -> C handle for the
> life of the ruby object.
>
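
A stable object-to-handle mapping with weak-reference cleanup can be sketched in Python with the standard `weakref` module. `HandleTable` and `Managed` are hypothetical names, integers stand in for C pointers, and the final cleanup timing shown assumes a runtime that collects the object once `gc.collect()` has run.

```python
import gc
import itertools
import weakref


class HandleTable:
    """Hands out one stable integer handle per live object."""

    def __init__(self):
        self._next = itertools.count(1)
        self._by_id = {}  # id(obj) -> handle

    def handle_for(self, obj):
        key = id(obj)
        handle = self._by_id.get(key)
        if handle is None:
            handle = next(self._next)
            self._by_id[key] = handle
            # Drop the entry when obj is garbage collected.
            weakref.finalize(obj, self._by_id.pop, key, None)
        return handle

    def __len__(self):
        return len(self._by_id)


class Managed:
    """Stand-in for an interpreter object passed to native code."""


table = HandleTable()
obj = Managed()
first = table.handle_for(obj)
second = table.handle_for(obj)  # same object, same handle
live_before = len(table)
del obj
gc.collect()
live_after = len(table)
```
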
> One area where jython may get a win, is with object life cycle
> management - jruby's GC emulation is basically an
> accumulate-mark-sweep - i.e. when objects are passed to the C layer,
> they get "locked in", until there is a GC mark+sweep phase to release
> any handles that are no longer needed.  That can cause a bit of a
> performance hit.  Jython, on the other hand, knows that a native
> handle is no longer needed when its refcount reaches zero, so handles
> can be freed immediately.
>
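
The difference between the two lifecycle strategies can be shown with two toy handle stores. Both classes are hypothetical illustrations of the behaviour described above, not code from JRuby or Jython.

```python
class SweptHandles:
    """Handles stay allocated until an explicit sweep."""

    def __init__(self):
        self.live = set()
        self.unreachable = set()

    def acquire(self, handle):
        self.live.add(handle)

    def drop(self, handle):
        # Only recorded; the handle stays "locked in" until sweep().
        self.unreachable.add(handle)

    def sweep(self):
        self.live -= self.unreachable
        self.unreachable.clear()


class RefCountedHandles:
    """Handles are freed as soon as their count reaches zero."""

    def __init__(self):
        self.counts = {}

    def incref(self, handle):
        self.counts[handle] = self.counts.get(handle, 0) + 1

    def decref(self, handle):
        self.counts[handle] -= 1
        if self.counts[handle] == 0:
            del self.counts[handle]


swept = SweptHandles()
swept.acquire("h1")
swept.drop("h1")
still_locked_in = "h1" in swept.live  # not yet released
swept.sweep()

counted = RefCountedHandles()
counted.incref("h1")
counted.decref("h1")
freed_immediately = "h1" not in counted.counts
```
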
> Contrast all that with the way JRuby's FFI and Jython's ctypes work -
> all the interaction with the ruby/python side is done in java, and
> ruby/python objects are converted to native representations on the
> java side, where the JIT has some chance of optimising the process.
> The C side interacts minimally with the JVM, and never upcalls, except
> for callbacks, which means it is pretty darn fast.
>
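
For comparison, this is what the ctypes style looks like from the Python side. The sketch assumes a POSIX-like system whose C library can be loaded this way and exposes `strlen`; on other platforms the library lookup would differ.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX-like system where the C library exposes strlen.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# ctypes converts the bytes object to a char* before the call is made;
# strlen itself never calls back into the interpreter.
length = libc.strlen(b"jython")
```

All argument conversion happens before the native call, so the C function runs without any upcalls.
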
> If anyone _seriously_ wants to work on making CPython extensions work
> on Jython, I'm happy to help - I just don't want to do it all myself.
> A basic proof-of-concept loader can easily get done in a weekend, and
> then it's all iterations to fill out the API.
>
> On 13 August 2010 01:41, Jim Baker <jb...@zy...> wrote:
> > [crossposting to jython-dev]
> > Because of some conversations I had with Maciej (mostly at Folsom Coffee in
> > Boulder :) ), we are considering adding support for the CPython C-Extension
> > API for Jython, modeling what has already been done in PyPy and IronPython.
> > Although I think it may make a lot of sense to port NumPy to Java, and have
> > argued for it in the past, being pragmatic suggests it's better to work with
> > the tide of NumPy/Cython than against it. Also, this can bring in a large
> > swath of existing libraries to work with Jython, including those coded
> > against SWIG, at the cost that it will not run under most security manager
> > policies. I think that's a reasonable tradeoff.
> >
> > Similar concerns that Maciej raises apply to Jython. No Java JIT will inline
> > such native code, marshaling from the Java domain to the native one will be
> > expensive, etc. But this is (mostly) true of Jython today, from Python code
> > to Java (although invokedynamic will at least reduce some of those costs).
> > But users can still take advantage of Java to achieve much better
> > performance from Jython, if they are careful about structuring the execution
> > of their code. At the end of the day, Jython to C code, including that
> > produced by Cython should see a similar performance profile to CPython to C
> > code, as long as they don't hammer the INCREF/DECREF *functions*. (JRuby is
> > implementing something similar, and we probably can borrow their
> > "refcounting" support.) But of course that's exactly what one needs to avoid
> > to write performant extension code anyway in CPython, at least if it's to be
> > multithreaded.
> >
> > One interesting part of this discussion is whether we can support lock
> > eliding. This is one part of JIT inlining that you don't want to give up for
> > multithreaded performance. Rather than having C code call back into Java to
> > release the GIL (which is only global for such C code!), it would be better
> > to have a marker on the C code that allows for immediate release, or perhaps
> > some other inlinable Java stub. I could imagine this could be readily
> > supported by Cython (and perhaps already is).
> >
> > Lastly, I want to emphasize again that if/when Jython adds support for the C
> > extension API, the "GIL" and "refcounting" support will only be for such C
> > code! We like our concurrency support and we are not giving it up :)
> > - Jim
> >
> > On Thu, Aug 12, 2010 at 3:25 AM, Stefan Behnel <ste...@be...> wrote:
> >>
> >> Maciej Fijalkowski, 12.08.2010 10:05:
> >> > On Thu, Aug 12, 2010 at 8:49 AM, Stefan Behnel wrote:
> >> >> there has recently been a move towards a .NET/IronPython port of
> >> >> Cython, mostly driven by the need for a fast NumPy port. During the
> >> >> related discussion, the question came up how much it would take to
> >> >> let Cython also target other runtimes, including PyPy.
> >> >>
> >> >> Given that PyPy already has a CPython C-API compatibility layer, I
> >> >> doubt that it would be hard to enable that. With my limited
> >> >> knowledge about the internals of that layer, I guess the question
> >> >> thus becomes: is there anything Cython could do to the C code it
> >> >> generates that would make the Cython generated extension modules run
> >> >> faster/better/safer on PyPy than they would currently? I never tried
> >> >> to make a Cython module actually run on PyPy (simply because I don't
> >> >> use PyPy), but I have my doubts that they'd run perfectly out of the
> >> >> box. While generally portable, I'm pretty sure the C code relies on
> >> >> some specific internals of CPython that PyPy can't easily (or
> >> >> efficiently) provide.
> >> >
> >> > CPython extension compatibility layer is in alpha at best. I heavily
> >> > doubt that anything would run out of the box. However, this is a
> >> > CPython compatibility layer anyway; it's not meant to be used as a
> >> > long-term solution. First of all it's inefficient (and unclear if it
> >> > will ever be efficient)
> >>
> >> If you only use it to call into non-trivial Cython code (e.g. some heavy
> >> calculations on NumPy tables), the call overhead should be mostly
> >> negligible, maybe even close to that in CPython. You could even provide
> >> some kind of fast-path to 'cpdef' functions (i.e. functions that are
> >> callable from both C and Python) and 'api' functions (which are currently
> >> exported at the module API level using the PyCapsule mechanism). That
> >> would reduce the call overhead to that of a C call.
> >>
> >> Then, a lot of Cython code doesn't do much ref-counting and the like but
> >> simply runs in plain C. So, often enough, there won't be that much
> >> overhead involved in the code itself either, especially in tight loops
> >> where users prune away all CPython interaction anyway.
> >>
> >>
> >> > but it's also unjitable. This means that to JIT, cpython
> >> > extension is like a black box which should not be touched.
> >>
> >> Well, unless both sides learn about each other, that is. It won't
> >> necessarily impact the JIT, but then again, a JIT usually won't have a
> >> noticeable impact on the performance of Cython code anyway.
> >>
> >>
> >> > Also, several concepts, like refcounting are completely alien to pypy
> >> > and emulated.
> >>
> >> Sure. That's why I asked if there is anything that Cython can help to
> >> improve here. For example, the code it generates for INCREF/DECREF
> >> operations is not only configurable at the C preprocessor level.
> >>
> >>
> >> > For example for numpy, I think a rewrite is necessary to make it fast
> >> > (and as experiments have shown, it's possible to make it really fast),
> >> > so I would not worry about using cython for speeding things up.
> >>
> >> This isn't only about making things fast when being rewritten. This is
> >> also about accessing and reusing existing code in a new environment.
> >> Cython is becoming increasingly popular in the numerics community, and a
> >> lot of Cython code is being written as we speak, not only in the
> >> SciPy/NumPy environment. People even find it attractive enough to start
> >> rewriting their CPython extension modules (most often library wrappers)
> >> from C in Cython, both for performance and TCO reasons.
> >>
> >>
> >> > There is another use case: using Cython to provide access to C
> >> > libraries. This is a somewhat harder question and I don't have a good
> >> > answer for it, but maybe the CPython compatibility layer would be good
> >> > enough in this case? I can't see how Cython can produce "native" C
> >> > code instead of CPython C code without some major effort.
> >>
> >> Native (standalone) C code isn't the goal, just something that adapts
> >> well to what PyPy can provide as a CPython compatibility layer. If Cython
> >> modules work across independent Python implementations, that would be the
> >> most simple way by far to make lots of them available cross-platform,
> >> thus making it a lot simpler to switch between different implementations.
> >>
> >> Stefan
> >>
> >> _______________________________________________
> >> pyp...@co...
> >> http://codespeak.net/mailman/listinfo/pypy-dev
> >
> >
> >
> > _______________________________________________
> > Jython-dev mailing list
> > Jyt...@li...
> > https://lists.sourceforge.net/lists/listinfo/jython-dev
> >
> >
>