On 12/03/2012 04:07 AM, Ian Thomas wrote:
> I vote for using the raw Python/C API.  I've written a couple of PyCXX extensions and whilst it is mostly convenient, PyCXX doesn't support the use of numpy arrays, so for them you have to use the Python/C API.  This means dealing with the reference counting yourself for numpy arrays; extending this to do the reference counting for all Python objects is not onerous.  Dealing with object lifetimes is bread-and-butter work for C/C++ developers.

That matches my experience quite well.

> I have never used Cython, but to me the code looks like an inelegant combination of Python, C/C++ and some Cython-specific stuff.  I can see the advantage of this approach for small sections of code, but I have strong reservations about using it for complicated modules that have extensive use of templated code and/or Standard Template Library collections (mpl has examples of both of these).

Even for C libraries like libpng, which requires use of C function callbacks for some things, Cython is more convoluted, particularly when things go wrong and require debugging.  (Running gdb over generated Cython code is not fun!)  And in my view, writing code like that requires a pretty deep understanding of the Python/C API, C itself, and the rather complex transformations that Cython performs.  Writing directly to the Python/C API only requires knowledge of the first two.  And there's a large body of books/tutorials/debuggers/tools for C that don't really have equivalents for Cython.


> I agree that Cython opens us up to a larger body of contributors, but I don't think that this is necessarily a good thing.  I think this really means it opens us up to a larger body of Python/Cython contributors; that is a view expressed from the Python side of the fence, and it has the wrong emphasis.  I am primarily a C++ developer in a sea of Python developers, and rather than encourage other Python contributors to dip their toes into C/C++ via Cython, I think we should be encouraging C/C++ contributors to do what they do best.  We only need a few C/C++ developers if we allow them to use their skills in their preferred way, and they are used to interfacing to legacy APIs and dealing with object lifetimes.

I think Cython is well suited to writing new algorithmic code to speed up hot spots in Python code.  I don't think it's as well suited as glue between C and Python -- that was not a main goal of the original Pyrex project, IIRC.  It feels kind of tacked on and not a very good fit to the problem.  Most of the work to remove PyCXX use in matplotlib is either wrapping third-party libraries (where Cython doesn't really shine), or wrapping C/C++ code in our own tree that's already well-tested and vetted, and I wouldn't propose rewriting that in Cython.  I'm only really considering rewriting the Python-to-C interface layer.


> OK, cards on the table.  If we wanted to switch all of our PyCXX modules to use the raw Python/C API, I would happily take on some of the burden for making the changes and ongoing maintenance of such modules.  Particularly if, in return, I get some help with my sometimes substandard Python!  If we go down the Cython route I couldn't make this offer; would our many Cython advocates take on the responsibility of changing and maintaining my C++ code in this scenario?

That's a good way to look at this.  I was definitely hoping that moving to Cython might open us up to more developers, but at the end of the day, the chosen tool should be the one preferred by those doing the work.  Maybe rather than asking "if we switched to using Cython, would more participate", I should be asking "among those that can participate in removing the PyCXX dependency, what is the preferred approach?"

Cheers,
Mike