From: Michael D. <md...@st...> - 2012-11-29 17:00:09
|
Given the slow pace of development on PyCXX, I know it has been the desire of some here to remove our dependency on it. I thought a helpful starting point to evaluate the alternatives would be to restructure one of our extensions to not use PyCXX anymore. I've taken the PNG extension, which is reasonably straightforward in that it doesn't define any custom types, but does have some low level C-wrapping challenges, and separated out the Python-specific parts from the libpng-specific parts. The Python-specific parts are now written using the "raw" Python C/API. The other part still uses C++ (not C) and does throw exceptions, but doesn't use classes or templates or anything else that can be difficult to wrap. All of this is on my "no_cxx" branch. Now here's the challenge: can we do better than this using any of the available wrapping tools? Cython, SWIG, Boost.Python etc.? I've not had much luck with Cython for this kind of thing in the past, but I know it is popular. Perhaps someone with more Cython experience would want to take a crack at this and then we could have something concrete to compare... Cheers, Mike |
From: Michiel de H. <mjl...@ya...> - 2012-11-30 01:47:49
|
Hi,

The Mac OS X backend is entirely written in C (with some Objective-C elements where necessary). AFAICT, this is the largest body of C/C++ code in matplotlib. This backend was written from scratch without using Cython, SWIG, or Boost.Python.

From my experience, I would prefer to write such extensions in C directly rather than relying on Cython, SWIG, or Boost.Python, because those approaches would lead to another dependency (for developers at least) and require developers to learn how to code in them -- which may not be very hard, but we may as well avoid it if possible.

I'd be happy to help out with the conversion of the other extensions from CXX to C. I would need some help, though, to use github appropriately.

Best,
-Michiel. |
From: Michael D. <md...@st...> - 2012-11-30 14:07:09
|
Thanks, Michiel.

If you read between the lines of what I was saying, that is basically where I fall as well. There seems to be a lot of desire to use Cython to make the code more accessible, however, and I'm willing to consider it if it can be shown to be superior to the raw C/API for this task. I'm not sure it is -- I always seem to end up with things that are more lines of code, with more obscure workarounds, than just coding in C directly.

Cheers,
Mike |
From: Chris B. - N. F. <chr...@no...> - 2012-11-30 17:33:16
|
On Fri, Nov 30, 2012 at 6:06 AM, Michael Droettboom <md...@st...> wrote:

> If you read between the lines of what I was saying, that is basically
> where I fall as well. There seems to be a lot of desire to use Cython
> to make the code more accessible,

I'll add a beat to that drum -- I'm a big Cython fan.

> however, and I'm willing to consider
> it if it can be shown to be superior to the raw C/API for this task --

I think there is NO QUESTION that Cython is superior to the C/API -- why would you want to deal with the reference counting, etc. yourself? Cython can handle the boilerplate code for you very cleanly and elegantly.

Something to keep in mind about Cython: it can be used in multiple ways:

1) Add static typing to what is essentially Python code to get better performance -- this may be what you mean by the "more accessible" part. A great use, but maybe, maybe, maybe not best for the core bits of MPL.

2) Calling C/C++ code -- Cython is a great way to call C/C++ code -- it can handle the packing and unpacking of Python types, reference counting, etc. for you, much like using the C API, but with a lot less tricky boilerplate code to write.

(2) is the use case that I'm arguing is NO QUESTION a better option than the C API.

Compared to SWIG, SIP (and I assume CXX), the downside is that there is no auto-generation of wrappers (at least nothing mature). However, for the MPL case, we're not trying to wrap a large existing library, but rather particular code that is often written for MPL specifically, so hand-writing the Cython is a fine option.

So why not ctypes, or...? I think the real strength of Cython in wrapping C code is that you can write a "thick" wrapper in an almost-Python language. So if you want to vectorize a C function, for instance, you can write that bit in Cython very easily (and Cython's built-in understanding of numpy arrays is very helpful here). When you use ctypes, you need to write that in pure Python -- easy enough, but probably not very performant.

With SWIG, etc., you end up writing a fair bit of C (or SWIG) code to handle the thicker bits of the wrapper -- so you're dealing with the raw CPython API, and, well, C. Cython really is an easier option.

I've found that for anything more than very small stuff (i.e. one or two loops through an array), writing the core code in native C or C++ can be easier -- you know for sure you're not accidentally making expensive Python calls, etc. -- but using Cython to call it is still very helpful.

> I'm not sure it is -- I always seem to end up with things that are more
> lines of code with more obscure workarounds than just coding in C directly.

Exactly -- but I don't think that applies to the CPython-API bits so much as to the core code -- so keep that in C.

In summary, I guess what I think is the power of Cython is the flexibility in where you draw the line between Python, Cython, and C -- you can pass pure Python through Cython, or you can do almost nothing with it but call a C function, and everything in between.

> From my experience, I would prefer to write such extensions in C directly rather
> than relying on Cython, SWIG, or Boost.Python, because those approaches would
> lead to another dependency (for developers at least),

The dependency is pretty easy to deal with compared to the many others in MPL.

> and requires developers to
> learn how to code in them. Which may not be very hard, but we may as well avoid
> that if possible.

Here's where I disagree -- if we go pure C and C-API, developers need to know the Python C-API -- that is actually a pretty big deal, and hard to get right. Knowing enough Cython to call some C code is a smaller lift for sure.

Anyway, I say give it a shot -- I suspect you'll like it.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
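For illustration, here is a minimal sketch of the "thick wrapper" idea Chris describes: vectorizing a scalar C routine over a numpy array from Cython. The header name "gamma.h" and the function c_gamma_correct() are made-up placeholders for this sketch, not anything that exists in matplotlib.

    # Vectorize a hypothetical scalar C function over a 1-d numpy array.
    import numpy as np
    cimport numpy as cnp

    cnp.import_array()

    cdef extern from "gamma.h":
        double c_gamma_correct(double value, double gamma)

    def gamma_correct(cnp.ndarray[double, ndim=1] values not None, double gamma):
        """Apply the scalar C routine element-wise, returning a new array."""
        cdef Py_ssize_t i, n = values.shape[0]
        cdef cnp.ndarray[double, ndim=1] out = np.empty(n, dtype=np.float64)
        for i in range(n):
            out[i] = c_gamma_correct(values[i], gamma)
        return out

The ctypes equivalent would have to run the same loop in pure Python; here the loop compiles down to C, which is the point being made above.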
From: Michiel de H. <mjl...@ya...> - 2012-11-30 23:41:04
|
One package (Pysam) that I use a lot relies on Cython, and requires users to install Cython before they can install Pysam itself. With Cython, is that always the case? Will all users need to install Cython? Or is it sufficient if only matplotlib developers install Cython?

Best,
-Michiel. |
From: Nathaniel S. <nj...@po...> - 2012-11-30 23:44:53
|
On Fri, Nov 30, 2012 at 11:40 PM, Michiel de Hoon <mjl...@ya...> wrote: > One package (Pysam) that I use a lot relies on Cython, and requires users to install Cython before they can install Pysam itself. With Cython, is that always the case? Will all users need to install Cython? Or is it sufficient if only matplotlib developers install Cython? You can set things up so that end-users don't have to install cython. You just convert the .pyx files to regular .c files before distributing your package. Numpy itself uses cython, but end-users don't notice or care. (It's something more of a hassle for developers to do things this way, and cython is very easy to install, so I don't know if it's worth it. But it's certainly possible.) -n |
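A minimal sketch of the arrangement Nathaniel describes, assuming a hypothetical _png.pyx module with a pre-generated _png.c shipped alongside it in the tarball (the file and module names here are placeholders, not matplotlib's actual build code):

    # setup.py sketch: build from the .pyx when Cython is available,
    # otherwise fall back to the pre-generated C file.
    import os
    from distutils.core import setup
    from distutils.extension import Extension

    try:
        from Cython.Build import cythonize
        have_cython = True
    except ImportError:
        have_cython = False

    if have_cython and os.path.exists("_png.pyx"):
        extensions = cythonize([Extension("_png", ["_png.pyx"], libraries=["png"])])
    else:
        extensions = [Extension("_png", ["_png.c"], libraries=["png"])]

    setup(name="example", version="0.1", ext_modules=extensions)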
From: Benjamin R. <ben...@ou...> - 2012-12-01 01:33:16
|
On Fri, Nov 30, 2012 at 6:44 PM, Nathaniel Smith <nj...@po...> wrote:

> You can set things up so that end-users don't have to install cython.
> You just convert the .pyx files to regular .c files before
> distributing your package. Numpy itself uses cython, but end-users
> don't notice or care. (It's something more of a hassle for developers
> to do things this way, and cython is very easy to install, so I don't
> know if it's worth it. But it's certainly possible.)

Since when has numpy used Cython? I specifically remember a rather involved discussion thread on numpy-discussion about the pros-and-cons of including cython. Now, SciPy on the other hand, does utilize Cython in some spots IIRC, but does it in a way that it isn't even required for the developers to have cython installed to build from source.

I would not be against such an approach. Much of the C/C++ stuff is rarely touched. If we have some Cython source that is used to generate C/C++ source code, packaged in the same way as the current code is, I would have no problem with that. Given that matplotlib is such a fundamental tool in the ecosystem, I want to make sure that the decisions we make are ones that improve our packaging situation.

Cheers!
Ben Root |
From: Nelle V. <nel...@gm...> - 2012-12-01 11:03:01
|
> Since when has numpy used Cython? I specifically remember a rather involved
> discussion thread on numpy-discussion about the pros-and-cons of including
> cython. Now, SciPy on the other hand, does utilize Cython in some spots
> IIRC, but does it in a way that it isn't even required for the developers to
> have cython installed to build from source.

You just ship the C/C++ code for the developers as well as for the end users. This is what we do with scikit-learn. It requires the developers to make sure to compile the Cython code and commit both files. It is also quite annoying for reviews to have the generated C++ code, so the Cython code needs to be compiled after the reviews.

The reason the scikit's developers chose to use Cython instead of something else is to decrease the maintenance burden: more contributors understand Cython code than C/C++ code (or more precisely, understand C++ code written by someone else). Hence, this increases the bus factor. |
From: Julian T. <jta...@go...> - 2012-12-01 12:13:57
|
On 12/01/2012 02:32 AM, Benjamin Root wrote:

> Since when has numpy used Cython? I specifically remember a rather
> involved discussion thread on numpy-discussion about the pros-and-cons
> of including cython. Now, SciPy on the other hand, does utilize Cython
> in some spots IIRC, but does it in a way that it isn't even required for
> the developers to have cython installed to build from source.

If you should choose cython, please don't follow scipy too closely. Up until rather recent git head they did not ship the cython sources in their source tarballs, which occasionally led to inconsistent generated files (e.g. in 0.10.1 interpnd.pyx) and caused trouble for distributors (see e.g. debian bug 589731).

A better example to follow would be e.g. pyzmq, which ships both the cython and generated sources and has an easy-to-use cython setup.py target to recythonize. |
From: Michiel de H. <mjl...@ya...> - 2012-12-01 14:45:03
|
In my experience, Benjamin is right that the C code is rarely touched. This is even more true for the Python/C glue code, at least from my experience with the Mac OS X backend. Since the Python/C glue code is modified only very rarely, there may not be a need for regenerating the Python/C glue code by developers or users from a Cython source code.

In addition, it is much easier to maintain the Python/C glue code than to write it from scratch. Once you have the Python/C glue code, it's relatively straightforward to modify it by looking at the existing Python/C glue code.

This argues against making the Cython source code a part of the matplotlib codebase.

At the same time, to minimize errors, we could use Cython to create the initial Python/C glue code, and then add the generated code to the matplotlib codebase. Then neither users nor developers have to install Cython, we don't have to worry about inconsistencies (if any) between different Cython versions, we don't have to worry about keeping the Cython source code and the generated code in sync, and we will still get a high-quality Cython-generated Python/C glue code.

By the way, how many modules in matplotlib make use of CXX, and would have to be converted?

Best,
-Michiel. |
From: Ryan M. <rm...@gm...> - 2012-12-01 14:56:55
|
I'm +1 on Cython. I think its prevalence in the community gives us a larger potential contributor pool than CXX or hand-coded python C-API. I know using Cython would open up that part of the code base for me.

Ryan |
From: Michael D. <md...@st...> - 2012-12-01 19:03:13
|
Including the Cython-generated C in the tarballs and optionally the git repository as well can certainly be considered to reduce the need for Cython for developers and users alike. However, the Cython source should also be included in the repository for the inevitable times when it does need to be updated -- it shouldn't be off somewhere else.

The png, path, ft2font, backend_agg, gtkagg, tkagg, tri, and image modules all use CXX. The backend_agg, image and ft2font ones are particularly complex, but some of that complexity could be reduced by using Numpy arrays in place of the image buffer types that each of them contain (that code predates matplotlib's numpy requirement, so it's not terribly surprising that a more complex approach was taken).

Mike |
From: Thomas K. <th...@kl...> - 2012-12-01 17:12:23
|
Drat, re-sending on the list. On 1 December 2012 16:40, Thomas Kluyver <th...@kl...> wrote: > On 1 December 2012 14:44, Michiel de Hoon <mjl...@ya...> wrote: > >> At the same time, to minimize errors, we could use Cython to create the >> initial Python/C glue code, and then add the generated code to the >> matplotlib codebase. Then neither users nor developers have to install >> Cython, we don't have to worry about inconsistencies (if any) between >> different Cython versions, we don't have to worry about keeping the Cython >> source code and the generated code in sync, and we will still get a >> high-quality Cython-generated Python/C glue code. > > > Having looked at some bits of Cython-generated C code, I wouldn't > recommend that. I'm sure it's high quality in terms of compiling and > running correctly, but it's definitely not designed to be read or > maintained directly. Here's a sample from SciPy to illustrate: > > > https://github.com/scipy/scipy/blob/master/scipy/stats/vonmises_cython.c#L2269 > > For another reason, there have been cases where the Cython-generated C > code was broken in some way, and it was fixed by regenerating with a newer > version of Cython. I experienced this with pyzmq when testing with Python > 3.3 for example - it completely failed to import until I installed a newer > version of Cython and redid the conversion. If you don't keep the original > Cython code, you don't have this option. > > Best wishes, > Thomas > |
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 18:13:09
|
On Sat, Dec 1, 2012 at 6:44 AM, Michiel de Hoon wrote:

> Since the Python/C glue code is modified only very rarely, there may not be a need
> for regenerating the Python/C glue code by developers or users from a Cython source code.

True.

> In addition, it is much easier to maintain the Python/C glue code than to write it
> from scratch. Once you have the Python/C glue code, it's relatively straightforward
> to modify it by looking at the existing Python/C glue code.

Not so true -- getting reference counting right, etc. is difficult -- I suppose once the glue code is robust, and all you are changing is a bit of API to the C, maybe....

> This argues against making the Cython source code a part of the matplotlib codebase.

Huh? Are you suggesting that we use Cython to generate the glue, then hand-maintain that glue? I think that is a really, really bad idea -- generated code is ugly and hard to maintain, it is not designed to be human-readable, and we wouldn't get the advantages of bug fixes and further development in Cython.

So -- if you use Cython, you want to keep using it, and that means the Cython source IS the source. I agree that it's a good idea to ship the generated code as well, so that no one who is not touching the Cython has to regenerate it. Other than the slight mess from generated files showing up in diffs, etc., this really works just fine.

Any reason MPL couldn't continue with EXACTLY the same approach now used with CXX -- it generates code as well, yes?

Michael Droettboom wrote:

> For the PNG extension specifically, it was creating callbacks that can
> be called from C and the setjmp magic that libpng requires. I think
> it's possible to do it, but I was surprised at how non-obvious those
> pieces of Cython were. I was really hoping by creating this experiment
> that a Cython expert would step up and show the way ;)

Did you not get the support you expected from the cython list? Anyway, there's no reason you can't keep stuff in C that's easier in C (or did CXX make this easy?). I think making basic callbacks is actually pretty straightforward, but I don't know about the setjmp magic (I have no idea what that means!).

> The Agg backend has more C++-specific challenges, particularly
> instantiating very complex template expressions --

I'm guessing you'd do the complex template stuff in C++ -- and let Cython see a more traditional static API.

> but some of that complexity could be reduced by using Numpy arrays in place of the
> image buffer types that each of them contain

OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.

> The Cython version isn't that much shorter than the C++ version.

I think some things make sense to keep in C++, though I do see a fair bit of calls (in the C++) to the Python API -- I'm surprised there isn't much code advantage, but anyway, the goal is more robust/easier to maintain, which may correlate with code size, but not completely.

> These declarations aren't exact matches to what one would find in the header file(s)
> because Cython doesn't support exact-width data types etc.

It does support the C99 fixed-width integer types:

    from libc.stdint cimport int16_t, int32_t

Or are you talking about something else?

> I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers.

I suspect that's an oversight -- for the most part, stuff has been added as it's needed.

One other note -- from a quick glance at your Cython code, it looks like you did almost everything in Cython-that-will-compile-to-pure-C -- i.e. a lot of calls to the CPython API. But the whole point of Cython is that it makes those calls for you. So you can do type checking, and switching on types, and calling np.asarray(), etc., in Python, without calling the CPython API yourself. I know nothing of the PNG API, and am pretty weak on the CPython API (and C for that matter), but it's likely that the Cython code you've written could be much simplified.

> Once things compiled, due to my own mistake, calling the function segfaulted. Debugging
> that segfault in gdb required, again, wading through the generated code. Using gdb on
> hand-written code is *much* nicer.

For sure -- there is a plug-in/add-on/something for using gdb on Cython code -- I haven't used it, but I imagine it would help.

Ian Thomas wrote:

> I have never used Cython, but to me the code looks like an inelegant combination of
> Python, C/C++ and some Cython-specific stuff.

Well, yes, it is that!

> I can see the advantage of this approach for small sections of code, but I have strong
> reservations about using it for complicated modules that have extensive use of
> templated code and/or Standard Template Library collections (mpl has examples of
> both of these).

So far, I've found that Cython is good for:

- The simple stuff -- basic loops through numpy arrays, etc.
- Wrapping/calling more complex C or C++ -- essentially handling the reference counting and the packing/unpacking of Python types.

So we find we do write some shim code in C++ to make the access to the core libraries Cython-friendly. We haven't dealt with complex templating, etc., but I'd guess if we did I'd keep that in C++. And since the resulting actual glue code is pretty simple, it makes the debugging easier.

> Maybe rather than asking "if we switched to using Cython, would more participate", I
> should be asking "among those that can participate in removing the PyCXX
> dependency, what is the preferred approach?"

I don't know that we need a one-size-fits-all approach -- perhaps some bits make the most sense to move to plain old C/C++, and some to Cython, either because of the nature of the code itself, or because of the experience/preference of the person who takes ownership of a particular problem.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Michael D. <md...@st...> - 2012-12-03 19:59:16
|
On 12/03/2012 01:12 PM, Chris Barker - NOAA Federal wrote: > This argues against making the Cython source code a part of the matplotlib codebase. > > huh? are you suggesting that we use Cython to generate the glue, then > hand-maintain that glue? I think that is a really, rally bad idea -- > generated code is ugly and hard to maintain, it is not designed to be > human-readable, and we wouldn't get the advantages of bug-fixes > further development in Cython. > > So -- if you use Cython, you want to keep using, and theat means the > Cython source IS the source. I agree that it's a good idea to ship the > generated code as well, so that no one that is not touching the Cython > has to generate. Other than the slight mess from generated files > showing up in diffs, etc, this really works just fine. I agree with this approach. > > Any reason MPL couldn't continue with EXACTLY the same approach now > used with C_XX -- it generates code as well, yes? No -- PyCXX is just C++. Its killer feature is that it provides a fairly thin layer around the Python C/API that does implicit reference counting through the use of C++ constructors and destructors. I actually think it's a really elegant approach to the problem. The downside we're running into is that it's barely maintained, so using vanilla upstream as provided by packagers is not viable. An alternative to all of this discussion is to fork PyCXX and release as needed. The maintenance required is primarily when new versions of Python are released, so it wouldn't necessarily be a huge undertaking. However, I know some are reluctant to use a relatively unused tool. > > Michael Droettboom wrote: > >> For the PNG extension specifically, it was creating callbacks that can >> be called from C and the setjmp magic that libpng requires. I think >> it's possible to do it, but I was surprised at how non-obvious those >> pieces of Cython were. I was really hoping by creating this experiment >> that a Cython expert would step up and show the way ;) > Did you not get the support you expected from the cython list? Anyway, > there's no reason you can't keep stuff in C that's easier in C (or did > C_XX make this easy?). The support has been adequate, but the solutions aren't always an improvement over raw Python/C API (not just in terms of lines of code but in terms of the number of layers of abstraction and "magic" between the coder and what actually happens). > I think making basic callbacks is actually > pretty straightforward, but In don't know about the setjmp magic (I > have no idea hat that means!). It turned out to be not terrible once I figured out the correct incantation. > >> The Agg backend has more C++-specific challenges, particularly >> instantiating very complex template expressions -- > I'm guessing you'd do the complex template stuff in C++ -- and let > Cython see a more traditional static API. Agreed -- I'm really only considering replacing the glue code provided by PyCXX, not the whole thing. matplotlib's C/C++ code has been around for a while and has been fairly vetted at this point, so I don't think a wholesale rewrite makes sense. > >> but some of that complexity could be reduced by using Numpy arrays in place of the >> image buffer types that each of them contain > OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython. Sure, but when we return to Python, they should be Numpy arrays which have more methods etc. -- or am I missing something? >> The Cython version isn't that much shorter than the C++ version. 
> I think some things make sense to keep in C++, though I do see a fair > bit of calls (in the C++) to the python API -- I'm surprised there > isn't much code advantage, but anyway, the goal is more robust/easier > to maintain, which may correlate with code-size, but not completely. > >> These declarations aren't exact matches to what one would find in the header file(s) >because Cython doesn't support exact-width data types etc. > It does support the C99 fixed-width integer types: > > from libc.stdint cimport int16_t, int32_t, > > Or are you talking about something else? The problem is that Cython can't actually read the C header, so there are types in libpng, for example, that we don't actually know the size of. They are different on different platforms. In C, you just include the header. In Cython, I'd have to determine the size of the types in a pre-compilation step, or manually determine their sizes and hard code them for the platforms we care about. > >> I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers. > I suspect that's an oversight -- for the most part, stuff has been > added as it's needed. > > One other note -- from a quick glance at your Cython code, it looks > like you did almost everything is Cython-that-will-compile-to-pure-C > -- i.e. a lot of calls to the CPython API. But the whole point of > Cython is that it makes those calls for you. So you can do type > checking, and switching on types, and calling np.asarray(), etc, etc, > etc, in Python, without calling the CPython api yourself. I know > nothing of the PNG API, and am pretty week on the CPython API (and C > for that matter), but I it's likely that the Cython code you've > written could be much simplified. It would at least make this a more fair comparison to have the Cython code as Cythonic as possible. However, I couldn't find any ways around using these particular APIs -- other than the Numpy stuff which probably does have a more elegant solution in the form of Cython arrays and memory views. > > >> Once things compiled, due to my own mistake, calling the function segfaulted. Debugging >> that segfault in gdb required, again, wading through the generated code. Using gdb on >> hand-written code is *much* nicer. > for sure -- there is a plug-in/add-on/something for using gdb on > Cython code -- I haven't used it but I imagine it would help. Ah. I wasn't aware of that. Thanks for pointing that out. I have the CPython plug-in for gdb and it's great. > > Ian Thomas wrote: >> I have never used Cython, but to me the code looks like an inelegant combination of >> Python,C/C++ and some Cython-specific stuff. > well, yes, it is that! > >> I can see the advantage of this approach for small sections of code, but I have strong > reservations about using it for complicated modules that have extensive use of >> templated code and/or Standard Template Library collections (mpl has examples of >> both of these). > So far, I've found that Cython is good for: > - The simple stuff -- basic loops through numpy arrays, etc. > - wrapping/calling more complex C or C++ > -- essentially handling the reference counting and python type > packing/unpacking of python types. > > So we find we do write some shim code in C++ to make the access to the > core libraries Cython-friendly. We haven't dealt with complex > templating, etc, but I'd guess if we did I'd keep that in C++. And > since the resulting actual glue code is pretty simple, it makes the > debugging easier. 
> >> Maybe rather than asking "if we switched to using Cython, would more participate", I >> should be asking "among those that can participate in removing the PyCXX >> dependency, what is the preferred approach?" > I don't know that we need a one-sieze fits all approach -- perhaps > some bits make the most sense to move to plain old C/C++, and some to > Cython, either because of the nature of the code itself, or because of > the experience/preference of the person that takes ownership of a > particular problem. > True. We do have two categories of stuff using PyCXX in matplotlib: things that (primarily) wrap third-party C/C++ libraries, and things that are actually doing algorithmic heavy lifting. It's quite possible we don't want the same solution for all. Cheers, Mike |
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 20:25:51
|
On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <md...@st...> wrote:

>>> but some of that complexity could be reduced by using Numpy arrays in place of the
>>> image buffer types that each of them contain
>> OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.
>
> Sure, but when we return to Python, they should be Numpy arrays which
> have more methods etc. -- or am I missing something?

Cython makes it really easy to switch between ndarrays and memoryviews, etc. -- it's a question of what you want to work with in your code, so you can write a function that takes numpy arrays and returns numpy arrays, but uses a memoryview internally (and passes to C code that way). But I'm not an expert on this; I've found that I'm either doing simple stuff where using numpy arrays directly works fine, or passing the pointer to the data array off to C:

    def a_function_to_call_C(cnp.ndarray[double, ndim=2, mode="c"] in_array):
        """ calls the_c_function, altering the array in-place """
        cdef int m, n
        m = in_array.shape[0]
        n = in_array.shape[1]
        the_c_function(&in_array[0, 0], m, n)

>> It does support the C99 fixed-width integer types:
>> from libc.stdint cimport int16_t, int32_t
>
> The problem is that Cython can't actually read the C header,

Yeah, this is a pity. There has been some work on auto-generating Cython from C headers, though nothing mature. For my work, I've been considering writing some simple .pxd-generating code, just to make sure my data types are in line with the C++, as it may change.

> so there
> are types in libpng, for example, that we don't actually know the size
> of. They are different on different platforms. In C, you just include
> the header. In Cython, I'd have to determine the size of the types in a
> pre-compilation step, or manually determine their sizes and hard code
> them for the platforms we care about.

Yeah -- this is a tricky problem; however, I think you can follow what you'd do in C -- i.e. presumably the headers define their own data types: png_short or whatever. The actual definition is filled in by the pre-processor. So I wonder if you can declare those types in Cython, then have it write C code that uses those types, and it all gets cleared up at compile time -- maybe. The key is that when you declare stuff in Cython, that declaration is used to determine how to write the C code; I don't think the declarations themselves are translated.

> It would at least make this a more fair comparison to have the Cython
> code as Cythonic as possible. However, I couldn't find any ways around
> using these particular APIs -- other than the Numpy stuff which probably
> does have a more elegant solution in the form of Cython arrays and
> memory views.

Yup -- that's what I noticed right away -- I'm not sure if there is easier handling of file handles.

> True. We do have two categories of stuff using PyCXX in matplotlib:
> things that (primarily) wrap third-party C/C++ libraries, and things
> that are actually doing algorithmic heavy lifting. It's quite possible
> we don't want the same solution for all.

And I'm not sure the wrappers all need to be written the same way, either.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
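To illustrate the memoryview route mentioned above, here is a minimal sketch of a function that accepts any 2-d array-like object, gets a C-contiguous view of it, and hands the raw pointer to C. The header "my_c_code.h" and the_c_function() are hypothetical placeholders, the same assumed C helper as in the sketch above.

    import numpy as np

    cdef extern from "my_c_code.h":
        void the_c_function(double* data, int m, int n)

    def call_c_on_any_array(obj):
        # Coerce whatever we were given into a C-contiguous float64 array
        # (obj must be 2-d here), then take a typed memoryview of it.
        cdef double[:, ::1] view = np.ascontiguousarray(obj, dtype=np.float64)
        # Hand the raw pointer to C, which modifies the data in place.
        the_c_function(&view[0, 0], <int>view.shape[0], <int>view.shape[1])
        # Return a regular ndarray to the Python caller.
        return np.asarray(view)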
From: Nathaniel S. <nj...@po...> - 2012-12-03 22:53:53
|
On Mon, Dec 3, 2012 at 8:24 PM, Chris Barker - NOAA Federal <chr...@no...> wrote:
> On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <md...@st...> wrote:
>> so there
>> are types in libpng, for example, that we don't actually know the size
>> of. They are different on different platforms. In C, you just include
>> the header. In Cython, I'd have to determine the size of the types in a
>> pre-compilation step, or manually determine their sizes and hard code
>> them for the platforms we care about.
>
> yeah -- this is a tricky problem; however, I think you can follow what
> you'd do in C -- i.e. presumably the headers define their own data
> types: png_short or whatever. The actual definition is filled in by
> the pre-processor. So I wonder if you can declare those types in
> Cython, then have it write C code that uses those types, and it all
> gets cleared up at compile time -- maybe. The key is that when you
> declare stuff in Cython, that declaration is used to determine how to
> write the C code; I don't think the declarations themselves are
> translated.

Yeah, this isn't an issue in Cython, it's a totally standard thing (though perhaps not well documented). When you write

    cdef extern from "png.h":
        ctypedef int png_short

or whatever, what you are saying is "the C compiler knows about a type called png_short, which acts in an int-like fashion, so Cython, please use your int rules when dealing with it". So this means that Cython will know that if you return a png_short from a python function, it should insert a call to PyInt_FromLong (or maybe PyInt_FromSsize_t? -- cython worries about these things so I don't have to). But Cython only takes care of the Python<->C interface. It will leave the C compiler to actually allocate the appropriate memory for png_shorts, perform C arithmetic, coerce a png_short into a 'long' when necessary, etc.

It's kind of mind-bending to wrap your head around, and it definitely does help to spend some time reading the C code that Cython spits out to understand how the mapping works (it's both more and less magic than it looks -- Python stuff gets carefully expanded, C stuff goes through almost verbatim), but the end result works amazingly well.

>> It would at least make this a more fair comparison to have the Cython
>> code as Cythonic as possible. However, I couldn't find any ways around
>> using these particular APIs -- other than the Numpy stuff which probably
>> does have a more elegant solution in the form of Cython arrays and
>> memory views.
>
> yup -- that's what I noticed right away -- I'm not sure if there is
> easier handling of file handles.

For the file handle, I would just write

    cdef FILE *fp = fdopen(file_obj.fileno(), "w")

and be done with it. This will work with any version of Python etc.

-n |
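A small self-contained sketch of the pattern Nathaniel describes. The declaration deliberately "lies" to Cython about the exact definition of png_uint_32 (the real one comes from png.h at C-compile time); png_access_version_number() is a real libpng call, used here only to show the automatic C-to-Python conversion at the def-function boundary.

    cdef extern from "png.h":
        # Only tell Cython that png_uint_32 behaves like an unsigned int;
        # the C compiler will use the real definition from png.h.
        ctypedef unsigned int png_uint_32
        png_uint_32 png_access_version_number()

    def libpng_version():
        # Returning this from a def function makes Cython insert the
        # C-integer-to-Python-integer conversion automatically.
        cdef png_uint_32 v = png_access_version_number()
        return v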
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 23:51:10
|
On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
> For the file handle, I would just write
>
>     cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>
> and be done with it. This will work with any version of Python etc.

Yeah, that makes sense -- though what if you want to be able to read from/write to a file that is already open, and in the middle of the file somewhere -- would that work?

I just posted a question to the Cython list, and indeed, it looks like there is no easy answer to the file issue.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Chris B. - N. F. <chr...@no...> - 2012-12-04 00:01:34
|
On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal <chr...@no...> wrote:
>>>> but some of that complexity could be reduced by using Numpy arrays in place
>> It would at least make this a more fair comparison to have the Cython
>> code as Cythonic as possible. However, I couldn't find any ways around
>> using these particular APIs -- other than the Numpy stuff which probably
>> does have a more elegant solution in the form of Cython arrays and
>> memory views.

OK -- so I poked at it, and this is my (very untested) version of write_png (I left out the py3 stuff, though it does look like it may be required for file handling...). Letting Cython unpack the numpy array is the real win. Maybe having it this simple won't work for MPL, but this is what my code tends to look like.

    def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
                  file_obj,
                  double dpi=0.0):

        cdef png_uint_32 width = buff.shape[0]
        cdef png_uint_32 height = buff.shape[1]
        cdef FILE *fp

        if PyFile_CheckExact(file_obj):
            fp = PyFile_AsFile(file_obj)
            write_png_c(<png_byte*>&buff[0, 0], width, height, fp,
                        NULL, NULL, NULL, dpi)
            return
        else:
            raise TypeError("write_png only works with real PyFileObject")

NOTE: that could be

    cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]

I'm not sure how MPL stores image buffers. Or you could accept any object, then convert it with np.asarray() / ndarray.view().

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Michael D. <md...@st...> - 2012-12-04 13:37:32
|
On 12/03/2012 07:00 PM, Chris Barker - NOAA Federal wrote:
> On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal
> <chr...@no...> wrote:
>
>>>>> but some of that complexity could be reduced by using Numpy arrays in place
>>> It would at least make this a more fair comparison to have the Cython
>>> code as Cythonic as possible. However, I couldn't find any ways around
>>> using these particular APIs -- other than the Numpy stuff which probably
>>> does have a more elegant solution in the form of Cython arrays and
>>> memory views.
>
> OK -- so I poked at it, and this is my (very untested) version of
> write_png (I left out the py3 stuff, though it does look like it may
> be required for file handling...)
>
> Letting Cython unpack the numpy array is the real win. Maybe having it
> this simple won't work for MPL, but this is what my code tends to look
> like.
>
> def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
>               file_obj,
>               double dpi=0.0):
>
>     cdef png_uint_32 width = buff.shape[0]
>     cdef png_uint_32 height = buff.shape[1]
>     cdef FILE *fp
>
>     if PyFile_CheckExact(file_obj):
>         fp = PyFile_AsFile(file_obj)
>         write_png_c(<png_byte*>&buff[0, 0], width, height, fp,
>                     NULL, NULL, NULL, dpi)
>         return
>     else:
>         raise TypeError("write_png only works with real PyFileObject")
>
> NOTE: that could be:
>
> cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]
>
> I'm not sure how MPL stores image buffers.
>
> or you could accept any object, then call:
>
> np.view()

The buffer comes in both ways, so the latter solution seems like the
thing to do. Thanks for working this through. This sort of thing is
very helpful.

We can also, of course, maintain the existing code that allows writing
to an arbitrary file-like object, but this fast path (where it is a
"real" file) is very important. It's significantly faster than calling
methods on Python objects.

Mike
|
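Since the buffer "comes in both ways", one option is a thin untyped
front end that normalizes whatever arrives into the layout a typed
helper expects. This is only a sketch of that idea, assuming an (h, w)
packed-uint32 or (h, w, 4) uint8 layout; write_png_any and
_write_png_typed are hypothetical names, not functions in the branch:

    import numpy as np

    def write_png_any(buff, file_obj, double dpi=0.0):
        arr = np.asarray(buff)
        if arr.ndim == 2 and arr.dtype == np.uint32:
            # Reinterpret packed 32-bit pixels as height x width x 4 bytes.
            h, w = arr.shape[0], arr.shape[1]
            arr = np.ascontiguousarray(arr).view(np.uint8).reshape(h, w, 4)
        # Copies only if the input isn't already C-contiguous uint8.
        arr = np.ascontiguousarray(arr, dtype=np.uint8)
        return _write_png_typed(arr, file_obj, dpi)  # hypothetical typed wrapper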
From: Nathaniel S. <nj...@po...> - 2012-12-04 00:16:39
|
On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal
<chr...@no...> wrote:
> On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
>> For the file handle, I would just write
>>
>> cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>>
>> and be done with it. This will work with any version of Python etc.
>
> yeah, that makes sense -- though what if you want to be able to
> read_to/write_from a file that is already open, and in the middle of
> the file somewhere -- would that work?
>
> I just posted a question to the Cython list, and indeed, it looks like
> there is no easy answer to the file issue.

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
    fobj = <object>png_get_io_ptr(s)
    pydata = PyString_FromStringAndSize(data, count)
    fobj.write(pydata)

cdef void flush_pyfile(png_structp s):
    # Not sure if this is even needed
    fobj = <object>png_get_io_ptr(s)
    fobj.flush()

# in write_png:
write_png_c(<png_byte*>pix_buffer, width, height,
            NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

But this is a separate issue :-) (and needs further fiddling to make
exception handling work).

Or if you're only going to work on real OS-level file objects anyway,
you might as well just accept a filename as a string and fopen() it
locally. Having Python do the fopen just makes your life harder for no
reason.

-n
|
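On the "further fiddling" point: the callbacks return void into libpng,
so a Python exception raised inside them has nowhere to propagate. One
untested way around that (an assumption, not something either branch
does) is to catch the exception in the callback, stash it in a
module-level variable, and re-raise it after write_png_c returns. The
png_* names are assumed to come from a cdef extern block as in the
snippet above:

    _pending_exc = None

    cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
        global _pending_exc
        if _pending_exc is not None:
            return                  # already failed once; drop further chunks
        try:
            fobj = <object>png_get_io_ptr(s)
            # Slicing a char* with an explicit length builds a Python bytes object.
            fobj.write((<char*>data)[:count])
        except BaseException as exc:
            _pending_exc = exc

    # in write_png, after the call into write_png_c:
    #     if _pending_exc is not None:
    #         exc, _pending_exc = _pending_exc, None
    #         raise exc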
From: Michael D. <md...@st...> - 2012-12-04 13:43:58
|
On 12/03/2012 07:16 PM, Nathaniel Smith wrote:
> On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal
> <chr...@no...> wrote:
>> On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
>>> For the file handle, I would just write
>>>
>>> cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>>>
>>> and be done with it. This will work with any version of Python etc.
>> yeah, that makes sense -- though what if you want to be able to
>> read_to/write_from a file that is already open, and in the middle of
>> the file somewhere -- would that work?
>>
>> I just posted a question to the Cython list, and indeed, it looks like
>> there is no easy answer to the file issue.
> Yeah, this is a general problem with the Python file API, trying to
> hook it up to stdio is not at all an easy thing. A better version of
> this code would skip that altogether like:
>
> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>     fobj = <object>png_get_io_ptr(s)
>     pydata = PyString_FromStringAndSize(data, count)
>     fobj.write(pydata)
>
> cdef void flush_pyfile(png_structp s):
>     # Not sure if this is even needed
>     fobj = <object>png_get_io_ptr(s)
>     fobj.flush()
>
> # in write_png:
> write_png_c(<png_byte*>pix_buffer, width, height,
>             NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

This is what my original version already does in the event that the
file_obj is not a "real" file. In practice, you need to support both
methods -- the callback approach is many times slower than writing
directly to a regular old FILE object, because there is overhead at
both the libpng and Python levels, and there's no way to select a good
buffer size.

> But this is a separate issue :-) (and needs further fiddling to make
> exception handling work).
>
> Or if you're only going to work on real OS-level file objects anyway,
> you might as well just accept a filename as a string and fopen() it
> locally. Having Python do the fopen just makes your life harder for no
> reason.

There's actually a very good reason. It is difficult to deal with
Unicode in file paths from C in a portable way. On Windows, for
example, if the user's name contains non-ascii characters, you can't
write to the home directory using fopen, etc. It's doable with some
care by using platform-specific C APIs etc., but CPython has already
done all of the hard work for us, so it's easiest just to leverage that
by opening the file from Python.

Mike
|
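The practical upshot of the Unicode-path point is to let Python's
open() handle filename encodings and hand only the resulting file
object down to the C layer. A trivial sketch, assuming the write_png
signature from the earlier messages (save_png is a hypothetical
convenience name):

    def save_png(buff, fname, double dpi=0.0):
        # fname may be a unicode path; CPython's open() copes with the
        # platform filename encoding that a bare fopen() would get wrong.
        with open(fname, "wb") as fh:
            write_png(buff, fh, dpi)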
From: Chris B. - N. F. <chr...@no...> - 2012-12-04 01:02:28
|
On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:

> Yeah, this is a general problem with the Python file API, trying to
> hook it up to stdio is not at all an easy thing. A better version of
> this code would skip that altogether like:
>
> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>     fobj = <object>png_get_io_ptr(s)
>     pydata = PyString_FromStringAndSize(data, count)
>     fobj.write(pydata)

Good point -- not at all Cython-specific, but do you need libpng (or
whatever) to write to the file? can you just get a buffer with the
encoded data and write it on the Python side? Particularly if the user
wants to pass in an open file object. This might be a better API for
folks that might want to stream an image right through a web app, too.

As a lot of Python APIs take either a file name or a file-like object,
perhaps it would make sense to push that distinction down to the
Cython level:
-- if it's a filename, open it with raw C
-- if it's a file-like object, have libpng write to a buffer (bytes
object), and pass that to the file-like object in Python

anyway, not really a Cython issue, but that second object sure would
be easy on Cython....

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no...
|
From: Michael D. <md...@st...> - 2012-12-04 13:46:02
|
On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:
> On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:
>
>> Yeah, this is a general problem with the Python file API, trying to
>> hook it up to stdio is not at all an easy thing. A better version of
>> this code would skip that altogether like:
>>
>> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>>     fobj = <object>png_get_io_ptr(s)
>>     pydata = PyString_FromStringAndSize(data, count)
>>     fobj.write(pydata)
>
> Good point -- not at all Cython-specific, but do you need libpng (or
> whatever) to write to the file? can you just get a buffer with the
> encoded data and write it on the Python side? Particularly if the user
> wants to pass in an open file object. This might be a better API for
> folks that might want to stream an image right through a web app, too.

You need to support both: raw C FILE objects for speed, and writing to
a Python file-like object for flexibility. The code in master already
does this (albeit with PyCXX), and the code on my "No CXX" branch does
this as well with Cython.

> As a lot of Python APIs take either a file name or a file-like object,
> perhaps it would make sense to push that distinction down to the
> Cython level:
> -- if it's a filename, open it with raw C

Unfortunately, as stated in detail in my last e-mail, that doesn't work
with Unicode paths.

> -- if it's a file-like object, have libpng write to a buffer (bytes
> object), and pass that to the file-like object in Python

libpng does one better and allows us to stream directly to a callback
which can then write to a Python object. This prevents double
allocation of memory.

> anyway, not really a Cython issue, but that second object sure would
> be easy on Cython....

Yeah -- once I figured out how to make a real C callback function from
Cython, the contents of the callback function itself are pretty easy to
write.

Mike
|
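Pulling the pieces together, the two-path dispatch described here might
look roughly like the sketch below. The typed signature, the
shape-to-width/height mapping, and the write_png_c argument order
follow the earlier sketches in this thread rather than the actual
branch, and PyFile_CheckExact/PyFile_AsFile are the Python 2 C-API
calls (the py3 handling was deliberately left out above):

    def write_png(cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"] buff not None,
                  file_obj, double dpi=0.0):
        cdef png_uint_32 height = buff.shape[0]
        cdef png_uint_32 width = buff.shape[1]
        cdef FILE *fp

        if PyFile_CheckExact(file_obj):
            # Fast path: hand libpng a real FILE* and let stdio do the buffering.
            fp = PyFile_AsFile(file_obj)
            write_png_c(<png_byte*>&buff[0, 0, 0], width, height, fp,
                        NULL, NULL, NULL, dpi)
        else:
            # Flexible path: libpng streams each chunk through the callbacks,
            # which call file_obj.write()/flush(), so BytesIO, sockets, etc. work.
            write_png_c(<png_byte*>&buff[0, 0, 0], width, height, NULL,
                        <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

With the callback path in place, streaming straight into memory (the
web-app case mentioned earlier) is just a matter of passing a file-like
object:

    import io
    buf = io.BytesIO()
    write_png(rgba_array, buf, dpi=72.0)  # rgba_array: an (h, w, 4) uint8 array
    png_bytes = buf.getvalue()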
From: Michael D. <md...@st...> - 2012-12-04 13:52:47
|
Also -- this feedback is really helpful when writing some comments in
the wrappers as to why certain things are the way they are... I'll make
sure to include rationales for the raw-file fast path and the need to
open the files on the Python side.

Mike

On 12/04/2012 08:45 AM, Michael Droettboom wrote:
> On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:
>> On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:
>>
>>> Yeah, this is a general problem with the Python file API, trying to
>>> hook it up to stdio is not at all an easy thing. A better version of
>>> this code would skip that altogether like:
>>>
>>> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>>>     fobj = <object>png_get_io_ptr(s)
>>>     pydata = PyString_FromStringAndSize(data, count)
>>>     fobj.write(pydata)
>>
>> Good point -- not at all Cython-specific, but do you need libpng (or
>> whatever) to write to the file? can you just get a buffer with the
>> encoded data and write it on the Python side? Particularly if the user
>> wants to pass in an open file object. This might be a better API for
>> folks that might want to stream an image right through a web app, too.
>
> You need to support both: raw C FILE objects for speed, and writing to
> a Python file-like object for flexibility. The code in master already
> does this (albeit with PyCXX), and the code on my "No CXX" branch does
> this as well with Cython.
>
>> As a lot of Python APIs take either a file name or a file-like object,
>> perhaps it would make sense to push that distinction down to the
>> Cython level:
>> -- if it's a filename, open it with raw C
>
> Unfortunately, as stated in detail in my last e-mail, that doesn't work
> with Unicode paths.
>
>> -- if it's a file-like object, have libpng write to a buffer (bytes
>> object), and pass that to the file-like object in Python
>
> libpng does one better and allows us to stream directly to a callback
> which can then write to a Python object. This prevents double
> allocation of memory.
>
>> anyway, not really a Cython issue, but that second object sure would
>> be easy on Cython....
>
> Yeah -- once I figured out how to make a real C callback function from
> Cython, the contents of the callback function itself are pretty easy to
> write.
>
> Mike
>
> ------------------------------------------------------------------------------
> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
> Remotely access PCs and mobile devices and provide instant support
> Improve your efficiency, and focus on delivering more value-add services
> Discover what IT Professionals Know. Rescue delivers
> http://p.sf.net/sfu/logmein_12329d2d
> _______________________________________________
> Matplotlib-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
|