From: Michael D. <md...@st...> - 2012-11-29 17:00:09
|
Given the slow pace of development on PyCXX, I know it has been the desire of some here to remove our dependency on it. I thought a helpful starting point to evaluate the alternatives would be to restructure one of our extensions to not use PyCXX anymore. I've taken the PNG extension, which is reasonably straightforward in that it doesn't define any custom types, but does have some low level C-wrapping challenges, and separated out the Python-specific parts from the libpng-specific parts. The Python-specific parts are now written using the "raw" Python C/API. The other part still uses C++ (not C) and does throw exceptions, but doesn't use classes or templates or anything else that can be difficult to wrap. All of this is on my "no_cxx" branch. Now here's the challenge: can we do better than this using any of the available wrapping tools? Cython, SWIG, Boost.Python etc.? I've not had much luck with Cython for this kind of thing in the past, but I know it is popular. Perhaps someone with more Cython experience would want to take a crack at this and then we could have something concrete to compare... Cheers, Mike |
From: Michiel de H. <mjl...@ya...> - 2012-11-30 01:47:49
|
Hi,

The Mac OS X backend is entirely written in C (with some Objective-C elements where necessary). AFAICT, this is the largest body of C/C++ code in matplotlib. This backend was written from scratch without using Cython, SWIG, or Boost.Python.

From my experience, I would prefer to write such extensions in C directly rather than relying on Cython, SWIG, or Boost.Python, because those approaches would lead to another dependency (for developers at least) and require developers to learn how to code in them -- which may not be very hard, but we may as well avoid it if possible.

I'd be happy to help out with the conversion of the other extensions from CXX to C. I would need some help, though, to use github appropriately.

Best,
-Michiel. |
From: Michael D. <md...@st...> - 2012-11-30 14:07:09
|
Thanks, Michiel.

If you read between the lines of what I was saying, that is basically where I fall as well. There seems to be a lot of desire to use Cython to make the code more accessible, however, and I'm willing to consider it if it can be shown to be superior to the raw C/API for this task. I'm not sure it is -- I always seem to end up with things that are more lines of code, with more obscure workarounds, than just coding in C directly.

Cheers,
Mike |
From: Chris B. - N. F. <chr...@no...> - 2012-11-30 17:33:16
|
On Fri, Nov 30, 2012 at 6:06 AM, Michael Droettboom <md...@st...> wrote:

> If you read between the lines of what I was saying, that is basically
> where I fall as well. There seems to be a lot of desire to use Cython
> to make the code more accessible,

I'll add a beat to that drum -- I'm a big Cython fan.

> however, and I'm willing to consider
> it if it can be shown to be superior to the raw C/API for this task --

I think there is NO QUESTION that Cython is superior to the C/API -- why would you want to deal with the reference counting, etc. yourself? Cython can handle the boilerplate code for you very cleanly and elegantly.

Something to keep in mind about Cython: it can be used in multiple ways:

1) Add static typing to what is essentially Python code to get better performance -- this may be what you mean by the "more accessible" part. A great use, but maybe, maybe, maybe not best for the core bits of MPL.

2) Calling C/C++ code -- Cython is a great way to call C/C++ code -- it can handle the packing and unpacking of Python types, reference counting, etc. for you, much like using the C API, but with a lot less tricky boilerplate code to write.

(2) is the use case that I'm arguing is NO QUESTION a better option than the C API.

Compared to SWIG, SIP (and I assume CXX), the downside is that there is no auto-generation of wrappers (at least nothing mature). However, for the MPL case, we're not trying to wrap a large existing library, but rather particular code that is often written for MPL specifically, so hand-writing the Cython is a fine option.

So why not ctypes, or...? I think the real strength of Cython in wrapping C code is that you can write a "thick" wrapper in an almost-Python language. So if you want to vectorize a C function, for instance, you can write that bit in Cython very easily (and Cython's built-in understanding of numpy arrays is very helpful here). When you use ctypes, you need to write that in pure Python -- easy enough, but probably not very performant.

With SWIG, etc., you end up writing a fair bit of C (or SWIG) code to handle the thicker bits of the wrapper -- so you're dealing with the raw CPython API, and, well, C. Cython really is an easier option.

I've found that for anything more than very small stuff (i.e. one or two loops through an array), writing the core code in native C or C++ can be easier -- you know for sure you're not accidentally making expensive Python calls, etc. -- but using Cython to call it is still very helpful.

> I'm not sure it is -- I always seem to end up with things that are more
> lines of code with more obscure workarounds than just coding in C directly.

Exactly -- but I don't think that applies to the CPython-API bits so much as to the core code -- so keep that in C.

In summary, I guess what I think is the power of Cython is the flexibility in where you draw the line between Python, Cython, and C -- you can pass pure Python through Cython, or you can do almost nothing with it but call a C function, and everything in between.

> From my experience, I would prefer to write such extensions in C directly rather
> than relying on Cython, SWIG, or Boost.Python, because those approaches would
> lead to another dependency (for developers at least),

The dependency is pretty easy to deal with compared to the many others in MPL.

> and requires developers to
> learn how to code in them. Which may not be very hard, but we may as well avoid
> that if possible.

Here's where I disagree -- if we go pure C and C-API, developers need to know the Python C-API -- that is actually a pretty big deal, and hard to get right. Knowing enough Cython to call some C code is a smaller lift for sure.

Anyway, I say give it a shot -- I suspect you'll like it.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
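For illustration, here is a minimal sketch of the "thick wrapper" idea Chris describes: vectorizing a scalar C routine over a numpy array from Cython. The header name "gamma.h" and the function c_gamma_correct() are made-up placeholders for this sketch, not anything that exists in matplotlib.

    # Vectorize a hypothetical scalar C function over a 1-d numpy array.
    import numpy as np
    cimport numpy as cnp

    cnp.import_array()

    cdef extern from "gamma.h":
        double c_gamma_correct(double value, double gamma)

    def gamma_correct(cnp.ndarray[double, ndim=1] values not None, double gamma):
        """Apply the scalar C routine element-wise, returning a new array."""
        cdef Py_ssize_t i, n = values.shape[0]
        cdef cnp.ndarray[double, ndim=1] out = np.empty(n, dtype=np.float64)
        for i in range(n):
            out[i] = c_gamma_correct(values[i], gamma)
        return out

The ctypes equivalent would have to run the same loop in pure Python; here the loop compiles down to C, which is the point being made above.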
From: Michiel de H. <mjl...@ya...> - 2012-11-30 23:41:04
|
One package (Pysam) that I use a lot relies on Cython, and requires users to install Cython before they can install Pysam itself. With Cython, is that always the case? Will all users need to install Cython? Or is it sufficient if only matplotlib developers install Cython?

Best,
-Michiel. |
From: Nathaniel S. <nj...@po...> - 2012-11-30 23:44:53
|
On Fri, Nov 30, 2012 at 11:40 PM, Michiel de Hoon <mjl...@ya...> wrote: > One package (Pysam) that I use a lot relies on Cython, and requires users to install Cython before they can install Pysam itself. With Cython, is that always the case? Will all users need to install Cython? Or is it sufficient if only matplotlib developers install Cython? You can set things up so that end-users don't have to install cython. You just convert the .pyx files to regular .c files before distributing your package. Numpy itself uses cython, but end-users don't notice or care. (It's something more of a hassle for developers to do things this way, and cython is very easy to install, so I don't know if it's worth it. But it's certainly possible.) -n |
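A minimal sketch of the arrangement Nathaniel describes, assuming a hypothetical _png.pyx module with a pre-generated _png.c shipped alongside it in the tarball (the file and module names here are placeholders, not matplotlib's actual build code):

    # setup.py sketch: build from the .pyx when Cython is available,
    # otherwise fall back to the pre-generated C file.
    import os
    from distutils.core import setup
    from distutils.extension import Extension

    try:
        from Cython.Build import cythonize
        have_cython = True
    except ImportError:
        have_cython = False

    if have_cython and os.path.exists("_png.pyx"):
        extensions = cythonize([Extension("_png", ["_png.pyx"], libraries=["png"])])
    else:
        extensions = [Extension("_png", ["_png.c"], libraries=["png"])]

    setup(name="example", version="0.1", ext_modules=extensions)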
From: Benjamin R. <ben...@ou...> - 2012-12-01 01:33:16
|
On Fri, Nov 30, 2012 at 6:44 PM, Nathaniel Smith <nj...@po...> wrote:

> You can set things up so that end-users don't have to install cython.
> You just convert the .pyx files to regular .c files before
> distributing your package. Numpy itself uses cython, but end-users
> don't notice or care. (It's something more of a hassle for developers
> to do things this way, and cython is very easy to install, so I don't
> know if it's worth it. But it's certainly possible.)

Since when has numpy used Cython? I specifically remember a rather involved discussion thread on numpy-discussion about the pros-and-cons of including cython. Now, SciPy on the other hand, does utilize Cython in some spots IIRC, but does it in a way that it isn't even required for the developers to have cython installed to build from source.

I would not be against such an approach. Much of the C/C++ stuff is rarely touched. If we have some Cython source that is used to generate C/C++ source code, packaged in the same way as the current code is, I would have no problem with that. Given that matplotlib is such a fundamental tool in the ecosystem, I want to make sure that the decisions we make are ones that improve our packaging situation.

Cheers!
Ben Root |
From: Nelle V. <nel...@gm...> - 2012-12-01 11:03:01
|
> Since when has numpy used Cython? I specifically remember a rather involved
> discussion thread on numpy-discussion about the pros-and-cons of including
> cython. Now, SciPy on the other hand, does utilize Cython in some spots
> IIRC, but does it in a way that it isn't even required for the developers to
> have cython installed to build from source.

You just ship the C/C++ code for the developers as well as for the end users. This is what we do with scikit-learn. It requires the developers to make sure to compile the Cython code and commit both files. It is also quite annoying for reviews to have the generated C++ code, so the Cython code needs to be compiled after the reviews.

The reason the scikit's developers chose to use Cython instead of something else is to decrease the maintenance burden: more contributors understand Cython code than C/C++ code (or more precisely, understand C++ code written by someone else). Hence, this increases the bus factor. |
From: Julian T. <jta...@go...> - 2012-12-01 12:13:57
|
On 12/01/2012 02:32 AM, Benjamin Root wrote:

> Since when has numpy used Cython? I specifically remember a rather
> involved discussion thread on numpy-discussion about the pros-and-cons
> of including cython. Now, SciPy on the other hand, does utilize Cython
> in some spots IIRC, but does it in a way that it isn't even required for
> the developers to have cython installed to build from source.

If you should choose cython, please don't follow scipy too closely. Up until rather recent git head they did not ship the cython sources in their source tarballs, which occasionally led to inconsistent generated files (e.g. in 0.10.1 interpnd.pyx) and caused trouble for distributors (see e.g. debian bug 589731).

A better example to follow would be e.g. pyzmq, which ships both the cython and generated sources and has an easy-to-use cython setup.py target to recythonize. |
From: Michiel de H. <mjl...@ya...> - 2012-12-01 14:45:03
|
In my experience, Benjamin is right that the C code is rarely touched. This is even more true for the Python/C glue code, at least from my experience with the Mac OS X backend. Since the Python/C glue code is modified only very rarely, there may not be a need for regenerating the Python/C glue code by developers or users from a Cython source code.

In addition, it is much easier to maintain the Python/C glue code than to write it from scratch. Once you have the Python/C glue code, it's relatively straightforward to modify it by looking at the existing Python/C glue code.

This argues against making the Cython source code a part of the matplotlib codebase.

At the same time, to minimize errors, we could use Cython to create the initial Python/C glue code, and then add the generated code to the matplotlib codebase. Then neither users nor developers have to install Cython, we don't have to worry about inconsistencies (if any) between different Cython versions, we don't have to worry about keeping the Cython source code and the generated code in sync, and we will still get a high-quality Cython-generated Python/C glue code.

By the way, how many modules in matplotlib make use of CXX, and would have to be converted?

Best,
-Michiel. |
From: Ryan M. <rm...@gm...> - 2012-12-01 14:56:55
|
I'm +1 on Cython. I think its prevalence in the community gives us a larger potential contributor pool than CXX or hand-coded python C-API. I know using Cython would open up that part of the code base for me.

Ryan |
From: Michael D. <md...@st...> - 2012-12-01 19:03:13
|
Including the Cython-generated C in the tarballs and optionally the git repository as well can certainly be considered to reduce the need for Cython for developers and users alike. However, the Cython source should also be included in the repository for the inevitable times when it does need to be updated -- it shouldn't be off somewhere else.

The png, path, ft2font, backend_agg, gtkagg, tkagg, tri, and image modules all use CXX. The backend_agg, image and ft2font ones are particularly complex, but some of that complexity could be reduced by using Numpy arrays in place of the image buffer types that each of them contain (that code predates matplotlib's numpy requirement, so it's not terribly surprising that a more complex approach was taken).

Mike |
From: Thomas K. <th...@kl...> - 2012-12-01 17:12:23
|
Drat, re-sending on the list. On 1 December 2012 16:40, Thomas Kluyver <th...@kl...> wrote: > On 1 December 2012 14:44, Michiel de Hoon <mjl...@ya...> wrote: > >> At the same time, to minimize errors, we could use Cython to create the >> initial Python/C glue code, and then add the generated code to the >> matplotlib codebase. Then neither users nor developers have to install >> Cython, we don't have to worry about inconsistencies (if any) between >> different Cython versions, we don't have to worry about keeping the Cython >> source code and the generated code in sync, and we will still get a >> high-quality Cython-generated Python/C glue code. > > > Having looked at some bits of Cython-generated C code, I wouldn't > recommend that. I'm sure it's high quality in terms of compiling and > running correctly, but it's definitely not designed to be read or > maintained directly. Here's a sample from SciPy to illustrate: > > > https://github.com/scipy/scipy/blob/master/scipy/stats/vonmises_cython.c#L2269 > > For another reason, there have been cases where the Cython-generated C > code was broken in some way, and it was fixed by regenerating with a newer > version of Cython. I experienced this with pyzmq when testing with Python > 3.3 for example - it completely failed to import until I installed a newer > version of Cython and redid the conversion. If you don't keep the original > Cython code, you don't have this option. > > Best wishes, > Thomas > |
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 18:13:09
|
On Sat, Dec 1, 2012 at 6:44 AM, Michiel de Hoon wrote:

> Since the Python/C glue code is modified only very rarely, there may not be a need
> for regenerating the Python/C glue code by developers or users from a Cython source code.

True.

> In addition, it is much easier to maintain the Python/C glue code than to write it
> from scratch. Once you have the Python/C glue code, it's relatively straightforward
> to modify it by looking at the existing Python/C glue code.

Not so true -- getting reference counting right, etc. is difficult -- I suppose once the glue code is robust, and all you are changing is a bit of API to the C, maybe....

> This argues against making the Cython source code a part of the matplotlib codebase.

Huh? Are you suggesting that we use Cython to generate the glue, then hand-maintain that glue? I think that is a really, really bad idea -- generated code is ugly and hard to maintain, it is not designed to be human-readable, and we wouldn't get the advantages of bug fixes and further development in Cython.

So -- if you use Cython, you want to keep using it, and that means the Cython source IS the source. I agree that it's a good idea to ship the generated code as well, so that no one who is not touching the Cython has to regenerate it. Other than the slight mess from generated files showing up in diffs, etc., this really works just fine.

Any reason MPL couldn't continue with EXACTLY the same approach now used with CXX -- it generates code as well, yes?

Michael Droettboom wrote:

> For the PNG extension specifically, it was creating callbacks that can
> be called from C and the setjmp magic that libpng requires. I think
> it's possible to do it, but I was surprised at how non-obvious those
> pieces of Cython were. I was really hoping by creating this experiment
> that a Cython expert would step up and show the way ;)

Did you not get the support you expected from the cython list? Anyway, there's no reason you can't keep stuff in C that's easier in C (or did CXX make this easy?). I think making basic callbacks is actually pretty straightforward, but I don't know about the setjmp magic (I have no idea what that means!).

> The Agg backend has more C++-specific challenges, particularly
> instantiating very complex template expressions --

I'm guessing you'd do the complex template stuff in C++ -- and let Cython see a more traditional static API.

> but some of that complexity could be reduced by using Numpy arrays in place of the
> image buffer types that each of them contain

OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.

> The Cython version isn't that much shorter than the C++ version.

I think some things make sense to keep in C++, though I do see a fair bit of calls (in the C++) to the Python API -- I'm surprised there isn't much code advantage, but anyway, the goal is more robust/easier to maintain, which may correlate with code size, but not completely.

> These declarations aren't exact matches to what one would find in the header file(s)
> because Cython doesn't support exact-width data types etc.

It does support the C99 fixed-width integer types:

    from libc.stdint cimport int16_t, int32_t

Or are you talking about something else?

> I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers.

I suspect that's an oversight -- for the most part, stuff has been added as it's needed.

One other note -- from a quick glance at your Cython code, it looks like you did almost everything in Cython-that-will-compile-to-pure-C -- i.e. a lot of calls to the CPython API. But the whole point of Cython is that it makes those calls for you. So you can do type checking, and switching on types, and calling np.asarray(), etc., in Python, without calling the CPython API yourself. I know nothing of the PNG API, and am pretty weak on the CPython API (and C for that matter), but it's likely that the Cython code you've written could be much simplified.

> Once things compiled, due to my own mistake, calling the function segfaulted. Debugging
> that segfault in gdb required, again, wading through the generated code. Using gdb on
> hand-written code is *much* nicer.

For sure -- there is a plug-in/add-on/something for using gdb on Cython code -- I haven't used it, but I imagine it would help.

Ian Thomas wrote:

> I have never used Cython, but to me the code looks like an inelegant combination of
> Python, C/C++ and some Cython-specific stuff.

Well, yes, it is that!

> I can see the advantage of this approach for small sections of code, but I have strong
> reservations about using it for complicated modules that have extensive use of
> templated code and/or Standard Template Library collections (mpl has examples of
> both of these).

So far, I've found that Cython is good for:

- The simple stuff -- basic loops through numpy arrays, etc.
- Wrapping/calling more complex C or C++ -- essentially handling the reference counting and the packing/unpacking of Python types.

So we find we do write some shim code in C++ to make the access to the core libraries Cython-friendly. We haven't dealt with complex templating, etc., but I'd guess if we did I'd keep that in C++. And since the resulting actual glue code is pretty simple, it makes the debugging easier.

> Maybe rather than asking "if we switched to using Cython, would more participate", I
> should be asking "among those that can participate in removing the PyCXX
> dependency, what is the preferred approach?"

I don't know that we need a one-size-fits-all approach -- perhaps some bits make the most sense to move to plain old C/C++, and some to Cython, either because of the nature of the code itself, or because of the experience/preference of the person who takes ownership of a particular problem.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Michael D. <md...@st...> - 2012-12-03 19:59:16
|
On 12/03/2012 01:12 PM, Chris Barker - NOAA Federal wrote: > This argues against making the Cython source code a part of the matplotlib codebase. > > huh? are you suggesting that we use Cython to generate the glue, then > hand-maintain that glue? I think that is a really, rally bad idea -- > generated code is ugly and hard to maintain, it is not designed to be > human-readable, and we wouldn't get the advantages of bug-fixes > further development in Cython. > > So -- if you use Cython, you want to keep using, and theat means the > Cython source IS the source. I agree that it's a good idea to ship the > generated code as well, so that no one that is not touching the Cython > has to generate. Other than the slight mess from generated files > showing up in diffs, etc, this really works just fine. I agree with this approach. > > Any reason MPL couldn't continue with EXACTLY the same approach now > used with C_XX -- it generates code as well, yes? No -- PyCXX is just C++. Its killer feature is that it provides a fairly thin layer around the Python C/API that does implicit reference counting through the use of C++ constructors and destructors. I actually think it's a really elegant approach to the problem. The downside we're running into is that it's barely maintained, so using vanilla upstream as provided by packagers is not viable. An alternative to all of this discussion is to fork PyCXX and release as needed. The maintenance required is primarily when new versions of Python are released, so it wouldn't necessarily be a huge undertaking. However, I know some are reluctant to use a relatively unused tool. > > Michael Droettboom wrote: > >> For the PNG extension specifically, it was creating callbacks that can >> be called from C and the setjmp magic that libpng requires. I think >> it's possible to do it, but I was surprised at how non-obvious those >> pieces of Cython were. I was really hoping by creating this experiment >> that a Cython expert would step up and show the way ;) > Did you not get the support you expected from the cython list? Anyway, > there's no reason you can't keep stuff in C that's easier in C (or did > C_XX make this easy?). The support has been adequate, but the solutions aren't always an improvement over raw Python/C API (not just in terms of lines of code but in terms of the number of layers of abstraction and "magic" between the coder and what actually happens). > I think making basic callbacks is actually > pretty straightforward, but In don't know about the setjmp magic (I > have no idea hat that means!). It turned out to be not terrible once I figured out the correct incantation. > >> The Agg backend has more C++-specific challenges, particularly >> instantiating very complex template expressions -- > I'm guessing you'd do the complex template stuff in C++ -- and let > Cython see a more traditional static API. Agreed -- I'm really only considering replacing the glue code provided by PyCXX, not the whole thing. matplotlib's C/C++ code has been around for a while and has been fairly vetted at this point, so I don't think a wholesale rewrite makes sense. > >> but some of that complexity could be reduced by using Numpy arrays in place of the >> image buffer types that each of them contain > OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython. Sure, but when we return to Python, they should be Numpy arrays which have more methods etc. -- or am I missing something? >> The Cython version isn't that much shorter than the C++ version. 
> I think some things make sense to keep in C++, though I do see a fair > bit of calls (in the C++) to the python API -- I'm surprised there > isn't much code advantage, but anyway, the goal is more robust/easier > to maintain, which may correlate with code-size, but not completely. > >> These declarations aren't exact matches to what one would find in the header file(s) >because Cython doesn't support exact-width data types etc. > It does support the C99 fixed-width integer types: > > from libc.stdint cimport int16_t, int32_t, > > Or are you talking about something else? The problem is that Cython can't actually read the C header, so there are types in libpng, for example, that we don't actually know the size of. They are different on different platforms. In C, you just include the header. In Cython, I'd have to determine the size of the types in a pre-compilation step, or manually determine their sizes and hard code them for the platforms we care about. > >> I'm not sure why some of the Python/C API calls I needed were not defined in Cython's include wrappers. > I suspect that's an oversight -- for the most part, stuff has been > added as it's needed. > > One other note -- from a quick glance at your Cython code, it looks > like you did almost everything is Cython-that-will-compile-to-pure-C > -- i.e. a lot of calls to the CPython API. But the whole point of > Cython is that it makes those calls for you. So you can do type > checking, and switching on types, and calling np.asarray(), etc, etc, > etc, in Python, without calling the CPython api yourself. I know > nothing of the PNG API, and am pretty week on the CPython API (and C > for that matter), but I it's likely that the Cython code you've > written could be much simplified. It would at least make this a more fair comparison to have the Cython code as Cythonic as possible. However, I couldn't find any ways around using these particular APIs -- other than the Numpy stuff which probably does have a more elegant solution in the form of Cython arrays and memory views. > > >> Once things compiled, due to my own mistake, calling the function segfaulted. Debugging >> that segfault in gdb required, again, wading through the generated code. Using gdb on >> hand-written code is *much* nicer. > for sure -- there is a plug-in/add-on/something for using gdb on > Cython code -- I haven't used it but I imagine it would help. Ah. I wasn't aware of that. Thanks for pointing that out. I have the CPython plug-in for gdb and it's great. > > Ian Thomas wrote: >> I have never used Cython, but to me the code looks like an inelegant combination of >> Python,C/C++ and some Cython-specific stuff. > well, yes, it is that! > >> I can see the advantage of this approach for small sections of code, but I have strong > reservations about using it for complicated modules that have extensive use of >> templated code and/or Standard Template Library collections (mpl has examples of >> both of these). > So far, I've found that Cython is good for: > - The simple stuff -- basic loops through numpy arrays, etc. > - wrapping/calling more complex C or C++ > -- essentially handling the reference counting and python type > packing/unpacking of python types. > > So we find we do write some shim code in C++ to make the access to the > core libraries Cython-friendly. We haven't dealt with complex > templating, etc, but I'd guess if we did I'd keep that in C++. And > since the resulting actual glue code is pretty simple, it makes the > debugging easier. 
> >> Maybe rather than asking "if we switched to using Cython, would more participate", I >> should be asking "among those that can participate in removing the PyCXX >> dependency, what is the preferred approach?" > I don't know that we need a one-sieze fits all approach -- perhaps > some bits make the most sense to move to plain old C/C++, and some to > Cython, either because of the nature of the code itself, or because of > the experience/preference of the person that takes ownership of a > particular problem. > True. We do have two categories of stuff using PyCXX in matplotlib: things that (primarily) wrap third-party C/C++ libraries, and things that are actually doing algorithmic heavy lifting. It's quite possible we don't want the same solution for all. Cheers, Mike |
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 20:25:51
|
On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <md...@st...> wrote:

>>> but some of that complexity could be reduced by using Numpy arrays in place of the
>>> image buffer types that each of them contain
>> OR Cython arrays and/or memoryviews -- this is indeed a real strength of Cython.
>
> Sure, but when we return to Python, they should be Numpy arrays which
> have more methods etc. -- or am I missing something?

Cython makes it really easy to switch between ndarrays and memoryviews, etc. -- it's a question of what you want to work with in your code, so you can write a function that takes numpy arrays and returns numpy arrays, but uses a memoryview internally (and passes to C code that way). But I'm not an expert on this; I've found that I'm either doing simple stuff where using numpy arrays directly works fine, or passing the pointer to the data array off to C:

    def a_function_to_call_C(cnp.ndarray[double, ndim=2, mode="c"] in_array):
        """ calls the_c_function, altering the array in-place """
        cdef int m, n
        m = in_array.shape[0]
        n = in_array.shape[1]
        the_c_function(&in_array[0, 0], m, n)

>> It does support the C99 fixed-width integer types:
>> from libc.stdint cimport int16_t, int32_t
>
> The problem is that Cython can't actually read the C header,

Yeah, this is a pity. There has been some work on auto-generating Cython from C headers, though nothing mature. For my work, I've been considering writing some simple .pxd-generating code, just to make sure my data types are in line with the C++, as it may change.

> so there
> are types in libpng, for example, that we don't actually know the size
> of. They are different on different platforms. In C, you just include
> the header. In Cython, I'd have to determine the size of the types in a
> pre-compilation step, or manually determine their sizes and hard code
> them for the platforms we care about.

Yeah -- this is a tricky problem; however, I think you can follow what you'd do in C -- i.e. presumably the headers define their own data types: png_short or whatever. The actual definition is filled in by the pre-processor. So I wonder if you can declare those types in Cython, then have it write C code that uses those types, and it all gets cleared up at compile time -- maybe. The key is that when you declare stuff in Cython, that declaration is used to determine how to write the C code; I don't think the declarations themselves are translated.

> It would at least make this a more fair comparison to have the Cython
> code as Cythonic as possible. However, I couldn't find any ways around
> using these particular APIs -- other than the Numpy stuff which probably
> does have a more elegant solution in the form of Cython arrays and
> memory views.

Yup -- that's what I noticed right away -- I'm not sure if there is easier handling of file handles.

> True. We do have two categories of stuff using PyCXX in matplotlib:
> things that (primarily) wrap third-party C/C++ libraries, and things
> that are actually doing algorithmic heavy lifting. It's quite possible
> we don't want the same solution for all.

And I'm not sure the wrappers all need to be written the same way, either.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
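To illustrate the memoryview route mentioned above, here is a minimal sketch of a function that accepts any 2-d array-like object, gets a C-contiguous view of it, and hands the raw pointer to C. The header "my_c_code.h" and the_c_function() are hypothetical placeholders, the same assumed C helper as in the sketch above.

    import numpy as np

    cdef extern from "my_c_code.h":
        void the_c_function(double* data, int m, int n)

    def call_c_on_any_array(obj):
        # Coerce whatever we were given into a C-contiguous float64 array
        # (obj must be 2-d here), then take a typed memoryview of it.
        cdef double[:, ::1] view = np.ascontiguousarray(obj, dtype=np.float64)
        # Hand the raw pointer to C, which modifies the data in place.
        the_c_function(&view[0, 0], <int>view.shape[0], <int>view.shape[1])
        # Return a regular ndarray to the Python caller.
        return np.asarray(view)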
From: Nathaniel S. <nj...@po...> - 2012-12-03 22:53:53
|
On Mon, Dec 3, 2012 at 8:24 PM, Chris Barker - NOAA Federal <chr...@no...> wrote:
> On Mon, Dec 3, 2012 at 11:59 AM, Michael Droettboom <md...@st...> wrote:
>> so there
>> are types in libpng, for example, that we don't actually know the size
>> of. They are different on different platforms. In C, you just include
>> the header. In Cython, I'd have to determine the size of the types in a
>> pre-compilation step, or manually determine their sizes and hard code
>> them for the platforms we care about.
>
> yeah -- this is a tricky problem; however, I think you can follow what
> you'd do in C -- i.e. presumably the headers define their own data
> types: png_short or whatever. The actual definition is filled in by
> the pre-processor. So I wonder if you can declare those types in
> Cython, then have it write C code that uses those types, and it all
> gets cleared up at compile time -- maybe. The key is that when you
> declare stuff in Cython, that declaration is used to determine how to
> write the C code; I don't think the declarations themselves are
> translated.

Yeah, this isn't an issue in Cython, it's a totally standard thing (though perhaps not well documented). When you write

    cdef extern from "png.h":
        ctypedef int png_short

or whatever, what you are saying is "the C compiler knows about a type called png_short, which acts in an int-like fashion, so Cython, please use your int rules when dealing with it". So this means that Cython will know that if you return a png_short from a python function, it should insert a call to PyInt_FromLong (or maybe PyInt_FromSsize_t? -- cython worries about these things so I don't have to). But Cython only takes care of the Python<->C interface. It will leave the C compiler to actually allocate the appropriate memory for png_shorts, perform C arithmetic, coerce a png_short into a 'long' when necessary, etc.

It's kind of mind-bending to wrap your head around, and it definitely does help to spend some time reading the C code that Cython spits out to understand how the mapping works (it's both more and less magic than it looks -- Python stuff gets carefully expanded, C stuff goes through almost verbatim), but the end result works amazingly well.

>> It would at least make this a more fair comparison to have the Cython
>> code as Cythonic as possible. However, I couldn't find any ways around
>> using these particular APIs -- other than the Numpy stuff which probably
>> does have a more elegant solution in the form of Cython arrays and
>> memory views.
>
> yup -- that's what I noticed right away -- I'm not sure if there is
> easier handling of file handles.

For the file handle, I would just write

    cdef FILE *fp = fdopen(file_obj.fileno(), "w")

and be done with it. This will work with any version of Python etc.

-n |
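A small self-contained sketch of the pattern Nathaniel describes. The declaration deliberately "lies" to Cython about the exact definition of png_uint_32 (the real one comes from png.h at C-compile time); png_access_version_number() is a real libpng call, used here only to show the automatic C-to-Python conversion at the def-function boundary.

    cdef extern from "png.h":
        # Only tell Cython that png_uint_32 behaves like an unsigned int;
        # the C compiler will use the real definition from png.h.
        ctypedef unsigned int png_uint_32
        png_uint_32 png_access_version_number()

    def libpng_version():
        # Returning this from a def function makes Cython insert the
        # C-integer-to-Python-integer conversion automatically.
        cdef png_uint_32 v = png_access_version_number()
        return v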
From: Chris B. - N. F. <chr...@no...> - 2012-12-03 23:51:10
|
On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
> For the file handle, I would just write
>
>     cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>
> and be done with it. This will work with any version of Python etc.

Yeah, that makes sense -- though what if you want to be able to read from/write to a file that is already open, and in the middle of the file somewhere -- would that work?

I just posted a question to the Cython list, and indeed, it looks like there is no easy answer to the file issue.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Chris B. - N. F. <chr...@no...> - 2012-12-04 00:01:34
|
On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal <chr...@no...> wrote:
>>>> but some of that complexity could be reduced by using Numpy arrays in place
>> It would at least make this a more fair comparison to have the Cython
>> code as Cythonic as possible. However, I couldn't find any ways around
>> using these particular APIs -- other than the Numpy stuff which probably
>> does have a more elegant solution in the form of Cython arrays and
>> memory views.

OK -- so I poked at it, and this is my (very untested) version of write_png (I left out the py3 stuff, though it does look like it may be required for file handling...). Letting Cython unpack the numpy array is the real win. Maybe having it this simple won't work for MPL, but this is what my code tends to look like.

    def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
                  file_obj,
                  double dpi=0.0):

        cdef png_uint_32 width = buff.shape[0]
        cdef png_uint_32 height = buff.shape[1]
        cdef FILE *fp

        if PyFile_CheckExact(file_obj):
            fp = PyFile_AsFile(file_obj)
            write_png_c(<png_byte*>&buff[0, 0], width, height, fp,
                        NULL, NULL, NULL, dpi)
            return
        else:
            raise TypeError("write_png only works with real PyFileObject")

NOTE: that could be

    cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]

I'm not sure how MPL stores image buffers. Or you could accept any object, then convert it with np.asarray() / ndarray.view().

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chr...@no... |
From: Michael D. <md...@st...> - 2012-12-04 13:37:32
|
On 12/03/2012 07:00 PM, Chris Barker - NOAA Federal wrote:
> On Mon, Dec 3, 2012 at 12:24 PM, Chris Barker - NOAA Federal
> <chr...@no...> wrote:
>
>>>>> but some of that complexity could be reduced by using Numpy arrays in place
>>> It would at least make this a more fair comparison to have the Cython
>>> code as Cythonic as possible. However, I couldn't find any ways around
>>> using these particular APIs -- other than the Numpy stuff which probably
>>> does have a more elegant solution in the form of Cython arrays and
>>> memory views.
>
> OK -- so I poked at it, and this is my (very untested) version of
> write_png (I left out the py3 stuff, though it does look like it may
> be required for file handling...)
>
> Letting Cython unpack the numpy array is the real win. Maybe having it
> this simple won't work for MPL, but this is what my code tends to look
> like.
>
> def write_png(cnp.ndarray[cnp.uint32_t, ndim=2, mode="c"] buff not None,
>               file_obj,
>               double dpi=0.0):
>
>     cdef png_uint_32 width = buff.shape[0]
>     cdef png_uint_32 height = buff.shape[1]
>     cdef FILE *fp
>
>     if PyFile_CheckExact(file_obj):
>         fp = PyFile_AsFile(file_obj)
>         write_png_c(<png_byte*>&buff[0, 0], width, height, fp,
>                     NULL, NULL, NULL, dpi)
>         return
>     else:
>         raise TypeError("write_png only works with real PyFileObject")
>
> NOTE: that could be:
>
> cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"]
>
> I'm not sure how MPL stores image buffers.
>
> or you could accept any object, then call:
>
> np.view()

The buffer comes in both ways, so the latter solution seems like the
thing to do. Thanks for working this through. This sort of thing is
very helpful.

We can also, of course, maintain the existing code that allows writing
to an arbitrary file-like object, but this fast path (where it is a
"real" file) is very important. It's significantly faster than calling
methods on Python objects.

Mike
|
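Since the buffer "comes in both ways", one option is a thin untyped
front end that normalizes whatever arrives into the layout a typed
helper expects. This is only a sketch of that idea, assuming an (h, w)
packed-uint32 or (h, w, 4) uint8 layout; write_png_any and
_write_png_typed are hypothetical names, not functions in the branch:

    import numpy as np

    def write_png_any(buff, file_obj, double dpi=0.0):
        arr = np.asarray(buff)
        if arr.ndim == 2 and arr.dtype == np.uint32:
            # Reinterpret packed 32-bit pixels as height x width x 4 bytes.
            h, w = arr.shape[0], arr.shape[1]
            arr = np.ascontiguousarray(arr).view(np.uint8).reshape(h, w, 4)
        # Copies only if the input isn't already C-contiguous uint8.
        arr = np.ascontiguousarray(arr, dtype=np.uint8)
        return _write_png_typed(arr, file_obj, dpi)  # hypothetical typed wrapper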
From: Nathaniel S. <nj...@po...> - 2012-12-04 00:16:39
|
On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal
<chr...@no...> wrote:
> On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
>> For the file handle, I would just write
>>
>> cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>>
>> and be done with it. This will work with any version of Python etc.
>
> yeah, that makes sense -- though what if you want to be able to
> read_to/write_from a file that is already open, and in the middle of
> the file somewhere -- would that work?
>
> I just posted a question to the Cython list, and indeed, it looks like
> there is no easy answer to the file issue.

Yeah, this is a general problem with the Python file API, trying to
hook it up to stdio is not at all an easy thing. A better version of
this code would skip that altogether like:

cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
    fobj = <object>png_get_io_ptr(s)
    pydata = PyString_FromStringAndSize(data, count)
    fobj.write(pydata)

cdef void flush_pyfile(png_structp s):
    # Not sure if this is even needed
    fobj = <object>png_get_io_ptr(s)
    fobj.flush()

# in write_png:
write_png_c(<png_byte*>pix_buffer, width, height,
            NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

But this is a separate issue :-) (and needs further fiddling to make
exception handling work).

Or if you're only going to work on real OS-level file objects anyway,
you might as well just accept a filename as a string and fopen() it
locally. Having Python do the fopen just makes your life harder for no
reason.

-n
|
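On the "further fiddling" point: the callbacks return void into libpng,
so a Python exception raised inside them has nowhere to propagate. One
untested way around that (an assumption, not something either branch
does) is to catch the exception in the callback, stash it in a
module-level variable, and re-raise it after write_png_c returns. The
png_* names are assumed to come from a cdef extern block as in the
snippet above:

    _pending_exc = None

    cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
        global _pending_exc
        if _pending_exc is not None:
            return                  # already failed once; drop further chunks
        try:
            fobj = <object>png_get_io_ptr(s)
            # Slicing a char* with an explicit length builds a Python bytes object.
            fobj.write((<char*>data)[:count])
        except BaseException as exc:
            _pending_exc = exc

    # in write_png, after the call into write_png_c:
    #     if _pending_exc is not None:
    #         exc, _pending_exc = _pending_exc, None
    #         raise exc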
From: Michael D. <md...@st...> - 2012-12-04 13:43:58
|
On 12/03/2012 07:16 PM, Nathaniel Smith wrote:
> On Mon, Dec 3, 2012 at 11:50 PM, Chris Barker - NOAA Federal
> <chr...@no...> wrote:
>> On Mon, Dec 3, 2012 at 2:21 PM, Nathaniel Smith <nj...@po...> wrote:
>>> For the file handle, I would just write
>>>
>>> cdef FILE *fp = fdopen(file_obj.fileno(), "w")
>>>
>>> and be done with it. This will work with any version of Python etc.
>> yeah, that makes sense -- though what if you want to be able to
>> read_to/write_from a file that is already open, and in the middle of
>> the file somewhere -- would that work?
>>
>> I just posted a question to the Cython list, and indeed, it looks like
>> there is no easy answer to the file issue.
> Yeah, this is a general problem with the Python file API, trying to
> hook it up to stdio is not at all an easy thing. A better version of
> this code would skip that altogether like:
>
> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>     fobj = <object>png_get_io_ptr(s)
>     pydata = PyString_FromStringAndSize(data, count)
>     fobj.write(pydata)
>
> cdef void flush_pyfile(png_structp s):
>     # Not sure if this is even needed
>     fobj = <object>png_get_io_ptr(s)
>     fobj.flush()
>
> # in write_png:
> write_png_c(<png_byte*>pix_buffer, width, height,
>             NULL, <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

This is what my original version already does in the event that the
file_obj is not a "real" file. In practice, you need to support both
methods -- the callback approach is many times slower than writing
directly to a regular old FILE object, because there is overhead at
both the libpng and Python levels, and there's no way to select a good
buffer size.

> But this is a separate issue :-) (and needs further fiddling to make
> exception handling work).
>
> Or if you're only going to work on real OS-level file objects anyway,
> you might as well just accept a filename as a string and fopen() it
> locally. Having Python do the fopen just makes your life harder for no
> reason.

There's actually a very good reason. It is difficult to deal with
Unicode in file paths from C in a portable way. On Windows, for
example, if the user's name contains non-ascii characters, you can't
write to the home directory using fopen, etc. It's doable with some
care by using platform-specific C APIs etc., but CPython has already
done all of the hard work for us, so it's easiest just to leverage that
by opening the file from Python.

Mike
|
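The practical upshot of the Unicode-path point is to let Python's
open() handle filename encodings and hand only the resulting file
object down to the C layer. A trivial sketch, assuming the write_png
signature from the earlier messages (save_png is a hypothetical
convenience name):

    def save_png(buff, fname, double dpi=0.0):
        # fname may be a unicode path; CPython's open() copes with the
        # platform filename encoding that a bare fopen() would get wrong.
        with open(fname, "wb") as fh:
            write_png(buff, fh, dpi)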
From: Chris B. - N. F. <chr...@no...> - 2012-12-04 01:02:28
|
On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:

> Yeah, this is a general problem with the Python file API, trying to
> hook it up to stdio is not at all an easy thing. A better version of
> this code would skip that altogether like:
>
> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>     fobj = <object>png_get_io_ptr(s)
>     pydata = PyString_FromStringAndSize(data, count)
>     fobj.write(pydata)

Good point -- not at all Cython-specific, but do you need libpng (or
whatever) to write to the file? can you just get a buffer with the
encoded data and write it on the Python side? Particularly if the user
wants to pass in an open file object. This might be a better API for
folks that might want to stream an image right through a web app, too.

As a lot of Python APIs take either a file name or a file-like object,
perhaps it would make sense to push that distinction down to the
Cython level:
-- if it's a filename, open it with raw C
-- if it's a file-like object, have libpng write to a buffer (bytes
object), and pass that to the file-like object in Python

anyway, not really a Cython issue, but that second object sure would
be easy on Cython....

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chr...@no...
|
From: Michael D. <md...@st...> - 2012-12-04 13:46:02
|
On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:
> On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:
>
>> Yeah, this is a general problem with the Python file API, trying to
>> hook it up to stdio is not at all an easy thing. A better version of
>> this code would skip that altogether like:
>>
>> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>>     fobj = <object>png_get_io_ptr(s)
>>     pydata = PyString_FromStringAndSize(data, count)
>>     fobj.write(pydata)
>
> Good point -- not at all Cython-specific, but do you need libpng (or
> whatever) to write to the file? can you just get a buffer with the
> encoded data and write it on the Python side? Particularly if the user
> wants to pass in an open file object. This might be a better API for
> folks that might want to stream an image right through a web app, too.

You need to support both: raw C FILE objects for speed, and writing to
a Python file-like object for flexibility. The code in master already
does this (albeit with PyCXX), and the code on my "No CXX" branch does
this as well with Cython.

> As a lot of Python APIs take either a file name or a file-like object,
> perhaps it would make sense to push that distinction down to the
> Cython level:
> -- if it's a filename, open it with raw C

Unfortunately, as stated in detail in my last e-mail, that doesn't work
with Unicode paths.

> -- if it's a file-like object, have libpng write to a buffer (bytes
> object), and pass that to the file-like object in Python

libpng does one better and allows us to stream directly to a callback
which can then write to a Python object. This prevents double
allocation of memory.

> anyway, not really a Cython issue, but that second object sure would
> be easy on Cython....

Yeah -- once I figured out how to make a real C callback function from
Cython, the contents of the callback function itself are pretty easy to
write.

Mike
|
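Pulling the pieces together, the two-path dispatch described here might
look roughly like the sketch below. The typed signature, the
shape-to-width/height mapping, and the write_png_c argument order
follow the earlier sketches in this thread rather than the actual
branch, and PyFile_CheckExact/PyFile_AsFile are the Python 2 C-API
calls (the py3 handling was deliberately left out above):

    def write_png(cnp.ndarray[cnp.uint8_t, ndim=3, mode="c"] buff not None,
                  file_obj, double dpi=0.0):
        cdef png_uint_32 height = buff.shape[0]
        cdef png_uint_32 width = buff.shape[1]
        cdef FILE *fp

        if PyFile_CheckExact(file_obj):
            # Fast path: hand libpng a real FILE* and let stdio do the buffering.
            fp = PyFile_AsFile(file_obj)
            write_png_c(<png_byte*>&buff[0, 0, 0], width, height, fp,
                        NULL, NULL, NULL, dpi)
        else:
            # Flexible path: libpng streams each chunk through the callbacks,
            # which call file_obj.write()/flush(), so BytesIO, sockets, etc. work.
            write_png_c(<png_byte*>&buff[0, 0, 0], width, height, NULL,
                        <void*>file_obj, write_to_pyfile, flush_pyfile, dpi)

With the callback path in place, streaming straight into memory (the
web-app case mentioned earlier) is just a matter of passing a file-like
object:

    import io
    buf = io.BytesIO()
    write_png(rgba_array, buf, dpi=72.0)  # rgba_array: an (h, w, 4) uint8 array
    png_bytes = buf.getvalue()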
From: Michael D. <md...@st...> - 2012-12-04 13:52:47
|
Also -- this feedback is really helpful when writing some comments in
the wrappers as to why certain things are the way they are... I'll make
sure to include rationales for the raw-file fast path and the need to
open the files on the Python side.

Mike

On 12/04/2012 08:45 AM, Michael Droettboom wrote:
> On 12/03/2012 08:01 PM, Chris Barker - NOAA Federal wrote:
>> On Mon, Dec 3, 2012 at 4:16 PM, Nathaniel Smith <nj...@po...> wrote:
>>
>>> Yeah, this is a general problem with the Python file API, trying to
>>> hook it up to stdio is not at all an easy thing. A better version of
>>> this code would skip that altogether like:
>>>
>>> cdef void write_to_pyfile(png_structp s, png_bytep data, png_size_t count):
>>>     fobj = <object>png_get_io_ptr(s)
>>>     pydata = PyString_FromStringAndSize(data, count)
>>>     fobj.write(pydata)
>>
>> Good point -- not at all Cython-specific, but do you need libpng (or
>> whatever) to write to the file? can you just get a buffer with the
>> encoded data and write it on the Python side? Particularly if the user
>> wants to pass in an open file object. This might be a better API for
>> folks that might want to stream an image right through a web app, too.
>
> You need to support both: raw C FILE objects for speed, and writing to
> a Python file-like object for flexibility. The code in master already
> does this (albeit with PyCXX), and the code on my "No CXX" branch does
> this as well with Cython.
>
>> As a lot of Python APIs take either a file name or a file-like object,
>> perhaps it would make sense to push that distinction down to the
>> Cython level:
>> -- if it's a filename, open it with raw C
>
> Unfortunately, as stated in detail in my last e-mail, that doesn't work
> with Unicode paths.
>
>> -- if it's a file-like object, have libpng write to a buffer (bytes
>> object), and pass that to the file-like object in Python
>
> libpng does one better and allows us to stream directly to a callback
> which can then write to a Python object. This prevents double
> allocation of memory.
>
>> anyway, not really a Cython issue, but that second object sure would
>> be easy on Cython....
>
> Yeah -- once I figured out how to make a real C callback function from
> Cython, the contents of the callback function itself are pretty easy to
> write.
>
> Mike
>
> ------------------------------------------------------------------------------
> LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
> Remotely access PCs and mobile devices and provide instant support
> Improve your efficiency, and focus on delivering more value-add services
> Discover what IT Professionals Know. Rescue delivers
> http://p.sf.net/sfu/logmein_12329d2d
> _______________________________________________
> Matplotlib-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/matplotlib-devel
|