From: vehemental <jim...@gm...> - 2009-06-17 14:05:53
Hello,

I'm using matplotlib for various tasks beautifully... but on some occasions I have to visualize large datasets (in the range of 10M data points), using imshow or regular plots, and the system starts to choke a bit at that point.

I would like to be consistent and not use different tools for basically similar tasks, so I'd like some pointers regarding rendering performance. I would also be interested in getting involved in development if there is something to be done.

To active developers: what's the general feeling? Does matplotlib have room to spare in its rendering performance, or is it pretty much tied to the speed of Agg right now? Is there something to gain from using the multiprocessing module now included by default in Python 2.6? Or even going as far as using something like pyGPU for fast vectorized computations?

I've seen previous discussions about OpenGL becoming a backend at some point in the future. Would it really stand up compared to the current backends? Are there any clues about that right now?

Thanks for any input! :D
Bye
From: Nicolas R. <Nic...@lo...> - 2009-06-17 14:26:09
Hello,

To give you some hints on performance using OpenGL, you can have a look at glumpy: http://www.loria.fr/~rougier/tmp/glumpy.tgz (it requires pyglet for the OpenGL backend).

It is not finished yet, but it is usable. The current version can visualize static numpy float32 arrays up to 8000x8000 and dynamic numpy float32 arrays of around 500x500, depending on GPU hardware (dynamic means that you update the image at around 30 fps).

The idea behind glumpy is to translate a numpy array directly into a texture and to use shaders to do the colormap transformation and filtering (nearest, bilinear or bicubic).

Nicolas
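glumpy performs the colormap step in a shader. As a rough CPU-side illustration of what that step computes (not glumpy's actual code), here is a minimal numpy sketch. It assumes a linear normalization between vmin and vmax and an N-entry RGB lookup table; the function name `apply_colormap` and the grey lookup table are hypothetical.

```python
import numpy as np

def apply_colormap(Z, lut, vmin=None, vmax=None):
    """Map a 2-D float array to RGB through a lookup table (hypothetical helper).

    Z   : 2-D array of scalar values
    lut : (N, 3) float32 array of RGB colors in [0, 1]
    """
    Z = np.asarray(Z, dtype=np.float32)
    vmin = Z.min() if vmin is None else vmin
    vmax = Z.max() if vmax is None else vmax
    # Normalize to [0, 1]; guard against a constant image (vmax == vmin).
    span = (vmax - vmin) or 1.0
    t = np.clip((Z - vmin) / span, 0.0, 1.0)
    # Nearest-entry lookup: scale to table indices, then gather the colors.
    idx = (t * (len(lut) - 1)).round().astype(np.intp)
    return lut[idx]  # shape (H, W, 3)

# Hypothetical 256-entry grey ramp used as the lookup table.
grey = np.repeat(np.linspace(0.0, 1.0, 256, dtype=np.float32)[:, None], 3, axis=1)
```

The per-pixel work is one normalization and one table lookup; doing it on the GPU moves that work next to the texture that is being drawn anyway, instead of producing an RGB array in main memory first.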
From: Michael D. <md...@st...> - 2009-06-17 14:34:13
vehemental wrote:
> Hello,
>
> I'm using matplotlib for various tasks beautifully...but on some occasions,
> I have to visualize large datasets (in the range of 10M data points) (using
> imshow or regular plots)...system start to choke a bit at that point...
>

The first thing I would check is whether your system becomes starved for memory at this point and virtual memory swapping kicks in.

A common technique for faster plotting of image data is to downsample it before passing it to matplotlib. Same with line plots -- they can be decimated. There is newer, faster path simplification code in SVN trunk that may help with complex line plots (when the path.simplify rcParam is True). I would suggest starting with that as a baseline to see how much performance it already gives over the released version.

> I would like to be consistent somehow and not use different tools for
> basically similar tasks...
> so I'd like some pointers regarding rendering performance...as I would be
> interested to be involved in dev is there is something to be done....
>
> To active developers, what's the general feel does matplotlib have room to
> spare in its rendering performance?...
>

I've spent a lot of time optimizing the Agg backend (which is already one of the fastest software-only approaches out there), and I'm out of obvious ideas. But a fresh set of eyes may find new things. An advantage of Agg that shouldn't be overlooked is that it works identically everywhere.

> or is it pretty tied down to the speed of Agg right now?
> Is there something to gain from using the multiprocessing module now
> included by default in 2.6?
>

Probably not. If the work of rendering were to be divided among cores, that would probably have to be done at the C++ level to see any gains. As it is, plotting many points generally tends to be limited by memory bandwidth, not processor speed.

> or even go as far as using something like pyGPU for fast vectorized
> computations...?
>

Perhaps. But again, the computation isn't the bottleneck -- it's usually a memory bandwidth starvation issue in my experience. Using a GPU may only make matters worse. Note that I consider that approach distinct from just using OpenGL to colormap and render the image as a texture. That approach may bear some fruit -- but only for image plots. Vector graphics acceleration with GPUs is still difficult to do in high quality across platforms and chipsets while also beating software for speed.

Hope this helps,
Mike

-- 
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
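Two of Mike's suggestions lend themselves to a short sketch: checking how much memory the data itself occupies, and downsampling an image before handing it to imshow. For scale, 10,000,000 float64 values take 10,000,000 × 8 = 80 MB, so the x and y arrays of a 10M-point line plot hold 160 MB before any copies matplotlib may make. The helper names below are hypothetical, and block averaging is just one simple downsampling choice, assumed here for illustration.

```python
import numpy as np

def footprint_mb(*arrays):
    """Total memory held by the given numpy arrays, in megabytes (10**6 bytes)."""
    return sum(a.nbytes for a in arrays) / 1e6

def block_downsample(img, factor):
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks.

    Rows/columns that don't fill a whole block are cropped off the edges.
    """
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    cropped = img[: h2 * factor, : w2 * factor]
    # Give each block its own pair of axes, then average over those axes.
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))
```

The reduced array is what would be passed to `imshow`. For line plots, the rcParam Mike mentions can be switched on with `matplotlib.rcParams['path.simplify'] = True`.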
From: Jimmy P. <jim...@gm...> - 2009-06-17 14:56:13
2009/6/17 Michael Droettboom <md...@st...>
> vehemental wrote:
>> I have to visualize large datasets (in the range of 10M data points)
>> (using imshow or regular plots)...system start to choke a bit at that point...
>
> The first thing I would check is whether your system becomes starved for
> memory at this point and virtual memory swapping kicks in.

The Python process is sitting at around 300 MB of memory consumption, so there should be plenty of memory left... but I will look more closely at what's happening. I would assume the memory bandwidth is not very high, given how cheap the computer I'm using is :D

> A common technique for faster plotting of image data is to downsample it
> before passing it to matplotlib. Same with line plots -- they can be
> decimated. There is newer/faster path simplification code in SVN trunk that
> may help with complex line plots (when the path.simplify rcParam is True).
> I would suggest starting with that as a baseline to see how much
> performance it already gives over the released version.

Yes, that totally makes sense: no need to visualize 3 million points if you can only display 200,000. I'm already doing that to some extent, but the downsampling itself takes time... at least I have ways to reduce that time if needed. I'll try the SVN version and see if I get some improvements.

> I've spent a lot of time optimizing the Agg backend (which is already one
> of the fastest software-only approaches out there), and I'm out of obvious
> ideas. [...]
> Perhaps. But again, the computation isn't the bottleneck -- it's usually a
> memory bandwidth starvation issue in my experience. [...]

So if I understand you correctly, the matplotlib/Agg combination is not much slower than a C plotting library that also used Agg to render would be, and we are talking more about hardware limitations, right?

>> I've seen around previous discussions about OpenGL being a backend in some
>> future...
>> would it really stand up compared to the current backends? is there clues
>> about that right now?

Thanks Nicolas, I'll take a closer look at glumpy. I can probably gather some info by comparing an imshow to the equivalent in OpenGL.

> Hope this helps,

It did! Thanks,
jimmy
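Jimmy's point (no need to draw 3 million points when only about 200,000 can be displayed) is often handled with min/max decimation: split the samples into roughly one bucket per horizontal pixel and keep only each bucket's smallest and largest value, so narrow spikes survive, unlike keeping every k-th sample. The sketch below illustrates that technique under the assumption that x is sorted; it is not matplotlib's path simplification, and `minmax_decimate` is a hypothetical name.

```python
import numpy as np

def minmax_decimate(x, y, n_buckets):
    """Reduce (x, y) to at most 2 * n_buckets points, keeping each bucket's min and max.

    Assumes x is sorted in increasing order.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(y)
    if n <= 2 * n_buckets:
        return x, y  # already small enough, nothing to do
    # Bucket boundaries: n_buckets roughly equal slices of the index range.
    # Since n > 2 * n_buckets, every bucket holds at least 2 samples.
    edges = np.linspace(0, n, n_buckets + 1).astype(np.intp)
    keep = []
    for start, stop in zip(edges[:-1], edges[1:]):
        seg = y[start:stop]
        keep.append(start + np.argmin(seg))
        keep.append(start + np.argmax(seg))
    # np.unique sorts the indices (so the line is drawn left to right)
    # and drops duplicates, e.g. when a bucket is constant.
    keep = np.unique(keep)
    return x[keep], y[keep]
```

A reasonable starting value for `n_buckets` is the width of the axes in pixels, e.g. `ax.plot(*minmax_decimate(x, y, 2000))`. The Python loop runs once per bucket, not once per sample, so its cost stays small next to the drawing it saves.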
From: Jimmy P. <jim...@gm...> - 2009-06-17 16:07:23
The demo-animation.py script worked beautifully out of the box at 150 fps. I increased the array size a bit, to 1200x1200, and it still runs at around 40 fps... very interesting!

jimmy
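To compare numbers like these with an imshow redraw loop, as Jimmy suggests, both sides need the same frames-per-second measure. A minimal, hypothetical timing helper (Python 3, standard library only) could look like this:

```python
import time

def measure_fps(render_frame, n_frames=50, warmup=3):
    """Call render_frame() repeatedly and return the average frames per second.

    A few warm-up calls are made first and not timed, so one-off setup costs
    (first-draw allocations, caches) don't skew the result.
    """
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")
```

With the Agg backend, one frame might be `im.set_data(new_array)` followed by `fig.canvas.draw()`. That measures off-screen rendering only, not the cost of pushing pixels to a window, so it is not an exact match for an on-screen OpenGL demo.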
From: Gökhan S. <gok...@gm...> - 2009-06-17 15:10:38
Nicolas,

How do you run the demo scripts in glumpy? I get errors both with IPython's run and with python script_name.py:

In [1]: run demo-simple.py
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

/home/gsever/glumpy/demo-simple.py in <module>()
     20 #
     21 # -----------------------------------------------------------------------------
---> 22 import glumpy
     23 import numpy as np
     24 import pyglet, pyglet.gl as gl

/home/gsever/glumpy/glumpy/__init__.py in <module>()
     23 import colormap
     24 from color import Color
---> 25 from image import Image
     26 from trackball import Trackball
     27 from app import app, proxy

/home/gsever/glumpy/glumpy/image.py in <module>()
     25
     26
---> 27 class Image(object):
     28     ''' '''
     29     def __init__(self, Z, format=None, cmap=colormap.IceAndFire, vmin=None,

/home/gsever/glumpy/glumpy/image.py in Image()
    119         return self._cmap
    120
--> 121     @cmap.setter
    122     def cmap(self, cmap):
    123         ''' Colormap to be used to represent the array. '''

AttributeError: 'property' object has no attribute 'setter'
WARNING: Failure executing file: <demo-simple.py>

[gsever@ccn glumpy]$ python demo-cube.py
Traceback (most recent call last):
  File "demo-cube.py", line 22, in <module>
    import glumpy
  File "/home/gsever/glumpy/glumpy/__init__.py", line 25, in <module>
    from image import Image
  File "/home/gsever/glumpy/glumpy/image.py", line 27, in <module>
    class Image(object):
  File "/home/gsever/glumpy/glumpy/image.py", line 121, in Image
    @cmap.setter
AttributeError: 'property' object has no attribute 'setter'

I have Python 2.5.2...
From: Nicolas R. <Nic...@lo...> - 2009-06-17 15:29:19
I think the property setter decorator is only available in Python 2.6. I modified the sources and put them at the same place. It should be OK now.

Nicolas
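`property.setter` (along with `.getter` and `.deleter`) was added in Python 2.6, which is why the `@cmap.setter` line fails on 2.5.2. The classic call form `property(fget, fset, doc=...)` works on 2.5 and on every later version. A minimal sketch using a hypothetical stand-in class, not necessarily how the glumpy sources were changed:

```python
class Image(object):
    """Hypothetical stand-in for a class exposing a 'cmap' property."""

    def __init__(self, cmap=None):
        self._cmap = cmap

    def _get_cmap(self):
        return self._cmap

    def _set_cmap(self, cmap):
        # Hypothetical hook: validation or re-uploading the colormap would go here.
        self._cmap = cmap

    # Classic form accepted by Python 2.5: no @cmap.setter decorator required.
    cmap = property(_get_cmap, _set_cmap,
                    doc="Colormap to be used to represent the array.")
```

Reading and assigning `img.cmap` behaves exactly as with the decorator form; only the way the property object is built differs.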