From: Martin M. <mmo...@gm...> - 2013-10-10 14:20:18
Michael Droettboom wrote:
> On 10/10/2013 09:47 AM, Martin MOKREJŠ wrote:
>> Benjamin Root wrote:
>>>
>>> On Thu, Oct 10, 2013 at 9:05 AM, Martin MOKREJŠ <mmo...@gm...> wrote:
>>>
>>> Hi,
>>> rendering some of my charts takes almost 50GB of RAM. I believe below is a stacktrace
>>> of one such situation when it already took 15GB. Would somebody comment on what
>>> matplotlib is doing at that very moment? Why the recursion?
>>>
>>> The charts had to have 262422 data points in a 2D scatter plot, and each point is assigned
>>> its own color. They are in batches so that there are 153 distinct colors, but nevertheless
>>> I assigned a color value to each data point. There are also 153 legend items (one color
>>> won't be used).
>>>
>>> ^CTraceback (most recent call last):
>>> ...
>>>     _figure.savefig(filename, dpi=100)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1421, in savefig
>>>     self.canvas.print_figure(*args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/backend_bases.py", line 2220, in print_figure
>>>     **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 505, in print_png
>>>     FigureCanvasAgg.draw(self)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_agg.py", line 451, in draw
>>>     self.figure.draw(self.renderer)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>     draw(artist, renderer, *args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/figure.py", line 1034, in draw
>>>     func(*args)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>     draw(artist, renderer, *args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/axes.py", line 2086, in draw
>>>     a.draw(renderer)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>     draw(artist, renderer, *args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 718, in draw
>>>     return Collection.draw(self, renderer)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/artist.py", line 54, in draw_wrapper
>>>     draw(artist, renderer, *args, **kwargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 276, in draw
>>>     offsets, transOffset, self.get_facecolor(), self.get_edgecolor(),
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/collections.py", line 551, in get_edgecolor
>>>     return self._edgecolors
>>> KeyboardInterrupt
>>> ^CError in atexit._run_exitfuncs:
>>> Traceback (most recent call last):
>>>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>>>     func(*targs, **kargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>>>     gc.collect()
>>> KeyboardInterrupt
>>> Error in sys.exitfunc:
>>> Traceback (most recent call last):
>>>   File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
>>>     func(*targs, **kargs)
>>>   File "/usr/lib64/python2.7/site-packages/matplotlib/_pylab_helpers.py", line 90, in destroy_all
>>>     gc.collect()
>>> KeyboardInterrupt
>>>
>>> ^C
>>>
>>> Any clues what the code is doing? I use mpl-1.3.0.
>>> Thank you,
>>> Martin
>>>
>>>
>>> Unfortunately, that stacktrace isn't very useful. There is no recursion there, but rather
>>> the perfectly normal drawing of the figure object that has a child axes, which has child
>>> collections, which have child artist objects.
>>>
>>> Without the accompanying code, it would be difficult to determine where the memory hog is.
>>
>> Could there be places where gc.collect() could be introduced? Are there places where matplotlib
>> could del() unnecessary objects right away? I think the problem is with huge lists or pythonic
>> dicts. I could save 10GB of RAM when I converted one python dict to a bsddb3 file having just
>> 10MB on disk. I speculate matplotlib in that code keeps the data in some huge list or, more
>> likely, a dict, and that this is the same issue.
>>
>> Are you sure you cannot see where the problem is? It happens (is visible) only with a huge
>> number of dots, of course.
>
> Matplotlib generally keeps data in Numpy arrays, not lists or
> dictionaries (though given that matplotlib predates Numpy, there are
> some corner cases we've found recently where arrays are converted to
> lists and back unintentionally).

Just a brief note. I don't use Numpy myself in my code, so consider that
while replicating my use case. ;) The code is merely what I think Tony Yu
or Chao Yue, or somebody else (sorry, I don't remember now), proposed to me
on this list in the past. I am writing this really off the top of my head,
so maybe I remember rubbish. ;)

Martin
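
For anyone comparing plain Python lists with packed arrays while replicating this, here is a minimal sketch using only the standard library. It is an illustration added for this thread, not code from any of the posters, and it says nothing about what matplotlib does internally. The point count comes from the original report; the helper names `list_footprint` and `packed_footprint` are hypothetical, and `sys.getsizeof` only approximates real memory use.

```python
import sys
from array import array

N_POINTS = 262422  # number of data points quoted in the original report


def list_footprint(values):
    """Approximate bytes for a list of floats: the list plus every float object."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def packed_footprint(values):
    """Approximate bytes for the same values packed as C doubles."""
    return sys.getsizeof(array("d", values))


# One coordinate per point, stored first as Python float objects.
xs = [float(i) for i in range(N_POINTS)]

as_list = list_footprint(xs)
as_array = packed_footprint(xs)

print("list of floats: %.1f MB" % (as_list / 1e6))
print("array('d'):     %.1f MB" % (as_array / 1e6))
```

The same reasoning applies to per-point color values: a list holding one color tuple per point carries per-object overhead for every tuple and every number in it, whereas a packed array stores only the raw values.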