From: Andrew H. <HA...@no...> - 2010-03-18 23:36:20
Attachments:
semilogPerformance.py
|
I've observed a significant difference in the time required by different plotting functions. With a plot of 5000 random data points (all positive, non-zero), plt.semilogx takes 3.5 times as long as plt.plot. (Data for the case of saving to PDF; the ratio changes to about 3.1 for PNG on my machine.)

I used cProfile (script attached) and found several significant differences between the profiles of each plotting command. On my first analysis, it appears that most of the difference is due to increased use of mathtext in semilogx:

                                Plotting command
====================================================================
cumtime (s)                     plot      semilogx  semilogy  loglog
====================================================================
total running time              0.618     2.192     0.953     1.362
axis.py:181(draw)               0.118     1.500     0.412     0.569
text.py:504(draw)               0.056     1.353     0.290     0.287
mathtext.py:2765(__init__)      0.000     1.018     0.104     0.103
mathtext.py:2772(parse)         ---       1.294     0.143     0.254
pyparsing.py:1018(parseString)  ---       0.215     0.216     0.221
pyparsing.py:3129(oneOf)        ---       0.991     ---       ---
pyparsing.py:3147(<lambda>)     ---       0.358     ---       ---
lines.py:918(_draw_solid)       0.243     0.358     0.234     0.352
====================================================================

It seems that semilogx could be made as fast as semilogy since they have to do the same amount of work, but I'm not sure where the differences lie. Can anyone suggest where I should look first?

Much thanks,

Andrew Hawryluk

matplotlib.__version__ = '0.99.1'
Windows XP Professional, Version 2002, Service Pack 3
Intel Pentium 4 CPU 3.00 GHz, 2.99 GHz, 0.99 GB of RAM
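The attached script is not reproduced in this archive. As a rough sketch of the kind of measurement described above, the following profiles one plotting command at a time with cProfile. The helper names (`profile_call`, `draw_and_save`, `profile_all_commands`), the Agg backend, and the output file names are assumptions made for illustration, not taken from semilogPerformance.py.

```python
import cProfile
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return the collected pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    return pstats.Stats(profiler)


def draw_and_save(command, filename):
    """Plot 5000 positive random points with the named pyplot command."""
    # Imports are local so the profiling helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # assumption: a non-interactive backend
    import matplotlib.pyplot as plt
    import numpy as np

    # Small offset keeps every value strictly positive for the log axes.
    x = np.random.rand(5000) + 1e-3
    y = np.random.rand(5000) + 1e-3
    fig = plt.figure()
    getattr(plt, command)(x, y)  # plt.plot, plt.semilogx, ...
    fig.savefig(filename)
    plt.close(fig)


def profile_all_commands():
    """Print the total profiled time for each of the four commands."""
    for command in ("plot", "semilogx", "semilogy", "loglog"):
        stats = profile_call(draw_and_save, command, command + ".png")
        # total_tt is the total time pstats reports for the profiled run.
        print("%-10s %.3f s" % (command, stats.total_tt))
```

Calling `profile_all_commands()` runs all four commands in one process, which is the setup the timings in this message describe. Calling `stats.sort_stats("cumulative").print_stats(20)` on any of the returned objects prints the functions with the largest cumulative times.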
From: Gökhan S. <gok...@gm...> - 2010-03-19 15:39:15
|
On Thu, Mar 18, 2010 at 6:21 PM, Andrew Hawryluk <HA...@no...> wrote:
> I've observed a significant difference in the time required by different
> plotting functions. [...]

Hello,

How did you get the cumtime listing? The output of the run doesn't produce a cumulative sum table as you showed here.
================================================================================
Platform   : Linux-2.6.31.9-174.fc12.i686.PAE-i686-with-fedora-12-Constantine
Python     : ('CPython', 'tags/r262', '71600')
NumPy      : 1.5.0.dev8038
Matplotlib : 1.0.svn
================================================================================

--
Gökhan
From: Andrew H. <HA...@no...> - 2010-03-19 16:26:57
|
> Hello,
> How did you get the cumtime listing? The output of the run doesn't produce a
> cumulative sum table as you showed here.
> Gökhan

No, it doesn't. The output of the run is four huge cProfile listings, one for each plotting command tested. I manually searched the data for long cumtime values that differed between the plots and typed the table myself.

I have also confirmed the speed differences on matplotlib 0.99.0 under Ubuntu 9.10:

plot      0.629 CPU seconds
semilogx  3.430 CPU seconds
semilogy  1.044 CPU seconds
loglog    1.479 CPU seconds

I'll try to figure out why semilogx uses so much more mathtext than semilogy, but if anyone familiar with the code is curious enough to look into it they will probably beat me to the answer.

Andrew
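The manual search described above could also be done programmatically. The sketch below is one possible way, not what was actually done for the table: it pulls the cumulative time of selected functions out of a `pstats.Stats` object and lays the values out per plotting command. The function names `cumtime_rows` and `format_table` are made up for this example, and it relies on the `stats` dictionary of `pstats.Stats`, whose values are tuples with cumtime in the fourth position.

```python
import cProfile
import os
import pstats


def cumtime_rows(stats, wanted):
    """Map 'file.py:line(func)' -> cumtime for each 'file.py:func' in wanted."""
    wanted = set(wanted)
    rows = {}
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, lineno, funcname), entry in stats.stats.items():
        base = os.path.basename(filename)
        if "%s:%s" % (base, funcname) in wanted:
            rows["%s:%d(%s)" % (base, lineno, funcname)] = entry[3]
    return rows


def format_table(results):
    """Format {command: rows} as text, using '---' where a function is absent."""
    commands = list(results)
    labels = sorted({label for rows in results.values() for label in rows})
    lines = ["%-32s" % "cumtime (s)" + "".join("%-10s" % c for c in commands)]
    for label in labels:
        cells = []
        for command in commands:
            value = results[command].get(label)
            cells.append("%-10s" % ("---" if value is None else "%.3f" % value))
        lines.append("%-32s" % label + "".join(cells))
    return "\n".join(line.rstrip() for line in lines)
```

Given one `pstats.Stats` object per plotting command, something like `format_table({cmd: cumtime_rows(s, ["axis.py:draw", "text.py:draw", "mathtext.py:parse"]) for cmd, s in all_stats.items()})` would produce rows in the same shape as the hand-typed table, where `all_stats` is a hypothetical dictionary of the four profiles.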
From: Michael D. <md...@st...> - 2010-03-19 16:39:44
|
This is indeed a very interesting result and I am able to reproduce similar ratios for total running time.

However, I think the semilogx result is somewhat of a red herring. If you change the order of the tests in your script, you'll notice that the first "*log*" plot always takes the longest run time. If you run each test in a separate process, all of the "*log*" run times are approximately equal (with loglog being slightly slower).

The reason for this is the caching of mathtext expressions. I agree that mathtext is the bottleneck -- but mathtext expressions are only parsed and rendered the first time they are encountered, and simply pulled from a cache after that.

It's sort of a "known issue" that mathtext is slow-ish. It's a very function-call heavy and object-oriented bit of code and most attempts at optimization seem to lead to too much uglification. The algorithms themselves are from TeX, so I don't know if there's much room for improvement, but there is something about the translation from Pascal/C to Python that creates a very different performance profile.

An interesting experiment may be to disable the mathtext rendering for log plots (by setting the axis formatters to something static) and compare those numbers. That would give a better sense of the overhead of merely log-transforming the points and the transformation system itself. I don't think a factor of 2 is too problematic, given all of the extra work that has to be done to maintain two copies of the data, the extra care needed to calculate xlim and ylim, etc.

Mike

Andrew Hawryluk wrote:
> I've observed a significant difference in the time required by different
> plotting functions. [...]

--
Michael Droettboom
Science Software Branch
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA
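The two experiments suggested in this reply (running each test in a separate process so no mathtext cache is shared, and replacing the log tick labels with something static) could be combined roughly as follows. This is only a sketch under stated assumptions: `build_child_code`, `run_child`, and `compare` are hypothetical helper names, the Agg backend and PNG output are arbitrary choices, and `FormatStrFormatter("%g")` is just one way to get plain-text labels that bypass mathtext.

```python
import subprocess
import sys

# Script executed in a fresh interpreter, so every run starts with an
# empty mathtext cache. Only plotting and saving are timed, not imports.
CHILD_TEMPLATE = '''
import time
import matplotlib
matplotlib.use("Agg")  # assumption: a non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter, NullFormatter

x = np.random.rand(5000) + 1e-3  # strictly positive for the log axes
y = np.random.rand(5000) + 1e-3
start = time.perf_counter()
fig, ax = plt.subplots()
getattr(ax, {command!r})(x, y)
if {static_labels!r}:
    # Plain-text tick labels, so no mathtext is parsed or rendered.
    for axis in (ax.xaxis, ax.yaxis):
        axis.set_major_formatter(FormatStrFormatter("%g"))
        axis.set_minor_formatter(NullFormatter())
fig.savefig({filename!r})
print(time.perf_counter() - start)
'''


def build_child_code(command, static_labels, filename):
    """Fill in the child script for one plotting command."""
    return CHILD_TEMPLATE.format(
        command=command, static_labels=static_labels, filename=filename)


def run_child(code):
    """Run code in a new Python process and return the last number it prints."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip().splitlines()[-1])


def compare(commands=("plot", "semilogx", "semilogy", "loglog")):
    """Time each command with default and with static tick labels."""
    for command in commands:
        for static_labels in (False, True):
            code = build_child_code(command, static_labels, command + ".png")
            label = "static" if static_labels else "default"
            print("%-10s %-8s %.3f s" % (command, label, run_child(code)))
```

Calling `compare()` prints one line per command and label style. If the explanation in this message holds, the "default" times for the three log commands should come out close to each other, and the difference between "default" and "static" gives an estimate of the mathtext cost.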