## Re: [Matplotlib-users] large data sets and performance

From: John Hunter - 2004-02-12 04:26:33

```
>>>>> "Peter" == Peter Groszkowski writes:

    Peter> Although the data I'm playing with right now is monotonic
    Peter> (in x), I cannot assume that this will always be the case,
    Peter> and need an efficient solutions for all situations.

Agreed.

    Peter> the 'lod' option in: l = plot(arange(10000),
    Peter> arange(20000,30000)) #dummy data.. 10,000 pairs set(l,
    Peter> 'lod', True) option does not work for me. It's still
    Peter> roughly 1000 points/second

I left out a *critical* detail. The new gd backend code implements
antialiased drawing by default. Very slow. Check out the numbers
below, based on the demo script you supplied:

    backend = 'GD'
    import matplotlib
    matplotlib.use(backend)
    from matplotlib.matlab import *
    l = plot(arange(10000), arange(20000,30000)) #dummy data.. 10,000 pairs
    lod, aa = False, False
    print 'Backend: %s, LOD %d, AA %d' % (backend, lod, aa)
    set(l, 'lod', lod, 'antialiased', aa)
    savefig('test')

    Backend: GD, LOD 1, AA 1
    23.770u 0.030s 0:23.77 100.1%   0+0k 0+0io 793pf+0w

    Backend: GD, LOD 0, AA 1
    23.500u 0.020s 0:23.52 100.0%   0+0k 0+0io 793pf+0w

    Backend: GD, LOD 1, AA 0
    0.270u 0.000s 0:00.28 96.4%     0+0k 0+0io 794pf+0w

    Backend: GD, LOD 0, AA 0
    0.240u 0.030s 0:00.27 100.0%    0+0k 0+0io 794pf+0w

In other words, if you are using the new GD in its default
configuration, you are paying a *100 fold performance hit* for
antialiased line drawing. Without it, I can draw and save your figure
(including python startup time, etc, etc) in 0.25s on a 2GHz Pentium 4.
Is this in the ballpark for you, performance wise?

While we're on the subject of performance, I took the opportunity to
test the other backends. Note the numbers are not strictly comparable
(discussed below) but are informative.

    Backend: Paint, LOD 0, AA 0
    0.520u 0.000s 0:00.52 100.0%    0+0k 0+0io 726pf+0w

    Backend: PS, LOD 0, AA 0
    1.030u 0.040s 0:01.08 99.0%     0+0k 0+0io 582pf+0w

    Backend: Agg, LOD 0, AA 0
    0.320u 0.010s 0:00.28 117.8%    0+0k 0+0io 681pf+0w

    Backend: GTK, LOD 0, AA 0
    0.650u 0.020s 0:00.66 101.5%    0+0k 0+0io 3031pf+0w

The GTK results are in xvfb so it appears to be a no-go for you even
if we could figure out how to print to stdout. These numbers are
repeatable and consistent.

Worthy of comment:

  * GD with antialiasing off wins

  * paint is not as fast as I hoped

  * GTK is not as fast as I thought

  * Agg is an interesting case. It is doing antialiased drawing
    despite the AA 0 flag because I haven't made this conditional in
    the backend. It currently draws antialiased unconditionally. But
    it hasn't implemented text yet. So it's not strictly comparable,
    but it is noteworthy that it is 100 times faster than GD at AA
    lines. It remains to be seen what speed we can get with plain
    vanilla aliased rendering.

My guess is: when you turn off antialiasing you'll be a whole lot
happier. Let me know.

The last thing I looked at was how the GD numbers scale with line
size. Below, N is the number of data points (with LOD false the
numbers are very close to these results where LOD is true):

    Backend: GD, LOD 1, AA 0, N 10000
    0.230u 0.040s 0:00.24 112.5%    0+0k 0+0io 794pf+0w

    Backend: GD, LOD 1, AA 0, N 20000
    0.260u 0.060s 0:00.31 103.2%    0+0k 0+0io 794pf+0w

    Backend: GD, LOD 1, AA 0, N 40000
    0.390u 0.030s 0:00.41 102.4%    0+0k 0+0io 794pf+0w

    Backend: GD, LOD 1, AA 0, N 80000
    0.590u 0.060s 0:00.60 108.3%    0+0k 0+0io 815pf+0w

    Backend: GD, LOD 1, AA 0, N 160000
    1.070u 0.090s 0:01.13 102.6%    0+0k 0+0io 818pf+0w

JDH
```
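
One way to read the GD scaling numbers in the message is to separate fixed startup cost from per-point drawing cost. The sketch below is not part of the original post: it fits a straight line to the elapsed times quoted above, on the assumption that run time is roughly `overhead + cost_per_point * N`, and computes the antialiasing slowdown from the N = 10000 runs. The names `gd_scaling`, `fit_line` and `aa_slowdown` are made up for illustration, and five data points only support a rough estimate.

```python
# Elapsed wall-clock seconds copied from the GD timings above (LOD 1, AA 0).
gd_scaling = [
    (10000, 0.24),
    (20000, 0.31),
    (40000, 0.41),
    (80000, 0.60),
    (160000, 1.13),
]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Assumption: a linear model is adequate over this range of N.
intercept, slope = fit_line(gd_scaling)

# Antialiased vs. aliased GD runs at N = 10000 (elapsed seconds, LOD 1).
aa_slowdown = 23.77 / 0.28

print("fixed overhead: %.2f s" % intercept)
print("marginal cost:  %.1f microseconds per point" % (slope * 1e6))
print("AA slowdown:    %.0fx" % aa_slowdown)
```

On these figures the fit gives roughly 0.18 s of fixed overhead (interpreter startup, imports, file output) plus about 6 microseconds per additional point, and the antialiased run comes out about 85 times slower than the aliased one, which is the order of magnitude the message rounds to 100.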