From: Boris B. <ba...@en...> - 2008-08-10 13:06:24
|
Hi,

I have lots of data acquired via analogue-to-digital conversion. The data is consequently represented as integers (often 16-bit resolution). To obtain the correct signal and plot it, these data must of course be multiplied by a floating-point scale factor. This seems potentially wasteful of resources (time and memory), especially as I would prefer to keep the original data untouched.

It occurs to me that a more efficient plotting method would be to plot the original data but scale the axes by the appropriate factor. In that way a simple numpy array view could be passed to plot. Does a method for doing this exist? I think I can do it in a rather convoluted way by plotting the original data and then superimposing empty axes at the adjusted scale. However, I haven't yet tested this, and I'm a bit skeptical about the overhead of two plots. Another possibility might be the units mechanism, but according to the documentation that is discouraged, and it might be awkward to implement.

If the possibility doesn't exist, I wonder whether it might be feasible - and not too difficult - to add to the axis methods? One could add a scale parameter with a default value of 1 that should not affect existing code.

Boris |
From: Eric F. <ef...@ha...> - 2008-08-10 21:03:04
|
Boris Barbour wrote:
> Hi,
>
> I have lots of data acquired via analogue to digital conversion. The data
> is consequently represented as integers (often 16 bit resolution). To
> obtain the correct signal and plot it, these data must of course be
> multiplied by a floating point scale factor. This seems potentially
> wasteful of resources (time and memory), especially as I would prefer to
> keep the original data untouched.

I don't understand this last clause; scaling your original integer data prior to plotting does not in any way inhibit your storage and use of that original integer data.

> It occurs to me that a more efficient plotting method would be to plot the
> original data but scale the axes by the appropriate factor. In that way a
> simple numpy array view could be passed to plot. Does a method for doing
> this exist? I think I can do it in a rather convoluted way by plotting the
> original data and then superimposing empty axes at the adjusted scale.
> However, I haven't yet tested this and I'm a bit skeptical about the
> overhead of two plots. Another possibility might be the units mechanism,
> but according to the documentation that is discouraged, and it might be
> awkward to implement.
>
> If the possibility doesn't exist, I wonder whether it might be feasible -
> and not too difficult - to add to the axis methods? One could add a scale
> parameter with a default value of 1 that should not affect existing code.

For ordinary plots in matplotlib the data will be converted to double precision anyway, and the time required for you to do your own scaling and conversion is utterly negligible compared to the total plotting time. I don't think it will make any difference in memory usage, either. Matplotlib uses asarray(), so there will not be a copy if the input is already a double-precision array.

It sounds like you may be thinking about optimizations in the wrong place. Are you actually running up against speed or memory problems?

Eric |
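[Editor's note: a quick sketch, not from the original message, illustrating the asarray() behaviour Eric describes - converting an int16 array to double precision makes exactly one float64 copy, while an array that is already double precision passes through asarray() untouched.]

```python
import numpy as np

# Hypothetical raw ADC trace: 16-bit integer counts
counts = (np.random.rand(1000) * 4096).astype(np.int16)

# What mpl effectively does internally: asarray() to double precision.
as_double = np.asarray(counts, dtype=float)   # makes one float64 copy
print(as_double.dtype)                        # float64

# If you hand it doubles already, asarray() is a no-op (no copy at all),
# so pre-scaling the data yourself costs nothing extra inside mpl.
prescaled = counts * 0.001                    # your own scaling -> float64
print(np.asarray(prescaled) is prescaled)     # True
```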
From: Boris B. <ba...@en...> - 2008-08-14 23:02:49
|
Eric and John,

Thanks for the information. You are right that this probably would have been a premature optimisation, even if it weren't rendered useless by matplotlib using doubles internally (which I hadn't realised). The thought just occurred to me as I was writing the data-scaling part of my script.

The script is intended to be somewhat interactive. Initial tests suggest that plotting or updating several subplots from memory does take a quite noticeable time (e.g. 1.2--1.5 seconds for 3 subplots of 10000 points) that will probably become annoying in routine use. As you indicated, basically all that time is spent within matplotlib. I'm just using standard default calls:

for i in subplots:
    subplot
    plot
    xlabel
    ylabel
    title

Each of these calls seems to take roughly the same time (60--100 ms). If anybody has pointers on speeding things up significantly, I'm all ears. (Predefining data limits? Using lower-level commands? Use of a non-default backend?)

Boris |
From: John H. <jd...@gm...> - 2008-08-10 21:09:21
|
On Sun, Aug 10, 2008 at 8:06 AM, Boris Barbour <ba...@en...> wrote:
> Hi,
>
> I have lots of data acquired via analogue to digital conversion. The data
> is consequently represented as integers (often 16 bit resolution). To
> obtain the correct signal and plot it, these data must of course be
> multiplied by a floating point scale factor. This seems potentially
> wasteful of resources (time and memory), especially as I would prefer to
> keep the original data untouched.
>
> It occurs to me that a more efficient plotting method would be to plot the
> original data but scale the axes by the appropriate factor. In that way a
> simple numpy array view could be passed to plot. Does a method for doing
> this exist? I think I can do it in a rather convoluted way by plotting the
> original data and then superimposing empty axes at the adjusted scale.
> However, I haven't yet tested this and I'm a bit skeptical about the
> overhead of two plots. Another possibility might be the units mechanism,
> but according to the documentation that is discouraged, and it might be
> awkward to implement.

The easiest way is to define a custom formatter -- this is responsible for taking your numeric data and converting it to strings for the tick labels and navigation toolbar coordinate reporting. E.g.:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

t = np.arange(1000)*0.01
s = (np.random.rand(1000)*4096).astype(int)

# this controls the formatting of the tick labels
class VoltFormatter(ticker.Formatter):
    """
    take integer input and convert to +/- 5V:
    0 -> -5, 2048 -> 0, 4096 -> 5
    """
    def __call__(self, x, pos=None):
        return '%1.2f' % (5*(x-2048)/2048.)

formatter = VoltFormatter()

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t, s)
ax.yaxis.set_major_formatter(formatter)
plt.show()

One problem with this solution is that the tick choices are poor, since the tick locator doesn't know where to put multiples of volts. To solve this, you can write your own locator, e.g. as described in the user's guide, to place ticks on multiples of the integer scale. But as Eric notes, mpl will be converting your data under the hood to doubles anyway, so you won't be getting any space or CPU savings. |
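[Editor's note: a minimal sketch of the locator fix John mentions, assuming the same hypothetical 0-4096 -> +/-5 V mapping as his example. A stock ticker.MultipleLocator placed every 409.6 counts (one volt) is enough here; a fully custom Locator subclass is only needed for fancier spacing.]

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Same hypothetical ADC mapping as in the formatter example:
# 0 -> -5 V, 2048 -> 0 V, 4096 -> +5 V
COUNTS_PER_VOLT = 2048 / 5.0  # 409.6 integer counts per volt

def counts_to_volts(x):
    return (x - 2048) / COUNTS_PER_VOLT

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(np.arange(0, 4097, 8))  # fake integer-count trace

# Put a major tick every whole volt (every 409.6 counts), and label
# the ticks in volts rather than raw counts.
ax.yaxis.set_major_locator(ticker.MultipleLocator(COUNTS_PER_VOLT))
ax.yaxis.set_major_formatter(
    ticker.FuncFormatter(lambda x, pos: '%1.2f' % counts_to_volts(x)))

fig.canvas.draw()  # force the tick machinery to run once
labels = [t.get_text() for t in ax.yaxis.get_majorticklabels()]
print(labels)
```

With the locator and formatter combined, the ticks land on whole volts instead of on arbitrary multiples of raw counts.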
From: Eric F. <ef...@ha...> - 2008-08-15 01:35:00
Attachments:
xy.py
|
Boris Barbour wrote:
> Eric and John,
>
> Thanks for the information. You are right that this probably would have
> been a premature optimisation, even if it weren't rendered useless by
> matplotlib using doubles internally (which I hadn't realised). The thought
> just occurred to me as I was writing the data-scaling part of my script.
>
> The script is intended to be somewhat interactive. Initial tests suggest
> that plotting or updating several subplots from memory does take a quite
> noticeable time (e.g. 1.2--1.5 seconds for 3 subplots of 10000 points)
> that will probably become annoying in routine use. As you indicated,
> basically all that time is spent within matplotlib. I'm just using
> standard default calls:
>
> for i in subplots:
>     subplot
>     plot
>     xlabel
>     ylabel
>     title
>
> Each of these calls seems to take roughly the same time (60--100ms). If

It sounds like you have interactive mode on, in which case each pylab function redraws the figure. The solution is to use the object-oriented interface for almost everything. See the attached example.

> anybody has pointers on speeding things up significantly, I'm all ears.
> (Predefining data limits? Using lower-level commands? Use of a non-default
> backend?)

If the suggestion above is not enough, we will need to know more about what your script looks like, the environment in which it is running (e.g., ipython? embedded in wx? straight command line? what operating system? what backend?), your constraints, and what you are trying to accomplish.

The best thing would be if you could post a very short self-contained script, typically using fake random data, that shows your present approach and illustrates the speed problem; then we can try to figure out what the bottlenecks are, and whether there are simple ways to speed up the script or to modify mpl for better speed.

Eric |
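[Editor's note: the xy.py attachment is not preserved in this archive. Below is a minimal sketch, with hypothetical names and sizes, of the pattern Eric describes: build the figure once through the object-oriented interface, then update by pushing new data into the existing Line2D objects and redrawing once, instead of re-issuing plot()/label calls that each trigger a redraw in interactive mode.]

```python
import time
import numpy as np
import matplotlib
matplotlib.use('Agg')  # draw off-screen; the pattern is the same on any backend

import matplotlib.pyplot as plt

# Build the figure ONCE with the object-oriented interface; no pylab
# state machine, so nothing is redrawn until we explicitly ask for it.
fig = plt.figure()
lines = []
for i in range(3):
    ax = fig.add_subplot(3, 1, i + 1)
    line, = ax.plot(np.arange(10000), np.zeros(10000))
    ax.set_ylim(-1, 1)              # predefine limits so autoscaling is skipped
    ax.set_ylabel('chan %d' % i)
    lines.append(line)

def update(data_rows):
    """Push new y-data into the existing lines, then redraw once."""
    for line, row in zip(lines, data_rows):
        line.set_ydata(row)
    fig.canvas.draw()

t0 = time.time()
update(np.random.uniform(-1, 1, (3, 10000)))
print('redraw took %.3f s' % (time.time() - t0))
```

The key point is that the per-call figure redraws disappear: three subplots are updated for the price of one draw.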
From: Boris B. <ba...@en...> - 2008-08-17 12:30:22
|
> It sounds like you have interactive mode on, in which case each pylab
> function redraws the figure.

Yes - it was that simple (and stupid); thanks for your patience. Turning off interactive mode and using the set_data approach leads to an execution time of about 0.05 seconds (~30-fold speed-up), which is _fine_.

Thanks again for your help.

Boris |