|
From: Sergi P. F. <spo...@gm...> - 2012-05-23 08:12:06
|
I'm plotting several images at once, sharing axes, because I use it
for exploratory purposes. Each image is the same satellite image at
different dates. I'm experiencing a slow response from matplotlib
when zooming and panning, and I would like to ask for any tips that
could speed up the process.
What I am doing now is:
- Load data from several netcdf files.
- Calculate maximum value of all the data, for normalization.
- Create a grid of subplots using ImageGrid. As each subplot is
generated, I delete the array to free some memory (each array is
stored in a list, the "deletion" is just a list.pop()). See the code
below.
It's 15 images, single-channel, of 4600x3840 pixels each. I've noticed
that the bottleneck is not the RAM (I have 8 GB), but the processor.
Python spikes to 100% usage on one of the cores when zooming or
panning (it's an Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores, 64
bit).
The code is:
-------------------------------------------
import os
import sys
import numpy as np
import netCDF4 as ncdf
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from matplotlib.colors import LogNorm
MIN = 0.001 # Hardcoded minimum data value used in normalization
variable = 'conc_chl'
units = r'$mg/m^3$'
data = []
dates = []
# Get a list of only netCDF files
filelist = os.listdir(sys.argv[1])
filelist = [f for f in filelist if os.path.splitext(f)[1] == '.nc']
filelist.sort()
filelist.reverse()
# Load data and extract dates from filenames
for f in filelist:
    dataset = ncdf.Dataset(os.path.join(sys.argv[1], f), 'r')
    data.append(dataset.variables[variable][:])
    dataset.close()
    dates.append((f.split('_')[2][:-3], f.split('_')[1]))
# Get the maximum value of all data. Will be used for normalization
maxc = np.array(data).max()
# Plot the grid of images + dates
fig = plt.figure()
grid = ImageGrid(fig, 111,
                 nrows_ncols=(3, 5),
                 axes_pad=0.0,
                 share_all=True,
                 aspect=False,
                 cbar_location="right",
                 cbar_mode="single",
                 cbar_size='2.5%',
                 )
for g in grid:
    v = data.pop()
    d = dates.pop()
    im = g.imshow(v, interpolation='none', norm=LogNorm(), vmin=MIN, vmax=maxc)
    g.text(0.01, 0.01, '-'.join(d), transform=g.transAxes)  # Date on a corner
cticks = np.logspace(np.log10(MIN), np.log10(maxc), 5)
cbar = grid.cbar_axes[0].colorbar(im)
cbar.ax.set_yticks(cticks)
cbar.ax.set_yticklabels([str(np.round(t, 2)) for t in cticks])
cbar.set_label_text(units)
# Fine-tune figure; make subplots close to each other and hide x ticks for
# all
fig.subplots_adjust(left=0.02, bottom=0.02, right=0.95, top=0.98,
                    hspace=0, wspace=0)
grid.axes_llc.set_yticklabels([], visible=False)
grid.axes_llc.set_xticklabels([], visible=False)
plt.show()
-------------------------------------------
Any clue about what could be improved to make it more responsive?
PS: This question was previously posted on Stack Overflow, but it
hasn't received any answer:
http://stackoverflow.com/questions/10635901/slow-imshow-when-zooming-or-panning-with-several-synced-subplots
|
|
From: Guillaume G. <gui...@mi...> - 2012-05-23 09:22:32
|
Hello
What is the size of a single image file? If they are very big, it is
better to do everything, from processing to plotting, at once for each file.
On 23/05/2012 10:11, Sergi Pons Freixes wrote:
> I'm plotting several images at once, sharing axes, because I use it
> for exploratory purposes. Each image is the same satellite image at
> different dates. I'm experiencing a slow response from matplotlib
> when zooming and panning, and I would like to ask for any tips that
> could speed up the process.
>
> What I am doing now is:
> - Load data from several netcdf files.
> - Calculate maximum value of all the data, for normalization.
> - Create a grid of subplots using ImageGrid. As each subplot is
> generated, I delete the array to free some memory (each array is
> stored in a list, the "deletion" is just a list.pop()). See the code
> below.
>
> It's 15 images, single-channel, of 4600x3840 pixels each.
This is a lot of data. 8bit or 16bit ?
> I've noticed
> that the bottleneck is not the RAM (I have 8 GB), but the processor.
> Python spikes to 100% usage on one of the cores when zooming or
> panning (it's an Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores, 64
> bit).
>
> The code is:
> -------------------------------------------
> import os
> import sys
>
> import numpy as np
> import netCDF4 as ncdf
> import matplotlib.pyplot as plt
> from mpl_toolkits.axes_grid1 import ImageGrid
> from matplotlib.colors import LogNorm
>
> MIN = 0.001 # Hardcoded minimum data value used in normalization
>
> variable = 'conc_chl'
> units = r'$mg/m^3$'
> data = []
> dates = []
>
> # Get a list of only netCDF files
> filelist = os.listdir(sys.argv[1])
> filelist = [f for f in filelist if os.path.splitext(f)[1] == '.nc']
> filelist.sort()
> filelist.reverse()
>
> # Load data and extract dates from filenames
> for f in filelist:
everything should happen in this loop
> dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
> data.append(dataset.variables[variable][:])
instead of creating this big list, use a temporary array (which will be
overwritten)
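Something like this rough sketch (untested; the loop bounds and data are just stand-ins for your netCDF code):

```python
import numpy as np

# Rough sketch: keep only one array in memory at a time and track a
# running maximum, instead of stacking every file into a big list.
maxc = -np.inf
for _ in range(3):  # stands in for your loop over netCDF files
    tmp = np.random.uniform(0, 45, size=(100, 100))  # stand-in for dataset.variables[variable][:]
    maxc = max(maxc, float(tmp.max()))  # running max; tmp is overwritten next pass
print(0 < maxc <= 45)
```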
> dataset.close()
> dates.append((f.split('_')[2][:-3],f.split('_')[1]))
> _______________________________________________
> Matplotlib-users mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/matplotlib-users
|
|
From: Sergi P. F. <spo...@gm...> - 2012-05-23 13:04:32
|
On Wed, May 23, 2012 at 11:00 AM, Guillaume Gay
<gui...@mi...> wrote:
> Hello
>
>
> What is the size of a single image file? If they are very big, it is
> better to do everything, from processing to plotting, at once for each file.
As stated below, each image is single-channel, 4600x3840 pixels. As
you can see in the code, there is not much processing, just loading
the images and plotting them. What's slow is not the execution of
the code, but the interactive zooming and panning once the plots are
on the screen.
>> It's 15 images, single-channel, of 4600x3840 pixels each.
> This is a lot of data. 8bit or 16bit ?
They are floating point values (for example, from 0 to 45.xxx). If I
understood correctly, by setting vmin and vmax, matplotlib should
normalize the values to an appropriate number of bits.
>> for f in filelist:
> everything should happen in this loop
>
>> dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
>> data.append(dataset.variables[variable][:])
> instead of creating this big list, use a temporary array (which will be
> overwritten)
>> dataset.close()
>> dates.append((f.split('_')[2][:-3],f.split('_')[1]))
Why? It's true that this way it eats a lot of RAM at the beginning,
but the memory is released after each pop() (and calculating the
maximum of all the data before plotting is needed to use the same
normalization level on all the plots). Anyway, the slowness occurs
during interaction with the plot, not during the execution of the
code.
|
|
From: Guillaume G. <gui...@mi...> - 2012-05-23 15:27:30
|
On 23/05/2012 15:04, Sergi Pons Freixes wrote:
> On Wed, May 23, 2012 at 11:00 AM, Guillaume Gay
> <gui...@mi...> wrote:
>> Hello
>>
>>
>> What is the size of a single image file? If they are very big, it is
>> better to do everything, from processing to plotting, at once for each file.
> As stated below, each image is single-channel, 4600x3840 pixels. As
> you can see in the code, there is not much processing, just loading
> the images and plotting them. What's slow is not the execution of
> the code, but the interactive zooming and panning once the plots are
> on the screen.
>
>>> It's 15 images, single-channel, of 4600x3840 pixels each.
>> This is a lot of data. 8bit or 16bit ?
> They are floating point values (for example, from 0 to 45.xxx). If I
> understood correctly, by setting vmin and vmax, matplotlib should
> normalize the values to an appropriate number of bits.
>
>>> for f in filelist:
>> everything should happen in this loop
>>
>>> dataset = ncdf.Dataset(os.path.join(sys.argv[1],f), 'r')
>>> data.append(dataset.variables[variable][:])
>> instead of creating this big list, use a temporary array (which will be
>> overwritten)
>>> dataset.close()
>>> dates.append((f.split('_')[2][:-3],f.split('_')[1]))
> Why? It's true that this way at the beginning it eats a lot of RAM,
> but then it is released after each pop()
Oh, I didn't see the pop()...
So now I don't know...
Do you have to show them at full scale? Maybe you could just use
thumbnails of some sort?
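Something along these lines (rough sketch; the step size and the random array are placeholders for your data):

```python
import numpy as np

# Decimate each array before imshow so the canvas moves far fewer
# pixels around while panning/zooming (placeholder data below).
v = np.random.rand(4600, 3840)  # stand-in for one satellite array
step = 8                        # keep every 8th pixel in each direction
thumb = v[::step, ::step]       # 4600x3840 -> 575x480, ~64x less data
print(thumb.shape)
```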
G.
> (and calculating the maximum
> of all the data without plotting is needed to use the same
> normalization level on all the plots). Anyway, the slowness ocurrs
> during the interaction of the plot, not during the execution of the
> code.
>
|
|
From: Tony Yu <ts...@gm...> - 2012-05-23 16:28:33
|
On Wed, May 23, 2012 at 9:04 AM, Sergi Pons Freixes <spo...@gm...> wrote:
> On Wed, May 23, 2012 at 11:00 AM, Guillaume Gay
> <gui...@mi...> wrote:
>> Hello
>>
>> What is the size of a single image file? If they are very big, it is
>> better to do everything, from processing to plotting, at once for each file.
>
> As stated below, each image is single-channel, 4600x3840 pixels. As
> you can see in the code, there is not much processing, just loading
> the images and plotting them. What's slow is not the execution of
> the code, but the interactive zooming and panning once the plots are
> on the screen.
>
>>> It's 15 images, single-channel, of 4600x3840 pixels each.
>> This is a lot of data. 8bit or 16bit ?
>
> They are floating point values (for example, from 0 to 45.xxx). If I
> understood correctly, by setting vmin and vmax, matplotlib should
> normalize the values to an appropriate number of bits.

I'm not sure what you mean by "normalize the values to an appropriate
number of bits", but I don't think setting `vmin` or `vmax` will change
the data type of the image. So if you have 64-bit floating point images
(100+ MB per image), then that's what you're going to be moving/scaling
when you pan and zoom.

-Tony
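A quick back-of-envelope check of the "100+ MB per image" figure, assuming the netCDF library hands back 64-bit floats:

```python
# Size of one 4600x3840 image if each value is a 64-bit float.
rows, cols = 4600, 3840
mb = rows * cols * 8 / 1e6   # 8 bytes per float64
print(round(mb))             # about 141 MB per image, ~2.1 GB for 15
```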
|
From: Sergi P. F. <spo...@gm...> - 2012-05-24 13:14:39
|
On Wed, May 23, 2012 at 6:27 PM, Tony Yu <ts...@gm...> wrote:
>
> I'm not sure what you mean by "normalize the values to an appropriate number
> of bits", but I don't think setting `vmin` or `vmax` will change the data
> type of the image. So if you have 64-bit floating point images (100+ Mb per
> image), then that's what you're going to be moving/scaling when you pan and
> zoom.
I was just guessing that it is part of the process of converting
actual data (32 bit floats) to images on the screen (24 bit for RGB
(32 with transparency) or 8 bit for grayscale).
I tried converting the data to 8 bit, with .astype('uint8'), and it
is still poorly responsive when zooming and panning.
|
|
From: Tony Yu <ts...@gm...> - 2012-05-24 14:00:14
|
On Thu, May 24, 2012 at 9:14 AM, Sergi Pons Freixes
<spo...@gm...>wrote:
> On Wed, May 23, 2012 at 6:27 PM, Tony Yu <ts...@gm...> wrote:
> >
> > I'm not sure what you mean by "normalize the values to an appropriate
> number
> > of bits", but I don't think setting `vmin` or `vmax` will change the data
> > type of the image. So if you have 64-bit floating point images (100+ Mb
> per
> > image), then that's what you're going to be moving/scaling when you pan
> and
> > zoom.
>
> I was just guessing that it is part of the process of converting
> actual data (32 bit floats) to images on the screen (24 bit for RGB
> (32 with transparency) or 8 bit for grayscale).
>
> I tried converting the data to 8 bit, with .astype('uint8'), and it
> keeps being poorly responsive on zooming and panning.
>
>
It seems that setting `interpolation='none'` is significantly slower than
setting it to 'nearest' (or even 'bilinear'). On supported backends (e.g.
any Agg backend) the code paths for 'none' and 'nearest' are different:
'nearest' gets passed to Agg's interpolation routine, whereas 'none' does
an unsampled rescale of the image (I'm just reading the code comments
here). Could you check whether changing to `interpolation='nearest'` fixes
this issue?
-Tony
(Note: copied to stackoverflow)
PS: These different approaches *do* give different qualitative results; for
example, the code snippet below gives a slight moiré pattern, which doesn't
appear when `interpolation='none'`. I *think* that 'none' is roughly the
same as 'nearest' when zooming in (image pixels are larger than screen
pixels) but gives a higher-order interpolation result when zooming out
(image pixels smaller than screen pixels). I think the delay comes from
some extra Matplotlib/Python calculations needed for the rescaling.
#~~~
import matplotlib.pyplot as plt
import numpy as np
img = np.random.uniform(0, 255, size=(2000, 2000)).astype(np.uint8)
plt.imshow(img, interpolation='nearest')
plt.show()
|
|
From: Sergi P. F. <spo...@gm...> - 2012-05-25 08:30:53
|
> It seems that setting `interpolation='none'` is significantly slower than
> setting it to 'nearest' (or even 'bilinear'). On supported backends (e.g.
> any Agg backend) the code paths for 'none' and 'nearest' are different:
> 'nearest' gets passed to Agg's interpolation routine, whereas 'none' does
> an unsampled rescale of the image (I'm just reading the code comments
> here). Could you check whether changing to `interpolation='nearest'` fixes
> this issue?

Yes, changing it really speeds up the interactivity! The delay is now just
a few ms; you can notice it's not completely smooth, but it's perfectly
usable. I'll check whether any artifacts/distortion appear when zoomed in.

Thank you!
|
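For anyone finding this thread later, a minimal sketch of the change (random data standing in for the real satellite arrays, and the Agg backend so the snippet runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# 'nearest' goes through Agg's fast interpolation path; 'none' takes a
# slower unsampled-rescale path inside matplotlib itself.
v = np.random.uniform(0.001, 45.0, size=(460, 384))  # stand-in data
fig, ax = plt.subplots()
im = ax.imshow(v, interpolation='nearest',
               norm=LogNorm(vmin=0.001, vmax=v.max()))
print(im.get_interpolation())
```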