|
From: Hartmut K. <har...@gm...> - 2014-08-10 17:43:54
|
All,
I'm running into a crash while trying to construct a
tri.LinearTriInterpolator. Here is the short version of the code:
import netCDF4
import matplotlib.tri as tri
var = netCDF4.Dataset('filename.cdf').variables
x = var['x'][:]
y = var['y'][:]
data = var['attrname'][:]
elems = var['element'][:,:]-1
triang = tri.Triangulation(x, y, triangles=elems)
# this crashes the python interpreter
interp = tri.LinearTriInterpolator(triang, data)
The data arrays (x, y, data, elems) are fairly large (more than 1 million
elements), all represented as numpy arrays (as returned by netCDF4). The
'data' array is a masked array and contains masked values.
If somebody cares, I'd be able to post a link to the netCDF data file
causing this.
All this happens when using matplotlib 1.3.1, Win32, Python 2.7.
Any help would be highly appreciated!
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
|
|
From: Ian T. <ian...@gm...> - 2014-08-11 07:15:02
|
Hartmut,
That is an excellent issue report; all the relevant information and nothing
extraneous. Hence the quick response.
The second argument to LinearTriInterpolator (and other TriInterpolator
classes), i.e. your 'data' array, is expected to be an array of the same
size as the 'x' and 'y' arrays. It is not expecting a masked array. If a
masked array is used the mask will be ignored, and so the values behind the
mask will be used as though they were real values. If my memory of netCDF
is correct, this will be whatever '_FillValue' is defined for the file, but
it may depend on what is used to generate the netCDF file.
I would normally expect the code to work but produce useless output. A
crash is possible though. It would be best if you could post a link to the
netCDF file and I will take a closer look to check there is not something
else going wrong.
Ian Thomas
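One way to keep masked nodes out of the interpolation is to mask every triangle that touches them and then hand the interpolator a plain array. The following is only a sketch under assumptions: the four-node mesh and the -99999.0 fill value are synthetic stand-ins for the netCDF data, and masking whole triangles is one possible policy, not something the thread prescribes.

```python
import numpy as np
import matplotlib.tri as mtri

# Synthetic stand-in for the netCDF arrays (hypothetical values).
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
elems = np.array([[0, 1, 2], [1, 3, 2]])
data = np.ma.masked_array([1.0, 2.0, 3.0, -99999.0],
                          mask=[False, False, False, True])

triang = mtri.Triangulation(x, y, triangles=elems)

# Mask every triangle that has at least one masked node.
node_mask = np.ma.getmaskarray(data)
triang.set_mask(node_mask[elems].any(axis=1))

# Pass a plain ndarray; the filled-in value is never used because the
# only triangle that references it is masked.
interp = mtri.LinearTriInterpolator(triang, data.filled(0.0))

# First point lies in the valid triangle, second in the masked one.
z = interp(np.array([0.25, 0.9]), np.array([0.25, 0.9]))
print(z)
```

Points that fall in masked triangles come back masked in the result, in the same way as points outside the triangulation.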
|
|
From: Hartmut K. <har...@gm...> - 2014-08-11 13:54:23
|
Ian,
Thanks for the quick response!
Here is the data file: http://tinyurl.com/ms7vzxw. I did some more experiments. The picture stays unchanged, even if I fill the masked values in the array with some real numbers (I'm not saying that this would give me any sensible results...):
import netCDF4
import matplotlib.tri as tri
var = netCDF4.Dataset('maxele.63.nc').variables
x = var['x'][:]
y = var['y'][:]
data = var['zeta_max'][:]
elems = var['element'][:, :]-1
triang = tri.Triangulation(x, y, triangles=elems)
data = data.filled(0.0)
# this still crashes the python interpreter
interp = tri.LinearTriInterpolator(triang, data)
Thanks again!
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
|
|
From: Andrew D. <da...@at...> - 2014-08-11 18:28:54
|
Hi Hartmut.
I ran the example on my machine (which is a 64-bit Linux box with 8 GB of
RAM; Python 2.7, matplotlib 1.3.1) and it runs fine. However, it does use
around 2 GB of memory, perhaps slightly more. I think the memory usage
might be a problem for you if you are using 32-bit Windows. I'm not
familiar with the details but I believe the memory available to a single
32-bit process on Win32 may be only 2 GB. I'm also not familiar with the
data you provided, but is it possible to reduce the number of points in
order to test if memory limitations are the underlying problem here?
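Reducing the number of points can be tried without regenerating the file. This is a minimal sketch, assuming arrays laid out as in the original post; `subset_mesh` is a hypothetical helper (not part of matplotlib or netCDF4), and the 3x3 grid below is synthetic data used only to show the idea.

```python
import numpy as np
import matplotlib.tri as mtri

def subset_mesh(x, y, data, elems, ntri):
    """Keep the first ntri triangles and only the nodes they reference.

    Hypothetical helper for memory experiments; not a matplotlib function.
    """
    sub = np.asarray(elems)[:ntri]
    # nodes: sorted unique node ids; inverse: position of each id in nodes.
    nodes, inverse = np.unique(sub, return_inverse=True)
    return x[nodes], y[nodes], data[nodes], inverse.reshape(sub.shape)

# Synthetic stand-in for the netCDF arrays: 3x3 grid of nodes, 8 triangles.
gx, gy = np.meshgrid(np.arange(3.0), np.arange(3.0))
x, y = gx.ravel(), gy.ravel()
data = x + 2.0 * y
elems = np.array([[0, 1, 3], [1, 4, 3], [1, 2, 4], [2, 5, 4],
                  [3, 4, 6], [4, 7, 6], [4, 5, 7], [5, 8, 7]])

xs, ys, ds, es = subset_mesh(x, y, data, elems, ntri=2)
triang = mtri.Triangulation(xs, ys, triangles=es)
interp = mtri.LinearTriInterpolator(triang, ds)
value = interp(np.array([0.5]), np.array([0.25]))
```

Increasing `ntri` step by step while watching the process memory should show whether the crash tracks the size of the triangulation.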
--
Dr Andrew Dawson
Atmospheric, Oceanic & Planetary Physics
Clarendon Laboratory
Parks Road
Oxford OX1 3PU, UK
Tel: +44 (0)1865 282438
Email: da...@at...
Web Site: http://www2.physics.ox.ac.uk/contacts/people/dawson
|
|
From: Hartmut K. <har...@gm...> - 2014-08-11 21:10:39
|
Andrew,
Nod, your suspicion is correct. The python interpreter bails out once the
memory footprint reaches 2GBytes. That leaves us with the question if this
is a quality of implementation issue - using up 2GBytes of main memory for
1 million node elements seems to be a bit excessive...
Thanks everybody for verifying anyways!
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
|
|
From: Hartmut K. <har...@gm...> - 2014-08-11 22:09:51
|
Just to round that issue up - I tried running this using Python 2.7 (64Bit)
and it does not crash anymore. The memory requirement grows up to almost
4GByte. I will verify whether I can get the results I hope for and will
report back.
Thanks again!
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
|
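Since the outcome depends on whether the interpreter is a 32-bit or 64-bit build, it can help to check that before loading a large mesh. A small sketch using only the standard library; the variable names are illustrative.

```python
import struct
import sys

# Size of a C pointer in this interpreter build, in bits (32 or 64).
pointer_bits = struct.calcsize("P") * 8

# Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32

print("Python build: %d-bit" % pointer_bits)
```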
|
From: Ian T. <ian...@gm...> - 2014-08-12 09:35:04
|
Here are the results of my investigation. There is probably more
information here than anyone else wants, but it is useful information for
future improvements.
Most of the RAM is taken up by a trifinder object which is at the heart of
a triinterpolator, and is used to find the triangles of a Triangulation in
which (x,y) points lie. The code
interp = tri.LinearTriInterpolator(triang, data)
is equivalent to
trifinder = tri.TrapezoidMapTriFinder(triang)
interp = tri.LinearTriInterpolator(triang, data, trifinder=trifinder)
Using the latter with memory_profiler
(https://pypi.python.org/pypi/memory_profiler) indicates that this is where
most of the RAM is being used. Here are some figures for trifinder RAM
usage as a function of ntri, the number of triangles in the triangulation:
   ntri  trifinder MB
-------  ------------
   1000            26
  10000            33
 100000           116
1000000           912
2140255          1936
The RAM usage is less than linear in ntri, but clearly too much for large
triangulations unless you have a lot of RAM.
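The scaling in the RAM figures is easier to see as a cost per thousand triangles. The snippet below only redoes arithmetic on the numbers quoted in the table above; nothing in it is a new measurement.

```python
# Figures copied from the table: (ntri, trifinder MB).
ram_table = [(1000, 26), (10000, 33), (100000, 116),
             (1000000, 912), (2140255, 1936)]

# MB per thousand triangles at each mesh size.
mb_per_k = [mb / (ntri / 1000.0) for ntri, mb in ram_table]

for (ntri, mb), cost in zip(ram_table, mb_per_k):
    print("%8d triangles: %5d MB total, %7.3f MB per 1000 triangles"
          % (ntri, mb, cost))
```

The per-triangle cost falls as the mesh grows and levels off at a little under 1 MB per thousand triangles for the two largest meshes, which is consistent with growth that is no worse than linear.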
The trifinder precomputes a tree of nodes to make looking up triangles
quick. Searching through 2 million triangles in an ad-hoc manner would be
very slow; the trifinder is very fast in comparison. Here are some stats
for the tree that trifinder uses (the columns are number of nodes in the
tree, maximum node depth, and mean node depth):
   ntri      nodes  max depth  mean depth
-------  ---------  ---------  ----------
   1000     179097         37       23.24
  10000    3271933         53       30.74
 100000   36971309         69       37.15
1000000  853117229         87       48.66
The mean depth is the mean number of nodes that have to be traversed to
find a triangle, and the max depth is the worst case. The search time is
therefore O(log ntri).
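The logarithmic search time can be checked against the quoted figures by dividing the mean depth by log2(ntri). This is plain arithmetic on the table values, offered as a sanity check rather than a proof.

```python
import math

# Figures copied from the table: (ntri, nodes, max depth, mean depth).
depth_table = [(1000, 179097, 37, 23.24),
               (10000, 3271933, 53, 30.74),
               (100000, 36971309, 69, 37.15),
               (1000000, 853117229, 87, 48.66)]

# If the search is O(log ntri), this ratio should stay roughly constant.
ratios = [mean / math.log2(ntri) for ntri, _, _, mean in depth_table]

for (ntri, _, _, mean), ratio in zip(depth_table, ratios):
    print("%8d triangles: mean depth %5.2f, depth / log2(ntri) = %.2f"
          % (ntri, mean, ratio))
```

Over three orders of magnitude in ntri the ratio stays between 2 and 3, which is what a logarithmic search would give.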
The triangle interpolator code is structured in such a way that it is easy
to plug in a different trifinder if the default one isn't appropriate. At
the moment, however, only one is available (TrapezoidMapTriFinder). For the
problem at hand, a trifinder that is slower but consumes less RAM would be
preferable. There are various possibilities; they just have to be
implemented! I will take a look at it sometime, but it probably will not be
soon.
Ian Thomas
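To illustrate the kind of slower, low-memory trifinder described above, here is a rough sketch. `BruteForceTriFinder` is hypothetical and not part of matplotlib; it assumes that subclassing `matplotlib.tri.TriFinder`, reading the `_triangulation` attribute set by the base class, and returning -1 for points outside all triangles is enough for the interpolator to accept it. It builds no search tree, so each lookup costs O(ntri).

```python
import numpy as np
import matplotlib.tri as mtri

class BruteForceTriFinder(mtri.TriFinder):
    """Hypothetical trifinder: tests every triangle, stores no tree."""

    def __call__(self, x, y):
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(y, dtype=np.float64)
        triang = self._triangulation  # assumed to be set by TriFinder.__init__
        found = np.full(x.shape, -1, dtype=np.int32)
        tol = -1e-12  # accept points lying on an edge
        for i, (a, b, c) in enumerate(triang.triangles):
            if triang.mask is not None and triang.mask[i]:
                continue
            ax, ay = triang.x[a], triang.y[a]
            bx, by = triang.x[b], triang.y[b]
            cx, cy = triang.x[c], triang.y[c]
            det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
            if det == 0.0:
                continue  # skip degenerate triangles
            # Barycentric coordinates of all query points for triangle i.
            l1 = ((by - cy) * (x - cx) + (cx - bx) * (y - cy)) / det
            l2 = ((cy - ay) * (x - cx) + (ax - cx) * (y - cy)) / det
            l3 = 1.0 - l1 - l2
            inside = (l1 >= tol) & (l2 >= tol) & (l3 >= tol) & (found == -1)
            found[inside] = i
        return found

# Tiny synthetic mesh to compare against the default trifinder.
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
data = np.array([1.0, 2.0, 3.0, 4.0])
triang = mtri.Triangulation(x, y, triangles=[[0, 1, 2], [1, 3, 2]])

px = np.array([0.25, 0.75, 2.0])
py = np.array([0.25, 0.75, 2.0])
slow = mtri.LinearTriInterpolator(triang, data,
                                  trifinder=BruteForceTriFinder(triang))
fast = mtri.LinearTriInterpolator(triang, data)
z_slow = slow(px, py)
z_fast = fast(px, py)
```

A per-triangle Python loop like this would be far too slow for two million triangles queried at many points; a practical replacement would need a spatial index of some kind, which is left open here.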
|
|
From: Hartmut K. <har...@gm...> - 2014-08-12 12:55:12
|
Thanks for your insights, Ian! A somewhat slower trifinder which requires
less memory might be even faster in the end as creating the trifinder itself
takes a lot of time (almost a minute in our case).
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
|
|
From: Dale C. <da...@ld...> - 2014-08-11 19:25:51
|
Runs to completion without errors on my installation:
OS X 10.9.4
MacBook Air w/ 8GB of memory
Python 2.7 and matplotlib 1.3.1-1 lib
-Dale
|