|
From: antonv <vas...@ya...> - 2009-04-08 18:05:41
|
Hi all,

I am processing a lot of GRIB data from NOAA with the use of matplotlib and basemap. On my current laptop (P4 3 GHz, 512 MB RAM) the whole process takes close to 3 hours... so it's time for a new machine, but still on a very tight budget :)

My main question is what I should emphasize more: a quad-core processor running 64-bit Vista/XP, or more memory and a fast hard drive, even a RAID array? Will python, mpl and basemap take full advantage of multiple cores, or will they use only one? Also, would they work in a 64-bit environment, or would I be better off just sticking with 32-bit XP? Memory-wise, it seems that on my current machine the app uses all the available RAM; how much should I buy to make sure all its needs are met?

Processor-wise, I see that both Intel and AMD have a plethora of options to choose from... What would you recommend?

And the last question is about hard drives. From your experience, what drives should I look at? Is a SCSI RAID still that much faster than a 10,000 rpm HDD? I've also seen that there are some 15,000 rpm drives that come with a controller; would they be worth the money, or should I just get a 10,000 rpm HDD and be done?

Thanks for any help, as lately I haven't kept up with the technology and I feel like a noob :(

Anton

--
View this message in context: http://www.nabble.com/Computer-specs-for-fast-matplotlib-and-basemap-processing-tp22956400p22956400.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
|
From: Eric F. <ef...@ha...> - 2009-04-08 19:23:34
|
antonv wrote:
> My main question is what should i emphasize more, a quad core processor
> running on 64 bit vista/xp, or more memory and a fast hard drive, even a
> raid drive? [...]

Just a few comments; I am sure others are more knowledgeable about most of this.

First, I think you need to figure out what the bottlenecks are. Can you monitor disk use, memory use, and CPU use? Is the disk maxed out and the CPU idle? If the disk is heavily used, is the system swapping? From what you have said, it is impossible to tell whether disk speed would make a difference, for example. My guess is that it is going to be low priority.

Second, as part of the above, you might review your code and see whether there are some very inefficient parts. How much time is spent in loops that could be vectorized? Are lists being used where arrays would be more efficient? In basemap, are you re-using instances where possible, or are you unnecessarily re-extracting coastlines, for example? Is it possible that you are running out of memory and then swapping because you are using pylab/pyplot and failing to close figures when you have finished with them?

If your budget is tight, I would be very surprised if SCSI were cost-effective. Generally, SATA is the way to go these days. I suspect there won't be much speed difference between 32-bit and 64-bit OS versions.
RAM: I expect 4 GB will be both cheap and adequate.

To use multiple processors efficiently with matplotlib, you will need multiple processes; mpl and numpy do not automatically dispatch parts of a single job out to multiple processors. (I'm not sure what happens if you use threads--I think it will still be one job per processor--but the general advice is: don't use threads unless you really know what you are doing, really need them, and are willing to put in some heavy debugging time.)

My guess is that your 3-hour job could easily be split up into independent jobs working on independent chunks of data, in which case such a split would give you a big speed-up with more processor cores, assuming the work is CPU-intensive; if it is disk I/O-bound, then the split won't help. Anyway, dual-core is pretty standard now, and you will want at least that. Quad might or might not help, as indicated above.

Eric
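[The one-process-per-chunk split Eric describes can be sketched with the standard-library multiprocessing module; `process_chunk` and the file names below are hypothetical placeholders for the real per-file work:]

```python
# Sketch of farming independent chunks of work out to separate processes,
# since numpy/matplotlib will not parallelize a single job by themselves.
# process_chunk and the file list are invented placeholders.
from multiprocessing import Pool

def process_chunk(filename):
    # Stand-in for the real per-file work (decode GRIB, render a map, ...).
    return filename.upper()

if __name__ == "__main__":
    files = ["gfs_00.grb", "gfs_06.grb", "gfs_12.grb", "gfs_18.grb"]
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(process_chunk, files)
    print(results)
```

As Eric notes, this only pays off if the per-chunk work is CPU-bound; if every worker is waiting on the same disk, extra cores buy nothing.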
|
From: João L. S. <js...@fc...> - 2009-04-08 20:40:46
|
antonv wrote:
> I am processing a lot of grib data from noaa with the use of matplotlib and
> basemap. On my actual laptop (p4 3ghz, 512mb ram) the whole process takes
> close to 3 hours...

You should profile your application to see why it's taking so long. Maybe you just coded something in a slow way. Python is a great language, but if you don't know it well you might have programmed some parts in a way that takes orders of magnitude more time than other solutions. Even if your code is reasonably optimized, you should first know why it's slow: has the computer run out of memory and started swapping? Is the CPU at 100%?

I'd recommend you ask a local python expert for some help.

JLS
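[A starting point for the profiling João suggests is the standard-library cProfile module; `slow_part` here is just an illustrative stand-in for the real workload:]

```python
# Minimal profiling sketch using only the standard library.
import cProfile
import io
import pstats

def slow_part():
    # Placeholder for the real workload (e.g. the GRIB-to-plot loop).
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 functions by cumulative time
```

The listing shows per-function call counts and cumulative times, which answers "where does the 3 hours go?" before any hardware is bought.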
|
From: antonv <vas...@ya...> - 2009-04-08 20:57:25
|
I have a bit of experience programming, and I am pretty sure my parts of the code are pretty well optimized. I made sure that the loop contains only the stuff needed, and I'm loading everything else beforehand.

The biggest bottleneck is happening because I'm unpacking grib files to csv files using Degrib on the command line. That operation usually takes around half an hour using no more than 50% of the processor, but it maxes out the memory usage, and it is definitely hard-drive intensive as it ends up writing over 4 GB of data. I have also noticed that on a lower-spec AMD desktop this runs faster than on my P4 Intel laptop, my guess being that the laptop HDD is 5400 rpm and the desktop's is 7200 rpm.

The next step is to take all those csv files and make images from them. For this one I haven't dug too deep to see what is happening, but it seems to be the other way around, using the CPU a lot more while keeping the memory usage high too.

Thanks,
Anton
|
From: Jose Gómez-D. <jgo...@gm...> - 2009-04-09 12:13:01
|
On Wednesday 08 April 2009 21:57:21 antonv wrote:
> The biggest bottleneck is happening because I'm unpacking grib files to csv
> files using Degrib in command line. That operation is usually around half
> an hour using no more than 50% of the processor but it maxes out the memory
> usage and it definitely is hard drive intensive as it ends up writing over
> 4 GB of data. I have noticed also that on a lower spec AMD desktop this
> runs faster than on my P4 Intel Laptop, my guess being that the laptop hdd
I do the same sort of processing, and use GDAL to read the GRIB (I think
grib2, whatever ECMWF provides) files directly into numpy arrays. It's as
easy as:

    from osgeo import gdal
    import pylab

    g = gdal.Open("my_grib_file.grib")
    data = g.GetRasterBand(my_band).ReadAsArray()  # my_band is the 1-based band index
    pylab.imshow(data)
    # blah blah blah
It doesn't take long at all, unless your files are huge and are stored over a
slow and busy network. But then, there's little you can do about that!
J
--
RSU ■ Dept. of Geography ■ University College ■ Gower St, London WC1E 6BT UK
EMM ■ Dept. of Geography ■ King's College ■ Strand, London WC2R 2LS UK
|
|
From: Eric F. <ef...@ha...> - 2009-04-08 21:37:35
|
antonv wrote:
> The biggest bottleneck is happening because I'm unpacking grib files to csv
> files using Degrib in command line. [...]

Instead of going to csv files--which are *very* inefficient to write, store, and then read in again--why not convert directly to netcdf, and then read your data in from netcdf as needed for plotting? I suspect this will speed things up quite a bit. Numpy support for netcdf is very good. Of course, direct numpy-enabled access to the grib files might be even better, eliminating the translation phase entirely. Have you looked into http://www.pyngl.ucar.edu/Nio.shtml?

Eric
|
From: antonv <vas...@ya...> - 2009-04-08 22:54:24
|
I know that using the csv files is very slow, but I have no knowledge of working with the netcdf format and I was in a bit of a rush when I wrote this. I will take another look at it. How would you translate a grib into netcdf? Are there any specific applications, or is it done straight through numpy?

As for pyngl, if I remember correctly I looked at it but it was not working on windows.

Thanks,
Anton

--
View this message in context: http://www.nabble.com/Computer-specs-for-fast-matplotlib-and-basemap-processing-tp22956400p22961419.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
|
From: Jeff W. <js...@fa...> - 2009-04-08 23:02:26
|
antonv wrote:
> I know that using the csv files is very slow but I have no knowledge of
> working with the netcdf format and I was in a bit of a rush when I wrote
> this. I will take a look again at it. How would you translate a grib into
> netcdf? Are there any specific applications or straight through numpy?
>
> As for pyngl, if i remember correctly I looked at it but it was not working
> on windows.

Anton: If these are grib version 2 files, another option is http://code.google.com/p/pygrib2. I have made a windows installer.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303) 497-6313
Meteorologist                FAX   : (303) 497-6449
NOAA/OAR/PSD R/PSD1          Email : Jef...@no...
325 Broadway                 Office: Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328  Web   : http://tinyurl.com/5telg
|
From: Anton V. <vas...@ya...> - 2009-04-08 23:06:39
|
Wow Jeff! You saved me again! I remember looking at it last year and thinking it would be awesome if there were a windows installer for it! I will install and play with it tonight!

Thanks a lot!
Anton

________________________________
From: Jeff Whitaker <js...@fa...>
Sent: Wednesday, April 8, 2009 4:02:22 PM
Subject: Re: [Matplotlib-users] Computer specs for fast matplotlib and basemap processing

> Anton: If these are grib version 2 files, another option is
> http://code.google.com/p/pygrib2. I have made a windows installer.
|
From: Eric F. <ef...@ha...> - 2009-04-08 23:38:30
|
antonv wrote:
> I will take a look again at it. How would you translate a grib into
> netcdf? Are there any specific applications or straight through numpy?

The program you are already using is said to convert grib2 to netcdf: http://www.nws.noaa.gov/mdl/NDFD_GRIB2Decoder/

There are also several modules providing a netcdf interface for numpy. I like this one: http://code.google.com/p/netcdf4-python/ and it is included in the Enthought Python Distribution.

For GRIB to numpy, googling turned up http://code.google.com/p/pygrib2/ as well as PyNIO. My guess is that this (pygrib2) will be exactly what you need. It is by Jeffrey Whitaker, the author of the above-mentioned netcdf4 interface as well as of basemap.

> As for pyngl, if i remember correctly I looked at it but it was not working
> on windows.

Well, I recommend switching to linux anyway, but that is another story.

Eric
|
From: Christopher B. <Chr...@no...> - 2009-04-09 17:12:51
|
Eric Firing wrote:
>> The biggest bottleneck is happening because I'm unpacking grib files to csv
>> files using Degrib in command line. That operation is usually around half an

disk speed -- you might want to try SATA RAID 0 (striping) -- I'd get a good hardware vendor's advice on maximizing your disk IO. You can also multi-task that process easily, but if you're disk-bound, that won't help anyway.

> Instead of going to csv files--which are *very* inefficient to write,
> store, and then read in again--why not convert directly to netcdf,

Or HDF, via PyTables. Or even direct binary numpy arrays, with either fromfile / tofile, or, more robustly, with numpy.save and numpy.load.

> direct numpy-enabled access to the grib files might be even better,
> eliminating the translation phase entirely. Have you looked into
> http://www.pyngl.ucar.edu/Nio.shtml?

Also, I think GDAL supports GRIB, and can directly give you numpy arrays.

>> I have noticed also that on a lower spec AMD desktop this runs
>> faster than on my P4 Intel Laptop, my guess being that the laptop hdd is
>> 5400 rpm and the desktop is 7200 rpm.

Yup, those laptop hard drives are SLOW -- you could look into a Solid State drive, if you have some money to spend.

>> Next step is to take all those csv files and make images from them. For this
>> one I haven't dug too deep to see what is happening but it seems to be the
>> other way, using the cpu a lot more while keeping the memory usage high too.

Multiple cores aren't going to help here, unless you run a few separate processes -- also, how much memory? All 64 bits will buy you is more memory, which you may or may not need. Also, as for Windows 64 bits -- is numpy supported there yet? I'd make sure; there are issues, as there is no MinGW for 64-bit Windows.

antonv wrote:
> I know that using the csv files is very slow but I have no knowledge of
> working with the netcdf format and I was in a bit of a rush when I wrote
> this. I will take a look again at it. How would you translate a grib into
> netcdf?

See if degrib supports any binary formats (I know, I'm from NOAA, I should know...). Otherwise you could use the GDAL command-line tools to translate into something else binary that may be easier to deal with. Though it looks like Jeff may have solved this problem for you (One NOAA, Jeff!)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959  voice
7600 Sand Point Way NE  (206) 526-6329  fax
Seattle, WA 98115       (206) 526-6317  main reception

Chr...@no...
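[The numpy.save / numpy.load route Chris mentions is about the simplest binary alternative to csv; a minimal sketch, with a dummy array standing in for a decoded grid:]

```python
# Round-trip a numpy array through its native binary .npy format,
# which is far faster and more compact than csv for large float arrays.
import numpy as np

field = np.arange(12.0).reshape(3, 4)  # dummy data standing in for a decoded grid

np.save("field.npy", field)       # writes field.npy in one call
restored = np.load("field.npy")   # reads it straight back into an array

assert np.array_equal(field, restored)
print(restored.shape)  # (3, 4)
```

The .npy file preserves dtype and shape exactly, so there is no parsing step and no precision loss on the way back in.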