|
From: antonv <vas...@ya...> - 2009-04-08 18:05:41
|
Hi all,

I am processing a lot of GRIB data from NOAA with the use of matplotlib and basemap. On my current laptop (P4 3 GHz, 512 MB RAM) the whole process takes close to 3 hours... so it's time for a new machine, but still on a very tight budget :)

My main question is what I should emphasize more: a quad-core processor running 64-bit Vista/XP, or more memory and a fast hard drive, even a RAID array? Will python, mpl and basemap take full advantage of multiple cores, or will they use only one? Also, would they work in a 64-bit environment, or would I be better off just sticking with 32-bit XP? Memory-wise, it seems that on my current machine the app uses all the available RAM; how much should I buy to make sure all its needs are met?

Processor-wise, I see that both Intel and AMD have a plethora of options to choose from... What would you recommend?

And the last question is about hard drives. From your experience, what drives should I look at? Is a SCSI RAID still that much faster than a 10,000 rpm HDD? I've also seen that there are some 15,000 rpm drives that come with a controller; would they be worth the money, or should I just get a 10,000 rpm HDD and be done?

Thanks for any help, as lately I haven't kept up with the technology and I feel like a noob :(

Anton

--
View this message in context: http://www.nabble.com/Computer-specs-for-fast-matplotlib-and-basemap-processing-tp22956400p22956400.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
|
From: Eric F. <ef...@ha...> - 2009-04-08 19:23:34
|
antonv wrote:
> My main question is what should i emphasize more, a quad core processor
> running on 64 bit vista/xp, or more memory and a fast hard drive, even a
> raid drive? [...]

Just a few comments; I am sure others are more knowledgeable about most of this.

First, I think you need to figure out what the bottlenecks are. Can you monitor disk use, memory use, and CPU use? Is the disk maxed out and the CPU idle? If the disk is heavily used, is the system swapping? From what you have said, it is impossible to tell whether disk speed would make a difference, for example. My guess is that it is going to be low priority.

Second, as part of the above, you might review your code and see whether there are some very inefficient parts. How much time is spent in loops that could be vectorized? Are lists being used where arrays would be more efficient? In basemap, are you re-using instances where possible, or are you unnecessarily re-extracting coastlines, for example? Is it possible that you are running out of memory and then swapping because you are using pylab/pyplot and failing to close figures when you have finished with them?

If your budget is tight, I would be very surprised if SCSI were cost-effective. Generally, SATA is the way to go these days. I suspect there won't be much speed difference between 32-bit and 64-bit OS versions.
RAM: I expect 4 GB will be both cheap and adequate.

To use multiple processors efficiently with matplotlib, you will need multiple processes; mpl and numpy do not automatically dispatch parts of a single job out to multiple processors. (I'm not sure what happens if you use threads--I think it will still be one job per processor--but the general advice is: don't use threads unless you really know what you are doing, really need them, and are willing to put in some heavy debugging time.)

My guess is that your 3-hour job could easily be split up into independent jobs working on independent chunks of data, in which case such a split would give you a big speed-up with more processor cores, assuming the work is CPU-intensive; if it is disk I/O-bound, then the split won't help. Anyway, dual-core is pretty standard now, and you will want at least that. Quad might or might not help, as indicated above.

Eric
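[The one-process-per-chunk split Eric describes can be sketched with the standard-library multiprocessing module; `process_chunk` and the file names below are hypothetical placeholders for the real per-file work:]

```python
# Sketch of farming independent chunks of work out to separate processes,
# since numpy/matplotlib will not parallelize a single job by themselves.
# process_chunk and the file list are invented placeholders.
from multiprocessing import Pool

def process_chunk(filename):
    # Stand-in for the real per-file work (decode GRIB, render a map, ...).
    return filename.upper()

if __name__ == "__main__":
    files = ["gfs_00.grb", "gfs_06.grb", "gfs_12.grb", "gfs_18.grb"]
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(process_chunk, files)
    print(results)
```

As Eric notes, this only pays off if the per-chunk work is CPU-bound; if every worker is waiting on the same disk, extra cores buy nothing.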
|
From: João L. S. <js...@fc...> - 2009-04-08 20:40:46
|
antonv wrote:
> I am processing a lot of grib data from noaa with the use of matplotlib and
> basemap. On my actual laptop (p4 3ghz, 512mb ram) the whole process takes
> close to 3 hours...

You should profile your application to see why it's taking so long. Maybe you just coded something in a slow way. Python is a great language, but if you don't know it well you might have programmed some parts in a way that takes orders of magnitude more time than other solutions. Even if your code is reasonably optimized, you should first know why it's slow: has the computer run out of memory and started swapping? Is the CPU at 100%?

I'd recommend you ask a local python expert for some help.

JLS
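[A starting point for the profiling João suggests is the standard-library cProfile module; `slow_part` here is just an illustrative stand-in for the real workload:]

```python
# Minimal profiling sketch using only the standard library.
import cProfile
import io
import pstats

def slow_part():
    # Placeholder for the real workload (e.g. the GRIB-to-plot loop).
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # top 5 functions by cumulative time
```

The listing shows per-function call counts and cumulative times, which answers "where does the 3 hours go?" before any hardware is bought.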
|
From: antonv <vas...@ya...> - 2009-04-08 20:57:25
|
I have a bit of experience programming, and I am pretty sure my parts of the code are pretty well optimized. I made sure that the loop contains only the stuff needed, and I'm loading everything else beforehand.

The biggest bottleneck is happening because I'm unpacking grib files to csv files using Degrib on the command line. That operation usually takes around half an hour using no more than 50% of the processor, but it maxes out the memory usage, and it is definitely hard-drive intensive as it ends up writing over 4 GB of data. I have also noticed that on a lower-spec AMD desktop this runs faster than on my P4 Intel laptop, my guess being that the laptop HDD is 5400 rpm and the desktop's is 7200 rpm.

The next step is to take all those csv files and make images from them. For this one I haven't dug too deep to see what is happening, but it seems to be the other way around, using the CPU a lot more while keeping the memory usage high too.

Thanks,
Anton
|
From: Jose Gómez-D. <jgo...@gm...> - 2009-04-09 12:13:01
|
On Wednesday 08 April 2009 21:57:21 antonv wrote:
> The biggest bottleneck is happening because I'm unpacking grib files to csv
> files using Degrib in command line. That operation is usually around half
> an hour using no more than 50% of the processor but it maxes out the memory
> usage and it definitely is hard drive intensive as it ends up writing over
> 4 GB of data. I have noticed also that on a lower spec AMD desktop this
> runs faster than on my P4 Intel Laptop, my guess being that the laptop hdd
I do the same sort of processing, and use GDAL to read the GRIB (I think
grib2, whatever ECMWF provides) files directly into numpy arrays. It's as
easy as:

    from osgeo import gdal
    import pylab

    g = gdal.Open("my_grib_file.grib")
    data = g.GetRasterBand(my_band).ReadAsArray()  # my_band is the 1-based band index
    pylab.imshow(data)
    # blah blah blah
It doesn't take long at all, unless your files are huge and are stored over a
slow and busy network. But then, there's little you can do about that!
J
--
RSU ■ Dept. of Geography ■ University College ■ Gower St, London WC1E 6BT UK
EMM ■ Dept. of Geography ■ King's College ■ Strand, London WC2R 2LS UK
|
|
From: Eric F. <ef...@ha...> - 2009-04-08 21:37:35
|
antonv wrote:
> The biggest bottleneck is happening because I'm unpacking grib files to csv
> files using Degrib in command line. [...]

Instead of going to csv files--which are *very* inefficient to write, store, and then read in again--why not convert directly to netcdf, and then read your data in from netcdf as needed for plotting? I suspect this will speed things up quite a bit. Numpy support for netcdf is very good. Of course, direct numpy-enabled access to the grib files might be even better, eliminating the translation phase entirely. Have you looked into http://www.pyngl.ucar.edu/Nio.shtml?

Eric
|
From: antonv <vas...@ya...> - 2009-04-08 22:54:24
|
I know that using the csv files is very slow, but I have no knowledge of working with the netcdf format and I was in a bit of a rush when I wrote this. I will take another look at it. How would you translate a grib into netcdf? Are there any specific applications, or is it done straight through numpy?

As for pyngl, if I remember correctly I looked at it but it was not working on windows.

Thanks,
Anton

--
View this message in context: http://www.nabble.com/Computer-specs-for-fast-matplotlib-and-basemap-processing-tp22956400p22961419.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
|
From: Jeff W. <js...@fa...> - 2009-04-08 23:02:26
|
antonv wrote:
> I know that using the csv files is very slow but I have no knowledge of
> working with the netcdf format and I was in a bit of a rush when I wrote
> this. I will take a look again at it. How would you translate a grib into
> netcdf? Are there any specific applications or straight through numpy?
>
> As for pyngl, if i remember correctly I looked at it but it was not working
> on windows.

Anton: If these are grib version 2 files, another option is http://code.google.com/p/pygrib2. I have made a windows installer.

-Jeff

--
Jeffrey S. Whitaker          Phone : (303) 497-6313
Meteorologist                FAX   : (303) 497-6449
NOAA/OAR/PSD R/PSD1          Email : Jef...@no...
325 Broadway                 Office: Skaggs Research Cntr 1D-113
Boulder, CO, USA 80303-3328  Web   : http://tinyurl.com/5telg
|
From: Anton V. <vas...@ya...> - 2009-04-08 23:06:39
|
Wow Jeff! You saved me again! I remember looking at it last year and thinking it would be awesome if there were a windows installer for it! I will install and play with it tonight!

Thanks a lot!
Anton

________________________________
From: Jeff Whitaker <js...@fa...>
Sent: Wednesday, April 8, 2009 4:02:22 PM
Subject: Re: [Matplotlib-users] Computer specs for fast matplotlib and basemap processing

> Anton: If these are grib version 2 files, another option is
> http://code.google.com/p/pygrib2. I have made a windows installer.
|
From: Eric F. <ef...@ha...> - 2009-04-08 23:38:30
|
antonv wrote:
> I will take a look again at it. How would you translate a grib into
> netcdf? Are there any specific applications or straight through numpy?

The program you are already using is said to convert grib2 to netcdf: http://www.nws.noaa.gov/mdl/NDFD_GRIB2Decoder/

There are also several modules providing a netcdf interface for numpy. I like this one: http://code.google.com/p/netcdf4-python/ and it is included in the Enthought Python Distribution.

For GRIB to numpy, googling turned up http://code.google.com/p/pygrib2/ as well as PyNIO. My guess is that this (pygrib2) will be exactly what you need. It is by Jeffrey Whitaker, the author of the above-mentioned netcdf4 interface as well as of basemap.

> As for pyngl, if i remember correctly I looked at it but it was not working
> on windows.

Well, I recommend switching to linux anyway, but that is another story.

Eric
|
From: Christopher B. <Chr...@no...> - 2009-04-09 17:12:51
|
Eric Firing wrote:
>> The biggest bottleneck is happening because I'm unpacking grib files to csv
>> files using Degrib in command line. That operation is usually around half an

disk speed -- you might want to try SATA RAID 0 (striping) -- I'd get a good hardware vendor's advice on maximizing your disk IO. You can also multi-task that process easily, but if you're disk-bound, that won't help anyway.

> Instead of going to csv files--which are *very* inefficient to write,
> store, and then read in again--why not convert directly to netcdf,

Or HDF, via PyTables. Or even direct binary numpy arrays, with either fromfile / tofile, or, more robustly, with numpy.save and numpy.load.

> direct numpy-enabled access to the grib files might be even better,
> eliminating the translation phase entirely. Have you looked into
> http://www.pyngl.ucar.edu/Nio.shtml?

Also, I think GDAL supports GRIB, and can directly give you numpy arrays.

>> I have noticed also that on a lower spec AMD desktop this runs
>> faster than on my P4 Intel Laptop, my guess being that the laptop hdd is
>> 5400 rpm and the desktop is 7200 rpm.

Yup, those laptop hard drives are SLOW -- you could look into a Solid State drive, if you have some money to spend.

>> Next step is to take all those csv files and make images from them. For this
>> one I haven't dug too deep to see what is happening but it seems to be the
>> other way, using the cpu a lot more while keeping the memory usage high too.

Multiple cores aren't going to help here, unless you run a few separate processes -- also, how much memory? All 64 bits will buy you is more memory, which you may or may not need. Also, as for Windows 64 bits -- is numpy supported there yet? I'd make sure; there are issues, as there is no MinGW for 64-bit Windows.

antonv wrote:
> I know that using the csv files is very slow but I have no knowledge of
> working with the netcdf format and I was in a bit of a rush when I wrote
> this. I will take a look again at it. How would you translate a grib into
> netcdf?

See if degrib supports any binary formats (I know, I'm from NOAA, I should know...). Otherwise you could use the GDAL command-line tools to translate into something else binary that may be easier to deal with. Though it looks like Jeff may have solved this problem for you (One NOAA, Jeff!)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959  voice
7600 Sand Point Way NE  (206) 526-6329  fax
Seattle, WA 98115       (206) 526-6317  main reception

Chr...@no...
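[The numpy.save / numpy.load route Chris mentions is about the simplest binary alternative to csv; a minimal sketch, with a dummy array standing in for a decoded grid:]

```python
# Round-trip a numpy array through its native binary .npy format,
# which is far faster and more compact than csv for large float arrays.
import numpy as np

field = np.arange(12.0).reshape(3, 4)  # dummy data standing in for a decoded grid

np.save("field.npy", field)       # writes field.npy in one call
restored = np.load("field.npy")   # reads it straight back into an array

assert np.array_equal(field, restored)
print(restored.shape)  # (3, 4)
```

The .npy file preserves dtype and shape exactly, so there is no parsing step and no precision loss on the way back in.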