From: Francesc A. <fa...@py...> - 2012-10-21 15:41:05
|
Hi,

I'm going to give a tutorial on PyTables next Thursday during the PyData conference in New York (http://nyc2012.pydata.org/) and I'd like to use some real-life data files. So, if you have some public repository with data generated with PyTables, please tell me. I'm looking for files that are not very large (< 1 GB) and that use the Table object significantly. A small description of the data included will be more than welcome too!

Thanks!

-- Francesc Alted |
From: Anthony S. <sc...@gm...> - 2012-10-21 18:03:49
|
Hello Francesc,

I look forward to hearing how your tutorial goes at PyData! Here [1] is a file that stores some basic nuclear data that is freely redistributable. It stores atomic weights, bound neutron scattering lengths, and pre-compiled neutron cross sections (xs) for 5 different energy regimes. Everything in here is a table. The file is rather small (about 165 kB). There are integer, float, and complex columns.

I hope that this helps!

Be Well
Anthony

1. https://s3.amazonaws.com/pyne/prebuilt_nuc_data.h5 |
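For anyone who wants to poke at a file like this, here is a minimal PyTables sketch of building and querying a table with integer, float, and complex columns. The class name, column names, and values below are invented for illustration; they are not read from the actual prebuilt_nuc_data.h5 file.

```python
import tables

# Made-up miniature of a nuclear-data table: integer, float, and
# complex columns, like the file described above.
class Nuclide(tables.IsDescription):
    nuc_id = tables.Int32Col()                    # nuclide identifier
    atomic_weight = tables.Float64Col()           # in amu
    b_coherent = tables.ComplexCol(itemsize=16)   # bound scattering length

with tables.open_file("nuc_demo.h5", mode="w") as h5:
    table = h5.create_table("/", "nuclides", Nuclide, "demo nuclide table")
    for nid, aw, b in [(10010, 1.00794, 3.741e-5 + 0j),
                       (80160, 15.9994, 5.803e-5 + 0j)]:
        row = table.row
        row["nuc_id"] = nid
        row["atomic_weight"] = aw
        row["b_coherent"] = b
        row.append()
    table.flush()
    # in-kernel query: the condition is evaluated without loading the
    # whole table into memory
    heavy = [r["nuc_id"] for r in table.where("atomic_weight > 10.0")]
    print(heavy)   # [80160]
```

Complex columns can be stored and read back fine, but note that in-kernel `where()` conditions can only reference the numeric and string columns.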
From: Andy W. <wil...@gm...> - 2012-10-21 20:02:26
|
Hi Francesc.

I've been working on a library for accessing climatology data that uses pytables to cache data from the USGS. It could easily be used to create a sample dataset for some area of interest. File size is determined by how much data gets queried.

The general layout is:

/usgs/sites
- the sites table contains information and metadata about a site

/usgs/values/<AGENCY>/<SITE_CODE>/<PARAMETER_CODE>
- a table containing all the timeseries data for each site and parameter is created as data are queried
- parameter codes are a bit obscure, but a dict with descriptive metadata is stashed at table.attrs.variable
- the datetime column has a CSIndex on it and is stored as a string because some sites have data prior to the year 1901
- pretty inefficient in terms of disk space (lots of large-ish string columns) because it handles a very general class of data types

Here's what the code would look like to download and create the hdf5 file for 10 random sites in New York:

import ulmo

# the default location for the hdf5 file is OS dependent, so provide
# the path you want to use
hdf5_file_path = './usgs_data.h5'

# get list of sites in NY
ulmo.usgs.pytables.update_site_list(state_code='NY', path=hdf5_file_path)
sites = ulmo.usgs.pytables.get_sites(path=hdf5_file_path)

# download data for a few random sites
for site in list(sites.keys())[:10]:
    ulmo.usgs.pytables.update_site_data(site, path=hdf5_file_path)

The project is on github: https://github.com/swtools/ulmo
and the code that does all the pytables stuff (including the table descriptions) is here:
https://github.com/swtools/ulmo/blob/master/ulmo/usgs/pytables.py

-andy |
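As an aside on the datetime-as-string trick mentioned above: ISO 8601 strings sort lexicographically in chronological order, even for pre-1901 dates, so a completely sorted index on the string column is enough to walk rows in time order. A small self-contained sketch (the table name and data here are invented, not taken from ulmo):

```python
import tables

# Illustrative stand-in for one of the per-site value tables: a string
# datetime column plus a float value column.
class Value(tables.IsDescription):
    datetime = tables.StringCol(19)   # e.g. b'1899-06-01T00:00:00'
    value = tables.Float64Col()

with tables.open_file("usgs_demo.h5", mode="w") as h5:
    t = h5.create_table("/", "values", Value)
    # append rows deliberately out of chronological order
    for dt, v in [(b"2012-06-01T00:00:00", 3.0),
                  (b"1899-06-01T00:00:00", 1.0),
                  (b"2011-06-01T00:00:00", 2.0)]:
        row = t.row
        row["datetime"] = dt
        row["value"] = v
        row.append()
    t.flush()
    # the completely sorted index (CSIndex) mentioned above
    t.cols.datetime.create_csindex()
    # iterate rows in datetime order using that index; lexicographic
    # order on ISO 8601 strings is chronological order
    in_order = [float(r["value"]) for r in t.itersorted("datetime")]
    print(in_order)   # [1.0, 2.0, 3.0]
```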
From: Jason M. <jk...@uc...> - 2012-10-21 20:55:48
|
This is a PyTables-generated file with data collected from vehicle (bicycle) dynamics measurements. Metadata are in tables and time series are stored in array objects.

http://mae.ucdavis.edu/~biosport/InstrumentedBicycleData/InstrumentedBicycleData.h5.bz2

It is about 308 MB compressed and 610 MB uncompressed.

Jason

--
Jason K. Moore, Ph.D.
Personal Website <http://biosport.ucdavis.edu/lab-members/jason-moore>
Sports Biomechanics Lab <http://biosport.ucdavis.edu>, UC Davis
Davis Open Science <http://daviswiki.org/Davis_Open_Science>
Google Voice: +01 530-601-9791 |
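A tiny self-contained sketch of that kind of layout, with per-run metadata in a Table and each measured signal in an array node. All names and values below are invented for illustration; they are not taken from the actual bicycle data file.

```python
import numpy as np
import tables

# Per-run metadata lives in a Table...
class RunMeta(tables.IsDescription):
    run_id = tables.Int32Col()
    rider = tables.StringCol(32)
    duration = tables.Float64Col()   # seconds

with tables.open_file("bicycle_demo.h5", mode="w") as h5:
    meta = h5.create_table("/", "runs", RunMeta)
    row = meta.row
    row["run_id"] = 1
    row["rider"] = b"rider_a"
    row["duration"] = 60.0
    row.append()
    meta.flush()
    # ...while each time series goes into its own array node
    signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))
    h5.create_array("/", "steer_angle", signal, "steer angle time series")

with tables.open_file("bicycle_demo.h5", mode="r") as h5:
    runs = h5.root.runs.read()              # whole metadata table at once
    steer_head = h5.root.steer_angle[:100]  # just a slice of the series
    print(runs["run_id"][0], steer_head.shape)
```

Keeping the bulky time series in array nodes means a reader can slice just the samples it needs from disk, while the small metadata table stays cheap to scan in full.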
From: Francesc A. <fa...@py...> - 2012-10-22 12:46:11
|
Hey, thanks to everybody that contributed datasets! I'll look into them and hope to be able to select something to show.

Francesc

-- Francesc Alted |