From: Oleksandr H. <guz...@gm...> - 2013-08-05 19:43:41
|
Thank you Anthony and Jeff: I will try compression and if that won't be enough, I'll try using pandas for working with dates. Cheers -- Sasha 2013/8/5 Jeff Reback <jr...@ya...> > Here is a pandas solution for doing just this (which uses PyTables under > the hood): > > # create a frame > In [45]: df = > DataFrame(randn(1000,2),index=date_range('20000101',periods=1000)) > > In [53]: df > Out[53]: > <class 'pandas.core.frame.DataFrame'> > DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00 > Freq: D > Data columns (total 2 columns): > 0 1000 non-null values > 1 1000 non-null values > dtypes: float64(2) > > # store it as a table > In [46]: store = pd.HDFStore('test.h5',mode='w') > > In [47]: store.append('df',df) > > # select out the index (a datetimeindex in this case) > In [48]: c = store.select_column('df','index') > > # get the coordinates of matching index > In [49]: coords = c[pd.DatetimeIndex(c).month==5] > > # select those rows > In [51]: from pandas.io.pytables import Coordinates > > In [50]: store.select('df',where=Coordinates(coords.index,None,None)) > Out[50]: > <class 'pandas.core.frame.DataFrame'> > DatetimeIndex: 93 entries, 2000-05-01 00:00:00 to 2002-05-31 00:00:00 > Data columns (total 2 columns): > 0 93 non-null values > 1 93 non-null values > dtypes: float64(2) > > > ------------------------------ > *From:* Anthony Scopatz <sc...@gm...> > *To:* Discussion list for PyTables <pyt...@li...> > *Sent:* Monday, August 5, 2013 2:54 PM > *Subject:* Re: [Pytables-users] dates and space > > On Mon, Aug 5, 2013 at 1:38 PM, Oleksandr Huziy <guz...@gm...>wrote: > > Hi Pytables users and developers: > > I have a few questions to which I could not find the answer in the > documentation. Thank you in advance for any help. > > 1. If I store dates in Pytables, does it mean I could write queries like > table.where('date.month == 5')? Is there a common way to pass from python's > datetime to pytable's datetime and inversely? > > > Hello Sasha, > > Pytables times are the actual based off of C time, not Python's date > times. This is because they use the HDF5 time types. So unfortunately you > can't write queries like the one above. (You'd need to talk to numexpr > about getting that kind of query implemented ~_~.) > > Instead I would suggest that you store your times as Float64Atoms and > Float64Cols and then use arithmetic to figure out the query: > > table.where("(x / 3600 / 24)%12 == 5") > > This is not perfect... > > > 2. I have several variables stored in the same file in a separate table > for each variable. And I use separate columns year, month, day, hour, > minute, second - to mark the time for a record (the records are not > necessarily ordered in time) and this is for each variable. I was thinking > to put all the variables in the same table and put missing values for the > variables which do not have outputs for a given time step. Is it possible > to put None as a default value into a table (so I could easily filter dummy > rows). > > > It is not possible to use "None" since that is a Python object of a > different type than the other integers you are trying to stick in the > column. I would suggest that you use values with no actual meaning. If > you are using normal ints you can use -1 to represent missing values. If > you are using unsigned ints you have to pick other values, like 13 for > month on the Julian calendar. > > > But then again the data comes in chunks, does this mean I would have to > check if a row with the same date already exist for a different variable? > > > No you wouldn't you can store the same data multiple times in different > rows. > > > I don't really like the ideas in 2, which are intended to save space, but > maybe all I need is a good compression level? Can somebody advise me on > this? > > > Compression would definitely help here since the date numbers are all > fairly similar. Probably even a compression level of 1 would work. Keep > in mind that sometime using compression actually speeds things up (see the > starving CPU problem). You might just need to experiment with a few > different compression level to see how things go. 0, 1, 5, 9 gives you a > good spread. > > Be Well > Anthony > > > > > > Cheers > -- > Oleksandr (Sasha) Huziy > > > ------------------------------------------------------------------------------ > Get your SQL database under version control now! > Version control is standard for application code, but databases havent > caught up. So what steps can you take to put your SQL databases under > version control? Why should you start doing it? Read more to find out. > http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Get your SQL database under version control now! > Version control is standard for application code, but databases havent > caught up. So what steps can you take to put your SQL databases under > version control? Why should you start doing it? Read more to find out. > http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Get your SQL database under version control now! > Version control is standard for application code, but databases havent > caught up. So what steps can you take to put your SQL databases under > version control? Why should you start doing it? Read more to find out. > http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |