|
From: Oleksandr H. <guz...@gm...> - 2013-08-05 19:43:41
|
Thank you Anthony and Jeff:
I will try compression and if that won't be enough, I'll try using pandas
for working with dates.
Cheers
--
Sasha
2013/8/5 Jeff Reback <jr...@ya...>
> Here is a pandas solution for doing just this (which uses PyTables under
> the hood):
>
> # create a frame
> In [45]: df =
> DataFrame(randn(1000,2),index=date_range('20000101',periods=1000))
>
> In [53]: df
> Out[53]:
> <class 'pandas.core.frame.DataFrame'>
> DatetimeIndex: 1000 entries, 2000-01-01 00:00:00 to 2002-09-26 00:00:00
> Freq: D
> Data columns (total 2 columns):
> 0 1000 non-null values
> 1 1000 non-null values
> dtypes: float64(2)
>
> # store it as a table
> In [46]: store = pd.HDFStore('test.h5',mode='w')
>
> In [47]: store.append('df',df)
>
> # select out the index (a datetimeindex in this case)
> In [48]: c = store.select_column('df','index')
>
> # get the coordinates of matching index
> In [49]: coords = c[pd.DatetimeIndex(c).month==5]
>
> # select those rows
> In [51]: from pandas.io.pytables import Coordinates
>
> In [50]: store.select('df',where=Coordinates(coords.index,None,None))
> Out[50]:
> <class 'pandas.core.frame.DataFrame'>
> DatetimeIndex: 93 entries, 2000-05-01 00:00:00 to 2002-05-31 00:00:00
> Data columns (total 2 columns):
> 0 93 non-null values
> 1 93 non-null values
> dtypes: float64(2)
>
>
> ------------------------------
> *From:* Anthony Scopatz <sc...@gm...>
> *To:* Discussion list for PyTables <pyt...@li...>
> *Sent:* Monday, August 5, 2013 2:54 PM
> *Subject:* Re: [Pytables-users] dates and space
>
> On Mon, Aug 5, 2013 at 1:38 PM, Oleksandr Huziy <guz...@gm...>wrote:
>
> Hi Pytables users and developers:
>
> I have a few questions to which I could not find the answer in the
> documentation. Thank you in advance for any help.
>
> 1. If I store dates in Pytables, does it mean I could write queries like
> table.where('date.month == 5')? Is there a common way to pass from python's
> datetime to pytable's datetime and inversely?
>
>
> Hello Sasha,
>
> Pytables times are the actual based off of C time, not Python's date
> times. This is because they use the HDF5 time types. So unfortunately you
> can't write queries like the one above. (You'd need to talk to numexpr
> about getting that kind of query implemented ~_~.)
>
> Instead I would suggest that you store your times as Float64Atoms and
> Float64Cols and then use arithmetic to figure out the query:
>
> table.where("(x / 3600 / 24)%12 == 5")
>
> This is not perfect...
>
>
> 2. I have several variables stored in the same file in a separate table
> for each variable. And I use separate columns year, month, day, hour,
> minute, second - to mark the time for a record (the records are not
> necessarily ordered in time) and this is for each variable. I was thinking
> to put all the variables in the same table and put missing values for the
> variables which do not have outputs for a given time step. Is it possible
> to put None as a default value into a table (so I could easily filter dummy
> rows).
>
>
> It is not possible to use "None" since that is a Python object of a
> different type than the other integers you are trying to stick in the
> column. I would suggest that you use values with no actual meaning. If
> you are using normal ints you can use -1 to represent missing values. If
> you are using unsigned ints you have to pick other values, like 13 for
> month on the Julian calendar.
>
>
> But then again the data comes in chunks, does this mean I would have to
> check if a row with the same date already exist for a different variable?
>
>
> No you wouldn't you can store the same data multiple times in different
> rows.
>
>
> I don't really like the ideas in 2, which are intended to save space, but
> maybe all I need is a good compression level? Can somebody advise me on
> this?
>
>
> Compression would definitely help here since the date numbers are all
> fairly similar. Probably even a compression level of 1 would work. Keep
> in mind that sometime using compression actually speeds things up (see the
> starving CPU problem). You might just need to experiment with a few
> different compression level to see how things go. 0, 1, 5, 9 gives you a
> good spread.
>
> Be Well
> Anthony
>
>
>
>
>
> Cheers
> --
> Oleksandr (Sasha) Huziy
>
>
> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases havent
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
>
>
> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases havent
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
>
>
> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases havent
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
|