From: Anthony S. <sc...@gm...> - 2013-04-15 23:15:45
And here is the issue: https://github.com/PyTables/PyTables/issues/230

On Mon, Apr 15, 2013 at 6:07 PM, Anthony Scopatz <sc...@gm...> wrote:

> Hi Charles,
>
> This is very likely a bug: Time64Col values are not being converted to
> Float64s for the query itself. Under the covers, HDF5 and PyTables
> represent Time64 as POSIX times, which are structs of two 4-byte ints [1].
> These obviously have a very different memory layout than your standard
> float64. This is why the comparison is failing.
>
> numexpr doesn't support the time64 datatype, nor does it support bit-shift
> operators. This makes it difficult, if not impossible, to use time64
> columns properly from within a query right now.
>
> I'll open a ticket for this, but if you want something working right now,
> using Float64Col is probably your best bet. This is what I have always
> done, and it works just fine. I think the Time64 stuff is in there largely
> for C/HDF5 compliance. Sorry about the confusion.
>
> Be Well
> Anthony
>
> 1. http://pubs.opengroup.org/onlinepubs/000095399/basedefs/sys/time.h.html
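To make that Float64Col workaround concrete, here is a minimal sketch (the
file name, node name, and sample data are invented for illustration; it
assumes the PyTables 2.x camel-case API used elsewhere in this thread):

import time
import tables

class WindSample(tables.IsDescription):
    # Plain float64 POSIX seconds instead of Time64Col.
    value_seconds = tables.Float64Col(pos=0)
    update_seconds = tables.Float64Col(pos=1)
    status = tables.UInt8Col(pos=2)
    value = tables.Float64Col(pos=3)

h5 = tables.openFile('wind_demo.h5', mode='w')
table = h5.createTable('/', 'asc_wind_speed', WindSample, 'Wind speed')

t0 = time.time()
row = table.row
for i in range(100):                  # one sample per second
    row['value_seconds'] = t0 + i
    row['update_seconds'] = t0 + i
    row['status'] = 0
    row['value'] = 5.0 + 0.01 * i
    row.append()
table.flush()
table.cols.update_seconds.createCSIndex()

# An in-kernel interval query on the float64 timestamps behaves as expected.
hits = table.readWhere('(update_seconds >= lo) & (update_seconds <= hi)',
                       condvars={'lo': t0 + 10.0, 'hi': t0 + 20.0})
print len(hits)                       # -> 11 (Python 2 print statement)

h5.close()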
> On Mon, Apr 15, 2013 at 2:20 PM, Charles de Villiers <ch...@ya...> wrote:
>
>> Hi Anthony,
>>
>> Thanks for your response.
>>
>> I had come across that discussion, but I don't think the floating-point
>> precision thing really explains my results, because I'm querying for
>> intervals, not instants. If I have a table containing, say, one-second
>> samples between 500.0 and 1500.0, and I use a where clause like this:
>>
>>   '(update_seconds >= 1000.0) & (update_seconds <= 1060.0)'
>>
>> then I expect to get at least 58 samples, even with floating-point
>> 'fuzziness' - but in fact I get none.
>> However, I have now tried the approach of storing my epoch seconds in
>> Float64Cols and that seems to be working just fine.
>> The question I'm left with is: just what does a Time64Col represent?
>> Since there's no standard Python Time class with a float representation,
>> I just guessed I could assign it float seconds a la time.time(), but
>> Float64 works just as well for that (and, as it turns out, better). How
>> could you use a Time64Col in practice?
>>
>> Thanks again,
>>
>> Charles de Villiers
>>
>> "They have computers, and they may have other weapons of mass
>> destruction." (Janet Reno)
>>
>> ------------------------------
>> *From:* Anthony Scopatz <sc...@gm...>
>> *To:* Charles de Villiers <ch...@ya...>; Discussion list for PyTables <pyt...@li...>
>> *Sent:* Monday, April 15, 2013 5:13 PM
>> *Subject:* Re: [Pytables-users] PyTables in-kernel query using Time64Col returns wrong results
>>
>> Hi Charles,
>>
>> We just discussed this last week and I am too lazy to retype it all, so
>> here is a link to the archive post [1].
>>
>> Be Well
>> Anthony
>>
>> 1. http://sourceforge.net/mailarchive/message.php?msg_id=30708089
>>
>> On Mon, Apr 15, 2013 at 9:20 AM, Charles de Villiers <ch...@ya...> wrote:
>>
>> (question also posted at
>> http://stackoverflow.com/questions/16013711/pytables-in-kernel-search-on-time64col)
>>
>> I'm using PyTables 2.4.0 and Python 2.7. I've got a database that
>> contains the following typical table:
>>
>> /anc/asc_wind_speed (Table(87591,), shuffle, blosc(3)) 'Wind speed'
>>   description := {
>>     "value_seconds": Time64Col(shape=(), dflt=0.0, pos=0),
>>     "update_seconds": Time64Col(shape=(), dflt=0.0, pos=1),
>>     "status": UInt8Col(shape=(), dflt=0, pos=2),
>>     "value": Float64Col(shape=(), dflt=0.0, pos=3)}
>>   byteorder := 'little'
>>   chunkshape := (2621,)
>>   autoIndex := True
>>   colindexes := {
>>     "update_seconds": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>>     "value": Index(9, full, shuffle, zlib(1)).is_CSI=True}
>>
>> I populate the timestamp columns using float seconds.
>> The data looks OK in my IPython session:
>>
>> array([(1343779432.2160001, 1343779431.8529999, 0, 5.2975000000000003),
>>        (1343779433.2190001, 1343779432.9430001, 0, 5.7474999999999996),
>>        (1343779434.217, 1343779433.9809999, 0, 5.8600000000000003), ...,
>>        (1343866301.934, 1343866301.5139999, 0, 3.8424999999999998),
>>        (1343866302.934, 1343866302.5799999, 0, 4.0599999999999996),
>>        (1343866303.934, 1343866303.642, 0, 3.7825000000000002)],
>>       dtype=[('value_seconds', '<f8'), ('update_seconds', '<f8'), ('status', '|u1'), ('value', '<f8')])
>>
>> .. but when I try to do an in-kernel search using the indexed column
>> 'update_seconds', everything goes pear-shaped:
>>
>> len(wstable.readWhere('(update_seconds <= 1343866303.642)'))
>> 0
>>
>> i.e. I get 0 rows returned when I was expecting all 87591 of them.
>> Occasionally I do manage to get some rows with a '>=' query, but the
>> timestamp columns are then returned as huge floats (~10^79). It seems
>> that there is some implicit type-conversion going on that causes the
>> Time64Col values to be misinterpreted. Can someone spot my mistake, or
>> should I forget about Time64Cols and convert them all to Float64 (and
>> how do I do this?)
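On the closing question of how to convert existing Time64Cols to Float64:
since the Time64 columns already come back from read() as plain '<f8'
seconds (as the dtype above shows), one rough approach is to copy the rows
into a new table whose description uses Float64Col. A sketch only, assuming
the layout from the question (the file name 'winds.h5', the new table name,
and the one-shot in-memory copy are invented simplifications):

import tables

class WindSampleF64(tables.IsDescription):
    value_seconds = tables.Float64Col(pos=0)
    update_seconds = tables.Float64Col(pos=1)
    status = tables.UInt8Col(pos=2)
    value = tables.Float64Col(pos=3)

h5 = tables.openFile('winds.h5', mode='a')
old = h5.getNode('/anc/asc_wind_speed')

new = h5.createTable('/anc', 'asc_wind_speed_f64', WindSampleF64,
                     'Wind speed', filters=old.filters)
new.append(old.read())   # ~87k rows, small enough to copy in one pass
new.flush()
new.cols.update_seconds.createCSIndex()
new.cols.value.createCSIndex()

# Once the copy checks out, the old table can be dropped and the new one
# renamed into its place:
# h5.removeNode('/anc', 'asc_wind_speed')
# h5.renameNode('/anc/asc_wind_speed_f64', 'asc_wind_speed')

h5.close()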