From: Anthony S. <sc...@gm...> - 2013-04-15 23:15:45
And here is the issue: https://github.com/PyTables/PyTables/issues/230

On Mon, Apr 15, 2013 at 6:07 PM, Anthony Scopatz <sc...@gm...> wrote:

> Hi Charles,
>
> This is very likely a bug: Time64Col values are not being converted to
> Float64s for the query itself. Under the covers, HDF5 and PyTables
> represent Time64 as POSIX times, which are structs of two 4-byte ints [1].
> These obviously have a very different memory layout than your standard
> float64. This is why the comparison is failing.
>
> numexpr doesn't support the time64 datatype, nor does it support bit-shift
> operators. This makes it difficult, if not impossible, to use time64
> columns properly from within a query right now.
>
> I'll open a ticket for this, but if you want something working right now,
> using Float64Col is probably your best bet. This is what I have always
> done, and it works just fine. I think the Time64 stuff is in there largely
> for C/HDF5 compliance. Sorry about the confusion.
>
> Be Well
> Anthony
>
> 1. http://pubs.opengroup.org/onlinepubs/000095399/basedefs/sys/time.h.html
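To make that Float64Col workaround concrete, here is a minimal sketch (the
file name, node name, and sample data are invented for illustration; it
assumes the PyTables 2.x camel-case API used elsewhere in this thread):

import time
import tables

class WindSample(tables.IsDescription):
    # Plain float64 POSIX seconds instead of Time64Col.
    value_seconds = tables.Float64Col(pos=0)
    update_seconds = tables.Float64Col(pos=1)
    status = tables.UInt8Col(pos=2)
    value = tables.Float64Col(pos=3)

h5 = tables.openFile('wind_demo.h5', mode='w')
table = h5.createTable('/', 'asc_wind_speed', WindSample, 'Wind speed')

t0 = time.time()
row = table.row
for i in range(100):                  # one sample per second
    row['value_seconds'] = t0 + i
    row['update_seconds'] = t0 + i
    row['status'] = 0
    row['value'] = 5.0 + 0.01 * i
    row.append()
table.flush()
table.cols.update_seconds.createCSIndex()

# An in-kernel interval query on the float64 timestamps behaves as expected.
hits = table.readWhere('(update_seconds >= lo) & (update_seconds <= hi)',
                       condvars={'lo': t0 + 10.0, 'hi': t0 + 20.0})
print len(hits)                       # -> 11 (Python 2 print statement)

h5.close()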
> On Mon, Apr 15, 2013 at 2:20 PM, Charles de Villiers <ch...@ya...> wrote:
>
>> Hi Anthony,
>>
>> Thanks for your response.
>>
>> I had come across that discussion, but I don't think the floating-point
>> precision thing really explains my results, because I'm querying for
>> intervals, not instants. If I have a table containing, say, one-second
>> samples between 500.0 and 1500.0, and I use a where clause like this:
>>
>>   '(update_seconds >= 1000.0) & (update_seconds <= 1060.0)'
>>
>> then I expect to get at least 58 samples, even with floating-point
>> 'fuzziness' - but in fact I get none.
>> However, I have now tried the approach of storing my epoch seconds in
>> Float64Cols and that seems to be working just fine.
>> The question I'm left with is: just what does a Time64Col represent?
>> Since there's no standard Python Time class with a float representation,
>> I just guessed I could assign it float seconds a la time.time(), but
>> Float64 works just as well for that (and, as it turns out, better). How
>> could you use a Time64Col in practice?
>>
>> Thanks again,
>>
>> Charles de Villiers
>>
>> "They have computers, and they may have other weapons of mass
>> destruction." (Janet Reno)
>>
>> ------------------------------
>> *From:* Anthony Scopatz <sc...@gm...>
>> *To:* Charles de Villiers <ch...@ya...>; Discussion list for PyTables <pyt...@li...>
>> *Sent:* Monday, April 15, 2013 5:13 PM
>> *Subject:* Re: [Pytables-users] PyTables in-kernel query using Time64Col returns wrong results
>>
>> Hi Charles,
>>
>> We just discussed this last week and I am too lazy to retype it all, so
>> here is a link to the archive post [1].
>>
>> Be Well
>> Anthony
>>
>> 1. http://sourceforge.net/mailarchive/message.php?msg_id=30708089
>>
>> On Mon, Apr 15, 2013 at 9:20 AM, Charles de Villiers <ch...@ya...> wrote:
>>
>> (question also posted at
>> http://stackoverflow.com/questions/16013711/pytables-in-kernel-search-on-time64col)
>>
>> I'm using PyTables 2.4.0 and Python 2.7. I've got a database that
>> contains the following typical table:
>>
>> /anc/asc_wind_speed (Table(87591,), shuffle, blosc(3)) 'Wind speed'
>>   description := {
>>     "value_seconds": Time64Col(shape=(), dflt=0.0, pos=0),
>>     "update_seconds": Time64Col(shape=(), dflt=0.0, pos=1),
>>     "status": UInt8Col(shape=(), dflt=0, pos=2),
>>     "value": Float64Col(shape=(), dflt=0.0, pos=3)}
>>   byteorder := 'little'
>>   chunkshape := (2621,)
>>   autoIndex := True
>>   colindexes := {
>>     "update_seconds": Index(9, full, shuffle, zlib(1)).is_CSI=True,
>>     "value": Index(9, full, shuffle, zlib(1)).is_CSI=True}
>>
>> I populate the timestamp columns using float seconds.
>> The data looks OK in my IPython session:
>>
>> array([(1343779432.2160001, 1343779431.8529999, 0, 5.2975000000000003),
>>        (1343779433.2190001, 1343779432.9430001, 0, 5.7474999999999996),
>>        (1343779434.217, 1343779433.9809999, 0, 5.8600000000000003), ...,
>>        (1343866301.934, 1343866301.5139999, 0, 3.8424999999999998),
>>        (1343866302.934, 1343866302.5799999, 0, 4.0599999999999996),
>>        (1343866303.934, 1343866303.642, 0, 3.7825000000000002)],
>>       dtype=[('value_seconds', '<f8'), ('update_seconds', '<f8'), ('status', '|u1'), ('value', '<f8')])
>>
>> .. but when I try to do an in-kernel search using the indexed column
>> 'update_seconds', everything goes pear-shaped:
>>
>> len(wstable.readWhere('(update_seconds <= 1343866303.642)'))
>> 0
>>
>> i.e. I get 0 rows returned when I was expecting all 87591 of them.
>> Occasionally I do manage to get some rows with a '>=' query, but the
>> timestamp columns are then returned as huge floats (~10^79). It seems
>> that there is some implicit type-conversion going on that causes the
>> Time64Col values to be misinterpreted. Can someone spot my mistake, or
>> should I forget about Time64Cols and convert them all to Float64 (and
>> how do I do this?)
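On the closing question of how to convert existing Time64Cols to Float64:
since the Time64 columns already come back from read() as plain '<f8'
seconds (as the dtype above shows), one rough approach is to copy the rows
into a new table whose description uses Float64Col. A sketch only, assuming
the layout from the question (the file name 'winds.h5', the new table name,
and the one-shot in-memory copy are invented simplifications):

import tables

class WindSampleF64(tables.IsDescription):
    value_seconds = tables.Float64Col(pos=0)
    update_seconds = tables.Float64Col(pos=1)
    status = tables.UInt8Col(pos=2)
    value = tables.Float64Col(pos=3)

h5 = tables.openFile('winds.h5', mode='a')
old = h5.getNode('/anc/asc_wind_speed')

new = h5.createTable('/anc', 'asc_wind_speed_f64', WindSampleF64,
                     'Wind speed', filters=old.filters)
new.append(old.read())   # ~87k rows, small enough to copy in one pass
new.flush()
new.cols.update_seconds.createCSIndex()
new.cols.value.createCSIndex()

# Once the copy checks out, the old table can be dropped and the new one
# renamed into its place:
# h5.removeNode('/anc', 'asc_wind_speed')
# h5.renameNode('/anc/asc_wind_speed_f64', 'asc_wind_speed')

h5.close()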