From: <ben...@lf...> - 2012-08-01 07:10:18
> There might be an easier way to do this with numpy dtypes. In
> pseudo-code:
>
>     np.dtype([(colname, np.int16) for colname in colnames])

Can we use time and enum kinds that way as well?
This makes me think I should probably flatten my table.
Having nested columns is quite natural to group cells under a data item
(it's nice to access a field like I030_180/SPEED).
But that's more a naming convention and I can't perform in-kernel
searches with nested columns.

> If you want to mark the whole column as valid, you can use a boolean
> attribute on the table itself for each column. They could be named
> like colname_valid.
>
> See
> http://pytables.github.com/usersguide/libref.html#tables.Leaf.setAttr
> and
> http://pytables.github.com/usersguide/libref.html#the-attributeset-class
> for more info.

Good to know.

> This is more for flagging individual cells as valid or not. For
> integers you need to pick a value which means invalid (like -999999).

Yes, you are right. It's not a column I want to flag but a group of
cells in a row (all cells of a particular data item).
Problem is that if I have a uint8 for a cell, there is no invalid value
I can use.
Maybe I could use a "larger type" (uint16 in this case) to be able to
pick an invalid value. Or a bool for this group of cells.

> If you don't want to use a VLArray, then maxlen is probably your best
> option.
>
> If you want to do something a little more sophisticated, you could
> break your data out into a main table and then a helper VLarray. Every
> row in the table is matched by the same row in vlarray. Then when you
> want to get your full data back out, you have to go to the table and
> the vlarray. This makes things a little more annoying to work with,
> but it does what you want.

Interesting. I'll look more into it.

> Hope this helps. Feel free to ask more questions!

Yes, it does. Thanks!

Benjamin

> Be Well
> Anthony
>
> > Cheers,
> >
> > Benjamin
> >
> >
> > class I030_180_DESC(tables.IsDescription):
> >     """Calculated Track Velocity (Polar)"""
> >     SPEED = tables.UInt16Col(pos=0)
> >     HEADING = tables.UInt16Col(pos=1)
> >
> > class I030_181_DESC(tables.IsDescription):
> >     """Calculated Track Velocity (Cartesian)"""
> >     X = tables.Int16Col(pos=0)
> >     Y = tables.Int16Col(pos=1)
> >
> > class I030_340_DESC(tables.IsDescription):
> >     """Last Measured Mode 3/A"""
> >     V = tables.EnumCol(tables.Enum({
> >         "Code validated": 0,
> >         "Code not validated": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=0)
> >     G = tables.EnumCol(tables.Enum({
> >         "Default": 0,
> >         "Garbled code": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=1)
> >     L = tables.EnumCol(tables.Enum({
> >         "MODE 3/A code as derived from the reply of the transponder,": 0,
> >         "Smoothed MODE 3/A code as provided by a local tracker": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=2)
> >     sb = tables.UInt8Col(pos=3)
> >     mode_3_a = tables.UInt16Col(pos=4)
> >
> > class I030_400_DESC(tables.IsDescription):
> >     """Callsign"""
> >     callsign = tables.StringCol(7, pos=0)
> >
> > class I030_050_DESC(tables.IsDescription):
> >     """Artas Track Number"""
> >     AUI = tables.UInt8Col(pos=0)
> >     unused = tables.UInt8Col(pos=1)
> >     STN = tables.UInt16Col(pos=2)
> >     FX = tables.EnumCol(tables.Enum({
> >         "end of data item": 0,
> >         "extension into next extent": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=3)
> >
> > class I030Record(tables.IsDescription):
> >     """Cat 030 record"""
> >     ff_timestamp = tables.Time32Col()
> >     I030_010 = I030_010_DESC()
> >     I030_015 = I030_015_DESC()
> >     I030_030 = I030_030_DESC()
> >     I030_035 = I030_035_DESC()
> >     I030_040 = I030_040_DESC()
> >     I030_070 = I030_070_DESC()
> >     I030_170 = I030_170_DESC()
> >     I030_100 = I030_100_DESC()
> >     I030_180 = I030_180_DESC()
> >     I030_181 = I030_181_DESC()
> >     I030_060 = I030_060_DESC()
> >     I030_150 = I030_150_DESC()
> >     I030_140 = I030_140_DESC()
> >     I030_340 = I030_340_DESC()
> >     I030_400 = I030_400_DESC()
> >     ...
> >     I030_210 = I030_210_DESC()
> >     I030_120 = I030_120_DESC()
> >     I030_050 = I030_050_DESC()
> >     I030_270 = I030_270_DESC()
> >     I030_370 = I030_370_DESC()
> >
> >
> > From: Anthony Scopatz [mailto:sc...@gm...]
> > Sent: 12 July 2012 00:02
> > To: Discussion list for PyTables
> > Subject: Re: [Pytables-users] advice on using PyTables
> >
> > Hello Benjamin,
> >
> > Not knowing too much about the ASTERIX format, other than what you
> > said and what is in the links, I would say that this is a good fit
> > for HDF5 and PyTables. PyTables will certainly help you read in the
> > data and manipulate it.
> >
> > However, before you abandon hachoir completely, I will say it is a
> > lot easier to write hdf5 files in PyTables than to use the HDF5 C
> > API. If hachoir is too slow, have you tried profiling the code to
> > see what is taking up the most time? Maybe you could just rewrite
> > these parts in C? Have you looked into Cythonizing it? Also, you
> > don't seem to be using numpy to read in the data... (there are some
> > tricks given ASTERIX here, but not insurmountable).
> >
> > I ask the above, just so you don't have to completely rewrite
> > everything. You are correct though that pure python is probably not
> > sufficient. Feel free to ask more questions here.
> >
> > Be Well
> > Anthony
> >
> > On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...> wrote:
> >
> > Hi,
> >
> > I'm working with Air Traffic Management and would like to perform
> > checks / compute statistics on ASTERIX data.
> > ASTERIX is an ATM Surveillance Data Binary Messaging Format
> > (http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
> >
> > The data consist of a concatenation of consecutive data blocks.
> > Each data block consists of data category + length + records.
> > Each record is of variable length and consists of several data items
> > (that are well defined for each category).
> > Some data items might be present or not depending on a field
> > specification (bitfield).
> >
> > I started to write a parser using hachoir
> > (https://bitbucket.org/haypo/hachoir/overview) a pure python library.
> > But the parsing was really too slow and taking a lot of memory.
> > That's not really usable.
> >
> > From what I read, PyTables could really help to manipulate and
> > analyze the data.
> > So I've been thinking about writing a tool (probably in C) to
> > convert my ASTERIX format to HDF5.
> >
> > Before I start, I'd like confirmation that this seems like a
> > suitable application for PyTables.
> > Is there another approach than writing a conversion tool to HDF5?
> >
> > Thanks in advance
> >
> > Benjamin
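
For illustration only, here is a rough sketch of two ideas from the reply at the top of this message: flattening the nested columns into one numpy dtype, and using a bool per data item to flag a group of cells as valid. It uses numpy alone. The flattened names (I030_180_SPEED and so on), the *_valid fields, the int32 stand-in for Time32Col, and the sample values are all assumptions made up for the example, not something taken from the thread or from PyTables. Passing such a dtype to PyTables as a table description is not shown here.

```python
import numpy as np

# Hypothetical flattened layout: data item and field are joined with "_"
# instead of being nested (I030_180/SPEED becomes I030_180_SPEED).
FIELDS = [
    ("ff_timestamp", np.int32),      # assumed stand-in for Time32Col
    ("I030_180_valid", np.bool_),    # True when item I030/180 is present
    ("I030_180_SPEED", np.uint16),
    ("I030_180_HEADING", np.uint16),
    ("I030_340_valid", np.bool_),    # True when item I030/340 is present
    ("I030_340_V", np.uint8),        # enum stored as its uint8 base value
    ("I030_340_mode_3_a", np.uint16),
]

record_dtype = np.dtype(FIELDS)

# Three made-up records; the second one carries no I030/180 item.
records = np.zeros(3, dtype=record_dtype)
records["ff_timestamp"] = [1341993120, 1341993121, 1341993122]
records["I030_180_valid"] = [True, False, True]
records["I030_180_SPEED"] = [250, 0, 480]
records["I030_180_HEADING"] = [90, 0, 270]

# Keep only rows where the item is present and the speed is above 300.
mask = records["I030_180_valid"] & (records["I030_180_SPEED"] > 300)
fast = records[mask]
```

With one bool per data item, the uint8 cells keep their full value range, so no "larger type" is needed just to reserve an invalid value. The cost is one extra byte per item per row.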