From: Anthony S. <sc...@gm...> - 2012-08-01 15:16:05
On Wed, Aug 1, 2012 at 1:10 AM, <ben...@lf...> wrote:

> > There might be an easier way to do this with numpy dtypes. In
> > pseudo-code:
> >
> >     np.dtype([(colname, np.int16) for colname in colnames])
>
> Can we use time and enum kinds that way as well?

Ahh, you might not be able to use these types, depending on your numpy
version. Instead I would use a dictionary comprehension of columns:

    {colname: UInt16Col(pos=i) for i, colname in enumerate(colnames)}

> This makes me think I should probably flatten my table.
> Having nested columns is quite natural to group cells under a data item
> (it's nice to access a field like I030_180/SPEED).
> But that's more a naming convention and I can't perform in-kernel searches
> with nested columns.

It would sure be nice if we could though ;)

> > If you want to mark the whole column as valid, you can use a boolean
> > attribute on the table itself for each column. They could be named
> > like colname_valid.
> >
> > See
> > http://pytables.github.com/usersguide/libref.html#tables.Leaf.setAttr
> > and
> > http://pytables.github.com/usersguide/libref.html#the-attributeset-class
> > for more info.
>
> Good to know.
>
> > This is more for flagging individual cells as valid or not. For
> > integers you need to pick a value which means invalid (like -999999).
>
> Yes, you are right. It's not a column I want to flag but a group of cells
> in a row (all cells of a particular data item).
> Problem is that if I have a uint8 for a cell, there is no invalid value I
> can use.
> Maybe I could use a "larger type" (uint16 in this case) to be able to pick
> an invalid value.
> Or a bool for this group of cells.

Yes, if you used a bool column next to it you could use this as a valid
mask. Note that since bools are stored with a full byte, the bool column +
the uint8 column are the same size as a uint16. However, it will be much
quicker to query over the bool column.
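
A minimal sketch of the two suggestions above (a dictionary comprehension
of columns, plus a bool column used as a valid mask). It assumes a recent
PyTables 3.x, where the calls are named open_file and create_table (older
releases spelled them openFile and createTable). The column names, the
table name, the "valid" column and the sample values are made up for
illustration only.

```python
import os
import tempfile

import tables

# Hypothetical column names, for illustration only.
colnames = ["SPEED", "HEADING"]

# One UInt16 column per name, built with a dictionary comprehension.
description = {
    name: tables.UInt16Col(pos=i) for i, name in enumerate(colnames)
}
# Extra bool column acting as the validity mask for this group of cells.
description["valid"] = tables.BoolCol(pos=len(colnames))

# Made-up sample rows: (SPEED, HEADING, valid).
samples = [(120, 90, True), (0, 0, False), (300, 180, True)]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sketch.h5")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table(h5.root, "I030_180", description)
        row = table.row
        for speed, heading, valid in samples:
            row["SPEED"] = speed
            row["HEADING"] = heading
            row["valid"] = valid
            row.append()
        table.flush()

        # In-kernel query that only tests the bool mask column.
        valid_speeds = [int(r["SPEED"]) for r in table.where("valid")]

print(valid_speeds)
```

The query never has to compare the data columns against a sentinel, so a
uint8 cell can keep its full value range.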

Be Well
Anthony

> > If you don't want to use a VLArray, then maxlen is probably your best
> > option.
> >
> > If you want to do something a little more sophisticated, you could
> > break your data out into a main table and then a helper VLArray. Every
> > row in the table is matched by the same row in the VLArray. Then when
> > you want to get your full data back out, you have to go to the table
> > and the VLArray. This makes things a little more annoying to work with,
> > but it does what you want.
>
> Interesting. I'll look more into it.
>
> > Hope this helps. Feel free to ask more questions!
>
> Yes, it does.
> Thanks!
>
> Benjamin
>
> > Be Well
> > Anthony
> >
> > > > Cheers,
> > > >
> > > > Benjamin
> > > >
> > > > class I030_180_DESC(tables.IsDescription):
> > > >     """Calculated Track Velocity (Polar)"""
> > > >     SPEED = tables.UInt16Col(pos=0)
> > > >     HEADING = tables.UInt16Col(pos=1)
> > > >
> > > > class I030_181_DESC(tables.IsDescription):
> > > >     """Calculated Track Velocity (Cartesian)"""
> > > >     X = tables.Int16Col(pos=0)
> > > >     Y = tables.Int16Col(pos=1)
> > > >
> > > > class I030_340_DESC(tables.IsDescription):
> > > >     """Last Measured Mode 3/A"""
> > > >     V = tables.EnumCol(tables.Enum({
> > > >         "Code validated": 0,
> > > >         "Code not validated": 1,
> > > >         "uninitialized": 255
> > > >         }), "uninitialized",
> > > >         base="uint8",
> > > >         pos=0)
> > > >     G = tables.EnumCol(tables.Enum({
> > > >         "Default": 0,
> > > >         "Garbled code": 1,
> > > >         "uninitialized": 255
> > > >         }), "uninitialized",
> > > >         base="uint8",
> > > >         pos=1)
> > > >     L = tables.EnumCol(tables.Enum({
> > > >         "MODE 3/A code as derived from the reply of the transponder,": 0,
> > > >         "Smoothed MODE 3/A code as provided by a local tracker": 1,
> > > >         "uninitialized": 255
> > > >         }), "uninitialized",
> > > >         base="uint8",
> > > >         pos=2)
> > > >     sb = tables.UInt8Col(pos=3)
> > > >     mode_3_a = tables.UInt16Col(pos=4)
> > > >
> > > > class I030_400_DESC(tables.IsDescription):
> > > >     """Callsign"""
> > > >     callsign = tables.StringCol(7, pos=0)
> > > >
> > > > class I030_050_DESC(tables.IsDescription):
> > > >     """Artas Track Number"""
> > > >     AUI = tables.UInt8Col(pos=0)
> > > >     unused = tables.UInt8Col(pos=1)
> > > >     STN = tables.UInt16Col(pos=2)
> > > >     FX = tables.EnumCol(tables.Enum({
> > > >         "end of data item": 0,
> > > >         "extension into next extent": 1,
> > > >         "uninitialized": 255
> > > >         }), "uninitialized",
> > > >         base="uint8",
> > > >         pos=3)
> > > >
> > > > class I030Record(tables.IsDescription):
> > > >     """Cat 030 record"""
> > > >     ff_timestamp = tables.Time32Col()
> > > >     I030_010 = I030_010_DESC()
> > > >     I030_015 = I030_015_DESC()
> > > >     I030_030 = I030_030_DESC()
> > > >     I030_035 = I030_035_DESC()
> > > >     I030_040 = I030_040_DESC()
> > > >     I030_070 = I030_070_DESC()
> > > >     I030_170 = I030_170_DESC()
> > > >     I030_100 = I030_100_DESC()
> > > >     I030_180 = I030_180_DESC()
> > > >     I030_181 = I030_181_DESC()
> > > >     I030_060 = I030_060_DESC()
> > > >     I030_150 = I030_150_DESC()
> > > >     I030_140 = I030_140_DESC()
> > > >     I030_340 = I030_340_DESC()
> > > >     I030_400 = I030_400_DESC()
> > > >     ...
> > > >     I030_210 = I030_210_DESC()
> > > >     I030_120 = I030_120_DESC()
> > > >     I030_050 = I030_050_DESC()
> > > >     I030_270 = I030_270_DESC()
> > > >     I030_370 = I030_370_DESC()
> >
> > From: Anthony Scopatz [mailto:sc...@gm...]
> > Sent: 12 July 2012 00:02
> > To: Discussion list for PyTables
> > Subject: Re: [Pytables-users] advice on using PyTables
> >
> > Hello Benjamin,
> >
> > Not knowing too much about the ASTERIX format, other than
> > what you said and what is in the links, I would say that this is a good
> > fit for HDF5 and PyTables. PyTables will certainly help you read in
> > the data and manipulate it.
> >
> > However, before you abandon hachoir completely, I will say
> > it is a lot easier to write HDF5 files in PyTables than to use the HDF5
> > C API. If hachoir is too slow, have you tried profiling the code to
> > see what is taking up the most time? Maybe you could just rewrite
> > these parts in C? Have you looked into Cythonizing it? Also, you
> > don't seem to be using numpy to read in the data... (there are some
> > tricks given ASTERIX here, but not insurmountable).
> >
> > I ask the above, just so you don't have to completely
> > rewrite everything. You are correct though that pure python is
> > probably not sufficient. Feel free to ask more questions here.
> >
> > Be Well
> > Anthony
> >
> > On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...> wrote:
> >
> > Hi,
> >
> > I'm working with Air Traffic Management and would like to
> > perform checks / compute statistics on ASTERIX data.
> > ASTERIX is an ATM Surveillance Data Binary Messaging Format
> > (http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
> >
> > The data consist of a concatenation of consecutive data blocks.
> > Each data block consists of data category + length + records.
> > Each record is of variable length and consists of several
> > data items (that are well defined for each category).
> > Some data items might be present or not depending on a field
> > specification (bitfield).
> >
> > I started to write a parser using hachoir
> > (https://bitbucket.org/haypo/hachoir/overview) a pure python library.
> > But the parsing was really too slow and taking a lot of memory.
> > That's not really useable.
> >
> > From what I read, PyTables could really help to manipulate
> > and analyze the data.
> > So I've been thinking about writing a tool (probably in C)
> > to convert my ASTERIX format to HDF5.
> >
> > Before I start, I'd like confirmation that this seems like a
> > suitable application for PyTables.
> > Is there another approach than writing a conversion tool to
> > HDF5?
> >
> > Thanks in advance
> >
> > Benjamin
> >
> > ------------------------------------------------------------------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's security and
> > threat landscape has changed and how IT managers can respond.
> > Discussions will include endpoint security, mobile security and the
> > latest in malware threats.
> > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > Pytables-users mailing list
> > Pyt...@li...
> > https://lists.sourceforge.net/lists/listinfo/pytables-users