From: Anthony S. <sc...@gm...> - 2012-08-01 15:16:05
On Wed, Aug 1, 2012 at 1:10 AM, <ben...@lf...> wrote:
> > There might be an easier way to do this with numpy dtypes. In pseudo-
> > code:
> >
> > np.dtype([(colname, np.int16) for colname in colnames])
> >
>
> Can we use time and enum kinds that way as well?
>
Ah, you might not be able to use these types, depending on your numpy
version. Instead I would use a dictionary comprehension of columns:

    {colname: UInt16Col(pos=i) for i, colname in enumerate(colnames)}

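A minimal sketch of the dictionary approach, assuming PyTables 3.x naming
(open_file/create_table; the 2.x releases spelled these openFile/createTable).
The column names, the node name "records" and the file name are made up for
illustration. The time and enum columns are added to the same dict:

```python
import tables

# Hypothetical column names, for illustration only.
colnames = ["SPEED", "HEADING"]

# One UInt16Col per name; pos fixes the column order in the table.
description = {name: tables.UInt16Col(pos=i) for i, name in enumerate(colnames)}

# Time and enum columns can go into the same dictionary.
description["ff_timestamp"] = tables.Time32Col(pos=len(colnames))
description["V"] = tables.EnumCol(
    tables.Enum({"Code validated": 0,
                 "Code not validated": 1,
                 "uninitialized": 255}),
    "uninitialized", base="uint8", pos=len(colnames) + 1)

# In-memory HDF5 file (assumption: nothing needs to be kept on disk here).
with tables.open_file("sketch.h5", "w", driver="H5FD_CORE",
                      driver_core_backing_store=0) as h5:
    table = h5.create_table("/", "records", description)
    created_colnames = list(table.colnames)
```

Since the description is a plain dict, it can be built in a loop from
whatever field list the parser already has.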
> This makes me think I should probably flatten my table.
> Having nested columns is quite natural to group cells under a data item
> (it's nice to access a field like I030_180/SPEED).
> But that's more a naming convention and I can't perform in-kernel searches
> with nested columns.
>
It would sure be nice if we could though ;)
>
> >
> > If you want to mark the whole column as valid, you can use a boolean
> > attribute on the table itself for each column. They could be named
> > like colname_valid.
> >
> > See
> > http://pytables.github.com/usersguide/libref.html#tables.Leaf.setAttr
> > and http://pytables.github.com/usersguide/libref.html#the-attributeset-
> > class for more info.
>
> Good to know.
>
> >
> > This is more for flagging individual cells as valid or not. For
> > integers you need to pick a value that means invalid (like -999999).
> >
> >
>
> Yes, you are right. It's not a column I want to flag but a group of cells
> in a row (all cells of a particular data item).
> Problem is that if I have a uint8 for a cell, there is no invalid value I
> can use.
> Maybe I could use a "larger type" (uint16 in this case) to be able to pick
> an invalid value.
> Or a bool for this group of cells.
>
Yes, if you used a bool column next to it, you could use it as a valid
mask. Note that since bools are stored in a full byte, the bool column plus
the uint8 column take the same space as a uint16. However, it will be much
quicker to query over the bool column.
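To illustrate the size argument and the masking idea, here is a rough sketch
using a plain numpy structured array as a stand-in for the table (not the
PyTables API itself). The field names "value" and "valid" are made up:

```python
import numpy as np

# A uint8 cell plus a one-byte bool flag marking whether the cell is valid.
row_dtype = np.dtype([("value", np.uint8), ("valid", np.bool_)])

rows = np.array([(0, True), (255, True), (0, False)], dtype=row_dtype)

# Packed, the pair occupies two bytes per row, the same as one uint16.
same_size = row_dtype.itemsize == np.dtype(np.uint16).itemsize

# The bool field acts as the mask: keep only cells flagged as valid.
valid_values = rows["value"][rows["valid"]]
```

Every uint8 value, including 0 and 255, stays usable as data because validity
lives in the separate flag.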
Be Well
Anthony
>
> >
> > If you don't want to use a VLArray, then maxlen is probably your best
> > option.
> >
> > If you want to do something a little more sophisticated, you could
> > break your data out into a main table and a helper VLArray. Every
> > row in the table is matched by the same row in the VLArray. Then when
> > you want to get your full data back out, you have to go to both the
> > table and the VLArray. This makes things a little more annoying to
> > work with, but it does what you want.
>
> Interesting. I'll look more into it.
>
> >
> > Hope this helps. Feel free to ask more questions!
>
> Yes, it does.
> Thanks!
>
>
> Benjamin
>
>
> > Be Well
> > Anthony
> >
> >
> >
> >
> >
> >
> > Cheers,
> >
> >
> >
> > Benjamin
> >
> >
> >
> >
> >
> > class I030_180_DESC(tables.IsDescription):
> >     """Calculated Track Velocity (Polar)"""
> >     SPEED = tables.UInt16Col(pos=0)
> >     HEADING = tables.UInt16Col(pos=1)
> >
> >
> >
> > class I030_181_DESC(tables.IsDescription):
> >     """Calculated Track Velocity (Cartesian)"""
> >     X = tables.Int16Col(pos=0)
> >     Y = tables.Int16Col(pos=1)
> >
> >
> >
> > class I030_340_DESC(tables.IsDescription):
> >     """Last Measured Mode 3/A"""
> >     V = tables.EnumCol(tables.Enum({
> >         "Code validated": 0,
> >         "Code not validated": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=0)
> >     G = tables.EnumCol(tables.Enum({
> >         "Default": 0,
> >         "Garbled code": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=1)
> >     L = tables.EnumCol(tables.Enum({
> >         "MODE 3/A code as derived from the reply of the transponder,": 0,
> >         "Smoothed MODE 3/A code as provided by a local tracker": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=2)
> >     sb = tables.UInt8Col(pos=3)
> >     mode_3_a = tables.UInt16Col(pos=4)
> >
> >
> >
> > class I030_400_DESC(tables.IsDescription):
> >     """Callsign"""
> >     callsign = tables.StringCol(7, pos=0)
> >
> >
> >
> > class I030_050_DESC(tables.IsDescription):
> >     """Artas Track Number"""
> >     AUI = tables.UInt8Col(pos=0)
> >     unused = tables.UInt8Col(pos=1)
> >     STN = tables.UInt16Col(pos=2)
> >     FX = tables.EnumCol(tables.Enum({
> >         "end of data item": 0,
> >         "extension into next extent": 1,
> >         "uninitialized": 255
> >         }), "uninitialized",
> >         base="uint8",
> >         pos=3)
> >
> >
> >
> > class I030Record(tables.IsDescription):
> >     """Cat 030 record"""
> >     ff_timestamp = tables.Time32Col()
> >     I030_010 = I030_010_DESC()
> >     I030_015 = I030_015_DESC()
> >     I030_030 = I030_030_DESC()
> >     I030_035 = I030_035_DESC()
> >     I030_040 = I030_040_DESC()
> >     I030_070 = I030_070_DESC()
> >     I030_170 = I030_170_DESC()
> >     I030_100 = I030_100_DESC()
> >     I030_180 = I030_180_DESC()
> >     I030_181 = I030_181_DESC()
> >     I030_060 = I030_060_DESC()
> >     I030_150 = I030_150_DESC()
> >     I030_140 = I030_140_DESC()
> >     I030_340 = I030_340_DESC()
> >     I030_400 = I030_400_DESC()
> >     ...
> >     I030_210 = I030_210_DESC()
> >     I030_120 = I030_120_DESC()
> >     I030_050 = I030_050_DESC()
> >     I030_270 = I030_270_DESC()
> >     I030_370 = I030_370_DESC()
> >
> >
> >
> >
> >
> > From: Anthony Scopatz [mailto:sc...@gm...]
> > Sent: 12 July 2012 00:02
> > To: Discussion list for PyTables
> > Subject: Re: [Pytables-users] advice on using PyTables
> >
> >
> >
> > Hello Benjamin,
> >
> >
> >
> > Not knowing too much about the ASTERIX format, other than
> > what you said and what is in the links, I would say that this is a good
> > fit for HDF5 and PyTables. PyTables will certainly help you read in
> > the data and manipulate it.
> >
> >
> >
> > However, before you abandon hachoir completely, I will say
> > it is a lot easier to write hdf5 files in PyTables than to use the HDF5
> > C API. If hachoir is too slow, have you tried profiling the code to
> > see what is taking up the most time? Maybe you could just rewrite
> > these parts in C? Have you looked into Cythonizing it? Also, you
> > don't seem to be using numpy to read in the data... (ASTERIX needs
> > some tricks here, but nothing insurmountable).
> >
> >
> >
> > I ask the above, just so you don't have to completely
> > rewrite everything. You are correct though that pure python is
> > probably not sufficient. Feel free to ask more questions here.
> >
> >
> >
> > Be Well
> >
> > Anthony
> >
> >
> >
> > On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...>
> > wrote:
> >
> > Hi,
> >
> > I'm working with Air Traffic Management and would like to
> > perform checks / compute statistics on ASTERIX data.
> > ASTERIX is an ATM Surveillance Data Binary Messaging Format
> > (http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
> >
> > The data consist of a concatenation of consecutive data
> > blocks.
> > Each data block consists of data category + length +
> > records.
> > Each record is of variable length and consists of several
> > data items (that are well defined for each category).
> > Some data items might be present or not depending on a field
> > specification (bitfield).
> >
> > I started to write a parser using hachoir
> > (https://bitbucket.org/haypo/hachoir/overview) a pure python library.
> > But the parsing was really too slow and taking a lot of
> > memory.
> > That's not really useable.
> >
> > From what I read, PyTables could really help to manipulate
> > and analyze the data.
> > So I've been thinking about writing a tool (probably in C)
> > to convert my ASTERIX format to HDF5.
> >
> > Before I start, I'd like confirmation that this seems like a
> > suitable application for PyTables.
> > Is there another approach than writing a conversion tool to
> > HDF5?
> >
> > Thanks in advance
> >
> > Benjamin
> >
> >
> >
> >
> > ------------------------------------------------------------
> > ------------------
> > Live Security Virtual Conference
> > Exclusive live event will cover all the ways today's
> > security and
> > threat landscape has changed and how IT managers can
> > respond. Discussions
> > will include endpoint security, mobile security and the
> > latest in malware
> > threats.
> > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> > _______________________________________________
> > Pytables-users mailing list
> > Pyt...@li...
> > https://lists.sourceforge.net/lists/listinfo/pytables-users
> >
> >
> >
>
>
>
>