Re: [Pytables-users] advice on data representation


> There might be an easier way to do this with numpy dtypes.  In pseudo-
> code:
> 
> np.dtype([(colname, np.int16) for colname in colnames])
> 

Can we use time and enum kinds that way as well?

This makes me think I should probably flatten my table.
Nested columns are a natural way to group the cells of a data item (it's nice to access a field like I030_180/SPEED).
But that's mostly a naming convention, and I can't perform in-kernel searches on nested columns.
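
A minimal sketch of the flattening idea, using only numpy. The helper name flatten_dtype and the "_" separator are hypothetical choices for illustration; the field names are taken from the descriptions quoted further down. It turns a nested record layout into a flat one where the data item name becomes a prefix of each column name.

```python
import numpy as np


def flatten_dtype(dtype, prefix="", sep="_"):
    """Return a list of (flat_name, scalar_dtype) pairs for a nested dtype."""
    fields = []
    for name in dtype.names:
        sub = dtype.fields[name][0]
        flat_name = prefix + name
        if sub.names:
            # Nested record: recurse, using the group name as a prefix.
            fields.extend(flatten_dtype(sub, flat_name + sep, sep))
        else:
            fields.append((flat_name, sub))
    return fields


# Hypothetical cut-down version of the nested record layout.
nested = np.dtype([
    ("ff_timestamp", np.int32),
    ("I030_180", [("SPEED", np.uint16), ("HEADING", np.uint16)]),
    ("I030_181", [("X", np.int16), ("Y", np.int16)]),
])

flat = np.dtype(flatten_dtype(nested))
print(flat.names)
```

With this layout a field such as I030_180/SPEED would become I030_180_SPEED, so the grouping survives only as a naming convention.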

> 
> If you want to mark the whole column as valid, you can use a boolean
> attribute on the table itself for each column.  They could be named
> like colname_valid.
> 
> See
> http://pytables.github.com/usersguide/libref.html#tables.Leaf.setAttr
> and http://pytables.github.com/usersguide/libref.html#the-attributeset-class
> for more info.

Good to know.
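
A rough sketch of the per-column attribute idea, assuming a PyTables 3.x installation, where the method linked above as setAttr is spelled set_attr. The Velocity description, the file name and the "_valid" suffix are placeholders for illustration.

```python
import os
import tempfile

import tables


class Velocity(tables.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    SPEED = tables.UInt16Col(pos=0)
    HEADING = tables.UInt16Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "velocity", Velocity)
    # One boolean attribute per column, named <colname>_valid.
    for colname in table.colnames:
        table.set_attr(colname + "_valid", True)
    # Mark a whole column as invalid.
    table.set_attr("HEADING_valid", False)

with tables.open_file(path, mode="r") as h5:
    table = h5.root.velocity
    flags = {name: bool(table.get_attr(name + "_valid"))
             for name in table.colnames}

print(flags)
```

These flags live on the table, not on the rows, so they only describe a column as a whole.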

> 
> This is more for flagging individual cells as valid or not.  For
> integers you need to pick a value which means invalid (like -999999).
> 
> 

Yes, you are right. It's not a column I want to flag but a group of cells in a row (all cells of a particular data item).
Problem is that if I have a uint8 for a cell, there is no invalid value I can use.
Maybe I could use a "larger type" (uint16 in this case) to be able to pick an invalid value.
Or a bool for this group of cells.
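
Both options can be sketched with plain numpy. The field names and the choice of 65535 as the invalid marker are assumptions made for illustration, not something fixed by ASTERIX or PyTables.

```python
import numpy as np

# Option 1: store the uint8 value in a uint16 and reserve one value
# outside the 0..255 range as the "invalid" marker.
INVALID_U16 = np.iinfo(np.uint16).max  # 65535
sb_wide = np.array([12, INVALID_U16, 255], dtype=np.uint16)
sb_is_valid = sb_wide != INVALID_U16

# Option 2: keep the uint8 and add one boolean per data item,
# covering all the cells of that item in the row.
row_dtype = np.dtype([
    ("I030_340_valid", np.bool_),
    ("I030_340_sb", np.uint8),
    ("I030_340_mode_3_a", np.uint16),
])
rows = np.zeros(3, dtype=row_dtype)
rows["I030_340_valid"] = [True, False, True]
rows["I030_340_sb"] = [12, 0, 255]

# Only look at sb where the whole data item is flagged as present.
valid_sb = rows["I030_340_sb"][rows["I030_340_valid"]]
```

Option 1 costs one extra byte per widened cell; option 2 costs one byte per data item per row and keeps the full 0..255 range usable.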

> 
> If you don't want to use a VLArray, then maxlen is probably your best
> option.
> 
> If you want to do something a little more sophisticated, you could
> break your data out into a main table and then a helper VLArray.  Every
> row in the table is matched by the same row in the VLArray.  Then when you
> want to get your full data back out, you have to go to the table and
> the VLArray.  This makes things a little more annoying to work with,
> but it does what you want.

Interesting. I'll look more into it.
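
A minimal sketch of the table plus helper VLArray layout, assuming PyTables 3.x names (open_file, create_table, create_vlarray). The Record description, the node names "records" and "extras", and the two helper functions are hypothetical; the only point is that row i of the table and row i of the VLArray belong together.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    """Hypothetical fixed-width part of a record."""
    ff_timestamp = tables.Time32Col(pos=0)
    STN = tables.UInt16Col(pos=1)


def write_records(path, records):
    """Write (timestamp, stn, payload) tuples; payload has variable length."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "records", Record)
        extras = h5.create_vlarray("/", "extras", atom=tables.UInt8Atom())
        row = table.row
        for timestamp, stn, payload in records:
            row["ff_timestamp"] = timestamp
            row["STN"] = stn
            row.append()
            # Appended in the same order, so row indices stay aligned.
            extras.append(np.asarray(payload, dtype=np.uint8))
        table.flush()


def read_record(path, index):
    """Fetch both halves of one record by its shared row index."""
    with tables.open_file(path, mode="r") as h5:
        fixed = h5.root.records[index]
        payload = h5.root.extras[index]
        return int(fixed["STN"]), payload.tolist()


path = os.path.join(tempfile.mkdtemp(), "asterix_demo.h5")
write_records(path, [(1342044120, 7, [1, 2, 3]), (1342044121, 8, [9])])
first = read_record(path, 0)
second = read_record(path, 1)
```

Nothing enforces the alignment between the two nodes; it holds only as long as every write appends to both.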

> 
> Hope this helps.  Feel free to ask more questions!

Yes, it does.
Thanks!

Benjamin

> Be Well
> Anthony
> 
> 
> 
> 
> 
> 
> 	Cheers,
> 
> 
> 
> 	Benjamin
> 
> 
> 
> 
> 
> 	class I030_180_DESC(tables.IsDescription):
> 	    """Calculated Track Velocity (Polar)"""
> 	    SPEED = tables.UInt16Col(pos=0)
> 	    HEADING = tables.UInt16Col(pos=1)
> 
> 	class I030_181_DESC(tables.IsDescription):
> 	    """Calculated Track Velocity (Cartesian)"""
> 	    X = tables.Int16Col(pos=0)
> 	    Y = tables.Int16Col(pos=1)
> 
> 	class I030_340_DESC(tables.IsDescription):
> 	    """Last Measured Mode 3/A"""
> 	    V = tables.EnumCol(tables.Enum({
> 	        "Code validated": 0,
> 	        "Code not validated": 1,
> 	        "uninitialized": 255
> 	        }), "uninitialized",
> 	        base="uint8",
> 	        pos=0)
> 	    G = tables.EnumCol(tables.Enum({
> 	        "Default": 0,
> 	        "Garbled code": 1,
> 	        "uninitialized": 255
> 	        }), "uninitialized",
> 	        base="uint8",
> 	        pos=1)
> 	    L = tables.EnumCol(tables.Enum({
> 	        "MODE 3/A code as derived from the reply of the transponder,": 0,
> 	        "Smoothed MODE 3/A code as provided by a local tracker": 1,
> 	        "uninitialized": 255
> 	        }), "uninitialized",
> 	        base="uint8",
> 	        pos=2)
> 	    sb = tables.UInt8Col(pos=3)
> 	    mode_3_a = tables.UInt16Col(pos=4)
> 
> 	class I030_400_DESC(tables.IsDescription):
> 	    """Callsign"""
> 	    callsign = tables.StringCol(7, pos=0)
> 
> 	class I030_050_DESC(tables.IsDescription):
> 	    """Artas Track Number"""
> 	    AUI = tables.UInt8Col(pos=0)
> 	    unused = tables.UInt8Col(pos=1)
> 	    STN = tables.UInt16Col(pos=2)
> 	    FX = tables.EnumCol(tables.Enum({
> 	        "end of data item": 0,
> 	        "extension into next extent": 1,
> 	        "uninitialized": 255
> 	        }), "uninitialized",
> 	        base="uint8",
> 	        pos=3)
> 
> 	class I030Record(tables.IsDescription):
> 	    """Cat 030 record"""
> 	    ff_timestamp = tables.Time32Col()
> 	    I030_010 = I030_010_DESC()
> 	    I030_015 = I030_015_DESC()
> 	    I030_030 = I030_030_DESC()
> 	    I030_035 = I030_035_DESC()
> 	    I030_040 = I030_040_DESC()
> 	    I030_070 = I030_070_DESC()
> 	    I030_170 = I030_170_DESC()
> 	    I030_100 = I030_100_DESC()
> 	    I030_180 = I030_180_DESC()
> 	    I030_181 = I030_181_DESC()
> 	    I030_060 = I030_060_DESC()
> 	    I030_150 = I030_150_DESC()
> 	    I030_140 = I030_140_DESC()
> 	    I030_340 = I030_340_DESC()
> 	    I030_400 = I030_400_DESC()
> 	...
> 	    I030_210 = I030_210_DESC()
> 	    I030_120 = I030_120_DESC()
> 	    I030_050 = I030_050_DESC()
> 	    I030_270 = I030_270_DESC()
> 	    I030_370 = I030_370_DESC()
> 
> 
> 
> 
> 
> 	From: Anthony Scopatz [mailto:sc...@gm...]
> 	Sent: 12 July 2012 00:02
> 	To: Discussion list for PyTables
> 	Subject: Re: [Pytables-users] advice on using PyTables
> 
> 
> 
> 	Hello Benjamin,
> 
> 
> 
> 	Not knowing too much about the ASTERIX format, other than
> what you said and what is in the links, I would say that this is a good
> fit for HDF5 and PyTables.  PyTables will certainly help you read in
> the data and manipulate it.
> 
> 
> 
> 	However, before you abandon hachoir completely, I will say
> it is a lot easier to write HDF5 files in PyTables than to use the HDF5
> C API.   If hachoir is too slow, have you tried profiling the code to
> see what is taking up the most time?  Maybe you could just rewrite
> these parts in C?  Have you looked into Cythonizing it?  Also, you
> don't seem to be using numpy to read in the data... (there are some
> tricks given ASTERIX here, but not insurmountable).
> 
> 
> 
> 	I ask the above, just so you don't have to completely
> rewrite everything.  You are correct though that pure python is
> probably not sufficient.  Feel free to ask more questions here.
> 
> 
> 
> 	Be Well
> 
> 	Anthony
> 
> 
> 
> 	On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...>
> wrote:
> 
> 	Hi,
> 
> 	I'm working with Air Traffic Management and would like to
> perform checks / compute statistics on ASTERIX data.
> 	ASTERIX is an ATM Surveillance Data Binary Messaging Format
> (http://www.eurocontrol.int/asterix/public/standard_page/overview.html)
> 
> 	The data consist of a concatenation of consecutive data
> blocks.
> 	Each data block consists of data category + length +
> records.
> 	Each record is of variable length and consists of several
> data items (that are well defined for each category).
> 	Some data items might be present or not depending on a field
> specification (bitfield).
> 
> 	I started to write a parser using hachoir
> (https://bitbucket.org/haypo/hachoir/overview) a pure python library.
> 	But the parsing was really too slow and taking a lot of
> memory.
> 	That's not really useable.
> 
> 	From what I read, PyTables could really help to manipulate
> and analyze the data.
> 	So I've been thinking about writing a tool (probably in C)
> to convert my ASTERIX format to HDF5.
> 
> 	Before I start, I'd like confirmation that this seems like a
> suitable application for PyTables.
> 	Is there another approach than writing a conversion tool to
> HDF5?
> 
> 	Thanks in advance
> 
> 	Benjamin
> 
> 
> 
> 
> 	_______________________________________________
> 	Pytables-users mailing list
> 	Pyt...@li...
> 	https://lists.sourceforge.net/lists/listinfo/pytables-users
> 
> 
>