From: Anthony S. <sc...@gm...> - 2012-07-03 05:59:00
Why not read in just the date and ID columns to start with, then do a
numpy.unique() or a Python set() on these, and then query based on the
unique values? Seems like it might be faster.
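Something along these lines, perhaps. A rough, untested sketch using a
set (the numpy.unique() route would be similar); it assumes the same
column names and the old-style PyTables calls from your snippet:

import tables

h5f = tables.openFile('filename.h5')
tbl = h5f.getNode('/data', 'data_table')

# Read just the two key columns into memory.
dates = tbl.col('date')
uids = tbl.col('userID')

# One pass over the in-memory keys to find (date, userID) pairs
# that occur more than once.
seen = set()
dupes = set()
for key in zip(dates, uids):
    if key in seen:
        dupes.add(key)
    else:
        seen.add(key)

# Hit the table again only for the duplicated keys.
for ts, uid in dupes:
    result = tbl.readWhere('(date == %d) & (userID == "%s")' % (ts, uid))
    # Do something with the duplicate rows here

h5f.close()

That trades one full-table query per row for a single column read plus
one query per duplicated key, which should be far fewer.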
Be Well
Anthony
On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah
<aqu...@gm...> wrote:
> Hello All,
>
> I have a table that is indexed by two keys, and I would like to search for
> duplicate keys. So here is my naive, slow implementation (code I posted on
> Stack Overflow):
>
> import tables
>
> h5f = tables.openFile('filename.h5')
> tbl = h5f.getNode('/data', 'data_table')  # assumes group data and table data_table
>
> counter = 0
> for row in tbl:
>     ts = row['date']  # timestamp (ts) or date
>     uid = row['userID']
>     query = '(date == %d) & (userID == "%s")' % (ts, uid)
>     result = tbl.readWhere(query)
>     if len(result) > 1:
>         # Do something here
>         pass
>     counter += 1
>     if counter % 1000 == 0: print '%d rows processed' % counter
>
>
>
> --
> Aquil H. Abdullah
> aqu...@gm...
>
>