From: Anthony S. <sc...@gm...> - 2012-07-03 05:59:00
Why not read in just the date and ID columns to start with, then do a
numpy.unique() or a Python set() on these, and then query based on the
unique values? Seems like it might be faster.
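Something along these lines, perhaps. A rough, untested sketch using a
set (the numpy.unique() route would be similar); it assumes the same
column names and the old-style PyTables calls from your snippet:

import tables

h5f = tables.openFile('filename.h5')
tbl = h5f.getNode('/data', 'data_table')

# Read just the two key columns into memory.
dates = tbl.col('date')
uids = tbl.col('userID')

# One pass over the in-memory keys to find (date, userID) pairs
# that occur more than once.
seen = set()
dupes = set()
for key in zip(dates, uids):
    if key in seen:
        dupes.add(key)
    else:
        seen.add(key)

# Hit the table again only for the duplicated keys.
for ts, uid in dupes:
    result = tbl.readWhere('(date == %d) & (userID == "%s")' % (ts, uid))
    # Do something with the duplicate rows here

h5f.close()

That trades one full-table query per row for a single column read plus
one query per duplicated key, which should be far fewer.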
Be Well
Anthony
On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah
<aqu...@gm...> wrote:
> Hello All,
>
> I have a table that is indexed by two keys, and I would like to search for
> duplicate keys. So here is my naive, slow implementation (code I posted on
> Stack Overflow):
>
> import tables
>
> h5f = tables.openFile('filename.h5')
> tbl = h5f.getNode('/data', 'data_table')  # assumes group data and table data_table
>
> counter = 0
> for row in tbl:
>     ts = row['date']  # timestamp (ts) or date
>     uid = row['userID']
>     query = '(date == %d) & (userID == "%s")' % (ts, uid)
>     result = tbl.readWhere(query)
>     if len(result) > 1:
>         # Do something here
>         pass
>     counter += 1
>     if counter % 1000 == 0: print '%d rows processed' % counter
>
>
>
> --
> Aquil H. Abdullah
> aqu...@gm...
>
>