From: Anthony S. <sc...@gm...> - 2012-07-03 05:59:00
Why not read in just the date and ID columns to start with, then do a numpy.unique() or a Python set() on these, and then query based only on the unique values? It seems like that might be faster. (A rough sketch follows the quoted message below.)

Be Well
Anthony

On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah <aqu...@gm...> wrote:
> Hello All,
>
> I have a table that is indexed by two keys, and I would like to search for
> duplicate keys. So here is my naive, slow implementation (code I posted on
> Stack Overflow):
>
> import tables
>
> h5f = tables.openFile('filename.h5')
> tbl = h5f.getNode('/data', 'data_table')  # assumes group data and table data_table
> counter = 0
> for row in tbl:
>     ts = row['date']  # timestamp (ts) or date
>     uid = row['userID']
>     query = '(date == %d) & (userID == "%s")' % (ts, uid)
>     result = tbl.readWhere(query)
>     if len(result) > 1:
>         # Do something here
>         pass
>     counter += 1
>     if counter % 1000 == 0:
>         print '%d rows processed' % counter
>
> --
> Aquil H. Abdullah
> aqu...@gm...
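
For what it's worth, here is a minimal, untested sketch of the approach suggested above. It reuses the file name, node path, and the 'date' and 'userID' column names from the quoted code, and uses a plain Python set rather than numpy.unique():

import tables

h5f = tables.openFile('filename.h5')
tbl = h5f.getNode('/data', 'data_table')

# Read just the two key columns into memory in one pass.
dates = tbl.col('date')
uids = tbl.col('userID')

# Find the (date, userID) pairs that occur more than once.
seen = set()
dups = set()
for key in zip(dates, uids):
    if key in seen:
        dups.add(key)
    else:
        seen.add(key)

# Query only for the duplicated keys, instead of once per row.
for ts, uid in dups:
    query = '(date == %d) & (userID == "%s")' % (ts, uid)
    for row in tbl.readWhere(query):
        # Do something with each duplicate row here
        pass

h5f.close()

This scans each key column once and does the duplicate detection in memory, so readWhere() is called only for keys that are actually duplicated rather than for every row in the table.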