From: Brent P. <bpe...@gm...> - 2006-11-09 19:02:16
hi,

i have a recarray of more than 60K records and i'm wondering if there's a numpy/vectorized way to do the following: get a new array in which each column0 + column1 pair appears only once, keeping for each pair the row with the highest value in the last column. so in the paste below there are 4 'AT1G01070', 'AT1G11450' rows, but i only want to keep this row:

('AT1G01070', 'AT1G11450', 78.660003662109375, 717, 140, 5, 1129L, 1838L, 1098L, 1808L, 1e-168, 592.0)

because 592.0 is the highest value in the last column. i can do this using a hash and looping, but i'm guessing there's a more efficient way. any tips?

thanks.
-brent

[('AT1G01030', 'AT1G13260', 70.339996337890625, 263, 75, 1, 835L, 1094L, 778L, 1040L, 5.9999999999999998e-38, 158.0)
 ('AT1G01040', 'AT1G01040', 100.0, 8019, 0, 0, 1L, 8019L, 1L, 8019L, 0.0, 12620.0)
 ('AT1G01050', 'AT1G01050', 100.0, 1968, 0, 0, 1L, 1968L, 1L, 1968L, 0.0, 3090.0)
 ('AT1G01060', 'AT1G01060', 100.0, 4175, 0, 0, 1L, 4175L, 1L, 4175L, 0.0, 6355.0)
 ('AT1G01070', 'AT1G01070', 100.0, 2192, 0, 0, 1L, 2192L, 1L, 2192L, 0.0, 3426.0)
 ('AT1G01070', 'AT1G11450', 78.660003662109375, 717, 140, 5, 1129L, 1838L, 1098L, 1808L, 1e-168, 592.0)
 ('AT1G01070', 'AT1G11450', 79.870002746582031, 303, 51, 2, 1L, 293L, 1L, 303L, 1.9999999999999999e-67, 256.0)
 ('AT1G01070', 'AT1G11450', 88.400001525878906, 181, 21, 0, 587L, 767L, 724L, 904L, 6e-57, 221.0)
 ('AT1G01070', 'AT1G11450', 83.209999084472656, 131, 19, 1, 1875L, 2002L, 1878L, 2008L, 1.9999999999999999e-28, 126.0)
 ('AT1G01070', 'AT1G11460', 82.919998168945312, 480, 77, 3, 1129L, 1607L, 1216L, 1691L, 6.9999999999999999e-132, 470.0)]
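
For illustration, the selection asked about here (one row per column0 + column1 pair, keeping the row with the highest last-column value) could be sketched without a Python-level loop by sorting and then masking. This is only one possible approach, not an answer taken from the thread. It assumes a current NumPy; the field names `query`, `subject` and `score` and the helper `best_per_pair` are hypothetical, since the post does not name the recarray's columns, and only three of the twelve columns are reproduced to keep the sample short.

```python
import numpy as np

# Hypothetical field names; the post does not say what the columns are called.
dtype = [("query", "U9"), ("subject", "U9"), ("score", "f8")]

# First two columns and last column of the AT1G01070 rows from the paste.
rows = [
    ("AT1G01070", "AT1G01070", 3426.0),
    ("AT1G01070", "AT1G11450", 592.0),
    ("AT1G01070", "AT1G11450", 256.0),
    ("AT1G01070", "AT1G11450", 221.0),
    ("AT1G01070", "AT1G11450", 126.0),
    ("AT1G01070", "AT1G11460", 470.0),
]
recs = np.array(rows, dtype=dtype)


def best_per_pair(a, key0, key1, value):
    """Return one row per (key0, key1) pair: the row with the largest value."""
    # np.lexsort treats the LAST key as the primary one, so this orders the
    # rows by key0, then key1, then value (ascending) within each pair.
    order = np.lexsort((a[value], a[key1], a[key0]))
    s = a[order]

    # After sorting, the best row of each pair is the last row of its run.
    # A row ends a run when the next row has a different key pair; the final
    # row always ends one, which is why the mask starts as all True.
    last = np.ones(len(s), dtype=bool)
    last[:-1] = (s[key0][1:] != s[key0][:-1]) | (s[key1][1:] != s[key1][:-1])
    return s[last]


best = best_per_pair(recs, "query", "subject", "score")
print(best)
```

The result comes back sorted by the two key columns rather than in the original row order. Because `np.lexsort` is stable, when two rows of the same pair tie on the value column, the one that appeared later in the input is the one kept.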