From: sreeaurovindh v. <sre...@gm...> - 2012-03-22 16:02:48
|
Hi, If I have three columns in a table and if i wish to sort based on one field and then on the other what would be the recommended method.I would be sorting atleast 75,00,000 records at a time. ie I would like to use something equivalent the following sql query. Select * from sample.table order by query desc,keyword asc. How should i do it.. Thanks Sree aurovindh |
From: Ümit S. <uem...@gm...> - 2012-03-22 16:15:07
|
AFAIK there is no sort functionality built into PyTables. I think there are 4 ways to do it: 1.) load all 7.5 million records and sort it in memory (if it fits into the memory) 2.) implement your own external sorting algorithm (http://en.wikipedia.org/wiki/External_sorting) using pytables iterator or by slicing through your table in chunks 3.) create a vector of indices pre-sorted by your criteria and store it in your hdf5 structure. Then use this vector to retrieve the values with the correct sorting. 4.) if you know that you always want to access the data with this sorting, then you can also store the values with the appropriate sorting in the table cheers Ümit On Thu, Mar 22, 2012 at 5:02 PM, sreeaurovindh viswanathan <sre...@gm...> wrote: > Hi, > > If I have three columns in a table and if i wish to sort based on one > field and then on the other what would be the recommended method.I > would be sorting atleast 75,00,000 records at a time. > > ie I would like to use something equivalent the following sql query. > > Select * from sample.table order by query desc,keyword asc. > > How should i do it.. > > Thanks > Sree aurovindh > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Francesc A. <fa...@gm...> - 2012-03-22 16:39:29
|
On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote: > Hi, > > If I have three columns in a table and if i wish to sort based on one > field and then on the other what would be the recommended method.I > would be sorting atleast 75,00,000 records at a time. > > ie I would like to use something equivalent the following sql query. > > Select * from sample.table order by query desc,keyword asc. Provided that you have created a CSI index on this field, there are a couple of ways for doing this: 1) Table.itersorted(): retrieves the table records in sorted order 2) Table.readSorted(): retrieves the complete sorted table as a monolithic structured array Both methods follow ascending order by default. Choose a step=-1 for choosing a descending order. -- Francesc Alted |
From: sreeaurovindh v. <sre...@gm...> - 2012-03-22 16:45:24
|
Thanks Francesc Alted for your advice. Regards Sree aurovindh V On Thu, Mar 22, 2012 at 10:09 PM, Francesc Alted <fa...@gm...> wrote: > On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote: >> Hi, >> >> If I have three columns in a table and if i wish to sort based on one >> field and then on the other what would be the recommended method.I >> would be sorting atleast 75,00,000 records at a time. >> >> ie I would like to use something equivalent the following sql query. >> >> Select * from sample.table order by query desc,keyword asc. > Provided that you have created a CSI index on this field, there are a > couple of ways for doing this: > > 1) Table.itersorted(): retrieves the table records in sorted order > > 2) Table.readSorted(): retrieves the complete sorted table as a > monolithic structured array > > Both methods follow ascending order by default. Choose a step=-1 for > choosing a descending order. > > -- Francesc Alted > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Ümit S. <uem...@gm...> - 2012-03-22 16:45:33
|
I completely forgot about the CSI index. That's of course much easier than what I suggested ;-) Am 22.03.2012 17:39 schrieb "Francesc Alted" <fa...@gm...>: > On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote: > > Hi, > > > > If I have three columns in a table and if i wish to sort based on one > > field and then on the other what would be the recommended method.I > > would be sorting atleast 75,00,000 records at a time. > > > > ie I would like to use something equivalent the following sql query. > > > > Select * from sample.table order by query desc,keyword asc. > Provided that you have created a CSI index on this field, there are a > couple of ways for doing this: > > 1) Table.itersorted(): retrieves the table records in sorted order > > 2) Table.readSorted(): retrieves the complete sorted table as a > monolithic structured array > > Both methods follow ascending order by default. Choose a step=-1 for > choosing a descending order. > > -- Francesc Alted > > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: sreeaurovindh v. <sre...@gm...> - 2012-03-22 17:48:58
|
But.. Can i get sort one column by descending and the other ascending. say if i have two columns and first i would like to sort the one in ascending and then sort the second column based on the search from the first. I mean I i have 1 5 2 6 1 8 2 9 Could i get an output as 1 5 1 8 2 6 2 9 Sorry to restart the thread. Thanks Sree aurovindhV On Thu, Mar 22, 2012 at 10:09 PM, Francesc Alted <fa...@gm...> wrote: > On 3/22/12 11:02 AM, sreeaurovindh viswanathan wrote: >> Hi, >> >> If I have three columns in a table and if i wish to sort based on one >> field and then on the other what would be the recommended method.I >> would be sorting atleast 75,00,000 records at a time. >> >> ie I would like to use something equivalent the following sql query. >> >> Select * from sample.table order by query desc,keyword asc. > Provided that you have created a CSI index on this field, there are a > couple of ways for doing this: > > 1) Table.itersorted(): retrieves the table records in sorted order > > 2) Table.readSorted(): retrieves the complete sorted table as a > monolithic structured array > > Both methods follow ascending order by default. Choose a step=-1 for > choosing a descending order. > > -- Francesc Alted > > ------------------------------------------------------------------------------ > This SF email is sponsosred by: > Try Windows Azure free for 90 days Click Here > http://p.sf.net/sfu/sfd2d-msazure > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Francesc A. <fa...@gm...> - 2012-03-22 18:59:20
|
On 3/22/12 12:48 PM, sreeaurovindh viswanathan wrote: > But.. Can i get sort one column by descending and the other ascending. > say > if i have two columns and first i would like to sort the one in > ascending and then sort the second column based on the search from the > first. > > > I mean I i have > > 1 5 > 2 6 > 1 8 > 2 9 > > Could i get an output as > > 1 5 > 1 8 > 2 6 > 2 9 No, this is not supported by PyTables. But hey, you can always make use of the sorted iterator, and the additonal sorting by yourselves. In your example, let's suppose that column 0 is named 'f0' and column 1 is named 'f1'. Then, the next loop: prevval = None gf1 = [] for r in t.itersorted('f0'): if r['f0'] != prevval: if gf1: gf1.sort() print prevval, gf1[::-1] # reverse sorted prevval = r['f0'] gf1 = [] gf1.append(r['f1']) if gf1: gf1.sort() print prevval, gf1[::-1] # reverse sorted will print the next values: f0-val0 [decreasing list of f1 values] f0-val1 [decreasing list of f1 values] ... f0-valN [decreasing list of f1 values] Hope this helps, -- Francesc Alted |
From: Francesc A. <fa...@gm...> - 2012-03-22 22:46:00
Attachments:
sortby.py
groupby.py
|
On 3/22/12 1:59 PM, Francesc Alted wrote: > On 3/22/12 12:48 PM, sreeaurovindh viswanathan wrote: >> But.. Can i get sort one column by descending and the other ascending. >> say >> if i have two columns and first i would like to sort the one in >> ascending and then sort the second column based on the search from the >> first. >> >> >> I mean I i have >> >> 1 5 >> 2 6 >> 1 8 >> 2 9 >> >> Could i get an output as >> >> 1 5 >> 1 8 >> 2 6 >> 2 9 > No, this is not supported by PyTables. > > But hey, you can always make use of the sorted iterator, and the > additonal sorting by yourselves. In your example, let's suppose that > column 0 is named 'f0' and column 1 is named 'f1'. Then, the next loop: > > prevval = None > gf1 = [] > for r in t.itersorted('f0'): > if r['f0'] != prevval: > if gf1: > gf1.sort() > print prevval, gf1[::-1] # reverse sorted > prevval = r['f0'] > gf1 = [] > gf1.append(r['f1']) > if gf1: > gf1.sort() > print prevval, gf1[::-1] # reverse sorted Hmm, I just realized that there it is another, equivalent code that solves the same problem: def field_selector(row): return row['f0'] for field, rows_grouped_by_field in itertools.groupby(t.itersorted('f0'), field_selector): group = [ r['f1'] for r in rows_grouped_by_field ] group.sort() print '%s -> %s' % (field, group[::-1]) The performance of both is similar, so use whatever you find more useful. For the record, I'm attaching a couple of self-contained examples that exercises the different approaches. -- Francesc Alted |