From: Albert S. <fu...@gm...> - 2006-10-23 00:48:04
|
Hello all I'm trying to sort an array with two fields, but I'm getting a result that doesn't seem to make sense. What I tried (first attempt): I have two 2-D arrays. I would like to sort one based on the sort of the other. I managed to do this with argsort. However, the fancy indexing required to get the sorted array using what argsort returned was very slow. I followed this example: http://www.scipy.org/Numpy_Example_List#head-9f8656795227e3c43e849c6c0435eee b32afd722 What I tried (second attempt): I created an array with two fields. I think/hope/expected that sorting the array would sort it on the first field in the dtype and then on the second. This is *much* faster than the fancy indexing approach. Code: import numpy as N print N.__version__ print dt = N.dtype([('aaa', N.float64), ('bbb', N.uint8)]) x = N.empty((2, 2), dt) xa = x[dt.names[0]] xa[:] = [[1.,2], [3.,4.]] xb = x[dt.names[1]] xb[:] = [[1, 2], [3, 4]] print 'before sort:' print x print x.sort(kind='quicksort') # other kinds not supported print 'after sort:' print x Output on my system: 1.0.dev3376 before sort: [[(1.0, 1) (2.0, 2)] [(3.0, 3) (4.0, 4)]] after sort: [[(2.0, 2) (1.0, 1)] [(4.0, 4) (3.0, 3)]] The already sorted array has been unsorted in some way. Any thoughts? Thanks! Regards, Albert |
From: Charles R H. <cha...@gm...> - 2006-10-23 03:22:24
|
On 10/22/06, Albert Strasheim <fu...@gm...> wrote: > > Hello all > > I'm trying to sort an array with two fields, but I'm getting a result that > doesn't seem to make sense. > > What I tried (first attempt): I have two 2-D arrays. I would like to sort > one based on the sort of the other. I managed to do this with argsort. > However, the fancy indexing required to get the sorted array using what > argsort returned was very slow. I followed this example: It is certainly awkward. I am going to add a function to do this after 1.0comes out. Sounds like you need it now. http://www.scipy.org/Numpy_Example_List#head-9f8656795227e3c43e849c6c0435eee > b32afd722 > > What I tried (second attempt): I created an array with two fields. I > think/hope/expected that sorting the array would sort it on the first > field > in the dtype and then on the second. This is *much* faster than the fancy > indexing approach. I believe it sorts on the two fields together as one big binary blob. <snip> Output on my system: > > 1.0.dev3376 > > before sort: > [[(1.0, 1) (2.0, 2)] > [(3.0, 3) (4.0, 4)]] > > after sort: > [[(2.0, 2) (1.0, 1)] > [(4.0, 4) (3.0, 3)]] > > The already sorted array has been unsorted in some way. Any thoughts? I suspect something to do with the binary representation of the floats. Probably depends on big/little endian also. The floats 1.0, 2.0, and 4.0all have zero mantissas while 3.0 has a one. The sort behaves differently if uint8 is used for both fields. In [3]: dt = dtype([('aaa', uint8), ('bbb', uint8)]) In [4]: x = empty((2, 2), dt) In [5]: xa = x[dt.names[0]] In [6]: xa[:] = [[1,2], [3,4]] In [7]: xb = x[dt.names[1]] In [8]: xb[:] = [[1,2], [3,4]] In [9]: x.sort() In [11]: x Out[11]: array([[(1, 1), (2, 2)], [(3, 3), (4, 4)]], dtype=[('aaa', '|u1'), ('bbb', '|u1')]) Chuck |
From: Travis O. <oli...@ie...> - 2006-10-23 21:29:17
|
Albert Strasheim wrote: > Hello all > > I'm trying to sort an array with two fields, but I'm getting a result that > doesn't seem to make sense. > > What I tried (first attempt): I have two 2-D arrays. I would like to sort > one based on the sort of the other. I managed to do this with argsort. > However, the fancy indexing required to get the sorted array using what > argsort returned was very slow. I followed this example: > > http://www.scipy.org/Numpy_Example_List#head-9f8656795227e3c43e849c6c0435eee > b32afd722 > > What I tried (second attempt): I created an array with two fields. I > think/hope/expected that sorting the array would sort it on the first field > in the dtype and then on the second. This is *much* faster than the fancy > indexing approach. > That kind of sorting is what lexsort does (although with indexes) and is more complicated than what the sort routine does in NumPy. The sorting routines in NumPy use the output of the comparison operator for the type to compute the result. An array with fields is of type void. Right now, the VOID_compare routine is equivalent to the STRING_compare routine (i.e. raw bytes are compared). I doubt this will do what you want in most cases. It would be possible to adapt this compare routine when fields are present and so something like compare the first field first and the second field only if the first is equal. But, this would require a bit of work and is probably best left for 1.0.1 or later. -Travis |
From: Travis O. <oli...@ie...> - 2006-10-23 22:03:02
|
Albert Strasheim wrote: > Hello all > > I'm trying to sort an array with two fields, but I'm getting a result that > doesn't seem to make sense. > > What I tried (first attempt): I have two 2-D arrays. I would like to sort > one based on the sort of the other. I managed to do this with argsort. > However, the fancy indexing required to get the sorted array using what > argsort returned was very slow. I followed this example: > > http://www.scipy.org/Numpy_Example_List#head-9f8656795227e3c43e849c6c0435eee > b32afd722 > > What I tried (second attempt): I created an array with two fields. I > think/hope/expected that sorting the array would sort it on the first field > in the dtype and then on the second. This is *much* faster than the fancy > indexing approach. > O.K. I lied, I realized that my comments in the VOID_compare code were silly (about being unable to define > or <). It makes sense to just define them based on the first field, then the second field (if the first field is equal), and so forth. Obviously this is not the only way one could define sorting (any field could be used as "the first field", and so forth. But, is is a fairly obvious default to use the first field. It turns out, it was not so difficult to implement this and is now in SVN. So, the VOID_compare now does something a little more intelligent when fields are defined which means that record arrays can now be lexicographically sorted more easily than using lexsort and take (as long as the fields are ordered according to how you want the sort to proceed). -Travis |
From: Albert S. <fu...@gm...> - 2006-10-23 23:50:46
|
Hello all > -----Original Message----- > From: num...@li... [mailto:numpy- > dis...@li...] On Behalf Of Travis Oliphant > Sent: Tuesday, October 24, 2006 12:04 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Strange results when sorting array > withfields > <snip> > It turns out, it was not so difficult to implement this and is now in SVN. > > So, the VOID_compare now does something a little more intelligent when > fields are defined which means that record arrays can now be > lexicographically sorted more easily than using lexsort and take (as > long as the fields are ordered according to how you want the sort to > proceed). Thanks very much for this! I played around and it works like a charm. Cheers, Albert |