From: Mathew Y. <my...@jp...> - 2006-08-29 22:47:03
|
My head is about to explode.

I have an M by N array of floats. Associated with the columns are
character labels
['a','b','b','c','d','e','e','e'] note: already sorted so duplicates
are contiguous

I want to replace the 2 'b' columns with the sum of the 2 columns.
Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.

The resulting array still has M rows but less than N columns. Anyone?
Could be any harder than Sudoku.

Mathew
From: Keith G. <kwg...@gm...> - 2006-08-29 23:09:36
|
On 8/29/06, Mathew Yeates <my...@jp...> wrote:
> I have an M by N array of floats. Associated with the columns are
> character labels
> ['a','b','b','c','d','e','e','e'] note: already sorted so duplicates
> are contiguous
>
> I want to replace the 2 'b' columns with the sum of the 2 columns.
> Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.

Make a cumsum of the array along the columns. Find the index of the
last 'a', last 'b', etc., and make the reduced array from the cumsum
columns at those indices. Then take the diff of the columns.

I know that's vague, but so is my understanding of python/numpy.

Or even more vague: make a function that does what you want.
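A minimal sketch of the cumsum-and-diff idea described above, assuming a current NumPy (the `prepend` argument of `np.diff` is newer than this thread). The function name and the sample data are hypothetical, chosen only for illustration.

```python
import numpy as np

def sum_contiguous_by_cumsum(data, labels):
    """Sum adjacent columns that share a label, using cumsum + diff."""
    labels = np.asarray(labels)
    # True at the last column of each run of equal labels.
    is_last = np.append(labels[1:] != labels[:-1], True)
    last_idx = np.flatnonzero(is_last)
    # Running total across columns, sampled at the end of each run.
    totals = np.cumsum(data, axis=1)[:, last_idx]
    # Differencing adjacent samples recovers each run's own sum;
    # prepend=0 leaves the first run's total unchanged.
    return np.diff(totals, axis=1, prepend=0), labels[last_idx]

# Hypothetical sample data: 2 rows, 8 labelled columns.
data = np.arange(16, dtype=float).reshape(2, 8)
labels = ['a', 'b', 'b', 'c', 'd', 'e', 'e', 'e']
summed, kept = sum_contiguous_by_cumsum(data, labels)
print(summed)
print(kept)
```

One caveat with this route: subtracting large running totals can lose a little floating-point precision compared with summing each group directly.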
From: Charles R H. <cha...@gm...> - 2006-08-29 23:26:36
|
On 8/29/06, Keith Goodman <kwg...@gm...> wrote:
>
> On 8/29/06, Mathew Yeates <my...@jp...> wrote:
>
> > I have an M by N array of floats. Associated with the columns are
> > character labels
> > ['a','b','b','c','d','e','e','e'] note: already sorted so duplicates
> > are contiguous
> >
> > I want to replace the 2 'b' columns with the sum of the 2 columns.
> > Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
>
> Make a cumsum of the array. Find the index of the last 'a', last 'b',
> etc and make the reduced array from that. Then take the diff of the
> columns.
>
> I know that's vague, but so is my understanding of python/numpy.
>
> Or even more vague: make a function that does what you want.

Or you could use searchsorted on the labels to get a sequence of
ranges. What you have is a sort of binning applied to columns instead
of values in a vector. Or, if the overhead isn't too much, use a
dictionary with (key: array) entries. Iterate through the columns
adding keys: when the key is new, insert a copy of the column; when it
is already present, add the new column to the old one.

Chuck
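Both suggestions above can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the function names and sample data are hypothetical, and `np.add.reduceat` is one possible way to sum the ranges that `searchsorted` produces (the message does not say how to do that step).

```python
import numpy as np

def sum_by_searchsorted(data, labels):
    """Labels must be sorted, as in the original question."""
    labels = np.asarray(labels)
    uniq = np.unique(labels)  # sorted distinct labels
    # First column index of each label's run.
    starts = np.searchsorted(labels, uniq, side='left')
    # reduceat sums data[:, starts[i]:starts[i+1]] for each i
    # (the last range runs to the final column).
    return np.add.reduceat(data, starts, axis=1), uniq

def sum_by_dict(data, labels):
    """Dictionary variant: accumulate one column per key."""
    columns = {}
    for j, key in enumerate(labels):
        if key in columns:
            columns[key] = columns[key] + data[:, j]  # add to existing column
        else:
            columns[key] = data[:, j].copy()          # new key: store a copy
    keys = list(columns)  # insertion order (guaranteed from Python 3.7)
    return np.column_stack([columns[k] for k in keys]), keys

# Hypothetical sample data: 2 rows, 8 labelled columns.
data = np.arange(16, dtype=float).reshape(2, 8)
labels = ['a', 'b', 'b', 'c', 'd', 'e', 'e', 'e']
by_ranges, names = sum_by_searchsorted(data, labels)
by_dict, keys = sum_by_dict(data, labels)
print(by_ranges)
print(by_dict)
```

The dictionary version does not need the labels to be sorted, at the cost of a Python-level loop over the columns.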
From: Sven S. <sve...@gm...> - 2006-08-30 12:32:03
|
Mathew Yeates wrote:
> My head is about to explode.
>
> I have an M by N array of floats. Associated with the columns are
> character labels
> ['a','b','b','c','d','e','e','e'] note: already sorted so duplicates
> are contiguous
>
> I want to replace the 2 'b' columns with the sum of the 2 columns.
> Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
>
> The resulting array still has M rows but less than N columns. Anyone?
> Could be any harder than Sudoku.
>

Hi,

I don't have time for this ;-) , but I learnt something useful along
the way...

    import numpy as n

    m = n.ones([2,6])
    a = ['b', 'c', 'c', 'd', 'd', 'd']

    # first index of each distinct label; sorted, because the
    # iteration order of a set is not guaranteed
    startindices = sorted(set([a.index(x) for x in a]))

    # start with M rows and zero columns, then append one summed column per label
    out = n.empty([m.shape[0], 0])
    for i in startindices:
        # columns i .. i + (number of columns with this label) - 1
        temp = n.mat(m[:, i : i + a.count(a[i])]).sum(axis = 1)
        out = n.hstack([out, temp])
    print(out)

Not sure if axis = 1 is needed, but until the defaults have settled a
bit it can't hurt. You need python 2.4 for the built-in <set>, and
<out> will be a numpy matrix, use <asarray> if you don't like that. But
here it's really nice to work with matrices, because otherwise .sum()
will give you a 1-d array sometimes, and that will suddenly look like a
row to <hstack> (instead of a nice column vector) and wouldn't work --
that's why matrices are so great and everybody should be using them ;-)

hth,
sven
From: Bill B. <wb...@gm...> - 2006-08-30 21:18:38
|
On 8/30/06, Sven Schreiber <sve...@gm...> wrote:
> Mathew Yeates wrote:
> will be a numpy matrix, use <asarray> if you don't like that. But here
> it's really nice to work with matrices, because otherwise .sum() will
> give you a 1-d array sometimes, and that will suddenly look like a row
> to <hstack> (instead of a nice column vector) and wouldn't work --
> that's why matrices are so great and everybody should be using them ;-)

column_stack would work perfectly in place of hstack there, if only it
didn't have the silly behavior of transposing arguments that are
already 2-d.

As a reminder, here's the replacement implementation of column_stack I
proposed on July 21:

    def column_stack(tup):
        def transpose_1d(array):
            if array.ndim<2: return _nx.transpose(atleast_2d(array))
            else: return array
        arrays = map(transpose_1d,map(atleast_1d,tup))
        return _nx.concatenate(arrays,1)

This was in a big ticket I submitted about overhauling r_, c_, etc.,
which was largely ignored. Maybe I should resubmit this by itself...

--bb
From: Stefan v. d. W. <st...@su...> - 2006-08-30 14:52:01
Attachments: arsum.py
|
On Tue, Aug 29, 2006 at 03:46:45PM -0700, Mathew Yeates wrote:
> My head is about to explode.
>
> I have an M by N array of floats. Associated with the columns are
> character labels
> ['a','b','b','c','d','e','e','e'] note: already sorted so duplicates
> are contiguous
>
> I want to replace the 2 'b' columns with the sum of the 2 columns.
> Similarly, replace the 3 'e' columns with the sum of the 3 'e' columns.
>
> The resulting array still has M rows but less than N columns. Anyone?
> Could be any harder than Sudoku.

I attach one possible solution (allowing for the same column name
occurring in different places, i.e. ['a','b','b','a']).

I'd be glad for any suggestions on how to clean up the code.

Regards
Stéfan
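The attachment (arsum.py) is not reproduced in this archive, so the following is not Stéfan's code. It is only a sketch of one way to handle labels that recur in different places, under the assumption that every column sharing a label should be merged into a single output column. The function name and sample data are hypothetical.

```python
import numpy as np

def sum_columns_by_label(data, labels):
    """Merge all columns with the same label, contiguous or not."""
    # inverse[j] is the output column that input column j belongs to.
    uniq, inverse = np.unique(np.asarray(labels), return_inverse=True)
    out = np.zeros((data.shape[0], uniq.size), dtype=data.dtype)
    # Unbuffered add: repeated indices in `inverse` accumulate
    # instead of overwriting each other.
    np.add.at(out, (slice(None), inverse), data)
    return out, uniq

# Hypothetical sample data with a label that recurs non-contiguously.
data = np.arange(8, dtype=float).reshape(2, 4)
labels = ['a', 'b', 'b', 'a']
merged, names = sum_columns_by_label(data, labels)
print(merged)
print(names)
```

Note that the output columns come back in sorted label order, not in order of first appearance; if the separated runs should instead stay separate, a run-based approach would be needed.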