|
From: Todd M. <jm...@st...> - 2005-09-20 17:16:32
|
Hi H, I did some work on this problem based on your previous post but apparently my response never made it to numpy-discussion. In a nutshell, I made numarray 12x faster for a benchmark like your numarray_pb_sample.py by speeding up string comparisons and improving all(). The changes are in numarray CVS but there is no Source Forge release that contains them yet. numarray-1.4.0 is still several weeks away. If you want to try CVS from UNIX/Linux just do: % cvs -d:pserver:ano...@cv...:/cvsroot/numpy login % cvs -z3 -d:pserver:ano...@cv...:/cvsroot/numpy co -P numarray Regards, Todd Humufr wrote: > Hello, > > I have a problem with numarray and especially the function numarray.all. > > I want to compare two files to do this I read the files with a > function readcol2 who can put them in a list or numarray format > (string or numerical). > > I'm doing a comparaison on each line of the file. > If I'm using the array format and the numarray.all function, that take > forever to do the comparaison for 2 big files. If I'm using python > list object, it's very fast. I think there are some problem or at > least some improvement to do. If I understand correctly the goal of > numarray, it has been write to speed up some part of python but here > it slow down a lot. > > An very simple sample to see the effect is at the bottom of this mail. > > Thanks for numarray, I hope to not bother you. My comments are more to > improve numarray than other things. I have been able to find the > problem so no I can avoied it. > > H. > > > > > def > readcol(fname,comments='%',columns=None,delimiter=None,dep=0,arraytype='list'): > > """ > Load ASCII data from fname into an array and return the array. > The data must be regular, same number of values in every row > fname can be a filename or a file handle. > > Input: > > - Fname : the name of the file to read > > Optionnal input: > - comments : a string to indicate the charactor to delimit the > domments. > the default is the matlab character '%'. > - columns : list or tuple ho contains the columns to use. > - delimiter : a string to delimit the columns > > - dep : an integer to indicate from which line you want to begin > > to use the file (useful to avoid the descriptions lines) > > - arraytype : a string to indicate which kind of array you want ot > have: numeric array (numeric) or character array > (numstring) or list (list). By default it's the > > list mode used > > matfile data is not currently supported, but see > Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz > > Example usage: > > x,y = transpose(readcol('test.dat')) # data in two columns > > X = readcol('test.dat') # a matrix of data > > x = readcol('test.dat') # a single column of data > > x = readcol('test.dat,'#') # the character use like a comment > delimiter is '#' > > initial function from pylab (J.Hunter). Change by myself for my > specific need > > """ > from numarray import array,transpose > > fh = file(fname) > > X = [] > numCols = None > nline = 0 > if columns is None: > for line in fh: > nline += 1 > if dep is not None and nline <= dep: continue > line = line[:line.find(comments)].strip() > if not len(line): continue > if arraytype=='numeric': > row = [float(val) for val in line.split(delimiter)] > else: > row = [val.strip() for val in line.split(delimiter)] > thisLen = len(row) > if numCols is not None and thisLen != numCols: > raise ValueError('All rows must have the same number of > columns') > X.append(row) > else: > for line in fh: > nline +=1 > if dep is not None and nline <= dep: continue > line = line[:line.find(comments)].strip() > if not len(line): continue > row = line.split(delimiter) > if arraytype=='numeric': > row = [float(row[i-1]) for i in columns] > elif arraytype=='numstring': > row = [row[i-1].strip() for i in columns] > else: > row = [row[i-1].strip() for i in columns] > thisLen = len(row) > if numCols is not None and thisLen != numCols: > raise ValueError('All rows must have the same number of > columns') > X.append(row) > > if arraytype=='numeric': > X = array(X) > r,c = X.shape > if r==1 or c==1: > X.shape = max([r,c]), > elif arraytype == 'numstring': > import numarray.strings # pb if numeric+pylab > X = numarray.strings.array(X) > r,c = X.shape > if r==1 or c==1: > X.shape = max([r,c]), > return X > > > ------------------------------------------- > files_test_creation.py > > ------------------------------------------- > > f1 = file('test1.dat','w') > for i in range(10000): > f1.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n') > f1.close() > > > f2 = file('test2.dat','w') > for i in range(10000): > f2.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n') > f2.close() > > ------------------------------------------- > numarray_pb_sample.py > > ------------------------------------------- > > import numarray > data1 = > readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='numstring') > data2 = > readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='numstring') > > #or in non string array form (same result) > ## data1 = > readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='numeric') > ## data2 = > readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='numeric') > > for a_i in range(data1.shape[0]): > for b_i in range(data2.shape[0]): > if numarray.all(data1[a_i,:] == data2[b_i,:]): > print a_i,b_i > > ------------------------------------------- > python_list_sample.py > > ------------------------------------------- > > data1 = > readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='list') > data2 = > readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',delimiter=' > ',dep=1,arraytype='list') > > for a_i in range(len(data1)): > for b_i in range(len(data2)): > if data1[a_i] == data2[b_i]: > print a_i,b_i > > > > > > > ------------------------------------------------------- > SF.Net email is sponsored by: > Tame your development challenges with Apache's Geronimo App Server. > Download it for free - -and be entered to win a 42" plasma tv or your > very > own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php > _______________________________________________ > Numpy-discussion mailing list > Num...@li... > https://lists.sourceforge.net/lists/listinfo/numpy-discussion |