From: Humufr <hu...@ya...> - 2005-09-20 18:39:50
Thank you very much. I had not seen any answer before, which is why I
reduced the sample so much :) I'll try it now.

Todd Miller wrote:

> Hi H,
>
> I did some work on this problem based on your previous post but
> apparently my response never made it to numpy-discussion. In a
> nutshell, I made numarray 12x faster for a benchmark like your
> numarray_pb_sample.py by speeding up string comparisons and improving
> all(). The changes are in numarray CVS but there is no SourceForge
> release that contains them yet. numarray-1.4.0 is still several
> weeks away. If you want to try CVS from UNIX/Linux just do:
>
> % cvs -d:pserver:ano...@cv...:/cvsroot/numpy login
> % cvs -z3 -d:pserver:ano...@cv...:/cvsroot/numpy co -P numarray
>
> Regards,
> Todd
>
> Humufr wrote:
>
>> Hello,
>>
>> I have a problem with numarray and especially the function
>> numarray.all.
>>
>> I want to compare two files. To do this, I read the files with a
>> function from readcol2 that can return them as a list or in numarray
>> format (string or numerical).
>>
>> I compare each line of one file against the lines of the other.
>> If I use the array format and the numarray.all function, the
>> comparison takes forever for 2 big files. If I use Python list
>> objects, it's very fast. I think there is a problem here, or at
>> least room for improvement. If I understand the goal of numarray
>> correctly, it was written to speed up some parts of Python, but
>> here it slows things down a lot.
>>
>> A very simple sample that shows the effect is at the bottom of this
>> mail.
>>
>> Thanks for numarray. I hope I'm not bothering you. My comments are
>> meant to help improve numarray more than anything else. I was able
>> to find the problem, so now I can avoid it.
>>
>> H.
>>
>>
>> def readcol(fname, comments='%', columns=None, delimiter=None, dep=0,
>>             arraytype='list'):
>>     """
>>     Load ASCII data from fname into an array and return the array.
>>     The data must be regular, same number of values in every row
>>     fname can be a filename or a file handle.
>>
>>     Input:
>>
>>     - fname : the name of the file to read
>>
>>     Optional input:
>>
>>     - comments : a string giving the character that starts a
>>                  comment. The default is the matlab character '%'.
>>     - columns : list or tuple containing the columns to use.
>>     - delimiter : a string that delimits the columns
>>     - dep : an integer giving the number of lines to skip at the
>>             start of the file (useful to skip description lines)
>>     - arraytype : a string selecting the kind of array you want:
>>                   numeric array (numeric), character array
>>                   (numstring) or list (list). The default is list.
>>
>>     matfile data is not currently supported, but see
>>     Nigel Wade's matfile ftp://ion.le.ac.uk/matfile/matfile.tar.gz
>>
>>     Example usage:
>>
>>     x,y = transpose(readcol('test.dat'))  # data in two columns
>>
>>     X = readcol('test.dat')    # a matrix of data
>>
>>     x = readcol('test.dat')    # a single column of data
>>
>>     x = readcol('test.dat','#')  # use '#' as the comment character
>>
>>     Initial function from pylab (J. Hunter).
>>     Changed by me for my specific needs.
>>     """
>>     from numarray import array,transpose
>>
>>     fh = file(fname)
>>
>>     X = []
>>     numCols = None
>>     nline = 0
>>     if columns is None:
>>         for line in fh:
>>             nline += 1
>>             if dep is not None and nline <= dep: continue
>>             # split() keeps the whole line when no comment is present
>>             line = line.split(comments, 1)[0].strip()
>>             if not len(line): continue
>>             if arraytype=='numeric':
>>                 row = [float(val) for val in line.split(delimiter)]
>>             else:
>>                 row = [val.strip() for val in line.split(delimiter)]
>>             thisLen = len(row)
>>             if numCols is None:
>>                 numCols = thisLen  # remember the first row's width
>>             elif thisLen != numCols:
>>                 raise ValueError('All rows must have the same number '
>>                                  'of columns')
>>             X.append(row)
>>     else:
>>         for line in fh:
>>             nline += 1
>>             if dep is not None and nline <= dep: continue
>>             line = line.split(comments, 1)[0].strip()
>>             if not len(line): continue
>>             row = line.split(delimiter)
>>             if arraytype=='numeric':
>>                 row = [float(row[i-1]) for i in columns]
>>             elif arraytype=='numstring':
>>                 row = [row[i-1].strip() for i in columns]
>>             else:
>>                 row = [row[i-1].strip() for i in columns]
>>             thisLen = len(row)
>>             if numCols is None:
>>                 numCols = thisLen
>>             elif thisLen != numCols:
>>                 raise ValueError('All rows must have the same number '
>>                                  'of columns')
>>             X.append(row)
>>
>>     if arraytype=='numeric':
>>         X = array(X)
>>         r,c = X.shape
>>         if r==1 or c==1:
>>             X.shape = max([r,c]),
>>     elif arraytype == 'numstring':
>>         import numarray.strings # pb if numeric+pylab
>>         X = numarray.strings.array(X)
>>         r,c = X.shape
>>         if r==1 or c==1:
>>             X.shape = max([r,c]),
>>     return X
>>
>>
>> -------------------------------------------
>> files_test_creation.py
>> -------------------------------------------
>>
>> f1 = file('test1.dat','w')
>> for i in range(10000):
>>     f1.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
>> f1.close()
>>
>>
>> f2 = file('test2.dat','w')
>> for i in range(10000):
>>     f2.write(str(i)+' '+str(i+1)+' '+str(i+2)+'\n')
>> f2.close()
>>
>> -------------------------------------------
>> numarray_pb_sample.py
>> -------------------------------------------
>>
>> import numarray
>> import readcol2
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',
>>                          delimiter=' ',dep=1,arraytype='numstring')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',
>>                          delimiter=' ',dep=1,arraytype='numstring')
>>
>> #or in non string array form (same result)
>> ## data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',
>> ##                          delimiter=' ',dep=1,arraytype='numeric')
>> ## data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',
>> ##                          delimiter=' ',dep=1,arraytype='numeric')
>>
>> for a_i in range(data1.shape[0]):
>>     for b_i in range(data2.shape[0]):
>>         if numarray.all(data1[a_i,:] == data2[b_i,:]):
>>             print a_i,b_i
>>
>> -------------------------------------------
>> python_list_sample.py
>> -------------------------------------------
>>
>> import readcol2
>> data1 = readcol2.readcol('test1.dat',columns=[1,2,3],comments='#',
>>                          delimiter=' ',dep=1,arraytype='list')
>> data2 = readcol2.readcol('test2.dat',columns=[1,2,3],comments='#',
>>                          delimiter=' ',dep=1,arraytype='list')
>>
>> for a_i in range(len(data1)):
>>     for b_i in range(len(data2)):
>>         if data1[a_i] == data2[b_i]:
>>             print a_i,b_i
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Num...@li...
>> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
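
Both samples in the quoted message compare every row of one file against
every row of the other, so two files of about 10,000 lines need on the
order of 10^8 comparisons whichever container holds the data. The sketch
below is not from the thread. It shows one possible alternative: index
the rows of one file in a dictionary and look up each row of the other.
The function name matching_rows and the sample data are hypothetical,
rows are assumed to be lists of strings like those returned with
arraytype='list', and the code is written for Python 3 rather than the
Python 2 used in the thread.

```python
from collections import defaultdict


def matching_rows(rows_a, rows_b):
    """Return (i, j) pairs such that rows_a[i] == rows_b[j].

    Hypothetical helper: rows are sequences of hashable values
    (for example lists of strings read from a text file).
    """
    # Map each distinct row of rows_b to the positions where it occurs.
    index = defaultdict(list)
    for j, row in enumerate(rows_b):
        index[tuple(row)].append(j)

    # One dictionary lookup per row of rows_a replaces the inner loop.
    return [(i, j)
            for i, row in enumerate(rows_a)
            for j in index.get(tuple(row), ())]


if __name__ == "__main__":
    # Small stand-ins for the contents of test1.dat and test2.dat.
    rows_a = [[str(i), str(i + 1), str(i + 2)] for i in range(5)]
    rows_b = [[str(i), str(i + 1), str(i + 2)] for i in range(3, 8)]
    print(matching_rows(rows_a, rows_b))  # [(3, 0), (4, 1)]
```

The pairs come out in the same order as the nested loops would print
them. The trade-off is the memory for the dictionary, and rows must be
converted to tuples so that they can be used as dictionary keys.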