From: Fernando P. <Fer...@co...> - 2004-07-01 20:27:25
Chris Barker wrote:
> Hi all,
>
> I'm looking for a way to read data from ascii text files quickly. I've
> found that using the standard python idioms like:
>
> data = array((M,N),Float)
> for i in range(N):
>     data.append(map(float,file.readline().split()))
>
> Can be pretty slow. What I'd like is something like Matlab's fscanf:
>
> data = fscanf(file, "%g", [M,N] )
>
> I may have the syntax a little wrong, but the gist is there. What Matlab
> does is keep recycling the format string until the desired number of
> elements have been read.
>
> It is quite flexible, and ends up being pretty fast.
>
> Has anyone written something like this for Numeric (or numarray, but I'd
> prefer Numeric at this point)?
>
> I was surprised not to find something like this in SciPy, maybe I didn't
> look hard enough.

scipy.io.read_array? I haven't timed it, because it's been 'fast enough' for my needs.

For reading binary data files, I have this little utility, which is basically a wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note that it can read .gz-compressed binary files directly, which is a _huge_ gain for very sparse files representing 3d arrays: a 400k .gz file that blows up to ~60MB when unzipped reads in no time at all, while reading the unzipped file is very slow.

    import gzip
    import Numeric as N

    def read_bin(fname,dims,typecode,recast_type=None,offset=0,verbose=0):
        """Read in a binary data file.

        Does NOT check for endianness issues.

        Inputs:
          fname - can be .gz
          dims (nx1,nx2,...,nxd)
          typecode
          recast_type
          offset=0: # of bytes to skip in file *from the beginning* before data starts
        """

        # config parameters
        item_size = N.zeros(1,typecode).itemsize()  # size in bytes
        data_size = N.product(N.array(dims))*item_size

        # read in data (open in binary mode so nothing gets translated)
        if fname.endswith('.gz'):
            data_file = gzip.open(fname)
        else:
            data_file = file(fname,'rb')
        data_file.seek(offset)
        data = N.fromstring(data_file.read(data_size),typecode)
        data_file.close()
        data.shape = dims
        if verbose:
            #print 'Read',data_size/item_size,'data points. Shape:',dims
            print 'Read',N.size(data),'data points. Shape:',dims
        if recast_type is not None:
            data = data.astype(recast_type)
        return data

HTH,

f
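For the ascii side of Chris's question, the fscanf behaviour he describes (keep consuming whitespace-separated numbers across line boundaries until M*N values have been read) can be sketched in plain Python. This is a minimal illustration, not scipy.io.read_array or any existing library function; the name fscanf_floats is hypothetical. It fills the result row by row, whereas Matlab itself fills its result in column order, and it discards whatever remains on the line where the last value was found.

```python
import io


def fscanf_floats(stream, rows, cols):
    """Hypothetical helper: read rows*cols numbers from a text stream.

    Tokens are consumed across line boundaries until enough values have
    been read, roughly like Matlab's fscanf recycling a "%g" format.
    The result is a list of `rows` lists, filled row by row.
    Raises ValueError if the stream runs out early.
    """
    needed = rows * cols
    if needed == 0:
        return [[] for _ in range(rows)]
    values = []
    for line in stream:
        for token in line.split():
            values.append(float(token))
            if len(values) == needed:
                # Enough values: split the flat list into rows.
                # Any remaining tokens on this line are discarded.
                return [values[r * cols:(r + 1) * cols] for r in range(rows)]
    raise ValueError("expected %d values, found only %d" % (needed, len(values)))


if __name__ == "__main__":
    text = io.StringIO("1 2 3\n4 5\n6 7 8 9\n")
    print(fscanf_floats(text, 2, 3))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Converting token by token keeps the sketch short; for speed on large files, a vectorized parser would be the natural next step.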
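The gzip trick in read_bin does not depend on Numeric. A rough stdlib-only analogue in modern Python might look like the sketch below, which uses the array module's frombytes in place of Numeric.fromstring (in current NumPy, numpy.frombuffer fills that role). The function name read_bin_stdlib is hypothetical, typecodes follow the array module rather than Numeric, and it returns a flat array that the caller interprets with shape dims. Like the original, it does not handle endianness.

```python
import array
import gzip
import math


def read_bin_stdlib(fname, dims, typecode, offset=0):
    """Hypothetical stdlib-only analogue of read_bin.

    Reads prod(dims) items of the given array-module typecode from fname,
    transparently decompressing *.gz files, after skipping `offset` bytes
    from the beginning of the (uncompressed) data. No endianness checks.
    Returns a flat array.array of length prod(dims).
    """
    data = array.array(typecode)
    count = math.prod(dims)
    n_bytes = count * data.itemsize
    # gzip.open and open share the same call shape for binary reading.
    opener = gzip.open if fname.endswith('.gz') else open
    with opener(fname, 'rb') as f:
        f.seek(offset)  # GzipFile emulates seek by decompressing forward
        raw = f.read(n_bytes)
    if len(raw) != n_bytes:
        raise ValueError("expected %d bytes, got %d" % (n_bytes, len(raw)))
    data.frombytes(raw)
    return data
```

Reading the compressed file directly means only the small .gz file touches the disk, which is where the speedup Fernando describes for sparse data comes from.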