From: Francesc A. <fa...@ca...> - 2007-08-28 11:33:44
On Monday 27 August 2007, you wrote:
> > Yeah, that's a bit strange. If 're-adding' shuffle is actually
> > improving your search times, then perhaps it is not the actual
> > problem. Now, I think that the main issue should be the length of
> > the chunksize of 'new' files. Can you run the 'h5ls -v' utility
> > that comes with HDF5 and send the 'Chunks:' fields of the output
> > for the '/results/oef1/quad4' dataset for both 'old' and 'new'
> > files?
>
> $ h5ls -v old.h5/results/oef1/quad4
> Opened "old.h5" with sec2 driver.
> results/oef1/quad4       Dataset {1018/Inf, 17402/Inf, 3/3}
>     Location:  0:1:0:28034319
>     Links:     1
>     Modified:  2007-01-04 15:45:37 EST
>     Chunks:    {119, 100, 3} 142800 bytes
>     Storage:   212582832 logical bytes, 196302976 allocated bytes, 108.29% utilization
>     Filter-0:  deflate-1 OPT {6}
>     Type:      IEEE 32-bit big-endian float
>
> $ h5ls -v new.h5/results/oef1/quad4
> Opened "new.h5" with sec2 driver.
> results/oef1/quad4       Dataset {1022/Inf, 17759/17759, 3/3}
>     Attribute: CLASS scalar
>         Type:  7-byte null-terminated ASCII string
>         Data:  "EARRAY"
>     Attribute: EXTDIM scalar
>         Type:  native int
>         Data:  0
>     Attribute: FLAVOR scalar
>         Type:  9-byte null-terminated ASCII string
>         Data:  "numarray"
>     Attribute: VERSION scalar
>         Type:  4-byte null-terminated ASCII string
>         Data:  "1.3"
>     Attribute: TITLE scalar
>         Type:  1-byte null-terminated ASCII string
>         Data:  ""
>     Location:  0:1:0:1126352
>     Links:     1
>     Modified:  2007-08-21 08:08:41 EDT
>     Chunks:    {1, 17759, 3} 213108 bytes
>     Storage:   217796376 logical bytes, 183047210 allocated bytes, 118.98% utilization
>     Filter-0:  shuffle-2 OPT {4}
>     Filter-1:  deflate-1 OPT {6}
>     Type:      native float
>
> > Also, it would be nice to know the way you are doing the search
> > process (sequential or sparse access?); if you can send the search
> > algorithm that would be nice. The only thing that comes to my mind
> > is that, if your search process is based on a sparse access
> > pattern, having a large chunksize can severely penalize the times;
> > in this case, using PyTables 2.0, which creates far smaller
> > chunksizes by default, will help. If you are using sequential
> > access, then I don't really understand what can be the cause of
> > the slowdown.
>
> Well, the related arrays are stored in the same order. Then I use a
> simple binary search of an 'index' to determine the offset to find
> the related data. For example, say that in a mesh, the index is a
> rank-1 array of integer identifiers, and the associated space
> coordinates are stored as a rank-2 array, where the second dimension
> is like a tuple of (x, y, z).

Aha, so you are doing a binary search in an 'index' first; then it is
almost certain that most of the time is spent performing the lookup in
this rank-1 array. Since you are doing a binary search, and the minimum
unit of I/O in HDF5 is precisely one chunk, small chunksizes will favor
performance. Looking at your lookup times, my guess is that your 'index'
array is on disk, and the sparse access to it (i.e. the binary search)
is your bottleneck. Unfortunately, you did not send the chunksizes for
the rank-1 index array, but most probably the chunksize in the 'old'
files is rather small compared with the 'new' ones. In this case, and
as I said in another message, creating the 'new' files with PyTables 2.0
will help because it uses far smaller chunksizes by default.
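Just to make the access pattern explicit, here is a minimal sketch of
the kind of lookup I have in mind (the file name and array location are
made up); the point is that every probe of the on-disk index forces
HDF5 to read and decompress at least one whole chunk, so a binary
search over N rows costs roughly log2(N) chunk reads:

import tables

def find_offset(idx, key):
    # Plain binary search over the on-disk rank-1 array; each idx[mid]
    # access reads (and decompresses) a complete chunk from disk.
    lo, hi = 0, idx.nrows
    while lo < hi:
        mid = (lo + hi) // 2
        if idx[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo

fileh = tables.openFile('new.h5', mode='r')   # made-up file name
idx = fileh.root.results.index                # made-up array location
offset = find_offset(idx, 154092)
fileh.close()

With the small default chunksizes of PyTables 2.0 each of those probes
is cheap; with a very large chunk, every probe pays the cost of reading
the whole chunk.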
Also, PyTables 2.0 lets you set even smaller chunksizes than the
default (see the new 'chunkshape' parameter in the create*Array
factories), allowing better fine-tuning of query times.

As an aside, and just in case you are not aware of it: PyTables Pro
allows you to index table columns and then perform binary searches on
them very quickly. So, if you want maximum performance in your lookups,
one possibility is to declare a Table with a single column (the
identifiers), index it, and then do the query:

offset = [r['index'] for r in table.where('index == 154092')][0]

Of course, all the parameters in the Pro indexing engine have already
been fine-tuned so as to get pretty optimal query times (see [1] for a
detailed description of how Pro indexes work and their performance).

[snip]

> The new ptrepack seems to work OK. I did observe that if I used
> --complevel and --shuffle at the same time, shuffle was always set
> to "off" no matter the value of --shuffle.

This is a bug in ptrepack. The attached patch should solve the problem.

> Unfortunately, I can't test the effect of the new files:
>
> $ python test_finder.py
> Testing file /cluster/stress/p20loads/gac/lev_0_test.hdf5
> HDF5-DIAG: Error detected in HDF5 library version: 1.6.5 thread 0.
> Back trace follows.
>   #000: H5A.c line 457 in H5Aopen_name(): attribute not found
>     major(18): Attribute layer
>     minor(05): Bad value
>   #001: H5A.c line 404 in H5A_get_index(): attribute not found
>     major(18): Attribute layer
>     minor(48): Object not found
> Segmentation fault
>
> So I tried with PyTables 2.0:
>
> $ python test_finder.py
> Testing file /cluster/stress/p20loads/gac/lev_0_test.hdf5
> Traceback (most recent call last):
>   File "test_finder.py", line 16, in ?
>     fh.find_gpfb('121731')
>   File "/cluster/stress/u308168/public_html/pyloads/model/finder.py",
>     line 210, in find_gpfb
>     r = nasob.NodalResult(self.fileh, g, balance=not oelop)
>   File "../nasob.py", line 375, in __init__
>     elements = grid.elements
>   File "../nasob.py", line 52, in _elements
>     self._elist.append(Result(self.fileh, eid, ogpf=True))
>   File "../nasob.py", line 288, in __init__
>     g.ogpf.T1 = g.ogpf.t1 = g.fx = g.FX = g.ogpf[:,0]
> AttributeError: 'numpy.ndarray' object has no attribute 'T1'
> Closing remaining open files:
> /cluster/stress/p20loads/gac/lev_0_test.hdf5... done
>
> I guess I'll have to read the migration docs ;)

Well, I think so ;)

[1] http://www.carabos.com/docs/OPSI-indexes.pdf

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"
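PS: In case it helps, here is a slightly fuller sketch of the
single-column Table idea above. All the names here are made up, and
createIndex() needs the Pro indexing engine:

import tables

class Identifiers(tables.IsDescription):
    index = tables.Int64Col()

ids = range(200000)   # stand-in for your rank-1 array of identifiers

fileh = tables.openFile('ids.h5', mode='w')
table = fileh.createTable('/', 'ids', Identifiers)
table.append([(i,) for i in ids])
table.flush()
table.cols.index.createIndex()   # build the index (PyTables Pro only)

# r.nrow is the row position of the match, i.e. the offset into the
# companion arrays that are stored in the same order.
offset = [r.nrow for r in table.where('index == 154092')][0]
fileh.close()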