On Fri, 15 Jun 2007 at 14:30 -0700, Conor Robinson wrote:
> Thanks for your input. I think I'm getting ahead of myself, and will
> use my current method which is similar to what you described here:
> >My advice is: try to consolidate your data on a single buffer (it can be a
> > NumPy recarray, for example) and then feed the table with this buffer using
> > Table.append(). My gut says that, even in the hypothetical case that
> > PyTables would support parallelism (which is not the case, as I've already
> > said), this approach would be as fast as writing in parallel (unless you are
> > writing into a parallel filesystem, do you?).
> As for my current setup, I don't have access to a lustre or cfs so I'm
> bound by earthly read/writes. I'm choked by the I/O anyhow, so this
> is a little silly for now.
> It may be interesting to note that you can use mpi to compile hdf5 and
> pympi to build pytables successfully. As to innards of pytables and
> race conditions etc. I have not played with this and will trust you
> when you say it's not parallel. My thinking was to not use pytables in
> "parallel threads" excuse my loose terminology, but to spawn multiple
> objects with read write methods and use pympi to execute object
> methods/tasks in parallel. I'd still like to see how an hdf5 file
> reacts when appending to multiple individual tables at the same time
> using pympi (this thought process may be flawed, and the system will
> probably order/stagger the I/O, just as an experiment).
I don't fully understand what you are trying to do, and besides, I
have no direct experience with parallel HDF5. But if what you intend is
to use the HDF5 parallel capabilities through PyTables, I'm pretty sure
that you won't be able to do this (even if you link PyTables with the
parallel version of HDF5 and/or use pympi).
This is because the parallel support in HDF5 is by no means
transparent, and you need to call special functions in order to start
doing parallel operations. For example, when opening a file, you have
to explicitly tell HDF5 that you are going to do parallel operations
on it; you normally do this by issuing the following calls:
    hid_t H5Pcreate ( H5P_class_t classtype (IN) );
    herr_t H5Pset_fapl_mpio ( hid_t plist_id (IN), MPI_Comm comm (IN), MPI_Info info (IN) );
and the same happens with most subsequent I/O operations.
The point is that PyTables doesn't implement any *mpio call. This is
why your approach is not going to work with PyTables.
> You seemed to portray that you can connect/merge multiple tables of
> the same column types efficiently with pytables? Is this covered in
> the Doc?
Not directly, but it is almost trivial to create a small function that
reads a buffer (using, for example, Table.read) from each source table
and writes it (using Table.append), in order, into the destination table.
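A minimal sketch of such a function follows. The name concat_tables and
the chunk_rows parameter are made up for illustration. It assumes that
every source table has the same column layout as the destination, and it
relies only on Table.nrows, Table.read, Table.append and Table.flush.
Reading in chunks is my own addition here, so that a large source table
does not have to fit in memory at once.

```python
def concat_tables(sources, dest, chunk_rows=100_000):
    """Append the rows of every table in `sources` to `dest`, in order.

    `sources` is an iterable of PyTables Table objects and `dest` is a
    Table with the same column layout (this is assumed, not checked).
    `chunk_rows` is a hypothetical tuning knob: rows copied per read.
    Returns the total number of rows copied.
    """
    copied = 0
    for src in sources:
        nrows = src.nrows
        for start in range(0, nrows, chunk_rows):
            stop = min(start + chunk_rows, nrows)
            # Table.read returns the rows [start, stop) as one buffer.
            buf = src.read(start, stop)
            # Table.append writes the whole buffer in a single call.
            dest.append(buf)
            copied += stop - start
    # Make sure the buffered rows actually reach the file.
    dest.flush()
    return copied
```

You would call it with Table objects taken from an already opened file,
passing the source tables in the order you want their rows to appear in
the destination. All of the writing happens from a single process.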
Hope that helps,
Francesc Altet | Be careful about using the following code --
Carabos Coop. V. | I've only proven that it works,
http://www.carabos.com | I haven't tested it. -- Donald Knuth