From: Norbert N. <Nor...@gm...> - 2005-10-08 09:15:44
Hi there,

I did some web research on this topic a little while ago, but I could not find any details about it. As far as I can tell, the HDF5 library does not take any precautions against power failures. Modern database systems usually have concepts of "atomic operations" and "journalling" that protect the data against corruption. HDF5 does not provide this, so PyTables is out of luck: there is no perfectly safe way of handling an HDF5 file.

What I usually do is limit the chance of breakage by doing all write operations at once at the end of a big calculation loop and flushing the data immediately. Of course, this only makes sense for programs (like mine) where outputting data takes a negligible fraction of the total computation time; the chance that the program breaks during this short window is then negligible.

Alternatively, it might be possible to carefully select the data structures in such a way that a write operation never changes the control data in the file (i.e. no dynamic data structures, no growing tables, etc.). This way, it should be possible to write the data so that the file is in a sane condition at any point in time, no matter where the program breaks halfway through an operation. Data compression should probably be avoided in such a situation.

Either way, you throw away much of the power of PyTables, but unfortunately that's the price you have to pay for safety.

Greetings,
Norbert

Francesco Del Degan wrote:
>Hi, I have a question:
>
>Under certain circumstances I can run into corruption of the .h5 file,
>for instance when the machine crashes, on power failure, etc., causing
>loss of all data (I'm working with multi-gigabyte files).
>
>Perhaps this is an HDF5 question, but I would like to know how you
>manage this in practice.
>
>Some considerations:
>
>1) Copying/rsyncing the file before writing is possible, but it is very
>slow on big files.
>2) Truncating the file back to its previous size before writing doesn't
>solve the problem; the HDF5 file remains corrupted (the header is
>changed; HDF5 isn't a pure append-only format).
>
>Is it possible to implement a recovery/checkpoint system?
>
>I've noticed that HDF5 uses some headers in the file
>(http://hdf.ncsa.uiuc.edu/HDF5/doc/H5.format.html).
>If I save the headers before writing (I'd need to study the HDF5 file
>format), for instance in a master register/file, can I achieve recovery?
>
>I would like to write some HDF5/PyTables extension to recover corrupted
>files; in your opinion, is it wasted work?
>
>Obviously this is only possible if I append data, because if I change
>rows, I also need to track row changes in the file (a more difficult
>task).
>
>Thanks for your work,
>Francesco
>
>_______________________________________________
>Pytables-users mailing list
>Pyt...@li...
>https://lists.sourceforge.net/lists/listinfo/pytables-users
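Norbert's write-everything-at-the-end-and-flush pattern can be sketched with the standard library alone; the file name and record format below are illustrative, not from the thread. With PyTables the flush step would be a single `h5file.flush()` call after the batch write:

```python
import os
import struct

def run_and_checkpoint(n_steps, path="results.bin"):
    """Compute everything in memory, then write and flush once at the end."""
    results = []
    for step in range(n_steps):          # the long computation loop
        results.append(step * step)      # stand-in for the real work

    # Single write burst at the very end: the window in which a power
    # failure can corrupt the file is reduced to these few lines.
    with open(path, "wb") as f:
        for value in results:
            f.write(struct.pack("<q", value))
        f.flush()                 # push Python's buffer to the OS
        os.fsync(f.fileno())      # ask the OS to push it to disk
    return results

out = run_and_checkpoint(5)
# → [0, 1, 4, 9, 16]
```

The point is the shape of the loop, not the I/O calls: the vulnerable window shrinks from the whole run to the final write burst, which is why Norbert considers the residual risk negligible for compute-dominated programs.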
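Francesco's option (1), keeping a known-good copy before each write, can be sketched with the standard library; `guarded_write` and its parameters are hypothetical names, not part of PyTables. As Francesco notes, duplicating the whole file makes this slow for gigabyte-sized files. The automatic restore below only covers process-level failures (exceptions, crashes that let cleanup code run); after a power loss, the `.bak` snapshot left on disk would have to be restored manually or at the next startup:

```python
import os
import shutil

def guarded_write(path, write_fn, backup_suffix=".bak"):
    """Snapshot the file, run the write; on failure restore the snapshot."""
    backup = path + backup_suffix
    if os.path.exists(path):
        shutil.copy2(path, backup)       # known-good snapshot
    try:
        write_fn(path)                   # the potentially corrupting write
    except Exception:
        if os.path.exists(backup):
            os.replace(backup, path)     # atomic rename restores the snapshot
        raise
    else:
        if os.path.exists(backup):
            os.remove(backup)            # write succeeded; drop the snapshot
```

Using `os.replace` for the restore means the file is swapped back in a single atomic rename, so a reader never sees a half-restored file.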