From: Francesc A. <fa...@py...> - 2008-07-28 18:17:36
On Saturday 26 July 2008, Anand Patil wrote:
> Hi all,
> I seem to have lost a long-running simulation... it was writing out
> to an hdf5 archive, and when I try to open the archive after the
> simulation is over I get the following and several nodes are missing
> from
> Africa.S.db.group0:
>
>
> In [2]: Africa.S.db._group
>
> HDF5-DIAG: Error detected in HDF5 (1.8.0) thread 0:
> #000: H5Gdeprec.c line 777 in H5Giterate(): group iteration failed
> major: Symbol table
> minor: Iteration failed
> #001: H5G.c line 1657 in H5G_iterate(): error iterating over links
> major: Symbol table
> minor: Iteration failed
> #002: H5Gobj.c line 681 in H5G_obj_iterate(): can't iterate over
> symbol table
> major: Symbol table
> minor: Iteration failed
> #003: H5Gstab.c line 522 in H5G_stab_iterate(): iteration operator
> failed major: Symbol table
> minor: Can't move to next iterator location
> #004: H5B.c line 1218 in H5B_iterate(): iterator function failed
> major: B-Tree node
> minor: Unable to list node
> #005: H5Gnode.c line 1425 in H5G_node_iterate(): unable to load
> symbol table node
> major: Symbol table
> minor: Unable to load metadata into cache
> #006: H5AC.c line 1970 in H5AC_protect(): H5C_protect() failed.
> major: Object cache
> minor: Unable to protect metadata
> #007: H5C.c line 5928 in H5C_protect(): can't load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #008: H5C.c line 10567 in H5C_load_entry(): unable to load entry
> major: Object cache
> minor: Unable to load metadata into cache
> #009: H5Gnode.c line 384 in H5G_node_load(): unable to read symbol
> table node
> major: Symbol table
> minor: Read failed
> #010: H5F.c line 2974 in H5F_block_read(): file read failed
> major: Low-level I/O
> minor: Read failed
> #011: H5FD.c line 2046 in H5FD_read(): driver read request failed
> major: Virtual File Layer
> minor: Read failed
> #012: H5FDsec2.c line 725 in H5FD_sec2_read(): addr overflow
> major: Invalid arguments to routine
> minor: Address overflowed
>
> Out[2]:
> /chain0 (Group) 'Chain #0'
> children := ['__state__' (VLArray), 'group0' (Group)]
>
> Is there any way I can fix this?

Hmm, I'm guessing that this happened after a crash of your program,
right? As far as I know, there is no way to recover a corrupt HDF5
file. Your best bet when running large simulations is to add checkpoint
and recovery functionality to your simulation software, so that you can
restart the computations from the last sane HDF5 file saved before the
crash.

HTH,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249
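
To make the checkpoint idea a bit more concrete, here is a minimal sketch, not a definitive implementation. It uses only the Python standard library and treats the HDF5 file as an opaque file on disk. The function names `snapshot` and `recover` and the file names in the usage note are hypothetical, made up for illustration. It assumes that the simulation has flushed or closed the HDF5 file before each snapshot is taken, since a copy made in the middle of a write may itself be inconsistent.

```python
import os
import shutil
import tempfile


def snapshot(live_path, checkpoint_path):
    """Copy live_path to checkpoint_path without ever exposing a partial copy.

    Assumption: the caller has flushed or closed the live file first.
    """
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    # Write the copy under a temporary name in the same directory, so the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(live_path, tmp_path)
        # os.replace swaps the new checkpoint in over the old one in a single
        # step (atomic on POSIX when both paths are on the same filesystem).
        os.replace(tmp_path, checkpoint_path)
    except BaseException:
        # Clean up the temporary copy; the previous checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def recover(live_path, checkpoint_path):
    """Restore the live file from the last checkpoint.

    Returns True if a checkpoint existed and was restored, False otherwise.
    """
    if not os.path.exists(checkpoint_path):
        return False
    shutil.copyfile(checkpoint_path, live_path)
    return True
```

In a simulation loop one might call `snapshot("sim.h5", "sim.checkpoint.h5")` every N iterations, right after flushing the file, and call `recover("sim.h5", "sim.checkpoint.h5")` at startup when the previous run did not finish cleanly. The cost is one extra copy of the file on disk plus the time to copy it, so the snapshot interval is a trade-off between lost work and I/O overhead.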