Hi Martijn,
there is no VI for H5Fflush. Please add this.
For long-term measurements, e.g. in daily files, it is necessary to flush the file buffer to disk periodically, to avoid losing too much data if the program ends unexpectedly.
Peter
Agreed - I've added it in revision e4ca2cefc5eb, which will be in the next release, along with H5Pset_scaleoffset.
Hello:
I have tried the flush VI, but the HDF5 file size does not seem to increase after each call to Flush.vi, only when I close the file. I was expecting to see the file size grow following each flush. Am I missing something? Thanks.
https://support.hdfgroup.org/HDF5/doc/RM/H5F/H5Fflush.htm reads
"Note:
HDF5 does not possess full control over buffering. H5Fflush flushes the internal HDF5 buffers then asks the operating system (the OS) to flush the system buffers for the open files. After that, the OS is responsible for ensuring that the data is actually flushed to disk"
laurent
Last edit: Laurent 2019-12-03
It depends on several details; I would not expect the file size to change after each flush. The main reason is that HDF5 preallocates space for chunked datasets with variable dimensions, to avoid costly I/O operations rebuilding the file every time you expand a dataset. The outcome is that the file size will increase, but not after each write - or even after each chunk. HDF5 is flexible but not minimal, i.e. it may take up more disk space than strictly necessary for the sake of speed and functionality.
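To illustrate the overhead (a minimal sketch using the Python bindings, h5py, which I'm assuming here for brevity since the LabVIEW VIs aren't easily quoted in text; both sit on the same C routines, and the file name is just a placeholder): a chunked, extendable dataset occupies noticeably more disk space than its raw contents.

```python
import os
import h5py       # assumed available: pip install h5py
import numpy as np

path = "chunk_demo.h5"
data = np.arange(100, dtype="f8").reshape(10, 10)  # 800 bytes of raw data

with h5py.File(path, "w") as f:
    # An unlimited first dimension (maxshape=(None, ...)) forces chunked
    # storage, so HDF5 allocates whole chunks plus B-tree index structures.
    f.create_dataset("data", data=data, maxshape=(None, 10), chunks=(10, 10))

file_size = os.path.getsize(path)
print(file_size, "bytes on disk for", data.nbytes, "bytes of data")
```

The file comes out at a few kilobytes for 800 bytes of data, because of the superblock, object headers and chunk index - the "flexible but not minimal" trade-off described above.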
To be clear, H5Fflush should write the data to disk on typical OS/filesystem combinations (it's less simple with MPI systems though). You can verify this by making a copy of the file after the flush and checking that its contents match your expectations (remember not to open the same HDF5 file multiple times simultaneously, even just for reading). This is the recommended way of verifying that data appending is working correctly.
I would also recommend closing any open references to the dataset before calling H5Fflush, so that HDF5 pushes any "pending" operations to the filesystem; otherwise it may buffer them in memory for performance reasons. Using H5Fclose in this library closes all associated open references, and that may be influencing when write operations occur (i.e. HDF5 hasn't asked the OS to do anything yet because it's waiting for you to close the reference first).
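Putting that together, the write-then-flush pattern under discussion looks roughly like this (again sketched with h5py rather than the VIs, where `f.flush()` wraps H5Fflush and the file name is a placeholder):

```python
import os
import h5py       # assumed available: pip install h5py
import numpy as np

path = "flush_demo.h5"
sizes = []

with h5py.File(path, "w") as f:
    # Chunked dataset with an unlimited first dimension, grown row by row.
    dset = f.create_dataset("data", shape=(0, 100), maxshape=(None, 100),
                            chunks=(1, 100), dtype="f8")
    for i in range(5):
        dset.resize(i + 1, axis=0)          # extend by one row
        dset[i, :] = np.random.rand(100)    # append new data
        f.flush()                           # H5Fflush: push HDF5 buffers to the OS
        sizes.append(os.path.getsize(path))

# The reported size is non-decreasing, but need not grow on every flush:
# past this point the OS write cache decides when bytes reach the disk.
print(sizes)
```

This matches the H5Fflush note quoted earlier: the flush hands the data to the operating system, and what the file size reports after that is up to the OS.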
Yes, I make sure the dataset is closed, and also the group the dataset is written to. Each iteration I create a new group to write the new dataset in. When each iteration does open file / write / close file, the file size increases. When I do open file, [iterate: write / close dataset / close group / flush], close file, the size does not change until the final close. This is on Windows 10. There are discussions about this (in general) here: https://superuser.com/questions/1362024/how-to-programmatically-clear-buffers-and-caches-in-windows but that may be heavy-handed. I think we are fighting the disk-write caching feature of the OS. Changing the drive settings in Windows as indicated here: https://www.thewindowsclub.com/enable-disable-disk-write-caching-windows-7-8 did not help.
Last edit: Laurent 2019-12-05
Aha! Now there is a twist. I was trying to optimize, so my code was calling the flush at a reduced rate compared to the data writes (not flushing at every iteration, but once per n iterations, n = 50 or 100). It turns out that when the flush is done strictly after each write, one can see the file size increase. If flushing once per 10 iterations, for instance, the file size still increases (!), but not at the rate Flush is being called. So there is something going on in the background between HDF5 and the OS, but at least there is progress.
For instance, requesting a flush every 50 iterations (and saving every 20 ms) takes > 2 min to see a file size change.
So I will need to look at where the compromise is... In the end, all this is to make sure data is really written to the file in case of a system power loss, which in the past led to corrupt HDF5 files that could not be opened any more. Any insight on this? Thanks.
Last edit: Laurent 2019-12-05