
Newbie issues with streaming images to HDF5 file

  • Scott Stecker

    Scott Stecker - 2021-10-03

    I'm new to the h5labview library and am working on creating a data storage method where I am collecting images and other data in a streaming fashion. Ideally I would like to save each bit of data as a row in the HDF5 file, i.e.
    Timestamp, X, Y, Z, I, J, K, image
    where the majority of the data is just a float or an int, and the image data is attached to each data write / index, if I think of it as an array.

    I have reviewed the save-image example and can follow it; I can save single images no problem.
    I have also reviewed the append-2D-dataset example, and can write all my image data using this method, but I have run into trouble: each image is appended in full, without anything to mark a break from one image to the next.

    In the save-image example I noted the Pic-3D case, which, when checked in HDFView, shows a series of images by way of a 3D array. That is pretty close to what I want to do, but since my images are streaming in, I am unsure how to accomplish this using an "appending" type of technique.

    I feel what I want to do is pretty simple, but I probably haven't fully grasped the ins and outs of HDF5 in general.

    Any pointers or suggestions would be greatly appreciated.

    Thanks

  • Martijn Jasperse

    Hi,
    Based on my experience with HDF5, I would recommend using a group, where each image you stream is a new dataset in that group with the associated measurement metadata attached as attributes, instead of trying to construct one giant compound dataset (a rough sketch follows the list below).

    There are a number of reasons for this:

    1. It is less sensitive to data corruption. If the file is corrupted somehow, you can recover the uncorrupted datasets; with one dataset, all the data would be lost.
    2. It is more space efficient. The "heap" model that HDF5 uses for free-space allocation has known issues when resizing large datasets (see this FAQ entry for example).
    3. This is similarly a problem for disk space: when you append to a dataset and resize it, HDF5 sometimes has to allocate a whole new block, effectively doubling the file size. In many circumstances the "freed" space is lost, which is why "repacking" HDF5 files is a common practice.
    4. In some circumstances it is more efficient to index a group than to take slices through a giant (compound) array. This may only be of concern if your file is GBs or TBs in size, though.
    5. This is especially true if you only ever want to access one measurement at a time, since LabVIEW might not be able to load the entire giant dataset into memory.
    6. Compound arrays are painful if you ever want to add entries; attributes are much more convenient for metadata.
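
    As a rough illustration of that layout, a minimal h5py sketch (Python, since LabVIEW block diagrams can't be pasted as text; the group name, dataset naming, and metadata fields below are invented for the example):

    import numpy as np
    import h5py

    with h5py.File("stream.h5", "w") as f:
        grp = f.create_group("shots")                 # one group for the whole stream
        for i in range(10):                           # stand-in for the acquisition loop
            img = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
            ds = grp.create_dataset(f"shot_{i:06d}", data=img)  # one dataset per image
            ds.attrs["timestamp"] = 1633219200.0 + i            # metadata as attributes
            ds.attrs["xyz"] = np.array([0.1, 0.2, 0.3])
            ds.attrs["ijk"] = np.array([1, 2, 3])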

    Of course this is just general advice. I personally have lost data due to power outages in the middle of experiments, which can be pretty painful for long-running data collection, so it's a big consideration for me.

    You should be able to use H5Dappend to combine 2D slices into a 3D dataset though. If you get really stuck I could probably throw together an example, although I'm very busy with work at the moment so I doubt I'd be able to get around to it soon.
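
    In the meantime, the equivalent pattern in h5py looks roughly like this (the dataset name and image size are placeholders): create the dataset chunked with an unlimited first axis, then resize it by one slice per incoming frame.

    import numpy as np
    import h5py

    H, W = 480, 640
    with h5py.File("append3d.h5", "w") as f:
        ds = f.create_dataset("images", shape=(0, H, W), maxshape=(None, H, W),
                              chunks=(1, H, W), dtype=np.uint8)
        for i in range(10):                       # each new frame as it streams in
            frame = np.random.randint(0, 255, (H, W), dtype=np.uint8)
            ds.resize(ds.shape[0] + 1, axis=0)    # grow along the image axis
            ds[-1] = frame                        # write the new slice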

    Good luck!

    Cheers,
    Martijn

  • Scott Stecker

    Scott Stecker - 2021-10-05

    Martijn,
    Thanks for the follow-up, I really appreciate it, as well as the additional comments on what to think about. Late yesterday, after reading up more on the examples and experimenting a bit, I was able to get things saving information in a format I wanted. Not sure I ended up following all your suggestions above, though.

    1. I think I have created, in essence, 3 datasets; not sure if this is the same as the "grouping" approach you suggested.
    2. One dataset is strictly for images. In order to append them successfully and be able to read them back as individual images, I in essence created a 3D array with 1 element (image) per entry.
    3. One dataset is comprised of compound data, a cluster of all single items.
    4. The final dataset is a 2D array of info related to the image; again, I append these as 3D arrays with 1 element each, much like the image data.
    5. In my application there will be one "index" or row of information for each image collected.
    6. We too are concerned with data loss and corruption. Our method to address this is to limit the number of elements written to a data file, then close it and open a new one with an appended index number, i.e. savefile-0.h5, savefile-1.h5, savefile-2.h5... (roughly sketched after this list). Probably not really taking advantage of the full feature set of the HDF5 format, but given my experience level at the moment this is my brute-force approach.
    7. I expect our application to create potentially TBs of data over the course of 1-2 days. We use a lot of NI TDMS files, but a colleague is doing some post-analysis using Python, and it seems the Python tools available for dealing with HDF5 files are more mature and generally faster in our experience than those available for TDMS; plus we like the fact that HDF5 has an "image" datatype.
    8. I think I am "chunking" most of my data as part of the append process; does this help or prevent any of the issues you describe in your #2 & #3 concerning file size and space allocation?
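
    For what it's worth, a rough h5py sketch of the file-rotation idea from point 6 (the file names, image size, and per-file limit are placeholders, not our actual code):

    import numpy as np
    import h5py

    N_PER_FILE = 1000
    H, W = 480, 640

    def new_file(k):
        """Open savefile-<k>.h5 with an empty, appendable image dataset."""
        f = h5py.File(f"savefile-{k}.h5", "w")
        ds = f.create_dataset("images", shape=(0, H, W), maxshape=(None, H, W),
                              chunks=(1, H, W), dtype=np.uint8)
        return f, ds

    k, written = 0, 0
    f, ds = new_file(k)
    for n in range(2500):                         # stand-in for the acquisition loop
        frame = np.random.randint(0, 255, (H, W), dtype=np.uint8)
        if written == N_PER_FILE:                 # limit reached: close and roll over
            f.close()
            k, written = k + 1, 0
            f, ds = new_file(k)
        ds.resize(written + 1, axis=0)            # grow by one image slice
        ds[written] = frame
        written += 1
    f.close()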
