Re: [getdata-devel] Zipped Dirfile support patch
From: Matthew P. <ge...@mp...> - 2020-02-28 20:16:07
Another minor update. The previous version of the patch prematurely closed
file descriptors for the LUTs used for LINTERP fields, which sporadically
caused issues. A revised patch with this bug fixed is attached.

-Matthew Petroff

On Thu, Feb 6, 2020 at 11:52 AM Matthew Petroff <ge...@mp...> wrote:

> A minor update. The previous version of the patch had a couple file
> descriptor leaks that have now been fixed. Checks with Valgrind show no
> remaining file descriptor leaks (and no memory leaks). A revised patch is
> attached.
>
> -Matthew Petroff
>
> On Thu, Jan 30, 2020 at 1:26 PM Matthew Petroff <ge...@mp...> wrote:
>
>> Dear all,
>>
>> Attached is a patch that allows for reading Dirfiles that are in an
>> uncompressed Zip file. I'm sending it to this list now in case it is
>> useful for anyone else (CLASS has recently started using it). It should
>> apply cleanly to SVN r1175. Development of the patch was motivated by a
>> need to reduce the total file count for FLAC-encoded Dirfiles, to
>> alleviate the backup and data transfer overheads that result from having
>> a very large number of small files.
>>
>> Below is some documentation (also included in the patch) for the
>> functionality:
>>
>> Separate from the Dirfile encoding scheme, GetData will read Dirfiles
>> contained in uncompressed Zip files. This functionality is meant for
>> reading archival data, so writing to these Zip files is not supported.
>> Using the Info-ZIP `zip` utility, a Zip file can be created by running
>> `zip -r0 ../dirfile.zip *` from within the root of an existing Dirfile.
>> All encoding schemes are supported by this functionality except for the
>> two encoding schemes that already use Zip files, *zzip* and *zzslim*.
>> The encoding scheme must be specified using the /ENCODING directive,
>> even if the Dirfile is unencoded. For /INCLUDE directives and LINTERP
>> field look up table files, only relative paths are supported and only
>> without `./` and `../` syntax.
>>
>> Although Zip files are most commonly created using _Deflate_
>> compression, the Zip standard (ISO/IEC 21320-1) also supports _Store_
>> compression, i.e., no compression at all. GetData's Zip file support
>> requires _Store_ compression for all data files, although either _Store_
>> compression or _Deflate_ compression can be used for any *format* files
>> or any LINTERP field look up table files. With _Store_ compression, a
>> Zip file effectively concatenates a Dirfile's individual files together
>> into a single file. Since a Zip file contains an offset table, unlike a
>> tarball, random reads are supported without the need to load the entire
>> file from disk.
>>
>> -Matthew Petroff
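
The random-read point in the documentation above can be illustrated with a short sketch. This is not code from the patch, which is C inside GetData; it only uses Python's standard `zipfile` module to show why _Store_ entries allow seeking straight to a byte range. The function names (`build_stored_zip`, `non_stored_members`, `read_slice`) are hypothetical, and the sketch assumes a small, unencrypted, non-Zip64 archive.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed part of a Zip local file header, in bytes


def build_stored_zip(files):
    """Return the bytes of a Zip archive holding `files` with Store (no compression)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def non_stored_members(fileobj):
    """List the names of members that are not Store-compressed."""
    with zipfile.ZipFile(fileobj) as zf:
        return [info.filename for info in zf.infolist()
                if info.compress_type != zipfile.ZIP_STORED]


def read_slice(fileobj, member, start, length):
    """Read `length` bytes at offset `start` within a stored member.

    Only the central directory, one local header, and the requested bytes
    are read from `fileobj`; the rest of the archive is never touched.
    """
    with zipfile.ZipFile(fileobj) as zf:
        info = zf.getinfo(member)  # looked up via the central directory
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{member!r} is compressed; byte offsets are not usable")

    # The local header has variable-length name and extra fields, whose
    # lengths sit at byte offsets 26 and 28 of the fixed 30-byte part.
    fileobj.seek(info.header_offset)
    header = fileobj.read(LOCAL_HEADER_SIZE)
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    data_start = info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len

    # Clamp the request to the member's extent, then seek directly to it.
    start = min(start, info.file_size)
    length = min(length, info.file_size - start)
    fileobj.seek(data_start + start)
    return fileobj.read(length)
```

To try it, pass a dictionary of member names and byte payloads to `build_stored_zip`, wrap the result in `io.BytesIO`, and call `read_slice` on one member. A file opened with `open(path, "rb")` on an archive made with `zip -r0` should work the same way, and `non_stored_members` gives a quick check for entries that were not stored uncompressed.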