[getdata-devel] Zipped Dirfile support patch
From: Matthew P. <ge...@mp...> - 2020-01-30 19:18:41
Dear all,

Attached is a patch that allows for reading Dirfiles that are in an uncompressed Zip file. I'm sending it to this list now in case it is useful for anyone else (CLASS has recently started using it). It should apply cleanly to SVN r1175.

Development of the patch was motivated by a need to reduce the total file count for FLAC-encoded Dirfiles, to alleviate the backup and data transfer overheads that result from having a very large number of small files.

Below is some documentation (also included in the patch) for the functionality:

Separate from the Dirfile encoding scheme, GetData will read Dirfiles contained in uncompressed Zip files. This functionality is meant for reading archival data, so writing to these Zip files is not supported. Using the Info-ZIP `zip` utility, a Zip file can be created by running `zip -r0 ../dirfile.zip *` from within the root of an existing Dirfile.

All encoding schemes are supported by this functionality except for the two encoding schemes that already use Zip files, *zzip* and *zzslim*. The encoding scheme must be specified using the /ENCODING directive, even if the Dirfile is unencoded. For /INCLUDE directives and LINTERP field look up table files, only relative paths are supported and only without `./` and `../` syntax.

Although Zip files are most commonly created using _Deflate_ compression, the Zip standard (ISO/IEC 21320-1) also supports _Store_ compression, i.e., no compression at all. GetData's Zip file support requires _Store_ compression for all data files, although either _Store_ compression or _Deflate_ compression can be used for any *format* files or any LINTERP field look up table files.

With _Store_ compression, a Zip file effectively concatenates a Dirfile's individual files together into a single file. Since a Zip file contains an offset table, unlike a tarball, random reads are supported without the need to load the entire file from disk.

-Matthew Petroff
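
As an illustration of the random-read point above, here is a minimal Python sketch using only the standard library. It is not the code in the patch and does not use GetData. The member names, the tiny *format* file contents, and the helper `stored_member_offset` are made up for the example. It builds a small Zip archive with _Store_ compression in memory, looks up a member's offset, and reads two samples directly from the archive bytes.

```python
import io
import struct
import zipfile


def stored_member_offset(raw: bytes, info: zipfile.ZipInfo) -> int:
    """Return the offset of a stored member's data within the archive bytes.

    The local file header is 30 fixed bytes followed by the file name and
    an optional extra field; their lengths sit at header offsets 26 and 28.
    """
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


# Build a small archive in memory (hypothetical contents, for illustration).
samples = struct.pack("<8H", *range(8))  # eight little-endian UINT16 samples
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("format", "/ENCODING none\ndata RAW UINT16 1\n")
    zf.writestr("data", samples)
raw = buf.getvalue()

# Use the central directory (the offset table) to locate the data member.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    info = zf.getinfo("data")
    assert info.compress_type == zipfile.ZIP_STORED
    start = stored_member_offset(raw, info)

# Read samples 3 and 4 without touching the rest of the member.
sample_size = 2
chunk = raw[start + 3 * sample_size : start + 5 * sample_size]
values = struct.unpack("<2H", chunk)
print(values)
```

With an archive on disk, the same offset could be passed to `seek()` on the open file, so only the requested bytes need to be read.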