Re: [getdata-devel] Zipped Dirfile support patch
From: Matthew P. <ge...@mp...> - 2020-02-06 16:52:37
A minor update. The previous version of the patch had a couple file
descriptor leaks that have now been fixed. Checks with Valgrind show no
remaining file descriptor leaks (and no memory leaks). A revised patch is
attached.

-Matthew Petroff

On Thu, Jan 30, 2020 at 1:26 PM Matthew Petroff <ge...@mp...> wrote:
> Dear all,
>
> Attached is a patch that allows for reading Dirfiles that are in an
> uncompressed Zip file. I'm sending it to this list now in case it is useful
> for anyone else (CLASS has recently started using it). It should apply
> cleanly to SVN r1175. Development of the patch was motivated by a need to
> reduce the total file count for FLAC-encoded Dirfiles, to alleviate the
> backup and data transfer overheads that result from having a very large
> number of small files.
>
> Below is some documentation (also included in the patch) for the
> functionality:
>
> Separate from the Dirfile encoding scheme, GetData will read Dirfiles
> contained in uncompressed Zip files. This functionality is meant for
> reading archival data, so writing to these Zip files is not supported.
> Using the Info-ZIP `zip` utility, a Zip file can be created by running
> `zip -r0 ../dirfile.zip *` from within the root of an existing Dirfile. All
> encoding schemes are supported by this functionality except for the two
> encoding schemes that already use Zip files, *zzip* and *zzslim*. The
> encoding scheme must be specified using the /ENCODING directive, even if
> the Dirfile is unencoded. For /INCLUDE directives and LINTERP field look up
> table files, only relative paths are supported and only without `./` and
> `../` syntax.
>
> Although Zip files are most commonly created using _Deflate_ compression,
> the Zip standard (ISO/IEC 21320-1) also supports _Store_ compression, i.e.,
> no compression at all. GetData's Zip file support requires _Store_
> compression for all data files, although either _Store_ compression or
> _Deflate_ compression can be used for any *format* files or any LINTERP
> field look up table files. With _Store_ compression, a Zip file effectively
> concatenates a Dirfile's individual files together into a single file.
> Since a Zip file contains an offset table, unlike a tarball, random reads
> are supported without the need to load the entire file from disk.
>
> -Matthew Petroff
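The point about _Store_ compression and random reads can be illustrated with a short sketch. This is not the patch's C code and does not use GetData; it only uses Python's standard `zipfile` and `struct` modules. The field name `data`, the tiny format file, and the helper names `build_stored_zip`, `data_offset`, and `read_samples` are all made up for illustration. The idea is that a stored member's bytes sit verbatim in the archive, so once the member's starting offset is known, a sample range can be read directly without extracting anything.

```python
import io
import struct
import zipfile


def build_stored_zip(files):
    """Pack {name: bytes} into an in-memory Zip using Store (no compression)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def data_offset(archive, info):
    """Return the byte offset in the archive where a stored member's data starts."""
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError("direct reads only work for Store-compressed members")
    # The local file header has a fixed 30-byte part; the file name length
    # and extra field length are the two little-endian uint16 values at
    # bytes 26..29, and the member data follows those variable-length parts.
    name_len, extra_len = struct.unpack_from("<HH", archive, info.header_offset + 26)
    return info.header_offset + 30 + name_len + extra_len


def read_samples(archive, member, first, count, fmt="<h"):
    """Read `count` samples starting at sample `first`, straight from the archive bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(member)  # looked up via the Zip central directory
    size = struct.calcsize(fmt)
    if first < 0 or count < 0 or (first + count) * size > info.file_size:
        raise IndexError("requested samples fall outside the member")
    start = data_offset(archive, info) + first * size
    return [struct.unpack_from(fmt, archive, start + i * size)[0] for i in range(count)]


# Hypothetical two-file Dirfile: a format file and one raw INT16 field.
archive = build_stored_zip({
    "format": b"/ENCODING none\ndata RAW INT16 1\n",
    "data": struct.pack("<8h", *range(100, 108)),
})
print(read_samples(archive, "data", 3, 2))  # [103, 104]
```

With a _Deflate_-compressed member the same offset arithmetic would land in compressed bytes, which is why direct slicing like this only makes sense for stored data files. A real reader would seek within the file on disk rather than hold the whole archive in memory as this sketch does.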