I am working on the 1.9.0 release of io_lib, due to be released soon (see the CVS tree if you want a sneak preview). I'd be interested in hearing from any heavy users of io_lib who are willing to test it.
The key change is in speed. Specifically, I now do most encoding and decoding in memory and then read/write the entire trace as a single data block. This has unfortunately led to some incompatibilities in the API, but not in the core, commonly used functions (I hope).
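To illustrate the "work in memory, then do one big read or write" idea, here is a minimal sketch of a memory buffer with fread/fwrite-like calls. The names membuf, mb_write, mb_read, mb_flush and mb_load are made up for this example; this is not io_lib's mFILE definition or its mfread/mfwrite implementation, just the general technique.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file. NOT io_lib's mFILE, only an illustration. */
typedef struct {
    unsigned char *data;  /* buffer contents */
    size_t size;          /* bytes of valid data */
    size_t alloced;       /* bytes allocated */
    size_t pos;           /* current read offset */
} membuf;

/* Append nmemb items of 'size' bytes; returns items written, like fwrite.
 * (Sketch only: size * nmemb is not checked for overflow.) */
size_t mb_write(const void *ptr, size_t size, size_t nmemb, membuf *mb)
{
    assert(mb != NULL);
    size_t len = size * nmemb;
    if (len == 0) return 0;
    if (mb->size + len > mb->alloced) {
        size_t want = mb->alloced ? mb->alloced : 64;
        while (want < mb->size + len) want *= 2;   /* grow geometrically */
        unsigned char *tmp = realloc(mb->data, want);
        if (!tmp) return 0;
        mb->data = tmp;
        mb->alloced = want;
    }
    memcpy(mb->data + mb->size, ptr, len);
    mb->size += len;
    return nmemb;
}

/* Read up to nmemb whole items from the buffer; returns items read, like fread. */
size_t mb_read(void *ptr, size_t size, size_t nmemb, membuf *mb)
{
    assert(mb != NULL);
    if (size == 0 || nmemb == 0) return 0;
    size_t avail = (mb->size - mb->pos) / size;    /* whole items left */
    if (nmemb > avail) nmemb = avail;
    if (nmemb == 0) return 0;
    memcpy(ptr, mb->data + mb->pos, size * nmemb);
    mb->pos += size * nmemb;
    return nmemb;
}

/* Write the whole buffer to disk with a single fwrite. Returns 0 on success. */
int mb_flush(const membuf *mb, FILE *fp)
{
    if (mb->size == 0) return 0;
    return fwrite(mb->data, 1, mb->size, fp) == mb->size ? 0 : -1;
}

/* Load an entire seekable file into the buffer with a single fread. */
int mb_load(membuf *mb, FILE *fp)
{
    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    long len = ftell(fp);
    if (len < 0) return -1;
    rewind(fp);
    unsigned char *buf = malloc(len ? (size_t)len : 1);
    if (!buf) return -1;
    if (fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        return -1;
    }
    free(mb->data);
    mb->data = buf;
    mb->size = mb->alloced = (size_t)len;
    mb->pos = 0;
    return 0;
}
```

An encoder would call mb_write many times and mb_flush once at the end; a decoder would call mb_load once and then mb_read as often as it likes, so the many small I/O calls never touch the disk.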
The preliminary change notes are (poorly formatted, I know...):
* ***INCOMPATIBILITIES*** to 1.8.12
- The Exp_info structure now internally contains an "mFILE *" member
instead of a "FILE *" member. If you use the experiment file functions
for I/O then hopefully it'll still work. However if you directly
manipulated the Exp_info yourself using fprintf etc then you will
need to modify your code.
- Some functions no longer have external scope. Most of these did not
previously have external function prototypes. If you have a burning
need to use one of these, please contact me directly via sourceforge.
The full list is:
ctfType (global variable), ztr_encode_samples_C
- Some external functions have changed prototypes to use mFILE instead
of FILE. For most of these I've put in place a wrapper function
with the old name, but not yet for all. Functions changed are:
- Removed support for the OLD unix "pack" program as a valid trace
- Removed CORBA support. (It wasn't enabled and I've no idea if it
even worked as I cannot test it.)
- The default search order for RAWDATA now has the current working
directory at the end of RAWDATA instead of the start.
* Significant speed-ups, particularly when reading gzipped files or
when extracting data from tar files.
* New external functions for faster access via mFILE (memory-file)
structs. These mimic the fread/fwrite calls, but with mfread/mfwrite
* Numerous minor tweaks and updates to fix compiler warnings on the
stricter modes of the Intel C Compiler.
* Preliminary support for storing pyrosequencing style traces. This
has been modeled on the flowgram data from 454, but should be
applicable to other platforms. ZTR has been updated to incorporate
The Read structure also has flow, flow_order, nflows and flow_raw
elements. Code to convert these into the more usual traceA/C/G/T
arrays exists currently as part of Trev (in tk_utils in the Staden
Package), but this may move into io_lib for the next official release.
* New hash_tar and hash_extract programs. These replace the index_tar
program for fast random access. For RAWDATA include "HASH=hashfile"
as an element to get io_lib to use the archive hash. It's possible
to create hash files of most archive formats as the hash itself
contains the offset and size of each item in the archive. This means
that extracting an item does not need to know the format of the archive.
Some benchmarks show that on ext3 it's actually faster to extract
files from the hash than directly via the directory. This was
tested with ~200,000 files, whereupon directory lookups become
slow. I'd imagine ReiserFS or similar to be faster.
* Added an XRLE encoding for ZTR. This is similar to the existing RLE
mechanism but it copes with run length encoding of items larger than
a single byte. Its current use is for storing the 4-base repeating
flow order in 454 data.
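
To show why run-length encoding on multi-byte items helps with the repeating flow order, here is a minimal sketch. The byte layout used (a 1-byte run count followed by one item of w bytes) and the names rle_items_encode and rle_items_decode are assumptions made for this example; this is not the actual ZTR XRLE format, only the underlying idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Run-length encode 'in' as items of w bytes each.
 * Output is a series of [count (1..255)][item (w bytes)] records.
 * 'out' needs room for (len / w) * (w + 1) bytes in the worst case.
 * Returns bytes written, or 0 if w is 0 or len is not a multiple of w. */
size_t rle_items_encode(const unsigned char *in, size_t len, size_t w,
                        unsigned char *out)
{
    assert(in != NULL && out != NULL);
    if (w == 0 || len % w != 0) return 0;
    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        size_t run = 1;
        /* Extend the run while the next w-byte item equals the current one. */
        while (run < 255 && i + run * w < len &&
               memcmp(in + i, in + i + run * w, w) == 0)
            run++;
        out[o++] = (unsigned char)run;
        memcpy(out + o, in + i, w);
        o += w;
        i += run * w;
    }
    return o;
}

/* Decode records produced by rle_items_encode with the same w.
 * Returns bytes written, or 0 if w is 0 or 'out' is too small. */
size_t rle_items_decode(const unsigned char *in, size_t len, size_t w,
                        unsigned char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    if (w == 0) return 0;
    size_t o = 0;
    for (size_t i = 0; i + 1 + w <= len; i += 1 + w) {
        size_t run = in[i];
        if (o + run * w > out_cap) return 0;
        for (size_t r = 0; r < run; r++) {
            memcpy(out + o, in + i + 1, w);
            o += w;
        }
    }
    return o;
}
```

With a flow order of "TACG" repeated ten times, byte-wise encoding (w = 1) finds no runs at all and doubles the data, whereas w = 4 collapses it to a single record of 5 bytes.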