James Bonfield - 2005-06-30

I am working on the 1.9.0 release of io_lib, due out soon (see the CVS tree if you want a sneak preview). I'd be interested in hearing from any heavy users of io_lib who are willing to test it.

The key change is in speed. Specifically, I now do most encoding and decoding in memory and then read/write the entire trace as a single data block. This has unfortunately led to some incompatibilities in the API, but not in the core, commonly used functions (I hope).
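
To illustrate the idea (this is only a rough sketch, not the io_lib code): the encoder below builds the whole output in a growable memory buffer and then hands it to the C library in one fwrite call. The names membuf, membuf_append, encode_samples and write_block are made up for this example and are not io_lib functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not an io_lib type. */
typedef struct {
    unsigned char *data;
    size_t size;   /* bytes used */
    size_t alloc;  /* bytes allocated */
} membuf;

/* Append n bytes, growing the buffer as needed. Returns 0 on success. */
static int membuf_append(membuf *b, const void *src, size_t n) {
    assert(b != NULL);
    if (b->size + n > b->alloc) {
        size_t want = b->alloc ? b->alloc * 2 : 256;
        while (want < b->size + n)
            want *= 2;
        unsigned char *p = realloc(b->data, want);
        if (!p)
            return -1;
        b->data = p;
        b->alloc = want;
    }
    memcpy(b->data + b->size, src, n);
    b->size += n;
    return 0;
}

/* Encode samples as big-endian 16-bit values, entirely in memory. */
static int encode_samples(membuf *b, const uint16_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned char be[2] = {
            (unsigned char)(s[i] >> 8),
            (unsigned char)(s[i] & 0xff)
        };
        if (membuf_append(b, be, 2) != 0)
            return -1;
    }
    return 0;
}

/* Write the whole encoded block with a single fwrite call. */
static int write_block(FILE *fp, const membuf *b) {
    return fwrite(b->data, 1, b->size, fp) == b->size ? 0 : -1;
}
```

A caller would start from a zeroed membuf, call encode_samples once per data section, then call write_block once and free the data pointer.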

The preliminary change notes are (poorly formatted, I know...):

* ***INCOMPATIBILITIES*** with 1.8.12

  - The Exp_info structure now internally contains an "mFILE *" member
    instead of "FILE *" member. If you use the experiment file functions
    for I/O then hopefully it'll still work. However if you directly
    manipulated the Exp_info yourself using fprintf etc then you will
    need to modify your code.

  - Some functions no longer have external scope. Most of these did not
    previously have external function prototypes. If you have a burning
    need to use one of these, please contact me directly via sourceforge.
    The full list is:

      ctfType (global variable)            ztr_encode_samples_C
      replace_nl                           ztr_encode_samples_G
      ctfDecorrelate                       ztr_encode_samples_T
      exp_print_line_                      ztr_decode_samples
      find_file_tar                        ztr_encode_bases
      find_file_archive                    ztr_decode_bases
      find_file_url                        ztr_encode_positions
      ztr_write_header                     ztr_decode_positions
      ztr_write_chunk                      ztr_encode_confidence_1
      ztr_read_header                      ztr_decode_confidence_1
      ztr_read_chunk_hdr                   ztr_encode_confidence_4
      compress_chunk                       ztr_decode_confidence_4
      uncompress_chunk                     ztr_encode_text
      ztr_encode_samples_4                 ztr_decode_text
      ztr_decode_samples_4                 ztr_encode_clips
      ztr_encode_samples_common            ztr_decode_clips
      ztr_encode_samples_A

  - Some external functions have changed prototypes to use mFILE instead
    of FILE. In most of these cases I've put in place a wrapper function
    with the old name, but not yet in all. The functions changed are:

      ctfFRead                             write_scf_samples32
      ctfFWrite                            write_scf_base
      exp_print_line                       write_scf_bases
      exp_print_mline                      write_scf_bases3
      exp_print_seq                        write_scf_comment
      read_scf_header                      fcompress_file
      read_scf_sample1                     fopen_compressed
      read_scf_samples1                    freopen_compressed
      read_scf_samples31                   be_write_int_1
      read_scf_sample2                     be_write_int_2
      read_scf_samples2                    be_write_int_4
      read_scf_samples32                   be_read_int_1
      read_scf_base                        be_read_int_2
      read_scf_bases                       be_read_int_4
      read_scf_bases3                      le_write_int_1
      read_scf_comment                     le_write_int_2
      write_scf_header                     le_write_int_4
      write_scf_sample1                    le_read_int_1
      write_scf_samples1                   le_read_int_2
      write_scf_samples31                  le_read_int_4
      write_scf_samples2                   fdetermine_trace_type

  - Removed support for the OLD unix "pack" program as a valid trace
    compression algorithm.

  - Removed CORBA support. (It wasn't enabled and I've no idea if it
    even worked as I cannot test it.)

  - The default search order for RAWDATA now has the current working
    directory at the end of RAWDATA instead of the start.

* Significant speed ups, particularly when dealing with reading
  gzipped files or when extracting data from tar files.

* New external functions for faster access via mFILE (memory-file)
  structs. These mimic the fread/fwrite calls, but with mfread/mfwrite
  etc.
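
For anyone unfamiliar with the memory-file idea, here is a toy version. It is an assumption-laden sketch and not io_lib's mFILE: the memfile struct and the memfile_write, memfile_read and memfile_rewind functions are hypothetical names, and only the argument order is borrowed from fread/fwrite.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file; not io_lib's mFILE layout. */
typedef struct {
    char *data;
    size_t size;    /* bytes of valid data */
    size_t alloc;   /* bytes allocated */
    size_t offset;  /* current read/write position */
} memfile;

/* Like fwrite: returns the number of items written (0 on failure).
 * Overflow of size * nmemb is not checked in this sketch. */
static size_t memfile_write(const void *ptr, size_t size, size_t nmemb,
                            memfile *mf) {
    size_t len = size * nmemb;
    assert(mf != NULL);
    if (len == 0)
        return 0;
    if (mf->offset + len > mf->alloc) {
        size_t want = mf->alloc ? mf->alloc : 256;
        while (want < mf->offset + len)
            want *= 2;
        char *p = realloc(mf->data, want);
        if (!p)
            return 0;
        mf->data = p;
        mf->alloc = want;
    }
    memcpy(mf->data + mf->offset, ptr, len);
    mf->offset += len;
    if (mf->offset > mf->size)
        mf->size = mf->offset;
    return nmemb;
}

/* Like fread: returns the number of complete items read. */
static size_t memfile_read(void *ptr, size_t size, size_t nmemb,
                           memfile *mf) {
    assert(mf != NULL);
    if (size == 0 || nmemb == 0)
        return 0;
    size_t items = (mf->size - mf->offset) / size;
    if (items > nmemb)
        items = nmemb;
    if (items == 0)
        return 0;
    memcpy(ptr, mf->data + mf->offset, items * size);
    mf->offset += items * size;
    return items;
}

/* Like rewind: move the position back to the start. */
static void memfile_rewind(memfile *mf) {
    mf->offset = 0;
}
```

Because the calls have the same shape as the stdio ones, porting code is mostly a matter of swapping the function names and the handle type.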

* Numerous minor tweaks and updates to fix compiler warnings in the
  stricter modes of the Intel C Compiler.

* Preliminary support for storing pyrosequencing style traces. This
  has been modeled on the flowgram data from 454, but should be
  applicable to other platforms. ZTR has been updated to incorporate
  this too.

  The Read structure also has flow, flow_order, nflows and flow_raw
  elements. Code to convert these into the more usual traceA/C/G/T
  arrays exists currently as part of Trev (in tk_utils in the Staden
  Package), but this may move into io_lib for the next official release.

* New hash_tar and hash_extract programs. These replace the index_tar
  program for fast random access. For RAWDATA include "HASH=hashfile"
  as an element to get io_lib to use the archive hash. It's possible
  to create hash files of most archive formats as the hash itself
  contains the offset and size of each item in the archive. This means
  that extracting an item does not need to know the format of the
  original archive.
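
A rough sketch of why the archive format stops mattering once you have the offset and size. This is not the hash_tar file format or its lookup code: index_entry, index_find and extract_item are hypothetical names, and a linear search stands in for the real hash lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index record: where an item lives inside the archive. */
typedef struct {
    const char *name;
    long offset;   /* byte offset of the item's data in the archive */
    size_t size;   /* length of the item's data in bytes */
} index_entry;

/* Linear search stands in for the hash lookup in this sketch. */
static const index_entry *index_find(const index_entry *idx, size_t n,
                                     const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(idx[i].name, name) == 0)
            return &idx[i];
    return NULL;
}

/* Seek to the item and read it; no knowledge of tar (or any other
 * container format) is needed. Caller frees the returned buffer. */
static unsigned char *extract_item(FILE *archive, const index_entry *e) {
    assert(archive != NULL && e != NULL);
    unsigned char *buf = malloc(e->size ? e->size : 1);
    if (!buf)
        return NULL;
    if (fseek(archive, e->offset, SEEK_SET) != 0 ||
        fread(buf, 1, e->size, archive) != e->size) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The cost per item is one lookup, one seek and one read, regardless of how many files the archive holds.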

  Some benchmarks show that on ext3 it's actually faster to extract
  files from the hash than directly via the directory. This was
  tested with ~200,000 files, at which point directory lookups become
  slow. I'd imagine ReiserFS or similar to be faster.

* Added an XRLE encoding for ZTR. This is similar to the existing RLE
  mechanism but it copes with run length encoding of items larger than
  a single byte. Its current use is for storing the 4-base repeating
  flow order in 454 data.
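
To show what run length encoding of multi-byte items means in practice, here is a toy encoder and decoder. Note that this is NOT the ZTR XRLE byte layout: the (count, item) output format, the 255-item run limit and the rle_items_encode and rle_items_decode names are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode runs of identical `width`-byte items as (count, item) pairs,
 * with count stored in one byte (1..255). Returns the number of bytes
 * written, or 0 on bad input or insufficient output space. */
static size_t rle_items_encode(const unsigned char *in, size_t len,
                               size_t width,
                               unsigned char *out, size_t out_cap) {
    assert(in != NULL && out != NULL);
    if (width == 0 || len % width != 0)
        return 0;
    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        size_t run = 1;
        /* Extend the run while the next whole item matches the first. */
        while (run < 255 && i + (run + 1) * width <= len &&
               memcmp(in + i, in + i + run * width, width) == 0)
            run++;
        if (o + 1 + width > out_cap)
            return 0;
        out[o++] = (unsigned char)run;
        memcpy(out + o, in + i, width);
        o += width;
        i += run * width;
    }
    return o;
}

/* Expand (count, item) pairs back into the original byte stream.
 * Returns the number of bytes written, or 0 on error. */
static size_t rle_items_decode(const unsigned char *in, size_t len,
                               size_t width,
                               unsigned char *out, size_t out_cap) {
    assert(in != NULL && out != NULL);
    if (width == 0)
        return 0;
    size_t o = 0;
    for (size_t i = 0; i < len; i += 1 + width) {
        if (i + 1 + width > len)
            return 0;
        size_t run = in[i];
        if (o + run * width > out_cap)
            return 0;
        for (size_t r = 0; r < run; r++) {
            memcpy(out + o, in + i + 1, width);
            o += width;
        }
    }
    return o;
}
```

With a width of 4, the 12 bytes "TACGTACGTACG" (an example flow order) collapse to a single 5-byte pair, whereas with a width of 1 the same input has no repeated bytes to merge and grows to 24 bytes.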

James