Error reading Sanger traces from TAR files

  • Sven Klages

    Sven Klages - 2010-09-08


    We compiled staden beta 6/7 with io_lib 1.12.2 or 1.12.4 and ran into the following problem:

    We store our ABI traces in tarballs, which are indexed using

    hash_tar -b -v myFile.tar > myFile.tar.hash

    These hash files are then added to the RAWDATA environment variable, e.g.

    export RAWDATA="HASH=/path/to/tarballs/myFile.tar.hash"

    We constantly get an error when trying to open trace files in gap4:

    'trace_file.scf': couldn't open

    Using single trace files and setting RAWDATA accordingly works like a charm. There must be a problem with finding/reading/extracting the TAR files?

    Funny thing is, it worked with our old production version 1.7 … but everyone is eager to use a slightly newer version ;-)

    I have tested this behaviour on our in-house-built Linux and on an Ubuntu 10.04 box (32-bit).

    Any idea *where* to look or what went wrong? Is any information missing? Just let me know.


  • James Bonfield

    James Bonfield - 2010-09-08

    Try the TRACE_PATH environment variable instead. I assumed RAWDATA still worked, but perhaps not. (Apologies if so.)

    I split RAWDATA into TRACE_PATH and EXP_PATH as internally we had different remote sources for fetching these.

  • Sven Klages

    Sven Klages - 2010-09-08

    OK, we think we have tracked the problem down to the function 'HashFileExtract' in io_lib 1.12.4.
    If we go back to the command line:

    hash_tar myFile.tar > myFile.tar.hash
    hash_extract myFile.tar.hash MY_READ

    where MY_READ *does* exist in the tar file.

    We get no output from the above command, and $? is "1".

    That's probably why we get an error from within the contig editor when we try opening a trace file.
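
    A minimal sketch of the command-line check described above, applied to several reads at once. It assumes hash_extract is on the PATH and is called as shown in this post (index file, then read name). The function name check_reads and the HASH_EXTRACT override variable are hypothetical helpers introduced here for illustration; they are not part of io_lib.

```shell
#!/bin/sh
# Sketch: for each read name, run hash_extract against one index and
# report the exit status and the number of bytes written to stdout.
# HASH_EXTRACT is a hypothetical override (e.g. to point at another build).
check_reads() {
    index=$1
    shift
    for read_name in "$@"; do
        tmp=$(mktemp) || return 1
        # Capture the exit status without aborting on failure.
        if "${HASH_EXTRACT:-hash_extract}" "$index" "$read_name" > "$tmp" 2>/dev/null; then
            status=0
        else
            status=$?
        fi
        size=$(wc -c < "$tmp" | tr -d ' ')
        echo "$read_name status=$status bytes=$size"
        rm -f "$tmp"
    done
}
```

    For example, `check_reads myFile.tar.hash MY_READ` prints one line per read. A non-zero status or a byte count of 0 reproduces the symptom reported above without involving gap4.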

    It also seems that RAWDATA/TRACE_PATH is not placed in "Trace File Location" in the gap4 GUI?

    Any ideas what's going wrong? Are you using hash'ed tar files?


  • Sven Klages

    Sven Klages - 2010-09-08

    Hmm, OK, hash_extract works. No idea what happened before. So we still have the problem that gap4 doesn't want to read TAR archives (or whatever the reason is that the traces are not loaded/displayed) :-(
    I have now set TRACE_PATH to contain HASH=/path/to/hashfile.

  • Sven Klages

    Sven Klages - 2010-09-08

    Ok, things are getting strange :-(

    No idea why gap4 fails to read traces from TAR files.

    Now, concerning SFF data, hybrid assembly, created with MIRA, converted with caf2gap.

    Indices are created via

    hash_sff -o index.hash *.sff

    In one contig I try to open two sequences,

    GF9QO5102GYUN9 which pops up in trace display and
    GGOB55U02FPZPG which doesn't show up at all. No error in the terminal; it seems that the trace display shows up and vanishes immediately.

    hash_extract index.hash GF9QO5102GYUN9 > a
    hash_extract index.hash GGOB55U02FPZPG > b

    I can open "a" with trev, "b" is corrupted : "Unable to load b with format Any".
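
    One quick way to compare the two extracted files is to look at their first four bytes. The sketch below maps them to the magic numbers of common trace formats: ".sff" for SFF, ".scf" for SCF, "ABIF" for ABI, and the byte 0xAE followed by "ZTR" for ZTR. The function name sniff_trace is hypothetical, and exactly what hash_extract writes for a read taken from an SFF index is an assumption that was not confirmed in this thread, so treat the result as a hint only.

```shell
#!/bin/sh
# Sketch: print a guess at the trace format of a file from its first 4 bytes.
sniff_trace() {
    file=$1
    # Hex-dump the first four bytes and strip whitespace, e.g. "2e736666".
    magic=$(od -An -tx1 -N4 "$file" | tr -d ' \t\n')
    case $magic in
        2e736666) echo "SFF" ;;      # ".sff"
        2e736366) echo "SCF" ;;      # ".scf"
        41424946) echo "ABI" ;;      # "ABIF"
        ae5a5452) echo "ZTR" ;;      # 0xAE "ZTR"
        "")       echo "empty" ;;
        *)        echo "unknown ($magic)" ;;
    esac
}
```

    Running `sniff_trace a` and `sniff_trace b` side by side would show whether "b" is empty, truncated to garbage, or starts like a valid trace and is damaged further in.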

    Both reads can be extracted from the SFF file using Roche's SFF Tools.

    My system here: 64-bit Linux, local fs (xfs), kernel 2.6.35, glibc 2.6.1

    I am kind of stuck :-(

    Any idea what to think about / to do ..? I am quite sure it's a nasty little thing I am not thinking about …


  • James Bonfield

    James Bonfield - 2010-09-10

    It's starting to sound more like a bug than simply "user error" to me. If hash_extract is extracting duff data then either the sff file itself is corrupted in some manner, the indexing of it has failed, or hash_extract has decoded the index incorrectly. None of these seem particularly likely, but something has to be wrong. I believe 454 have their own SFF tools for dumping out the contents so perhaps it's possible to query that way. Also you could try (on the unhashed original sff) "trev blah.sff/GGOB55U02FPZPG" to see if it can figure out how to extract the data without using the hash index. This should work, but it's a slower process. Setting TRACE_PATH to SFF=blah.sff has the same effect.

    Is any of this data sensitive? Even if I could look at the raw contents of b (feel free to email me via sourceforge or direct jkb -at- sanger ac uk if permitted), then maybe I could get an inkling as to the nature of the problem.

