#253 Gatekeeper ILL Errors

gatekeeper
closed-fixed
Brian Walenz
5
2015-02-03
2013-08-21
JeroenF@lumc
No

Hello,

I am running pacBioToCA to correct reads using an Illumina HiSeq dataset.
Unfortunately, I have run into some trouble. The gatekeeper error log ("asm.gkpStore.err") contains the following error messages:

Processing SINGLE-ENDED SANGER QV encoding reads from:
<pacbio_fastq_file>
GKP finished with 578612 alerts or errors:
540303 # ILL Error: not a sequence start line.
38309 # ILL Error: not a quality start line.

The only information I can find about this error is a fairly recent post on SEQanswers: http://seqanswers.com/forums/showthread.php?t=24916
Here the original poster states that "The ILL errors are thrown because of read lengths above 2047 bps. A bug that's supposed to be fixed since wgs-7.0."

I (re)installed the latest stable Celera release on our computer cluster, but this did not solve the problem. Gatekeeper does not crash, and the assembly continues after gatekeeper is done, but I am concerned about this error. Does it imply that any (PacBio) read longer than ~2 kb will not be loaded? How can I solve this issue?

Any help is greatly appreciated.
Thanks,

Jeroen

Discussion

  • Brian Walenz
    2013-08-21

    • assigned_to: Brian Walenz
     
  • Brian Walenz
    2013-08-21

    Is your file of reads fastq? E.g., four lines repeating:

    @read_name
    [sequence]
    +
    [quality values]

    In particular, sequence and quality values must be on one line each.
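
The four-line layout described above can be checked before loading the reads. The following is only a minimal sketch, not part of Celera Assembler: the function name `check_fastq` is hypothetical, and it assumes a strict four-line FASTQ with no blank lines. A record whose sequence is wrapped over several lines will show up as a bad separator or a length mismatch.

```python
import io


def check_fastq(handle):
    """Return (line_number, message) tuples for a strict four-line FASTQ stream."""
    problems = []
    line_no = 0  # number of lines consumed by complete records so far
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            problems.append((line_no + 1, "truncated record"))
            break
        if not header.startswith("@"):
            problems.append((line_no + 1, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((line_no + 3, "separator does not start with '+'"))
        if len(seq.rstrip("\r\n")) != len(qual.rstrip("\r\n")):
            problems.append((line_no + 4, "sequence and quality lengths differ"))
        line_no += 4
    return problems


# Usage: a well-formed record versus one with the sequence wrapped over two lines.
good = io.StringIO("@read_1\nACGT\n+\nIIII\n")
wrapped = io.StringIO("@read_1\nACGT\nACGT\n+\nIIIIIIII\n")
print(check_fastq(good))
print(check_fastq(wrapped))
```

For a real file, pass an open text handle instead of the `io.StringIO` objects, e.g. `open("reads.fastq")` (the file name is a placeholder).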

    If you're loading reads longer than 2k, change AS_READ_MAX_NORMAL_LEN_BITS from 11 to 15 in AS_global.H. Otherwise, you'll get truncated reads, and gatekeeper should error out.
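
To see why the change from 11 to 15 matters, here is a rough sketch of the arithmetic. It assumes the read-length field holds at most 2^bits - 1 bases, which matches the 2047 bp limit quoted in the ticket; the exact definition in AS_global.H should be checked in the source. The helper names `max_read_length` and `count_long_reads` are hypothetical, and the counter assumes a strict four-line FASTQ.

```python
import io


def max_read_length(bits):
    """Largest value that fits in an unsigned field of the given bit width."""
    return (1 << bits) - 1


def count_long_reads(handle, limit):
    """Count reads whose sequence line (2nd line of each 4-line record) exceeds limit."""
    count = 0
    for i, line in enumerate(handle):
        if i % 4 == 1 and len(line.rstrip("\r\n")) > limit:
            count += 1
    return count


print(max_read_length(11))  # 2047
print(max_read_length(15))  # 32767

# Usage: one 2048 bp read is too long for 11 bits but fits in 15 bits.
record = "@read_1\n" + "A" * 2048 + "\n+\n" + "I" * 2048 + "\n"
print(count_long_reads(io.StringIO(record), max_read_length(11)))  # 1
print(count_long_reads(io.StringIO(record), max_read_length(15)))  # 0
```

Running the counter on the PacBio FASTQ with the 11-bit limit gives a quick estimate of how many reads a default build could not hold at full length.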

    You can run 'gatekeeper -dumpinfo gkpStore' to see which reads were actually loaded. The same info should be in gkpStore.info for recent checkouts.

     
  • JeroenF@lumc
    2013-08-22

    Hi Brian, Sergey,

    Thank you for your quick, detailed replies.
    I performed a fresh install of Celera, compiling the stable release from source.
    I altered the AS_READ_MAX_NORMAL_LEN_BITS parameter from 11 to 15.
    I also checked the format of the FASTQ file, which appeared to be correct.

    I've started a new pacBioToCA process, and gatekeeper just finished gracefully without any errors. It seems the problem is solved!

    Thanks for your help.
    Best,

    Jeroen

     
    Last edit: JeroenF@lumc 2013-08-22
  • Brian Walenz
    2015-02-03

    • status: open --> closed-fixed
     
  • Brian Walenz
    2015-02-03

    Closing old resolved ticket.