#290 FASTQ quality line too long in read

Milestone: gatekeeper
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-02-02
Created: 2014-12-30
Creator: jeff
Private: No

I've been running PBcR in the wgs8.2 beta release without a problem, but I'm encountering this issue with the full 8.2 release. When running PBcR on PacBio-only data, using the java jar to add simulated quality scores to my PacBio FASTA files, I get an error stating that the line is too long in a read. Examining the read lengths and quality-score lengths with awk etc. does not show any discrepancies. Running PBcR with the beta binaries works just fine. Any thoughts on where to make changes to address this? Thanks
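
The length check described above can also be done outside awk. The following is a minimal sketch, not part of PBcR or gatekeeper: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the function name `summarize_fastq` is hypothetical. It reports reads whose sequence and quality lengths differ, plus the longest read seen.

```python
import io


def summarize_fastq(handle):
    """Scan 4-line FASTQ records from an open text handle.

    Returns (mismatches, longest), where mismatches is a list of
    (read_name, seq_len, qual_len) for records whose lengths differ,
    and longest is (read_name, seq_len) for the longest read seen.
    """
    mismatches = []
    longest = ("", 0)
    while True:
        header = handle.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate stray blank lines
            continue
        name = header.strip()[1:]      # drop the leading '@'
        seq = handle.readline().strip()
        handle.readline()              # '+' separator line, ignored
        qual = handle.readline().strip()
        if len(seq) != len(qual):
            mismatches.append((name, len(seq), len(qual)))
        if len(seq) > longest[1]:
            longest = (name, len(seq))
    return mismatches, longest


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use open("reads.fastq").
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIII\n")
    print(summarize_fastq(demo))
```

A clean mismatch list combined with a very large longest-read length would point at a length limit rather than a malformed record.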

Discussion

  • Sergey Koren

    Sergey Koren - 2014-12-30

    When you say you get an error about the line being too long, is that coming from the java jar you are using to add the quality values or from the PBcR pipeline itself? Both 8.2 and 8.2beta should support sequences up to 65Kbp. If you can share your fastq file we can try to reproduce the error.

     
    • jeff

      jeff - 2014-12-30

      Hey Sergey, thanks for the quick reply, and sorry I did not clarify. The error is generated by the PBcR pipeline itself. The jar seems fine. As a quick test, I ran the wgs8.2beta binaries on the same output fastq file, and they do not have the same problem. What is the best way to share the fastq (~2 GB) with you to take a look?

      I'm experimenting to see if it's a bug with header formatting between the beta and full version and will report back what I see.

       
  • Sergey Koren

    Sergey Koren - 2014-12-30

    You can post the file to my FTP site:
    ftp://ftp.cbcb.umd.edu/incoming/sergek

    You won't be able to see the file once it is uploaded but I will have access to it.

     
    • jeff

      jeff - 2014-12-30

      Great, uploading now. Thanks so much for your help, Sergey, on this issue and all of the other advice RE server hardware etc. I appreciate it. I accidentally also sent a dropbox link but you can disregard that email, the gzipped file should finish transferring in a moment.

       
  • Brian Walenz

    Brian Walenz - 2014-12-30

    I suspect this will be gatekeeper, and be this message:

    FASTQ sequence line too long in read '%s'

    If so, increasing BASE_MAX_LEN near the top of AS_GKP/AS_GKP_illumina.C will solve the problem. The read should then be truncated to the 65534 bp maximum length.

     
    • jeff

      jeff - 2014-12-30

      Sounds good. I have several reads that exceed this length, so I'll trim and try again. I actually have one strange monster read (>200 kb), which I imagine would interfere with things. I'll report my results with the cleaned-up data when they are ready. Out of curiosity, was this limitation not present in the beta? The beta runs just fine. Thanks
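
One way to do the trimming mentioned above is sketched below. This is an illustration only, not a tool shipped with wgs-assembler: the function name `trim_fastq` and its parameters are hypothetical, it assumes four-line FASTQ records, and the 65534 default is simply the maximum read length quoted in this thread.

```python
import io

MAX_READ_LEN = 65534  # maximum read length quoted in this thread


def trim_fastq(src, dst, max_len=MAX_READ_LEN, drop_above=None):
    """Copy 4-line FASTQ records from src to dst.

    Reads longer than max_len are truncated (sequence and quality
    together). If drop_above is given, reads longer than that are
    skipped entirely. Returns (kept, truncated, dropped).
    """
    kept = truncated = dropped = 0
    while True:
        header = src.readline()
        if not header:          # end of file
            break
        if not header.strip():  # tolerate stray blank lines
            continue
        seq = src.readline().strip()
        plus = src.readline().strip()
        qual = src.readline().strip()
        if drop_above is not None and len(seq) > drop_above:
            dropped += 1
            continue
        if len(seq) > max_len:
            seq, qual = seq[:max_len], qual[:max_len]
            truncated += 1
        dst.write(f"{header.strip()}\n{seq}\n{plus}\n{qual}\n")
        kept += 1
    return kept, truncated, dropped


if __name__ == "__main__":
    # Small in-memory demonstration with an artificially low limit.
    src = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n")
    dst = io.StringIO()
    print(trim_fastq(src, dst, max_len=4), dst.getvalue())
```

Passing something like `drop_above=200000` would discard a single spurious giant read outright rather than keeping its first 65534 bases.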

       

      Last edit: jeff 2014-12-30
  • jeff

    jeff - 2014-12-31

    Thanks for the help, Sergey and Brian. It appears the spurious giant read was causing the gatekeeper error.

     
  • Brian Walenz

    Brian Walenz - 2015-02-02
    • status: open --> closed-fixed
     
  • Brian Walenz

    Brian Walenz - 2015-02-02

    I increased the limit to 16 million. Reads longer than this will still cause gatekeeper to fail. Reads longer than the assembler max (64k by default) will be truncated.
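
The behaviour described in the comment above can be summarised as a small decision rule. This sketch is not gatekeeper's code: the constant names are hypothetical, "16 million" is taken literally as 16,000,000 (the exact value in the source is not stated here), the assembler maximum uses the 65534 bp figure quoted earlier in the thread, and treating a read of exactly the limit as acceptable is an assumption.

```python
GKP_LINE_LIMIT = 16_000_000  # assumption: "16 million" taken literally
ASSEMBLER_MAX_LEN = 65534    # maximum read length quoted in this thread


def classify_read(length,
                  line_limit=GKP_LINE_LIMIT,
                  assembler_max=ASSEMBLER_MAX_LEN):
    """Return what is expected to happen to a read of the given length.

    "fail"     - longer than the gatekeeper line limit
    "truncate" - loadable, but longer than the assembler maximum
    "keep"     - within both limits
    """
    if length > line_limit:
        return "fail"
    if length > assembler_max:
        return "truncate"
    return "keep"


if __name__ == "__main__":
    for n in (10_000, 200_000, 20_000_000):
        print(n, classify_read(n))
```

Under these assumptions, the >200 kb read from this ticket would now load and be truncated instead of stopping gatekeeper.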

     
