FASTQ quality line too long in read

Brought to you by: brianwalenz, jasonmiller9704, mcschatz, skoren

#290 FASTQ quality line too long in read

Milestone: gatekeeper

Status: closed-fixed

Owner: nobody

Labels: None

Priority: 5

Updated: 2015-02-02

Created: 2014-12-30

Creator: jeff

Private: No

I've been running PBcR in the wgs8.2 beta release without a problem but am encountering this issue with the full 8.2 release. When running PBcR on PacBio-only data, using the java jar to add simulated quality scores to my PacBio FASTA files, I get an error stating the line is too long in a read. Examining the read lengths and quality score lengths with awk etc. does not show any discrepancies. Running PBcR with the beta version of the binary works just fine. Any thoughts on where to make changes to address this? Thanks
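The length check described above can be sketched as follows. This is a minimal illustration, not part of PBcR or gatekeeper: it assumes four-line FASTQ records (no wrapped sequence or quality lines), and the function name `check_fastq` is hypothetical.

```python
import io

def check_fastq(handle):
    """Scan a four-line-per-record FASTQ stream (assumed layout).

    Returns (mismatches, longest):
      mismatches: list of (read_name, seq_len, qual_len) where lengths differ
      longest:    (read_name, seq_len) of the longest read, or None if empty
    """
    mismatches = []
    longest = None
    while True:
        header = handle.readline()
        if not header:
            break                      # end of input
        if not header.strip():
            continue                   # skip stray blank lines
        seq = handle.readline().rstrip("\r\n")
        handle.readline()              # '+' separator line, ignored
        qual = handle.readline().rstrip("\r\n")
        name = header.rstrip("\r\n")[1:]   # drop the leading '@'
        if len(seq) != len(qual):
            mismatches.append((name, len(seq), len(qual)))
        if longest is None or len(seq) > longest[1]:
            longest = (name, len(seq))
    return mismatches, longest

# Small in-memory example; a real file would be passed as open(path).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIII\n")
mismatches, longest = check_fastq(example)
print(mismatches)  # [('r2', 6, 4)]
print(longest)     # ('r2', 6)
```

Reporting the longest read alongside any mismatches helps because a single very long read can trigger a length limit even when every sequence and quality line agree.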

Discussion

Sergey Koren - 2014-12-30

When you say you get an error about the line being too long, is that coming from the java jar you are using to add the quality values or from the PBcR pipeline itself? Both 8.2 and 8.2beta should support sequences up to 65Kbp. If you can share your fastq file we can try to reproduce the error.

- jeff - 2014-12-30
  
  Hey Sergey, thanks for the quick reply and sorry I did not clarify. The error is generated from the PBcR pipeline itself. The jar seems fine. As a quick test, I ran the wgs8.2beta binaries on the same output fastq file and it does not have the same problem. What is the best way to share the fastq with you to take a look? (~2 GB)
  
  I'm experimenting to see if it's a bug with header formatting between the beta and full version and will report back what I see.
  

Sergey Koren - 2014-12-30

You can post the file to my FTP site:
ftp://ftp.cbcb.umd.edu/incoming/sergek

You won't be able to see the file once it is uploaded but I will have access to it.

- jeff - 2014-12-30
  
  Great, uploading now. Thanks so much for your help, Sergey, on this issue and all of the other advice RE server hardware etc. I appreciate it. I accidentally also sent a dropbox link but you can disregard that email, the gzipped file should finish transferring in a moment.
  

Brian Walenz - 2014-12-30

I suspect this is coming from gatekeeper, and is this message:

FASTQ sequence line too long in read '%s'

If so, changing BASE_MAX_LEN near the top of AS_GKP/AS_GKP_illumina.C will solve the problem. The read should be truncated to the 65534 bp maximum length.
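For anyone who would rather not recompile, over-long reads could also be clipped before they reach the pipeline. The sketch below is only an illustration and is not the gatekeeper code: it assumes four-line FASTQ records, takes the 65534 bp figure from the comment above, and uses a hypothetical function name, `truncate_fastq`.

```python
import io

MAX_READ_LEN = 65534  # maximum read length quoted in the comment above

def truncate_fastq(src, dst, max_len=MAX_READ_LEN):
    """Copy four-line FASTQ records (assumed layout) from src to dst,
    clipping sequence and quality lines to max_len characters.

    Returns the number of reads that were truncated.
    """
    truncated = 0
    while True:
        header = src.readline()
        if not header:
            break                      # end of input
        if not header.strip():
            continue                   # skip stray blank lines
        seq = src.readline().rstrip("\r\n")
        plus = src.readline().rstrip("\r\n")
        qual = src.readline().rstrip("\r\n")
        if len(seq) > max_len:
            truncated += 1
        dst.write(header.rstrip("\r\n") + "\n")
        dst.write(seq[:max_len] + "\n")
        dst.write(plus + "\n")
        dst.write(qual[:max_len] + "\n")   # keep quality the same length as sequence
    return truncated

# In-memory example with a tiny limit so the effect is visible.
src = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nAC\n+\nII\n")
dst = io.StringIO()
count = truncate_fastq(src, dst, max_len=4)
print(count)            # 1
print(dst.getvalue())   # r1 clipped to ACGT / IIII, r2 unchanged
```

With real data, `src` and `dst` would be file objects opened for reading and writing. Clipping keeps the first `max_len` bases; discarding the read entirely is another option when a read is an obvious outlier.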

- jeff - 2014-12-30
  
  Sounds good. I have several reads which exceed this length, so I'll trim and try again. I actually have one strange monster read (>200 kb) which I imagine would interfere with things. I'll report my results with the cleaned-up data when they are ready. Out of curiosity, was this limitation not present in the beta? The beta runs just fine. Thanks
  
  Last edit: jeff 2014-12-30
  

jeff - 2014-12-31

Thanks for the help, Sergey and Brian. It appears the spurious giant read was causing the gatekeeper error.


Brian Walenz - 2015-02-02

status: open --> closed-fixed

Brian Walenz - 2015-02-02

I increased the limit to 16 million. Reads longer than this will still cause gatekeeper to fail. Reads longer than the assembler max (64k by default) will be truncated.

