I will check in with the lab, but my understanding is that these came from a NextSeq or NovaSeq and didn't have any modifications. Thanks for the quick response.

I took a peek at FASTQ.java and saw the following code block:

    // Here we try to weed out PacBio, which will differ after the last slash:
    for (int i = idxSlash1 + 2; i < len1; i++) {
        if (id1.charAt(i) != id2.charAt(i)) {
            return false;
        }
    }

I am using reformat.sh to do the following:
- make sure reads are paired
- count the number of reads/bases...
I decided to subset my FASTQ to a single read so the files are small enough to demonstrate the issue:
- bad_R* -> this FASTQ pair is the read as shown in the post above. This fails with vpair enabled.
- bad-no-desc_R* -> this FASTQ pair is the same read with the optional description (the text after the space) trimmed. This succeeds with vpair enabled.
- bad-no-trail_R* -> this FASTQ pair is the same read except that the /1 and /2 have been removed from the sequence identifier. This succeeds...
reformat.sh vpair fails matching reads.
Clarifying that I missed cq=f. This can be closed.
`reformat.sh` mode that does not change base quality scores
Another alternative for me might be to run `gzip -dct ${FASTQ_PATH}` on each gzipped FASTQ I'd like to analyze. This will catch gzip corruption. However, it might be useful to propagate errors like these directly through the BBMap suite.
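For reference, the pre-check described above could look roughly like the sketch below. It is only an illustration of the workaround, not anything BBMap provides: `check_gzip` is a hypothetical helper name, and it uses plain `gzip -t`, which tests integrity without writing the decompressed data anywhere.

```shell
#!/usr/bin/env bash
# check_gzip (hypothetical helper, not part of BBMap):
# test every gzipped FASTQ passed as an argument and return non-zero
# if at least one of them fails the gzip integrity check.
check_gzip() {
    local status=0
    local fq
    for fq in "$@"; do
        # gzip -t reads the whole stream and verifies it, producing no output.
        if gzip -t "$fq" 2>/dev/null; then
            echo "OK      $fq"
        else
            echo "CORRUPT $fq" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_gzip` on both files of a pair and only starting reformat.sh when it returns 0 makes the corruption visible as an exit code to whatever script or workflow manager is driving the run.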
Propagation of internal error codes