#107 Verify paired-end sequence names and mate numbers

Next Release
open
nobody
None
5
2015-04-30
2015-04-30
Jake
No

I recently stumbled upon this particularly important piece of information in the bowtie2 manual.

"The first mate in the file for mate 1 forms a pair with the first mate in the file for mate 2, the second with the second, and so on."

I had been using unsorted paired-end files, although paired-end alignment wasn't particularly important to me at the time. In addition, my group had been given poorly laned fastq files: some of the reads in the mate 1 file were actually tagged with /2, and vice versa. Garbage in, garbage out.

While it would be awesome if bowtie2 did not require the two mate files to be in the same order, I imagine that matching reads by name would eat up a substantial amount of memory and time. Now that I know, I can certainly sort the files myself from now on.

My feature requests are simply that, as each read is parsed, bowtie2 should ...

  • Verify that the mate number in the sequence name (/1 or /2) matches the mate file the read came from.
  • Verify that the two reads in a pair have the same sequence name, ignoring the mate number.

... and throw an error, or just a warning, saying so if they do not.
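
To make the request concrete, here is a rough sketch of the two checks as a standalone Python script. It is not how bowtie2 works internally, only an illustration. It assumes strict 4-line FASTQ records and the older /1 and /2 name suffixes; the function names (read_names, split_mate, check_pairs) are made up for this example.

```python
import itertools


def read_names(lines):
    """Yield the sequence name from each record, assuming 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        line = line.rstrip("\r\n")
        if not line.startswith("@"):
            raise ValueError(f"record {i // 4 + 1}: bad header line {line!r}")
        # Drop the leading '@' and any description after the first whitespace.
        fields = line[1:].split()
        yield fields[0] if fields else ""


def split_mate(name):
    """Split 'read/1' into ('read', '1'); mate is None without a /1 or /2 suffix."""
    if len(name) > 2 and name[-2] == "/" and name[-1] in "12":
        return name[:-2], name[-1]
    return name, None


def check_pairs(mate1_lines, mate2_lines):
    """Walk both inputs in step and return a list of warning messages."""
    problems = []
    pairs = itertools.zip_longest(read_names(mate1_lines), read_names(mate2_lines))
    for n, (name1, name2) in enumerate(pairs, start=1):
        if name1 is None or name2 is None:
            problems.append(f"pair {n}: files have different numbers of reads")
            break
        base1, mate1 = split_mate(name1)
        base2, mate2 = split_mate(name2)
        # Check 1: the mate number must agree with the file the read is in.
        if mate1 not in (None, "1"):
            problems.append(f"pair {n}: {name1} found in mate 1 file")
        if mate2 not in (None, "2"):
            problems.append(f"pair {n}: {name2} found in mate 2 file")
        # Check 2: both reads must share a name once the mate number is removed.
        if base1 != base2:
            problems.append(f"pair {n}: names differ: {name1} vs {name2}")
    return problems
```

check_pairs takes any two iterables of lines, so two open file handles for the mate 1 and mate 2 files work directly. Because it only ever holds one pair of names at a time, a check like this should cost very little memory compared with re-pairing unsorted reads.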
