Menu

#304 Uniqueness miscomputed by Ns in Bowtie 1.0.0

v0.9.0
open
None
5
2015-01-16
2014-03-19
No

Hi Guys,

We are trying to use bowtie 1.0.0 to evaluate the uniqueness of the genome, by extracting sequences, mapping them back to the full genome with -m1, and checking to see if there is a unique alignment. Overall, this works extremely well, but there are a small number of cases where the sequence does not map back to where it came, especially when the sequence contains 'N's.

For example:

Extract sequences from chr21

$ cat weird.bed
chr21 9775338 9775439
chr21 9775338 9775437

$ fastaFromBed -fi /isilon/seq/schatz/tgarvin/genomes/human/hg19/hg19.fa -bed weird.bed -fo weird.fa

$ cat weird.fa

chr21:9775338-9775439
tataagtgagaagatgcagtgtttggttttttcttcctgcattagtttgctgaagatatcagctttgggttcatccatatccctgcaaagagcatgatcNN
chr21:9775338-9775437
tataagtgagaagatgcagtgtttggttttttcttcctgcattagtttgctgaagatatcagctttgggttcatccatatccctgcaaagagcatgatc

(Notice the second is 2 bp shorter excluding the Ns)

$ /bluearc/data/schatz/software/packages/bowtie-1.0.0/bowtie --chunkmbs=200 /seq/schatz/tgarvin/genomes/human/hg19/hg19.fa -f -S -t -m1 --best --strata -p4 weird.fa

...

chr21:9775338-9775437 4 * 0 0 * * 0 0 TATAAGTGAGAAGATGCAGTGTTTGGTTTTTTCTTCCTGCATTAGTTTGCTGAAGATATCAGCTTTGGGTTCATCCATATCCCTGCAAAGAGCATGATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:1
chr21:9775338-9775439 0 chr1 143384489 255 101M * 0 0 TATAAGTGAGAAGATGCAGTGTTTGGTTTTTTCTTCCTGCATTAGTTTGCTGAAGATATCAGCTTTGGGTTCATCCATATCCCTGCAAAGAGCATGATCNN IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:99T0C0 NM:i:2

Notice the read without Ns is correctly flagged as non-unique, but with Ns is flagged as unique with edit distance of 2 to chromosome 1!

For our application we know to look for this now, but seems like this could be a general problem.

Let us know if you need any more details or examples.

Thank you,

Mike

Related

Bugs: #304

Discussion

  • Val

    Val - 2014-03-19

    Hi Michael,

    I am investigating now another N related bug. So good to know about this issue as well. If there is not much trouble for you I could use one more example.

    thanks again,
    Val

     
    • Michael Schatz

      Michael Schatz - 2014-03-19

      Hi Val,

      Thanks for working on this. Here is another example where a sequence on
      chr21 is "uniquely" mapped to chr17. It looks like it does that right thing
      when there is 1 N or none, the error only occurs if there are 2 Ns. I
      havent tried any more than this though.

      $ cat weird.bed
      chr21 33157053 33157154
      chr21 33157054 33157154
      chr21 33157055 33157154

      $ fastaFromBed -fi /isilon/seq/schatz/tgarvin/genomes/human/hg19/hg19.fa
      -bed weird.bed -fo weird.fa

      $ cat weird.fa

      chr21:33157053-33157154
      nnaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccatcaccaccaccaccaccaccaccaccaccaccaccaccaccacc
      chr21:33157054-33157154
      naccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccatcaccaccaccaccaccaccaccaccaccaccaccaccaccacc
      chr21:33157055-33157154
      accaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccaccatcaccaccaccaccaccaccaccaccaccaccaccaccaccacc

      $ /bluearc/data/schatz/software/packages/bowtie-1.0.0/bowtie --chunkmbs=200
      /seq/schatz/tgarvin/genomes/human/hg19/hg19.fa -f -S -t -m1 --best --strata
      -p4 weird.fa

      ...

      chr21:33157055-33157154 4 * 0 0 * 0 0
      ACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCATCACCACCACCACCACCACCACCACCACCACCACCACCACCACC
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
      XM:i:1
      chr21:33157054-33157154 4 * 0 0 *
      0 0
      NACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCACCATCACCACCACCACCACCACCACCACCACCACCACCACCACCACC
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
      XM:i:1
      chr21:33157053-33157154 16 chr17 64795043 255 101M * 0 0
      GGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGATGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTNN
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
      XA:i:2 MD:Z:99G0G0 NM:i:2

      Good luck!

      Mike

      On Wed, Mar 19, 2014 at 10:16 AM, Val valduboisvert@users.sf.net wrote:

      Hi Michael,

      I am investigating now another N related bug. So good to know about this
      issue as well. If there is not much trouble for you I could use one more
      example.

      thanks again,
      Val


      Status: open
      Group: v0.9.0
      Created: Wed Mar 19, 2014 05:35 AM UTC by Michael Schatz
      Last Updated: Wed Mar 19, 2014 05:35 AM UTC
      Owner: Ben Langmead

      Hi Guys,

      We are trying to use bowtie 1.0.0 to evaluate the uniqueness of the
      genome, by extracting sequences, mapping them back to the full genome with
      -m1, and checking to see if there is a unique alignment. Overall, this
      works extremely well, but there are a small number of cases where the
      sequence does not map back to where it came, especially when the sequence
      contains 'N's.

      For example:

      Extract sequences from chr21

      $ cat weird.bed
      chr21 9775338 9775439
      chr21 9775338 9775437

      $ fastaFromBed -fi /isilon/seq/schatz/tgarvin/genomes/human/hg19/hg19.fa
      -bed weird.bed -fo weird.fa

      $ cat weird.fa

      chr21:9775338-9775439

      tataagtgagaagatgcagtgtttggttttttcttcctgcattagtttgctgaagatatcagctttgggttcatccatatccctgcaaagagcatgatcNN
      chr21:9775338-9775437

      tataagtgagaagatgcagtgtttggttttttcttcctgcattagtttgctgaagatatcagctttgggttcatccatatccctgcaaagagcatgatc

      (Notice the second is 2 bp shorter excluding the Ns)

      $ /bluearc/data/schatz/software/packages/bowtie-1.0.0/bowtie
      --chunkmbs=200 /seq/schatz/tgarvin/genomes/human/hg19/hg19.fa -f -S -t -m1
      --best --strata -p4 weird.fa

      ...

      chr21:9775338-9775437 4 * 0 0 * * 0 0
      TATAAGTGAGAAGATGCAGTGTTTGGTTTTTTCTTCCTGCATTAGTTTGCTGAAGATATCAGCTTTGGGTTCATCCATATCCCTGCAAAGAGCATGATC
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
      XM:i:1
      chr21:9775338-9775439 0 chr1 143384489 255 101M * 0 0
      TATAAGTGAGAAGATGCAGTGTTTGGTTTTTTCTTCCTGCATTAGTTTGCTGAAGATATCAGCTTTGGGTTCATCCATATCCCTGCAAAGAGCATGATCNN
      IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
      XA:i:0 MD:Z:99T0C0 NM:i:2

      Notice the read without Ns is correctly flagged as non-unique, but with Ns
      is flagged as unique with edit distance of 2 to chromosome 1!

      For our application we know to look for this now, but seems like this
      could be a general problem.

      Let us know if you need any more details or examples.

      Thank you,

      Mike

      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/bowtie-bio/bugs/304/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

       

      Related

      Bugs: #304

  • Val

    Val - 2014-03-19

    Thank you Michael. I will look into this.

    Val

     

Log in to post a comment.