
#302 Bowtie2 alignment speed very slow for many Ns in the reference sequence

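Milestone: v0.9.0
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-04-08
Created: 2014-02-26 13:50 UTC
Creator: Felix Krueger
Private: No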

Lately I have been experiencing very slow mapping speeds with Bowtie 2 against a genome containing many ‘Ns’, and I was wondering whether anyone has seen the same or knows of a solution.

I have generated some mouse strain genomes that contain Ns at the positions of known SNPs relative to the Black6 reference genome. Running standard 50 bp paired-end alignments on 10 cores took more than 10 days to complete for ~200M sequence pairs, which doesn't seem right. I have since tested a few things, such as the latest version (2.2.0) and the previous one (2.1.0), but the version is not the issue. Reducing the --score-min parameter didn't speed it up noticeably either. I then took 1M test reads and aligned them to the following 3 genomes:

1) Genome containing 18M Ns, time: ~2 h
2) Genome containing 4M Ns, time: 30 min
3) Black6 reference genome, time: 2 min
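The strain genomes listed above were built by putting an N at each known SNP position of the Black6 sequence. A minimal Python sketch of that masking step follows. It assumes one chromosome held in memory as a string and SNP coordinates already converted to 0-based offsets; `mask_snps` is a hypothetical helper, not the procedure actually used in this report.

```python
from typing import Iterable


def mask_snps(sequence: str, snp_positions: Iterable[int]) -> str:
    """Return a copy of `sequence` with 'N' at each 0-based SNP position.

    Hypothetical helper: assumes a single chromosome held in memory and
    SNP coordinates already converted to 0-based offsets.
    """
    bases = list(sequence)
    for pos in snp_positions:
        if not 0 <= pos < len(bases):
            raise ValueError(
                f"SNP position {pos} is outside a sequence of length {len(bases)}"
            )
        bases[pos] = "N"
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTACGT"
    print(mask_snps(reference, [1, 5, 9]))  # ANGTANGTANGT
```

Assuming each masked site is a single-base SNP, every SNP adds one isolated N, so a strain with millions of SNPs against Black6 ends up with millions of separate Ns rather than a few long N runs.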

I have noticed that the third index file (.3.bt2) grew from 5,858 bytes for Black6 to 156,842,630 bytes for the N-strain genome. Is this the index file that describes the positions of the Ns?
Do I just have to accept that Ns in the genome slow Bowtie 2 down more than 50-fold, or is there a known fix for this?
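One possible explanation for the size jump, offered as an assumption rather than something established in this thread: if the third index file keeps one record per N-free stretch of the reference, its size depends on how many separate N runs there are, not on the total number of Ns. Assembly gaps in a reference are usually long, contiguous N runs, whereas Ns placed at SNP positions are mostly isolated and each one splits the sequence. The Python sketch below (helper names are hypothetical) counts Ns, N runs and N-free stretches for both patterns.

```python
import re


def n_run_count(sequence: str) -> int:
    """Number of maximal runs of N (case-insensitive)."""
    return len(re.findall(r"[Nn]+", sequence))


def stretch_count(sequence: str) -> int:
    """Number of maximal N-free stretches left between the N runs."""
    return len(re.findall(r"[^Nn]+", sequence))


if __name__ == "__main__":
    one_gap = "ACGTNNNNNNACGT"  # 6 Ns in a single gap
    scattered = "ANGTANGTANGT"  # 3 isolated Ns, as from SNP masking
    for name, seq in [("one_gap", one_gap), ("scattered", scattered)]:
        # columns: name, total Ns, N runs, N-free stretches
        print(name, seq.upper().count("N"), n_run_count(seq), stretch_count(seq))
    # one_gap 6 1 2
    # scattered 3 3 4
```

Here the sequence with fewer Ns still produces twice as many stretches, which is the pattern that would inflate a per-stretch record file.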


Discussion

  • Ben Langmead

    Ben Langmead - 2014-02-26

    Hi Felix,

    A couple of questions.

    Are the Ns distributed fairly evenly across the genome?

    Can we get a copy of the FASTA file you're using?

    Best,
    Ben
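Ben's first question, whether the Ns are spread fairly evenly, can be answered from the FASTA file itself by tallying Ns per chromosome. A minimal Python sketch, assuming an uncompressed FASTA whose record name is the first word of each header line; `n_counts_per_record` is a hypothetical helper, not part of Bowtie 2.

```python
from __future__ import annotations

from typing import Iterable


def n_counts_per_record(fasta_lines: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Map each FASTA record name to (number of Ns, sequence length).

    Assumes an uncompressed FASTA whose record name is the first word
    after '>' on each header line.
    """
    counts: dict[str, list[int]] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts[name] = [0, 0]
        elif name is not None:
            counts[name][0] += line.upper().count("N")
            counts[name][1] += len(line)
    return {record: (n, length) for record, (n, length) in counts.items()}


if __name__ == "__main__":
    example = [">chr1 masked", "ACGTN", "NACGT", ">chrY", "ACGTACGT"]
    for record, (n, length) in n_counts_per_record(example).items():
        # record name, N count, length, fraction of the record that is N
        print(f"{record}\t{n}\t{length}\t{n / max(length, 1):.1%}")
```

On a real genome, pass an open file handle instead of the example list, e.g. `n_counts_per_record(open("strain.fa"))`, where `strain.fa` is a placeholder name; comparing the per-chromosome N fractions shows directly whether chrY is the only outlier.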

     
    • Felix Krueger

      Felix Krueger - 2014-02-26

      Hi Ben,

      Thanks for getting back to me so quickly. Yes, I suppose the Ns should be distributed evenly over all chromosomes except chrY. I have uploaded the FASTA files as well as the *.bt2 index files to the following server:

      Connection Details

      Hostname ftp2.babraham.ac.uk
      Username ftpusr92
      Password vZ4QtKE1
      FTP URL ftp://ftpusr92:vZ4QtKE1@ftp2.babraham.ac.uk

      I ran another test on 5M reads today and it took 9 h 13 min using 10 cores. That works out to about 1 h 50 min per million reads, which is consistent with the 1M-read tests and close to the ~2 h I saw with a different file yesterday.
      I have also had a suggestion on SeqAnswers to use BBMap instead, on the grounds that it is much better and faster than every other tool on the planet. Did you get a chance to look at these claims?

      Again, many thanks,
      Best,
      Felix
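The timings quoted in this thread are easier to compare once normalized to minutes per million reads. The sketch below does that arithmetic in Python, assuming (the thread does not say so explicitly) that the 5M-read run used the 18M-N genome and that all runs used comparable settings; `minutes_per_million` is a hypothetical helper.

```python
def minutes_per_million(hours: int, minutes: int, reads_millions: float) -> float:
    """Wall-clock minutes per million reads for a run of the given length."""
    return (hours * 60 + minutes) / reads_millions


if __name__ == "__main__":
    black6 = minutes_per_million(0, 2, 1)  # Black6 reference, 1M reads
    runs = {
        "18M Ns, 5M reads": minutes_per_million(9, 13, 5),  # genome assumed
        "18M Ns, 1M reads": minutes_per_million(2, 0, 1),   # reported as ~2 h
        "4M Ns, 1M reads": minutes_per_million(0, 30, 1),
    }
    for label, rate in runs.items():
        print(f"{label}: {rate:.1f} min per million, {rate / black6:.0f}x Black6")
    # 18M Ns, 5M reads: 110.6 min per million, 55x Black6
    # 18M Ns, 1M reads: 120.0 min per million, 60x Black6
    # 4M Ns, 1M reads: 30.0 min per million, 15x Black6
```

On these numbers both runs against the 18M-N genome are roughly 55 to 60 times slower than Black6, in line with the ">50-fold" estimate in the original report, while the 4M-N genome is about 15 times slower.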



  • Val

    Val - 2014-04-08
    • status: open --> closed
     
  • Val

    Val - 2014-04-08

    Fixed for the next release.

     
