Lately I have been experiencing very slow mapping speeds with Bowtie 2 against a genome containing many ‘Ns’, and I was wondering whether anyone has experienced the same or knows of a solution.
I have generated some mouse strain genomes containing Ns at known SNP positions relative to the Black6 reference genome. Running standard 50bp paired-end alignments on 10 cores took more than 10 days to complete for ~200M sequence pairs, which doesn’t sound like the intended behaviour. I have since tested a few things, such as the latest version (2.2.0) and the previous one (2.1.0), but the version is not the issue. Reducing the --score-min parameter didn’t speed it up noticeably either. I then took 1M test reads and aligned them to the following 3 genomes:
1) Genome containing 18M Ns, time: ~2h
2) Genome containing 4M Ns, time: 30 mins
3) Black6 reference genome, time: 2 mins
I have noticed that the 3rd index file increased in size from 5858 bytes for Black6 to 156842630 bytes for the N-strain. Is this the index file describing the position of Ns?
Do I just have to accept that Ns in the genome slow Bowtie 2 down >50-fold or is there any known cure for this?
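For readers unfamiliar with the setup, the N-masking described above could be sketched roughly as follows. This is only an illustration, not the script actually used to build these genomes: the function name `mask_snps` and the use of 0-based positions are assumptions made for the example.

```python
def mask_snps(sequence, snp_positions):
    """Return `sequence` with an N at each 0-based SNP position.

    Hypothetical helper for illustration; real SNP lists (e.g. VCF files)
    are usually 1-based and would need converting first.
    """
    bases = list(sequence)
    for pos in snp_positions:
        if not 0 <= pos < len(bases):
            raise ValueError(
                f"SNP position {pos} outside sequence of length {len(bases)}"
            )
        bases[pos] = "N"
    return "".join(bases)


masked = mask_snps("ACGTACGTAC", [2, 7])
print(masked)  # ACNTACGNAC
```

A genome masked this way keeps its original coordinates, which is why the Ns end up scattered as single bases throughout the reference rather than in long blocks.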
Hi Felix,
A couple questions.
Are the Ns distributed fairly evenly across the genome?
Can we get a copy of the FASTA file you're using?
Best,
Ben
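One way to answer the distribution question is to count Ns per sequence in the FASTA file. The sketch below assumes a plain, uncompressed FASTA; the function name `n_counts_per_sequence` and the tiny in-memory example are made up for illustration.

```python
import io


def n_counts_per_sequence(fasta_handle):
    """Map each FASTA record name to a (n_count, length) tuple."""
    stats = {}
    name = None
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            stats[name] = [0, 0]
        elif name is not None:
            stats[name][0] += line.upper().count("N")
            stats[name][1] += len(line)
    return {key: tuple(value) for key, value in stats.items()}


# Toy input standing in for a real genome file.
example = io.StringIO(">chr1 demo\nACGNN\nACGTA\n>chrY\nACGT\n")
for chrom, (n_count, length) in n_counts_per_sequence(example).items():
    print(chrom, n_count, length, f"{n_count / length:.2%}")
```

For a real genome the handle would come from `open(path)` instead of `io.StringIO`. Comparing the per-chromosome fractions gives a first impression of how evenly the Ns are spread, though it says nothing about clustering within a chromosome.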
Hi Ben,
Thanks for getting back to me so quickly. Yes, I suppose the Ns should be distributed evenly over all chromosomes except chrY. I have uploaded the FastA files as well as the *bt2 index files to the following server:
Connection details:
Hostname: ftp2.babraham.ac.uk
Username: ftpusr92
Password: vZ4QtKE1
FTP URL: ftp://ftpusr92:vZ4QtKE1@ftp2.babraham.ac.uk
I have run another test on 5M reads today, and it took 9h 13 mins using 10 cores. That is pretty consistent with the 1M test sequences, and close to the ~2h per 1M reads I had seen with a different file yesterday.
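As a rough sanity check on these numbers, the 5M-read timing can be converted to a per-million rate and extrapolated to the full ~200M run. The sketch assumes run time scales linearly with read count on the same hardware, and it treats the test "reads" and the full-run "pairs" as the same unit; both are simplifying assumptions, and the function name is made up for the example.

```python
def hours_per_million(hours, minutes, reads_millions):
    """Average wall-clock hours needed per 1M reads."""
    return (hours + minutes / 60) / reads_millions


rate = hours_per_million(9, 13, 5)   # 9h 13 mins for 5M reads
days_for_200m = rate * 200 / 24      # linear extrapolation to ~200M

print(f"{rate:.2f} h per 1M reads")       # 1.84 h per 1M reads
print(f"{days_for_200m:.1f} days for 200M")  # 15.4 days for 200M
```

Under those assumptions the extrapolation lands at roughly 15 days, which is in line with the "more than 10 days" reported for the full data set.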
I already got a suggestion on SeqAnswers to use BBmap instead, because it is supposedly much better and faster than every other tool on the planet. Have you had a chance to look at these claims?
Again, many thanks,
Best,
Felix
From: Ben Langmead [ben_langmead@users.sf.net]
Sent: 26 February 2014 20:28
To: [bowtie-bio:bugs]
Subject: [bowtie-bio:bugs] #302 Bowtie2 alignment speed very slow for many Ns in the reference sequence
[bugs:#302] (http://sourceforge.net/p/bowtie-bio/bugs/302/) Bowtie2 alignment speed very slow for many Ns in the reference sequence
Status: open
Created: Wed Feb 26, 2014 01:50 PM UTC by Felix Krueger
Last Updated: Wed Feb 26, 2014 01:50 PM UTC
Owner: nobody
Related
Bugs:
#302
Fixed for the next release.