#256 Bowtie2 slower after 2.0.0.beta6

open
nobody
bowtie (175)
5
2013-01-04
2012-11-05
Anonymous
No

Hello,

Starting with beta 7, Bowtie2 analyses seem to take much longer to complete. For example, screening three files totalling 3.15 GB of gzip-compressed data takes:
- v2.0.0-b6: 219 seconds
- v2.0.0-b7 to 2.0.0.2: 294 seconds

The command line I used to run these analyses is:
bowtie2 --local -k 1 -N 1 -L 20 -i S,1,0.75 --ignore-quals --ma 4 --mp 12,12 --score-min G,-20,34 -x Ref -U data -p 6 -t --no-unal -S file.sam
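For readers unfamiliar with the function-style options in this command, here is a minimal sketch of how `--score-min G,-20,34` and `-i S,1,0.75` evaluate. It assumes the form documented in the Bowtie 2 manual, f(x) = constant + coefficient * g(x), where x is the read length and g is linear (L), square root (S) or natural log (G). The helper name `eval_bowtie2_function` is hypothetical, and the 90 bp read length is the one reported later in this thread.

```python
import math

# Hypothetical helper (not part of Bowtie 2): evaluates a function spec of
# the form <type>,<constant>,<coefficient> for a given read length.
def eval_bowtie2_function(spec: str, read_length: int) -> float:
    kind, constant, coefficient = spec.split(",")
    transforms = {
        "L": float,      # linear in the read length
        "S": math.sqrt,  # square root of the read length
        "G": math.log,   # natural log of the read length
    }
    return float(constant) + float(coefficient) * transforms[kind](read_length)

read_length = 90  # assumption: 90 bp reads, as stated later in the thread

min_score = eval_bowtie2_function("G,-20,34", read_length)      # --score-min
seed_interval = eval_bowtie2_function("S,1,0.75", read_length)  # -i
max_score = 4 * read_length  # --ma 4: score of a full-length perfect local match

print(f"minimum score: {min_score:.2f} of a possible {max_score}")
print(f"seed interval: {seed_interval:.2f}")
```

Under these assumptions a 90 bp read needs a local alignment score of roughly 133 out of a possible 360 to be reported.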

Is this a bug or is this to be expected given some of the changes made in beta 7?

Thanks,
Laurent

Discussion

  • Ben Langmead
    2012-12-19

    Hi Laurent,

    It's not very surprising that there would be a difference, since changes made in that release affected the --local mode of Bowtie 2 in a non-trivial way. However, I'm surprised by how large the difference you report is. Can you tell us anything else about your data (length distribution, % aligning, etc.)? Could you even share it? I have not been able to reproduce the slowdown on some HiSeq data.

    Best,
    Ben

     
  • Ben Langmead
    2012-12-19

    • status: open --> pending
     
  • laurenta1
    2013-01-04

    Hi Ben,

    Thank you for the reply.

    I have been setting up a pipeline for the reanalysis of the 1000G exome data for several multigenic families, and I use Bowtie2 in several of its steps.

    The speed problem I have with versions later than 2.0.0-beta6 is in the first step, where I isolate reads specific to a gene family of interest from genome-wide reads. What may be unusual about this step is that only a very small fraction of the reads align (<0.05% in the example below).

    In the subsequent steps of the pipeline, the proportion of reads that map is much higher, and for these steps I noticed that the latest versions of Bowtie2 are faster than 2.0.0-beta6. I therefore currently use 2.0.0-beta6 for the first step and 2.0.4 for the other steps.

    I ran a couple of new tests on a single data file with versions 2.0.0-beta6 and 2.0.4, and the slowdown is still there (2.0.4 is more than twice as slow as 2.0.0-beta6 in wall-clock time). The results, however, are exactly the same for both versions:
    20482003 reads; of these:
      20482003 (100.00%) were unpaired; of these:
        20473517 (99.96%) aligned 0 times
        8486 (0.04%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
    0.04% overall alignment rate
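To confirm that two versions really produce the same result, the summary above can be parsed and compared programmatically. The following is a minimal sketch that handles only the unpaired-read summary layout shown here; the function name `parse_unpaired_summary` is hypothetical.

```python
import re

SUMMARY = """\
20482003 reads; of these:
  20482003 (100.00%) were unpaired; of these:
    20473517 (99.96%) aligned 0 times
    8486 (0.04%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.04% overall alignment rate
"""

# Hypothetical helper: returns the total read count and a mapping from
# each "aligned ..." category to its read count.
def parse_unpaired_summary(text):
    total = int(re.search(r"^(\d+) reads; of these:", text, re.M).group(1))
    counts = {}
    pattern = r"^[ \t]*(\d+) \([\d.]+%\) aligned (.+)$"
    for count, label in re.findall(pattern, text, re.M):
        counts[label.strip()] = int(count)
    return total, counts

total, counts = parse_unpaired_summary(SUMMARY)
aligned = total - counts["0 times"]
rate = 100.0 * aligned / total

print(f"{aligned} of {total} reads aligned ({rate:.4f}%)")
```

Running the same parser on the summary from each version and comparing the returned values gives a quick equality check before comparing run times.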

    The command lines I used in these tests are given below, together with the gene sequence used to build the index. The reads are from the 1000G project (Illumina HiSeq 2000, 90 bp), and the read file I used in these tests is here:
    http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG01028/sequence_read/ERR047676_1.filt.fastq.gz

    I hope you can replicate the problem but let me know if you need more information!

    Best regards,
    Laurent

    -------------------------------------------
    Bowtie 2 version 2.0.0-beta6

    bowtie2-build Test.fna Test

    time bowtie2 --local -k 1 -N 1 -L 20 -i S,1,0.75 --ignore-quals --ma 4 --mp 12,12 --score-min G,-20,34 -x Test -U ERR047676_1.filt.fastq.gz -p 6 -t --no-unal -S Test.sam
    real 1m37.917s
    user 9m17.855s
    sys 0m1.376s

    time bowtie2 --local -k 1 -N 1 -L 20 -i S,1,0.75 --ignore-quals --ma 4 --mp 12,12 --score-min G,-20,34 -x Test -U ERR047676_1.filt.fastq.gz -p 6 --no-unal --al Test.fastq > /dev/null
    real 1m37.468s
    user 9m14.215s
    sys 0m1.452s

    ------------------------------------------------------------
    Bowtie 2 version 2.0.4

    bowtie2-build Test.fna Test

    time bowtie2 --local -k 1 -N 1 -L 20 -i S,1,0.75 --ignore-quals --ma 4 --mp 12,12 --score-min G,-20,34 -x Test -U ERR047676_1.filt.fastq.gz -p 6 -t --no-unal -S Test.sam
    real 3m43.622s
    user 12m24.331s
    sys 5m47.302s

    time bowtie2 --local -k 1 -N 1 -L 20 -i S,1,0.75 --ignore-quals --ma 4 --mp 12,12 --score-min G,-20,34 -x Test -U ERR047676_1.filt.fastq.gz -p 6 --no-unal --al Test.fastq > /dev/null
    real 3m37.580s
    user 13m52.744s
    sys 4m30.857s
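The `time` figures above are easier to compare once converted to seconds. Below is a minimal sketch using the numbers from the `-S Test.sam` runs of each version; the helper name `to_seconds` is hypothetical.

```python
import re

# Hypothetical helper: converts a `time` stamp such as "3m43.622s" to seconds.
def to_seconds(stamp: str) -> float:
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Timings copied from the `-S Test.sam` runs reported above.
timings = {
    "2.0.0-beta6": {"real": "1m37.917s", "user": "9m17.855s", "sys": "0m1.376s"},
    "2.0.4": {"real": "3m43.622s", "user": "12m24.331s", "sys": "5m47.302s"},
}

seconds = {
    version: {kind: to_seconds(stamp) for kind, stamp in stamps.items()}
    for version, stamps in timings.items()
}

for kind in ("real", "user", "sys"):
    old = seconds["2.0.0-beta6"][kind]
    new = seconds["2.0.4"][kind]
    print(f"{kind}: {old:.1f}s -> {new:.1f}s ({new / old:.1f}x)")
```

Broken down this way, the wall-clock time grows by a bit more than a factor of two, while the `sys` component grows from under two seconds to several minutes. The thread does not establish the cause, so this is only an observation about where the extra time is being counted.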

    ---------------------------------------------------------------
    Sequence for the Index:

    >Test
    GTGACCAGTGGGATTCATGACAAGCAGCGGATGATAACCAGTCATCAAATAAATATCAAC
    TCCCTCCCCCACTCCCCAAATCAAAGCTCAAACATAAGTCATTGTTCTCAAAACGTTGAA
    CAGGGATTGAGGTGCAGAGGGATGGCCAAGTAAGCAAAGGGCACCGAGGAGGCAGGAAAG
    TCTCAGATGTTTGTTCCCAGCGGGTGGGAGTGGACACTGTAGCAAAATATTTTAAAAAGG
    GGAAGTTGAGAGGGGACTATTTGGTTGAAAGAAAACCCACAATCCAGTGTCAAGAAAGAA
    GTCAACTTTTCTTCCCCTATTTCCCTGCATTTCTCCTCTGTGCTCACTGCCACACGCAGC
    TCAACCTGAGCTACACAGCCAGATGCGAGATGCTTCTCTGCTGATCTGAGTCTGCCTGCA
    GCATGGACCTTGGTCTTCCCTGAAGCATCTCCAGGGCTGGAGGGACGACTGCCATGGTAA
    GGACCCCACAACGCTGAGCTGATGGATGGCTGAAGGAGGGAGGGTGACCATGTGGGAGGC
    TGTGAGAAGGAAAGGGAAGCCTCCGTTACCCTCATCTGGAAGGGCAGACGCAGAAAGCAC
    CAGTTCTATTTGCTGCTACATCCCGTCTCTCAGTGAGAAGAGGAGAAACCAGACAGACAG
    TGGCTGGGGGTCAGGAAAGACCCCATTACAGTCTGAAATGTCTGCAGAGGGCCTGGTTCC
    TGCCCCCACCTCAGCTCTAAAAGAATGAGAGTCAGGCTCCTGGTAGGGTAGTTCTGCTTC
    CTGTGTGGCTGCAGATGACAACACCCCATGAGAAGGACCCAGCCTCCGAGTGTCCACACT
    GGGTGGGAAGGAGGGGAGGCTATTTCTCTCTGTGTGTCTCTGTCCTGCCAGCACCGAGGG
    CTCATCCATCCGCAGAGCAGGGCAGTGGGAGGAGACGCTATGACCCCCATCGTCACAGTC
    CTGATCTGTCTCAGTGAGATTTGAAGAGGGAGGGGAGCTTCTAACCTAGGAGGGACCTCA
    CCCCACAGCCGACCTCTAGTCCCTAAGGAGACCCCAGGGGCTCACAAAGATCCCAGGGAG
    GGGAGGACCTGCCCAGGCTTCAGGGGCAAATTCCTCACAGGGAACTCTCTTCCAGGGCTG
    AGTCTGGGCCCCCGGACCCACGTGCAGGCAGGTGAGTCTGTCCCCAGCTCTCCCAGGTCC
    CTCCTCCTCACTGGGGACAAGGGGCCACCCCCGTGCAGCTGGGGATGGGGAATAGCAGTT
    CTGGACTGACTGATGGGGGCATCTGGAGGGTCCTGGGCTGAGAGCTGAGATATGTTGGGT
    GGGAAATGACTTAGAATCTGAACTCTGATTTCCTTCCAGGGACCCTCCCCAAGCCCACAC
    TCTGGGCTGAGCCAGGCTCTGTGATCACCCAGGGGAGTCCCGTGACCCTCTGGTGTCAGG
    GGATCCTGGAGACCCAGGAGTACCGTCTGTATAGAGAAAAGAAAACAGCACCCTGGATTA
    CACGGATCCCACAGGAGATTGTGAAGAAGGGCCAGTTCCCCATCCCATCCATCACCTGGG
    AACACACAGGGCGGTATCGCTGTTTCTACGGTAGCCACACTGCAGGCTGGTCAGAGCCCA
    GTGACCCCCTGGAGCTGGTGGTGACAGGTGAGCTGACACTGAGGGCTCCCAGCCCCAGGC
    TCTGCCCTCAGGAAGGGAGTCAGTTCTCAGGGGCATCTCCCTCTCACAGCCCAGCCCTGG
    GGATGAAGTGGGAGGTGTGAGCCCCATTTAACATGGTGCCTCCTTCTCTCCTAGGAGCCT
    ACATCAAACCCACCCTCTCAGCTCTACCCAGCCCTGTGGTGACCTCAGGAGGGAACGTGA
    CCCTCCATTGTGTCTCACAGGTGGCATTTGGCAGCTTCATTCTGTGTAAGGAAGGAGAAG
    ATGAACACCCACAATGCCTGAACTCACAGCCCCGTACCCATGGGTGGTCCCGGGCCATCT
    TCTCTGTGGGCCCCGTGAGCCCGAGTCGCAGGTGGTCGTACAGGTGCTATGCTTATGACT
    CGAACTCTCCCCATGTGTGGTCTCTACCCAGTGATCTCCTGGAGCTCCTGGTCCTAGGTG
    AGAAATTCACAGCATTGCCTGGAGTTCCCTGAGTCTCCCTGAGTCTCCAGGCAGGTGGGG
    AGCAGCCACGTCTCAGGGCAGCTCCAGGTGGGATGATGTTGGGGCGAGAGGGCTCAGGGC
    TCCTGGGGCCGGAGACACAGGAAGATCAGCAGTGGTGAGGCCCCGGGGGAGAGGGAGGAT
    ATGTGGGGAAGCCTGAGGGTCGGCTCCTGGAAACCATGAGCACCTTTTCCCAGGTGTTTC
    TAAGAAGCCATCACTCTCAGTGCAGCCAGGTCCTATAGTGGCCCCTGGGGAGAGCCTGAC
    CCTCCAGTGTGTTTCTGATGTCAGCTACGACAGATTTGTTCTGTATAAGGAGGGAGAACG
    TGACTTCCTCCAGCTCCCTGGCCCACAGCCCCAGGCTGGGCTCTCCCAGGCCAACTTCAC
    CCTGGGCCCTGTGAGCCGCTCCTACGGGGGCCAGTACAGATGCTCCGGTGCATACAACCT
    CTCCTCCGAGTGGTCGGCCCCCAGCGACCCCCTGGACATCCTGATCGCAGGTGAGGAGCC
    CAGCGGGTTCAGTCAGGGACACAGGCTCCGCACAGGCCCTGCCAGGGGAGCCCAGGTGGT
    GATGGCCGGAATGAGGGGTGGGGGTCCCAAGGGAGGGAGAGACAGACAGAGACAGGGGAT
    GGGCGGGGCGGGGAAGACTCAGAGAAAACAGAGATAGAGACTGAGGGTCCCAGATAGAAG
    CCTGGGGAGGCGTCAGCTCAGAACAAGGTGGGGCAGCCTCTCACCCATCCTTCTTCTCTC
    CAGGACAGTTCCGTGGCAGACCCTTCATCTCGGTGCATCCGGGCCCCACGGTGGCCTCAG
    GAGAGAACGTGACCCTGCTGTGTCAGTCATGGGGGCCGTTCCACACTTTCCTTCTGACCA
    AGGCGGGAGCAGCTGATGCCCCCCTCCGTCTCAGATCAATACACGAATATCCTAAGTACC
    AGGCTGAATTCCCTATGAGTCCTGTGACCTCAGCCCACTCGGGGACCTACAGGTGCTACG
    GCTCACTCAGCTCCAACCCCTACCTGCTGTCTCACCCCAGTGACTCCCTGGAGCTCATGG
    TCTCAGGTGAGGGCCCTGACCCTGTCCTCTCCGAGCTCAAAGGATCAGCTCAGGCCCTGC
    CCCCCAGGAGAGCTCTGGACACTAAGAAAAGAGGGGAGTTGTGGAGTGAAGGGGGAAGGT
    CTGCGGGGGAGGGTCGAGCCCATGGGAGGGTGGAAATAGACGGGGCCTCCCACCCCTGGC
    TCCCACCCTTGTAGTCTCAGTAGGGTAAAGAGCAGGGAAGGCTGGGAGGAGATGGGGGTG
    AACCTCAGAGGAGATGAGAGTAGACTGAGGGTGAAAGACAGAGGCCCCACCTGCTCCCCT
    CCTGATGTCTCCACCTCAGAATCAGAGCCTCTGGGGATCCCAACCTCTAAGTCCTGACCC
    CATGGGTGACAAAAACCCAGTCACTCCCAGCTCTAAAGAAGTTTCTAGACTCATCTCAAT
    GCTACCTCTAATATTCAGGGTCTGATTTCCAGGGCAGCAGAGGGGAGGGTGGACAGTAAG
    GGTGTGGTCTGCATGGCTTCCTGGTGCTCCAGGGATGGGGCAGGTGTTCCCTCCGTGGTG
    TTCAGAGGGGAGAGAGGTGTCTGAGGTTCAGCATTGATGAGTGGAGCAGCGGGGTCTTTC
    CCCCTCCCCGAGCAGGATTCCAGGAGACATCACCTCTGGTTGAGACTCTCCACTGTCTCA
    TGTACATAACAAAATCTCTCCAATTTTCTACTGAAAGCAACACGTGGCACAGCTCTGCAG
    GACCCCACACCCCGACCTTGTCCTGCAGGATGTGTGACGAGTAGAAGAGGGAGAACAGGT
    CGGGTCAGCAGGATTTGGGGTCCAGCCTGACTTGGACACGTGGAAGATGCTGGGGCTGAT
    GGAGGAGGAACAGAGGCGGGCGAGTTGGAAAGAGGACAGACAGACGGTCCCTTGGCAGCT
    CTCATTTCTCATTTCCAAGGGCCCCTGAGGATGAACCCCTCACCCACACCTGTAGGGTCC
    CTGGGCCATCTCAAGACAAGAGAGGAGGCCTTGGTGGGATCTGACTGTGATGAGGGTGAA
    GTCCACCCCGAGCAGAAATGAGTGATACACAACACGTGCTGTGAATAATTCCCTAACTTG
    CCAGGGAGCAAGTGCACGGCCCCTCCTTAGTCTCAGGGGTGCCCTGAGCCCAAGCCCACC
    AGGTGAGCACAGGAGGGGCCGTGTGAGCGGCACCCACAGCTGGAGTGCTTCTCTCTAAAG
    GAGCACGTTTTGGGTGGACTCAACCCTCACCACAGTCAGATCCCACCAAGGCTCTGTGCT
    CAGGGCACCCGGAGACTAAGGAGGGACCGTGCACCTGCTCCCTGGATGAGTTAGGGAATG
    ATTCACAGCACGTCTCATATGTCATTCATTTCTACACTGTATTTTCTGTACGTATGCTTT
    CTATATGTATGCTCTCTCCTTTACTAAAACTTTTAAAGCAATACTTCCTTTGCCTTTCAT
    TGGGGCTTGATTTAATTTATATATTCAATGTAGACATCCATTTTTCATTAACCTCAAGTC
    TTCCTCCTGTACATATTAAAAATCGAGGTTTCCTTAACGAACTTCAGAAATGTTTGGATG
    TTTCAAAGCACAATGGCCCGAGCGAAACTGACTGGGCGGCTCCCTGTGGCATGAGAAACC
    CGGGGGAGGTCAGCGGGAGCTACAGTGCAGCTCAGCCCTGGGCCTGGGGGGTTCATGCCC
    AACCTTGTCCAATCACTGGATAATTCTAACATCTAAATAAACGTCTTTTATATGAAAAAA
    GTGCTTTAAATTGTTAATTTAGATTTAAATTAGAACAGGGCAATTTGGTAGTGGGTTAAT
    ATGAAATACAATGAATATACCCAAACCAGTGGCTTTCTCATGAGTACTTATCTCTCGTTT
    TAAAAAATGTAAAGGAATCAAATACTTCACTTATAAATTGTTAAAGGTGTTGAATAATTC
    TTTAAATTAGAACGAATATAATTAAAAAAAATGAATATAATTTTTTAAATGTCTACCCAG
    GACACCCACCTCTCCTTGACAGGGAGGTTATATAAGTTATACATAATCTTATATAGAATT
    TGTTATATAAGTTACCTCTGAATATGTCTCTTCTCCTCTGTTTTGATTCTCAGGAGCAGC
    TGAGACCCTCAGCCCACCACAAAACAAGTCCGATTCCAAGGCTGGTGAGTGAGGAGATGC
    TTGCCGTGATGACGCTGGGCACAGAGGGTCAGGTCCTGTCAAGGGGAGCTGGGTGTCCTG
    GGTGGACATTTTAAAAAATTACATTCATTCTAATTTAAAGAATTCTTCAACACCTTTAAT
    GATTTATAAGTGAAGTATTTCATTCCTTTACATTTTTAAAATAAGAGATAACTATCCATG
    AGAAAGCTACTGCTTTGAGTATATTCATTGTATTTCATGCTAACTCACTACTAAATTGCT
    CTGTTCTAACTGCTTTTCAATGGATCTCCTCTAATTTACTAATACAATTTGTATAAACCG
    TAAGACAATGGGAAATTTTACTTCTTTATTTCTAAATTATGTGCCAATATTTCTCACTTT
    AAATGTCAATATATATGTTACTACATCTTAAATAAATATCTGAAGTTTTACATATATACA
    TATTTATGTGTGGTTAAGTAAGATACTTTTGAATATGTGTGTAAGTATATCCAAATTCAT
    TTGATGTATATTCATGCATATGCTTAATATATTTGATGCGGTAGGTGTTTACATGTTTGT
    TCCTGGTCTAGATTCACCTAGATTCACACTTCATAAAAACAAATACTGATTTATGAACCT
    TGAGTGACATCTCCATTTAGGGAAACTGTATTGCATTCTAGGTGTTACTGCCTATATGAG
    GAATTAGTTAACTAGAGATTAAATGGAAGATGAAACCCCAGGTGAACTGGCTGAGGCTGT
    GTGAAGAAGAAGCACCCCCAGACTTTCACCCCTTTGTGCTTCTGACACTGGGGAGCCCCT
    GCAGACCAACCTCCCATGCATGGAGCCTGGGTCCTCCGCTGGTGGATAAGTGAAACTCTC
    ATCTCTGGGGGAATTGGCTCATGTGCTCCTGTGTCCCTGGCTGCACAGACAGCGCACAGG
    GCTCAGTGACTTCTGTATTCCCTTGCAGATCCTGAGCTCCCAGAGTGCAGGAAAACGCCC
    TCCCCAAATGCCTCAGGAGTAACATTCGAATTTCTAGAATGCAAGAAATCTGAAATAATT
    CAATAAGGACACTGGAGGGAACCCTGCTACACAGGAAGGGTTTATTGAGGAACTCCCTAG
    AACTCATGTCAGAAGACATAGGGGAAGATAAGAATGCAGAGCCCAGGGGAGAGGCTGGCT
    CAGGGCTCTTCCCCTCTGTTTTTATTCTCAGGAGCAGCTAACACCCTCAGCCCATCACAA
    AACAAGACTGGTGAGTGAGGAGATGCTCTCGTTTACGGTGCTGGGCACAAGGGTTGGGTC
    CTGTCAAGGGTAAGGAGGTGCTCTGGGTGGACATCCAGAGGTCCTGGGTGAAGTTGATCT
    GCCCTGACCTCTGTGACCTCTTTGCCCACCATCCCCAGCCTCACACCCCCAGGATTACAC
    AGTGGAGAATCTCATCCGCATGGGCATAGCTGGCTTGGTCCTGGTGGTCCTCGGGATTCT
    GCTATTTGAGGCTCAGCACAGCCAGAGAAGCCTCTGAGATGCAGCCGGGAGGTGAACAGC
    AGAGAGAAGAATGTACCCTTCAGAGTGGTGGAGCCTTGGGAACAGATCTGATGATGCCAG
    GAGGTTCCGGGAGACAATTTAGGGCTGATGCTATCTGGACTGTCTGCCAATCATTTTTAG
    AGGGAGGAATCAGTGTTGGATTGCAGAGACATTTTCTGGAGTGATCCATGAAGGACCATT
    AACCTGTGATACCTTTCCTCTCTATTAATGTTGACTTCCCTTGGTTGGATCCTCTTCTTT
    CCCCACCCCCAGACAGACATGAGGCTACATCCCACATGGCAGCGTTGGGTCCACACCTCT
    GCACATCTGTGTGCTCTGGTCCATGGTGTGTAACACAGTCTTCTTTATTACTCATTGCCA
    TACTCCCTGGTGTGCTTTACTGAGCCTCCATCTCTTCAATTCAGAGTTCCAAACGTGCTT
    CAGTAACTAAATCAATGGGAGAGTATCGGATTTCAACCAGGAAAAGATAAATCCACCCTG
    ATGCCCTGACACCCTCTCTGAACCCTACGAGCCCTTCCCTCCTTCTCACATGCTACCTGT
    GCAGCTTCTCCTTAGATCAACGTCAAAAGCAATGATAGGCATTTGCAGTGTGTTGGTGAT
    CCACGAAAGGAAAATCACGGAAGCAGGATAGAAATCCAGCTGCAGACAAGACCTCAGGTC
    GATGAATCTTGACAAGCAGTTGAGCTGTTTTTTTCTACTCACCTAGGACAGTCAGGCAGA
    AGTATGCAAAATGACTGGGGCTGATTCTTTTCTG

     
  • laurenta1
    2013-01-04

    • status: pending --> open