[maq-announce] bug in bwa-0.4.1 and the bug fix
From: Heng Li <lh...@sa...> - 2009-01-09 17:13:06
Hello,

Aaron Quinlan has just found a critical bug in bwa's indexer: it
segfaults if there is no comment text in the FASTA header line. I have
fixed the bug and put a new version on SourceForge. Nothing else was
changed. Sorry for this obvious bug.

As compensation, I am forwarding an email originally sent to the
1000 Genomes Project mailing list. It evaluates how different insert
sizes and read lengths may affect the fraction of the genome that is
sequence-able with short PE reads. This was done with bwa.

Best wishes,

Heng

Begin forwarded message:

> I have simulated read pairs of different lengths/insert sizes and
> mapped them back to the human genome to see the effect of these
> factors on the accessibility of the human genome. The basic message
> is that, compared with a longer insert size, a longer read length
> helps access a higher fraction of the human genome with short
> reads. For example, given 50bp reads and a 200+/-20 insert size
> (50@200+/-20), about 95% of the genome can be covered by uniquely
> mapped read pairs. This percentage increases to 96% given
> 50@10000+/-1000 PE reads. In contrast, 32@500+/-50 PE reads can
> access 91% of the genome, while 125@500+/-50 access nearly 98%.
>
> Detailed numbers are attached. Each data line shows the mapping
> quality interval (06x stands for 60-69), # wrong alignments,
> # alignments in the interval, cumulative # alignments, and
> cumulative alignment error rate.
>
> regards,
>
> Heng
>
> Procedure
> ---------
>
> In simulation, I first simulated a diploid genome with 0.09% SNP
> rates and 0.01% indel rates. About 1,854,000 PE reads (not 2 million
> because the simulator may generate a read full of N) were randomly
> generated from this diploid genome with a uniform 1.5% base error
> rate. These reads were then mapped back to the human reference
> genome with bwa. As the simulator coded the true alignment
> coordinates in the read names, the alignment error rate can be
> evaluated.
> The bwa aligner achieves accuracy similar to maq
> <http://maq.sourceforge.net/bwa-man.shtml#9>. Doing the evaluation
> with maq would lead to a similar conclusion. Results with other
> aligners may differ.
>
> Results
> -------
>
> ::::::::::::::
> 50bp PE reads; 200+/-20bp insert size
> ::::::::::::::
> 06x     6 / 1535520 1535520 3.907e-06
> 05x     0 /    1188 1536708 3.904e-06
> 04x     2 /   15234 1551942 5.155e-06
> 03x   227 /  115113 1667055 1.410e-04
> 02x  1816 /   81863 1748918 1.173e-03
> 01x    38 /    4518 1753436 1.191e-03
> 00x 60137 /   98102 1851538 3.361e-02
> ::::::::::::::
> 50bp PE reads; 500+/-50bp insert size
> ::::::::::::::
> 06x     4 / 1528486 1528486 2.617e-06
> 05x     0 /    1128 1529614 2.615e-06
> 04x     0 /   12958 1542572 2.593e-06
> 03x   334 /  131015 1673587 2.020e-04
> 02x  1723 /   83586 1757173 1.173e-03
> 01x    44 /    4156 1761329 1.195e-03
> 00x 56826 /   90559 1851888 3.182e-02
> ::::::::::::::
> 50bp PE reads; 3000+/-300bp insert size
> ::::::::::::::
> 06x    12 / 1513600 1513600 7.928e-06
> 05x     0 /    1162 1514762 7.922e-06
> 04x     3 /   10622 1525384 9.834e-06
> 03x   572 /  151097 1676481 3.501e-04
> 02x  1503 /   91113 1767594 1.182e-03
> 01x    33 /    3373 1770967 1.199e-03
> 00x 50185 /   79209 1850176 2.827e-02
> ::::::::::::::
> 50bp PE reads; 10000+/-1000bp insert size
> ::::::::::::::
> 06x    16 / 1505430 1505430 1.063e-05
> 05x     0 /    1140 1506570 1.062e-05
> 04x     1 /    9606 1516176 1.121e-05
> 03x   663 /  163353 1679529 4.049e-04
> 02x  1447 /   95911 1775440 1.198e-03
> 01x    13 /    2700 1778140 1.204e-03
> 00x 45914 /   72701 1850841 2.596e-02
>
>
> ********************************************************
> ********************************************************
>
> ::::::::::::::
> 32bp reads; 500+/-50
> ::::::::::::::
> 06x     8 / 1313464 1313464 6.091e-06
> 05x     0 /    7402 1320866 6.057e-06
> 04x     2 /   25204 1346070 7.429e-06
> 03x   399 /  176376 1522446 2.686e-04
> 02x  2726 /  168809 1691255 1.854e-03
> 01x    52 /    4955 1696210 1.879e-03
> 00x 99521 /  155509 1851719 5.547e-02
> ::::::::::::::
> 50bp reads; 500+/-50
> ::::::::::::::
> 06x     4 / 1528486 1528486 2.617e-06
> 05x     0 /    1128 1529614 2.615e-06
> 04x     0 /   12958 1542572 2.593e-06
> 03x   334 /  131015 1673587 2.020e-04
> 02x  1723 /   83586 1757173 1.173e-03
> 01x    44 /    4156 1761329 1.195e-03
> 00x 56826 /   90559 1851888 3.182e-02
> ::::::::::::::
> 70bp reads; 500+/-50
> ::::::::::::::
> 06x     1 / 1637726 1637726 6.106e-07
> 05x     0 /     390 1638116 6.105e-07
> 04x     0 /    7698 1645814 6.076e-07
> 03x   272 /   93750 1739564 1.569e-04
> 02x   985 /   45099 1784663 7.049e-04
> 01x    26 /    3587 1788250 7.180e-04
> 00x 38266 /   64061 1852311 2.135e-02
> ::::::::::::::
> 125bp reads; 500+/-50
> ::::::::::::::
> 06x     4 / 1711262 1711262 2.337e-06
> 05x     0 /     118 1711380 2.337e-06
> 04x     0 /    4390 1715770 2.331e-06
> 03x   109 /   72593 1788363 6.319e-05
> 02x   381 /   21586 1809949 2.729e-04
> 01x    29 /    2950 1812899 2.885e-04
> 00x 22920 /   40478 1853377 1.265e-02

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
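
The forwarded message describes each data line as: mapping quality interval, # wrong alignments, # alignments in the interval, cumulative # alignments, cumulative alignment error rate. The last two columns can be recomputed from the first three, which makes the tables easy to sanity-check. The following is a minimal Python sketch of that check, not code from bwa or maq: the function names are hypothetical, and the choice of mapping quality >= 10 as the cutoff for "uniquely mapped" is an assumption (the message does not state the threshold), though it does give roughly the "about 95%" quoted for 50@200+/-20.

```python
# Data lines copied from the "50bp PE reads; 200+/-20bp insert size" table.
TABLE_50_200 = """\
06x 6 / 1535520 1535520 3.907e-06
05x 0 / 1188 1536708 3.904e-06
04x 2 / 15234 1551942 5.155e-06
03x 227 / 115113 1667055 1.410e-04
02x 1816 / 81863 1748918 1.173e-03
01x 38 / 4518 1753436 1.191e-03
00x 60137 / 98102 1851538 3.361e-02
"""


def parse_table(text):
    """Parse data lines into (mapq_floor, n_wrong, n_in_bin) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.replace("/", " ").split()
        if len(fields) != 5:
            continue  # skip blank or non-data lines
        mapq_floor = int(fields[0][:2]) * 10  # "06x" -> 60, i.e. the 60-69 bin
        rows.append((mapq_floor, int(fields[1]), int(fields[2])))
    return rows


def cumulative(rows):
    """Recompute cumulative counts and error rates, highest quality bin first."""
    out = []
    total = wrong = 0
    for mapq_floor, n_wrong, n_in_bin in sorted(rows, reverse=True):
        total += n_in_bin
        wrong += n_wrong
        out.append((mapq_floor, total, wrong / total))
    return out


def fraction_at_or_above(rows, min_mapq):
    """Fraction of all listed pairs whose bin starts at or above min_mapq.

    Using this as a proxy for "uniquely mapped" is an assumption.
    """
    total = sum(n for _, _, n in rows)
    kept = sum(n for q, _, n in rows if q >= min_mapq)
    return kept / total


if __name__ == "__main__":
    rows = parse_table(TABLE_50_200)
    for mapq_floor, total, err in cumulative(rows):
        print(f"{mapq_floor:02d}+ {total:8d} {err:.3e}")
    print(f"fraction with mapQ >= 10: {fraction_at_or_above(rows, 10):.3f}")
```

Pasting the data lines of any other table from the message into `parse_table` gives the same check for that read length and insert size.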