From: Dan B. <dan...@gm...> - 2009-05-29 14:38:05
Hi,

I have been exploring Bambus because it could be very useful for some of the assembly problems that we have in the lab. To run various experiments, I created a simple dummy input file (included below). The input defines some links between 5 contigs. Each contig is 10 kb, and each link is aligned to a 1000 bp region near the end of each contig. The contigs are ordered from 1 to 5, and all the links are 'perfect', having the right length and the right orientation.

I created this input in GDE (.contig) format, as defined here:

http://www.cbcb.umd.edu/research/contig_representation.shtml#contig

and discussed here:

http://amos.sourceforge.net/docs/bambus/Manual.html#inputopt
http://amos.sourceforge.net/docs/bambus/Manual.html#asmout

So far so good. Passing this file to Bambus (along with the mates and .config file, also given below) creates the correct scaffold: contigs 1 to 5 in order, with no invalid or unused links.

Next, I decided to add some 'experimental noise' to the data. Of the 12 'good' links in the data, I took one mate-pair linking contigs 1 and 2 and duplicated the reverse read onto contig 4 (see Input 2 below). In the details file this had the effect of creating one massive 'insert':

Valid: library lib_c: lid5.f 8901 9900 ---> ... 21899 ... <--- 101 1100 lid5.r

which is clearly outside the range of allowed values for the library in the mates file. Nonetheless, it is listed as a valid link. I ran *untangle* on the scaffold, hoping that the lone, inconsistent, length-invalid link would be removed, but it was not.

If I repeat the experiment but instead link contig 1 to contig 5 (via the same method), untangle throws out contig 2 and all 6 of its aligned mates (6 good links), and gives me contigs in the order 1 -> 5 -> 3 -> 4, despite there being no links between contigs 3 and 5. All of this just because of one incorrect link out of 12.
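To make the length argument concrete, here is a minimal sketch of the range check I have in mind. It assumes the second and third numbers on the 'library' line of the mates file are the minimum and maximum insert size, and that the 21899 in the details line is the implied insert length. The function names are my own, and this is not a claim about how Bambus itself decides validity (it may, for instance, apply an error tolerance).

```python
def parse_library(line):
    """Parse a mates-file line such as 'library c 1500 2500 (lid).*'.

    Assumption: the fields are 'library', name, min size, max size,
    optionally followed by a read-name pattern.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0] != "library":
        raise ValueError("not a library line: %r" % line)
    return fields[1], int(fields[2]), int(fields[3])


def link_in_range(length, min_size, max_size):
    """Return True if an implied insert length lies within the library range."""
    return min_size <= length <= max_size


if __name__ == "__main__":
    name, lo, hi = parse_library("library c 1500 2500 (lid).*")
    # 21899 is the length shown in the details line for the lid5 link.
    for length in (2000, 21899):
        status = "within" if link_in_range(length, lo, hi) else "outside"
        print("library %s: length %d is %s [%d, %d]" % (name, length, status, lo, hi))
```

By this simple criterion the lid5 link would be flagged as length-invalid, which is what I expected to see in the details file.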
If I put the bad link on contig 3 and reverse the orientation (see Input 3 below), the scaffold is broken beyond all recognition, both before and after running 'untangle'.

Is this an error in my input, or an error somewhere in Bambus?

Thanks very much for any feedback on this issue. I appreciate it's quite technical, but it's very important to us to understand how Bambus behaves, and how to regulate that behaviour.

Cheers,
Dan.

The 'perfect' input looks like this:

##cid1 3 9900 bases, 00000000 checksum.
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid12.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid2 6 9900 bases, 00000000 checksum.
#lid1.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid3.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid5.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid8.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid12.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
##cid3 5 9900 bases, 00000000 checksum.
#lid1.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid3.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid7.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid8.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid4 6 9900 bases, 00000000 checksum.
#lid2.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid4.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid6.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid7.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid5 4 1100 bases, 00000000 checksum.
#lid2.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid4.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid6.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>

With this mates file:

library c 1500 2500 (lid).*
pair (.*)\.f$ (.*)\.r$

and this bambus.config file:

# Priorities
priority c 1
# Redundancies
redundancy c 1
# Link size error
#error c 0.05
# Overlapping contigs allowed?
overlaps c N
# Global redundancy
redundancy 1
# Minimum scaffold size
mingroupsize 100

Input 2

##cid1 3 9900 bases, 00000000 checksum.
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid12.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid2 6 9900 bases, 00000000 checksum.
#lid1.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid3.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid5.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid8.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid12.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
##cid3 5 9900 bases, 00000000 checksum.
#lid1.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid3.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid7.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid8.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid4 7 9900 bases, 00000000 checksum.
#lid2.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid4.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid6.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid7.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid5 4 1100 bases, 00000000 checksum.
#lid2.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid4.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid6.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>

Input 3

##cid1 3 9900 bases, 00000000 checksum.
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid12.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid2 6 9900 bases, 00000000 checksum.
#lid1.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid3.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid5.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid8.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid11.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid12.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
##cid3 5 9900 bases, 00000000 checksum.
#lid1.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid3.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid7.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid8.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid5.f(8900) [RC] 1000 bases, 00000000 checksum. {1000 1} <8901 9900>
##cid4 6 9900 bases, 00000000 checksum.
#lid2.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid4.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid6.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
#lid7.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid9.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid5 4 1100 bases, 00000000 checksum.
#lid2.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid4.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid6.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
#lid10.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
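For anyone wanting to sanity-check inputs like the ones above before handing them to Bambus, the following is a rough sketch of a pre-check, not part of Bambus or AMOS. It assumes one record per line, that '##' lines are contig headers whose second field is the declared read count, and that '#' lines are read placements with '[RC]' marking reverse-complemented reads. The function names and the SAMPLE text (a trimmed excerpt in the style of Input 2) are hypothetical.

```python
import re
from collections import Counter, defaultdict

# '##cid1 3 9900 bases, ...' -> contig id, declared read count
CONTIG_RE = re.compile(r"^##(\S+)\s+(\d+)\s+\d+ bases")
# '#lid5.f(8900) [RC] 1000 bases, ...' -> read name, optional RC flag
READ_RE = re.compile(r"^#([^#(\s]+)\(-?\d+\)\s+\[(RC)?\]")


def parse_contig_file(text):
    """Return (placements, declared): read -> [(contig, orientation)], contig -> count."""
    placements = defaultdict(list)
    declared = {}
    contig = None
    for raw in text.splitlines():
        line = raw.strip()
        header = CONTIG_RE.match(line)
        if header:
            contig = header.group(1)
            declared[contig] = int(header.group(2))
            continue
        read = READ_RE.match(line)
        if read and contig is not None:
            orientation = "RC" if read.group(2) else "fwd"
            placements[read.group(1)].append((contig, orientation))
    return placements, declared


def multiply_placed(placements):
    """Reads that appear in more than one placement line."""
    return {name: where for name, where in placements.items() if len(where) > 1}


def count_mismatches(placements, declared):
    """Contigs whose declared read count differs from the read lines found."""
    actual = Counter(c for where in placements.values() for c, _ in where)
    return [(c, n, actual[c]) for c, n in declared.items() if n != actual[c]]


SAMPLE = """\
##cid1 1 9900 bases, 00000000 checksum.
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
##cid2 1 9900 bases, 00000000 checksum.
#lid5.r(100) [RC] 1000 bases, 00000000 checksum. {1000 1} <101 1100>
##cid4 1 9900 bases, 00000000 checksum.
#lid5.f(8900) [] 1000 bases, 00000000 checksum. {1 1000} <8901 9900>
"""

if __name__ == "__main__":
    placements, declared = parse_contig_file(SAMPLE)
    print("placed more than once:", multiply_placed(placements))
    print("count mismatches:", count_mismatches(placements, declared))
```

Run against Input 2 as posted, a check like this would list lid5.f on both cid1 and cid4. For Input 3, the cid3 header declares 5 reads while six read lines follow it, which may be worth ruling out as a contributing factor.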