From: Ole K. T. <o.k...@bi...> - 2012-08-17 10:43:43
Hi Brian,

I have some comments/questions/answers below.

On 13 August 2012 04:34, Walenz, Brian <bw...@jc...> wrote:
> Hi, Ole-
>
> Very sorry to hear. I've been stung by this a few times too.
>
> There is minimal support for non-innie oriented mates. The assembler was
> developed with innie-oriented mates assumed, and there are still lots of
> places where we make that assumption. In particular, finding evidence for
> merging two scaffolds assumes innie oriented mates; computing gap sizes
> based on mate pairs also does. Both explicitly exclude non-innie oriented
> mates from contributing.
>
> The same issue comes up after classifyMates runs. We're left with a pile of
> now outtie-oriented PE pairs that we can do nothing with. We thought about
> updating the stores (reverse complementing the read), but as every overlap
> involving these reads would need to be modified, we decided this was just
> too risky.
>
> So, I'm sad to say, recomputing is the only real option. If it makes you
> feel any better, I had to run overlaps on a big assembly three times because
> our scratch disk policy is to delete files older than a week, and I kept
> getting pulled away from it.

I reran the entire run because I was a bit too eager and deleted the store
before I was able to look into the issues you mentioned here. I've done it
now though, and have some questions about them.

>
> You might be able to learn something from this run though. Bogart can't use
> all 3.3tb of those overlaps, so maybe you can reduce the number of overlaps.

The store is 2.5 TB, still too big I guess. I don't think I've seen this big
a store before; the largest was about 1 TB (with approx. the same input data,
about 51x coverage in Illumina reads and 26x in 454 reads). The store has
files numbered from 0001 to 0250, where the 0001 file is 5 GB and the 0250
file is 20 GB, so I guess that is correct. There is also an ovs and an idx in
addition.

>
> Is the minimum overlap length too low? You could spot check some overlaps
> to see what the longest overlap is. You might be able to get away with,
> say, a minimum overlap length of 64 bases.

How do I do this precisely? I tried running some commands like this:

~/src/wgs-August2/Linux-amd64/bin/overlapStore -p 375000001 51xillumina_26x454_bac-ends_bog.ovlStore 51xillumina_26x454_bac-ends_bog.gkpStore OBTINITIAL

Output:

375000001 A: 1 0 --------------------------------------------------------------------------------------------------->
288037584 A: 0 1975 ( -1) B: 124 2047 ( -1) 0.00% +124> +-2048
283419705 A: 0 1976 ( -1) B: 124 2047 ( -1) 3.45% +124> +-2048
Bus error (core dumped)

~/src/wgs-August2/Linux-amd64/bin/overlapStore -p 250000000 51xillumina_26x454_bac-ends_bog.ovlStore 51xillumina_26x454_bac-ends_bog.gkpStore OBTINITIAL

Output:

DUMPING PICTURE for ID 250000000 in store 51xillumina_26x454_bac-ends_bog.ovlStore (gkp 51xillumina_26x454_bac-ends_bog.gkpStore clear OBTINITIAL)
250000000 A: 1 0 --------------------------------------------------------------------------------------------------->
362293433 A: 0 1912 ( -1) B: 80 2047 ( -1) 0.00% +80> +-2048
308391868 A: 0 1913 ( -1) B: 60 2047 ( -1) 0.00% +60> +-2048
117489298 A: 0 1914 ( -1) B: 54 2047 ( -1) 0.00% +54> +-2048
231078346 A: 0 1917 ( -1) B: 51 2047 ( -1) 0.00% +51> +-2048
92028512 A: 0 1922 ( -1) B: 37 2047 ( -1) 0.00% +37> +-2048
Bus error (core dumped)

That does not look like what I expected (from
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=OvlStore).

I also ran with this:

~/src/wgs-August2/Linux-amd64/bin/overlapStore -d 51xillumina_26x454_bac-ends_bog.ovlStore -b 375000000 -e 375000000

That gave me 5596 overlaps, so I guess it was a bad choice of fragment.
If I choose another fragment (454 shotgun read):

~/src/wgs-August2/Linux-amd64/bin/overlapStore -d 51xillumina_26x454_bac-ends_bog.ovlStore -b 400000001 -e 400000001

I get 153 overlaps. Some of them:

400000001 3314999 I 59 -104 0.00 0.00
400000001 4090114 I 42 -121 0.00 0.00
400000001 5386505 I 59 -104 0.00 0.00
400000001 9054281 I 171 8 2.17 2.17
400000001 25877453 I 11 -170 0.00 0.00
400000001 34584423 I 195 32 2.94 2.94
400000001 35861704 N 221 26 2.38 2.38
400000001 36059151 N 168 5 2.11 2.11
400000001 36573727 N 10 -184 0.00 0.00
400000001 37990829 N 175 12 1.14 1.14
400000001 39350033 I 59 -104 0.00 0.00
400000001 39934425 N 218 39 4.44 4.44
400000001 41436906 I 133 -46 0.00 0.00
400000001 41776173 N 168 5 2.11 2.11
400000001 42439341 I 211 45 1.92 1.92
400000001 42876704 N 189 26 1.35 1.35
400000001 44051983 I 103 -60 0.00 0.00
400000001 47862199 N 2 -184 0.00 0.00
400000001 48017281 N 202 39 3.28 3.28
400000001 48875374 N 168 5 2.11 2.11
400000001 51196018 I 130 -58 0.00 0.00
400000001 54522446 N 205 39 3.45 3.45
400000001 56546126 I 72 -91 0.00 0.00
400000001 56653818 N 89 -74 0.00 0.00
400000001 66204913 I 5 -158 0.00 0.00
400000001 67154193 I 171 8 2.17 2.17
<snip>
400000001 367185391 I 220 225 4.65 4.65
400000001 367771473 I 0 18 0.76 0.76
400000001 371107629 N -13 55 0.76 0.76
400000001 372107538 N 0 -31 0.00 0.00
400000001 372408998 N 152 347 0.90 0.90
400000001 377636043 N 0 54 0.38 0.38
400000001 377646282 N 0 53 0.38 0.38
400000001 377655226 N 99 72 0.61 0.61
400000001 377896151 N 0 -96 0.00 0.00
400000001 377911031 N 0 57 0.38 0.38
400000001 383651644 N 189 336 1.35 1.35
400000001 383688429 N 192 109 1.41 1.41
400000001 383754638 N 189 338 1.35 1.35
400000001 383766287 N 189 338 1.35 1.35
400000001 383932105 I 99 345 0.61 0.61
400000001 385033058 I 69 124 1.55 1.55
400000001 387893078 I 0 88 0.76 0.76
400000001 387946865 I 0 88 0.76 0.76
400000001 398390290 I 205 489 1.72 1.72
400000001 398406551 I 214 449 4.08 4.08
400000001 398998877 I 99 368 0.61 0.61
400000001 399078177 I 153 368 0.91 0.91
400000001 405995246 N 166 387 1.03 1.03
400000001 409651281 I 0 63 0.38 0.38
400000001 409656569 I 79 63 0.54 0.54
400000001 409813283 N 0 48 0.38 0.38
400000001 410422960 I 0 -95 0.00 0.00

But I can't see the overlap length from that, can I? I guess I could get the
read length for each read and test though. The error rate seems to be mostly
below 4 %, so setting the limit to 4 % might not help much (just a little
bit).

For another read (Illumina from a MP library), 66 overlaps:

~/src/wgs-August2/Linux-amd64/bin/overlapStore -d 51xillumina_26x454_bac-ends_bog.ovlStore -b 40000000 -e 40000000

40000000 18888403 N 3 7 0.00 0.00
40000000 19154662 I -22 -37 3.39 3.39
40000000 42344478 N 15 7 0.00 0.00
40000000 53346801 I -51 -50 4.35 4.35
40000000 61865542 N 49 53 0.00 0.00
40000000 64381669 N 0 -7 2.25 2.25
40000000 65643742 N 41 26 0.00 0.00
40000000 66612098 I -53 -54 4.76 4.76
40000000 69287018 N -3 0 0.00 0.00
40000000 70886837 I -55 -51 4.44 4.44
40000000 73035047 N -46 -42 3.70 3.70
40000000 77363047 I 48 52 0.00 0.00
40000000 83770223 N 3 7 0.00 0.00
40000000 92992529 I -53 -49 4.26 4.26
40000000 93109879 N -25 -21 0.00 0.00
<snip>
40000000 317454557 N -121 -32 0.00 0.00
40000000 320220222 N -4 6 2.11 2.11
40000000 323660284 I -22 55 0.00 0.00
40000000 330106339 N -128 -47 0.00 0.00
40000000 330437997 I -48 37 2.08 2.08
40000000 333633813 I -4 31 0.00 0.00
40000000 350457623 N 34 139 0.00 0.00
40000000 366189823 I -10 -26 2.86 2.86
40000000 373715828 I -61 -48 0.00 0.00
40000000 380074140 I -51 -15 0.00 0.00
40000000 380125547 I -9 -9 2.30 2.30
40000000 383202147 N -18 -4 0.00 0.00
40000000 383480660 N 34 27 0.00 0.00
40000000 392470256 N -26 -31 3.08 3.08
40000000 395518457 N -2 -5 0.00 0.00
40000000 409319564 I -26 -54 4.76 4.76

There seem to be a lot of 0 % error overlaps (but I don't know the length).
Can I see from bogart's output how much of the overlaps it would have loaded?
Would it have loaded all 2.5 TB? I have 410,962,052 reads in total, and
bogart says 21,884,230,828 overlaps.

If I do choose to rerun the overlaps, will it run faster with more stringent
options (4 % error, minimum overlap length of 50 or maybe 64)?

Thank you for your help, once again.

Ole

>
> Is the error rate too high? Again spot checking, are there reads with no
> low-error overlaps? Maybe you can get away with only 4% error.
>
> b
>
>
>
> On 8/10/12 3:18 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>
>> Hi,
>> I ran classify on an Illumina mate pair library, and managed to use
>> one of the old versions of gatekeeper to dump the reads, so I guess
>> they were dumped as innie reads. I thought the library still was
>> outtie, and input that into an assembly. Now, after finishing
>> overlapper (using grid and grid version of overlapStoreBuild) I have a
>> ovlStore of 3.3 TB, so I'd rather not run that again if I can avoid
>> it.
>>
>> I see from this page:
>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Gatekeeper#Library
>> that there are some options in changing orientation of the library,
>> but only "innie" is supported it says. Do you have any suggestions of
>> what I can do? Would it not work changing the library to "outtie"?
>>
>> Thank you.
>>
>> Ole
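
A note on the overlap-length question above. This is only a minimal sketch, assuming the seven columns of the -d dump are A read ID, B read ID, orientation, a-hang, b-hang and two error-rate columns in percent, and assuming the usual hang convention: a positive a-hang means A has that many bases before B starts, and a negative b-hang means A has that many bases after B ends. Under those assumptions the overlap covers len(A) - max(a_hang, 0) - max(-b_hang, 0) bases of read A. The function names are made up for illustration, and the read length of 263 is a hypothetical value, not something taken from the store; real lengths would have to come from the gkpStore.

```python
from typing import Iterable, Iterator, NamedTuple


class Overlap(NamedTuple):
    a_id: int
    b_id: int
    orient: str   # orientation flag exactly as printed ('N' or 'I')
    a_hang: int
    b_hang: int
    erate: float  # last dump column, ASSUMED to be percent error


def parse_dump(lines: Iterable[str]) -> Iterator[Overlap]:
    """Parse rows shaped like the 'overlapStore -d' output quoted in this thread."""
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank lines, '<snip>' markers and similar
        yield Overlap(int(fields[0]), int(fields[1]), fields[2],
                      int(fields[3]), int(fields[4]), float(fields[6]))


def overlap_length_on_a(ovl: Overlap, a_len: int) -> int:
    """Bases of read A covered by the overlap, under the ASSUMED hang convention."""
    return a_len - max(ovl.a_hang, 0) - max(-ovl.b_hang, 0)


def passes(ovl: Overlap, a_len: int,
           min_len: int = 64, max_erate: float = 4.0) -> bool:
    """True if the overlap would survive the stricter thresholds discussed here."""
    return overlap_length_on_a(ovl, a_len) >= min_len and ovl.erate <= max_erate


# Four rows copied from the dump of read 400000001 above.
SAMPLE = """\
400000001 3314999 I 59 -104 0.00 0.00
400000001 9054281 I 171 8 2.17 2.17
400000001 39934425 N 218 39 4.44 4.44
400000001 371107629 N -13 55 0.76 0.76
"""

A_LEN = 263  # HYPOTHETICAL length of read 400000001, for illustration only

for ovl in parse_dump(SAMPLE.splitlines()):
    print(ovl.b_id, overlap_length_on_a(ovl, A_LEN), ovl.erate,
          passes(ovl, A_LEN))
```

The defaults of 64 bases and 4 % are just the values floated earlier in this thread. Running the same filter over dumps of a handful of reads, with their true lengths, would give a rough idea of what fraction of overlaps stricter settings would drop.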