From: Walenz, B. <bw...@jc...> - 2012-11-02 19:23:48
On 11/1/12 2:07 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> On 1 November 2012 17:42, Quan, Xueping <x....@im...> wrote:
>> My assembly is now in the unitigger stage. In the 4-unitigger/, I saw three
>> files:
>> 3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
>> 12G 2012-10-29 11:41 HX.fragmentInfo
>> 75K 2012-10-29 22:32 unitigger.err
>>
>> The bestoverlapgraph.log stopped updating for nearly two days now while the
>> unitigger (bogart) keep running occupying 100%cpu and 624gb memory but
>> producing no result.
>> Below is the last few lines of the HX.001.bestoverlapgraph.log
>> "BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
>> BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
>> BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
>> BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
>> BestOverlapGraph()-- fra"
>> Was there something going wrong and bogart not really running (though seems
>> running in top command) or it is working but not no output yet. If it is the
>> second case, how long usually unitigger take to finish or output further
>> result?
>
> I'm not completely certain, but I think it can run for a bit some
> times. Bogart can use quite a long time, a couple of weeks depending
> on your amount of data and genome. It seems that you are using CA 7.0,
> if you use the CVS version you can take advantage of all those 64 CPUs
> you have, and complete the bogart stage much, much quicker than if you
> use only one CPU (which is the case for CA 7.0).
>
> Ole

The step after this is to examine those 625gb of overlaps and pick out the best for each end of each read. There isn't any logging until it finishes this, at which time it will dump the overlaps to 'best.contains', 'best.edges', and 'best.singletons' files. I don't remember this taking days to finish though.

As for speed, on a large 3gb fish with ~1.5 billion reads, bogart took around 10 days.
With the parallelization it was down to less than 2 days.

If you do choose to restart with the CVS version, you can build the overlap graph on disk, then load it for the unitig computation. Building the overlap graph involves a lot of I/O and is the slower part. Add the option '-create' to the bogart command; it will stop after the graph is built. Then you can restart bogart (without '-create') to do the computation.

b
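
For what it's worth, the two-pass restart described above could be scripted roughly as follows. This is only a sketch, not something taken from the assembler itself: the helper names `two_phase_commands` and `run_two_phase` are made up for illustration, the placeholder argument stands in for whatever bogart options you already use, and appending '-create' at the end of the command assumes that option order does not matter.

```python
import subprocess
from typing import List, Sequence


def two_phase_commands(bogart_cmd: Sequence[str]) -> List[List[str]]:
    """Return [build_cmd, compute_cmd] for an existing bogart command line.

    build_cmd   = the same command with '-create' added (graph is built, then it stops)
    compute_cmd = the same command without '-create' (graph is loaded, unitigs computed)

    Hypothetical helper: it only rearranges the arguments it is given and
    knows nothing about bogart's other options.
    """
    # Drop any '-create' already present so both commands are well defined.
    base = [arg for arg in bogart_cmd if arg != "-create"]
    # Assumption: '-create' may be appended after the other options.
    build = base + ["-create"]
    return [build, base]


def run_two_phase(bogart_cmd: Sequence[str], dry_run: bool = True) -> None:
    """Run the build pass, then the compute pass, stopping if either fails."""
    for cmd in two_phase_commands(bogart_cmd):
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so the compute pass never starts after a failed build pass.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder command: replace the second element with your real options.
    run_two_phase(["bogart", "<your-existing-options>"], dry_run=True)
```

Running it with dry_run=True only prints the two command lines, so they can be checked by eye before anything is started on the real data.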