From: Dan B. <dan...@gm...> - 2010-01-18 16:43:37
2010/1/6 Adam Phillippy <aph...@gm...>

> Hi Dan,
> There's a #define for increasing the memory limit. Mummer will definitely
> be faster.
>
>   make clean
>   make CPPFLAGS="-O3 -DSIXTYFOURBITS"
>
> For big assemblies, there are a couple of tricks I use. 1) Split the
> contigs up into batches and run the alignments in parallel on a cluster;
> this will also help with the memory overhead.

Regarding point 1, should I be splitting the reference contigs, the query
contigs, or both? So far I'm trying splitting the query contigs using
'fastasplitn', but I'm not sure whether this is helping to reduce the
memory usage of the job.

Cheers,
Dan.

> 2) Increase the minimum match length and cluster size, especially for
> very similar assemblies. 3) Turn on a uniqueness flag for mummer
> (e.g. -mum) to ignore the repeats; dnadiff uses -maxmatch by default.
>
> One or a combination of those will get the job done. It depends on the
> genomes you are comparing and whether you want to see the repeats or not.
>
> Best,
> -Adam
>
> On Wed, Jan 6, 2010 at 5:58 AM, Dan Bolser <dan...@gm...> wrote:
>
>> Hi,
>>
>> I want to run MUMmer (dnadiff, actually) to compare two eukaryotic
>> genome assemblies, both approximately 700 Mbp. When I start dnadiff, I
>> get the following error:
>>
>>   mummer: suffix tree construction failed: textlen=727301215 larger
>>   than maximal textlen=536870908
>>
>> Looking at the file where this value is set (MAXTEXTLEN in
>> .../MUMmer3.22/src/kurtz/streesrc/streehuge.h) makes me think that it
>> isn't trivial to just tweak the value of this parameter. Can it be
>> easily enlarged, or is this essentially the limit of the algorithm?
>>
>> I tried using the "memory efficient ... drop-in replacement for
>> mummer" that is described here:
>>
>>   http://compbio.cs.princeton.edu/mems/
>>
>> It runs (via dnadiff); however, it's very slow. I'm seeing output like
>> this:
>>
>>   ...
>>   # P.length()=133139
>>   # PGSC0002DMS000000743
>>   # P.length()=356445
>>   # PGSC0002DMS000000744
>>   # P.length()=193677
>>   ...
>>
>> where the "PGSC0002DMSnnnnnnnnn" part is the ID of one scaffold in the
>> assembly. Currently we are up to ~750 scaffolds (after about 8 hours
>> of processing) out of a total of ~68,000 scaffolds... This is clearly
>> too slow. Will the 'original MUMmer' be any faster if I can get it to
>> run?
>>
>> The algorithm is still producing an "out.mgaps" file, currently 7.2 GB.
>>
>> Is this scale of problem beyond what MUMmer (or dnadiff) can handle?
>> Are there any alternative solutions for comparing two assemblies of
>> this magnitude?
>>
>> Thanks for any help, and please let me know if there is any more
>> information I can provide.
>>
>> Dan.
>>
>> _______________________________________________
>> AMOS-help mailing list
>> AMO...@li...
>> https://lists.sourceforge.net/lists/listinfo/amos-help