From: Brian W. <th...@gm...> - 2015-06-03 01:13:06
That's an old page. The most recent page, linked from
http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page, is:

http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR
(look for 'self correction')

I've run Drosophila on my 12-core development machine in a few hours to
overnight (I haven't timed it). Sergey replaced blasr with a much much
faster algorithm, and that was where most of the time was spent.

b

On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq...> wrote:

> Thanks for the hints, Brian!
>
> We'll try everything you suggested tomorrow, back in the lab.
> Then I'll tell you what we got.
> For now, I only wanna say that our main concern, instead of running runCA
> itself, is gonna be with the pre-assembly (correction) step, running
> PacBiotoCA and PBcR pipeline that are embedded in the wgs package.
> Please take a look at the following strategy to assemble the Drosophila
> genome sequenced by PacBio technology (which presents a high error rate on
> the base calling, ~15%) at CBCB in Maryland :
> http://cbcb.umd.edu/software/PBcR/dmel.html
> They mentioned 621K CPU hours to correct that genome of ~122 Mb.
> Our organism genome is something like 380 Mb long. Three times
> Drosophila's one.
> Well, just to let you know again! ;-)
>
> Talk to you later,
> Thanks again.
> Good night!
> Elton
>
> 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>:
>
>> For the link problems - all those symbols come out of the kmer package.
>> Check that the flags and compilers and whatnot are compatible with those in
>> wgs-assembler.
>>
>> The kmer configuration is a bit awkward. A shell script (configure.sh)
>> dumps a config to Make.compilers, which is read by the main Makefile.
>> 'gmake real-clean' will remove the previous build AND the Make.compilers
>> file. 'gmake' by itself will first build a Make.compilers by calling
>> configure.sh, then continue on with the build. The proper way to modify
>> this is:
>>
>>   edit configure.sh
>>   gmake real-clean
>>   gmake install
>>   repeat until it works
>>
>> In configure.sh, there is a block of flags for Linux-amd64. I think
>> it'll be easy to apply the same changes made for wgs-assembler.
>>
>> After rebuilding kmer, the wgs-assembler build should need to just link
>> -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do
>> 'gmake clean' here! You might need to remove the dependency directory
>> 'dep' too.
>>
>>
>> For running - the assembler will emit an SGE submit command to run a
>> single shell script on tens-to-hundreds-to-thousands of jobs. Each job
>> will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is
>> faster, fewer is slower). If you can figure out how to run jobs of the
>> form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N"
>> on BG/Q you're most of the way to running CA. To make it output such a
>> submit command, supply "useGrid=1 scriptOnGrid=0" to runCA.
>>
>> The other half of the assembler will be either large I/O or large
>> memory. If you've got access to a machine with 256gb and 32 cores you
>> should be fine. I don't know what a minimum usable machine size would be.
>>
>> So, the flow of the computation will be:
>>
>>   On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 ....
>>   Wait for it to emit a submit command
>>   Launch those jobs on BG/Q
>>   Wait for those to finish
>>   Relaunch runCA on the 256gb machine. It'll check that the job outputs
>>   are complete, and continue processing, probably emitting another submit
>>   command, so repeat.
>>
>> Historical note: back when runCA was first developed, we had a DEC Alpha
>> Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2
>> CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a
>> different architecture anyway, so we had to run CA this way. It was a real
>> chore. We're all spoiled with our 4 core 8gb laptops now...
>>
>> b
>>
>> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...>
>> wrote:
>>
>>> Thanks Brian, Serge and Huang,
>>>
>>> We've gone through fixing several error messages during the compilation
>>> within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package.
>>> At the end of the day we stopped on "undefined reference" errors on
>>> static libraries (mainly libseq.a, please see make_progs.log file).
>>>
>>> The 'gmake install' command within the kmer/ dir ran just fine.
>>>
>>> The following indicates BGQ OS type:
>>> [erv3@bgq-fn src]$ uname -a
>>> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10
>>> 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux
>>>
>>> We also had to edit c_make.as file, adding some -I options (to indicate
>>> paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section.
>>>
>>> Running "make objs" and "make libs" separately, everything appeared to
>>> work fine (see attached files make_objs.log and make_libs.log).
>>> The above mentioned trouble came up on the "make progs" final command we
>>> ran (make_progs.log file).
>>>
>>> Well, just to let you guys know and to see whether some light can be
>>> shed.
>>>
>>> Thanks a lot,
>>> Cheers,
>>> Elton
>>>
>>> PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do
>>> you think it isn't worthwhile keeping the attempt to install CA on BGQ?
>
> --
> Elton Vasconcelos, DVM, PhD
> Post-doc at Verjovski-Almeida Lab
> Department of Biochemistry - Institute of Chemistry
> University of Sao Paulo, Brazil
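
For anyone trying the "command.sh 1", "command.sh 2", ..., "command.sh N" pattern described in the quoted message before wiring it into a real scheduler, a rough local sketch in Python follows. It is only an illustration and not part of runCA: the names build_commands, run_all and max_parallel are made up here, and on BG/Q the jobs would go through that machine's own scheduler rather than a local thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(script, n_jobs):
    """Return one argv list per job: [script, "1"] ... [script, str(n_jobs)].

    'script' stands in for the shell script named in the submit command
    that runCA emits (hypothetical path, supplied by the caller).
    """
    return [[script, str(i)] for i in range(1, n_jobs + 1)]


def run_all(commands, max_parallel=4):
    """Run the commands, at most max_parallel at a time.

    Returns the 1-based positions of the commands that exited non-zero,
    so they can be relaunched before runCA is restarted.
    """
    def run_one(argv):
        # Each job is an independent process; only the exit code is kept.
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(run_one, commands))

    return [i for i, code in enumerate(codes, start=1) if code != 0]
```

Something like `failed = run_all(build_commands("./command.sh", 200))` would then list the job indices worth rerunning. As the message says, runCA checks the job outputs itself when relaunched, so this only covers the "launch those jobs and wait" step.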