Re: [wgs-assembler-users] CA on BlueGene server at Rice University

That's an old page.  The most recent page, linked from
http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page, is:

http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR

(look for 'self correction')

I've run Drosophila on my 12-core development machine in a few hours to
overnight (I haven't timed it).  Sergey replaced blasr with a much, much
faster algorithm, and blasr was where most of the time was spent.

b

On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq...>
wrote:

> Thanks for the hints, Brian!
>
> We'll try everything you suggested tomorrow, back in the lab.
> Then I'll tell you what we got.
> For now, I only wanna say that our main concern is not running runCA
> itself but the pre-assembly (correction) step: running the pacBioToCA
> and PBcR pipelines that are embedded in the wgs package.
> Please take a look at the following strategy, used at CBCB in Maryland, to
> assemble the Drosophila genome sequenced with PacBio technology (which has
> a high base-calling error rate, ~15%):
> http://cbcb.umd.edu/software/PBcR/dmel.html
> They mentioned 621K CPU hours to correct that genome of ~122 Mb.
> Our organism's genome is something like 380 Mb long, about three times the
> size of Drosophila's.
> Well, just to let you know again! ;-)
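
To put the figures quoted above side by side, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that correction cost scales linearly with genome size; the real cost also depends on coverage, repeat content and the overlap algorithm, so treat the result as an order-of-magnitude guess. The function names are hypothetical.

```python
def linear_cpu_hours(ref_cpu_hours, ref_genome_mb, target_genome_mb):
    """Scale a reference CPU-hour figure linearly by genome size (an assumption)."""
    return ref_cpu_hours * target_genome_mb / ref_genome_mb


def wall_clock_days(cpu_hours, cores):
    """Convert CPU hours to wall-clock days, assuming perfect scaling over `cores`."""
    return cpu_hours / cores / 24


# Figures from the message: 621K CPU hours for ~122 Mb, target genome ~380 Mb.
estimate = linear_cpu_hours(621_000, 122, 380)
print(f"size ratio: {380 / 122:.2f}x, naive estimate: {estimate:,.0f} CPU hours")
```

Under that linear assumption the 380 Mb genome would need roughly 1.9 million CPU hours with the pipeline described on that page.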
>
> Talk to you later,
> Thanks again.
> Good night!
> Elton
>
> 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>:
>
>> For the link problems - all those symbols come out of the kmer package.
>> Check that the flags and compilers and whatnot are compatible with those in
>> wgs-assembler.
>>
>> The kmer configuration is a bit awkward.  A shell script (configure.sh)
>> dumps a config to Make.compilers, which is read by the main Makefile.
>> 'gmake real-clean' will remove the previous build AND the Make.compilers
>> file.  'gmake' by itself will first build a Make.compilers by calling
>> configure.sh, then continue on with the build.  The proper way to modify
>> this is:
>>
>> edit configure.sh
>> gmake real-clean
>> gmake install
>> repeat until it works
>>
>> In configure.sh, there is a block of flags for Linux-amd64.  I think
>> it'll be easy to apply the same changes made for wgs-assembler.
>>
>> After rebuilding kmer, the wgs-assembler build should need to just link
>> -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do
>> 'gmake clean' here!  You might need to remove the dependency directory
>> 'dep' too.
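
The rebuild sequence described above can be written down as a dry-run plan. This is only a sketch: it builds the list of steps and prints them without executing anything. The directory arguments are placeholders, the location of 'dep' next to 'bin' is an assumption, and so is the final plain 'gmake' in src/ to relink.

```python
def rebuild_plan(kmer_dir, wgs_dir, build_subdir="Linux-amd64", remove_dep=False):
    """Return (working_directory, argv) steps for rebuilding kmer, then relinking.

    Nothing is executed here; the caller decides whether to run the steps.
    """
    steps = [
        # Removes the previous kmer build AND Make.compilers.
        (kmer_dir, ["gmake", "real-clean"]),
        # Regenerates Make.compilers via configure.sh, then builds and installs.
        (kmer_dir, ["gmake", "install"]),
        # Remove only the linked binaries; do NOT run 'gmake clean' here.
        (wgs_dir, ["rm", "-rf", f"{build_subdir}/bin"]),
    ]
    if remove_dep:
        # Assumed location of the dependency directory.
        steps.append((wgs_dir, ["rm", "-rf", f"{build_subdir}/dep"]))
    # Assumed relink step: a plain gmake in the src directory.
    steps.append((f"{wgs_dir}/src", ["gmake"]))
    return steps


for cwd, argv in rebuild_plan("kmer", "wgs-assembler", remove_dep=True):
    print(f"(cd {cwd} && {' '.join(argv)})")
```

Editing configure.sh stays a manual step before the plan is run, and the whole thing is repeated until the link succeeds.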
>>
>>
>> For running - the assembler will emit an SGE submit command to run a
>> single shell script on tens-to-hundreds-to-thousands of jobs.  Each job
>> will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is
>> faster, fewer is slower).  If you can figure out how to run jobs of the
>> form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on
>> BG/Q you're most of the way to running CA.  To make it output such a
>> submit command, supply "useGrid=1 scriptOnGrid=0" to runCA.
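
As an illustration of the "command.sh 1" ... "command.sh N" pattern, here is a minimal sketch that runs such jobs on a single machine with a bounded number of workers. It is a stand-in for a real scheduler, not how SGE or BG/Q would launch them; the script name, the worker count and the helper names are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(script, n_jobs):
    """Return one argv list per job: [script, "1"], ..., [script, "N"]."""
    return [[script, str(i)] for i in range(1, n_jobs + 1)]


def run_all(commands, max_workers=4, runner=None):
    """Run every command, at most max_workers at once; return exit codes in job order."""
    if runner is None:
        # Default runner: launch the command and report its exit code.
        runner = lambda argv: subprocess.run(argv).returncode
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


def failed_jobs(exit_codes):
    """Return the 1-based indices of jobs whose exit code was nonzero."""
    return [i for i, code in enumerate(exit_codes, start=1) if code != 0]


if __name__ == "__main__":
    # Show the commands only; pass them to run_all() to actually execute.
    for argv in build_commands("./command.sh", 5):
        print(" ".join(argv))
```

The job index is the only thing that differs between jobs, so any launcher that can pass an integer from 1 to N to the same script should do.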
>>
>> The other half of the assembler will be either large I/O or large
>> memory.  If you've got access to a machine with 256gb and 32 cores you
>> should be fine.  I don't know what a minimum usable machine size would be.
>>
>> So, the flow of the computation will be:
>>
>> On the 256gb machine:  runCA useGrid=1 scriptOnGrid=0 ....
>> Wait for it to emit a submit command
>> Launch those jobs on BG/Q
>> Wait for those to finish
>> Relaunch runCA on the 256gb machine.  It'll check that the job outputs
>> are complete, and continue processing, probably emitting another submit
>> command, so repeat.
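
Before relaunching runCA in the loop above, it can save a round trip to check on your own side which jobs left no output. The sketch below assumes a hypothetical naming scheme ("job-<index>.out" in one directory); the real output names depend on the runCA stage, so the pattern would need adjusting. runCA still does its own completeness check.

```python
from pathlib import Path


def missing_jobs(output_dir, n_jobs, pattern="job-{index}.out"):
    """Return the 1-based job indices whose output file is absent or empty.

    The file-name pattern is a placeholder, not runCA's actual naming.
    """
    out = Path(output_dir)
    missing = []
    for i in range(1, n_jobs + 1):
        path = out / pattern.format(index=i)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(i)
    return missing


if __name__ == "__main__":
    todo = missing_jobs(".", 10)
    print("jobs to rerun:", todo if todo else "none")
```

Only the indices it reports would need to be resubmitted before relaunching runCA on the large-memory machine.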
>>
>> Historical note: back when runCA was first developed, we had a DEC Alpha
>> Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2
>> CPU, 2gb, 32-bit Linux machines.  The Alpha wasn't in the grid, and a
>> different architecture anyway, so we had to run CA this way.  It was a real
>> chore.  We're all spoiled with our 4 core 8gb laptops now...
>>
>> b
>>
>>
>>
>>
>>
>>
>> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...>
>> wrote:
>>
>>> Thanks Brian, Serge and Huang,
>>>
>>> We worked through several compilation errors in the src/ dir of the
>>> latest wgs-8.3rc2.tar.bz2 package.
>>> At the end of the day we got stuck on "undefined reference" errors from
>>> static libraries (mainly libseq.a; please see the make_progs.log file).
>>>
>>> The 'gmake install' command within the kmer/ dir ran just fine.
>>>
>>> The following indicates BGQ OS type:
>>> [erv3@bgq-fn src]$ uname -a
>>> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10
>>> 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux
>>>
>>> We also had to edit the c_make.as file, adding some -I options (to indicate
>>> paths to library headers) to the CFLAGS fields in the "OSTYPE, Linux" section.
>>>
>>> Running "make objs" and "make libs" separately, everything appeared to
>>> work fine (see attached files make_objs.log and make_libs.log).
>>> The trouble mentioned above came up in the final "make progs" command we
>>> ran (make_progs.log file).
>>>
>>> Well, just to let you guys know and to see whether some light can be
>>> shed.
>>>
>>> Thanks a lot,
>>> Cheers,
>>> Elton
>>>
>>> PS: I also noted the point about the MPI cluster system on BGQ, Brian. So, do
>>> you think it isn't worthwhile to keep trying to install CA on BGQ?
>>>
>>>
>>>
>
>
> --
> Elton Vasconcelos, DVM, PhD
> Post-doc at Verjovski-Almeida Lab
> Department of Biochemistry - Institute of Chemistry
> University of Sao Paulo, Brazil
>
>