Re: [wgs-assembler-users] CA on BlueGene server at Rice University

Poking around a bit too, it looks like BlueGene/P only supports MPI, which the assembler doesn't use.  The assembler needs SGE (or PBS or LSF) to run independent threaded jobs.

BlueGene/Q has 16 threads and 16 GB per node.  This is a better match to assembler workloads.  It'll still need some kind of batch scheduler.

b

-----Original Message-----
From: Serge Koren [mailto:ser...@gm...] 
Sent: Tuesday, June 02, 2015 2:48 PM
To: Brian Walenz
Cc: wgs...@li...; Elton Vasconcelos
Subject: Re: [wgs-assembler-users] CA on BlueGene server at Rice University

Looking at the system description, assuming this is the system:
http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/

The cores are 32-bit, which would limit your processes to 4GB and wouldn’t work well for assembly. Plus, we haven’t compiled or tested the assembler on 32-bit platforms in years, so I don’t think it’s worth your time to try to compile it there.

How big is your genome? A 70X human correction takes about 8K CPU hours to generate corrected reads (is that what you mean by pre-assemble?), and total runtime (correction plus assembly) is about 50K CPU hours. So with 2GHz CPUs and almost 1000 cores, a full run should take closer to a week, not 30 days (there is some overhead, so not all steps would use all your cores).

The assembler supports multiple grid systems (SGE, LSF, PBS) with a shared filesystem, and I see Rice University has some clusters available, so I’d recommend using one of those rather than recompiling.
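
As a rough sanity check of the runtime estimate above, the conversion from CPU hours to wall-clock time can be sketched as follows. The 50K CPU-hour total and the 800-core count come from this thread; the function name and the `efficiency` parameter (the average fraction of cores kept busy) are illustrative assumptions, and the 0.4 value is only a placeholder for scheduling overhead, not a measured number.

```python
def wall_clock_days(cpu_hours, cores, efficiency=1.0):
    """Convert a CPU-hour budget into wall-clock days.

    efficiency is a hypothetical knob: the average fraction of the
    cores that are actually busy (1.0 means perfect scaling).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    busy_cores = cores * efficiency
    return cpu_hours / busy_cores / 24.0


if __name__ == "__main__":
    total_cpu_hours = 50_000  # correction + assembly, figure quoted in this thread
    cores = 800               # core count from the original question

    # Ideal case: every core busy for the whole run.
    print(f"perfect scaling: {wall_clock_days(total_cpu_hours, cores):.1f} days")

    # Assumed 40% average utilization (placeholder, not a measurement).
    print(f"40% utilization: {wall_clock_days(total_cpu_hours, cores, 0.4):.1f} days")
```

With perfect scaling this comes to roughly 2.6 days; at the assumed 40% utilization it is about 6.5 days, which is in line with "closer to a week" rather than 30 days.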

> On Jun 2, 2015, at 11:11 AM, Brian Walenz <th...@gm...> wrote:
> 
> Nope, not on our mind.  A lack of access is the primary problem, followed closely by a lack of time and a lack of demand.
> 
> There isn't anything fancy or gcc-specific in the code though.  It does compile with clang (without thread support, but that's clang's fault).  Mucking with c_make.as might be all that is needed.  To start, try copying the 'Darwin' section, and changing the OSTYPE test to whatever 'uname' reports, and the compiler to icc.  Icc will probably complain about ARCH_CFLAGS, so might as well get rid of all of those.  -O (optimize) is pretty generic.
> 
> b
> 
> 
> 
> On Tue, Jun 2, 2015 at 9:41 AM, Elton Vasconcelos <elt...@iq...> wrote:
> Hello folks,
> 
> I wonder whether it is on the CA developers' minds to produce a wgs-assembler version that is compatible with IBM compilers, so we could run it on the BG/P and/or BG/Q supercomputers at Rice University.
> I am trying to pre-assemble my target genome sequenced by PacBio technology. By my calculations, I am gonna need 800 CPUs to run it in 30 days.
> 
> Thanks in advance for your attention,
> Cheers,
> Elton
> 
> --
> Elton Vasconcelos, DVM, PhD
> Post-doc at Verjovski-Almeida Lab
> Department of Biochemistry - Institute of Chemistry
> University of Sao Paulo, Brazil
> 
> 
> ------------------------------------------------------------------------------
> 
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
> 
> 