From: Serge K. <ser...@gm...> - 2015-06-22 23:19:23
Hi,

This is a limitation of BLASR/sawriter, which is used for the overlapping in hybrid correction. Because they use 32-bit indices, they can only support 4GB of sequence. You have to use a value <4GB for ovlHashBlockLength (your current spec file is 6GB, so reducing it to 4 or 3 will work). There are recommended parameters on the wiki page for large genomes:
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Correcting_Large_.28.3E_100Mbp.29_Genomes_.28Using_high-identity_data_or_CA_8.1.29

However, I would advise against using hybrid correction with a mammalian genome, especially on a single machine. It will be very slow. Instead, I’d recommend using only the PacBio data with the low coverage settings from the wiki page:
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly

It will be significantly faster than hybrid correction, and we’ve used as little as 18X to assemble 2GB+ genomes. The assembly will not be as contiguous as one from 50X+ coverage, but it should be reasonable.

Sergey

> On Jun 22, 2015, at 2:47 PM, Stephanie D'Souza <sd...@bu...> wrote:
>
> Hello,
>
> I have been trying to run PBcR on a mammalian genome with hybrid data (~54x of Illumina HiSeq and ~24x of PacBio) on a 64 core, 512GB machine, and get the following error:
>
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 1 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 2 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 3 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 4 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 5 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 6 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 7 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 8 FAILED.
>
> 8 overlap partitioning jobs failed.
>
> In other words, 8/9 partitioning jobs fail. When I go into the 1-overlapper directory, I find .hash.err files that all say the same thing; here is a representative file:
>
> ERROR! Reading fasta files greater than 4Gbytes is not supported.
> Command exited with non-zero status 1
> 0.00user 0.00system 0:00.01elapsed 0%CPU (0avgtext+0avgdata 2032maxresident)k
> 0inputs+8outputs (0major+153minor)pagefaults 0swaps
>
> It looks like my PacBio data is being partitioned into 9 files of size 5.7GB each, except for the 9th file which is under 4 GB in size; thus only 8 jobs fail. Why should the size of the file matter? Should I change a .spec file parameter to correct this? (The .spec file I used is attached.) I'd appreciate any help on this.
>
> Thanks very much,
> Stephanie
>
> --
> Stephanie D'Souza
> MD/PhD Program, PhD Year 1
> Kepler Lab
> Department of Microbiology
> Boston University School of Medicine
> L519 - 72 E Concord St.
> Boston, MA 02118
>
> <newPBcR_6-4-15.spec.txt>
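To illustrate the 4GB limit discussed in this thread, here is a minimal Python sketch for checking which partition files would trip the "Reading fasta files greater than 4Gbytes is not supported" error before rerunning. It is not part of PBcR. The function name `oversized_files` is hypothetical, and the threshold of 2**32 bytes is an assumption based on the 32-bit indices mentioned above; the exact cutoff inside sawriter may differ.

```python
import os

# Assumption: the limit corresponds to 32-bit indexing, i.e. 2**32 bytes.
# The real threshold used by sawriter is not confirmed here.
LIMIT_BYTES = 2**32


def oversized_files(paths, limit=LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than `limit`."""
    too_big = []
    for path in paths:
        size = os.path.getsize(path)
        if size > limit:
            too_big.append((path, size))
    return too_big
```

Passing the paths of the partition files in the 1-overlapper directory should list the ones that are too large. If any are reported, lowering ovlHashBlockLength in the spec file, as suggested in the reply, is the fix to try.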