From: Christian D. <chr...@gm...> - 2015-07-23 15:14:05
|
Hi, I ran wgs-8.2beta until the assembler went idle on one of the overlap correction steps (frgcorr.sh). Apparently one of the early fragments didn't finish: frgcorr.sh for this fragment had been running for 15h and its log file contained only the first few rows, up to "### Using 20 pthreads". The assembler stopped the correction only a few fragments after the idle one. I killed the idle process and executed the frgcorr.sh command for this fragment manually. After that, I ran runCA again with the original command and immediately got the failure message: "gatekeeper failed to add fragments". As this didn't seem to work, I renamed the folder 3-overlapcorrection and ran runCA again, leading to the same error message. I thought that starting with the step before the error correction could work and ran:

/software/wgs-8.2beta/Linux-amd64/bin/overlapStoreBuild -o /cabog/CA/genome.ovlStore.BUILDING -g /cabog/CA/genome.gkpStore -M 8192 -L /cabog/CA/genome.ovlStore.list > /cabog/CA/genome.ovlStore.err 2>&1

The log file genome.ovlStore.err contains the following:

gkStore_open()-- ERROR! Incorrect element sizes; code and store are incompatible.
gkLibrary: store 216 code 216 bytes
gkPackedFragment: store 24 code 24 bytes
gkNormalFragment: store 48 code 48 bytes
gkStrobeFragment: store 48 code 48 bytes
AS_READ_MAX_NORMAL_LEN_BITS: store 16 code 18

Is it possible to restart the assembly at this point? What steps do I have to take to "rescue" the assembly results up to this point (23 days of calculation time)?

Thanks, Chris |
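For readers hitting the same wall: the mismatch reported above (AS_READ_MAX_NORMAL_LEN_BITS: store 16, code 18) means the overlapStoreBuild binary being run was compiled with a different read-length limit than the binary that built the gkpStore, i.e. two different wgs builds are being mixed. A minimal check, as a sketch (paths are the poster's; gatekeeper -dumpinfo appears elsewhere in this archive):

```
# Dump store info with the SAME bin/ directory that created the store;
# if this also reports "Incorrect element sizes", the binaries on PATH
# are from a different build than the one that made the store.
/software/wgs-8.2beta/Linux-amd64/bin/gatekeeper -dumpinfo /cabog/CA/genome.gkpStore
```

If that is the case, pointing runCA back at the original build's bin directory (rather than rebuilding the stores) is the first thing to try.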
From: Serge K. <ser...@gm...> - 2015-06-22 23:19:23
|
Hi, This is a limitation of BLASR/sawriter, which is used for the overlapping in hybrid correction. Due to 32-bit indices they can only support 4 GB of sequence. You have to use a value <4 GB for ovlHashBlockLength (your current spec file uses 6 GB, so reducing it to 4 or 3 will work). There are recommended parameters on the wiki page for large genomes: http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Correcting_Large_.28.3E_100Mbp.29_Genomes_.28Using_high-identity_data_or_CA_8.1.29

However, I would advise against using hybrid correction with a mammalian genome, especially on a single machine. It will be very slow. Instead, I'd recommend using only the PacBio data with the low coverage settings from the wiki page: http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly

It will be significantly faster than hybrid correction and we've used as little as 18X to assemble 2GB+ genomes. The assembly will not be as contiguous as it is from 50X+ but should be reasonable.

Sergey

> On Jun 22, 2015, at 2:47 PM, Stephanie D'Souza <sd...@bu...> wrote:
> Hello,
> I have been trying to run PBcR on a mammalian genome with hybrid data (~54x of Illumina HiSeq and ~24x of PacBio) on a 64 core, 512GB machine, and get the following error:
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 1 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 2 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 3 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 4 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 5 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 6 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 7 FAILED.
> ERROR: Overlap prep job /projectnb/keplab/sdsouza/PBcR_June2015//tempbat2newPBcR_6-4-15/1-overlapper/long_reads_part 8 FAILED.
> 8 overlap partitioning jobs failed.
> In other words, 8/9 partitioning jobs fail. When I go into the 1-overlapper directory, I find .hash.err files that all say the same thing; here is a representative file:
> ERROR! Reading fasta files greater than 4Gbytes is not supported.
> Command exited with non-zero status 1
> 0.00user 0.00system 0:00.01elapsed 0%CPU (0avgtext+0avgdata 2032maxresident)k
> 0inputs+8outputs (0major+153minor)pagefaults 0swaps
> It looks like my PacBio data is being partitioned into 9 files of size 5.7GB each, except for the 9th file which is under 4 GB in size; thus only 8 jobs fail. Why should the size of the file matter? Should I change a .spec file parameter to correct this? (The .spec file I used is attached.) I'd appreciate any help on this.
> Thanks very much,
> Stephanie
> --
> Stephanie D'Souza
> MD/PhD Program, PhD Year 1
> Kepler Lab
> Department of Microbiology
> Boston University School of Medicine
> L519 - 72 E Concord St.
> Boston, MA 02118
> <newPBcR_6-4-15.spec.txt>
> ------------------------------------------------------------------------------
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
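Concretely, Sergey's fix is a one-line spec-file change; the exact value below is a sketch following his suggestion of reducing the block to 3 or 4 GB:

```
# before: 6 GB hash blocks exceed the 4 GB (32-bit index) limit of
# BLASR/sawriter, so 8 of the 9 partitions fail
#ovlHashBlockLength = 6000000000

# after: keep each hash block below 4 GB of sequence
ovlHashBlockLength = 3000000000
```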
From: Stephanie D'S. <sd...@bu...> - 2015-06-22 18:47:54
|
#PBcR options
genomeSize=2100000000
maxCoverage=50
maxGap = 1500
blasr=-noRefineAlign -advanceHalf -noSplitSubreads -minMatch 10 -minPctIdentity 70 -bestn 24 -nCandidates 24
# original asm settings
merSize = 14
#merylMemory = 128000
#merylThreads = 32
# ovlStoreMemory = 8192
# grid info
ovlHashBits = 27
# ovlThreads = 32
ovlHashBlockLength = 6000000000
ovlRefBlockLength = 1000000000
ovlRefBlockSize = 0
# for mer overlapper
merCompression = 1
merOverlapperSeedBatchSize = 500000
merOverlapperExtendBatchSize = 250000
#frgCorrThreads = 2
#frgCorrBatchSize = 100000
#ovlCorrBatchSize = 100000
#akshaya's
ovlStoreMemory=256000
ovlThreads=45
ovlCorrConcurrency=20
ovlCorrBatchSize=3000000
merylMemory=256000
merylThreads=45
frgCorrThreads=48
frgCorrBatchSize=3000000
frgCorrConcurrency=20
/archive/Steph/lane1.frg
/archive/Steph/lane2.frg
/archive/Steph/lane3.frg |
From: Serge K. <ser...@gm...> - 2015-06-19 22:03:33
|
Hi, Fundamentally, it's the same approach as for non-low-coverage data. It computes overlaps with MHAP using more sensitive parameters than the default (same k-mer size but larger sketch size, which decreases your chance of missing an overlap). The PBDAGCON step creates the consensus, but the threshold to keep a base is reduced from the default of 4-fold to 2-fold. Most bases have higher coverage than this, but the lowered threshold is what is responsible for keeping many more sequences at this lower coverage level while still trimming artifacts in the data when only a single sequence supports them. The worst sequences will remain relatively high error, but the median error rate of the sequences is significantly reduced, since the majority of them have more than 2-fold coverage correcting them. They may have some regions of remaining higher error. The assembly step will trim the corrected sequences again to remove any artifacts not filtered by the initial correction.

As far as I know, HGAP performs an all-vs-longest brute force alignment (using BLASR; this is why it's computationally expensive). There is an index built on the longest sequences, but the same is true for pretty much all methods (including PBcR, which uses the min-hash as its index). PBcR will use a similar approach and try to correct the longest 40X of data using all data (i.e. all data mapped to the longest 40X), but since you have less than 40X it's all-vs-all. PBcR will use partial overlaps when doing a correction (that is, part of a read is contained in part of another read, not only fully contained ones like the default in HGAP). There is a global filter which only allows each sequence to map to its best coverage positions, where best is based on size and identity.

Serge

> On Jun 18, 2015, at 10:39 AM, mic...@ip... wrote:
> Dear Serge,
> Could you give me some information about how PBcR does the error correction (especially for low coverage)? This might sound like a bold question but I have to ask, since I could not find any detailed information about it.
> I fed PBcR with 22x PacBio data of a 1.3 Gb genome (low coverage settings) and it returned 15x of error-corrected reads. This result is amazing (even when considering the quality to be "only" 97-98 instead of 99+).
> I know that overlaps are found using your MHAP aligner and that those overlaps are fed to PBDAGCON to create consensus, which then results in high-confidence base information for the whole sequence.
> Does PBcR (like HGAP) use long sequences as initial "references" for the alignments, or is it just brute-force all-against-all alignment, piling the overlaps up to find as many overlaps (coverage) per position as possible?
> Is there a lower coverage threshold to do consensus calling at a given position of the read?
> Those questions relate more to PBDAGCON, for which I could not find much information. Maybe you could point me to some information about PBDAGCON or briefly explain its settings in PBcR.
> Thank you,
> Michel |
From: <mic...@ip...> - 2015-06-18 14:55:58
|
Dear Serge, Could you give me some information about how PBcR does the error correction (especially for low coverage)? This might sound like a bold question but I have to ask, since I could not find any detailed information about it. I fed PBcR with 22x PacBio data of a 1.3 Gb genome (low coverage settings) and it returned 15x of error-corrected reads. This result is amazing (even when considering the quality to be "only" 97-98 instead of 99+). I know that overlaps are found using your MHAP aligner and that those overlaps are fed to PBDAGCON to create consensus, which then results in high-confidence base information for the whole sequence. Does PBcR (like HGAP) use long sequences as initial "references" for the alignments, or is it just brute-force all-against-all alignment, piling the overlaps up to find as many overlaps (coverage) per position as possible? Is there a lower coverage threshold to do consensus calling at a given position of the read? Those questions relate more to PBDAGCON, for which I could not find much information. Maybe you could point me to some information about PBDAGCON or briefly explain its settings in PBcR. Thank you, Michel |
From: Serge K. <ser...@gm...> - 2015-06-18 13:42:53
|
Hi, Sorry for the delay in replying. If you are trying to re-run with the sensitive low-coverage options, you need to start from scratch by either removing the existing results or using a new library name.

Serge

> On Jun 12, 2015, at 5:08 PM, Seth Munholland <mu...@uw...> wrote:
> Hi Serge,
> I double checked BLASR/PBDAGCON, kept the old stage 9 folder changed, upgraded to wgs8.3rc2 and tried to rerun it but still got the same "Will not overwrite" error. At this point the command line output is gone so I can't grep it. Would it be saved to a log file somewhere?
> Seth Munholland, B.Sc.
> Department of Biological Sciences
> Rm. 304 Biology Building
> University of Windsor
> 401 Sunset Ave. N9B 3P4
> T: (519) 253-3000 Ext: 4755
> On Thu, May 28, 2015 at 10:39 AM, Serge Koren <ser...@gm...> wrote:
> Yes, the 25X coverage is for after correction. However, 12X should be sufficient for a reasonable assembly, certainly not a tiny fraction of your genome like you're seeing. If you run
> gatekeeper -dumpinfo PI440795_Self_Assembled/asm.gkpStore
> That should give more info on what reads made it into the assembly.
>> On May 28, 2015, at 10:37 AM, Seth Munholland <mu...@uw...> wrote:
>> After correction I ended up with about 12x coverage. I presume the ~25x coverage suggested on the PBcR page is for after correction? I'll try the low-coverage parameters and double check BLASR/PBDAGCON next, thanks.
>> On Wed, May 27, 2015 at 6:56 PM, Serge Koren <ser...@gm...> wrote:
>> That most likely means you ended up with too little coverage for assembly after correction. You can check the coverage in the PI440795_Self_Assembled*.fastq files. If you're not already, I'd suggest using the low-coverage parameters on the wiki page:
>> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly
>> I'd also double-check that you have BLASR/PBDAGCON available in your path and that it is being used for assembly (in your tempPI440795_Self_Assembled/runPartition.sh file look for the word pbdagcon).
>>> On May 26, 2015, at 12:03 PM, Seth Munholland <mu...@uw...> wrote:
>>> Hi Serge,
>>> I looked into the 9-terminator folder and found the asm.utg.fasta file, but it's only 4.5MB (~0.007x coverage) when I started with ~33x coverage. Any suggestions for where to look for the data loss?
>>> On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm...> wrote:
>>> Hi,
>>> This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data failed, the restart did not work properly). If you grep for runCA in your command line output from your run and re-run the last command (it should have the library name as the -d option). That will re-create the 9-terminator directory and corresponding files. Unless you install the missing perl package, the qc generation will still fail, but it only contains statistics on the assembly; the asm.utg.fasta file should be your complete assembly.
>>> Serge
>>>> On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> wrote:
>>>> Hello Everyone,
>>>> I was running a PBcR through to assembly with nothing in my spec file except memory options, since I share the server. I got all the way to step 9 (terminator) when I got the following error:
>>>> ----------------------------------------START Tue May 26 02:18:24 2015
>>>> /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm
>>>> Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
>>>> BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
>>>> ----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds)
>>>> ERROR: Failed with signal INT (2)
>>>> The Cleaner has arrived. Doing 'none'.
>>>> ----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds)
>>>> A Google search tells me that I can try manually running asmOutputFasta to try and make the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and I might have to repeat step 9 in runCA; this is where I get stuck.
>>>> I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pick up a PBcR run? I tried specifying the directory, prefix, and spec file, and after changing to the hash memory options in my spec file I get:
>>>> Failure message:
>>>> no fragment files specified, and stores not already created
>>>> While trying to rerun the PBcR command again gives:
>>>> Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite.
>>>> Seth Munholland, B.Sc.
>>>> Department of Biological Sciences
>>>> Rm. 304 Biology Building
>>>> University of Windsor
>>>> 401 Sunset Ave. N9B 3P4
>>>> T: (519) 253-3000 Ext: 4755
>>>> ------------------------------------------------------------------------------
>>>> wgs-assembler-users mailing list
>>>> wgs...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
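For anyone else stuck at the same point: the restart command Serge refers to is plain runCA aimed at the existing assembly directory. A sketch using the directory and prefix from this thread (the spec-file name here is hypothetical):

```
# -d: assembly directory, -p: prefix used throughout the run
# (asm.gkpStore, asm.asm, ...); re-creates 9-terminator from the
# existing stores rather than restarting the whole assembly
runCA -d PI440795_Self_Assembled -p asm -s asm.spec
```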
From: Seth M. <mu...@uw...> - 2015-06-12 21:08:38
|
Hi Serge, I double checked BLASR/PBDAGCON, kept the old stage 9 folder changed, upgraded to wgs8.3rc2 and tried to rerun it but still got the same "Will not overwrite" error. At this point the command line output is gone so I can't grep it. Would it be saved to a log file somewhere?

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755

On Thu, May 28, 2015 at 10:39 AM, Serge Koren <ser...@gm...> wrote:
> Yes, the 25X coverage is for after correction. However, 12X should be sufficient for a reasonable assembly, certainly not a tiny fraction of your genome like you're seeing. If you run
> gatekeeper -dumpinfo PI440795_Self_Assembled/asm.gkpStore
> That should give more info on what reads made it into the assembly.
> On May 28, 2015, at 10:37 AM, Seth Munholland <mu...@uw...> wrote:
> After correction I ended up with about 12x coverage. I presume the ~25x coverage suggested on the PBcR page is for after correction? I'll try the low-coverage parameters and double check BLASR/PBDAGCON next, thanks.
> On Wed, May 27, 2015 at 6:56 PM, Serge Koren <ser...@gm...> wrote:
>> That most likely means you ended up with too little coverage for assembly after correction. You can check the coverage in the PI440795_Self_Assembled*.fastq files. If you're not already, I'd suggest using the low-coverage parameters on the wiki page:
>> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly
>> I'd also double-check that you have BLASR/PBDAGCON available in your path and that it is being used for assembly (in your tempPI440795_Self_Assembled/runPartition.sh file look for the word pbdagcon).
>> On May 26, 2015, at 12:03 PM, Seth Munholland <mu...@uw...> wrote:
>> Hi Serge,
>> I looked into the 9-terminator folder and found the asm.utg.fasta file, but it's only 4.5MB (~0.007x coverage) when I started with ~33x coverage. Any suggestions for where to look for the data loss?
>> On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm...> wrote:
>>> Hi,
>>> This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data failed, the restart did not work properly). If you grep for runCA in your command line output from your run and re-run the last command (it should have the library name as the -d option). That will re-create the 9-terminator directory and corresponding files. Unless you install the missing perl package, the qc generation will still fail, but it only contains statistics on the assembly; the asm.utg.fasta file should be your complete assembly.
>>> Serge
>>> On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> wrote:
>>> Hello Everyone,
>>> I was running a PBcR through to assembly with nothing in my spec file except memory options, since I share the server. I got all the way to step 9 (terminator) when I got the following error:
>>> ----------------------------------------START Tue May 26 02:18:24 2015
>>> /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm
>>> Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
>>> BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18.
>>> ----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds)
>>> ERROR: Failed with signal INT (2)
>>> The Cleaner has arrived. Doing 'none'.
>>> ----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds)
>>> A Google search tells me that I can try manually running asmOutputFasta to try and make the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and I might have to repeat step 9 in runCA; this is where I get stuck.
>>> I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pick up a PBcR run? I tried specifying the directory, prefix, and spec file, and after changing to the hash memory options in my spec file I get:
>>> Failure message:
>>> no fragment files specified, and stores not already created
>>> While trying to rerun the PBcR command again gives:
>>> Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite. |
From: Elton V. <elt...@iq...> - 2015-06-09 18:30:13
|
Thanks very much, Serge! I'll try your suggestions and we'll get back if it is necessary.

Cheers, Elton

2015-06-09 14:47 GMT-03:00 Serge Koren <ser...@gm...>:
> See my replies below inline.
> On Jun 9, 2015, at 1:29 PM, Elton Vasconcelos <elt...@iq...> wrote:
> Hi Brian and Serge,
> I forgot to tell you last week that I am not doing PacBio reads self-correction. Instead I'm doing hybrid assembly (26 SMRTcells plus 3 paired-end Illumina libraries).
> ### Question 1: ###
> Is it still worthwhile doing that on a single multi-thread machine? Because I've seen Sergey's comment that the pipeline is quite a bit slower when considering correction with Illumina reads (http://ehc.ac/p/wgs-assembler/mailman/message/33620582/)
> A single machine will likely take several weeks to run the correction with Illumina data for your size genome; I'd advise against it. Based on your genome size and # smrtcells, I'd guess you have around 30X pacbio, so I'd suggest trying the low coverage options for self-correction instead:
> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly
> It will still be significantly faster than the Illumina based correction. On a recent 20X assembly of a 2.4GB genome, self-correction ran about 50 times faster than illumina-based correction. You can try one of the more recent tools for hybrid correction (LoRDEC, proovread) though I haven't personally run them and can't say how much time they would require.
> I am now running CA on a single multi-thread server that has 80 threads and 1T RAM. It spent about 5 days on the "overlapInCore" step and I had to kill the process because of the server owner's complaint about too many threads being consumed for a long time period.
> ### Question 2: ###
> Could you explain why "overlapInCore" is using all available threads (80) instead of only the user's request (20)?
> My spec file is attached and I ran the following command:
> $ nohup /home/elton/wgs-8.3rc2/Linux-amd64/bin/pacBioToCA -l Test01 -threads 20 -shortReads -genomeSize 380000000 -s pacbio.spec -fastq 26-SMRTcells-filtered_subreads.fastq illumina-NEW.frg &
> Your spec file specifies:
> ovlThreads=20
> ovlConcurrency=20
> This means run 20 jobs each using 20 cores, thus it is really trying to use 400 cores on your system. If you set ovlConcurrency=1 it will use 20 cores.
> Thanks a lot again for your attention and support,
> Best,
> Elton
> 2015-06-03 11:43 GMT-03:00 Serge Koren <ser...@gm...>:
>> Yes, the latest CA 8.3 release can assemble D. melanogaster in < 700 CPU hours. You can see the updated timings here:
>> http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes
>> I've routinely run D. melanogaster on a 16-core, 32GB machine in less than a day (I haven't timed it exactly) so for your genome you're looking at 3-4K cpu hours. You should be able to run it on a single 16-core 32GB machine in a couple of days, so I think it's easiest to run it on a single largish machine you have access to.
>> Sergey
>> On Jun 2, 2015, at 9:12 PM, Brian Walenz <th...@gm...> wrote:
>> That's an old page. The most recent page, linked from http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page, is:
>> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR
>> (look for 'self correction')
>> I've run drosophila on my 12-core development machine in a few hours to overnight (I haven't timed it). Sergey replaced blasr with a much, much faster algorithm, and that was where most of the time was spent.
>> b
>> On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq...> wrote:
>>> Thanks for the hints, Brian!
>>> We'll try everything you suggested tomorrow, back in the lab. Then I'll tell you what we got. For now, I only wanna say that our main concern, instead of running runCA itself, is gonna be with the pre-assembly (correction) step, running PacBiotoCA and the PBcR pipeline that are embedded in the wgs package. Please take a look at the following strategy to assemble the Drosophila genome sequenced by PacBio technology (which presents a high error rate on the base calling, ~15%) at CBCB in Maryland: http://cbcb.umd.edu/software/PBcR/dmel.html
>>> They mentioned 621K CPU hours to correct that genome of ~122 Mb. Our organism's genome is something like 380 Mb long, three times Drosophila's.
>>> Well, just to let you know again! ;-)
>>> Talk to you later, thanks again. Good night! Elton
>>> 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>:
>>>> For the link problems - all those symbols come out of the kmer package. Check that the flags and compilers and whatnot are compatible with those in wgs-assembler.
>>>> The kmer configuration is a bit awkward. A shell script (configure.sh) dumps a config to Make.compilers, which is read by the main Makefile. 'gmake real-clean' will remove the previous build AND the Make.compilers file. 'gmake' by itself will first build a Make.compilers by calling configure.sh, then continue on with the build. The proper way to modify this is:
>>>> edit configure.sh
>>>> gmake real-clean
>>>> gmake install
>>>> repeat until it works
>>>> In configure.sh, there is a block of flags for Linux-amd64. I think it'll be easy to apply the same changes made for wgs-assembler.
>>>> After rebuilding kmer, the wgs-assembler build should need to just link -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do 'gmake clean' here! You might need to remove the dependency directory 'dep' too.
>>>> For running - the assembler will emit an SGE submit command to run a single shell script on tens-to-hundreds-to-thousands of jobs. Each job will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is faster, fewer is slower). If you can figure out how to run jobs of the form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on BG/Q, you're most of the way to running CA. To make it output such a submit command, supply "useGrid=1 scriptOnGrid=0" to runCA.
>>>> The other half of the assembler will be either large I/O or large memory. If you've got access to a machine with 256gb and 32 cores you should be fine. I don't know what a minimum usable machine size would be.
>>>> So, the flow of the computer will be:
>>>> On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 ....
>>>> Wait for it to emit a submit command
>>>> Launch those jobs on BG/Q
>>>> Wait for those to finish
>>>> Relaunch runCA on the 256gb machine. It'll check that the job outputs are complete, and continue processing, probably emitting another submit command, so repeat.
>>>> Historical note: back when runCA was first developed, we had a DEC Alpha Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a different architecture anyway, so we had to run CA this way. It was a real chore. We're all spoiled with our 4 core 8gb laptops now...
>>>> b
>>>> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...> wrote:
>>>>> Thanks Brian, Serge and Huang,
>>>>> We've gone through fixing several error messages during the compilation within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. At the end of the day we stopped on "undefined reference" errors on static libraries (mainly libseq.a; please see the make_progs.log file).
>>>>> The 'gmake install' command within the kmer/ dir ran just fine.
>>>>> The following indicates the BGQ OS type:
>>>>> [erv3@bgq-fn src]$ uname -a
>>>>> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux
>>>>> We also had to edit the c_make.as file, adding some -I options (to indicate paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section.
>>>>> Running "make objs" and "make libs" separately, everything appeared to work fine (see attached files make_objs.log and make_libs.log). The above-mentioned trouble came up on the "make progs" final command we ran (make_progs.log file).
>>>>> Well, just to let you guys know and to see whether some light can be shed.
>>>>> Thanks a lot, Cheers, Elton
>>>>> PS: I also noticed the MPI cluster system on BGQ, Brian. So, do you think it isn't worthwhile keeping the attempt to install CA on BGQ?
>>> --
>>> Elton Vasconcelos, DVM, PhD
>>> Post-doc at Verjovski-Almeida Lab
>>> Department of Biochemistry - Institute of Chemistry
>>> University of Sao Paulo, Brazil
> <pacbio.spec>

--
Elton Vasconcelos, DVM, PhD
Post-doc at Verjovski-Almeida Lab
Department of Biochemistry - Institute of Chemistry
University of Sao Paulo, Brazil |
From: Serge K. <ser...@gm...> - 2015-06-09 17:48:09
|
See my replies below inline. > On Jun 9, 2015, at 1:29 PM, Elton Vasconcelos <elt...@iq...> wrote: > > Hi Brian and Serge, > > I forgot to tell you last week that I am not doing PacBio reads self-correction. Instead I'm doing a hybrid assembly (26 SMRTcells plus 3 paired-end Illumina libraries). > ### Question 1: ### > Is it still worthwhile doing that on a single multi-threaded machine? > Because I've seen Sergey's comment that the pipeline is considerably slower when correcting with Illumina reads (http://ehc.ac/p/wgs-assembler/mailman/message/33620582/) A single machine will likely take several weeks to run the correction with Illumina data for a genome of your size; I'd advise against it. Based on your genome size and # of SMRTcells, I'd guess you have around 30X PacBio coverage, so I'd suggest trying the low-coverage options for self-correction instead: http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly It will still be significantly faster than the Illumina-based correction. On a recent 20X assembly of a 2.4GB genome, self-correction ran about 50 times faster than Illumina-based correction. You can try one of the more recent tools for hybrid correction (LoRDEC, proovread), though I haven't personally run them and can't say how much time they would require. > > I am now running CA on a single multi-threaded server that has 80 threads and 1 TB of RAM. > It spent about 5 days on the "overlapInCore" step, and I had to kill the process because of the server owner's complaint about too many threads being consumed for a long time period. > > ### Question 2: ### > Could you explain why "overlapInCore" is using all available threads (80) instead of only the 20 that were requested? 
> My spec file is attached and I ran the following command: > $ nohup /home/elton/wgs-8.3rc2/Linux-amd64/bin/pacBioToCA -l Test01 -threads 20 -shortReads -genomeSize 380000000 -s pacbio.spec -fastq 26-SMRTcells-filtered_subreads.fastq illumina-NEW.frg & Your spec file specifies: ovlThreads=20 ovlConcurrency=20 This means run 20 jobs each using 20 cores, thus it is really trying to use 400 cores on your system. If you set ovlConcurrency=1 it will use 20 cores. > > Thanks a lot again for your attention and support, > Best, > Elton > > > 2015-06-03 11:43 GMT-03:00 Serge Koren <ser...@gm... <mailto:ser...@gm...>>: > Yes, the latest CA 8.3 release can assemble D. melanogaster in < 700CPU hours. You can see the updated timings here: > http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes <http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes> > > I’ve routinely run D. melanogaster on a 16-core, 32GB machine in less than a day (I haven’t timed it exactly) so for your genome you’re looking at 3-4K cpu hours. You should be able to run it on a single 16-core 32GB machine in a couple of days so I think it’s easiest to run it on a single largish machine you have access to. > > Sergey > >> On Jun 2, 2015, at 9:12 PM, Brian Walenz <th...@gm... <mailto:th...@gm...>> wrote: >> >> That's an old page. The most recent page, linked from http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page <http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page>, is: >> >> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR <http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR> >> >> (look for 'self correction') >> >> I've run drosophilia on my 12-core development machine in a few hours to overnight (I haven't timed it). Sergey replaced blasr with a much much faster algorithm, and that was where most of the time was spent. >> >> b >> >> >> On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq... 
<mailto:elt...@iq...>> wrote: >> Thanks for the hints, Brian! >> >> We'll try everything you suggested tomorrow, back in the lab. >> Then I'll tell you what we got. >> For now, I only wanna say that our main concern, instead of running runCA itself, is gonna be with the pre-assembly (correction) step, running PacBiotoCA and PBcR pipeline that are embedded in the wgs package. >> Please take a look at the following strategy to assemble the Drosophila genome sequenced by PacBio technology (which presents a high error rate on the base calling, ~15%) at CBCB in Maryland : >> http://cbcb.umd.edu/software/PBcR/dmel.html <http://cbcb.umd.edu/software/PBcR/dmel.html> >> They mentioned 621K CPU hours to correct that genome of ~122 Mb. >> Our organism genome is something like 380 Mb long. Three times Drosophila's one. >> Well, just to let you know again! ;-) >> >> Talk to you later, >> Thanks again. >> Good night! >> Elton >> >> 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm... <mailto:th...@gm...>>: >> For the link problems - all those symbols come out of the kmer package. Check that the flags and compilers and whatnot are compatible with those in wgs-assembler. >> >> The kmer configuration is a bit awkward. A shell script (configure.sh) dumps a config to Make.compilers, which is read by the main Makefile. 'gmake real-clean' will remove the previous build AND the Make.compilers file. 'gmake' by itself will first build a Make.compilers by calling configure.sh, then continue on with the build. The proper way to modify this is: >> >> edit configure.sh >> gmake real-clean >> gmake install >> repeat until it works >> >> In configure.sh, there is a block of flags for Linux-amd64. I think it'll be easy to apply the same changes made for wgs-assembler. >> >> After rebuilding kmer, the wgs-assembler build should need to just link -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do 'gmake clean' here! You might need to remove the dependency directory 'dep' too. 
>> >> >> For running - the assembler will emit an SGE submit command to run a single shell script on tens-to-hundreds-to-thousands of jobs. Each job will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is faster, fewer is slower). If you can figure out how to run jobs of the form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on on BG/Q you're most of the way to running CA. To make it output such a submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. >> >> The other half of the assembler will be either large I/O or large memory. If you've got access to a machine with 256gb and 32 cores you should be fine. I don't know what a minimum usable machine size would be. >> >> So, the flow of the computer will be: >> >> On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... >> Wait for it to emit a submit command >> Launch those jobs on BG/Q >> Wait for those to finish >> Relaunch runCA on the 256gb machine. It'll check that the job outputs are complete, and continue processing, probably emitting another submit command, so repeat. >> >> Historical note: back when runCA was first developed, we had a DEC Alpha Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a different architecture anyway, so we had to run CA this way. It was a real chore. We're all spoiled with our 4 core 8gb laptops now... >> >> b >> >> >> >> >> >> >> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq... <mailto:elt...@iq...>> wrote: >> Thanks Brian, Serge and Huang, >> >> We've gone through fixing several error messages during the compilation within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. >> At the end of the day we stopped on "undefined reference" errors on static libraries (mainly libseq.a, please see make_progs.log file). >> >> The 'gmake install' command within the kmer/ dir ran just fine. 
>> >> The following indicates BGQ OS type: >> [erv3@bgq-fn src]$ uname -a >> Linux bgq-fn.rcsg.rice.edu <http://bgq-fn.rcsg.rice.edu/> 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux >> >> We also had to edit c_make.as <http://c_make.as/> file, adding some -I options (to indicate paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. >> >> Running "make objs" and "make libs" separately, everything appeared to work fine (see attached files make_objs.log and make_libs.log). >> The above mentioned trouble came up on the "make progs" final command we ran (make_progs.log file). >> >> Well, just to let you guys know and to see whether some light can be shed. >> >> Thanks a lot, >> Cheers, >> Elton >> >> PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do you think it isn't worthwhile keeping the attempt to install CA on BGQ? >> >> >> >> >> >> -- >> Elton Vasconcelos, DVM, PhD >> Post-doc at Verjovski-Almeida Lab >> Department of Biochemistry - Institute of Chemistry >> University of Sao Paulo, Brazil >> >> > > > > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry > University of Sao Paulo, Brazil > > <pacbio.spec> |
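The jobs-times-threads arithmetic in the answer above can be sketched as a quick sanity check. The two ovl values mirror the spec lines quoted in the reply; everything else here is illustrative. If I'm reading the spec options right, the same product governs the other Threads/Concurrency pairs runCA accepts (e.g. frgCorrThreads/frgCorrConcurrency).

```shell
#!/bin/sh
# runCA starts ovlConcurrency overlap jobs at once, each using
# ovlThreads pthreads, so peak core demand is the product of the two.
ovlThreads=20
ovlConcurrency=20
peak=$((ovlThreads * ovlConcurrency))
echo "peak cores: $peak"          # 400 -- far beyond an 80-thread host

# Dropping concurrency to 1 keeps usage at the intended 20 threads.
ovlConcurrency=1
fixed=$((ovlThreads * ovlConcurrency))
echo "peak cores with ovlConcurrency=1: $fixed"
```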
From: Elton V. <elt...@iq...> - 2015-06-09 17:29:33
|
Hi Brian and Serge, I forgot to tell you last week that I am not doing PacBio reads self-correction. Instead I'm doing a hybrid assembly (26 SMRTcells plus 3 paired-end Illumina libraries). ### Question 1: ### Is it still worthwhile doing that on a single multi-threaded machine? Because I've seen Sergey's comment that the pipeline is considerably slower when correcting with Illumina reads (http://ehc.ac/p/wgs-assembler/mailman/message/33620582/) I am now running CA on a single multi-threaded server that has 80 threads and 1 TB of RAM. It spent about 5 days on the "overlapInCore" step, and I had to kill the process because of the server owner's complaint about too many threads being consumed for a long time period. ### Question 2: ### Could you explain why "overlapInCore" is using all available threads (80) instead of only the 20 that were requested? My spec file is attached and I ran the following command: $ nohup /home/elton/wgs-8.3rc2/Linux-amd64/bin/pacBioToCA -l Test01 -threads 20 -shortReads -genomeSize 380000000 -s pacbio.spec -fastq 26-SMRTcells-filtered_subreads.fastq illumina-NEW.frg & Thanks a lot again for your attention and support, Best, Elton 2015-06-03 11:43 GMT-03:00 Serge Koren <ser...@gm...>: > Yes, the latest CA 8.3 release can assemble D. melanogaster in < 700 CPU > hours. You can see the updated timings here: > > http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes > > I’ve routinely run D. melanogaster on a 16-core, 32GB machine in less than > a day (I haven’t timed it exactly) so for your genome you’re looking at > 3-4K cpu hours. You should be able to run it on a single 16-core 32GB > machine in a couple of days so I think it’s easiest to run it on a single > largish machine you have access to. > > Sergey > > On Jun 2, 2015, at 9:12 PM, Brian Walenz <th...@gm...> wrote: > > That's an old page. 
The most recent page, linked from > http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page, is: > > http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR > > (look for 'self correction') > > I've run drosophilia on my 12-core development machine in a few hours to > overnight (I haven't timed it). Sergey replaced blasr with a much much > faster algorithm, and that was where most of the time was spent. > > b > > > On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq...> > wrote: > >> Thanks for the hints, Brian! >> >> We'll try everything you suggested tomorrow, back in the lab. >> Then I'll tell you what we got. >> For now, I only wanna say that our main concern, instead of running runCA >> itself, is gonna be with the pre-assembly (correction) step, running >> PacBiotoCA and PBcR pipeline that are embedded in the wgs package. >> Please take a look at the following strategy to assemble the Drosophila >> genome sequenced by PacBio technology (which presents a high error rate on >> the base calling, ~15%) at CBCB in Maryland : >> http://cbcb.umd.edu/software/PBcR/dmel.html >> They mentioned 621K CPU hours to correct that genome of ~122 Mb. >> Our organism genome is something like 380 Mb long. Three times >> Drosophila's one. >> Well, just to let you know again! ;-) >> >> Talk to you later, >> Thanks again. >> Good night! >> Elton >> >> 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>: >> >>> For the link problems - all those symbols come out of the kmer package. >>> Check that the flags and compilers and whatnot are compatible with those in >>> wgs-assembler. >>> >>> The kmer configuration is a bit awkward. A shell script (configure.sh) >>> dumps a config to Make.compilers, which is read by the main Makefile. >>> 'gmake real-clean' will remove the previous build AND the Make.compilers >>> file. 'gmake' by itself will first build a Make.compilers by calling >>> configure.sh, then continue on with the build. 
The proper way to modify >>> this is: >>> >>> edit configure.sh >>> gmake real-clean >>> gmake install >>> repeat until it works >>> >>> In configure.sh, there is a block of flags for Linux-amd64. I think >>> it'll be easy to apply the same changes made for wgs-assembler. >>> >>> After rebuilding kmer, the wgs-assembler build should need to just link >>> -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do >>> 'gmake clean' here! You might need to remove the dependency directory >>> 'dep' too. >>> >>> >>> For running - the assembler will emit an SGE submit command to run a >>> single shell script on tens-to-hundreds-to-thousands of jobs. Each job >>> will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is >>> faster, fewer is slower). If you can figure out how to run jobs of the >>> form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on >>> on BG/Q you're most of the way to running CA. To make it output such a >>> submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. >>> >>> The other half of the assembler will be either large I/O or large >>> memory. If you've got access to a machine with 256gb and 32 cores you >>> should be fine. I don't know what a minimum usable machine size would be. >>> >>> So, the flow of the computer will be: >>> >>> On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... >>> Wait for it to emit a submit command >>> Launch those jobs on BG/Q >>> Wait for those to finish >>> Relaunch runCA on the 256gb machine. It'll check that the job outputs >>> are complete, and continue processing, probably emitting another submit >>> command, so repeat. >>> >>> Historical note: back when runCA was first developed, we had a DEC Alpha >>> Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 >>> CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a >>> different architecture anyway, so we had to run CA this way. It was a real >>> chore. 
We're all spoiled with our 4 core 8gb laptops now... >>> >>> b >>> >>> >>> >>> >>> >>> >>> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...> >>> wrote: >>> >>>> Thanks Brian, Serge and Huang, >>>> >>>> We've gone through fixing several error messages during the compilation >>>> within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. >>>> At the end of the day we stopped on "undefined reference" errors on >>>> static libraries (mainly libseq.a, please see make_progs.log file). >>>> >>>> The 'gmake install' command within the kmer/ dir ran just fine. >>>> >>>> The following indicates BGQ OS type: >>>> [erv3@bgq-fn src]$ uname -a >>>> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 >>>> 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux >>>> >>>> We also had to edit c_make.as file, adding some -I options (to >>>> indicate paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" >>>> section. >>>> >>>> Running "make objs" and "make libs" separately, everything appeared to >>>> work fine (see attached files make_objs.log and make_libs.log). >>>> The above mentioned trouble came up on the "make progs" final command >>>> we ran (make_progs.log file). >>>> >>>> Well, just to let you guys know and to see whether some light can be >>>> shed. >>>> >>>> Thanks a lot, >>>> Cheers, >>>> Elton >>>> >>>> PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do >>>> you think it isn't worthwhile keeping the attempt to install CA on BGQ? >>>> >>>> >>>> >> >> >> -- >> Elton Vasconcelos, DVM, PhD >> Post-doc at Verjovski-Almeida Lab >> Department of Biochemistry - Institute of Chemistry >> University of Sao Paulo, Brazil >> >> > > -- Elton Vasconcelos, DVM, PhD Post-doc at Verjovski-Almeida Lab Department of Biochemistry - Institute of Chemistry University of Sao Paulo, Brazil |
From: Serge K. <ser...@gm...> - 2015-06-03 14:43:15
|
Yes, the latest CA 8.3 release can assemble D. melanogaster in < 700CPU hours. You can see the updated timings here: http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes <http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes> I’ve routinely run D. melanogaster on a 16-core, 32GB machine in less than a day (I haven’t timed it exactly) so for your genome you’re looking at 3-4K cpu hours. You should be able to run it on a single 16-core 32GB machine in a couple of days so I think it’s easiest to run it on a single largish machine you have access to. Sergey > On Jun 2, 2015, at 9:12 PM, Brian Walenz <th...@gm...> wrote: > > That's an old page. The most recent page, linked from http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page <http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page>, is: > > http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR <http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR> > > (look for 'self correction') > > I've run drosophilia on my 12-core development machine in a few hours to overnight (I haven't timed it). Sergey replaced blasr with a much much faster algorithm, and that was where most of the time was spent. > > b > > > On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq... <mailto:elt...@iq...>> wrote: > Thanks for the hints, Brian! > > We'll try everything you suggested tomorrow, back in the lab. > Then I'll tell you what we got. > For now, I only wanna say that our main concern, instead of running runCA itself, is gonna be with the pre-assembly (correction) step, running PacBiotoCA and PBcR pipeline that are embedded in the wgs package. 
> Please take a look at the following strategy to assemble the Drosophila genome sequenced by PacBio technology (which presents a high error rate on the base calling, ~15%) at CBCB in Maryland : > http://cbcb.umd.edu/software/PBcR/dmel.html <http://cbcb.umd.edu/software/PBcR/dmel.html> > They mentioned 621K CPU hours to correct that genome of ~122 Mb. > Our organism genome is something like 380 Mb long. Three times Drosophila's one. > Well, just to let you know again! ;-) > > Talk to you later, > Thanks again. > Good night! > Elton > > 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm... <mailto:th...@gm...>>: > For the link problems - all those symbols come out of the kmer package. Check that the flags and compilers and whatnot are compatible with those in wgs-assembler. > > The kmer configuration is a bit awkward. A shell script (configure.sh) dumps a config to Make.compilers, which is read by the main Makefile. 'gmake real-clean' will remove the previous build AND the Make.compilers file. 'gmake' by itself will first build a Make.compilers by calling configure.sh, then continue on with the build. The proper way to modify this is: > > edit configure.sh > gmake real-clean > gmake install > repeat until it works > > In configure.sh, there is a block of flags for Linux-amd64. I think it'll be easy to apply the same changes made for wgs-assembler. > > After rebuilding kmer, the wgs-assembler build should need to just link -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do 'gmake clean' here! You might need to remove the dependency directory 'dep' too. > > > For running - the assembler will emit an SGE submit command to run a single shell script on tens-to-hundreds-to-thousands of jobs. Each job will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is faster, fewer is slower). 
If you can figure out how to run jobs of the form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on on BG/Q you're most of the way to running CA. To make it output such a submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. > > The other half of the assembler will be either large I/O or large memory. If you've got access to a machine with 256gb and 32 cores you should be fine. I don't know what a minimum usable machine size would be. > > So, the flow of the computer will be: > > On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... > Wait for it to emit a submit command > Launch those jobs on BG/Q > Wait for those to finish > Relaunch runCA on the 256gb machine. It'll check that the job outputs are complete, and continue processing, probably emitting another submit command, so repeat. > > Historical note: back when runCA was first developed, we had a DEC Alpha Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a different architecture anyway, so we had to run CA this way. It was a real chore. We're all spoiled with our 4 core 8gb laptops now... > > b > > > > > > > On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq... <mailto:elt...@iq...>> wrote: > Thanks Brian, Serge and Huang, > > We've gone through fixing several error messages during the compilation within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. > At the end of the day we stopped on "undefined reference" errors on static libraries (mainly libseq.a, please see make_progs.log file). > > The 'gmake install' command within the kmer/ dir ran just fine. 
> > The following indicates BGQ OS type: > [erv3@bgq-fn src]$ uname -a > Linux bgq-fn.rcsg.rice.edu <http://bgq-fn.rcsg.rice.edu/> 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux > > We also had to edit c_make.as <http://c_make.as/> file, adding some -I options (to indicate paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. > > Running "make objs" and "make libs" separately, everything appeared to work fine (see attached files make_objs.log and make_libs.log). > The above mentioned trouble came up on the "make progs" final command we ran (make_progs.log file). > > Well, just to let you guys know and to see whether some light can be shed. > > Thanks a lot, > Cheers, > Elton > > PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do you think it isn't worthwhile keeping the attempt to install CA on BGQ? > > > > > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry > University of Sao Paulo, Brazil > > |
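Serge's 3-4K CPU-hour figure above follows from a simple scaling of the D. melanogaster numbers. The linear-in-genome-size assumption below is mine, for illustration only; it gives a ballpark floor, and the quoted 3-4K range adds headroom on top:

```shell
#!/bin/sh
# Naive extrapolation: assume correction/assembly CPU hours scale
# linearly with genome size at similar coverage (an assumption for
# illustration, not a benchmark).
dmel_mb=122       # D. melanogaster genome size in Mb
dmel_cpuh=700     # CA 8.3 upper-bound timing cited in the thread
target_mb=380     # genome size discussed in this thread

estimate=$(awk -v s="$dmel_mb" -v h="$dmel_cpuh" -v t="$target_mb" \
  'BEGIN { printf "%d", t / s * h }')
echo "scaled estimate: ${estimate} CPU hours"   # ~2180
```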
From: Brian W. <th...@gm...> - 2015-06-03 01:13:06
|
That's an old page. The most recent page, linked from http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page, is: http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR (look for 'self correction') I've run drosophilia on my 12-core development machine in a few hours to overnight (I haven't timed it). Sergey replaced blasr with a much much faster algorithm, and that was where most of the time was spent. b On Tue, Jun 2, 2015 at 9:02 PM, Elton Vasconcelos <elt...@iq...> wrote: > Thanks for the hints, Brian! > > We'll try everything you suggested tomorrow, back in the lab. > Then I'll tell you what we got. > For now, I only wanna say that our main concern, instead of running runCA > itself, is gonna be with the pre-assembly (correction) step, running > PacBiotoCA and PBcR pipeline that are embedded in the wgs package. > Please take a look at the following strategy to assemble the Drosophila > genome sequenced by PacBio technology (which presents a high error rate on > the base calling, ~15%) at CBCB in Maryland : > http://cbcb.umd.edu/software/PBcR/dmel.html > They mentioned 621K CPU hours to correct that genome of ~122 Mb. > Our organism genome is something like 380 Mb long. Three times > Drosophila's one. > Well, just to let you know again! ;-) > > Talk to you later, > Thanks again. > Good night! > Elton > > 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>: > >> For the link problems - all those symbols come out of the kmer package. >> Check that the flags and compilers and whatnot are compatible with those in >> wgs-assembler. >> >> The kmer configuration is a bit awkward. A shell script (configure.sh) >> dumps a config to Make.compilers, which is read by the main Makefile. >> 'gmake real-clean' will remove the previous build AND the Make.compilers >> file. 'gmake' by itself will first build a Make.compilers by calling >> configure.sh, then continue on with the build. 
The proper way to modify >> this is: >> >> edit configure.sh >> gmake real-clean >> gmake install >> repeat until it works >> >> In configure.sh, there is a block of flags for Linux-amd64. I think >> it'll be easy to apply the same changes made for wgs-assembler. >> >> After rebuilding kmer, the wgs-assembler build should need to just link >> -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do >> 'gmake clean' here! You might need to remove the dependency directory >> 'dep' too. >> >> >> For running - the assembler will emit an SGE submit command to run a >> single shell script on tens-to-hundreds-to-thousands of jobs. Each job >> will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is >> faster, fewer is slower). If you can figure out how to run jobs of the >> form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on >> on BG/Q you're most of the way to running CA. To make it output such a >> submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. >> >> The other half of the assembler will be either large I/O or large >> memory. If you've got access to a machine with 256gb and 32 cores you >> should be fine. I don't know what a minimum usable machine size would be. >> >> So, the flow of the computer will be: >> >> On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... >> Wait for it to emit a submit command >> Launch those jobs on BG/Q >> Wait for those to finish >> Relaunch runCA on the 256gb machine. It'll check that the job outputs >> are complete, and continue processing, probably emitting another submit >> command, so repeat. >> >> Historical note: back when runCA was first developed, we had a DEC Alpha >> Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 >> CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a >> different architecture anyway, so we had to run CA this way. It was a real >> chore. We're all spoiled with our 4 core 8gb laptops now... 
>> >> b >> >> >> >> >> >> >> On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...> >> wrote: >> >>> Thanks Brian, Serge and Huang, >>> >>> We've gone through fixing several error messages during the compilation >>> within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. >>> At the end of the day we stopped on "undefined reference" errors on >>> static libraries (mainly libseq.a, please see make_progs.log file). >>> >>> The 'gmake install' command within the kmer/ dir ran just fine. >>> >>> The following indicates BGQ OS type: >>> [erv3@bgq-fn src]$ uname -a >>> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 >>> 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux >>> >>> We also had to edit c_make.as file, adding some -I options (to indicate >>> paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. >>> >>> Running "make objs" and "make libs" separately, everything appeared to >>> work fine (see attached files make_objs.log and make_libs.log). >>> The above mentioned trouble came up on the "make progs" final command we >>> ran (make_progs.log file). >>> >>> Well, just to let you guys know and to see whether some light can be >>> shed. >>> >>> Thanks a lot, >>> Cheers, >>> Elton >>> >>> PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do >>> you think it isn't worthwhile keeping the attempt to install CA on BGQ? >>> >>> >>> > > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry > University of Sao Paulo, Brazil > > |
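Brian's manual-grid workflow above can be condensed into a sketch. The directory, script name, and job count are placeholders (runCA prints the real ones when it stops), and the echo stands in for whatever job-submission mechanism BG/Q provides:

```shell
#!/bin/sh
# Manual-grid loop: runCA halts when array jobs are needed; you run
# "command.sh 1" .. "command.sh N" yourself, then relaunch runCA so it
# verifies the outputs and continues (or emits the next batch).
set -e
ASM=/path/to/assembly        # placeholder working directory
N_JOBS=4                     # placeholder; runCA reports the real count

# 1) On the big-memory machine (commented out -- needs a real install):
#      runCA -d "$ASM" -p asm useGrid=1 scriptOnGrid=0 reads.frg

# 2) Launch the emitted jobs; a plain loop stands in for the scheduler.
i=1
while [ "$i" -le "$N_JOBS" ]; do
  echo "would submit: $ASM/overlap.sh $i"
  i=$((i + 1))
done

# 3) Re-run the same runCA command and repeat until no jobs are emitted.
```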
From: Elton V. <elt...@iq...> - 2015-06-03 01:02:35
|
Thanks for the hints, Brian! We'll try everything you suggested tomorrow, back in the lab. Then I'll tell you what we got. For now, I only wanna say that our main concern, instead of running runCA itself, is gonna be with the pre-assembly (correction) step, running PacBiotoCA and PBcR pipeline that are embedded in the wgs package. Please take a look at the following strategy to assemble the Drosophila genome sequenced by PacBio technology (which presents a high error rate on the base calling, ~15%) at CBCB in Maryland : http://cbcb.umd.edu/software/PBcR/dmel.html They mentioned 621K CPU hours to correct that genome of ~122 Mb. Our organism genome is something like 380 Mb long. Three times Drosophila's one. Well, just to let you know again! ;-) Talk to you later, Thanks again. Good night! Elton 2015-06-02 20:19 GMT-03:00 Brian Walenz <th...@gm...>: > For the link problems - all those symbols come out of the kmer package. > Check that the flags and compilers and whatnot are compatible with those in > wgs-assembler. > > The kmer configuration is a bit awkward. A shell script (configure.sh) > dumps a config to Make.compilers, which is read by the main Makefile. > 'gmake real-clean' will remove the previous build AND the Make.compilers > file. 'gmake' by itself will first build a Make.compilers by calling > configure.sh, then continue on with the build. The proper way to modify > this is: > > edit configure.sh > gmake real-clean > gmake install > repeat until it works > > In configure.sh, there is a block of flags for Linux-amd64. I think it'll > be easy to apply the same changes made for wgs-assembler. > > After rebuilding kmer, the wgs-assembler build should need to just link -- > in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do > 'gmake clean' here! You might need to remove the dependency directory > 'dep' too. 
> > > For running - the assembler will emit an SGE submit command to run a > single shell script on tens-to-hundreds-to-thousands of jobs. Each job > will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is > faster, fewer is slower). If you can figure out how to run jobs of the > form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on > on BG/Q you're most of the way to running CA. To make it output such a > submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. > > The other half of the assembler will be either large I/O or large memory. > If you've got access to a machine with 256gb and 32 cores you should be > fine. I don't know what a minimum usable machine size would be. > > So, the flow of the computer will be: > > On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... > Wait for it to emit a submit command > Launch those jobs on BG/Q > Wait for those to finish > Relaunch runCA on the 256gb machine. It'll check that the job outputs are > complete, and continue processing, probably emitting another submit > command, so repeat. > > Historical note: back when runCA was first developed, we had a DEC Alpha > Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 > CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a > different architecture anyway, so we had to run CA this way. It was a real > chore. We're all spoiled with our 4 core 8gb laptops now... > > b > > > > > > > On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...> > wrote: > >> Thanks Brian, Serge and Huang, >> >> We've gone through fixing several error messages during the compilation >> within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. >> At the end of the day we stopped on "undefined reference" errors on >> static libraries (mainly libseq.a, please see make_progs.log file). >> >> The 'gmake install' command within the kmer/ dir ran just fine. 
>> >> The following indicates BGQ OS type: >> [erv3@bgq-fn src]$ uname -a >> Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 >> 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux >> >> We also had to edit c_make.as file, adding some -I options (to indicate >> paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. >> >> Running "make objs" and "make libs" separately, everything appeared to >> work fine (see attached files make_objs.log and make_libs.log). >> The above mentioned trouble came up on the "make progs" final command we >> ran (make_progs.log file). >> >> Well, just to let you guys know and to see whether some light can be shed. >> >> Thanks a lot, >> Cheers, >> Elton >> >> PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do you >> think it isn't worthwhile keeping the attempt to install CA on BGQ? >> >> >> -- Elton Vasconcelos, DVM, PhD Post-doc at Verjovski-Almeida Lab Department of Biochemistry - Institute of Chemistry University of Sao Paulo, Brazil |
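Those CBCB numbers can be put in perspective with a back-of-envelope calculation. The scaling below is a naive assumption (correction cost linear in genome size alone), not something from the PBcR documentation — coverage, read length, and repeat content all shift it in practice:

```shell
# Naive linear scaling of the CBCB Drosophila correction cost to a
# 380 Mb genome. Assumption: CPU hours scale with genome size only.
dmel_cpu_hours=621000   # reported for the ~122 Mb D. melanogaster run
dmel_mb=122
target_mb=380

est=$(awk -v h="$dmel_cpu_hours" -v d="$dmel_mb" -v t="$target_mb" \
      'BEGIN { printf "%.0f", h * t / d }')
echo "scaled correction estimate: ${est} CPU hours"
echo "on 800 cores: $(awk -v e="$est" 'BEGIN { printf "%.0f", e / 800 / 24 }') days (ideal)"
```

This is what motivates the concern above: the old Drosophila figures scale to roughly 1.9M CPU hours, far beyond a 30-day budget on 800 cores (later messages in this thread report much lower costs with current versions).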
From: Brian W. <th...@gm...> - 2015-06-02 23:19:08
|
For the link problems - all those symbols come out of the kmer package. Check that the flags and compilers and whatnot are compatible with those in wgs-assembler. The kmer configuration is a bit awkward. A shell script (configure.sh) dumps a config to Make.compilers, which is read by the main Makefile. 'gmake real-clean' will remove the previous build AND the Make.compilers file. 'gmake' by itself will first build a Make.compilers by calling configure.sh, then continue on with the build. The proper way to modify this is: edit configure.sh gmake real-clean gmake install repeat until it works In configure.sh, there is a block of flags for Linux-amd64. I think it'll be easy to apply the same changes made for wgs-assembler. After rebuilding kmer, the wgs-assembler build should need to just link -- in other words, remove just wgs-assembler/Linux-amd64/bin -- don't do 'gmake clean' here! You might need to remove the dependency directory 'dep' too. For running - the assembler will emit an SGE submit command to run a single shell script on tens-to-hundreds-to-thousands of jobs. Each job will be 8-32gb (tunable) and 1-32 cores (nothing special here: more is faster, fewer is slower). If you can figure out how to run jobs of the form "command.sh 1", "command.sh 2", "command.sh 3", ..., "command.sh N" on BG/Q you're most of the way to running CA. To make it output such a submit command, supply "useGrid=1 scriptOnGrid=0" to runCA. The other half of the assembler will be either large I/O or large memory. If you've got access to a machine with 256gb and 32 cores you should be fine. I don't know what a minimum usable machine size would be. So, the flow of the computation will be: On the 256gb machine: runCA useGrid=1 scriptOnGrid=0 .... Wait for it to emit a submit command Launch those jobs on BG/Q Wait for those to finish Relaunch runCA on the 256gb machine. 
It'll check that the job outputs are complete, and continue processing, probably emitting another submit command, so repeat. Historical note: back when runCA was first developed, we had a DEC Alpha Tru64 machine with 4 CPUs and 32gb of RAM, and a grid of a few hundred 2 CPU, 2gb, 32-bit Linux machines. The Alpha wasn't in the grid, and a different architecture anyway, so we had to run CA this way. It was a real chore. We're all spoiled with our 4 core 8gb laptops now... b On Tue, Jun 2, 2015 at 5:49 PM, Elton Vasconcelos <elt...@iq...> wrote: > Thanks Brian, Serge and Huang, > > We've gone through fixing several error messages during the compilation > within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. > At the end of the day we stopped on "undefined reference" errors on static > libraries (mainly libseq.a, please see make_progs.log file). > > The 'gmake install' command within the kmer/ dir ran just fine. > > The following indicates BGQ OS type: > [erv3@bgq-fn src]$ uname -a > Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 > 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux > > We also had to edit c_make.as file, adding some -I options (to indicate > paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. > > Running "make objs" and "make libs" separately, everything appeared to > work fine (see attached files make_objs.log and make_libs.log). > The above mentioned trouble came up on the "make progs" final command we > ran (make_progs.log file). > > Well, just to let you guys know and to see whether some light can be shed. > > Thanks a lot, > Cheers, > Elton > > PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do you > think it isn't worthwhile keeping the attempt to install CA on BGQ? > > > |
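The submit-and-relaunch loop Brian describes can be sketched as a small driver. Here `command.sh` is a stand-in for the script runCA emits (a dummy is created so the sketch runs anywhere); on BG/Q each invocation would be handed to the native batch launcher instead of run directly:

```shell
# Dummy stand-in for the job script runCA writes when given
# "useGrid=1 scriptOnGrid=0"; the real one takes the job index as $1.
cat > command.sh <<'EOF'
#!/bin/sh
echo "job $1 done" >> joblog.txt
EOF
chmod +x command.sh
rm -f joblog.txt

N=4   # runCA prints the real job count in the submit command it emits
for i in $(seq 1 "$N"); do
    ./command.sh "$i"     # on BG/Q: submit to the batch system instead
done

# Once all N jobs have output, rerun the identical runCA command; it
# verifies the outputs and either continues or emits the next batch.
echo "$(wc -l < joblog.txt) of $N jobs complete"
```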
From: Elton V. <elt...@iq...> - 2015-06-02 21:49:39
|
Thanks Brian, Serge and Huang, We've gone through fixing several error messages during the compilation within the src/ dir from the latest wgs-8.3rc2.tar.bz2 package. At the end of the day we stopped on "undefined reference" errors on static libraries (mainly libseq.a, please see make_progs.log file). The 'gmake install' command within the kmer/ dir ran just fine. The following indicates BGQ OS type: [erv3@bgq-fn src]$ uname -a Linux bgq-fn.rcsg.rice.edu 2.6.32-431.el6.ppc64 #1 SMP Sun Nov 10 22:17:43 EST 2013 ppc64 ppc64 ppc64 GNU/Linux We also had to edit c_make.as file, adding some -I options (to indicate paths to libraries) on the CFLAGS fields from the "OSTYPE, Linux" section. Running "make objs" and "make libs" separately, everything appeared to work fine (see attached files make_objs.log and make_libs.log). The above mentioned trouble came up on the "make progs" final command we ran (make_progs.log file). Well, just to let you guys know and to see whether some light can be shed. Thanks a lot, Cheers, Elton PS: I also noticed about the MPI cluster system on BGQ, Brian. So, do you think it isn't worthwhile keeping the attempt to install CA on BGQ? 2015-06-02 17:15 GMT-03:00 Walenz, Brian <wa...@nb...>: > Poking around a bit too, it looks like BlueGene/P only supports MPI, which > the assembler doesn't. The assembler needs SGE (or PBS or LSF) to run > independent threaded jobs. > > BlueGene/Q has 16 threads and 16 GB per node. This is a better match to > assembler workloads. It'll still need some kind of batch scheduler. > > b > > > > -----Original Message----- > From: Serge Koren [mailto:ser...@gm...] 
> Sent: Tuesday, June 02, 2015 2:48 PM > To: Brian Walenz > Cc: wgs...@li...; Elton Vasconcelos > Subject: Re: [wgs-assembler-users] CA on BlueGene server at Rice University > > Looking at the system description, assuming this is the system: > http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/ < > http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/> > > The cores are 32-bit which would limit your processes to 4GB and wouldn’t > work well for assembly. Plus, we haven’t compiled/tested the assembler on > 32-bit platforms in years so I don’t think it’s worth your time to try to > compile it on there. How big is your genome? A 70X human correction takes > about 8K cpu hours to generate corrected reads (is that what you mean by > pre-assemble?) and total runtime (including correct + assemble) is about > 50K cpu so with 2GHz CPUs you should be closer to a week for a full run not > 30 days with almost 1000 cores (there is some overhead so not all steps > would use all your cores). The assembler supports multiple grid systems > (SGE, LSF, PBS) with a shared filesystem and I see Rice University has some > clusters available so I’d recommend using one of those rather than > recompiling. > > > > On Jun 2, 2015, at 11:11 AM, Brian Walenz <th...@gm...> wrote: > > > > Nope, not on our mind. A lack of access is the primary problem, > followed closely by a lack of time and a lack of demand. > > > > There isn't anything fancy or gcc-specific in the code though. It does > compile with clang (without thread support, but that's clang's fault). > Mucking with c_make.as <http://c_make.as/> might be all that is needed. > To start, try copying the 'Darwin' section, and changing the OSTYPE test to > whatever 'uname' reports, and the compiler to icc. Icc will probably > complain about ARCH_CFLAGS, so might as well get rid of all of those. -O > (optimize) is pretty generic. > > > > b > > > > > > > > On Tue, Jun 2, 2015 at 9:41 AM, Elton Vasconcelos <elt...@iq... 
> <mailto:elt...@iq...>> wrote: > > Hello folks, > > > > I wonder whether it is on CA developers mind to generate a wgs-assembler > version that is compatible with IBM compilers, so we could run it on BG/P > and/or BG/Q supercomputers at Rice University. > > I am trying to pre-assemble my target genome sequenced by PacBio > technology. By my calculations, I am gonna need 800 CPUs to run it in 30 > days. > > > > Thanks in advance for your attention, > > Cheers, > > Elton > > > > -- > > Elton Vasconcelos, DVM, PhD > > Post-doc at Verjovski-Almeida Lab > > Department of Biochemistry - Institute of Chemistry University of Sao > > Paulo, Brazil > > > > > > ---------------------------------------------------------------------- > > -------- > > > > _______________________________________________ > > wgs-assembler-users mailing list > > wgs...@li... > > <mailto:wgs...@li...> > > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > <https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users> > > > > > > ---------------------------------------------------------------------- > > -------- _______________________________________________ > > wgs-assembler-users mailing list > > wgs...@li... > > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > -- Elton Vasconcelos, DVM, PhD Post-doc at Verjovski-Almeida Lab Department of Biochemistry - Institute of Chemistry University of Sao Paulo, Brazil |
From: Walenz, B. <wa...@nb...> - 2015-06-02 20:51:07
|
Poking around a bit too, it looks like BlueGene/P only supports MPI, which the assembler doesn't. The assembler needs SGE (or PBS or LSF) to run independent threaded jobs. BlueGene/Q has 16 threads and 16 GB per node. This is a better match to assembler workloads. It'll still need some kind of batch scheduler. b -----Original Message----- From: Serge Koren [mailto:ser...@gm...] Sent: Tuesday, June 02, 2015 2:48 PM To: Brian Walenz Cc: wgs...@li...; Elton Vasconcelos Subject: Re: [wgs-assembler-users] CA on BlueGene server at Rice University Looking at the system description, assuming this is the system: http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/ <http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/> The cores are 32-bit which would limit your processes to 4GB and wouldn’t work well for assembly. Plus, we haven’t compiled/tested the assembler on 32-bit platforms in years so I don’t think it’s worth your time to try to compile it on there. How big is your genome? A 70X human correction takes about 8K cpu hours to generate corrected reads (is that what you mean by pre-assemble?) and total runtime (including correct + assemble) is about 50K cpu so with 2GHz CPUs you should be closer to a week for a full run not 30 days with almost 1000 cores (there is some overhead so not all steps would use all your cores). The assembler supports multiple grid systems (SGE, LSF, PBS) with a shared filesystem and I see Rice University has some clusters available so I’d recommend using one of those rather than recompiling. > On Jun 2, 2015, at 11:11 AM, Brian Walenz <th...@gm...> wrote: > > Nope, not on our mind. A lack of access is the primary problem, followed closely by a lack of time and a lack of demand. > > There isn't anything fancy or gcc-specific in the code though. It does compile with clang (without thread support, but that's clang's fault). Mucking with c_make.as <http://c_make.as/> might be all that is needed. 
To start, try copying the 'Darwin' section, and changing the OSTYPE test to whatever 'uname' reports, and the compiler to icc. Icc will probably complain about ARCH_CFLAGS, so might as well get rid of all of those. -O (optimize) is pretty generic. > > b > > > > On Tue, Jun 2, 2015 at 9:41 AM, Elton Vasconcelos <elt...@iq... <mailto:elt...@iq...>> wrote: > Hello folks, > > I wonder whether it is on CA developers mind to generate a wgs-assembler version that is compatible with IBM compilers, so we could run it on BG/P and/or BG/Q supercomputers at Rice University. > I am trying to pre-assemble my target genome sequenced by PacBio technology. By my calculations, I am gonna need 800 CPUs to run it in 30 days. > > Thanks in advance for your attention, > Cheers, > Elton > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry University of Sao > Paulo, Brazil > > > ---------------------------------------------------------------------- > -------- > > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > <mailto:wgs...@li...> > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > <https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users> > > > ---------------------------------------------------------------------- > -------- _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
From: Serge K. <ser...@gm...> - 2015-06-02 18:47:45
|
Looking at the system description, assuming this is the system: http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/ <http://www.rcsg.rice.edu/sharecore/bluegenep-bgp/> The cores are 32-bit which would limit your processes to 4GB and wouldn’t work well for assembly. Plus, we haven’t compiled/tested the assembler on 32-bit platforms in years so I don’t think it’s worth your time to try to compile it on there. How big is your genome? A 70X human correction takes about 8K cpu hours to generate corrected reads (is that what you mean by pre-assemble?) and total runtime (including correct + assemble) is about 50K cpu so with 2GHz CPUs you should be closer to a week for a full run not 30 days with almost 1000 cores (there is some overhead so not all steps would use all your cores). The assembler supports multiple grid systems (SGE, LSF, PBS) with a shared filesystem and I see Rice University has some clusters available so I’d recommend using one of those rather than recompiling. > On Jun 2, 2015, at 11:11 AM, Brian Walenz <th...@gm...> wrote: > > Nope, not on our mind. A lack of access is the primary problem, followed closely by a lack of time and a lack of demand. > > There isn't anything fancy or gcc-specific in the code though. It does compile with clang (without thread support, but that's clang's fault). Mucking with c_make.as <http://c_make.as/> might be all that is needed. To start, try copying the 'Darwin' section, and changing the OSTYPE test to whatever 'uname' reports, and the compiler to icc. Icc will probably complain about ARCH_CFLAGS, so might as well get rid of all of those. -O (optimize) is pretty generic. > > b > > > > On Tue, Jun 2, 2015 at 9:41 AM, Elton Vasconcelos <elt...@iq... <mailto:elt...@iq...>> wrote: > Hello folks, > > I wonder whether it is on CA developers mind to generate a wgs-assembler version that is compatible with IBM compilers, so we could run it on BG/P and/or BG/Q supercomputers at Rice University. 
> I am trying to pre-assemble my target genome sequenced by PacBio technology. By my calculations, I am gonna need 800 CPUs to run it in 30 days. > > Thanks in advance for your attention, > Cheers, > Elton > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry > University of Sao Paulo, Brazil > > > ------------------------------------------------------------------------------ > > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... <mailto:wgs...@li...> > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users <https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users> > > > ------------------------------------------------------------------------------ > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users |
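Serge's figures translate to wall-clock time by simple division. The core count below is Elton's proposed 800, and the result ignores serial stages and scheduler overhead, which is why "closer to a week" is the realistic answer:

```shell
cpu_hours=50000   # Serge's estimate for a full correct + assemble run
cores=800         # Elton's proposed allocation

days=$(awk -v h="$cpu_hours" -v c="$cores" 'BEGIN { printf "%.1f", h / c / 24 }')
echo "ideal wall clock: ${days} days"   # overhead pushes this toward a week
```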
From: Brian W. <th...@gm...> - 2015-06-02 15:12:01
|
Nope, not on our mind. A lack of access is the primary problem, followed closely by a lack of time and a lack of demand. There isn't anything fancy or gcc-specific in the code though. It does compile with clang (without thread support, but that's clang's fault). Mucking with c_make.as might be all that is needed. To start, try copying the 'Darwin' section, and changing the OSTYPE test to whatever 'uname' reports, and the compiler to icc. Icc will probably complain about ARCH_CFLAGS, so might as well get rid of all of those. -O (optimize) is pretty generic. b On Tue, Jun 2, 2015 at 9:41 AM, Elton Vasconcelos <elt...@iq...> wrote: > Hello folks, > > I wonder whether it is on CA developers mind to generate a wgs-assembler > version that is compatible with IBM compilers, so we could run it on BG/P > and/or BG/Q supercomputers at Rice University. > I am trying to pre-assemble my target genome sequenced by PacBio > technology. By my calculations, I am gonna need 800 CPUs to run it in 30 > days. > > Thanks in advance for your attention, > Cheers, > Elton > > -- > Elton Vasconcelos, DVM, PhD > Post-doc at Verjovski-Almeida Lab > Department of Biochemistry - Institute of Chemistry > University of Sao Paulo, Brazil > > > > ------------------------------------------------------------------------------ > > _______________________________________________ > wgs-assembler-users mailing list > wgs...@li... > https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users > > |
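Brian's porting recipe boils down to teaching the makefile's OSTYPE switch about the new platform. The sketch below only illustrates that idea; the names mirror his description (OSTYPE from `uname`, compiler choice, ARCH_CFLAGS) and are not the literal contents of c_make.as:

```shell
# c_make.as keys its flag blocks on what `uname` reports; on a BG/Q
# front-end node that is plain 'Linux', so that block is the one to
# adapt (or copy from the Darwin section, per Brian's suggestion).
ostype=$(uname -s)

# Illustrative per-OS compiler selection, not verbatim makefile syntax:
case "$ostype" in
    Darwin) cc=clang ;;
    Linux)  cc=icc   ;;   # assumption: Intel compiler on the front end
    *)      cc=gcc   ;;
esac
echo "OSTYPE=${ostype}, would set CC=${cc} and drop the x86 ARCH_CFLAGS"
```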
From: Elton V. <elt...@iq...> - 2015-06-02 14:48:13
|
Hello folks, I wonder whether it is on CA developers mind to generate a wgs-assembler version that is compatible with IBM compilers, so we could run it on BG/P and/or BG/Q supercomputers at Rice University. I am trying to pre-assemble my target genome sequenced by PacBio technology. By my calculations, I am gonna need 800 CPUs to run it in 30 days. Thanks in advance for your attention, Cheers, Elton -- Elton Vasconcelos, DVM, PhD Post-doc at Verjovski-Almeida Lab Department of Biochemistry - Institute of Chemistry University of Sao Paulo, Brazil |
From: Serge K. <ser...@gm...> - 2015-05-28 14:39:51
|
Yes, the 25X coverage is for after correction. However, 12X should be sufficient for a reasonable assembly, certainly not a tiny fraction of your genome like you’re seeing. If you run gatekeeper -dumpinfo PI440795_Self_Assembled/asm.gkpStore That should give more info on what reads made it into the assembly. > On May 28, 2015, at 10:37 AM, Seth Munholland <mu...@uw...> wrote: > > After correction I ended up with about 12x coverage. I presume the ~25x coverage suggested on the PBcR page is for after correction? I'll try the low-coverage parameters and double check BLASR/PBDAGCON next, thanks. > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 <> > On Wed, May 27, 2015 at 6:56 PM, Serge Koren <ser...@gm... <mailto:ser...@gm...>> wrote: > That most likely means you ended up with too little coverage for assembly after correction. You can check the coverage in the PI440795_Self_Assembled*.fastq files. If you’re not already, I’d suggest using the low-coverage parameters on the wiki page: > http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly <http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly> > > I’d also double-check that you have BLASR/PBDAGCON available in your path and that it is being used for assembly (in your tempPI440795_Self_Assembled/runPartition.sh file look for the word pbdagcon). > >> On May 26, 2015, at 12:03 PM, Seth Munholland <mu...@uw... <mailto:mu...@uw...>> wrote: >> >> Hi Serge, >> >> I looked into the 9-terminator folder and found the asm.utg.fasta file, but it's only 4.5MB (~0.007x coverage) when I started wth ~33x coverage. Any suggestions for where to look for the data loss? >> >> Seth Munholland, B.Sc. >> Department of Biological Sciences >> Rm. 304 Biology Building >> University of Windsor >> 401 Sunset Ave. 
N9B 3P4 >> T: (519) 253-3000 Ext: 4755 <> >> On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm... <mailto:ser...@gm...>> wrote: >> Hi, >> >> This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data failed, the restart did not work properly). If you grep for runCA in your command line output from your run and re-run the last command (it should have the library name as the -d option). That will re-create the 9-terminator directory and corresponding files. Unless you install the missing perl package, the qc generation will still fail, but it only contains statistics on the assembly, the asm.utg.fasta file should be your complete assembly. >> >> Serge >> >> >>> On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw... <mailto:mu...@uw...>> wrote: >>> >>> Hello Everyone, >>> >>> I was running a PBcR through to assembly with nothing in my spec file excet memory options since I share the server. I got all the way to step 9 (terminator) when I got the following error: >>> >>> ----------------------------------------START Tue May 26 02:18:24 2015 >>> /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl <http://caqc.pl/> -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm >>> Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl <http://caqc.pl/> line 18. >>> BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl <http://caqc.pl/> line 18. 
>>> ----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds) >>> ERROR: Failed with signal INT (2) >>> The Cleaner has arrived. Doing 'none'. >>> ----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds) >>> >>> I google search tells me that I can try manually running asmOutputFasta to try and make the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/ <http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/>). When I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and I might have to repeat step 9 in runCA, this is where I get stuck. >>> >>> I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pickup a PBcR run? I tried specifying the directory, prefix, and spec file and after changing to the hash memory options in my spec file i get: >>> >>> Failure message: >>> >>> no fragment files specified, and stores not already created >>> >>> While trying to rerun the PBcR command again gives: >>> >>> Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite. >>> >>> >>> Seth Munholland, B.Sc. >>> Department of Biological Sciences >>> Rm. 304 Biology Building >>> University of Windsor >>> 401 Sunset Ave. N9B 3P4 >>> T: (519) 253-3000 Ext: 4755 <>------------------------------------------------------------------------------ >>> One dashboard for servers and applications across Physical-Virtual-Cloud >>> Widest out-of-the-box monitoring support with 50+ applications >>> Performance metrics, stats and reports that give you Actionable Insights >>> Deep dive visibility with transaction tracing using APM Insight. 
>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y_______________________________________________ <http://ad.doubleclick.net/ddm/clk/290420510;117567292;y_______________________________________________> >>> wgs-assembler-users mailing list >>> wgs...@li... <mailto:wgs...@li...> >>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users <https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users> >> >> > > |
From: Seth M. <mu...@uw...> - 2015-05-28 14:37:16
|
After correction I ended up with about 12x coverage. I presume the ~25x coverage suggested on the PBcR page is for after correction? I'll try the low-coverage parameters and double check BLASR/PBDAGCON next, thanks. Seth Munholland, B.Sc. Department of Biological Sciences Rm. 304 Biology Building University of Windsor 401 Sunset Ave. N9B 3P4 T: (519) 253-3000 Ext: 4755 On Wed, May 27, 2015 at 6:56 PM, Serge Koren <ser...@gm...> wrote: > That most likely means you ended up with too little coverage for assembly > after correction. You can check the coverage in the > PI440795_Self_Assembled*.fastq files. If you’re not already, I’d suggest > using the low-coverage parameters on the wiki page: > > http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly > > I’d also double-check that you have BLASR/PBDAGCON available in your path > and that it is being used for assembly (in your > tempPI440795_Self_Assembled/runPartition.sh file look for the word > pbdagcon). > > On May 26, 2015, at 12:03 PM, Seth Munholland <mu...@uw...> wrote: > > Hi Serge, > > I looked into the 9-terminator folder and found the asm.utg.fasta file, > but it's only 4.5MB (~0.007x coverage) when I started wth ~33x coverage. > Any suggestions for where to look for the data loss? > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 > > On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm...> > wrote: > >> Hi, >> >> This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected >> data failed, the restart did not work properly). If you grep for runCA in >> your command line output from your run and re-run the last command (it >> should have the library name as the -d option). That will re-create the >> 9-terminator directory and corresponding files. 
Unless you install the >> missing perl package, the qc generation will still fail, but it only >> contains statistics on the assembly, the asm.utg.fasta file should be your >> complete assembly. >> >> Serge >> >> >> On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> >> wrote: >> >> Hello Everyone, >> >> I was running a PBcR through to assembly with nothing in my spec file >> excet memory options since I share the server. I got all the way to step 9 >> (terminator) when I got the following error: >> >> ----------------------------------------START Tue May 26 02:18:24 2015 >> /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/ >> caqc.pl -euid >> /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm >> Can't locate Statistics/Descriptive.pm in @INC (@INC contains: >> /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi >> /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl >> /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi >> /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl >> /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at >> /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. >> BEGIN failed--compilation aborted at >> /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. >> ----------------------------------------END Tue May 26 02:18:24 2015 (0 >> seconds) >> ERROR: Failed with signal INT (2) >> The Cleaner has arrived. Doing 'none'. >> ----------------------------------------END Tue May 26 02:18:24 2015 >> (1490 seconds) >> >> I google search tells me that I can try manually running asmOutputFasta >> to try and make the missing output fasta ( >> http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When >> I try it the fasta files are only ~3MB. The same link warns that the asm >> may be incomplete and I might have to repeat step 9 in runCA, this is where >> I get stuck. 
>> >> I've renamed the 9-terminator folder to 9-terminator-old, but what is the >> command for runCA to pick up a PBcR run? I tried specifying the directory, >> prefix, and spec file and after changing to the hash memory options in my >> spec file I get: >> >> Failure message: >> >> no fragment files specified, and stores not already created >> >> Trying to rerun the PBcR command again gives: >> >> Error: requested to output PI440795_Self_Assembled.frg but file already >> exists. Will not overwrite. >> >> >> Seth Munholland, B.Sc. >> Department of Biological Sciences >> Rm. 304 Biology Building >> University of Windsor >> 401 Sunset Ave. N9B 3P4 >> T: (519) 253-3000 Ext: 4755 >> >> ------------------------------------------------------------------------------ >> One dashboard for servers and applications across Physical-Virtual-Cloud >> Widest out-of-the-box monitoring support with 50+ applications >> Performance metrics, stats and reports that give you Actionable Insights >> Deep dive visibility with transaction tracing using APM Insight. >> >> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y_______________________________________________ >> wgs-assembler-users mailing list >> wgs...@li... >> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >> >> >> > > |
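The coverage check Serge suggests (total bases in the corrected PI440795_Self_Assembled*.fastq files divided by genome size) can be sketched in shell. The helper name, the demo file, and the toy genome size below are illustrative stand-ins, not part of the thread:

```shell
# estimate_coverage: sum the sequence-line lengths of FASTQ files
# (line 2 of each 4-line record) and divide by an assumed genome size.
estimate_coverage() {
    genome_size=$1; shift
    awk -v size="$genome_size" \
        'NR % 4 == 2 { total += length($0) } END { printf "%.1f\n", total / size }' "$@"
}

# Demo on a tiny stand-in FASTQ: two reads, 12 bases total, "genome size" 4.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' > demo.fastq
estimate_coverage 4 demo.fastq   # prints 3.0
```

On real data this would be `estimate_coverage <genome_size_bp> PI440795_Self_Assembled*.fastq`; a result far below ~25x is what the low-coverage parameters are meant for.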
From: Serge K. <ser...@gm...> - 2015-05-27 22:56:16
|
That most likely means you ended up with too little coverage for assembly after correction. You can check the coverage in the PI440795_Self_Assembled*.fastq files. If you’re not already, I’d suggest using the low-coverage parameters on the wiki page: http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Low_Coverage_Assembly I’d also double-check that you have BLASR/PBDAGCON available in your path and that it is being used for assembly (in your tempPI440795_Self_Assembled/runPartition.sh file look for the word pbdagcon). > On May 26, 2015, at 12:03 PM, Seth Munholland <mu...@uw...> wrote: > > Hi Serge, > > I looked into the 9-terminator folder and found the asm.utg.fasta file, but it's only 4.5MB (~0.007x coverage) when I started with ~33x coverage. Any suggestions for where to look for the data loss? > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 > On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm...> wrote: > Hi, > > This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data failed, the restart did not work properly). Grep for runCA in your command line output from your run and re-run the last command (it should have the library name as the -d option). That will re-create the 9-terminator directory and corresponding files. Unless you install the missing perl package, the qc generation will still fail, but it only contains statistics on the assembly; the asm.utg.fasta file should be your complete assembly. > > Serge > > >> On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> wrote: >> >> Hello Everyone, >> >> I was running a PBcR through to assembly with nothing in my spec file except memory options since I share the server. 
I got all the way to step 9 (terminator) when I got the following error: >> >> ----------------------------------------START Tue May 26 02:18:24 2015 >> /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm >> Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. >> BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. >> ----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds) >> ERROR: Failed with signal INT (2) >> The Cleaner has arrived. Doing 'none'. >> ----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds) >> >> A Google search tells me that I can try manually running asmOutputFasta to try and make the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and I might have to repeat step 9 in runCA; this is where I get stuck. >> >> I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pick up a PBcR run? 
I tried specifying the directory, prefix, and spec file and after changing to the hash memory options in my spec file I get: >> >> Failure message: >> >> no fragment files specified, and stores not already created >> >> Trying to rerun the PBcR command again gives: >> >> Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite. >> >> >> Seth Munholland, B.Sc. >> Department of Biological Sciences >> Rm. 304 Biology Building >> University of Windsor >> 401 Sunset Ave. N9B 3P4 >> T: (519) 253-3000 Ext: 4755 > > |
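The pbdagcon check above boils down to a grep of the partition script. The block below runs against a stand-in file so it is self-contained; on a real run you would point grep at tempPI440795_Self_Assembled/runPartition.sh instead, and the pbdagcon command line in the stand-in is hypothetical:

```shell
# Stand-in for runPartition.sh; on a real run, grep the actual script.
printf '#!/bin/sh\npbdagcon -c 2 aligned.blasr > corrected.fasta\n' > demo_runPartition.sh

if grep -q pbdagcon demo_runPartition.sh; then
    echo "pbdagcon is used for consensus"
else
    echo "pbdagcon not found in the partition script"
fi

# Also worth confirming the binaries are reachable, e.g.:
#   command -v blasr pbdagcon
```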
From: Seth M. <mu...@uw...> - 2015-05-26 16:03:36
|
Hi Serge, I looked into the 9-terminator folder and found the asm.utg.fasta file, but it's only 4.5MB (~0.007x coverage) when I started with ~33x coverage. Any suggestions for where to look for the data loss? Seth Munholland, B.Sc. Department of Biological Sciences Rm. 304 Biology Building University of Windsor 401 Sunset Ave. N9B 3P4 T: (519) 253-3000 Ext: 4755 On Tue, May 26, 2015 at 11:18 AM, Serge Koren <ser...@gm...> wrote: > Hi, > > This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data > failed, the restart did not work properly). Grep for runCA in your > command line output from your run and re-run the last command (it should > have the library name as the -d option). That will re-create the > 9-terminator directory and corresponding files. Unless you install the > missing perl package, the qc generation will still fail, but it only > contains statistics on the assembly; the asm.utg.fasta file should be your > complete assembly. > > Serge > > > On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> wrote: > > Hello Everyone, > > I was running a PBcR through to assembly with nothing in my spec file > except memory options since I share the server. I got all the way to step 9 > (terminator) when I got the following error: > > ----------------------------------------START Tue May 26 02:18:24 2015 > /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/ > caqc.pl -euid > /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm > Can't locate Statistics/Descriptive.pm in @INC (@INC contains: > /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl > /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi > /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl > /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at > /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. 
> BEGIN failed--compilation aborted at > /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. > ----------------------------------------END Tue May 26 02:18:24 2015 (0 > seconds) > ERROR: Failed with signal INT (2) > The Cleaner has arrived. Doing 'none'. > ----------------------------------------END Tue May 26 02:18:24 2015 (1490 > seconds) > > A Google search tells me that I can try manually running asmOutputFasta to > try and make the missing output fasta ( > http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When > I try it the fasta files are only ~3MB. The same link warns that the asm > may be incomplete and I might have to repeat step 9 in runCA; this is where > I get stuck. > > I've renamed the 9-terminator folder to 9-terminator-old, but what is the > command for runCA to pick up a PBcR run? I tried specifying the directory, > prefix, and spec file and after changing to the hash memory options in my > spec file I get: > > Failure message: > > no fragment files specified, and stores not already created > > Trying to rerun the PBcR command again gives: > > Error: requested to output PI440795_Self_Assembled.frg but file already > exists. Will not overwrite. > > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 > > > |
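The restart Serge later describes, pulling the last runCA invocation out of the saved PBcR output and re-running it, can be sketched as below. The log name and the runCA command line written into it are hypothetical placeholders for whatever your own run printed:

```shell
# Stand-in for the captured PBcR command-line output.
printf 'correction done\nrunCA -s pacbio.spec -p asm -d PI440795_Self_Assembled asm.frg\n' > demo_pbcr.log

# Take the last runCA line; it should carry the library name as -d.
last_cmd=$(grep 'runCA' demo_pbcr.log | tail -n 1)
echo "$last_cmd"
# After inspecting it, execute it to re-create 9-terminator, e.g.:
#   eval "$last_cmd"
```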
From: Serge K. <ser...@gm...> - 2015-05-26 15:18:44
|
Hi, This was a bug fixed in CA 8.3rc2 (when the assembly of the corrected data failed, the restart did not work properly). Grep for runCA in your command line output from your run and re-run the last command (it should have the library name as the -d option). That will re-create the 9-terminator directory and corresponding files. Unless you install the missing perl package, the qc generation will still fail, but it only contains statistics on the assembly; the asm.utg.fasta file should be your complete assembly. Serge > On May 26, 2015, at 10:49 AM, Seth Munholland <mu...@uw...> wrote: > > Hello Everyone, > > I was running a PBcR through to assembly with nothing in my spec file except memory options since I share the server. I got all the way to step 9 (terminator) when I got the following error: > > ----------------------------------------START Tue May 26 02:18:24 2015 > /usr/bin/env perl /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl -euid /lore/bill.crosby.storage/PI440795/PI440795_Self_Assembled/9-terminator/asm.asm > Can't locate Statistics/Descriptive.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. > BEGIN failed--compilation aborted at /data/bill.crosby/apps/wgs-8.3rc1/Linux-amd64/bin/caqc.pl line 18. > ----------------------------------------END Tue May 26 02:18:24 2015 (0 seconds) > ERROR: Failed with signal INT (2) > The Cleaner has arrived. Doing 'none'. 
> ----------------------------------------END Tue May 26 02:18:24 2015 (1490 seconds) > > A Google search tells me that I can try manually running asmOutputFasta to try and make the missing output fasta (http://sourceforge.net/p/wgs-assembler/mailman/message/33260123/). When I try it the fasta files are only ~3MB. The same link warns that the asm may be incomplete and I might have to repeat step 9 in runCA; this is where I get stuck. > > I've renamed the 9-terminator folder to 9-terminator-old, but what is the command for runCA to pick up a PBcR run? I tried specifying the directory, prefix, and spec file and after changing to the hash memory options in my spec file I get: > > Failure message: > > no fragment files specified, and stores not already created > > Trying to rerun the PBcR command again gives: > > Error: requested to output PI440795_Self_Assembled.frg but file already exists. Will not overwrite. > > > Seth Munholland, B.Sc. > Department of Biological Sciences > Rm. 304 Biology Building > University of Windsor > 401 Sunset Ave. N9B 3P4 > T: (519) 253-3000 Ext: 4755 |
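The "missing perl package" behind the caqc.pl failure is Statistics::Descriptive from CPAN. A quick probe for it is sketched below; the `cpan -i` line in the comment is one common install route (it may need sudo or a local::lib setup) and only fixes the qc report, not the assembly itself:

```shell
# Probe for the module caqc.pl dies on; if absent, install it, e.g.:
#   cpan -i Statistics::Descriptive
if perl -MStatistics::Descriptive -e 1 2>/dev/null; then
    echo "Statistics::Descriptive present"
else
    echo "Statistics::Descriptive missing"
fi
```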