From: Christoph H. <chr...@gm...> - 2012-07-10 22:46:20
Hi Brian,

Thanks! Overlaps are being computed now, and the CVS version of CA has compiled successfully. I will try runCA-overlapStoreBuild.pl once the overlapper is finished.

One question there: I understand that memory usage is regulated by the -jobs j parameter; a higher value of j means less memory for every job. How can I specify the number of CPUs to be used in the parallel steps?

Thanks for your help! I appreciate it!

cheers,
Christoph
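To make the -jobs trade-off concrete (the numbers below are purely illustrative, not from this assembly): the overlap data is partitioned across the sort jobs, so memory per job scales inversely with j, while the number of CPUs in use at once is presumably bounded by how many of those grid jobs the cluster runs concurrently.

    overlap data  ~ 400 GB             (illustrative total)
    -jobs 100  ->  ~400 GB / 100 = ~4 GB  of overlaps per sort job
    -jobs  25  ->  ~400 GB / 25  = ~16 GB of overlaps per sort job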
From: Walenz, B. <bw...@jc...> - 2012-07-10 20:18:44
Quick guess is that runCA is finding the old ovlStore and assuming it is complete, then continuing on to frgcorr. runCA tests for the existence of name.ovlStore to determine whether overlaps are finished; it doesn't check that the store is valid. So, delete *ovlStore* too.

Your latest build (from scratch) is suffering from a long-standing dependency issue: it needs kmer checked out and 'make install'ed.

    make[1]: *** No rule to make target `sweatShop.H', needed by `classifyMates.o'.  Stop.
    make[1]: *** Waiting for unfinished jobs....
    make: *** [objs] Error 1

Once kmer is installed, wipe (again) the Linux-amd64 directory and rebuild. The kmer included in CA7 is too old for the CVS version of CA, so you'll need to grab it from subversion:

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile

b
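A sketch of that recovery sequence, under the assumption that plain make targets suffice; the kmer checkout URL and the exact build steps are placeholders here, so take the authoritative commands from the Check_out_and_Compile wiki page above.

    # In the assembly directory: remove the stale store so runCA recomputes overlaps.
    rm -rf *ovlStore*

    # Build and install kmer first (repository URL is a placeholder).
    svn checkout <kmer-repo-url> kmer
    cd kmer
    make install                      # CA's build expects an installed kmer
    cd ..

    # Then wipe the stale CA build tree and recompile the CVS version.
    rm -rf wgs/Linux-amd64
    cd wgs/src
    make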
From: Christoph H. <chr...@gm...> - 2012-07-10 20:00:55
Hi,

I actually tried to just rerun the overlapper: I moved the 1-overlapper and 3-overlapcorrection directories aside and ran runCA again, and it immediately starts doing frgcorr. Do you mean recompute from the very start? Is there a way to at least avoid recomputing the initial overlaps (they took some 10,000 CPU hours)?

Tried to compile again - not successful. Ran make in the src directory (output in makelog) and also in the AS_RUN directory (output in AS_RUN-makelog).

Thanks,
Christoph
From: Walenz, B. <bw...@jc...> - 2012-07-10 19:04:48
Odd, the *gz should only be deleted after the store is successfully built. runCA might have been confused by the attempt to rerun. The easiest will be to recompute. :-(

I've never seen the 'libCA.a' error before. That particular program is the first to get built; it looks like libCA.a wasn't created. My fix for most strange compile errors is to remove the entire Linux-amd64 directory and recompile. If that fails, send along the complete output of make and I'll take a look.

b
From: Christoph H. <chr...@gm...> - 2012-07-10 18:15:30
Hi Brian,

Thanks for your reply!

I would be happy to try the new parallel overlap store build, but I think I need the *.ovb.gz outputs for that, and unfortunately I don't have them any more. It looks like they were deleted after the ovlStore was built, so I guess I'll need to run the overlapper again first. Am I understanding that correctly?

I have downloaded the CVS version and tried to make, but I get:

    *** No rule to make target `libCA.a', needed by `fragmentDepth'.  Stop.

I really appreciate your help!

cheers,
Christoph
From: Walenz, B. <bw...@jc...> - 2012-07-10 15:09:21
|
Hi, Christoph-

The original overlap store build is difficult to resume. I think it can be done, but it will take code changes that are probably specific to the case you have; I would suggest this only if you do not have the *.ovb.gz outputs from the overlapper.

Option 1 is then to restart. Option 2 is to use a new 'data-parallel' overlap store build (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid jobs. The first job is parallel, and transfers the overlapper output into buckets for sorting. The second job, also parallel, sorts each bucket. The final job, sequential, builds an index for the store. Since this compute is just a collection of jobs, it can be restarted/resumed/fixed easily. (The job structure is sketched at the end of this message.)

Its performance can be great -- at JCVI we've seen builds that we estimated would take 2 days with the original sequential build finish in a few (4?) hours with the data-parallel version. But on our development cluster it is slower than the sequential version; it depends on the disk throughput. Our dev cluster is powered by a 6-disk ZFS, while the production side has a big Isilon.

It is only in CVS. I just added command line help and a bit of documentation, so do an update first. Happy to provide help if you want to try it out. More than happy to accept better documentation.
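If it helps to picture the job structure, it is the usual SGE dependency chain -- roughly the sketch below, where the script names, job names, and array sizes are placeholders, not the script's actual options:

  # illustrative only: bucket, sort, index as dependent SGE jobs
  qsub -N ovsBucket -t 1-430 bucket.sh                      # parallel: distribute overlapper output into buckets
  qsub -N ovsSort   -t 1-430 -hold_jid ovsBucket sort.sh    # parallel: sort each bucket
  qsub -N ovsIndex  -hold_jid ovsSort index.sh              # sequential: build the store index

b
|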
From: Christoph H. <chr...@gm...> - 2012-07-10 10:48:26
|
Hi Ole,

Thanks for your reply. I had looked at the preprocessing page you are referring to just recently. Sounds like a good approach you are using! I will definitely consider it to make the assembly more effective in a next try. Thanks for that!

For now, I think I am pretty much past all the trimming and correction steps (once I get this last thing sorted out). As far as I can see, the next step is already building the unitigs, so I'll try to finish this assembly as it is now and improve it afterwards. I am really curious how a first attempt at a hybrid approach (454+Illumina) will perform in comparison to the pure Illumina assemblies, which I have pretty much optimized by now (and with which I am pretty happy, btw).

I am afraid your suggestion to set doFragmentCorrection=0 directly now will not work. For the next step (the unitigger) I'll need an intact overlap store, and as it is now I think it is useless, being only half-updated. I also discovered that just rerunning the previous overlapStore command (the one before the frg- and ovlcorrection) does not work as I thought it would.

It seems to be a very unfortunate situation -- I really don't know how to proceed. It would be fantastic if anyone could give me a tip on what to do!

Thanks for your help!

much obliged,
Christoph
|
From: Ole K. T. <o.k...@bi...> - 2012-07-09 11:20:54
|
Hi Christoph.

This is not an answer to your question, but a suggestion for a work-around. If I remember correctly, you have both Illumina and 454 reads. Celera runs, as you see below, frgcorrection and overlap based trimming to correct 454 reads, and merTrim to correct Illumina reads (it can also be used on 454 reads). What I've been doing lately is to run meryl on a trusted set of Illumina reads, paired-end for example; I ran it on some overlapping reads which I had merged with FLASH. Then you can use the set of trusted k-mers to correct different datasets. For example, I first ran CA to the end of OBT (overlap based trimming) for my 454 reads, and then output the result as fastq files; I used the trusted k-mer set to correct these 454 reads too. If you do this for all your reads, using either merTrim or merTrim/OBT, and do deduplication on all the datasets too, then you'll end up with reads that you can use in assemblies where you skip relatively expensive steps such as frgcorrection. (The k-mer counting step is sketched at the end of this message.)

I don't think frgcorrection is that useful for the type of data you're using anyway.

If you have a set of corrected reads, you can use these settings for CA:

doOBT=0
doFragmentCorrection=0

When I think of it, you might use doFragmentCorrection=0 on this assembly now. You might have to clean up your directory tree, like removing the 3-overlapcorrection directory and maybe some other steps too. Apply with caution.

Most of the stuff I've mentioned I've taken from here:
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
and discussion with Brian.
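The k-mer counting itself looks roughly like this with the classic meryl from the kmer package (a sketch from memory -- check meryl's usage for your build; the file names here are made up):

  # count canonical 22-mers in the trusted (e.g. FLASH-merged) reads
  meryl -B -C -m 22 -s trusted_merged.fasta -o trusted
  # 'trusted' (trusted.mcidx/trusted.mcdat) is the trusted k-mer set;
  # hand it to merTrim when correcting the other datasets (see merTrim's
  # usage for the exact option to pass it)

Ole
|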
From: Christoph H. <chr...@gm...> - 2012-07-09 10:47:39
|
Dear users and developers,

I have the following problem: in my assembly process I have just completed the fragment and overlap error correction. Unfortunately, runCA stopped in the subsequent updating of the overlapStore because of an incorrectly set time limit. If I try to resume the assembly now, I get the following error:

----------------------------------------START Mon Jul 9 11:05:53 2012
/xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates > /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
----------------------------------------END Mon Jul 9 11:05:54 2012 (1 seconds)
ERROR: Failed with signal HUP (1)
================================================================================

runCA failed.

----------------------------------------
Stack trace:

at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
main::caFailure('failed to apply the overlap corrections', '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
main::overlapCorrection() called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880

----------------------------------------
Last few lines of the relevant log file (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):

AS_OVS_openBinaryOverlapFile()-- Failed to open '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for reading: No such file or directory

----------------------------------------
Failure message:

failed to apply the overlap corrections

So it obviously cannot find the file salaris.ovlStore/0001~. The reason, from what I can see, is that the salaris.ovlStore/0001~ file had already been updated to salaris.ovlStore/0001 before the run stopped. In fact, it seems to have stopped after updating salaris.ovlStore/0249 (of 430; a quick check for this is at the end of this message). Is there a way to tell runCA to continue from salaris.ovlStore/0250~ instead of from 0001~, which is obviously not there any more?

Another solution I was thinking of is to rerun the previous overlapStore command manually (the one that was run before starting the frgcorr and ovlcorr):

/xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list > /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1

This would restore the state from before the frgcorr and ovlcorr steps before resuming runCA, and it should restore the 0001~ file, right? The most important thing is that I want to avoid rerunning the frgcorr and ovlcorr steps, because they were really resource intensive.

I would really appreciate any comments or suggestions on my problem! Thanks in advance for your help!
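For reference, a quick way to see where the update stopped is to list which slices still have their '~' backups (the first name listed is the first slice that was not yet updated):

  ls /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/ | grep '~' | head -n 1
  # prints 0250~ here, i.e. slices 0001-0249 were already updated

much obliged,
Christoph

University of Oslo
|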
From: Christoph H. <chr...@gm...> - 2012-07-09 09:34:50
|
Hi Arjun,

Thanks for your reply! I am in contact with the system administrator to try to avoid taking down nodes in the future. In fact, I am using a shell script in a particular format to submit jobs to the cluster, where I have to set a memory limit in advance. I think it is likely that this limit is implemented as something like the ulimit command you mentioned, and normally jobs are just killed when the memory limit is exceeded. This runCA job was somehow exceptional, and I think the sysadmins are currently looking into the reasons why the job took down the node instead of just being killed.

Thanks again for your suggestion!

cheers,
Christoph
|
From: Louis L. <lou...@ma...> - 2012-07-06 11:20:50
|
Hello,

I'm trying to assemble pacbioToCA error-corrected PacBio reads. The genome is E. coli. The correction went very well, but when I try to assemble I keep getting this error:

safeWrite()-- EXPECTED 4172, ended up at 4636
buildUnitigs: AS_UTL_fileIO.C:83: void AS_UTL_safeWrite(FILE*, const void*, const char*, size_t, size_t): Assertion `AS_UTL_ftell(file) == expectedposition' failed.

The numbers never change. This is being run on a Linux cluster (single node) which has Lustre as its underlying file system. I tried with another installation on another server, also Linux, but with NFS as the underlying file system; on the NFS server it worked well. I saw the code of safeWrite explaining problems with FreeBSD, but I'm not using that.

Any ideas what's going on? I tried copying my executables from one server to the other and it didn't change anything. I don't understand why fwrite would behave this way unless it is a buffering problem.
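For context, the check that fires is roughly this pattern (a simplified sketch of the idea, not the actual AS_UTL_fileIO.C source):

  /* write a buffer, then assert that the file position advanced to
     where the write should have left it */
  #include <assert.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void safeWriteSketch(FILE *file, const void *buffer, size_t size, size_t nobj) {
      long expectedposition = ftell(file) + (long)(size * nobj);
      if (fwrite(buffer, size, nobj, file) != nobj) {
          perror("fwrite");
          exit(1);
      }
      /* this is the assertion that fails for me: the position reported
         after the write disagrees with the expected one */
      assert(ftell(file) == expectedposition);
  }

  int main(void) {
      FILE *f = tmpfile();
      const char msg[] = "hello";
      safeWriteSketch(f, msg, 1, sizeof(msg));
      fclose(f);
      return 0;
  }

Thanks
Louis
|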
From: Arjun P. <ap...@ma...> - 2012-07-05 18:43:28
|
Hi Christoph,

I know this reply is a bit late, but we have the problem of limited memory and runaway jobs on our cluster also. I haven't had the CA problem you describe, but when I can't watch a job closely I use a wrapper shell script with a ulimit command in it to engage the operating-system-based limits (bash uses ulimit; csh and derivatives, I think, use limit). A minimal version of such a wrapper is sketched at the end of this message.

I'm also not sure using SGE-based memory limits will actually help that much. We have that functionality optionally enabled on our cluster. I haven't tried it, but some other users have complained that it doesn't work well. SGE doesn't seem to do a very good job of keeping track of the actual memory used.

I haven't tried to use a wrapper script called by runCA, so I'm not completely sure how that would work. If you're running things with runCA's SGE tie-ins, you can ask the sysadmin to add a ulimit command to an SGE prolog script for a custom queue.

Something like 'ulimit -d 60000000' should keep you from taking down the node.
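The wrapper can be as simple as this (a sketch; the script name is arbitrary and the limit is in KB):

  #!/bin/bash
  # memcap.sh -- cap the data segment, then run whatever was passed in
  ulimit -d 60000000     # ~60 GB data-segment limit (value is in KB)
  exec "$@"              # replace the shell with the real command

Used as, e.g., './memcap.sh runCA ...', so the limit applies to runCA and any children it spawns, which inherit it.

Arjun
|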
From: Walenz, B. <bw...@jc...> - 2012-07-03 21:13:55
|
Sorry about the trouble with the sysadmins.

Given the mix of reads, I'd just skip the dedupe. Neither of those library types is known to have artificial duplications.

Memory usage depends on a lot of factors (the genome itself, genome size, depth of coverage, read length, number of reads, number of mated reads) and I don't have any good general advice anymore.

Is it possible to submit a job such that the scheduler will kill it if some memory limit is exceeded? That might be generally useful enough that the sysadmins would help to set it up. (I've been arguing for that here for a while, only to have other users object to the idea -- "but I don't know how big it is going to get! you can't just kill it!")
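On SGE, for example, many sites configure a memory resource for exactly this; whether it is enforced, and under what name, depends on the site's setup, but the submission would look something like the sketch below (the script name is a placeholder):

  # ask the scheduler to kill the job at 64 GB of virtual memory
  # instead of letting it take the node down (h_vmem is the usual,
  # but site-configurable, resource name)
  qsub -l h_vmem=64G run_deduplicate.sh

|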
From: Christoph H. <chr...@gm...> - 2012-07-03 20:43:39
|
Dear Brian, Thanks! My last attempt has apparently caused some serious problems on the node it was running on. So, I have to wait for the cluster admins ok before I try again. Will try to run it manually without the obtStore then and keep you posted on the result. The dataset only contains illumina PE and 454 SE. Is there a way to get an idea about the memory requirements beforehand (I have to specify that on the cluster before I start the job and the admin will not be happy if I kill the node again..)? I guess not? Thanks again for your help!!! cheers, Christoph On 07/03/2012 10:24 PM, Walenz, Brian wrote: > Good to know about the restart not working. > > You should be able to run manually without the obtStore by leaving out the > -ovs option for it. > > To find duplicate mate pairs, it needs to save up overlaps until both of the > reads in the mate have been seen. The bug in CVS was to not process mate > pairs until ALL reads were seen. I've not seen this in CA7 but the same can > happen if the mated reads are 'far away' in the input, for example, if all > of the 'left' reads are loaded before the 'right' reads. > > If all else fails, you can skip deduplication. There is little gain in > deduplicating Illumina PE and MP libraries -- PE duplicates don't really > affect scaffolding, and MP duplicates aren't detectable from overlaps. > Hopefully there aren't any 454 mates in this. > > b > > > On 7/3/12 4:02 PM, "Christoph Hahn" <chr...@gm...> wrote: > > > >> Hi Brian, >> >> Thanks for your reply! >> >> I am using CA7. I am afraid updating is not really an option at the >> moment - I am running it on a cluster and updating CVS might be >> complicated because the cluster administrators are always very busy and >> it would thus for sure take a while.. >> >> Therefore, it would be great if you could give me a tip on how to handle >> that in CA7 for now. In my latest attempt I used 64 GB RAM and it killed >> the node after some 2 hours. I ran the following: >> >> CA version 7.0 ($Id: deduplicate.C,v 1.15 2011/12/29 09:26:03 >> brianwalenz Exp $). >> >> Error Rates: >> AS_OVL_ERROR_RATE 0.060000 >> AS_CNS_ERROR_RATE 0.100000 >> AS_CGW_ERROR_RATE 0.100000 >> AS_MAX_ERROR_RATE 0.250000 >> >> Current Working Directory: >> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim >> >> Command: >> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \ >> -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \ >> -ovs >> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.obtStore \ >> -ovs >> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \ >> -report >> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log >> \ >> -summary >> /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summ >> ary >> >> Here are the first and last few lines of salaris.deduplicate.log (it has >> 384855 lines, *.deduplicate.summary and *.deduplicate.err are empty): >> >> Delete 28 DUPof 3462651 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 >> Delete 76 DUPof 10667558 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 >> Delete 210 DUPof 8142147 a 0,70 b 0,70 hang 0,0 diff 0,0 error 0.000000 >> Delete 216 DUPof 9129559 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 >> Delete 228 DUPof 7781271 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.013200 >> Delete 297 DUPof 11757250 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 >> Delete 319 DUPof 11174680 a 0,73 b 0,73 hang 0,0 diff 0,0 error 0.000000 >> . >> . >> . 
>> Delete 132295695 DUPof 211765973 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132296968 DUPof 181491499 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132297966 DUPof 159665067 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132304543 DUPof 155518568 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132307934 DUPof 134266938 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132309546 DUPof 179301753 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132313400 DUPof 153142824 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132319681 DUPof 132368976 a 0,76 b 0,76 hang 0,0 diff 0,0 >> error 0.000000 >> Delete 132323752 DUPof 165992623 a 0,76 (this is exactly how it stopped..) >> >> Can I maybe run the deduplicate command manually and only make use of >> the overlaps in the dupStore? When I tried to start CA again it >> continued with finalTrim, so I removed the *.deduplicate.log, etc. files >> before I restarted CA. >> It would be great if you could help me out! Thanks!! >> >> cheers, >> Christoph >> >> >> On 07/03/2012 06:44 PM, Walenz, Brian wrote: >>> Hi, Christoph- >>> >>> Are you using CA7 or CVS? >>> >>> This behavior was introduced to CVS on May 21, and fixed on the 29th. The >>> bug came in with an optimization to overlap loading - only overlaps >>> in the 'dupStore' are needed; the 'obtStore' can be ignored. This >>> eliminated a huge amount of I/O and overhead from the dedupe compute. >>> >>> If updating CVS doesn't fix the problem, can you send some of the logging >>> from deduplicate? >>> >>> b >>> >>> >>> On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote: >>> >>>> Dear developers and users, >>>> >>>> I am encountering some problems in the deduplicate step. Unfortunately, >>>> the memory usage is steadily increasing until the process dies because >>>> it exceeds the memory limit. So far, I used up to 32 GB. I could of course >>>> just further increase the available memory, but I was wondering if there >>>> is a possibility to fix and/or predict the maximum memory usage for this >>>> step (and maybe also for the next steps) beforehand. >>>> >>>> Thanks for your help! >>>> >>>> much obliged, >>>> Christoph >>>> >>>> University of Oslo, Norway >> |
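No CA-specific memory estimator comes up in this thread; one generic way to gauge a step's memory appetite before booking a whole node is to run the same binary on a small subset of the data under GNU time, which reports the peak resident set size. A minimal sketch, assuming GNU time is available as /usr/bin/time and that a scaled-down gkpStore/dupStore pair has been built first (the "subset" names are placeholders, not files from this thread):

  /usr/bin/time -v \
    deduplicate -gkp subset.gkpStore \
                -ovs subset.dupStore \
                -report subset.deduplicate.log \
                -summary subset.deduplicate.summary \
    2> time.log
  grep 'Maximum resident set size' time.log   # peak memory, reported in kilobytes

Scaling the peak up by the full/subset read ratio gives only a rough guess, since, as Brian notes above, deduplicate buffers overlaps until both reads of a mate have been seen.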
From: Walenz, B. <bw...@jc...> - 2012-07-03 20:24:53
|
Good to know about the restart not working. You should be able to run manually without the obtStore by leaving out the -ovs option for it. To find duplicate mate pairs, it needs to save up overlaps until both of the reads in the mate have been seen. The bug in CVS was to not process mate pairs until ALL reads were seen. I've not seen this in CA7 but the same can happen if the mated reads are 'far away' in the input, for example, if all of the 'left' reads are loaded before the 'right' reads. If all else fails, you can skip deduplication. There is little gain in deduplicating Illumina PE and MP libraries -- PE duplicates don't really affect scaffolding, and MP duplicates aren't detectable from overlaps. Hopefully there aren't any 454 mates in this. b On 7/3/12 4:02 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > Thanks for your reply! > > I am using CA7. I am afraid updating is not really an option at the > moment - I am running it on a cluster and updating CVS might be > complicated because the cluster administrators are always very busy and > it would thus for sure take a while.. > > Therefore, it would be great if you could give me a tip on how to handle > that in CA7 for now. In my latest attempt I used 64 GB RAM and it killed > the node after some 2 hours. I ran the following: > > CA version 7.0 ($Id: deduplicate.C,v 1.15 2011/12/29 09:26:03 > brianwalenz Exp $). > > Error Rates: > AS_OVL_ERROR_RATE 0.060000 > AS_CNS_ERROR_RATE 0.100000 > AS_CGW_ERROR_RATE 0.100000 > AS_MAX_ERROR_RATE 0.250000 > > Current Working Directory: > /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim > > Command: > /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \ > -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \ > -ovs > /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.obtStore \ > -ovs > /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \ > -report > /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log > \ > -summary > /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summary > > Here are the first and last few lines of salaris.deduplicate.log (it has > 384855 lines; *.deduplicate.summary and *.deduplicate.err are empty): > > Delete 28 DUPof 3462651 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 > Delete 76 DUPof 10667558 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 > Delete 210 DUPof 8142147 a 0,70 b 0,70 hang 0,0 diff 0,0 error 0.000000 > Delete 216 DUPof 9129559 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 > Delete 228 DUPof 7781271 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.013200 > Delete 297 DUPof 11757250 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 > Delete 319 DUPof 11174680 a 0,73 b 0,73 hang 0,0 diff 0,0 error 0.000000 > . > . > .
> Delete 132295695 DUPof 211765973 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132296968 DUPof 181491499 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132297966 DUPof 159665067 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132304543 DUPof 155518568 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132307934 DUPof 134266938 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132309546 DUPof 179301753 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132313400 DUPof 153142824 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132319681 DUPof 132368976 a 0,76 b 0,76 hang 0,0 diff 0,0 > error 0.000000 > Delete 132323752 DUPof 165992623 a 0,76 (this is exactly how it stopped..) > > Can I maybe run the deduplicate command manually and only make use of > the overlaps in the dupStore? When I tried to start CA again it > continued with finalTrim, so I removed the *.deduplicate.log, etc. files > before I restarted CA. > It would be great if you could help me out! Thanks!! > > cheers, > Christoph > > > On 07/03/2012 06:44 PM, Walenz, Brian wrote: >> Hi, Christoph- >> >> Are you using CA7 or CVS? >> >> This behavior was introduced to CVS on May 21, and fixed on the 29th. The >> bug came in with an optimization to overlap loading - only overlaps >> in the 'dupStore' are needed; the 'obtStore' can be ignored. This >> eliminated a huge amount of I/O and overhead from the dedupe compute. >> >> If updating CVS doesn't fix the problem, can you send some of the logging >> from deduplicate? >> >> b >> >> >> On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote: >> >>> Dear developers and users, >>> >>> I am encountering some problems in the deduplicate step. Unfortunately, >>> the memory usage is steadily increasing until the process dies because >>> it exceeds the memory limit. So far, I used up to 32 GB. I could of course >>> just further increase the available memory, but I was wondering if there >>> is a possibility to fix and/or predict the maximum memory usage for this >>> step (and maybe also for the next steps) beforehand. >>> >>> Thanks for your help! >>> >>> much obliged, >>> Christoph >>> >>> University of Oslo, Norway > > |
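Concretely, Brian's suggestion amounts to rerunning the command quoted above with the obtStore line dropped, so that deduplicate reads only the dupStore. A sketch built from the paths in Christoph's log (same binary and stores as posted; only the first -ovs is removed):

  /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \
    -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \
    -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \
    -report /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log \
    -summary /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summary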
From: Christoph H. <chr...@gm...> - 2012-07-03 20:02:48
|
Hi Brian, Thanks for your reply! I am using CA7. I am afraid updating is not really an option at the moment - I am running it on a cluster and updating CVS might be complicated because the cluster administrators are always very busy and it would thus for sure take a while.. Therefore, it would be great if you could give me a tip on how to handle that in CA7 for now. In my latest attempt I used 64 GB RAM and it killed the node after some 2 hours. I ran the following: CA version 7.0 ($Id: deduplicate.C,v 1.15 2011/12/29 09:26:03 brianwalenz Exp $). Error Rates: AS_OVL_ERROR_RATE 0.060000 AS_CNS_ERROR_RATE 0.100000 AS_CGW_ERROR_RATE 0.100000 AS_MAX_ERROR_RATE 0.250000 Current Working Directory: /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim Command: /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/deduplicate \ -gkp /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore \ -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.obtStore \ -ovs /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.dupStore \ -report /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.log \ -summary /projects/nn9201k/Celera/work2/salaris1/0-overlaptrim/salaris.deduplicate.summary Here are the first and last few lines of salaris.deduplicate.log (it has 384855 lines; *.deduplicate.summary and *.deduplicate.err are empty): Delete 28 DUPof 3462651 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 76 DUPof 10667558 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 210 DUPof 8142147 a 0,70 b 0,70 hang 0,0 diff 0,0 error 0.000000 Delete 216 DUPof 9129559 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 228 DUPof 7781271 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.013200 Delete 297 DUPof 11757250 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 319 DUPof 11174680 a 0,73 b 0,73 hang 0,0 diff 0,0 error 0.000000 . . . Delete 132295695 DUPof 211765973 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132296968 DUPof 181491499 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132297966 DUPof 159665067 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132304543 DUPof 155518568 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132307934 DUPof 134266938 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132309546 DUPof 179301753 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132313400 DUPof 153142824 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132319681 DUPof 132368976 a 0,76 b 0,76 hang 0,0 diff 0,0 error 0.000000 Delete 132323752 DUPof 165992623 a 0,76 (this is exactly how it stopped..) Can I maybe run the deduplicate command manually and only make use of the overlaps in the dupStore? When I tried to start CA again it continued with finalTrim, so I removed the *.deduplicate.log, etc. files before I restarted CA. It would be great if you could help me out! Thanks!! cheers, Christoph On 07/03/2012 06:44 PM, Walenz, Brian wrote: > Hi, Christoph- > > Are you using CA7 or CVS? > > This behavior was introduced to CVS on May 21, and fixed on the 29th. The > bug came in with an optimization to overlap loading - only overlaps > in the 'dupStore' are needed; the 'obtStore' can be ignored. This > eliminated a huge amount of I/O and overhead from the dedupe compute. > > If updating CVS doesn't fix the problem, can you send some of the logging > from deduplicate? > > b > > > On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote: > >> Dear developers and users, >> >> I am encountering some problems in the deduplicate step. Unfortunately, >> the memory usage is steadily increasing until the process dies because >> it exceeds the memory limit. So far, I used up to 32 GB. I could of course >> just further increase the available memory, but I was wondering if there >> is a possibility to fix and/or predict the maximum memory usage for this >> step (and maybe also for the next steps) beforehand. >> >> Thanks for your help! >> >> much obliged, >> Christoph >> >> University of Oslo, Norway |
From: Walenz, B. <bw...@jc...> - 2012-07-03 16:44:57
|
Hi, Christoph- Are you using CA7 or CVS? This behavior was introduced to CVS on May 21, and fixed on the 29th. The bug came in with an optimization to overlap loading - only overlaps in the 'dupStore' are needed; the 'obtStore' can be ignored. This eliminated a huge amount of I/O and overhead from the dedupe compute. If updating CVS doesn't fix the problem, can you send some of the logging from deduplicate? b On 7/3/12 6:28 AM, "Christoph Hahn" <chr...@gm...> wrote: > Dear developers and users, > > I am encountering some problems in the deduplicate step. Unfortunately, > the memory usage is steadily increasing until the process dies because > it exceeds the memory limit. So far, I used up to 32 GB. I could of course > just further increase the available memory, but I was wondering if there > is a possibility to fix and/or predict the maximum memory usage for this > step (and maybe also for the next steps) beforehand. > > Thanks for your help! > > much obliged, > Christoph > > University of Oslo, Norway |
From: Christoph H. <chr...@gm...> - 2012-07-03 10:28:50
|
Dear developers and users, I am encountering some problems in the deduplicate step. Unfortunately, the memory usage is steadily increasing until the process dies because it exceeds the memory limit. So far, I used up to 32 GB. I could of course just further increase the available memory, but I was wondering if there is a possibility to fix and/or predict the maximum memory usage for this step (and maybe also for the next steps) beforehand. Thanks for your help! much obliged, Christoph University of Oslo, Norway |
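A generic safeguard against the node-killing problem described elsewhere in this thread (not CA-specific, and not something suggested by the participants): cap the memory of the shell that launches the job, so the kernel kills the process cleanly instead of taking the node down. A bash sketch, with placeholder names for the work directory, prefix, and spec file:

  ulimit -v $((32 * 1024 * 1024))           # cap virtual memory at 32 GB (ulimit counts kilobytes)
  runCA -d run-dir -p salaris salaris.spec  # now fails with an allocation error if it exceeds the cap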
From: Walenz, B. <bw...@jc...> - 2012-06-14 21:55:55
|
Hi, Geoff- In the context of merTrim, no. It uses the mer database to find trusted kmers. It guesses the coverage contained in the input reads, then picks a lower limit on what 'trusted' should be. IIRC, trusted is 1/4 the input coverage -- 15x of input reads will treat any kmer with less than 4 occurrences as erroneous. In the context of overlaps, no. It uses the mer database to avoid seeding an overlap alignment on high-frequency seeds. b On 6/13/12 4:45 PM, "Waldbieser, Geoff" <Geo...@AR...> wrote: > When producing a mer database from a set of paired end Illumina reads, you > recommend a lower limit of 15X genome coverage. Is there an upper limit to the > amount of genome coverage so that one does not oversample sequencing error? > > Geoff Waldbieser |
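As a back-of-envelope illustration of the one-quarter rule Brian describes above (this is only the arithmetic, not the actual merTrim code, and the rounding is a guess):

  coverage=15
  threshold=$(( (coverage + 3) / 4 ))   # integer ceiling of coverage/4
  echo "kmers seen fewer than ${threshold} times are treated as erroneous"
  # 15x input coverage -> threshold 4, matching Brian's example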
From: Waldbieser, G. <Geo...@AR...> - 2012-06-13 20:46:29
|
When producing a mer database from a set of paired end Illumina reads, you recommend a lower limit of 15X genome coverage. Is there an upper limit to the amount of genome coverage, so that one does not oversample sequencing error? Geoff Waldbieser -----Original Message----- From: wgs...@li... Sent: Tuesday, June 12, 2012 12:39 AM Subject: wgs-assembler-users Digest, Vol 3, Issue 1 Message: 1 Date: Mon, 14 May 2012 13:32:19 -0500 From: "Mundy, Michael" <Mun...@ma...> Subject: [wgs-assembler-users] FastqToCA for paired-end reads I'm using WGS 7.0 and I have two synchronized fastq files with paired-end reads. Based on the documentation at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=FastqToCA, I tried this command: wgs-7.0/Linux-amd64/bin/fastqToCA -libraryname SRR067601.000 -mates SRR067601.000_1_pair.fq,SRR067601.000_2_pair.fq But it returns this error: ERROR: Mated reads (-mates) must have am insert size (-insertsize). The documentation page says that the -insertsize option is optional, so I thought that was the flag to distinguish between paired-end reads and mate-pair reads. How do I generate a FRG file for paired-end reads? Mike Mundy Message: 2 Date: Mon, 14 May 2012 20:46:31 +0200 From: Ole Kristian Tørresen <o.k...@bi...> Subject: Re: [wgs-assembler-users] FastqToCA for paired-end reads On 14 May 2012 20:32, Mundy, Michael <Mun...@ma...> wrote: > I'm using WGS 7.0 and I have two synchronized fastq files with paired-end > reads. Based on the documentation at > http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=FastqToCA, > I tried this command: > > wgs-7.0/Linux-amd64/bin/fastqToCA -libraryname SRR067601.000 -mates > SRR067601.000_1_pair.fq,SRR067601.000_2_pair.fq > > But it returns this error: > > ERROR: Mated reads (-mates) must have am insert size (-insertsize). > > The documentation page says that the -insertsize option is optional, so I > thought that was the flag to distinguish between paired-end reads and > mate-pair reads. How do I generate a FRG file for paired-end reads? I guess the documentation is not up to date, so it is not optional to supply the -insertsize option. Just add -insertsize 300 30 if your reads are from a 300 bp DNA fragment and are paired end, or do something like -insertsize 5000 500 -outtie if they are mate pairs from a 5k library. Ole End of wgs-assembler-users Digest, Vol 3, Issue 1 |
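Putting Ole's answer together with Michael's original command gives an invocation along these lines (fastqToCA writes the FRG to stdout, so redirect it to a file; the 300 bp / 30 bp insert size is Ole's example figure, not a measured value):

  fastqToCA -libraryname SRR067601.000 \
            -insertsize 300 30 \
            -mates SRR067601.000_1_pair.fq,SRR067601.000_2_pair.fq \
            > SRR067601.000.frg

For a 5k mate-pair library the same command would use -insertsize 5000 500 -outtie, per Ole's note above.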
From: Ole K. T. <o.k...@bi...> - 2012-06-12 07:00:05
|
On 12 June 2012 07:38, Walenz, Brian <bw...@jc...> wrote: > Hi, Ole- > > After merging two scaffolds, we do a least squares estimate of the gap sizes > in the new scaffold. If those gap sizes imply two contigs should be merged > (via a negative gap) we try to merge them. This merge failed for reasons I > haven't looked into. Probably bad sequence alignment. > > I think you can safely disable this assert. Gap size estimation might run > through a few more iterations (with the same result) and eventually give up. > The scaffold will have (slightly?) bogus gap sizes. There are, > unfortunately, plenty of other scaffolds that fail to get gap size > estimates, so this isn't a disaster. Great! I commented it out in the source code and am rerunning now. Thank you. Ole > > b > > > On 6/11/12 1:51 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote: > >> Hi, >> I got an assertion fail while scaffolding today. Error message: >> CreateNewGraphNode()-- Contig 16085537 >> * Create a contig 16085537 in scaffold 131913 >>>>> Fixing up suspicious overlap (15370134,16085537,I) (ahg:-340 bhg:-143) to >>>>> (15370134,16085537,O) (ahg:143 bhg:340) len: 137 >> * FOEXS: SUSPICIOUS Overlap found! Looked for >> (16085537,15370134,I)[20,1044] found (15370134,16085537,O) 137 >> WARNING: InsertChunkOverlap()-- Chunk overlap already exists. >> NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags >> 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 >> OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags >> 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 >> WARNING: CreateChunkOverlapFromEdge()-- Chunk overlap already exists. >> Keeping old overlap. >> NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags >> 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 >> OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags >> 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 >> * Switched right-left, orientation went from I to O >> * CreateAContigInScaffold() failed. >> ContigContainment failed. >> cgw: LeastSquaresGaps_CGW.C:1410: RecomputeOffsetsStatus >> RecomputeOffsetsInScaffold(ScaffoldGraphT*, CDS_CID_t, int, int, int): >> Assertion `0' failed. >> >> >> This is about 30x coverage with Illumina reads; 10x combined reads >> (180 bp insert, 100 nt reads and combined with FLASH), 10x PE 100 nt >> reads 300 insert and 5k mate pair 100 nt reads, all error corrected >> with Quake. It's probably not an optimal combination, but I'm testing >> a bit and interested in the result. >> >> Using bogart and dnc, the 5k library compared against the other two. >> Other options are default. >> >> Is that assertion something that can be fixed? >> >> Thank you. >> >> Ole > |
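For the record, the change Ole describes is a one-line edit to the file named in the assertion message. A sketch of one way to do it, assuming the line at LeastSquaresGaps_CGW.C:1410 is the bare assert(0) and that the file lives under src/AS_CGW/ in the checkout (verify both in your tree), followed by a rebuild:

  # comment out the failing assert, then rebuild cgw
  sed -i '1410s|assert(0)|// assert(0)|' src/AS_CGW/LeastSquaresGaps_CGW.C
  cd src && make

Per Brian's note, the only expected side effect is (slightly) bogus gap sizes in the affected scaffolds.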
From: Walenz, B. <bw...@jc...> - 2012-06-12 05:38:42
|
Hi, Ole- After merging two scaffolds, we do a least squares estimate of the gap sizes in the new scaffold. If those gap sizes imply two contigs should be merged (via a negative gap) we try to merge them. This merge failed for reasons I haven't looked into. Probably bad sequence alignment. I think you can safely disable this assert. Gap size estimation might run through a few more iterations (with the same result) and eventually give up. The scaffold will have (slightly?) bogus gap sizes. There are, unfortunately, plenty of other scaffolds that fail to get gap size estimates, so this isn't a disaster. b On 6/11/12 1:51 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote: > Hi, > I got an assertion fail while scaffolding today. Error message: > CreateNewGraphNode()-- Contig 16085537 > * Create a contig 16085537 in scaffold 131913 >>>> Fixing up suspicious overlap (15370134,16085537,I) (ahg:-340 bhg:-143) to >>>> (15370134,16085537,O) (ahg:143 bhg:340) len: 137 > * FOEXS: SUSPICIOUS Overlap found! Looked for > (16085537,15370134,I)[20,1044] found (15370134,16085537,O) 137 > WARNING: InsertChunkOverlap()-- Chunk overlap already exists. > NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags > 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 > OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags > 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 > WARNING: CreateChunkOverlapFromEdge()-- Chunk overlap already exists. > Keeping old overlap. > NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags > 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 > OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags > 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 > * Switched right-left, orientation went from I to O > * CreateAContigInScaffold() failed. > ContigContainment failed. > cgw: LeastSquaresGaps_CGW.C:1410: RecomputeOffsetsStatus > RecomputeOffsetsInScaffold(ScaffoldGraphT*, CDS_CID_t, int, int, int): > Assertion `0' failed. > > > This is about 30x coverage with Illumina reads; 10x combined reads > (180 bp insert, 100 nt reads and combined with FLASH), 10x PE 100 nt > reads 300 insert and 5k mate pair 100 nt reads, all error corrected > with Quake. It's probably not an optimal combination, but I'm testing > a bit and interested in the result. > > Using bogart and dnc, the 5k library compared against the other two. > Other options are default. > > Is that assertion something that can be fixed? > > Thank you. > > Ole |
From: Ole K. T. <o.k...@bi...> - 2012-06-11 17:51:38
|
Hi, I got an assertion fail while scaffolding today. Error message: CreateNewGraphNode()-- Contig 16085537 * Create a contig 16085537 in scaffold 131913 >>> Fixing up suspicious overlap (15370134,16085537,I) (ahg:-340 bhg:-143) to (15370134,16085537,O) (ahg:143 bhg:340) len: 137 * FOEXS: SUSPICIOUS Overlap found! Looked for (16085537,15370134,I)[20,1044] found (15370134,16085537,O) 137 WARNING: InsertChunkOverlap()-- Chunk overlap already exists. NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 WARNING: CreateChunkOverlapFromEdge()-- Chunk overlap already exists. Keeping old overlap. NEW 15370134,16085537,O - min/max 131/142 0/0 erate 0.100000 flags 10000 overlap 137 hang 0,0 qual 0.000000 offset 0,0 OLD 15370134,16085537,O - min/max 20/1044 20/1044 erate 0.100000 flags 10001 overlap 137 hang 143,340 qual 0.000000 offset 0,0 * Switched right-left, orientation went from I to O * CreateAContigInScaffold() failed. ContigContainment failed. cgw: LeastSquaresGaps_CGW.C:1410: RecomputeOffsetsStatus RecomputeOffsetsInScaffold(ScaffoldGraphT*, CDS_CID_t, int, int, int): Assertion `0' failed. This is about 30x coverage with Illumina reads; 10x combined reads (180 bp insert, 100 nt reads and combined with FLASH), 10x PE 100 nt reads 300 insert and 5k mate pair 100 nt reads, all error corrected with Quake. It's probably not an optimal combination, but I'm testing a bit and interested in the result. Using bogart and dnc, the 5k library compared against the other two. Other options are default. Is that assertion something that can be fixed? Thank you. Ole |
From: Powers, J. <jp...@ex...> - 2012-05-20 13:22:13
|
Any thoughts about the best settings in fastqToCA for Ion Torrent data? My guess is that it is closest to Illumina, but I thought I would see what you guys think. Thanks Jason |
From: Sajeet H. <sa...@gm...> - 2012-05-18 18:26:37
|
Hello Brian, Since all the CTP mea values are more than -20, can I assume that no overlaps were skipped where contigs overlapped by more than 20 bases (i.e., that all overlaps > 20 bases were successfully merged into scaffolds with no N's)? How does the scaffolder behave when the overlap between contigs is 5 bp or less? For a small fungal genome with relatively few repeats and no allelic variation (it is haploid), are there any parameters that can reduce the number of false gaps? Will increasing cgwErrorRate help? Thank you, Sajeet -----Original Message----- From: Walenz, Brian [mailto:bw...@jc...] Sent: May-16-12 7:35 PM To: Sajeet Haridas Subject: RE: small gaps of fixed length IIRC, that's the marker for "contigs should overlap, but no overlap found". Possibilities here: the overlap is shorter than we can detect, or there is crud on the end of one contig, or the error rate is too high. Are you on the mailing list? This would have been a nice discussion there. ________________________________________ From: Sajeet Haridas Sent: Wednesday, May 16, 2012 8:22 PM To: Walenz, Brian Subject: RE: small gaps of fixed length Thank you Brian. I also notice that the minimum CTP mea is -20. Is this value also capped? Sajeet From: Walenz, Brian Sent: May-16-12 1:12 PM To: Sajeet Haridas Subject: Re: small gaps of fixed length Yes - that's the lower limit on a gap between contigs. Either mate pairs indicate the contigs should overlap but no overlap could be found, or there really is a small positive gap. Only the asm file will distinguish the two. Look under 'mea' here: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=ASM_Files#SCF_CTP b On 5/16/12 3:29 AM, "Sajeet Haridas" wrote: Hello Brian, My fungal genome assemblies (30-35 Mbp) always seem to have ~2500 small gaps, always represented by 20 N's - using bog, bogart, and various other parameters. Is the assembler trying to tell me something? Thank you, Sajeet
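For reference, the knob Sajeet asks about is a runCA spec-file option; the deduplicate logs earlier in this archive show the scaffolder's default of 0.10 (AS_CGW_ERROR_RATE). A hypothetical spec entry raising it would look like the line below -- the value 0.12 is an arbitrary illustration, not a recommendation from this thread:

  # in the runCA spec file
  cgwErrorRate = 0.12   # scaffolder alignment error tolerance; CA7 default is 0.10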