Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume???

Hi Brian,

Thanks for your reply!

I would be happy to try the new parallel overlap store build, but I
think I need the *.ovb.gz outputs for that, and unfortunately I don't have
them any more. It looks like they were deleted after the ovlStore was
built. So I guess I'll need to run the overlapper again first. Am I
understanding that correctly?
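Before rerunning the overlapper, it may be worth confirming that no *.ovb.gz files are left anywhere in the assembly directory. The following is a minimal sketch, assuming only that the overlapper outputs end in `.ovb.gz`; the function name `find_ovb_outputs` is made up for illustration and is not part of the assembler.

```python
from pathlib import Path


def find_ovb_outputs(work_dir):
    """Return every *.ovb.gz file found at any depth below work_dir, sorted by path.

    The search is recursive, so it does not depend on the exact
    subdirectory layout that runCA used for the overlapper jobs.
    """
    return sorted(Path(work_dir).rglob("*.ovb.gz"))
```

Calling `find_ovb_outputs("/projects/nn9201k/Celera/work2/salaris1")` returns an empty list if the overlapper outputs really are gone.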

I have downloaded the CVS version and tried to build it with make, but I get:
*** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop.

I really appreciate your help!

cheers,
Christoph

On 07/10/2012 05:09 PM, Walenz, Brian wrote:
> Hi, Christoph-
>
> The original overlap store build is difficult to resume.  I think it can be
> done, but it will take code changes that are probably specific to the case
> you have.  Only if you do not have the *ovb.gz outputs from overlapper will
> I suggest this.
>
> Option 1 is then to restart.
>
> Option 2 is to use a new 'data-parallel' overlap store build
> (AS_RUN/runCA-overlapStoreBuild.pl).  It runs as a series of three grid
> jobs.  The first job is parallel, and transfers the overlapper output into
> buckets for sorting.  The second job, also parallel, sorts each bucket.  The
> final job, sequential, builds an index for the store.  Since this compute is
> just a collection of jobs, it can be restarted/resumed/fixed easily.
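The three stages described above (bucket, sort, index) can be illustrated with a small in-memory sketch. This is not the code in AS_RUN/runCA-overlapStoreBuild.pl: the overlap tuple layout `(a_id, b_id, erate)`, the bucketing by first read ID, and all function names are assumptions made for illustration, and the real build works on files on disk as grid jobs.

```python
from collections import defaultdict


def bucketize(overlap_files, reads_per_bucket):
    """Stage 1: route each overlap (a_id, b_id, erate) into a bucket chosen by a_id.

    Each input file could be processed by its own job; here they are
    handled in one loop for simplicity.
    """
    buckets = defaultdict(list)
    for overlaps in overlap_files:
        for ovl in overlaps:
            buckets[ovl[0] // reads_per_bucket].append(ovl)
    return buckets


def sort_bucket(overlaps):
    """Stage 2: sort one bucket. Buckets are independent, so this step is parallel."""
    return sorted(overlaps)


def build_index(sorted_buckets):
    """Stage 3: one sequential pass that records, per read,
    (bucket number, offset of first overlap, number of overlaps)."""
    index = {}
    for bucket_id in sorted(sorted_buckets):
        for offset, (a_id, _b_id, _erate) in enumerate(sorted_buckets[bucket_id]):
            if a_id not in index:
                index[a_id] = [bucket_id, offset, 0]
            index[a_id][2] += 1
    return {a_id: tuple(entry) for a_id, entry in index.items()}
```

Because each stage only reads the previous stage's output, a failed bucket can be redone on its own, which is the property that makes this kind of build easy to resume.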
>
> Its performance can be great -- at JCVI we've seen builds that we estimated
> would take 2 days using the original sequential build, finish in a few (4?)
> hours with the data parallel version.  But on our development cluster, it is
> slower than the sequential version.  It depends on the disk throughput.  Our
> dev cluster is powered off of a 6-disk ZFS, while the production side has a
> big Isilon.
>
> It is only in CVS.  I just added command line help and a bit of
> documentation, so do an update first.
>
> Happy to provide help if you want to try it out.  More than happy to accept
> better documentation.
>
> b
>
>
> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote:
>
>> Hi Ole,
>>
>> Thanks for your reply. I looked at the preprocessing page you are
>> referring to just recently. It sounds like a good approach you are using!
>> I will definitely consider it to make the assembly more efficient on the
>> next try. Thanks for that!
>> For now, I think I am pretty much through all the trimming and correction
>> steps (once I get this last thing sorted out). As far as I can see, the
>> next step is already building the unitigs, so I'll try to finish this
>> assembly as it is now and improve it afterwards. I am really
>> curious how a first attempt at a hybrid approach (454 + Illumina) will
>> perform in comparison to the pure Illumina assemblies, which I have
>> pretty much optimized now (and with which I am pretty happy, by the way).
>>
>> I am afraid your suggestion to set doFragmentCorrection=0 right now
>> will not work. For the next step (the unitigger) I'll need an intact
>> overlap store. As it is now, I think it is useless, being only
>> half-updated. I also discovered that just rerunning the previous
>> overlapStore command (the one before the frg- and ovlcorrection) does not
>> work as I thought it would.
>> This seems to be a very unfortunate situation, and I really don't know how to
>> proceed. It would be fantastic if anyone could give me a tip on what to do!
>>
>> Thanks for your help!
>>
>> much obliged,
>> Christoph
>>
>>
>>
>>
>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote:
>>> Hi Christoph.
>>>
>>> This is not an answer to your question, but a suggestion for a
>>> work-around. If I remember correctly, you have both Illumina and 454
>>> reads. Celera runs, as you see below, frgcorrection and overlap-based
>>> trimming to correct 454 reads, and merTrim to correct Illumina reads
>>> (it can also be used on 454 reads). What I've been doing lately is to
>>> run meryl on a trusted set of Illumina reads, paired-end for example; I
>>> ran it on some overlapping reads which I had merged with FLASH. Then
>>> you can use the set of trusted k-mers to correct different datasets.
>>> For example, I first ran CA to the end of OBT (overlap-based trimming)
>>> for my 454 reads, and then output the result as FASTQ files. I used
>>> the trusted k-mer set to correct these 454 reads too. If you do this
>>> for all your reads, using either merTrim or merTrim/OBT, and
>>> deduplicate all the datasets too, then you'll end up with reads
>>> that you can use in assemblies where you skip relatively expensive
>>> steps such as frgcorrection.
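The trusted k-mer idea can be shown with a toy sketch: count k-mers in a trusted read set, keep those seen often enough, and flag positions in another read whose k-mers are not in that set. This is not how meryl or merTrim are implemented; the function names and the `min_count` threshold are assumptions for illustration, and reverse complements are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def trusted_kmers(reads, k, min_count):
    """Build the trusted set: k-mers occurring at least min_count times in reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    return {km for km, count in counts.items() if count >= min_count}


def untrusted_positions(read, trusted, k):
    """Return start positions of k-mers in read that are absent from the trusted set."""
    return [i for i, km in enumerate(kmers(read, k)) if km not in trusted]
```

A run of untrusted positions in a read points at a likely sequencing error, which is where a corrector would try a substitution that makes the k-mers trusted again.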
>>>
>>> I don't think frgcorrection is that useful for the type of data you're
>>> using anyway.
>>>
>>> If you have a set of corrected reads, you can use these settings for CA:
>>> doOBT=0
>>> doFragmentCorrection=0
>>>
>>> When I think of it, you might use doFragmentCorrection=0 on this
>>> assembly now. You might have to clean up your directory tree, like
>>> removing the 3-overlapcorrection directory and maybe some other steps
>>> too. Apply with caution.
>>>
>>> Most of the stuff I've mentioned I've taken from here:
>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
>>> and discussion with Brian.
>>>
>>> Ole
>>>
>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...>  wrote:
>>>> Dear users and developers,
>>>>
>>>> I have the following problem: in my assembly process I have just completed
>>>> the fragment and overlap error correction. Unfortunately, runCA stopped during
>>>> the subsequent update of the overlapStore because of an incorrectly set
>>>> time limit.
>>>> If I try to resume the assembly now, I get the following error:
>>>> ----------------------------------------START Mon Jul  9 11:05:53 2012
>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore  -u /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates > /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>> ----------------------------------------END Mon Jul  9 11:05:54 2012 (1 seconds)
>>>> ERROR: Failed with signal HUP (1)
>>>> ================================================================================
>>>>
>>>> runCA failed.
>>>>
>>>> ----------------------------------------
>>>> Stack trace:
>>>>
>>>>    at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>           main::caFailure('failed to apply the overlap corrections', '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>           main::overlapCorrection() called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>
>>>> ----------------------------------------
>>>> Last few lines of the relevant log file
>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>
>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for reading: No such file or directory
>>>>
>>>> ----------------------------------------
>>>> Failure message:
>>>>
>>>> failed to apply the overlap corrections
>>>>
>>>>
>>>>
>>>> So it obviously cannot find the file /salaris.ovlStore/0001~. The reason
>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file had already
>>>> been updated to /salaris.ovlStore/0001 before the run stopped. In fact, it seems
>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is there a
>>>> way to tell runCA to continue from /salaris.ovlStore/0250~ instead of from
>>>> 0001~, which is obviously not there any more?
>>>> Another solution I was thinking of is to manually rerun the previous
>>>> overlapStore command (the one that was run before starting frgcorr
>>>> and ovlcorr):
>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore  -c
>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING  -g
>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore  -i 0 -M 14000 -L
>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list >
>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1
>>>> This would restore the status from before the frgcorr and ovlcorr steps, after
>>>> which I could resume runCA. This should restore the 0001~ file, right? The most
>>>> important thing is that I want to avoid rerunning the frgcorr and ovlcorr
>>>> steps, because these steps were really resource intensive.
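To see exactly where the update stopped, the numbered files in the store directory can be listed and split into those with and without a trailing `~`. The sketch below only reads the directory and changes nothing. It assumes the four-digit file names seen in this thread (`0001`, `0001~`); what the `~` suffix means to overlapStore is not stated here, and the function name is made up for illustration.

```python
import re
from pathlib import Path

# Matches names such as "0249" or "0250~" (four digits, optional trailing tilde).
_NUMBERED = re.compile(r"^(\d{4})(~?)$")


def classify_store_files(store_dir):
    """Return (plain, tilde): sorted file numbers without and with a trailing '~'.

    Any other directory entries (index files and so on) are ignored.
    """
    plain, tilde = [], []
    for entry in Path(store_dir).iterdir():
        match = _NUMBERED.match(entry.name)
        if match:
            (tilde if match.group(2) else plain).append(int(match.group(1)))
    return sorted(plain), sorted(tilde)
```

Running it on `/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore` shows which of the 430 files exist in each form, which is useful to have at hand before deciding whether to rebuild the store.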
>>>>
>>>> I would really appreciate any comments or suggestions to my problem! Thanks
>>>> in advance for your help!
>>>>
>>>> much obliged,
>>>> Christoph
>>>>
>>>> University of Oslo
>>>>
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------------------
>>>> --
>>>> Live Security Virtual Conference
>>>> Exclusive live event will cover all the ways today's security and
>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>> will include endpoint security, mobile security and the latest in malware
>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>> _______________________________________________
>>>> wgs-assembler-users mailing list
>>>> wgs...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
>>>>