Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume???


Hi Brian,

I have now run the runCA-overlapStoreBuild.pl script. It created three
scripts:
1-bucketize.sh
2-sort.sh
3-index.sh

Right now I am running 1-bucketize.sh for every job index from 1 to
2135. I have distributed the jobs across several CPUs and that works nicely.

When this is finished I need to run 2-sort.sh. I specified -jobs 100 in
runCA-overlapStoreBuild.pl, so as far as I understand it should have
created 100 jobs, right? So I then run 2-sort.sh for job IDs 1 to 100?
The job ID in this case is actually the slice number, right? So, e.g.,
2-sort.sh 2 will look through all bucket directories, pull out
slice002.gz, read them into memory and write the overlaps into the store.

When this is done I just need to run 3-index.sh once. No job IDs
required, right?
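
The sequence described above can be sketched as a small shell helper that only prints the commands for each stage. It assumes, as in the "2-sort.sh 2" example, that each script takes the job index as its first argument. The helper names list_jobs and plan, and calling the scripts through sh from the current directory, are illustrative choices and not part of runCA-overlapStoreBuild.pl.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for the three stages.
# Assumption: each generated script takes the job index as its first argument.

N_BUCKETIZE=${N_BUCKETIZE:-2135}   # one bucketize job per overlapper job
N_SORT=${N_SORT:-100}              # the value that was passed as -jobs

list_jobs() {
    # $1 = script name, $2 = number of jobs; prints one command per job index
    i=1
    while [ "$i" -le "$2" ]; do
        echo "sh $1 $i"
        i=$((i + 1))
    done
}

plan() {
    # Stage 1, then stage 2, then the single index step.
    list_jobs 1-bucketize.sh "$N_BUCKETIZE"
    list_jobs 2-sort.sh "$N_SORT"
    echo "sh 3-index.sh"
}
```

Since the sort step reads the buckets written by the first step, and the index step runs once at the end, each stage's list would presumably be handed to the scheduler separately, waiting for one stage to finish before starting the next.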

Am I missing anything?

cheers,
Christoph

On 07/11/2012 05:54 AM, Walenz, Brian wrote:
> The first step will create 1 job for each overlapper job.  These should be
> small memory, but there is some internal buffering done and I usually
> request 2gb for them anyway.
>
> The second step will create '-jobs j' jobs.  Memory size here is a giant
> unknown.  The '-memory m' option will cause the job to not run if it needs
> more than that much memory.  Currently, you'll have to increase -memory for
> these jobs and find a bigger machine.
>
> All jobs in both steps are single-threaded and run independently of each
> other.
>
> b
>
>
>
>
> On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote:
>
>> Hi Brian,
>>
>> Thanks! Overlaps are being computed now and the CVS version of CA has been
>> successfully compiled. I will try runCA-overlapStoreBuild.pl once the
>> overlapper is finished. One question there: I understand that the memory
>> usage is regulated by the -jobs j parameter; a higher value for j means
>> less memory for every job. How can I specify the number of CPUs to be
>> used in the parallel steps?
>>
>> Thanks for your help! I appreciate it!
>>
>> cheers,
>> Christoph
>>
>> On 07/10/2012 10:18 PM, Walenz, Brian wrote:
>>> Quick guess is that runCA is finding the old ovlStore and assuming it is
>>> complete, then continuing on to frgcorr.  runCA tests for the existence of
>>> name.ovlStore to determine if overlaps are finished; it doesn't check that
>>> the store is valid.  So, delete *ovlStore* too.
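
Before deleting anything, it may help to see exactly what matches the *ovlStore* pattern. The following is a minimal sketch that only lists matches and removes nothing. It assumes the stores sit directly in the assembly work directory, as in the paths quoted in this thread; the function name list_ovlstores is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list every entry matching *ovlStore* in a work directory
# so the matches can be reviewed by hand. Nothing is deleted here.
# Assumption: the stores live directly inside the given directory.

list_ovlstores() {
    # $1 = assembly work directory
    for p in "$1"/*ovlStore*; do
        # An unmatched glob stays literal, so check the path really exists.
        if [ -e "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
}
```

Calling list_ovlstores with the work directory prints entries such as the store itself and any related files whose names contain ovlStore, while unrelated entries like the gkpStore are left out.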
>>>
>>> Your latest build (from scratch) is suffering from a long standing
>>> dependency issue.  It needs kmer checked out and 'make install'ed.
>>>
>>> make[1]: *** No rule to make target `sweatShop.H', needed by
>>> `classifyMates.o'.  Stop.
>>> make[1]: *** Waiting for unfinished jobs....
>>> make: *** [objs] Error 1
>>>
>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild.
>>>
>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll
>>> need to grab it from subversion.
>>>
>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile
>>>
>>> b
>>>
>>>
>>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote:
>>>
>>>> Hi,
>>>>
>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper
>>>> and the 3-overlapcorrection directories and just ran runCA, and it
>>>> immediately starts doing frgcorr. Do you mean recompute from the
>>>> very start? Is there a way to avoid recomputing the initial overlaps at
>>>> least (it took some 10000 CPU hours)?
>>>>
>>>> Tried to compile it again - not successful. Ran make in the src
>>>> directory (output in makelog) and also in the AS_RUN directory (output
>>>> AS_RUN-makelog).
>>>>
>>>> Thanks,
>>>> Christoph
>>>>
>>>>
>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote:
>>>>> Odd, the *gz should only be deleted after the store is successfully built.
>>>>> runCA might have been confused by the attempt to rerun.  The easiest will
>>>>> be to recompute.  :-(
>>>>>
>>>>> I've never seen the 'libCA.a' error before.  That particular program is the
>>>>> first to get built.  Looks like libCA.a wasn't created.  My fix for most
>>>>> strange compile errors is to remove the entire Linux-amd64 directory and
>>>>> recompile.  If that fails, send along the complete output of make and I'll
>>>>> take a look.
>>>>>
>>>>> b
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote:
>>>>>
>>>>>> Hi Brian,
>>>>>>
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> I would be happy to try the new parallel overlap store build, but I
>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I don't have
>>>>>> them any more. Looks like they were deleted after the ovlStore was
>>>>>> built. So I guess I'll need to run the overlapper again first. Am I
>>>>>> understanding that correctly?
>>>>>>
>>>>>> I have downloaded the cvs and tried to make, but I get:
>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop.
>>>>>>
>>>>>> I really appreciate your help!
>>>>>>
>>>>>> cheers,
>>>>>> Christoph
>>>>>>
>>>>>>
>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote:
>>>>>>> Hi, Christoph-
>>>>>>>
>>>>>>> The original overlap store build is difficult to resume.  I think it can
>>>>>>> be done, but it will take code changes that are probably specific to the
>>>>>>> case you have.  Only if you do not have the *ovb.gz outputs from
>>>>>>> overlapper will I suggest this.
>>>>>>>
>>>>>>> Option 1 is then to restart.
>>>>>>>
>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build
>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl).  It runs as a series of three grid
>>>>>>> jobs.  The first job is parallel, and transfers the overlapper output
>>>>>>> into buckets for sorting.  The second job, also parallel, sorts each
>>>>>>> bucket.  The final job, sequential, builds an index for the store.
>>>>>>> Since this compute is just a collection of jobs, it can be
>>>>>>> restarted/resumed/fixed easily.
>>>>>>>
>>>>>>> Its performance can be great -- at JCVI we've seen builds that we
>>>>>>> estimated would take 2 days using the original sequential build, finish
>>>>>>> in a few (4?) hours with the data parallel version.  But on our
>>>>>>> development cluster, it is slower than the sequential version.  It
>>>>>>> depends on the disk throughput.  Our dev cluster is powered off of a
>>>>>>> 6-disk ZFS, while the production side has a big Isilon.
>>>>>>>
>>>>>>> It is only in CVS.  I just added command line help and a bit of
>>>>>>> documentation, so do an update first.
>>>>>>>
>>>>>>> Happy to provide help if you want to try it out.  More than happy to
>>>>>>> accept better documentation.
>>>>>>>
>>>>>>> b
>>>>>>>
>>>>>>>
>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote:
>>>>>>>
>>>>>>>> Hi Ole,
>>>>>>>>
>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are
>>>>>>>> referring to just recently. Sounds like a good approach you are using!
>>>>>>>> Will definitely consider that to make the assembly more effective in a
>>>>>>>> next try. Thanks for that!
>>>>>>>> For now, I think I am pretty much done with all the trimming and
>>>>>>>> correction steps (once I get this last thing sorted out). As far as I
>>>>>>>> can see the next step is already building the unitigs, so I'll try to
>>>>>>>> finish this assembly as it is now and improve it afterwards. I am really
>>>>>>>> curious how a first attempt at a hybrid approach (454+Illumina) will
>>>>>>>> perform in comparison to the pure Illumina assemblies, which I think I
>>>>>>>> have pretty much optimized now (and with which I am pretty happy, btw).
>>>>>>>>
>>>>>>>> I am afraid your suggestion to set doFragmentCorrection=0 directly now
>>>>>>>> will not work. For the next step (the unitigger) I'll need an intact
>>>>>>>> overlap store. As it is now, I think it is useless, being only
>>>>>>>> half-updated. I also discovered that just rerunning the previous
>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) does
>>>>>>>> not work as I thought it would.
>>>>>>>> This seems to be a very unfortunate situation, and I really don't know
>>>>>>>> how to proceed. It would be fantastic if anyone could give me a tip on
>>>>>>>> what to do!
>>>>>>>>
>>>>>>>> Thanks for your help!
>>>>>>>>
>>>>>>>> much obliged,
>>>>>>>> Christoph
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote:
>>>>>>>>> Hi Christoph.
>>>>>>>>>
>>>>>>>>> This is not an answer to your question, but a suggestion for a
>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454
>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based
>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads
>>>>>>>>> (can also be used on 454 reads). What I've been doing lately is to
>>>>>>>>> run meryl on a trusted set of Illumina reads, paired-end for example
>>>>>>>>> (I ran it on some overlapping reads which I had merged with FLASH).
>>>>>>>>> Then you can use the set of trusted k-mers to correct different
>>>>>>>>> datasets. For example, I first ran CA to the end of OBT (overlap
>>>>>>>>> based trimming) for my 454 reads, and then output the result as
>>>>>>>>> fastq files. I used the trusted k-mer set to correct these 454 reads
>>>>>>>>> too. If you do this for all your reads, using either merTrim or
>>>>>>>>> merTrim/OBT, and do deduplication on all the datasets too, then
>>>>>>>>> you'll end up with reads that you can use in assemblies where you
>>>>>>>>> skip relatively expensive steps such as frgcorrection.
>>>>>>>>>
>>>>>>>>> I don't think frgcorrection is that useful for the type of data you're
>>>>>>>>> using anyway.
>>>>>>>>>
>>>>>>>>> If you have a set of corrected reads, you can use these settings for
>>>>>>>>> CA:
>>>>>>>>> doOBT=0
>>>>>>>>> doFragmentCorrection=0
>>>>>>>>>
>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this
>>>>>>>>> assembly now. You might have to clean up your directory tree, like
>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps
>>>>>>>>> too. Apply with caution.
>>>>>>>>>
>>>>>>>>> Most of the stuff I've mentioned I've taken from here:
>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
>>>>>>>>> and discussion with Brian.
>>>>>>>>>
>>>>>>>>> Ole
>>>>>>>>>
>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...>  wrote:
>>>>>>>>>> Dear users and developers,
>>>>>>>>>>
>>>>>>>>>> I have the following problem: In my assembly process I have just
>>>>>>>>>> completed the fragment- and overlap error correction. Unfortunately
>>>>>>>>>> runCA stopped in the subsequent updating of the overlapStore, because
>>>>>>>>>> of an incorrectly set time limit.
>>>>>>>>>> If I am trying to resume the assembly now, I get the following error:
>>>>>>>>>> ----------------------------------------START Mon Jul  9 11:05:53 2012
>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates > /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>>>>>>>> ----------------------------------------END Mon Jul  9 11:05:54 2012 (1 seconds)
>>>>>>>>>> ERROR: Failed with signal HUP (1)
>>>>>>>>>> ================================================================================
>>>>>>>>>>
>>>>>>>>>> runCA failed.
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------
>>>>>>>>>> Stack trace:
>>>>>>>>>>
>>>>>>>>>>       at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>>>>>>>              main::caFailure('failed to apply the overlap corrections', '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>>>>>>>              main::overlapCorrection() called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------
>>>>>>>>>> Last few lines of the relevant log file
>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>>>>>>>
>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open
>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for
>>>>>>>>>> reading: No such file or directory
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------
>>>>>>>>>> Failure message:
>>>>>>>>>>
>>>>>>>>>> failed to apply the overlap corrections
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The
>>>>>>>>>> reason is, from what I can see, that the /salaris.ovlStore/0001~ file
>>>>>>>>>> has already been updated to /salaris.ovlStore/0001 before it stopped.
>>>>>>>>>> In fact it seems to have stopped after updating /salaris.ovlStore/0249
>>>>>>>>>> (of 430). Is there a way to tell runCA to continue from
>>>>>>>>>> /salaris.ovlStore/0250~, instead of from 0001~, which is obviously not
>>>>>>>>>> there any more??
>>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore
>>>>>>>>>> command again manually (the one that was done before starting the
>>>>>>>>>> frgcorr and ovlcorr:
>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list > /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1)
>>>>>>>>>> to restore the status from before the frgcorr and ovlcorr steps, before
>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most
>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and
>>>>>>>>>> ovlcorr steps, because these steps were really resource intensive.
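
To see how far an interrupted update got, one could list which numbered store files still carry the '~' suffix. This is only an inspection sketch: it assumes the naming described above (four-digit file names, with '~' marking files that have not yet been updated), which is inferred from the error message and not from the overlapStore documentation. The function name pending_files is hypothetical, and the sketch does not resume or repair anything.

```shell
#!/bin/sh
# Sketch only: print the '~'-suffixed numbered files in a store directory.
# Assumption: files are named NNNN, and NNNN~ marks a not-yet-updated file.

pending_files() {
    # $1 = store directory (e.g. the *.ovlStore directory)
    for p in "$1"/[0-9][0-9][0-9][0-9]~; do
        # An unmatched glob stays literal, so check the path really exists.
        if [ -e "$p" ]; then
            basename "$p"
        fi
    done
}
```

Because the shell expands the glob in sorted order, piping the output through head -n 1 would give the first file that still carries the suffix, and wc -l the number of such files.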
>>>>>>>>>>
>>>>>>>>>> I would really appreciate any comments or suggestions to my problem!
>>>>>>>>>> Thanks in advance for your help!
>>>>>>>>>>
>>>>>>>>>> much obliged,
>>>>>>>>>> Christoph
>>>>>>>>>>
>>>>>>>>>> University of Oslo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> Live Security Virtual Conference
>>>>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>>>>> threat landscape has changed and how IT managers can respond.
>>>>>>>>>> Discussions
>>>>>>>>>> will include endpoint security, mobile security and the latest in
>>>>>>>>>> malware
>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>>>>> _______________________________________________
>>>>>>>>>> wgs-assembler-users mailing list
>>>>>>>>>> wgs...@li...
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
>>>>>>>>>>