From: Walenz, B. <bw...@jc...> - 2012-07-30 16:44:29
Hi, Heiner-

Working backwards through your email:

We've also noticed the 'large scaffold gets lots of little contigs added' problem. This seems to be dominating our run time. I'm working on this problem at the moment. Our previous solution was basically what you did: let it run until we get impatient, then kill it and restart from the next checkpoint label. The CVS tip has a slight improvement in cgw, committed around the 20th. I hope to have much more within the next week.

You can ignore the mates in the library, but not the reads. To ignore the mates, simply delete the mate link from gkpStore. At the very bottom of the 'gatekeeper' page on the wiki is 'allfragsunmated', which will remove the mate link from all reads in a single library. This is a destructive operation! Save a backup of gkpStore/fnm and gkpStore/fpk if you want to revert. (These two files store metadata for long and short fragments, respectively.)

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Gatekeeper

FYI- the 5-consensus-insert-size directory has a plot of the insert-size histogram for each library. These are based on unitigs, and so the 20k library might not be represented well. tigStore (the command) can also analyze mate pairs for contigs/unitigs in the store with -d matepair.

b

On 7/26/12 5:39 PM, "kuhl" <ku...@mo...> wrote:

> Hi Brian et al.,
>
> I am currently running a huge assembly with CA7 (2.5 Gb, 30x Illumina + 454;
> cgw takes 150-300 GB RAM). It is now in step 7-2 and I have just stopped cgw
> at MergeScaffoldsAggressive iteration 1641 and restarted it at ckp08-2SM. I
> did this also in 7-0 at iteration 2xxx. Now I am not sure if I should
> maybe rerun scaffolding without 20 kb mate pairs, which I think are
> responsible for this mess. So I have two questions:
>
> How can I convince cgw to ignore a certain library without doing steps 0-5
> again?
>
> Is there a rule of thumb for when MergeScaffoldsAggressive should be stopped?
>
> In my case it looks like cgw is only very slightly progressing with each
> iteration, and there is one large scaffold that is growing more and more...
>
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60498
> ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8774 out of 60498
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10594 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20348 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8773 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 9 at idx 16854 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10593 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20428 out of 60486
>
> Regards, Heiner
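The backup Brian recommends before the destructive 'allfragsunmated' edit boils down to copying two files aside. A sketch, exercised on a scratch directory rather than a real store (the store path and file contents here are stand-ins; the actual mate-link removal is done with the gatekeeper command described on the wiki page):

```shell
set -e
# Stand-in for a real <prefix>.gkpStore directory.
store=$(mktemp -d)/asm.gkpStore
mkdir -p "$store"
printf 'long-frag-metadata'  > "$store/fnm"
printf 'short-frag-metadata' > "$store/fpk"

# Back up the two metadata files before removing mate links:
# fnm holds metadata for long fragments, fpk for short ones.
cp "$store/fnm" "$store/fnm.bak"
cp "$store/fpk" "$store/fpk.bak"

# To revert after an unwanted edit:
cp "$store/fnm.bak" "$store/fnm"
cp "$store/fpk.bak" "$store/fpk"
```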
From: Walenz, B. <bw...@jc...> - 2012-07-28 04:57:28
I saw the syntax error. I don't think it's from runCA. All runCA is doing at that point is running the reported qsub command through perl's system(). I don't see any misplaced quotes in there. Actually, I was a little confused by the first message as to what the error was; everything looked fine... except for that syntax error.

Do you have the 'sync' capability? The non-grid run mode in runCA is to run N concurrent processes. I think you should be able to modify this so that, instead of running the job on the local machine, it submits to the grid with -sync. When a job finishes, the qsub command returns, and runCA runs the next job.

This might work for LSF too, if anyone is listening that cares.

b

________________________________________
From: Powers, Jason [jp...@ex...]
Sent: Friday, July 27, 2012 9:12 PM
To: Walenz, Brian; wgs...@li...
Subject: RE: [wgs-assembler-users] runCA and qsub...no syncing?

[Quoted message clipped; it appears in full as the next message below.]
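Brian's suggested modification — keep runCA's concurrent-process loop, but make each "process" a blocking grid submission — can be sketched as below. The placeholder `sh -c "sleep 0"` jobs stand in for the real submissions so the sketch runs anywhere; on SGE the `submit` function would instead call `qsub -sync y "$1"`, which returns only when the job finishes:

```shell
# Launch a batch of jobs concurrently, then wait for the whole batch
# before moving on -- a simplified version of runCA's non-grid mode.
submit() {
    # Real grid version would be:  qsub -sync y "$1"
    sh -c "$1"
}

pids=""
for job in "sleep 0" "sleep 0" "sleep 0" "sleep 0"; do
    submit "$job" &
    pids="$pids $!"
done

# Collect exit statuses; any failure marks the batch as failed.
fail=0
for p in $pids; do
    wait "$p" || fail=1
done
```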
From: Powers, J. <jp...@ex...> - 2012-07-28 01:12:16
Hi Brian,

We use an internally developed scheduler called grun. At one point it was available on this website, http://code.google.com/p/ea-utils/, but we haven't been maintaining that site very well and it appears as though it was pulled recently. Anyway, I thought we had hold_jid working, but perhaps not. I will have to check it out on Monday.

Alternatively, do you have any other theories regarding why it's not working? As you can see, that error is actually "Syntax error" - my assumption was that the underlying cause was that jobs weren't waiting for each other appropriately. Do you think that's the root of it all?

Thanks,
Jason

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, July 27, 2012 4:42 PM
To: Powers, Jason; wgs...@li...
Subject: Re: [wgs-assembler-users] runCA and qsub...no syncing?

Hi-

runCA isn't using -sync. It uses -hold_jid to tell SGE to hold a job in queue until all other jobs with that name have completed.

One 'feature' of runCA that might help you out here is "useGrid=1 scriptOnGrid=0". This will set all the jobs up for SGE, but not actually submit anything. It'll tell you to submit an array of jobs, then, when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so everything is compatible with our distributed computing environment. We've got pacBioToCA working now with the system, and just today I've started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail.
Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)

numFrags = 329675

----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.

----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again no syncing, and so it fails trying to move forward.

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.

----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)

----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)

----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH      1- 210916  REFR 1- 329675  STRINGS 210916  BASES 20000041
HASH 210917- 289851  REFR 1- 329675  STRINGS  78935  BASES 20000051
HASH 289852- 303091  REFR 1- 329675  STRINGS  13240  BASES 20002127
HASH 303092- 317562  REFR 1- 329675  STRINGS  14471  BASES 20000574
HASH 317563- 329675  REFR 1- 329675  STRINGS  12113  BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)

Created 5 overlap jobs. Last batch '001', last job '000005'.

----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)

----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts?
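One plausible source of the "Unterminated quoted string" (an assumption about the qsub replacement, not a diagnosis of grun itself): if a wrapper re-joins already-parsed arguments into one flat string for `sh -c` without re-quoting, an embedded quote reaches sh unbalanced. A minimal reproduction:

```shell
# Hypothetical wrapper logic: the command line is handed to 'sh -c'
# as a single string, but one double quote is left unbalanced.
naive='echo -N "rCA_Distributed_test'   # note the lone double quote
sh -c "$naive" 2>/dev/null
naive_status=$?                         # nonzero: Unterminated quoted string

# With balanced quoting the extra shell layer is harmless.
safe='echo -N "rCA_Distributed_test"'
sh -c "$safe" >/dev/null
safe_status=$?
```

In Perl or Python wrappers the fix is to pass the argument list directly (system() with a list), or to re-quote each argument (e.g. shell-escape it) before flattening.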
From: Walenz, B. <bw...@jc...> - 2012-07-27 20:42:35
Hi-

runCA isn't using -sync. It uses -hold_jid to tell SGE to hold a job in queue until all other jobs with that name have completed.

One 'feature' of runCA that might help you out here is "useGrid=1 scriptOnGrid=0". This will set all the jobs up for SGE, but not actually submit anything. It'll tell you to submit an array of jobs, then, when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so everything is compatible with our distributed computing environment. We've got pacBioToCA working now with the system, and just today I've started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail.

[Quoted qsub log clipped; it appears in full in the message above.]
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds) ----------------------------------------START Fri Jul 27 15:55:07 2012 qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh sh: 2: Syntax error: Unterminated quoted string Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted ----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds) Any thoughts? |
From: Powers, J. <jp...@ex...> - 2012-07-27 20:33:34
|
Hi all,

We do not have SGE proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so that everything works with our distributed computing environment. We've got pacBioToCA working with the system, and just today I started trying to get runCA working. Oddly, it seems that runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail. Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)
numFrags = 329675
----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)
Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again there is no syncing, and so it fails trying to move forward.

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)
----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH      1- 210916  REFR 1- 329675  STRINGS 210916  BASES 20000041
HASH 210917- 289851  REFR 1- 329675  STRINGS  78935  BASES 20000051
HASH 289852- 303091  REFR 1- 329675  STRINGS  13240  BASES 20002127
HASH 303092- 317562  REFR 1- 329675  STRINGS  14471  BASES 20000574
HASH 317563- 329675  REFR 1- 329675  STRINGS  12113  BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)
Created 5 overlap jobs. Last batch '001', last job '000005'.
----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts? |
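Until the qsub clone implements `-sync y`, one workaround is to emulate the blocking behaviour: submit a job, then poll until it leaves the queue before letting the next stage run. This is only a sketch under assumptions — the function name is hypothetical, the 60-second poll interval is arbitrary, it assumes `qstat -j <jobid>` exits non-zero once the job is gone, and `QSTAT` is injectable purely so the logic can be exercised without a live grid. It also does not address the separate "Unterminated quoted string" error in the log, which suggests the clone mishandles the quoted arguments runCA passes.

```shell
# Rough stand-in for `qsub -sync y`: block until an already-submitted job
# disappears from the queue. QSTAT defaults to qstat but can be overridden.
wait_for_job() {
  jobid=$1
  while "${QSTAT:-qstat}" -j "$jobid" >/dev/null 2>&1; do
    sleep 60   # poll once a minute; tune to taste
  done
}
```

Calling `wait_for_job 3756459` between runCA's submit and the held `rCA_*` restart job would give the synchronization that `-sync y` normally provides.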
From: kuhl <ku...@mo...> - 2012-07-26 21:39:23
|
Hi Brian et al.,

I am currently running a huge assembly with CA7 (2.5 Gb, 30x Illumina + 454; cgw takes 150-300 GB RAM). It is now in step 7-2, and I have just stopped cgw at MergeScaffoldsAggressive iteration 1641 and restarted it at ckp08-2SM. I did this in 7-0 as well, at iteration 2xxx. Now I am not sure whether I should rerun scaffolding without the 20 kb mate pairs, which I think are responsible for this mess. So I have two questions:

How can I convince cgw to ignore a certain library without doing steps 0-5 again?

Is there a rule of thumb for when MergeScaffoldsAggressive should be stopped? In my case it looks like cgw is making only very slight progress with each iteration, and there is one large scaffold that is growing more and more...

ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60498
ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8774 out of 60498
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10594 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20348 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8773 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 9 at idx 16854 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10593 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20428 out of 60486

Regards,
Heiner |
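Brian's reply in this thread points to gatekeeper's 'allfragsunmated' operation for removing the mate links from a single library, and warns that it is destructive: back up gkpStore/fnm and gkpStore/fpk (the long- and short-fragment metadata) first. A sketch of that backup step — the store path and function name are hypothetical, and the exact 'allfragsunmated' invocation is documented on the gatekeeper wiki page, not shown here:

```shell
# Save the gkpStore metadata files so the mate links can be restored later
# if stripping them turns out to be the wrong call.
backup_gkp_meta() {
  store=$1
  cp "$store/fnm" "$store/fnm.bak" && cp "$store/fpk" "$store/fpk.bak"
}
# After backing up, run gatekeeper's 'allfragsunmated' on the offending
# library (e.g. the 20 kb one); see the wiki page for the exact flags.
```

Reverting is then just copying the two .bak files back into place.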
From: Thomas H. <tho...@un...> - 2012-07-23 19:09:18
|
Hi,

thanks for the quick reply. I tried a lot, including the fixes from the "Known Issues" section. I also did not believe that the pipeline would create overlaps between PacBio and PacBio. What got me wondering was that I ran into the same error after I removed all but one PacBio read, e.g. from the PacBio E. coli sample data set, and tried to correct this one read with the corresponding Illumina data set. I also tried all kinds of shortRead settings and smaller kmer sizes, and basically the pacbio.spec from the sourceforge web site.

For the sample data I used 100 bp raw Illumina reads (phred+64) and 2-10 kb fragments from contigs of an assembly of these reads, with and without artificially introduced indels. In addition I used a mapper (Shrimp2) to manually map Illumina reads onto my sample PacBio reads, which always produced valid mappings.

I will try again tomorrow - a bit more organized - and log the different settings and scenarios to pinpoint the problem.

'til then, thanks
Thomas

On 23.07.2012 17:31, Sergey Koren wrote:
> Hi Thomas,
>
> As Brian mentioned, the pipeline will only overlap PacBio to Illumina (or short read data). So the no overlaps error means that no short-reads could be mapped to the PacBio reads and thus no correction can be done. This error is normally caused by either the fastq format output by the 1.3.0 SMRTportal software (which is since fixed in 1.3.1) or by short Illumina data (<64bp). The pacBioToCA wiki page lists suggested solutions to both issues under the Known Issues section:
> https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA#Known_Issues
>
> Please let me know if neither one of those suggestions fixes your issue.
>
> Thanks,
> Sergey
> Bioinformatics Scientist
> NBACC
>
> On Jul 23, 2012, at 11:23 AM, Walenz, Brian wrote:
>
>> Hi, Thomas-
>>
>> The pipeline is only looking for overlaps between Illumina and PacBio, not
>> Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a
>> consensus sequence representing the PacBio read, and that is the corrected
>> read.
>>
>> Can you describe your data a bit? How short are the Illumina? Any changes
>> to parameters for the pipeline?
>>
>> bri
>> --
>> Brian Walenz
>> Senior Software Engineer
>> J. Craig Venter Institute
>>
>> On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:
>>
>>> Hello,
>>>
>>> I am currently testing the pacBioToCA pipeline on some sample data in
>>> preparation for some upcoming experiments. The pipeline always stopped
>>> with the error: "No Overlaps found". I already figured out that
>>> obviously these overlaps are required within the pacBio read library, or
>>> at least that this solves the issue.
>>>
>>> Since I do not necessarily expect overlaps in our data but still want to
>>> correct them with Illumina short reads for further analysis, I would
>>> like to know, if this is even possible with your pipeline. And also I do
>>> not really understand, why there is an overlap computation step before
>>> the error correction step?
>>>
>>> Regards
>>> Thomas
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users

--
Thomas Hackl
Julius-Maximilians-Universität
Department of Bioinformatics
97074 Würzburg, Germany
Fon: +49 931 - 31 86883
Mail: tho...@un... |
From: Sergey K. <se...@um...> - 2012-07-23 15:31:26
|
Hi Thomas,

As Brian mentioned, the pipeline will only overlap PacBio to Illumina (or short read data). So the "no overlaps" error means that no short reads could be mapped to the PacBio reads, and thus no correction can be done. This error is normally caused either by the fastq format output by the 1.3.0 SMRTportal software (since fixed in 1.3.1) or by short Illumina data (<64bp). The pacBioToCA wiki page lists suggested solutions to both issues under the Known Issues section:
https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA#Known_Issues

Please let me know if neither one of those suggestions fixes your issue.

Thanks,
Sergey
Bioinformatics Scientist
NBACC

On Jul 23, 2012, at 11:23 AM, Walenz, Brian wrote:

> Hi, Thomas-
>
> The pipeline is only looking for overlaps between Illumina and PacBio, not
> Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a
> consensus sequence representing the PacBio read, and that is the corrected
> read.
>
> Can you describe your data a bit? How short are the Illumina? Any changes
> to parameters for the pipeline?
>
> bri
> --
> Brian Walenz
> Senior Software Engineer
> J. Craig Venter Institute
>
> On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:
>
>> Hello,
>>
>> I am currently testing the pacBioToCA pipeline on some sample data in
>> preparation for some upcoming experiments. The pipeline always stopped
>> with the error: "No Overlaps found". I already figured out that
>> obviously these overlaps are required within the pacBio read library, or
>> at least that this solves the issue.
>>
>> Since I do not necessarily expect overlaps in our data but still want to
>> correct them with Illumina short reads for further analysis, I would
>> like to know, if this is even possible with your pipeline. And also I do
>> not really understand, why there is an overlap computation step before
>> the error correction step?
>>
>> Regards
>> Thomas |
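The two failure modes Sergey lists can be pre-checked cheaply before rerunning the pipeline. A crude sketch that reports the shortest read in an Illumina fastq (the pipeline wants reads of at least 64 bp) and guesses the quality encoding; the function name is made up, and the encoding guess is only a heuristic (any quality character below ';' can only be phred+33, but high-quality phred+33 data with no low scores can still be misreported as +64):

```shell
# Report minimum read length and a rough guess at the fastq quality offset.
# Reads the fastq on stdin; assumes plain 4-line records, no wrapping.
fq_check() {
  awk 'NR%4==2 { if (min == "" || length($0) < min) min = length($0) }
       NR%4==0 { for (i = 1; i <= length($0); i++)
                   if (substr($0, i, 1) < ";") p33 = 1 }
       END     { printf "min read length: %d\n", min
                 printf "quality looks like: phred+%s\n", (p33 ? "33" : "64?") }'
}
```

Usage: `fq_check < illumina.fastq`. A minimum length under 64 or a phred+64 verdict points at one of the two known issues above.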
From: Walenz, B. <bw...@jc...> - 2012-07-23 15:23:51
|
Hi, Thomas-

The pipeline is only looking for overlaps between Illumina and PacBio, not Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a consensus sequence representing the PacBio read, and that is the corrected read.

Can you describe your data a bit? How short are the Illumina? Any changes to parameters for the pipeline?

bri
--
Brian Walenz
Senior Software Engineer
J. Craig Venter Institute

On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:

> Hello,
>
> I am currently testing the pacBioToCA pipeline on some sample data in
> preparation for some upcoming experiments. The pipeline always stopped
> with the error: "No Overlaps found". I already figured out that
> obviously these overlaps are required within the pacBio read library, or
> at least that this solves the issue.
>
> Since I do not necessarily expect overlaps in our data but still want to
> correct them with Illumina short reads for further analysis, I would
> like to know, if this is even possible with your pipeline. And also I do
> not really understand, why there is an overlap computation step before
> the error correction step?
>
> Regards
> Thomas |
From: Thomas H. <tho...@un...> - 2012-07-23 14:43:39
|
Hello,

I am currently testing the pacBioToCA pipeline on some sample data in preparation for some upcoming experiments. The pipeline always stopped with the error: "No Overlaps found". I already figured out that obviously these overlaps are required within the pacBio read library, or at least that this solves the issue.

Since I do not necessarily expect overlaps in our data but still want to correct them with Illumina short reads for further analysis, I would like to know if this is even possible with your pipeline. I also do not really understand why there is an overlap computation step before the error correction step.

Regards
Thomas |
From: Ole K. T. <o.k...@bi...> - 2012-07-19 12:29:12
|
Hi Heiner,

I have another 454 8 kb library and an Illumina 5 kb library (plus a 3 kb 454 library). I'm really interested in hearing your workaround.

Ole

On 19 July 2012 13:31, kuhl <ku...@mo...> wrote:

> Hi Ole,
>
> what other kind of large insert libraries do you have? I might have a
> suitable workaround for such problems.
>
> Best wishes,
>
> Heiner
>
> On Wed, 18 Jul 2012 11:02:56 +0200, Ole Kristian Tørresen
> <o.k...@bi...> wrote:
>> Hi,
>> I have a bimodal mate pair library with one peak around 3kbp and the
>> largest and real peak at 19kbp. When I include this in the assembly,
>> the scaffolding takes ages and the result is not that good (larger than
>> biologically probable scaffolds). If I don't include it, scaffolding is
>> quite quick, but the assembly is a bit more fragmented than with it
>> included. Without it I get around 20000 scaffolds, and I get 7000
>> scaffolds with it included (around 7000 with Newbler too, which
>> handles this kind of library better than CA, I think). So I would
>> rather like to include the library.
>>
>> classifyMates does not seem to be able to find innie oriented mates,
>> only outtie PE mates as far as I can see. Could I get around this in a
>> way? Pretend that the library is outtie, and search for outtie mates
>> with separation up to 5kbp or something?
>>
>> Thank you.
>>
>> Ole |
From: kuhl <ku...@mo...> - 2012-07-19 11:51:19
|
Hi Ole,

what other kind of large insert libraries do you have? I might have a suitable workaround for such problems.

Best wishes,

Heiner

On Wed, 18 Jul 2012 11:02:56 +0200, Ole Kristian Tørresen <o.k...@bi...> wrote:

> Hi,
> I have a bimodal mate pair library with one peak around 3kbp and the
> largest and real peak at 19kbp. When I include this in the assembly,
> the scaffolding takes ages and the result is not that good (larger than
> biologically probable scaffolds). If I don't include it, scaffolding is
> quite quick, but the assembly is a bit more fragmented than with it
> included. Without it I get around 20000 scaffolds, and I get 7000
> scaffolds with it included (around 7000 with Newbler too, which
> handles this kind of library better than CA, I think). So I would
> rather like to include the library.
>
> classifyMates does not seem to be able to find innie oriented mates,
> only outtie PE mates as far as I can see. Could I get around this in a
> way? Pretend that the library is outtie, and search for outtie mates
> with separation up to 5kbp or something?
>
> Thank you.
>
> Ole |
From: Walenz, B. <bw...@jc...> - 2012-07-19 11:08:05
|
Hi, Ole-

We tried this on a plant genome for the exact same issue, with little success. The 3kb mates spanned repeats that classifyMates was unable to search through, resulting in enormous compute times.

If you want to try it, the latest in CVS, I think, allows innie mates again. Be sure to use -nosuspicious. If not, flipping the reads works too. Use the 'random path' search, -rfs. Instead of exhaustively searching, this will follow N random paths through the overlap graph.

b

On 7/18/12 5:02 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
> Hi,
> I have a bimodal mate pair library with one peak around 3kbp and the
> largest and real peak at 19kbp. When I include this in the assembly,
> the scaffolding take ages and the result is not that good (larger than
> biological probable scaffolds). If I don't include it, scaffold is
> quite quick, but the assembly is a bit more fragmented that with it
> included. Without I get around 20000 scaffolds, and I get 7000
> scaffolds with it included (around 7000 with Newbler too, which
> handles this kinds of library better than CA I think). So I would
> rather like to include the library.
>
> classifyMates does not seem to be able to find innie oriented mates,
> only outtie PE mates as far as I can see. Could I get around this in a
> way? Pretend that the library is outtie, and search for outtie mates
> with seperation up to 5kbp or something?
>
> Thank you.
>
> Ole
|
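[Editor's note] Brian's 'random path' search (-rfs) trades completeness for bounded run time: instead of exhaustively exploring the overlap graph between a read and its mate, it follows N random walks and gives up if none connects them. A minimal sketch of the idea, on a toy graph — the node names, graph layout, and function are invented for illustration and are not classifyMates' actual data structures:

```python
import random

# Toy overlap graph: read ID -> IDs of overlapping reads (illustrative only).
OVERLAPS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def random_path_search(graph, start, target, n_paths=200, max_steps=20, seed=42):
    """Follow n_paths random walks of at most max_steps edges from start;
    report whether any walk reached target. Work is bounded by
    n_paths * max_steps, unlike an exhaustive search of a repeat tangle."""
    rng = random.Random(seed)
    for _ in range(n_paths):
        node = start
        for _ in range(max_steps):
            if node == target:
                return True
            neighbours = graph.get(node, [])
            if not neighbours:
                break  # dead end; abandon this walk and start another
            node = rng.choice(neighbours)
        if node == target:
            return True
    return False
```

With a fixed seed the result is deterministic; with enough paths a genuinely connected mate pair is found with high probability, while a repeat-tangled pair simply costs the bounded walk budget instead of an exponential exhaustive search.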
From: Ole K. T. <o.k...@bi...> - 2012-07-18 09:03:06
|
Hi,

I have a bimodal mate pair library with one peak around 3kbp and the largest and real peak at 19kbp. When I include this in the assembly, the scaffolding takes ages and the result is not that good (larger than biologically probable scaffolds). If I don't include it, scaffolding is quite quick, but the assembly is a bit more fragmented than with it included. Without it I get around 20000 scaffolds, and 7000 scaffolds with it included (around 7000 with Newbler too, which handles this kind of library better than CA, I think). So I would rather like to include the library.

classifyMates does not seem to be able to find innie oriented mates, only outtie PE mates as far as I can see. Could I get around this in a way? Pretend that the library is outtie, and search for outtie mates with separation up to 5kbp or something?

Thank you.

Ole
|
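[Editor's note] A bimodal library like Ole's can be spotted before assembly by histogramming mapped insert sizes and looking for two peaks. A quick self-contained sketch — the bin width, function name, and data are arbitrary choices for illustration, not a CA tool:

```python
from collections import Counter

def insert_size_peaks(sizes, bin_width=1000):
    """Bin insert sizes and return the centers of bins that are local maxima.
    Two returned peaks (e.g. ~3 kbp and ~19 kbp) flag a bimodal library."""
    bins = Counter(size // bin_width for size in sizes)
    peaks = []
    for b in sorted(bins):
        # A bin is a peak if it beats its left neighbour and at least
        # ties its right neighbour (missing neighbours count as zero).
        if bins[b] > bins.get(b - 1, 0) and bins[b] >= bins.get(b + 1, 0):
            peaks.append(b * bin_width + bin_width // 2)
    return peaks
```

Running this on the insert sizes reported by a quick alignment of a read subsample would show whether the 3 kbp shoulder is large enough to be worth splitting out or suppressing.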
From: Walenz, B. <bw...@jc...> - 2012-07-17 17:53:45
|
Hi, Jason-

Can you send the spec and at least one of the error files from overlapper? Is this CA7 (the release) or the CVS version?

I suspect overlapper might be exhausting memory. The error file reports how much memory it is allocating for the large data structures. In CA7, Illumina reads must be loaded before long reads, otherwise memory usage is higher than it should be. In the CVS version, the assembler will check for this problem.

bri

On 7/17/12 12:37 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all, Trying to do a hybrid assembly with PacBio and Illum short reads. Essentially this is what I am trying to do: Correct PacBio with Illumina. Assembly Corrected PacBio reads with about 5X of Illumina reads. The reason I want to add in the paired end Illumina reads is that at the end of the assembly, I would like to use amosvalidate to evaluate assembly-correctness. While you can do amosvalidate without paired end reads, it is more powerful if you can incorporate that data, so I thought by sprinkling in a low amount of Illumina reads into the assembly, I could take advantage of this. Unfortunately assembly using both consistently fails during the overlap phase. Sometimes it gets to 1-overlapper, sometimes it fails on 0-overlaptrim-overlap. But it just doesn’t want to finish. I’ve encountered overlap failures before, and found that some nodes in the cluster I am using that seem to have problems with the installation/dependencies. However tracking the nodes here, it seems pretty scattershot. Any thoughts on what might be happening, or alternatively, how I can assemble the pacbio reads and add in Illumina reads post-assembly for use with amosvalidate? Thanks, Jason
|
From: Powers, J. <jp...@ex...> - 2012-07-17 16:49:34
|
Hi all,

Trying to do a hybrid assembly with PacBio and Illumina short reads. Essentially this is what I am trying to do: correct PacBio with Illumina, then assemble the corrected PacBio reads with about 5X of Illumina reads.

The reason I want to add in the paired end Illumina reads is that at the end of the assembly, I would like to use amosvalidate to evaluate assembly correctness. While you can do amosvalidate without paired end reads, it is more powerful if you can incorporate that data, so I thought by sprinkling a low amount of Illumina reads into the assembly, I could take advantage of this.

Unfortunately, assembly using both consistently fails during the overlap phase. Sometimes it gets to 1-overlapper, sometimes it fails on 0-overlaptrim-overlap. But it just doesn't want to finish. I've encountered overlap failures before, and found that some nodes in the cluster I am using seem to have problems with the installation/dependencies. However, tracking the nodes here, it seems pretty scattershot.

Any thoughts on what might be happening, or alternatively, how I can assemble the PacBio reads and add in Illumina reads post-assembly for use with amosvalidate?

Thanks,
Jason
|
From: Christoph H. <chr...@gm...> - 2012-07-13 20:54:20
|
Thanks! Had not seen this one. I have resumed it now on a 30g node. Do you think that will be sufficient? Sorry to always ask these questions about memory. I understand that it depends on the data/assembly, but I do not have any idea, so maybe you can give me a hint from your experience.

Along the same lines: is there any educated guess of how long buildUnitigs may run? Sorry again for this kind of question. These things are always an issue for me because I have to specify job runtime and memory usage in advance, and according to memory usage I am charged CPU hours, of which I only have a limited amount. Very annoying.

The unitigger.err file says the following now:

Bubble popping = on
Intersection breaking = on
Bad mate threshold = -7
Error threshold = 0.030 (3.000%)
Error limit = 2.500 errors
sizeof(ufPath) = 24
FragmentInfo()-- Loading fragment information for 224632508 fragments and 3 libraries from cache '/projects/nn9201k/Celera/work2/salaris1/4-unitigger/salaris.fragmentInfo'
setLogFile()-- Now logging to '/projects/nn9201k/Celera/work2/salaris1/4-unitigger/salaris.001.bestoverlapgraph-containments.log'

Once again, I understand that it is impossible to estimate these parameters accurately. Just a rough guess would help already. Thanks!!

cheers,
Christoph

On 07/13/2012 09:55 PM, Walenz, Brian wrote: > If the third step finished with no errors, then you can delete all the > buckets, and all the ####.idx and ####.ovs files. Those would have been > automagically removed had you enabled either of the delete options. If you > have the 'ovs' file, it also likely finished successfully. > > Build unitigs seems to have made it up to loading overlaps. It still has to > allocate space for unitigs. It will grow, but I don't know by how much. > Safe guess is by more than 1gb -- so yes, move it somewhere with more > memory. I'm not sure how hard it will be to get a good guess on total > memory usage so we could abort the run early. It's on the todo list now. 
> > b > > > > On 7/13/12 2:35 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> No, I did not enable any of these delete functions, so I will delete the >> bucket directories manually now. I do have ####.idx and ####.ovs files >> (for the first 100 of 418 - 2-sort.sh ran 100 jobs. Is that a problem? >> Yes, I think the bucket???? directories make up most of the difference >> in disk space. >> >> Concerning the buildUnitigs, I was just wondering because it is now >> running with constantly 15g on a 16g machine. Its running for almost 2 >> hours now and has just created the following files at the beginning. >> They are unchanged so far. >> >> -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo >> -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 >> salaris.001.bestoverlapgraph-containments.log >> -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err >> >> Is there any increase of memory usage to be expected? If yes, I would be >> inclined to stop it now and start it over again on a bigger machine >> right away. >> >> Thanks for your help! I appreciate it! >> >> cheers, >> Christoph >> >> On 13.07.2012 20:20, Walenz, Brian wrote: >>> Hi, Christoph- >>> >>> Good to hear! You're the third person (I know of) to run the parallel >>> version. Instead of fixing the older store build, I'd rather spend time to >>> integrate the new one with runCA, either as a set of jobs for SGE, or a >>> series of sequential jobs. It's just scripting, but there might be some >>> performance issues to optimize. >>> >>> If the store is complete, the bucket directories can be deleted. The third >>> step should have done this for you. Maybe not if you didn't enable >>> deleteearly or deletelate. The store is complete if you have just the #### >>> files, an 'idx' and an 'ovs' file. You should not have any ####.idx or >>> ####.ovs files. Is the extra space in the bucket??? directories? The >>> difference (546 - 320 = 226) seems to be a reasonable size for the buckets. 
>>> >>> Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data >>> we can keep on disk, or not load, or compute differently in a smaller memory >>> size. Memory is used to store fragment meta data (clear lengths, mate >>> pairs) and best overlaps, and constructed unitigs. The first two are of a >>> known size. The number of unitigs depends on the assembly. We've seen an >>> assembly that exhausted memory in bog, caused by junk fragments creating an >>> enormous number of single-fragment unitigs. >>> >>> b >>> >>> >>> >>> On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: >>> >>>> Hi Brian, >>>> >>>> It s done! I have by now also updated the overlapStore with the frg- and >>>> ovlcorr and I am in the process of building unitigs now. >>>> >>>> I like this parallel version for building the ovlStore. You were right >>>> the last jobs needed double the memory. When distributing the jobs to >>>> several CPUs it is very time efficient and also used fewer overall >>>> CPUhours in comparison to the regular overlapStore command. One thing >>>> though is that I think it needs substantially more disk space. I am not >>>> 100% sure (because its gone now..), but I believe the *.ovlStore build >>>> by the regular command used some 320G of disk space, while the one I >>>> have now is using 546G. Are all the bucket???? directories in *.ovlStore >>>> still needed? >>>> >>>> Overall I think I learned a lot about CA by running the latest steps >>>> again with the parallel version of ovlStore build and your help. Are >>>> there plans to include a failsafe for the overlapStore update function, >>>> until the process is finished? So that it can be resumed in case it >>>> stops for whatever reason. >>>> >>>> One more thing: Is there a way to specify the memory buildUnitigs is >>>> using? >>>> >>>> Thanks again for your help!! >>>> >>>> cheers, >>>> Christoph >>>> >>>> >>>> On 12.07.2012 18:52, Walenz, Brian wrote: >>>>> You've captured the process nicely. 
>>>>> >>>>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>>>> directory. If any are missing, run that bucket again. I think (hope) that >>>>> #2 will complain if any are missing, but this has been a problem in the >>>>> past. >>>>> >>>>> Hopefully memory won't be an issue during sorting. I estimate memory size >>>>> as >>>>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads >>>>> (454+, >>>>> Sanger), the balancing is screwed up and the early jobs (overlaps of >>>>> Illumina >>>>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>>>> reads). Every time I've run this, I could do 90-95% of the sort jobs on >>>>> our >>>>> grid, but had to use large memory machines for the rest. >>>>> >>>>> More jobs creates more files, but I don't think it is necessarily slower. >>>>> I >>>>> haven't benchmarked it though. >>>>> >>>>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>>>> usually run this interactively off grid. >>>>> >>>>> b >>>>> >>>>> ________________________________________ >>>>> From: Christoph Hahn [chr...@gm...] >>>>> Sent: Thursday, July 12, 2012 9:31 AM >>>>> To: Walenz, Brian >>>>> Cc: wgs...@li... >>>>> Subject: Re: [wgs-assembler-users] runCA stopped while updating >>>>> overlapStore >>>>> - how to resume??? >>>>> >>>>> Hi Brian, >>>>> >>>>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>>>> scripts: >>>>> 1-bucketize.sh >>>>> 2-sort.sh >>>>> 3-index.sh >>>>> >>>>> right now I am running 1-bucketize.sh for every job index from 1 to >>>>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>>>> >>>>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>>>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>>>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>>>> the jobID in this case is actually the slicenumber, right? so, for e.g. 
>>>>> 2-sort.sh 2 it will look through all bucket directories and pull out >>>>> slice002.gz, read them into memory and write the overlaps into the store. >>>>> >>>>> When this is done I just need to run 3-index.sh once. No jobIDs >>>>> required, right? >>>>> >>>>> Am I missing anything? >>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> >>>>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>>>> The first step will create 1 job for each overlapper job. These should be >>>>>> small memory, but there is some internal buffering done and I usually >>>>>> request 2gb for them anyway. >>>>>> >>>>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>>>> more than that much memory. Currently, you'll have to increase -memory >>>>>> for >>>>>> these jobs and find a bigger machine. >>>>>> >>>>>> All jobs in both steps are single-threaded and run independently of each >>>>>> other. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi Brian, >>>>>>> >>>>>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>>>> overlapper is finished. One question there: I understand that the memory >>>>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>>>> less memory for every job. How can I specify the number of CPUs to be >>>>>>> used in the parallel steps? >>>>>>> >>>>>>> Thanks for your help! I appreciate it! >>>>>>> >>>>>>> cheers, >>>>>>> Christoph >>>>>>> >>>>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>>>> complete, then continuing on to frgcorr. 
runCA tests for the existence >>>>>>>> of >>>>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check >>>>>>>> that >>>>>>>> the store is valid. So, delete *ovlStore* too. >>>>>>>> >>>>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>>>> >>>>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>>>> `classifyMates.o'. Stop. >>>>>>>> make[1]: *** Waiting for unfinished jobs.... >>>>>>>> make: *** [objs] Error 1 >>>>>>>> >>>>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>>>> >>>>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>>>> need to grab it from subversion. >>>>>>>> >>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Chec >>>>>>>> k_ >>>>>>>> ou >>>>>>>> t_and_Compile >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>>>>> least(it took some 10000 CPUhours)?? >>>>>>>>> >>>>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>>>> AS_RUN-makelog). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>>>> built. >>>>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>>>> will >>>>>>>>>> be >>>>>>>>>> to recompute. 
:-( >>>>>>>>>> >>>>>>>>>> I've never seen the 'libCA.a' error before. That particular program >>>>>>>>>> is >>>>>>>>>> the >>>>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for >>>>>>>>>> most >>>>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory >>>>>>>>>> and >>>>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>>>> I'll >>>>>>>>>> take a look. >>>>>>>>>> >>>>>>>>>> b >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Brian, >>>>>>>>>>> >>>>>>>>>>> Thanks for your reply! >>>>>>>>>>> >>>>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>>>> have >>>>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>>>> understanding that correctly? >>>>>>>>>>> >>>>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. >>>>>>>>>>> Stop. >>>>>>>>>>> >>>>>>>>>>> I really appreciate your help! >>>>>>>>>>> >>>>>>>>>>> cheers, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>>>> Hi, Christoph- >>>>>>>>>>>> >>>>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>>>> can >>>>>>>>>>>> be >>>>>>>>>>>> done, but it will take code changes that are probably specific to >>>>>>>>>>>> the >>>>>>>>>>>> case >>>>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from >>>>>>>>>>>> overlapper >>>>>>>>>>>> will >>>>>>>>>>>> I suggest this. >>>>>>>>>>>> >>>>>>>>>>>> Option 1 is then to restart. 
>>>>>>>>>>>> >>>>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>>>> grid >>>>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper >>>>>>>>>>>> output >>>>>>>>>>>> into >>>>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>>>> bucket. >>>>>>>>>>>> The >>>>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>>>> compute >>>>>>>>>>>> is >>>>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>>>> >>>>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>>>> estimated >>>>>>>>>>>> would take 2 days using the original sequential build, finish in a >>>>>>>>>>>> few >>>>>>>>>>>> (4?) >>>>>>>>>>>> hours with the data parallel version. But on our development >>>>>>>>>>>> cluster, >>>>>>>>>>>> it >>>>>>>>>>>> is >>>>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>>>> throughput. >>>>>>>>>>>> Our >>>>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production >>>>>>>>>>>> side >>>>>>>>>>>> has >>>>>>>>>>>> a >>>>>>>>>>>> big Isilon. >>>>>>>>>>>> >>>>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>>>> documentation, so do an update first. >>>>>>>>>>>> >>>>>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>>>>> accept >>>>>>>>>>>> better documentation. >>>>>>>>>>>> >>>>>>>>>>>> b >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hei Ole, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you >>>>>>>>>>>>> are >>>>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>>>> using! 
>>>>>>>>>>>>> Will definitely consider that to make the assembly more effective >>>>>>>>>>>>> in >>>>>>>>>>>>> a >>>>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>>>> correction >>>>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can >>>>>>>>>>>>> see >>>>>>>>>>>>> the >>>>>>>>>>>>> next step is already building the unitigs, so I ll try to finish >>>>>>>>>>>>> this >>>>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am >>>>>>>>>>>>> really >>>>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) >>>>>>>>>>>>> will >>>>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), >>>>>>>>>>>>> I >>>>>>>>>>>>> think. >>>>>>>>>>>>> >>>>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>>>> now >>>>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an >>>>>>>>>>>>> intact >>>>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>>>> not >>>>>>>>>>>>> working as I thought it would. >>>>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what >>>>>>>>>>>>> to >>>>>>>>>>>>> do!! >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>> >>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>>> >>>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>>> work-around. 
If I remember correctly, you have both Illumina and >>>>>>>>>>>>> 454 >>>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>>> based >>>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina >>>>>>>>>>>>> reads >>>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>>> I >>>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. >>>>>>>>>>>>> Then >>>>>>>>>>>>> you can use the set of trusted k-mers to correct different >>>>>>>>>>>>> datasets. >>>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>>> trimming) >>>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do >>>>>>>>>>>>> this >>>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with >>>>>>>>>>>>> reads >>>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>>> >>>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>>> you're >>>>>>>>>>>>> using anyway. >>>>>>>>>>>>> >>>>>>>>>>>>> If you have a set of corrected reads, you can use these settings >>>>>>>>>>>>> for >>>>>>>>>>>>> CA: >>>>>>>>>>>>> doOBT=0 >>>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>>> >>>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>>> steps >>>>>>>>>>>>> too. Apply with caution. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>>> > http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title>>>>>>>>>>>> > = >>>>>>>>>>>>> Pre >>>>>>>>>>>>> pr >>>>>>>>>>>>> oc >>>>>>>>>>>>> es >>>>>>>>>>>>> sing >>>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>>> >>>>>>>>>>>>> Ole >>>>>>>>>>>>> >>>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>>> >>>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>>> completed >>>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>>> stopped >>>>>>>>>>>>> in >>>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>>> incorrectly >>>>>>>>>>>>> set >>>>>>>>>>>>> time limit.. >>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>>> error: >>>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>>> Sto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> pd >>>>>>>>>>>>> ate-erates.err >>>>>>>>>>>>> 2>&1 >>>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> (1 >>>>>>>>>>>>> seconds) >>>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>>> =================================================================== >>>>>>>>>>>>> === >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> ==== >>>>>>>>>>>>> >>>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Stack trace: >>>>>>>>>>>>> >>>>>>>>>>>>> at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 1237 >>>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>>> corrections', >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>>> called >>>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 5880 >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>>> pSt >>>>>>>>>>>>> or >>>>>>>>>>>>> e- >>>>>>>>>>>>> up >>>>>>>>>>>>> date-erates.err): >>>>>>>>>>>>> >>>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>>> for >>>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Failure message: >>>>>>>>>>>>> >>>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>>> reason >>>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>>> already >>>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>>> it >>>>>>>>>>>>> seems >>>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>>> there >>>>>>>>>>>>> a >>>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>>> instead >>>>>>>>>>>>> of >>>>>>>>>>>>> from >>>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>>> overlapStore >>>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>>> frgcorr >>>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -c >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>>> -g >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>>> 14000 >>>>>>>>>>>>> -L >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>>> to >>>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>>> before >>>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>>> ovlcorr >>>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>>> >>>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>>> problem! >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> in advance for your help! 
>>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> University of Oslo >>>>> |
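[Editor's note] The three grid stages of runCA-overlapStoreBuild.pl discussed above (parallel bucketize, parallel per-slice sort, sequential index) amount to a distributed bucket sort. A toy model of the control flow, including Brian's rule-of-thumb memory estimate for the sort jobs — the in-memory data layout here is invented for illustration and is not CA's binary .ovb.gz format:

```python
def bucketize(overlap_files, n_slices):
    # Stage 1 (one job per overlapper output): scatter each overlap
    # into a slice chosen by its A-read ID.
    slices = [[] for _ in range(n_slices)]
    for ovl_file in overlap_files:
        for a_id, b_id in ovl_file:
            slices[a_id % n_slices].append((a_id, b_id))
    return slices

def sort_slices(slices):
    # Stage 2 (one job per slice): each slice is sorted independently.
    return [sorted(s) for s in slices]

def index_slices(slices):
    # Stage 3 (sequential, cheap): record where each slice starts
    # in the final concatenated store.
    offsets, pos = [], 0
    for s in slices:
        offsets.append(pos)
        pos += len(s)
    return offsets

def sort_job_memory_gb(total_gz_bytes, n_jobs):
    # Brian's estimate: 3 * (size of the .ovb.gz files) / number of sort jobs.
    return 3 * total_gz_bytes / n_jobs / 1e9
```

Because every stage is a bag of independent jobs, a failed bucket or slice can be rerun in isolation, which is exactly why this build resumes easily where the sequential overlapStore -c build does not.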
From: Walenz, B. <bw...@jc...> - 2012-07-13 19:56:33
|
If the third step finished with no errors, then you can delete all the buckets, and all the ####.idx and ####.ovs files. Those would have been automagically removed had you enabled either of the delete options. If you have the 'ovs' file, it also likely finished successfully. Build unitigs seems to have made it up to loading overlaps. It still has to allocate space for unitigs. It will grow, but I don't know by how much. Safe guess is by more than 1gb -- so yes, move it somewhere with more memory. I'm not sure how hard it will be to get a good guess on total memory usage so we could abort the run early. It's on the todo list now. b On 7/13/12 2:35 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > No, I did not enable any of these delete functions, so I will delete the > bucket directories manually now. I do have ####.idx and ####.ovs files > (for the first 100 of 418 - 2-sort.sh ran 100 jobs. Is that a problem? > Yes, I think the bucket???? directories make up most of the difference > in disk space. > > Concerning the buildUnitigs, I was just wondering because it is now > running with constantly 15g on a 16g machine. Its running for almost 2 > hours now and has just created the following files at the beginning. > They are unchanged so far. > > -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo > -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 > salaris.001.bestoverlapgraph-containments.log > -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err > > Is there any increase of memory usage to be expected? If yes, I would be > inclined to stop it now and start it over again on a bigger machine > right away. > > Thanks for your help! I appreciate it! > > cheers, > Christoph > > On 13.07.2012 20:20, Walenz, Brian wrote: >> Hi, Christoph- >> >> Good to hear! You're the third person (I know of) to run the parallel >> version. 
Instead of fixing the older store build, I'd rather spend time to >> integrate the new one with runCA, either as a set of jobs for SGE, or a >> series of sequential jobs. It's just scripting, but there might be some >> performance issues to optimize. >> >> If the store is complete, the bucket directories can be deleted. The third >> step should have done this for you. Maybe not if you didn't enable >> deleteearly or deletelate. The store is complete if you have just the #### >> files, an 'idx' and an 'ovs' file. You should not have any ####.idx or >> ####.ovs files. Is the extra space in the bucket??? directories? The >> difference (546 - 320 = 226) seems to be a reasonable size for the buckets. >> >> Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data >> we can keep on disk, or not load, or compute differently in a smaller memory >> size. Memory is used to store fragment meta data (clear lengths, mate >> pairs) and best overlaps, and constructed unitigs. The first two are of a >> known size. The number of unitigs depends on the assembly. We've seen an >> assembly that exhausted memory in bog, caused by junk fragments creating an >> enormous number of single-fragment unitigs. >> >> b >> >> >> >> On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: >> >>> Hi Brian, >>> >>> It s done! I have by now also updated the overlapStore with the frg- and >>> ovlcorr and I am in the process of building unitigs now. >>> >>> I like this parallel version for building the ovlStore. You were right >>> the last jobs needed double the memory. When distributing the jobs to >>> several CPUs it is very time efficient and also used fewer overall >>> CPUhours in comparison to the regular overlapStore command. One thing >>> though is that I think it needs substantially more disk space. 
I am not >>> 100% sure (because its gone now..), but I believe the *.ovlStore build >>> by the regular command used some 320G of disk space, while the one I >>> have now is using 546G. Are all the bucket???? directories in *.ovlStore >>> still needed? >>> >>> Overall I think I learned a lot about CA by running the latest steps >>> again with the parallel version of ovlStore build and your help. Are >>> there plans to include a failsafe for the overlapStore update function, >>> until the process is finished? So that it can be resumed in case it >>> stops for whatever reason. >>> >>> One more thing: Is there a way to specify the memory buildUnitigs is >>> using? >>> >>> Thanks again for your help!! >>> >>> cheers, >>> Christoph >>> >>> >>> On 12.07.2012 18:52, Walenz, Brian wrote: >>>> You've captured the process nicely. >>>> >>>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>>> directory. If any are missing, run that bucket again. I think (hope) that >>>> #2 will complain if any are missing, but this has been a problem in the >>>> past. >>>> >>>> Hopefully memory won't be an issue during sorting. I estimate memory size >>>> as >>>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads >>>> (454+, >>>> Sanger), the balancing is screwed up and the early jobs (overlaps of >>>> Illumina >>>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>>> reads). Every time I've run this, I could do 90-95% of the sort jobs on >>>> our >>>> grid, but had to use large memory machines for the rest. >>>> >>>> More jobs creates more files, but I don't think it is necessarily slower. >>>> I >>>> haven't benchmarked it though. >>>> >>>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>>> usually run this interactively off grid. >>>> >>>> b >>>> >>>> ________________________________________ >>>> From: Christoph Hahn [chr...@gm...] 
>>>> Sent: Thursday, July 12, 2012 9:31 AM >>>> To: Walenz, Brian >>>> Cc: wgs...@li... >>>> Subject: Re: [wgs-assembler-users] runCA stopped while updating >>>> overlapStore >>>> - how to resume??? >>>> >>>> Hi Brian, >>>> >>>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>>> scripts: >>>> 1-bucketize.sh >>>> 2-sort.sh >>>> 3-index.sh >>>> >>>> right now I am running 1-bucketize.sh for every job index from 1 to >>>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>>> >>>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>>> the jobID in this case is actually the slicenumber, right? so, for e.g. >>>> 2-sort.sh 2 it will look through all bucket directories and pull out >>>> slice002.gz, read them into memory and write the overlaps into the store. >>>> >>>> When this is done I just need to run 3-index.sh once. No jobIDs >>>> required, right? >>>> >>>> Am I missing anything? >>>> >>>> cheers, >>>> Christoph >>>> >>>> >>>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>>> The first step will create 1 job for each overlapper job. These should be >>>>> small memory, but there is some internal buffering done and I usually >>>>> request 2gb for them anyway. >>>>> >>>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>>> more than that much memory. Currently, you'll have to increase -memory >>>>> for >>>>> these jobs and find a bigger machine. >>>>> >>>>> All jobs in both steps are single-threaded and run independently of each >>>>> other. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks! 
overlaps are being computed now and CVS version of CA has been >>>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>>> overlapper is finished. One question there: I understand that the memory >>>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>>> less memory for every job. How can I specify the number of CPUs to be >>>>>> used in the parallel steps? >>>>>> >>>>>> Thanks for your help! I appreciate it! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>>> complete, then continuing on to frgcorr. runCA tests for the existence >>>>>>> of >>>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check >>>>>>> that >>>>>>> the store is valid. So, delete *ovlStore* too. >>>>>>> >>>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>>> >>>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>>> `classifyMates.o'. Stop. >>>>>>> make[1]: *** Waiting for unfinished jobs.... >>>>>>> make: *** [objs] Error 1 >>>>>>> >>>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>>> >>>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>>> need to grab it from subversion. >>>>>>> >>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Chec >>>>>>> k_ >>>>>>> ou >>>>>>> t_and_Compile >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>>> very start? 
Is there a way to avoid recomputing the initial overlaps at >>>>>>>> least(it took some 10000 CPUhours)?? >>>>>>>> >>>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>>> AS_RUN-makelog). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>>> built. >>>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>>> will >>>>>>>>> be >>>>>>>>> to recompute. :-( >>>>>>>>> >>>>>>>>> I've never seen the 'libCA.a' error before. That particular program >>>>>>>>> is >>>>>>>>> the >>>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for >>>>>>>>> most >>>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory >>>>>>>>> and >>>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>>> I'll >>>>>>>>> take a look. >>>>>>>>> >>>>>>>>> b >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Hi Brian, >>>>>>>>>> >>>>>>>>>> Thanks for your reply! >>>>>>>>>> >>>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>>> have >>>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>>> understanding that correctly? >>>>>>>>>> >>>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. >>>>>>>>>> Stop. >>>>>>>>>> >>>>>>>>>> I really appreciate your help! 
>>>>>>>>>> >>>>>>>>>> cheers, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>>> Hi, Christoph- >>>>>>>>>>> >>>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>>> can >>>>>>>>>>> be >>>>>>>>>>> done, but it will take code changes that are probably specific to >>>>>>>>>>> the >>>>>>>>>>> case >>>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from >>>>>>>>>>> overlapper >>>>>>>>>>> will >>>>>>>>>>> I suggest this. >>>>>>>>>>> >>>>>>>>>>> Option 1 is then to restart. >>>>>>>>>>> >>>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>>> grid >>>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper >>>>>>>>>>> output >>>>>>>>>>> into >>>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>>> bucket. >>>>>>>>>>> The >>>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>>> compute >>>>>>>>>>> is >>>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>>> >>>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>>> estimated >>>>>>>>>>> would take 2 days using the original sequential build, finish in a >>>>>>>>>>> few >>>>>>>>>>> (4?) >>>>>>>>>>> hours with the data parallel version. But on our development >>>>>>>>>>> cluster, >>>>>>>>>>> it >>>>>>>>>>> is >>>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>>> throughput. >>>>>>>>>>> Our >>>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production >>>>>>>>>>> side >>>>>>>>>>> has >>>>>>>>>>> a >>>>>>>>>>> big Isilon. >>>>>>>>>>> >>>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>>> documentation, so do an update first. >>>>>>>>>>> >>>>>>>>>>> Happy to provide help if you want to try it out. 
More than happy to >>>>>>>>>>> accept >>>>>>>>>>> better documentation. >>>>>>>>>>> >>>>>>>>>>> b >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hei Ole, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you >>>>>>>>>>>> are >>>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>>> using! >>>>>>>>>>>> Will definitely consider that to make the assembly more effective >>>>>>>>>>>> in >>>>>>>>>>>> a >>>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>>> correction >>>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can >>>>>>>>>>>> see >>>>>>>>>>>> the >>>>>>>>>>>> next step is already building the unitigs, so I ll try to finish >>>>>>>>>>>> this >>>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am >>>>>>>>>>>> really >>>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) >>>>>>>>>>>> will >>>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), >>>>>>>>>>>> I >>>>>>>>>>>> think. >>>>>>>>>>>> >>>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>>> now >>>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an >>>>>>>>>>>> intact >>>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>>> not >>>>>>>>>>>> working as I thought it would. >>>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what >>>>>>>>>>>> to >>>>>>>>>>>> do!! 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>> >>>>>>>>>>>> much obliged, >>>>>>>>>>>> Christoph >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>> >>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and >>>>>>>>>>>> 454 >>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>> based >>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina >>>>>>>>>>>> reads >>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>> I >>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. >>>>>>>>>>>> Then >>>>>>>>>>>> you can use the set of trusted k-mers to correct different >>>>>>>>>>>> datasets. >>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>> trimming) >>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do >>>>>>>>>>>> this >>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with >>>>>>>>>>>> reads >>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>> >>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>> you're >>>>>>>>>>>> using anyway. >>>>>>>>>>>> >>>>>>>>>>>> If you have a set of corrected reads, you can use these settings >>>>>>>>>>>> for >>>>>>>>>>>> CA: >>>>>>>>>>>> doOBT=0 >>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>> >>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>> assembly now. 
You might have to clean up your directory tree, like >>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>> steps >>>>>>>>>>>> too. Apply with caution. >>>>>>>>>>>> >>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title>>>>>>>>>>>> = >>>>>>>>>>>> Pre >>>>>>>>>>>> pr >>>>>>>>>>>> oc >>>>>>>>>>>> es >>>>>>>>>>>> sing >>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>> >>>>>>>>>>>> Ole >>>>>>>>>>>> >>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>> >>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>> completed >>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>> stopped >>>>>>>>>>>> in >>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>> incorrectly >>>>>>>>>>>> set >>>>>>>>>>>> time limit.. 
>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>> error: >>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>> 2012 >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>> Sto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> pd >>>>>>>>>>>> ate-erates.err >>>>>>>>>>>> 2>&1 >>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>> 2012 >>>>>>>>>>>> (1 >>>>>>>>>>>> seconds) >>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>> =================================================================== >>>>>>>>>>>> === >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> ==== >>>>>>>>>>>> >>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Stack trace: >>>>>>>>>>>> >>>>>>>>>>>> at >>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>> line >>>>>>>>>>>> 1237 >>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>> corrections', >>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>> called >>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>> line >>>>>>>>>>>> 5880 >>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>> pSt >>>>>>>>>>>> or >>>>>>>>>>>> e- >>>>>>>>>>>> up >>>>>>>>>>>> date-erates.err): >>>>>>>>>>>> >>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>> for >>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Failure message: >>>>>>>>>>>> >>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>> reason >>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>> already >>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>> it >>>>>>>>>>>> seems >>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>> there >>>>>>>>>>>> a >>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>> instead >>>>>>>>>>>> of >>>>>>>>>>>> from >>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>> overlapStore >>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>> frgcorr >>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -c >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>> -g >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>> 14000 >>>>>>>>>>>> -L >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>> to >>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>> before >>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>> ovlcorr >>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>> >>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>> problem! >>>>>>>>>>>> Thanks >>>>>>>>>>>> in advance for your help! >>>>>>>>>>>> >>>>>>>>>>>> much obliged, >>>>>>>>>>>> Christoph >>>>>>>>>>>> >>>>>>>>>>>> University of Oslo |
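The data-parallel overlap store build described in this thread amounts to two arrays of independent single-threaded jobs followed by one sequential indexing step. The sketch below only generates the per-index command lines for a driver; the script names (1-bucketize.sh, 2-sort.sh, 3-index.sh) are the ones runCA-overlapStoreBuild.pl writes according to the thread, while the default job counts (2135 bucketize jobs, 100 sort slices) are just the example numbers mentioned above, not universal values.

```python
# Sketch only: command lines for the three-stage parallel overlap store
# build discussed in this thread. Counts are the thread's example numbers.

def store_build_commands(n_buckets=2135, n_slices=100):
    """Return (bucketize, sort, index) command lists.

    Stage 1 (one job per overlapper output) and stage 2 (one job per
    slice) are independent jobs that can be distributed across CPUs or a
    grid; stage 3 is a single small sequential indexing step.
    """
    bucketize = ["sh 1-bucketize.sh %d" % i for i in range(1, n_buckets + 1)]
    sort_jobs = ["sh 2-sort.sh %d" % i for i in range(1, n_slices + 1)]
    index     = ["sh 3-index.sh"]
    return bucketize, sort_jobs, index
```

Each 2-sort.sh job j reads slice j from every bucket directory, so the sort jobs only interact through the finished buckets; that independence is what makes each stage easy to restart or resume after a failure.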
From: Christoph H. <chr...@gm...> - 2012-07-13 18:35:43
|
Hi Brian, No, I did not enable any of these delete functions, so I will delete the bucket directories manually now. I do have ####.idx and ####.ovs files (for the first 100 of 418; 2-sort.sh ran 100 jobs). Is that a problem? Yes, I think the bucket???? directories make up most of the difference in disk space. Concerning the buildUnitigs, I was just wondering because it is now running at a constant 15g on a 16g machine. It's been running for almost 2 hours now and has just created the following files at the beginning. They are unchanged so far. -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 salaris.001.bestoverlapgraph-containments.log -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err Is any further increase in memory usage to be expected? If yes, I would be inclined to stop it now and start it over again on a bigger machine right away. Thanks for your help! I appreciate it! cheers, Christoph On 13.07.2012 20:20, Walenz, Brian wrote: > Hi, Christoph- > > Good to hear! You're the third person (I know of) to run the parallel > version. Instead of fixing the older store build, I'd rather spend time to > integrate the new one with runCA, either as a set of jobs for SGE, or a > series of sequential jobs. It's just scripting, but there might be some > performance issues to optimize. > > If the store is complete, the bucket directories can be deleted. The third > step should have done this for you. Maybe not if you didn't enable > deleteearly or deletelate. The store is complete if you have just the #### > files, an 'idx' and an 'ovs' file. You should not have any ####.idx or > ####.ovs files. Is the extra space in the bucket??? directories? The > difference (546 - 320 = 226) seems to be a reasonable size for the buckets. > > Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data > we can keep on disk, or not load, or compute differently in a smaller memory > size. 
Memory is used to store fragment meta data (clear lengths, mate > pairs) and best overlaps, and constructed unitigs. The first two are of a > known size. The number of unitigs depends on the assembly. We've seen an > assembly that exhausted memory in bog, caused by junk fragments creating an > enormous number of single-fragment unitigs. > > b > > > > On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: > >> Hi Brian, >> >> It s done! I have by now also updated the overlapStore with the frg- and >> ovlcorr and I am in the process of building unitigs now. >> >> I like this parallel version for building the ovlStore. You were right >> the last jobs needed double the memory. When distributing the jobs to >> several CPUs it is very time efficient and also used fewer overall >> CPUhours in comparison to the regular overlapStore command. One thing >> though is that I think it needs substantially more disk space. I am not >> 100% sure (because its gone now..), but I believe the *.ovlStore build >> by the regular command used some 320G of disk space, while the one I >> have now is using 546G. Are all the bucket???? directories in *.ovlStore >> still needed? >> >> Overall I think I learned a lot about CA by running the latest steps >> again with the parallel version of ovlStore build and your help. Are >> there plans to include a failsafe for the overlapStore update function, >> until the process is finished? So that it can be resumed in case it >> stops for whatever reason. >> >> One more thing: Is there a way to specify the memory buildUnitigs is >> using? >> >> Thanks again for your help!! >> >> cheers, >> Christoph >> >> >> On 12.07.2012 18:52, Walenz, Brian wrote: >>> You've captured the process nicely. >>> >>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>> directory. If any are missing, run that bucket again. I think (hope) that >>> #2 will complain if any are missing, but this has been a problem in the past. 
>>> >>> Hopefully memory won't be an issue during sorting. I estimate memory size as >>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, >>> Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina >>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>> reads). Every time I've run this, I could do 90-95% of the sort jobs on our >>> grid, but had to use large memory machines for the rest. >>> >>> More jobs creates more files, but I don't think it is necessarily slower. I >>> haven't benchmarked it though. >>> >>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>> usually run this interactively off grid. >>> >>> b >>> >>> ________________________________________ >>> From: Christoph Hahn [chr...@gm...] >>> Sent: Thursday, July 12, 2012 9:31 AM >>> To: Walenz, Brian >>> Cc: wgs...@li... >>> Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore >>> - how to resume??? >>> >>> Hi Brian, >>> >>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>> scripts: >>> 1-bucketize.sh >>> 2-sort.sh >>> 3-index.sh >>> >>> right now I am running 1-bucketize.sh for every job index from 1 to >>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>> >>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>> the jobID in this case is actually the slicenumber, right? so, for e.g. >>> 2-sort.sh 2 it will look through all bucket directories and pull out >>> slice002.gz, read them into memory and write the overlaps into the store. >>> >>> When this is done I just need to run 3-index.sh once. No jobIDs >>> required, right? >>> >>> Am I missing anything? 
>>> >>> cheers, >>> Christoph >>> >>> >>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>> The first step will create 1 job for each overlapper job. These should be >>>> small memory, but there is some internal buffering done and I usually >>>> request 2gb for them anyway. >>>> >>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>> more than that much memory. Currently, you'll have to increase -memory for >>>> these jobs and find a bigger machine. >>>> >>>> All jobs in both steps are single-threaded and run independently of each >>>> other. >>>> >>>> b >>>> >>>> >>>> >>>> >>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>> >>>>> Hi Brian, >>>>> >>>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>> overlapper is finished. One question there: I understand that the memory >>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>> less memory for every job. How can I specify the number of CPUs to be >>>>> used in the parallel steps? >>>>> >>>>> Thanks for your help! I appreciate it! >>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>>>> the store is valid. So, delete *ovlStore* too. >>>>>> >>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>> >>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>> `classifyMates.o'. Stop. >>>>>> make[1]: *** Waiting for unfinished jobs.... 
>>>>>> make: *** [objs] Error 1 >>>>>> >>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>> >>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>> need to grab it from subversion. >>>>>> >>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ >>>>>> ou >>>>>> t_and_Compile >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>>> least(it took some 10000 CPUhours)?? >>>>>>> >>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>> AS_RUN-makelog). >>>>>>> >>>>>>> Thanks, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>> built. >>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>> will >>>>>>>> be >>>>>>>> to recompute. :-( >>>>>>>> >>>>>>>> I've never seen the 'libCA.a' error before. That particular program is >>>>>>>> the >>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>> I'll >>>>>>>> take a look. >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hi Brian, >>>>>>>>> >>>>>>>>> Thanks for your reply! 
>>>>>>>>> >>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>> have >>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>> understanding that correctly? >>>>>>>>> >>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>>>> >>>>>>>>> I really appreciate your help! >>>>>>>>> >>>>>>>>> cheers, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>> Hi, Christoph- >>>>>>>>>> >>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>> can >>>>>>>>>> be >>>>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>>>> case >>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>>>> will >>>>>>>>>> I suggest this. >>>>>>>>>> >>>>>>>>>> Option 1 is then to restart. >>>>>>>>>> >>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>> grid >>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>>>>> into >>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>> bucket. >>>>>>>>>> The >>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>> compute >>>>>>>>>> is >>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>> >>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>> estimated >>>>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>>>> (4?) >>>>>>>>>> hours with the data parallel version. 
But on our development cluster, >>>>>>>>>> it >>>>>>>>>> is >>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>> throughput. >>>>>>>>>> Our >>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side >>>>>>>>>> has >>>>>>>>>> a >>>>>>>>>> big Isilon. >>>>>>>>>> >>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>> documentation, so do an update first. >>>>>>>>>> >>>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>>> accept >>>>>>>>>> better documentation. >>>>>>>>>> >>>>>>>>>> b >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>>> >>>>>>>>>>> Hei Ole, >>>>>>>>>>> >>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>> using! >>>>>>>>>>> Will definitely consider that to make the assembly more effective in >>>>>>>>>>> a >>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>> correction >>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see >>>>>>>>>>> the >>>>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>>>> think. >>>>>>>>>>> >>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>> now >>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>> half-updated.. 
I also discovered that just rerunning the previous >>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>> not >>>>>>>>>>> working as I thought it would. >>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>>>> do!! >>>>>>>>>>> >>>>>>>>>>> Thanks for your help! >>>>>>>>>>> >>>>>>>>>>> much obliged, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>> >>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>> based >>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>> I >>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>> trimming) >>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>> >>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>> you're >>>>>>>>>>>> using anyway. 
>>>>>>>>>>>> >>>>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>>>> CA: >>>>>>>>>>>> doOBT=0 >>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>> >>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>> steps >>>>>>>>>>>> too. Apply with caution. >>>>>>>>>>>> >>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title= >>>>>>>>>>>> Pre >>>>>>>>>>>> pr >>>>>>>>>>>> oc >>>>>>>>>>>> es >>>>>>>>>>>> sing >>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>> >>>>>>>>>>>> Ole >>>>>>>>>>>> >>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>>> >>>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>>> completed >>>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>>> stopped >>>>>>>>>>>>> in >>>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>>> incorrectly >>>>>>>>>>>>> set >>>>>>>>>>>>> time limit.. 
>>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>>> error: >>>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>>> Sto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> pd >>>>>>>>>>>>> ate-erates.err >>>>>>>>>>>>> 2>&1 >>>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> (1 >>>>>>>>>>>>> seconds) >>>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>>> =================================================================== >>>>>>>>>>>>> === >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> ==== >>>>>>>>>>>>> >>>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Stack trace: >>>>>>>>>>>>> >>>>>>>>>>>>> at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 1237 >>>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>>> corrections', >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>>> called >>>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 5880 >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>>> pSt >>>>>>>>>>>>> or >>>>>>>>>>>>> e- >>>>>>>>>>>>> up >>>>>>>>>>>>> date-erates.err): >>>>>>>>>>>>> >>>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>>> for >>>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Failure message: >>>>>>>>>>>>> >>>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>>> reason >>>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>>> already >>>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>>> it >>>>>>>>>>>>> seems >>>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>>> there >>>>>>>>>>>>> a >>>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>>> instead >>>>>>>>>>>>> of >>>>>>>>>>>>> from >>>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>>> overlapStore >>>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>>> frgcorr >>>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -c >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>>> -g >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>>> 14000 >>>>>>>>>>>>> -L >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>>> to >>>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>>> before >>>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>>> ovlcorr >>>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>>> >>>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>>> problem! >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> in advance for your help! 
>>>>>>>>>>>>> >>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> University of Oslo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------------------------------------------- >>>>>>>>>>>>> --- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>>>> Discussions >>>>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>>>> malware >>>>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>>>> wgs...@li... >>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>>>>> >>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>> --- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>> Discussions >>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>> malware >>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>> wgs...@li... >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>> >>> >>> >> > |
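[Editor's note] The three-stage parallel overlap store build discussed in the quoted messages above (1-bucketize.sh, 2-sort.sh, 3-index.sh, as reported by Christoph) can be sketched as a dry run. The script names come from the thread; the job counts below are small illustrative values, not the real ones (2135 bucketize jobs, 100 sort slices were used in the actual run).

```shell
# Dry-run sketch: print the commands for the three-stage parallel
# overlap store build. Stage 1 and 2 jobs are independent and can be
# distributed across CPUs; stage 3 runs once, sequentially.
NBUCKETS=3   # in practice: one job per overlapper output (e.g. 2135)
NSLICES=2    # in practice: the value passed as '-jobs' (e.g. 100)

build_cmds() {
  for j in $(seq 1 "$NBUCKETS"); do
    echo "sh 1-bucketize.sh $j"   # stage 1, parallel: bucket overlaps for sorting
  done
  for s in $(seq 1 "$NSLICES"); do
    echo "sh 2-sort.sh $s"        # stage 2, parallel: sort one slice across buckets
  done
  echo "sh 3-index.sh"            # stage 3, sequential: build the store index
}

build_cmds
```

Since each stage is just a list of grid jobs, any failed job index can be rerun in isolation, which is the restartability advantage described in the thread.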
From: Walenz, B. <bw...@jc...> - 2012-07-13 18:20:55
Hi, Christoph- Good to hear! You're the third person (I know of) to run the parallel version. Instead of fixing the older store build, I'd rather spend time to integrate the new one with runCA, either as a set of jobs for SGE, or a series of sequential jobs. It's just scripting, but there might be some performance issues to optimize. If the store is complete, the bucket directories can be deleted. The third step should have done this for you. Maybe not if you didn't enable deleteearly or deletelate. The store is complete if you have just the #### files, an 'idx' and an 'ovs' file. You should not have any ####.idx or ####.ovs files. Is the extra space in the bucket??? directories? The difference (546 - 320 = 226) seems to be a reasonable size for the buckets. Memory for buildUnitigs (aka bog) cannot be specified. There isn't any data we can keep on disk, or not load, or compute differently in a smaller memory size. Memory is used to store fragment metadata (clear lengths, mate pairs), best overlaps, and the constructed unitigs. The first two are of a known size. The number of unitigs depends on the assembly. We've seen an assembly that exhausted memory in bog, caused by junk fragments creating an enormous number of single-fragment unitigs. b On 7/13/12 1:53 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > It s done! I have by now also updated the overlapStore with the frg- and > ovlcorr and I am in the process of building unitigs now. > > I like this parallel version for building the ovlStore. You were right > the last jobs needed double the memory. When distributing the jobs to > several CPUs it is very time efficient and also used fewer overall > CPUhours in comparison to the regular overlapStore command. One thing > though is that I think it needs substantially more disk space. I am not > 100% sure (because its gone now..), but I believe the *.ovlStore build > by the regular command used some 320G of disk space, while the one I > have now is using 546G. 
Are all the bucket???? directories in *.ovlStore > still needed? > > Overall I think I learned a lot about CA by running the latest steps > again with the parallel version of ovlStore build and your help. Are > there plans to include a failsafe for the overlapStore update function, > until the process is finished? So that it can be resumed in case it > stops for whatever reason. > > One more thing: Is there a way to specify the memory buildUnitigs is > using? > > Thanks again for your help!! > > cheers, > Christoph > > > On 12.07.2012 18:52, Walenz, Brian wrote: >> You've captured the process nicely. >> >> After #1 finishes, check that you have one 'sliceSizes' file per bucket >> directory. If any are missing, run that bucket again. I think (hope) that >> #2 will complain if any are missing, but this has been a problem in the past. >> >> Hopefully memory won't be an issue during sorting. I estimate memory size as >> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, >> Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina >> to Illumina) have fewer overlaps than the later jobs (Illumina to long >> reads). Every time I've run this, I could do 90-95% of the sort jobs on our >> grid, but had to use large memory machines for the rest. >> >> More jobs creates more files, but I don't think it is necessarily slower. I >> haven't benchmarked it though. >> >> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >> usually run this interactively off grid. >> >> b >> >> ________________________________________ >> From: Christoph Hahn [chr...@gm...] >> Sent: Thursday, July 12, 2012 9:31 AM >> To: Walenz, Brian >> Cc: wgs...@li... >> Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore >> - how to resume??? >> >> Hi Brian, >> >> I ran the runCA-overlapStoreBuild.pl script now. 
It created the three >> scripts: >> 1-bucketize.sh >> 2-sort.sh >> 3-index.sh >> >> right now I am running 1-bucketize.sh for every job index from 1 to >> 2135. I have distributed the jobs on several CPUs and that works nicely. >> >> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >> the jobID in this case is actually the slicenumber, right? so, for e.g. >> 2-sort.sh 2 it will look through all bucket directories and pull out >> slice002.gz, read them into memory and write the overlaps into the store. >> >> When this is done I just need to run 3-index.sh once. No jobIDs >> required, right? >> >> Am I missing anything? >> >> cheers, >> Christoph >> >> >> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>> The first step will create 1 job for each overlapper job. These should be >>> small memory, but there is some internal buffering done and I usually >>> request 2gb for them anyway. >>> >>> The second step will create '-jobs j' jobs. Memory size here is a giant >>> unknown. The '-memory m' option will cause the job to not run if it needs >>> more than that much memory. Currently, you'll have to increase -memory for >>> these jobs and find a bigger machine. >>> >>> All jobs in both steps are single-threaded and run independently of each >>> other. >>> >>> b >>> >>> >>> >>> >>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>> >>>> Hi Brian, >>>> >>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>> overlapper is finished. One question there: I understand that the memory >>>> usage is regulated by the -jobs j parameter. higher value for j means >>>> less memory for every job. How can I specify the number of CPUs to be >>>> used in the parallel steps? >>>> >>>> Thanks for your help! 
I appreciate it! >>>> >>>> cheers, >>>> Christoph >>>> >>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>>> the store is valid. So, delete *ovlStore* too. >>>>> >>>>> Your latest build (from scratch) is suffering from a long standing >>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>> >>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>> `classifyMates.o'. Stop. >>>>> make[1]: *** Waiting for unfinished jobs.... >>>>> make: *** [objs] Error 1 >>>>> >>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>> >>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>> need to grab it from subversion. >>>>> >>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ >>>>> ou >>>>> t_and_Compile >>>>> >>>>> b >>>>> >>>>> >>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>> least(it took some 10000 CPUhours)?? >>>>>> >>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>> AS_RUN-makelog). >>>>>> >>>>>> Thanks, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>> built. >>>>>>> runCA might have been confused by the attempt to rerun. 
The easiest >>>>>>> will >>>>>>> be >>>>>>> to recompute. :-( >>>>>>> >>>>>>> I've never seen the 'libCA.a' error before. That particular program is >>>>>>> the >>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>> I'll >>>>>>> take a look. >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>> >>>>>>>> Hi Brian, >>>>>>>> >>>>>>>> Thanks for your reply! >>>>>>>> >>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>> have >>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>> understanding that correctly? >>>>>>>> >>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>>> >>>>>>>> I really appreciate your help! >>>>>>>> >>>>>>>> cheers, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>> Hi, Christoph- >>>>>>>>> >>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>> can >>>>>>>>> be >>>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>>> case >>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>>> will >>>>>>>>> I suggest this. >>>>>>>>> >>>>>>>>> Option 1 is then to restart. >>>>>>>>> >>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>> grid >>>>>>>>> jobs. 
The first job is parallel, and transfers the overlapper output >>>>>>>>> into >>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>> bucket. >>>>>>>>> The >>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>> compute >>>>>>>>> is >>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>> >>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>> estimated >>>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>>> (4?) >>>>>>>>> hours with the data parallel version. But on our development cluster, >>>>>>>>> it >>>>>>>>> is >>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>> throughput. >>>>>>>>> Our >>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side >>>>>>>>> has >>>>>>>>> a >>>>>>>>> big Isilon. >>>>>>>>> >>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>> documentation, so do an update first. >>>>>>>>> >>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>> accept >>>>>>>>> better documentation. >>>>>>>>> >>>>>>>>> b >>>>>>>>> >>>>>>>>> >>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Hei Ole, >>>>>>>>>> >>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>> using! >>>>>>>>>> Will definitely consider that to make the assembly more effective in >>>>>>>>>> a >>>>>>>>>> next try. Thanks for that! >>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>> correction >>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see >>>>>>>>>> the >>>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>>> assembly as it is now. Will try to improve it afterwards. 
I am really >>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>>> think. >>>>>>>>>> >>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>> now >>>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>> not >>>>>>>>>> working as I thought it would. >>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>>> do!! >>>>>>>>>> >>>>>>>>>> Thanks for your help! >>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>> Hi Christoph. >>>>>>>>>>> >>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>> based >>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>> I >>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>> trimming) >>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. 
I used >>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>> >>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>> you're >>>>>>>>>>> using anyway. >>>>>>>>>>> >>>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>>> CA: >>>>>>>>>>> doOBT=0 >>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>> >>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>> steps >>>>>>>>>>> too. Apply with caution. >>>>>>>>>>> >>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title= >>>>>>>>>>> Pre >>>>>>>>>>> pr >>>>>>>>>>> oc >>>>>>>>>>> es >>>>>>>>>>> sing >>>>>>>>>>> and discussion with Brian. >>>>>>>>>>> >>>>>>>>>>> Ole >>>>>>>>>>> >>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>> wrote: >>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>> >>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>> completed >>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>> stopped >>>>>>>>>>>> in >>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>> incorrectly >>>>>>>>>>>> set >>>>>>>>>>>> time limit.. 
>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>> error: >>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>> 2012 >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>> Sto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> pd >>>>>>>>>>>> ate-erates.err >>>>>>>>>>>> 2>&1 >>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>> 2012 >>>>>>>>>>>> (1 >>>>>>>>>>>> seconds) >>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>> =================================================================== >>>>>>>>>>>> === >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> ==== >>>>>>>>>>>> >>>>>>>>>>>> runCA failed. 
|
From: Christoph H. <chr...@gm...> - 2012-07-13 17:53:30
|
Hi Brian, It's done! I have by now also updated the overlapStore with the frg- and ovlcorr, and I am in the process of building unitigs now. I like this parallel version for building the ovlStore. You were right, the last jobs needed double the memory. When distributing the jobs to several CPUs it is very time-efficient and also used fewer overall CPU hours in comparison to the regular overlapStore command. One thing, though, is that I think it needs substantially more disk space. I am not 100% sure (because it's gone now..), but I believe the *.ovlStore built by the regular command used some 320G of disk space, while the one I have now is using 546G. Are all the bucket???? directories in *.ovlStore still needed? Overall, I think I learned a lot about CA by running the latest steps again with the parallel version of the ovlStore build and your help. Are there plans to include a failsafe in the overlapStore update function, so that it can be resumed in case it stops for whatever reason before the process is finished? One more thing: is there a way to specify the memory buildUnitigs uses? Thanks again for your help!! cheers, Christoph On 12.07.2012 18:52, Walenz, Brian wrote: > You've captured the process nicely. > > After #1 finishes, check that you have one 'sliceSizes' file per bucket directory. If any are missing, run that bucket again. I think (hope) that #2 will complain if any are missing, but this has been a problem in the past. > > Hopefully memory won't be an issue during sorting. I estimate memory size as 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina to Illumina) have fewer overlaps than the later jobs (Illumina to long reads). Every time I've run this, I could do 90-95% of the sort jobs on our grid, but had to use large memory machines for the rest. > > More jobs creates more files, but I don't think it is necessarily slower. 
I haven't benchmarked it though. > > No jobID for #3, it is tiny, does little compute, and not too much I/O. I usually run this interactively off grid. > > b > > ________________________________________ > From: Christoph Hahn [chr...@gm...] > Sent: Thursday, July 12, 2012 9:31 AM > To: Walenz, Brian > Cc: wgs...@li... > Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume??? > > Hi Brian, > > I ran the runCA-overlapStoreBuild.pl script now. It created the three > scripts: > 1-bucketize.sh > 2-sort.sh > 3-index.sh > > right now I am running 1-bucketize.sh for every job index from 1 to > 2135. I have distributed the jobs on several CPUs and that works nicely. > > when this is finished I need to run 2-sort.sh. I specified -jobs 100 in > the runCA-overlapStoreBuild.pl, so as far as I understand it should have > created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? > the jobID in this case is actually the slicenumber, right? so, for e.g. > 2-sort.sh 2 it will look through all bucket directories and pull out > slice002.gz, read them into memory and write the overlaps into the store. > > When this is done I just need to run 3-index.sh once. No jobIDs > required, right? > > Am I missing anything? > > cheers, > Christoph > > > On 07/11/2012 05:54 AM, Walenz, Brian wrote: >> The first step will create 1 job for each overlapper job. These should be >> small memory, but there is some internal buffering done and I usually >> request 2gb for them anyway. >> >> The second step will create '-jobs j' jobs. Memory size here is a giant >> unknown. The '-memory m' option will cause the job to not run if it needs >> more than that much memory. Currently, you'll have to increase -memory for >> these jobs and find a bigger machine. >> >> All jobs in both steps are single-threaded and run independently of each >> other. >> >> b >> >> >> >> >> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >> >>> Hi Brian, >>> >>> Thanks! 
overlaps are being computed now and CVS version of CA has been >>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>> overlapper is finished. One question there: I understand that the memory >>> usage is regulated by the -jobs j parameter. higher value for j means >>> less memory for every job. How can I specify the number of CPUs to be >>> used in the parallel steps? >>> >>> Thanks for your help! I appreciate it! >>> >>> cheers, >>> Christoph >>> >>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>> the store is valid. So, delete *ovlStore* too. >>>> >>>> Your latest build (from scratch) is suffering from a long standing >>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>> >>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>> `classifyMates.o'. Stop. >>>> make[1]: *** Waiting for unfinished jobs.... >>>> make: *** [objs] Error 1 >>>> >>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>> >>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>> need to grab it from subversion. >>>> >>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>>> t_and_Compile >>>> >>>> b >>>> >>>> >>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>> least(it took some 10000 CPUhours)?? >>>>> >>>>> Tried to compile it again - not successful. 
Ran make in the src >>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>> AS_RUN-makelog). >>>>> >>>>> Thanks, >>>>> Christoph >>>>> >>>>> >>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>>> be >>>>>> to recompute. :-( >>>>>> >>>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>> recompile. If that fails, send along the complete output of make and I'll >>>>>> take a look. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi Brian, >>>>>>> >>>>>>> Thanks for your reply! >>>>>>> >>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>> understanding that correctly? >>>>>>> >>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>> >>>>>>> I really appreciate your help! >>>>>>> >>>>>>> cheers, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>> Hi, Christoph- >>>>>>>> >>>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>>> be >>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>> case >>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>> will >>>>>>>> I suggest this. >>>>>>>> >>>>>>>> Option 1 is then to restart. 
>>>>>>>> >>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>>> into >>>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>>> The >>>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>>> is >>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>> >>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>> estimated >>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>> (4?) >>>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>>> is >>>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>>> Our >>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>>> a >>>>>>>> big Isilon. >>>>>>>> >>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>> documentation, so do an update first. >>>>>>>> >>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>> accept >>>>>>>> better documentation. >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hei Ole, >>>>>>>>> >>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>>> next try. Thanks for that! >>>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>> assembly as it is now. 
Will try to improve it afterwards. I am really >>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>> think. >>>>>>>>> >>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>>> working as I thought it would. >>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>> do!! >>>>>>>>> >>>>>>>>> Thanks for your help! >>>>>>>>> >>>>>>>>> much obliged, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>> Hi Christoph. >>>>>>>>>> >>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>>> for my 454 reads, and then output the result as fastq-files. 
I used >>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>> steps as frgcorrection. >>>>>>>>>> >>>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>>> using anyway. >>>>>>>>>> >>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>> CA: >>>>>>>>>> doOBT=0 >>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>> >>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>>> too. Apply with caution. >>>>>>>>>> >>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>>> pr >>>>>>>>>> oc >>>>>>>>>> es >>>>>>>>>> sing >>>>>>>>>> and discussion with Brian. >>>>>>>>>> >>>>>>>>>> Ole >>>>>>>>>> >>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>>> Dear users and developers, >>>>>>>>>>> >>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>> completed >>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>> stopped >>>>>>>>>>> in >>>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>>> set >>>>>>>>>>> time limit.. 
>>>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>>> -u >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>>> re >>>>>>>>>>> -u >>>>>>>>>>> pd >>>>>>>>>>> ate-erates.err >>>>>>>>>>> 2>&1 >>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>>> (1 >>>>>>>>>>> seconds) >>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>> ====================================================================== >>>>>>>>>>> == >>>>>>>>>>> == >>>>>>>>>>> == >>>>>>>>>>> ==== >>>>>>>>>>> >>>>>>>>>>> runCA failed. >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Stack trace: >>>>>>>>>>> >>>>>>>>>>> at >>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>> line >>>>>>>>>>> 1237 >>>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>> called >>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>> line >>>>>>>>>>> 5880 >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>>> or >>>>>>>>>>> e- >>>>>>>>>>> up >>>>>>>>>>> date-erates.err): >>>>>>>>>>> >>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>> 
'/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for >>>>>>>>>>> reading: No such file or directory >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Failure message: >>>>>>>>>>> >>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>> reason >>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>> already >>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>>> seems >>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>>> there >>>>>>>>>>> a >>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>>> of >>>>>>>>>>> from >>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>> frgcorr >>>>>>>>>>> and ovlcorr: >>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>>> -c >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>> 14000 >>>>>>>>>>> -L >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>> ovlcorr >>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>> >>>>>>>>>>> I would really appreciate any comments or suggestions to my problem! 
>>>>>>>>>>> Thanks >>>>>>>>>>> in advance for your help! >>>>>>>>>>> >>>>>>>>>>> much obliged, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> University of Oslo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>> Discussions >>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>> malware >>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>> wgs...@li... >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>>> >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> Live Security Virtual Conference >>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>> Discussions >>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>> malware >>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>> _______________________________________________ >>>>>>>>> wgs-assembler-users mailing list >>>>>>>>> wgs...@li... >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>> > > |
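[Editor's note] The three-phase store build that Christoph and Brian walk through in this thread can be sketched as a plain-shell driver. The job counts (2135 bucketize jobs, 100 sort slices) are the ones quoted in the thread; the echo lines stand in for real grid submissions (qsub/sbatch), and the commented sliceSizes check follows Brian's advice — the bucket path pattern there is illustrative, not confirmed.

```shell
#!/bin/sh
# Sketch of driving the three store-build phases by hand.
# 2135 bucketize jobs and 100 sort slices are the counts from this thread;
# 'echo' stands in for an actual grid submission (qsub/sbatch/...).
NBUCKETS=2135
NSLICES=100

# Phase 1: bucketize each overlapper output (independent, parallel jobs).
i=1
while [ "$i" -le "$NBUCKETS" ]; do
    echo "1-bucketize.sh $i"
    i=$((i + 1))
done

# Before phase 2: every bucket directory must contain a 'sliceSizes' file;
# rerun any bucket that is missing one (path pattern is illustrative):
#   for d in *.ovlStore.BUILDING/bucket????; do
#       [ -e "$d/sliceSizes" ] || echo "rerun bucket $d"
#   done

# Phase 2: sort one slice per job; the job ID is the slice number
# (independent, parallel jobs).
s=1
while [ "$s" -le "$NSLICES" ]; do
    echo "2-sort.sh $s"
    s=$((s + 1))
done

# Phase 3: build the store index once, sequentially
# (tiny; Brian runs it interactively off-grid).
echo "3-index.sh"
```

Because every phase-1 and phase-2 job is independent, any failed job can simply be rerun with the same index — which is the whole point of this build over the original sequential one.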
From: Ole K. T. <o.k...@bi...> - 2012-07-12 18:07:50
|
Hi, sorry if this e-mail is a bit long, but it's a strange and annoying problem. I have both 454 and Illumina datasets for my species, and I've finished two pure 454 assemblies with CA and one with mixed data. The genome should be around 830 Mb, and I think that if computeCoverageStat is very wrong in its estimate, my assembly gets screwed up. On the first 454 assembly I ran (there were some Sanger reads in it too, but not many), it estimated the genome size to be 816,110,291.72 bp, quite close to what we think it is. I ran it with mostly default settings and with bog. Then I ran a mixed assembly (about 24x 454 reads and 20x Illumina reads) with bogart, and the estimated genome size was 1,213,867,868.06 bp; I got a lot of degenerates and a messed-up assembly. Mostly the same settings as the first assembly, with regard to error rates at the different stages at least. The big difference in the 454 reads between this and the first one was that I removed all the 454 shotgun reads shorter than 300 bp, which might have done some harm too. We have speculated a lot about what might have caused the misestimate; Jason suggested it might be a pile-up of Illumina reads at the ends of 454 reads. The genome is quite plagued with short tandem repeats (ACACACACACA), and the 454 platform can't sequence through these, so a lot of the reads end with the STR. Illumina can sequence through it, and the guess was that a lot of Illumina reads were just STRs and piled up on the ends of 454 reads/unitigs, thereby causing the misestimate. I've tried to look into it, but I can't find that this holds true. The highest coverage, as far as I can see, is in the middle of the unitigs/degenerates. Then I created a set of 454 reads that I had run Overlap Based Trimming on, saved for use in later assemblies (as suggested by Brian and the preprocessing site on the wiki). I ran this assembly with bogart, because I wanted a baseline against later (mixed) assemblies. 
All 454 reads that survived OBT were included here. computeCoverageStat estimated the genome size to be 1,118,909,921.55 bp, quite close to the mixed assembly. I then copied the assembly, removed the tigStore, 4-unitigger and later folders, and reran with bog. Then the estimated genome size was 954,398,150.47 bp. This assembly is scaffolding as I write this, so I'm not yet sure how it will turn out. Hopefully it will be pretty good. I don't remember the differences between bog and bogart right now, but is it understandable that bogart does a bad job on a (mostly) 454 assembly? I've just started an assembly with about 26x 454 reads and 52x Illumina reads, where all reads have been merTrimmed using k-mers from about 20x of overlapping Illumina reads (merged with FLASH) as evidence. If there's something about the 454 reads that confused bogart, then the merTrimmed 454 reads and the predominance of Illumina reads will hopefully overcome it. I have most data and logs available, so any hints on what I could do to fix it, or where I should look to figure it out, are welcome. Thank you. Ole |
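[Editor's note] A back-of-the-envelope arithmetic check helps when a coverage-based size estimate looks off, as in Ole's message. The sketch below uses only the figures quoted in the message (830 Mb expected genome, 24x 454 + 20x Illumina); computeCoverageStat's real model is based on unitig arrival rates, not this naive total-bases/coverage ratio.

```shell
#!/bin/sh
# Naive sanity check, assuming coverage = total_bases / genome_size.
# Figures are the ones quoted in the message; computeCoverageStat itself
# uses unitig arrival rates, not this simple ratio.
GENOME_MB=830            # expected genome size, in Mb
COV_454=24               # quoted 454 coverage
COV_ILLUMINA=20          # quoted Illumina coverage

TOTAL_COV=$((COV_454 + COV_ILLUMINA))
TOTAL_GB=$((GENOME_MB * TOTAL_COV / 1000))   # total input data, Gb (integer math)

echo "expected input: ~${TOTAL_GB} Gb of reads at ${TOTAL_COV}x over ${GENOME_MB} Mb"
# If the estimator instead reports a ~1,214 Mb genome, the implied coverage
# for the same input drops to roughly TOTAL_GB*1000/1214 (~30x) -- a quick
# red flag that the size estimate, not the data volume, is what changed.
```

Comparing the coverage implied by an estimated genome size against the known input volume is a cheap first test before digging into unitig coverage profiles.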
From: Walenz, B. <bw...@jc...> - 2012-07-12 16:53:39
|
You've captured the process nicely. After #1 finishes, check that you have one 'sliceSizes' file per bucket directory. If any are missing, run that bucket again. I think (hope) that #2 will complain if any are missing, but this has been a problem in the past. Hopefully memory won't be an issue during sorting. I estimate memory size as 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina to Illumina) have fewer overlaps than the later jobs (Illumina to long reads). Every time I've run this, I could do 90-95% of the sort jobs on our grid, but had to use large memory machines for the rest. More jobs creates more files, but I don't think it is necessarily slower. I haven't benchmarked it though. No jobID for #3, it is tiny, does little compute, and not too much I/O. I usually run this interactively off grid. b ________________________________________ From: Christoph Hahn [chr...@gm...] Sent: Thursday, July 12, 2012 9:31 AM To: Walenz, Brian Cc: wgs...@li... Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume??? Hi Brian, I ran the runCA-overlapStoreBuild.pl script now. It created the three scripts: 1-bucketize.sh 2-sort.sh 3-index.sh right now I am running 1-bucketize.sh for every job index from 1 to 2135. I have distributed the jobs on several CPUs and that works nicely. when this is finished I need to run 2-sort.sh. I specified -jobs 100 in the runCA-overlapStoreBuild.pl, so as far as I understand it should have created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? the jobID in this case is actually the slicenumber, right? so, for e.g. 2-sort.sh 2 it will look through all bucket directories and pull out slice002.gz, read them into memory and write the overlaps into the store. When this is done I just need to run 3-index.sh once. No jobIDs required, right? Am I missing anything? 
cheers, Christoph On 07/11/2012 05:54 AM, Walenz, Brian wrote: > The first step will create 1 job for each overlapper job. These should be > small memory, but there is some internal buffering done and I usually > request 2gb for them anyway. > > The second step will create '-jobs j' jobs. Memory size here is a giant > unknown. The '-memory m' option will cause the job to not run if it needs > more than that much memory. Currently, you'll have to increase -memory for > these jobs and find a bigger machine. > > All jobs in both steps are single-threaded and run independently of each > other. > > b > > > > > On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> Thanks! overlaps are being computed now and CVS version of CA has been >> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >> overlapper is finished. One question there: I understand that the memory >> usage is regulated by the -jobs j parameter. higher value for j means >> less memory for every job. How can I specify the number of CPUs to be >> used in the parallel steps? >> >> Thanks for your help! I appreciate it! >> >> cheers, >> Christoph >> >> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>> complete, then continuing on to frgcorr. runCA tests for the existence of >>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>> the store is valid. So, delete *ovlStore* too. >>> >>> Your latest build (from scratch) is suffering from a long standing >>> dependency issue. It needs kmer checked out and 'make install'ed. >>> >>> make[1]: *** No rule to make target `sweatShop.H', needed by >>> `classifyMates.o'. Stop. >>> make[1]: *** Waiting for unfinished jobs.... >>> make: *** [objs] Error 1 >>> >>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. 
>>> >>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>> need to grab it from subversion. >>> >>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>> t_and_Compile >>> >>> b >>> >>> >>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>> and the 3-overlapcorrection directories and just ran runCA and it >>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>> least(it took some 10000 CPUhours)?? >>>> >>>> Tried to compile it again - not successful. Ran make in the src >>>> directory (output in makelog) and also in the AS_RUN directory (output >>>> AS_RUN-makelog). >>>> >>>> Thanks, >>>> Christoph >>>> >>>> >>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>> be >>>>> to recompute. :-( >>>>> >>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>> recompile. If that fails, send along the complete output of make and I'll >>>>> take a look. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks for your reply! >>>>>> >>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>> understanding that correctly? 
>>>>>> >>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>> >>>>>> I really appreciate your help! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>> Hi, Christoph- >>>>>>> >>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>> be >>>>>>> done, but it will take code changes that are probably specific to the >>>>>>> case >>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>> will >>>>>>> I suggest this. >>>>>>> >>>>>>> Option 1 is then to restart. >>>>>>> >>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>> into >>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>> The >>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>> is >>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>> >>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>> estimated >>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>> (4?) >>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>> is >>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>> Our >>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>> a >>>>>>> big Isilon. >>>>>>> >>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>> documentation, so do an update first. >>>>>>> >>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>> accept >>>>>>> better documentation. 
>>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>>> >>>>>>>> Hei Ole, >>>>>>>> >>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>> next try. Thanks for that! >>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>> think. >>>>>>>> >>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>> working as I thought it would. >>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>> do!! >>>>>>>> >>>>>>>> Thanks for your help! >>>>>>>> >>>>>>>> much obliged, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>> Hi Christoph. >>>>>>>>> >>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>> reads. 
Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>> steps as frgcorrection. >>>>>>>>> >>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>> using anyway. >>>>>>>>> >>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>> CA: >>>>>>>>> doOBT=0 >>>>>>>>> doFragmentCorrection=0 >>>>>>>>> >>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>> too. Apply with caution. >>>>>>>>> >>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>> pr >>>>>>>>> oc >>>>>>>>> es >>>>>>>>> sing >>>>>>>>> and discussion with Brian. 
>>>>>>>>> >>>>>>>>> Ole >>>>>>>>> >>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>> Dear users and developers, >>>>>>>>>> >>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>> completed >>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>> stopped >>>>>>>>>> in >>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>> set >>>>>>>>>> time limit.. >>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -u >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>> rrection/salaris.erates> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>> re >>>>>>>>>> -u >>>>>>>>>> pd >>>>>>>>>> ate-erates.err >>>>>>>>>> 2>&1 >>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>> (1 >>>>>>>>>> seconds) >>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>> ====================================================================== >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> ==== >>>>>>>>>> >>>>>>>>>> runCA failed. 
>>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Stack trace: >>>>>>>>>> >>>>>>>>>> at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 1237 >>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>> called >>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 5880 >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>> or >>>>>>>>>> e- >>>>>>>>>> up >>>>>>>>>> date-erates.err): >>>>>>>>>> >>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for >>>>>>>>>> reading: No such file or directory >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Failure message: >>>>>>>>>> >>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>> reason >>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>> already >>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>> seems >>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>> of >>>>>>>>>> from >>>>>>>>>> 0001~, which is obviously not there any more?? 
>>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>> frgcorr >>>>>>>>>> and ovlcorr: >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -c >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>> 14000 >>>>>>>>>> -L >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>> ovlcorr >>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>> >>>>>>>>>> I would really appreciate any comments or suggestions to my problem! >>>>>>>>>> Thanks >>>>>>>>>> in advance for your help! >>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> University of Oslo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> Live Security Virtual Conference >>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>> Discussions >>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>> malware >>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>> _______________________________________________ >>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>> wgs...@li... 
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> wgs-assembler-users mailing list >>>>>>>> wgs...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >> |
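[Editor's note] The workaround discussed in the thread above depends on knowing where a killed `overlapStore -u` run stopped: the thread states that each slice backup `NNNN~` is consumed as the update rewrites slice `NNNN`, so the lowest surviving `NNNN~` file marks the resume point (here, `0250~` of 430). A minimal shell sketch of that check, assuming only the `NNNN`/`NNNN~` naming described in the error log; the demo directory and file names are mocked up for illustration:

```shell
# Find the first overlap-store slice that was NOT yet updated.
# Assumption (from the thread): overlapStore -u replaces each
# backup NNNN~ with an updated NNNN as it applies corrections,
# so the lowest remaining NNNN~ is where a killed update stopped.
first_pending_slice() {
    ls "$1" | grep '~$' | sort | head -n 1
}

# Demo on a mock store: slices 0001-0249 updated, 0250~ onward
# untouched (mirrors the situation described in the thread).
store=$(mktemp -d)
touch "$store/0001" "$store/0249" "$store/0250~" "$store/0430~"
first_pending_slice "$store"    # prints: 0250~
rm -rf "$store"
```

Note this only locates the resume point; as Brian says in the thread, actually resuming the original sequential build still needs case-specific code changes.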
From: Christoph H. <chr...@gm...> - 2012-07-12 13:32:08
|
Hi Brian, I ran the runCA-overlapStoreBuild.pl script now. It created the three scripts: 1-bucketize.sh 2-sort.sh 3-index.sh right now I am running 1-bucketize.sh for every job index from 1 to 2135. I have distributed the jobs on several CPUs and that works nicely. when this is finished I need to run 2-sort.sh. I specified -jobs 100 in the runCA-overlapStoreBuild.pl, so as far as I understand it should have created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? the jobID in this case is actually the slicenumber, right? so, for e.g. 2-sort.sh 2 it will look through all bucket directories and pull out slice002.gz, read them into memory and write the overlaps into the store. When this is done I just need to run 3-index.sh once. No jobIDs required, right? Am I missing anything? cheers, Christoph On 07/11/2012 05:54 AM, Walenz, Brian wrote: > The first step will create 1 job for each overlapper job. These should be > small memory, but there is some internal buffering done and I usually > request 2gb for them anyway. > > The second step will create '-jobs j' jobs. Memory size here is a giant > unknown. The '-memory m' option will cause the job to not run if it needs > more than that much memory. Currently, you'll have to increase -memory for > these jobs and find a bigger machine. > > All jobs in both steps are single-threaded and run independently of each > other. > > b > > > > > On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> Thanks! overlaps are being computed now and CVS version of CA has been >> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >> overlapper is finished. One question there: I understand that the memory >> usage is regulated by the -jobs j parameter. higher value for j means >> less memory for every job. How can I specify the number of CPUs to be >> used in the parallel steps? >> >> Thanks for your help! I appreciate it! 
>> >> cheers, >> Christoph >> >> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>> complete, then continuing on to frgcorr. runCA tests for the existence of >>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>> the store is valid. So, delete *ovlStore* too. >>> >>> Your latest build (from scratch) is suffering from a long standing >>> dependency issue. It needs kmer checked out and 'make install'ed. >>> >>> make[1]: *** No rule to make target `sweatShop.H', needed by >>> `classifyMates.o'. Stop. >>> make[1]: *** Waiting for unfinished jobs.... >>> make: *** [objs] Error 1 >>> >>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>> >>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>> need to grab it from subversion. >>> >>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>> t_and_Compile >>> >>> b >>> >>> >>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>> and the 3-overlapcorrection directories and just ran runCA and it >>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>> least(it took some 10000 CPUhours)?? >>>> >>>> Tried to compile it again - not successful. Ran make in the src >>>> directory (output in makelog) and also in the AS_RUN directory (output >>>> AS_RUN-makelog). >>>> >>>> Thanks, >>>> Christoph >>>> >>>> >>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>> be >>>>> to recompute. :-( >>>>> >>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>> first to get built. 
Looks like libCA.a wasn't created. My fix for most >>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>> recompile. If that fails, send along the complete output of make and I'll >>>>> take a look. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks for your reply! >>>>>> >>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>> understanding that correctly? >>>>>> >>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>> >>>>>> I really appreciate your help! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>> Hi, Christoph- >>>>>>> >>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>> be >>>>>>> done, but it will take code changes that are probably specific to the >>>>>>> case >>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>> will >>>>>>> I suggest this. >>>>>>> >>>>>>> Option 1 is then to restart. >>>>>>> >>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>> into >>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>> The >>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>> is >>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. 
>>>>>>> >>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>> estimated >>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>> (4?) >>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>> is >>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>> Our >>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>> a >>>>>>> big Isilon. >>>>>>> >>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>> documentation, so do an update first. >>>>>>> >>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>> accept >>>>>>> better documentation. >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>>> >>>>>>>> Hei Ole, >>>>>>>> >>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>> next try. Thanks for that! >>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>> think. >>>>>>>> >>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>> half-updated.. 
I also discovered that just rerunning the previous >>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>> working as I thought it would. >>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>> do!! >>>>>>>> >>>>>>>> Thanks for your help! >>>>>>>> >>>>>>>> much obliged, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>> Hi Christoph. >>>>>>>>> >>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>> steps as frgcorrection. >>>>>>>>> >>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>> using anyway. 
>>>>>>>>> >>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>> CA: >>>>>>>>> doOBT=0 >>>>>>>>> doFragmentCorrection=0 >>>>>>>>> >>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>> too. Apply with caution. >>>>>>>>> >>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>> pr >>>>>>>>> oc >>>>>>>>> es >>>>>>>>> sing >>>>>>>>> and discussion with Brian. >>>>>>>>> >>>>>>>>> Ole >>>>>>>>> >>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>> Dear users and developers, >>>>>>>>>> >>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>> completed >>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>> stopped >>>>>>>>>> in >>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>> set >>>>>>>>>> time limit.. 
>>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -u >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>> rrection/salaris.erates> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>> re >>>>>>>>>> -u >>>>>>>>>> pd >>>>>>>>>> ate-erates.err >>>>>>>>>> 2>&1 >>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>> (1 >>>>>>>>>> seconds) >>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>> ====================================================================== >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> ==== >>>>>>>>>> >>>>>>>>>> runCA failed. >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Stack trace: >>>>>>>>>> >>>>>>>>>> at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 1237 >>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>> called >>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 5880 >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>> or >>>>>>>>>> e- >>>>>>>>>> up >>>>>>>>>> date-erates.err): >>>>>>>>>> >>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for 
>>>>>>>>>> reading: No such file or directory >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Failure message: >>>>>>>>>> >>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>> reason >>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>> already >>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>> seems >>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>> of >>>>>>>>>> from >>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>> frgcorr >>>>>>>>>> and ovlcorr: >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -c >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>> 14000 >>>>>>>>>> -L >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>> ovlcorr >>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>> >>>>>>>>>> I would really appreciate any comments or suggestions to my problem! >>>>>>>>>> Thanks >>>>>>>>>> in advance for your help! 
>>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> University of Oslo >>>>>>>>>> _______________________________________________ >>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>> wgs...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>> _______________________________________________ >>>>>>>> wgs-assembler-users mailing list >>>>>>>> wgs...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >> |
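[Editor's note] Christoph's message above lays out the whole data-parallel store build: run `1-bucketize.sh` once per overlapper output (1..2135 in his run), then `2-sort.sh` once per slice (`-jobs 100`, so job IDs 1..100), then `3-index.sh` once with no job ID. A sketch of a local driver for those three stages; the job counts are the ones from this thread, and `RUN=echo` makes it a dry run (swap in `RUN=sh` to execute the real scripts, or submit stages 1 and 2 as grid array jobs as the thread intends):

```shell
# Dry-run driver for the three-stage overlap store build.
# run_stage invokes one script once per job index, in order;
# on a grid, stages 1 and 2 would instead be array jobs since
# their jobs are independent and single-threaded.
RUN=${RUN:-echo}            # RUN=sh to actually execute

run_stage() {               # $1 = script name, $2 = job count
    i=1
    while [ "$i" -le "$2" ]; do
        $RUN "$1" "$i"
        i=$((i + 1))
    done
}

run_stage 1-bucketize.sh 3  # in the thread: 2135 bucketize jobs
run_stage 2-sort.sh 2       # in the thread: -jobs 100 sort slices
$RUN 3-index.sh             # sequential; run once, no job ID
```

This matches the scheme Brian confirms below: stage 2's job ID is the slice number (job N collects `sliceNNN.gz` from every bucket directory), and only stage 3 is sequential.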
From: Walenz, B. <bw...@jc...> - 2012-07-11 03:54:51
|
The first step will create 1 job for each overlapper job. These should be small memory, but there is some internal buffering done and I usually request 2gb for them anyway. The second step will create '-jobs j' jobs. Memory size here is a giant unknown. The '-memory m' option will cause the job to not run if it needs more than that much memory. Currently, you'll have to increase -memory for these jobs and find a bigger machine. All jobs in both steps are single-threaded and run independently of each other. b On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > Thanks! overlaps are being computed now and CVS version of CA has been > successfully compiled. Will try the runCA-overlapStoreBuild.pl once the > overlapper is finished. One question there: I understand that the memory > usage is regulated by the -jobs j parameter. higher value for j means > less memory for every job. How can I specify the number of CPUs to be > used in the parallel steps? > > Thanks for your help! I appreciate it! > > cheers, > Christoph > > On 07/10/2012 10:18 PM, Walenz, Brian wrote: >> Quick guess is that runCA is finding the old ovlStore and assuming it is >> complete, then continuing on to frgcorr. runCA tests for the existence of >> name.ovlStore to determine if overlaps are finished; it doesn't check that >> the store is valid. So, delete *ovlStore* too. >> >> Your latest build (from scratch) is suffering from a long standing >> dependency issue. It needs kmer checked out and 'make install'ed. >> >> make[1]: *** No rule to make target `sweatShop.H', needed by >> `classifyMates.o'. Stop. >> make[1]: *** Waiting for unfinished jobs.... >> make: *** [objs] Error 1 >> >> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >> >> The kmer included in CA7 is too old for the CVS version of CA, so you'll >> need to grab it from subversion. 
>> >> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >> t_and_Compile >> >> b >> >> >> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >> >>> Hi, >>> >>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>> and the 3-overlapcorrection directories and just ran runCA and it >>> immediately starts with doing frgcorr. Do you mean recompute from the >>> very start? Is there a way to avoid recomputing the initial overlaps at >>> least(it took some 10000 CPUhours)?? >>> >>> Tried to compile it again - not successful. Ran make in the src >>> directory (output in makelog) and also in the AS_RUN directory (output >>> AS_RUN-makelog). >>> >>> Thanks, >>> Christoph >>> >>> >>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>> Odd, the *gz should only be deleted after the store is successfully built. >>>> runCA might have been confused by the attempt to rerun. The easiest will >>>> be >>>> to recompute. :-( >>>> >>>> I've never seen the 'libCA.a' error before. That particular program is the >>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>> recompile. If that fails, send along the complete output of make and I'll >>>> take a look. >>>> >>>> b >>>> >>>> >>>> >>>> >>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>> >>>>> Hi Brian, >>>>> >>>>> Thanks for your reply! >>>>> >>>>> I would be happy to try the new parallel overlap store build, but I >>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>> them any more. Looks like they were deleted after the ovlStore was >>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>> understanding that correctly? >>>>> >>>>> I have downloaded the cvs and tried to make, but I get: >>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>> >>>>> I really appreciate your help! 
>>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> >>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>> Hi, Christoph- >>>>>> >>>>>> The original overlap store build is difficult to resume. I think it can >>>>>> be >>>>>> done, but it will take code changes that are probably specific to the >>>>>> case >>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>> will >>>>>> I suggest this. >>>>>> >>>>>> Option 1 is then to restart. >>>>>> >>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>> into >>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>> The >>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>> is >>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>> >>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>> estimated >>>>>> would take 2 days using the original sequential build, finish in a few >>>>>> (4?) >>>>>> hours with the data parallel version. But on our development cluster, it >>>>>> is >>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>> Our >>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>> a >>>>>> big Isilon. >>>>>> >>>>>> It is only in CVS. I just added command line help and a bit of >>>>>> documentation, so do an update first. >>>>>> >>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>> accept >>>>>> better documentation. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>> >>>>>>> Hei Ole, >>>>>>> >>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>> referring to just recently. Sounds like a good approach you are using! 
>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>> next try. Thanks for that! >>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>> think. >>>>>>> >>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>> working as I thought it would. >>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>> do!! >>>>>>> >>>>>>> Thanks for your help! >>>>>>> >>>>>>> much obliged, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>> Hi Christoph. >>>>>>>> >>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>> (can also be used on 454 reads). 
What I've been doing lately, is to >>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>> steps as frgcorrection. >>>>>>>> >>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>> using anyway. >>>>>>>> >>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>> CA: >>>>>>>> doOBT=0 >>>>>>>> doFragmentCorrection=0 >>>>>>>> >>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>> too. Apply with caution. >>>>>>>> >>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>> pr >>>>>>>> oc >>>>>>>> es >>>>>>>> sing >>>>>>>> and discussion with Brian. >>>>>>>> >>>>>>>> Ole >>>>>>>> >>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>> Dear users and developers, >>>>>>>>> >>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>> completed >>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>> stopped >>>>>>>>> in >>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>> set >>>>>>>>> time limit.. 
>>>>>>>>> If I try to resume the assembly now, I get the following error:
>>>>>>>>>
>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates >
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 (1 seconds)
>>>>>>>>> ERROR: Failed with signal HUP (1)
>>>>>>>>> ================================================================================
>>>>>>>>>
>>>>>>>>> runCA failed.
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Stack trace:
>>>>>>>>>
>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>>>>>> main::caFailure('failed to apply the overlap corrections',
>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called
>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>>>>>> main::overlapCorrection() called at
>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Last few lines of the relevant log file
>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>>>>>>
>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open
>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for
>>>>>>>>> reading: No such file or directory
>>>>>>>>> ----------------------------------------
>>>>>>>>> Failure message:
>>>>>>>>>
>>>>>>>>> failed to apply the overlap corrections
>>>>>>>>>
>>>>>>>>> So it obviously cannot find the file salaris.ovlStore/0001~. The
>>>>>>>>> reason, as far as I can see, is that the 0001~ file had already been
>>>>>>>>> updated to 0001 before the run stopped. In fact it seems to have
>>>>>>>>> stopped after updating salaris.ovlStore/0249 (of 430). Is there a way
>>>>>>>>> to tell runCA to continue from 0250~, instead of from 0001~, which is
>>>>>>>>> obviously not there any more?
>>>>>>>>> Another solution I was thinking of is to manually rerun the previous
>>>>>>>>> overlapStore command (the one that was run before starting the
>>>>>>>>> frgcorr and ovlcorr:
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list >
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1)
>>>>>>>>> to restore the state from before the frgcorr and ovlcorr steps, and
>>>>>>>>> then resume runCA. This should restore the 0001~ file, right? The
>>>>>>>>> most important thing is that I want to avoid rerunning the frgcorr
>>>>>>>>> and ovlcorr steps, because they were really resource intensive.
>>>>>>>>>
>>>>>>>>> I would really appreciate any comments or suggestions on my problem!
>>>>>>>>> Thanks in advance for your help!
>>>>>>>>>
>>>>>>>>> much obliged,
>>>>>>>>> Christoph
>>>>>>>>>
>>>>>>>>> University of Oslo
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Live Security Virtual Conference
>>>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>>>> threat landscape has changed and how IT managers can respond.
>>>>>>>>> Discussions will include endpoint security, mobile security and the
>>>>>>>>> latest in malware threats.
>>>>>>>>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>>>> _______________________________________________
>>>>>>>>> wgs-assembler-users mailing list
>>>>>>>>> wgs...@li...
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
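
The settings Ole recommends go in the runCA spec file. A minimal fragment, using only the two options quoted in his mail; the comments are explanatory assumptions, not official documentation:

```
doOBT=0                   # reads already trimmed (e.g. via merTrim), skip OBT
doFragmentCorrection=0    # skip the expensive frg/ovl correction stage
```

As Ole notes, applying this to an assembly already in progress may require removing the 3-overlapcorrection directory first.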
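
Christoph's diagnosis suggests a quick way to see where an interrupted `overlapStore -u` stopped: slices whose `~` backup still exists were never rewritten. A hedged sketch, assuming the `NNNN` / `NNNN~` store layout inferred from the error messages above (this is not an official CA tool):

```shell
# Print store slices whose '~' backup still exists, i.e. slices that an
# interrupted 'overlapStore -u' never got to rewrite.  The NNNN / NNNN~
# naming is inferred from the log above, not from documented behavior.
find_pending() {
    store=$1
    for f in "$store"/*~; do
        [ -e "$f" ] || continue   # glob may match nothing; skip the literal
        basename "$f"
    done
}
```

For Christoph's store, `find_pending salaris.ovlStore` would list the remaining un-updated slices (0250~ through 0430~, by his account). Run it against a copy; it only reads, but caution costs nothing.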