From: Paul C. <pca...@gm...> - 2012-12-07 14:21:22
Hi all,

Is there a way in the CVS version of runCA (or even version 7.0) to get a report about what happened to each and every sequence that went into the assembly?

Thank you,

Paul

Paul Cantalupo
University of Pittsburgh
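No direct reply is archived, but the gatekeeper dumps discussed later in this thread give a per-read accounting. A minimal sketch, assuming a store named asm.gkpStore and a gatekeeper build with the tabular dump options (flag names vary between CA versions):

  # Per-library summary: reads loaded, mated, deleted (mentioned later in this archive).
  gatekeeper -dumpinfo asm.gkpStore

  # One row per fragment: clear ranges, mate, and deleted status.
  gatekeeper -dumpfragments -tabular asm.gkpStore > per-read-report.txt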
From: Walenz, B. <bw...@jc...> - 2012-12-05 19:13:12
It also seems that sourceforge anonymous CVS is still down, with no estimate for when it will return. http://sourceforge.net/blog/category/sitestatus/

b

On 12/5/12 2:04 PM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Joanna – from one of Brian’s earlier messages:

The latest bits are at http://wgs-assembler.sf.net/wgs-20121130.tar.bz2 Includes both kmer and the assembler proper. I compiled with no issues (on FreeBSD, gcc 4.4). Hopefully we won’t have to debug gmake and dependencies.

Note that in our case we were unable to pass CVS traffic across our network and tar downloads from ViewVC were corrupt. I highly recommend attempting a CVS checkout.

From: Joanna Kelley [mailto:jok...@st...]
Sent: Wednesday, December 05, 2012 2:00 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR)
Cc: Walenz, Brian; wgs...@li...
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source? Thanks! Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.

You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
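For anyone following along, a minimal sketch of the tarball route, assuming a unix host with gmake and gcc; the unpacked directory name is an assumption:

  # Fetch and unpack the combined kmer + assembler snapshot.
  wget http://wgs-assembler.sf.net/wgs-20121130.tar.bz2
  tar -xjf wgs-20121130.tar.bz2
  cd wgs-20121130

  # Build kmer first; 'gmake install' puts its libraries and headers where
  # the assembler's build expects them.
  cd kmer && gmake install && cd ..

  # Then build the assembler proper.
  cd src && gmake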
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-12-05 19:04:41
Joanna - from one of Brian's earlier messages:

The latest bits are at http://wgs-assembler.sf.net/wgs-20121130.tar.bz2 Includes both kmer and the assembler proper. I compiled with no issues (on FreeBSD, gcc 4.4). Hopefully we won't have to debug gmake and dependencies.

Note that in our case we were unable to pass CVS traffic across our network and tar downloads from ViewVC were corrupt. I highly recommend attempting a CVS checkout.

From: Joanna Kelley [mailto:jok...@st...]
Sent: Wednesday, December 05, 2012 2:00 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR)
Cc: Walenz, Brian; wgs...@li...
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source? Thanks! Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It's in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a 'gmake install' to put the libraries and headers in the correct spot.

You don't need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.

--
Joanna L. Kelley, PhD
Department of Genetics
Stanford School of Medicine
300 Pasteur Drive
Lane Building, Room L-333
Stanford, CA 94305-5120
jok...@st...
From: Joanna K. <jok...@st...> - 2012-12-05 19:00:18
Hello, I am having the same problem with the cvs checkout, is there another place to obtain the up-to-date source?

Thanks!
Joanna

On Fri, Nov 30, 2012 at 9:37 AM, Wright, Cory (CDC/OID/NCIRD) (CTR) <wm...@cd...> wrote:

> I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.
>
> From: Walenz, Brian [mailto:bw...@jc...]
> Sent: Friday, November 30, 2012 12:06 PM
> To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
> Subject: Re: [wgs-assembler-users] sweatShop.h not found
>
> Hi-
>
> Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.
>
> You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.
>
> We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.
>
> b
>
> On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:
>
> Hi All
>
> In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639
>
> Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.

--
Joanna L. Kelley, PhD
Department of Genetics
Stanford School of Medicine
300 Pasteur Drive
Lane Building, Room L-333
Stanford, CA 94305-5120
jok...@st...
From: Walenz, B. <bw...@jc...> - 2012-11-30 23:59:14
Hi, Arjun-

Thanks for attaching the spec file.

overlapper=mer is the problem. It, and the trimming we do on Illumina reads, are incompatible. Use overlapper=ovl instead. There isn't much gain from overlapper=mer anymore (it was designed for early 454 reads with lots of homopolymer errors) and it doesn't scale much past a microbe.

We (still) haven't written a pipeline for this, but our current strategy for correcting and trimming Illumina reads is at: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing

If you don't have outtie matepair libraries it amounts to a run of meryl, and a run of merTrim.

b

________________________________________
From: Arjun Prasad [ap...@ma...]
Sent: Friday, November 30, 2012 5:12 PM
To: wgs...@li...
Subject: [wgs-assembler-users] Error: .mcidx is not a merylStream index file!

Hi,

I've been trying to assemble some 2x250bp Illumina reads and I'm getting the following error in 0-mertrim/*.err:

% cat miseq5.0001.err
opening gkStore '/cluster/ifs/projects/pcoat/celass/miseq5/miseq5.gkpStore'
loading mer database.
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcidx is not a merylStream index file!
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat is not a merylStream data file!

% ls -l /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm?.mcdat
-rw-r--r-- 1 aprasad zoo        32 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat
-rw-r--r-- 1 aprasad zoo 486424472 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm1.mcdat

I've been able to run CA on several other datasets including 454 and older illumina sequence, so I'm guessing there's a path or format issue. Any ideas as to what might be wrong?

Thanks,
Arjun

--
Genome Technology Branch
National Human Genome Research Institute
National Institutes of Health
5625 Fishers Lane, Room 5N-01L
Rockville, MD 20892-9400
Phone: 301-594-9199   Fax: 301-435-6170
E-Mail: ap...@nh...
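In spec-file terms the fix is one line; a minimal sketch (the obt/ovl variants mirror other spec files in this archive and are optional):

  # overlapper=mer is incompatible with the trimming applied to Illumina
  # reads; use the standard overlapper instead.
  overlapper    = ovl
  obtOverlapper = ovl
  ovlOverlapper = ovl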
From: Arjun P. <ap...@ma...> - 2012-11-30 22:12:32
Hi,

I've been trying to assemble some 2x250bp Illumina reads and I'm getting the following error in 0-mertrim/*.err:

% cat miseq5.0001.err
opening gkStore '/cluster/ifs/projects/pcoat/celass/miseq5/miseq5.gkpStore'
loading mer database.
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcidx is not a merylStream index file!
merylStreamReader()-- ERROR: /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat is not a merylStream data file!

% ls -l /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm?.mcdat
-rw-r--r-- 1 aprasad zoo        32 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm0.mcdat
-rw-r--r-- 1 aprasad zoo 486424472 Nov 30 14:50 /cluster/ifs/projects/pcoat/celass/miseq5/0-mercounts/miseq5-C-ms22-cm1.mcdat

I've been able to run CA on several other datasets including 454 and older illumina sequence, so I'm guessing there's a path or format issue. Any ideas as to what might be wrong?

Thanks,
Arjun

--
Genome Technology Branch
National Human Genome Research Institute
National Institutes of Health
5625 Fishers Lane, Room 5N-01L
Rockville, MD 20892-9400
Phone: 301-594-9199   Fax: 301-435-6170
E-Mail: ap...@nh...
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-11-30 17:38:57
I used all the steps described here and in the wiki article to replicate the problem, with one exception in Step 2: we are forced to download the source tarball via ViewVC instead of checking out via the cvs command. Is there an alternative way to obtain the source? Thanks for any and all help.

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, November 30, 2012 12:06 PM
To: Wright, Cory (CDC/OID/NCIRD) (CTR); 'wgs...@li...'
Subject: Re: [wgs-assembler-users] sweatShop.h not found

Hi-

Do you have kmer/ installed? (It's in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a 'gmake install' to put the libraries and headers in the correct spot.

You don't need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
From: Walenz, B. <bw...@jc...> - 2012-11-30 17:05:25
Hi-

Do you have kmer/ installed? (It’s in SVN if you need it). https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile. In kmer/ it needs a ‘gmake install’ to put the libraries and headers in the correct spot.

You don’t need a fresh checkout, just a fresh compile. Delete the Linux-amd64 directory and try again.

We keep avoiding the switch over to subversion (and keep forgetting to delete the one that is there). We really should bite the bullet and do it, if only so we can rename *.c to *.C and get rid of all those symlinks.

b

On 11/30/12 11:50 AM, "Wright, Cory (CDC/OID/NCIRD) (CTR)" <wm...@cd...> wrote:

Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately “starting over with a fresh checkout” is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
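A sketch of the fresh-compile sequence, assuming a checkout with kmer/ and src/ side by side and the Linux-amd64 build directory at the top level (the layout is an assumption):

  # Install kmer's libraries and headers where the assembler build looks.
  cd kmer && gmake install && cd ..

  # No fresh checkout needed: delete the per-platform build directory and
  # recompile the assembler.
  rm -rf Linux-amd64
  cd src && gmake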
From: Wright, C. (CDC/OID/N. (CTR) <wm...@cd...> - 2012-11-30 16:50:41
Hi All

In trying to compile the latest source code I receive the error seen here: http://sourceforge.net/tracker/?func=detail&aid=3420389&group_id=106905&atid=645639

Unfortunately "starting over with a fresh checkout" is not an option as our organization blocks CVS traffic. SVN is allowed, and I noticed that there is a SVN repo, but it does not appear to have been updated in some time. Is it possible to update the SVN repo so we can be on our merry way? :) Thanks all and have a great day.
From: Walenz, B. <bw...@jc...> - 2012-11-26 15:14:22
Hi-

There are two different methods to load sequences into the assembler. The older method rewrites sequence/quality data into the frg output as you saw with fastaToCA. The newer method leaves the sequence/quality data in the original fastq file, and gives a wrapper to the assembler. In the wrapper are pointers to the original fastq (along with the format of the QV and orientation of mate pairs):

fastqQualityValues=sanger
fastqOrientation=innie
fastqMates=/tmp2/bcs03/melonFastqCorrected/paired/F7HI6DR01-corrected.fastq

is telling the assembler you have interleaved mated reads with the Sanger (offset=33) encoding that are 5'3' -- 3'5' orientation.

'gatekeeper -dumpinfo *gkpStore' will give a summary of the number of reads loaded for each library.

Getting the QV format wrong, I think, will generate a ton of warnings in gkpStore.err or gkpStore.errorLog. I'm not sure if the reads are discarded or 'fixed'. In the CVS version of the assembler, 'fastqAnalyze some.fastq' will make a decent guess at what QV encoding you have.

b

On 11/22/12 6:29 AM, "Jens Hooge" <jen...@go...> wrote:

> Hi,
>
> I have converted a number of 454 reads in FASTQ format and Sanger reads in FASTA format. For the Sanger reads I have generated my own quality value file (also in FASTA format).
>
> I called the conversion routines as follows:
>
> Sanger library:
> ./fastaToCA -l BES_random_shear_library_reverse -s <pathtofasta>/BES_random_shear_library_reverse -q <pathtoqual>/BES_random_shear_library_reverse.qlt > <outpath>/BES_random_shear_library_reverse.frg
>
> 454 library:
> ./fastqToCA -insertsize 2834 172 -libraryname F7HI6DR01-corrected -technology 454 -mates <pathtofastq>/F7HI6DR01-corrected.fastq > <outpath>/F7HI6DR01-corrected.frg
>
> What strikes me here is that the conversion of my Sanger library results in an FRG file where the fields seq: and qlt: are filled, while the conversion of my 454 library doesn't. This is especially confusing to me because when I dump a FRG file "after" running the assembly using the command
>
> gatekeeper -dumpfrg -allreads assembly.gkpStore > asm.frgs
>
> the FRG file is filled with the presumably correct sequences and quality values, even though I only used converted FASTQ files for the assembly. Is this expected behaviour?
>
> Thanks in advance for any help on that matter.
>
> fastaToCA Conversion Result: please find BES_random_shear_library_reverse.frg
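A small sketch tying these pieces together; the file names are hypothetical and the -type flag for the QV encoding is an assumption about your fastqToCA version:

  # Build a wrapper .frg for interleaved 454 mates with Sanger-encoded QVs;
  # the sequence data stays in the fastq, only pointers go into the .frg.
  fastqToCA -libraryname lib454 -technology 454 -type sanger \
            -insertsize 3000 300 -mates mates-interleaved.fastq > lib454.frg

  # After loading, summarize what gatekeeper actually stored, per library.
  gatekeeper -dumpinfo asm.gkpStore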
From: Jens H. <jen...@go...> - 2012-11-22 11:30:48
Hi,

I have converted a number of 454 reads in FASTQ format and Sanger reads in FASTA format. For the Sanger reads I have generated my own quality value file (also in FASTA format).

I called the conversion routines as follows:

Sanger library:
./fastaToCA -l BES_random_shear_library_reverse -s <pathtofasta>/BES_random_shear_library_reverse -q <pathtoqual>/BES_random_shear_library_reverse.qlt > <outpath>/BES_random_shear_library_reverse.frg

454 library:
./fastqToCA -insertsize 2834 172 -libraryname F7HI6DR01-corrected -technology 454 -mates <pathtofastq>/F7HI6DR01-corrected.fastq > <outpath>/F7HI6DR01-corrected.frg

What strikes me here is that the conversion of my Sanger library results in an FRG file where the fields seq: and qlt: are filled, while the conversion of my 454 library doesn't. This is especially confusing to me because when I dump a FRG file "after" running the assembly using the command

gatekeeper -dumpfrg -allreads assembly.gkpStore > asm.frgs

the FRG file is filled with the presumably correct sequences and quality values, even though I only used converted FASTQ files for the assembly. Is this expected behaviour?

Thanks in advance for any help on that matter.

fastaToCA Conversion Result: please find BES_random_shear_library_reverse.frg
From: Walenz, B. <bw...@jc...> - 2012-11-20 12:41:02
Hi, Xueping-

That’s frustrating! Can you send along the qc report?

We’re just finishing up a repetitive fish. We had some success changing the ‘astat’ cutoffs for labeling unitigs unique/not-unique. We used astatHighBound=0 and astatLowBound=-20 based on a plot of unitig length vs astat (numbers came from 5-consensus-coverage-stat, but I didn’t do the analysis and would have to pester someone to get any scripts to pass along). If there are large degenerate contigs, this will help by labeling them as unique and letting them be used for scaffolds.

Or, it’s possible that unitig construction was poor. I’ll have to think about how to measure this — are they small because of bad trimming, low coverage, biased coverage or repeat boundaries? The signal for all of these looks basically the same, but the resolution is quite different.

Sorry I’m not much help yet.

b

On 11/20/12 6:05 AM, "Quan, Xueping" <x....@im...> wrote:

Dear All

I have a large plant genome (3.5Gb in size) with high repeat content (more than 60%). The sequencing data I got are about 45x Illumina paired-end and mate pair data (after data cleaning), and 0.5x 454 mate pair data. I have finished the assembly using celera. However, the coverage of the contig (600Mb) and scaffold (660Mb) sequences for the genome is very low. Most of the unitig sequences (about 5Gb) failed to be combined into any scaffold. Below is my spec file; could anyone give suggestions about how to improve the assembly?

"
utgGraphErrorRate=0.03    # bogart uses utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate, utgMergeErrorLimit
utgGraphErrorLimit=3.25
utgMergeErrorRate=0.045
utgMergeErrorLimit=5.25
ovlErrorRate=0.04         # Larger than utg to allow for correction.
cnsErrorRate=0.08         # Larger than utg to avoid occasional consensus failures
cgwErrorRate=0.10         # Larger than utg to allow contig merges across high-error ends
gkpAllowInefficientStorage=1

frgMinLen=64              # fragments shorter than this length are not loaded into the assembler
ovlMinLen=40              # overlaps shorter than this length are not computed

merSize=22                # default=22; use lower to combine across heterozygosity, higher to separate near-identical repeat copies
overlapper=ovl            # the mer overlapper for 454-like data is insensitive to homopolymer problems but requires more RAM and disk

# UNITIGGER configuration
unitigger=bogart
batMemory=650
utgBubblePopping=1
batThreads=64
utgGenomeSize=3.5gb

# MERYL calculates K-mer seeds
merylMemory=512000
merylThreads=32

# OVERLAPPER calculates overlaps
ovlHashBits=24
ovlHashBlockLength=700000000
ovlThreads=2
ovlConcurrency=32
ovlRefBlockSize=320000000

# OVERLAP STORE builds the database
ovlStoreMemory=109210     # Mbp

# ERROR CORRECTION not applied to overlaps
doFragmentCorrection=0

# Scaffolder

# CONSENSUS configuration
cnsConcurrency=64

L1_GAIIx.frg
L2_GAIIx.frg
L3_GAIIx.frg
L4_GAIIx.frg
L5_GAIIx.frg
L6_GAIIx.frg
L7_GAIIx.frg
L8_GAIIx.frg
L3_HiSeq.frg
L4_HiSeq.frg
L5_HiSeq.frg
L6_HiSeq.frg
L1_454.frg
L2_454.frg
L3_454.frg
L4_454.frg
L5_454.frg
L6_454.frg
L7_454.frg
L8_454.frg
L9_454.frg
L10_454.frg
"

Thanks very much!

Xueping Quan
Imperial College London
Tel: +44(0)207 594 17 80
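In spec-file form, the change Brian describes is two lines; that runCA accepts these exact option names (they match his wording) is an assumption worth checking against your runCA version:

  # Relabel long, low-astat unitigs as unique so they can seed scaffolds
  # (values from Brian's repetitive-fish run).
  astatLowBound  = -20
  astatHighBound = 0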
From: Walenz, B. <bw...@jc...> - 2012-11-20 12:05:59
Hi, Jens-

First, I’d suggest upgrading to the cvs version (http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile). It's a bit more work to install, but you’ll get a ton of fixes and optimizations. Plus, the answers below are for this version.

Answering your questions out of order:

The only requirement for a ‘library’ of reads is that they form a normal insert size distribution. It’s probably better to treat each 454 run as its own library unless you know for sure they came from the same library construction. You can mix paired- and single-ended reads in the same library.

Note that the fastqToCA usage changed slightly from 7.0 to the cvs version. Use ‘-reads X.fastq’ and ‘-mates A1.fastq,B1.fastq’ to load SE and PE reads. Note that ‘-mates Y.fastq’ will expect mate pair reads interleaved.

The maximum read length is set at compile time, in file AS_global.h, variable AS_READ_MAX_NORMAL_LEN_BITS. The default is 11 (=2047 bases). For PacBio, 13-15 (=8-32kbp) has been used. 16 has been reported to not work.

For reads longer than the maximum, it depends on the fastqToCA technology. For tech ‘illumina’, the reads are truncated to the maximum ‘packed’ size (160bp by default). The ‘packed’ format is slightly more efficient storage designed for lots of short reads. The other technologies will also truncate reads, but to the NORMAL_LEN_BITS size.

‘gatekeeper -dumpinfo X.gkpStore’ will generate a table of the reads loaded, number mated, number deleted, and total bases.

The ‘not a sequence start line’ errors, I think, were caused by the fastq reader only partially reading a sequence line. On the next input, it was expecting to find "@name" but found bases/qvs instead. In any case, it’s fixed in the cvs version.

Just curious - are your reads longer than 2kbp real? I’ve seen these in the past, and they were mostly garbage.

b

On 11/20/12 5:33 AM, "Jens Hooge" <jen...@go...> wrote:

Hi,

I'm relatively new to NGS and its tools, but at the moment I'm trying to run an assembly of about 70 single- and paired-end 454 reads in FASTQ format, using the wgs-7.0-assembler. The version I've been using is the one from http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

I have converted my FASTQ files to FRG files using CABOG's fastqToCA routine, using a different library name for each FASTQ file. When I run the actual assembly with runCA, I get an error message in melonAssembly.gkpStore.err:

GKP finished with 11339450 alerts or errors:
11338139 # ILL Error: not a sequence start line.
    1292 # ILL Error: not a quality start line.
      19 # LIB Alert: suspicious mean and standard deviation; reset stddev to 0.10 * mean.

To me this looked as if it was a problem with the format of my FASTQ files, so I ran a script to validate the format consistency of the files, which resulted in no errors. Some of my reads are longer than 2047 bp and I have the feeling that the bug fix stated at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Version_7.0_Release_Notes under "Bug fixes" is not yet fixed in the version I'm using. Quote: "Gatekeeper: Numerous problems with reads longer than the maximum allowed (2047bp) and reads of very specific lengths were discovered and fixed. All of these resulted in gatekeeper crashing."

Even though gatekeeper doesn't crash, I would expect about 25 million reads to be processed by CABOG; however, while running the assembly I get a stdout message saying "numFrags = 14499910". To me this looks like not all reads are being used for the assembly. If I add the number of ILL Errors, it comes suspiciously close to my expected number of reads, which makes me think that CABOG just gets rid of the reads which are longer than the maximally allowed length of 2047 bp.

My questions would be:

What happens with reads that are longer than the maximally allowed length? Are those reads ignored or clipped to the maximum read length?
Is there a way to adjust the maximum read length, to make CABOG use those reads in the assembly as well?
Does every FASTQ file have to be added to a different gatekeeper library, or is it enough to put single-ended and paired-ended reads into their respective libraries?

I would be very grateful if anyone could help me out.

Ciao, Jens
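A sketch of the compile-time change for longer reads; the header path and the rebuild-by-deletion step follow Brian's other messages in this archive, but the exact define formatting may differ between checkouts:

  # In src/AS_global.h, raise AS_READ_MAX_NORMAL_LEN_BITS from 11 (2047 bp)
  # to e.g. 13 (~8 kbp); 16 is reported not to work. Then force a fresh
  # compile:
  rm -rf Linux-amd64
  cd src && gmake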
From: Quan, X. <x....@im...> - 2012-11-20 11:06:10
Dear All

I have a large plant genome (3.5Gb in size) with high repeat content (more than 60%). The sequencing data I got are about 45x Illumina paired-end and mate pair data (after data cleaning), and 0.5x 454 mate pair data. I have finished the assembly using celera. However, the coverage of the contig (600Mb) and scaffold (660Mb) sequences for the genome is very low. Most of the unitig sequences (about 5Gb) failed to be combined into any scaffold. Below is my spec file; could anyone give suggestions about how to improve the assembly?

"
utgGraphErrorRate=0.03    # bogart uses utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate, utgMergeErrorLimit
utgGraphErrorLimit=3.25
utgMergeErrorRate=0.045
utgMergeErrorLimit=5.25
ovlErrorRate=0.04         # Larger than utg to allow for correction.
cnsErrorRate=0.08         # Larger than utg to avoid occasional consensus failures
cgwErrorRate=0.10         # Larger than utg to allow contig merges across high-error ends
gkpAllowInefficientStorage=1

frgMinLen=64              # fragments shorter than this length are not loaded into the assembler
ovlMinLen=40              # overlaps shorter than this length are not computed

merSize=22                # default=22; use lower to combine across heterozygosity, higher to separate near-identical repeat copies
overlapper=ovl            # the mer overlapper for 454-like data is insensitive to homopolymer problems but requires more RAM and disk

# UNITIGGER configuration
unitigger=bogart
batMemory=650
utgBubblePopping=1
batThreads=64
utgGenomeSize=3.5gb

# MERYL calculates K-mer seeds
merylMemory=512000
merylThreads=32

# OVERLAPPER calculates overlaps
ovlHashBits=24
ovlHashBlockLength=700000000
ovlThreads=2
ovlConcurrency=32
ovlRefBlockSize=320000000

# OVERLAP STORE builds the database
ovlStoreMemory=109210     # Mbp

# ERROR CORRECTION not applied to overlaps
doFragmentCorrection=0

# Scaffolder

# CONSENSUS configuration
cnsConcurrency=64

L1_GAIIx.frg
L2_GAIIx.frg
L3_GAIIx.frg
L4_GAIIx.frg
L5_GAIIx.frg
L6_GAIIx.frg
L7_GAIIx.frg
L8_GAIIx.frg
L3_HiSeq.frg
L4_HiSeq.frg
L5_HiSeq.frg
L6_HiSeq.frg
L1_454.frg
L2_454.frg
L3_454.frg
L4_454.frg
L5_454.frg
L6_454.frg
L7_454.frg
L8_454.frg
L9_454.frg
L10_454.frg
"

Thanks very much!

Xueping Quan
Imperial College London
Tel: +44(0)207 594 17 80
From: Jens H. <jen...@go...> - 2012-11-20 10:35:26
Hi,

I'm relatively new to NGS and its tools, but at the moment I'm trying to run an assembly of about 70 single- and paired-end 454 reads in FASTQ format, using the wgs-7.0-assembler. The version I've been using is the one from http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Main_Page

I have converted my FASTQ files to FRG files using CABOG's fastqToCA routine, using a different library name for each FASTQ file. When I run the actual assembly with runCA, I get an error message in melonAssembly.gkpStore.err:

GKP finished with 11339450 alerts or errors:
11338139 # ILL Error: not a sequence start line.
    1292 # ILL Error: not a quality start line.
      19 # LIB Alert: suspicious mean and standard deviation; reset stddev to 0.10 * mean.

To me this looked as if it was a problem with the format of my FASTQ files, so I ran a script to validate the format consistency of the files, which resulted in no errors. Some of my reads are longer than 2047 bp and I have the feeling that the bug fix stated at http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Version_7.0_Release_Notes under "Bug fixes" is not yet fixed in the version I'm using. Quote: "Gatekeeper: Numerous problems with reads longer than the maximum allowed (2047bp) and reads of very specific lengths were discovered and fixed. All of these resulted in gatekeeper crashing."

Even though gatekeeper doesn't crash, I would expect about 25 million reads to be processed by CABOG; however, while running the assembly I get a stdout message saying "numFrags = 14499910". To me this looks like not all reads are being used for the assembly. If I add the number of ILL Errors, it comes suspiciously close to my expected number of reads, which makes me think that CABOG just gets rid of the reads which are longer than the maximally allowed length of 2047 bp.

My questions would be:

What happens with reads that are longer than the maximally allowed length? Are those reads ignored or clipped to the maximum read length?
Is there a way to adjust the maximum read length, to make CABOG use those reads in the assembly as well?
Does every FASTQ file have to be added to a different gatekeeper library, or is it enough to put single-ended and paired-ended reads into their respective libraries?

I would be very grateful if anyone could help me out.

Ciao, Jens
From: Quan, X. <x....@im...> - 2012-11-10 19:57:50
Dear All

My celera run failed in the extendClearRanges stage. The job extendClearRanges-scaffold.0017585.sh failed with the following error:

"examining gap 2089 from lcontig 252777756 (orient: F, pos: 1843.074083, 6837.074083) to rcontig 819682 (orient: R, pos: 8597.736468, 6893.736468), size: 56.662385
ReplaceEndUnitigInContig()-- contig 252777756 unitig 210835 isLeft(0)
contig 252777756 len 4994 (::::sequences)
data.unitig_coverage_stat 0.000000
data.unitig_microhet_prob 1.000000
data.unitig_status X
data.unitig_unique_rept X
data.contig_status U
data.num_frags 1271
data.num_unitigs 2
insertMultiAlign()-- ERROR: multialign 252777756 C has invalid fragment/unitig layout (length 4992) -- exceeds bounds of consensus sequence (length 4994).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed.
/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0017585.sh: line 18: 116736 Aborted  /apps/wgs/2012-11-02//Linux-amd64/bin/extendClearRanges -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -t /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.tigStore -n 16 -c Howea -b 0017585 -e 0051439 -i 1 -S 3936
----------------------------------------END Thu Nov 8 15:06:42 2012 (4402 seconds)
----------------------------------------START Thu Nov 8 15:06:42 2012
/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0051443.sh
----------------------------------------END Thu Nov 8 15:06:42 2012 (0 seconds)
ERROR: Failed with signal HUP (1)
================================================================================
runCA failed.
----------------------------------------
Stack trace:
 at /apps/wgs/2012-11-02//bin/runCA line 1391
 main::caFailure('extendClearRanges failed', '/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7...') called at /apps/wgs/2012-11-02//bin/runCA line 5111
 main::eCR('7-1-ECR', undef, 1) called at /apps/wgs/2012-11-02//bin/runCA line 5206
 main::scaffolder() called at /apps/wgs/2012-11-02//bin/runCA line 6146
----------------------------------------
Last few lines of the relevant log file (/ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/7-1-ECR/extendClearRanges-scaffold.0051443.err):
----------------------------------------
Failure message: extendClearRanges failed
"

So the problem is that it did not succeed and did not produce the *.ckp.17 file needed by the following extendClearRanges-scaffold.0051443.sh. But it stored a success record in some log file (I don't know which log file), so I can not restart extendClearRanges-scaffold.0017585.sh or skip it and go on to the next job.

I would really appreciate any help! Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-11-07 17:57:26
Hi, Xueping-

The consensus ‘errors’ aren’t a problem. They’re detected and resolved by utgcnsfix. The logs are in 5-consensus/consensus-fix.out if you want to see them. We need to do a bit of work on the pipeline to clean this up, but it's low priority.

The ECR failure has also been reported in CGW. We’ll need an example (and some time) to figure out what’s going wrong.

The fix on the wikipage is a little out of date. A simpler method is to skip the entire scaffold, or the gap, that is causing the problem. In the shell script, add either:

  -s <scaffold-id>
  -S <gap-number>

to the extendClearRanges command. If you can figure out which gap number it is, use that, otherwise skip the whole scaffold. Gap numbers will change based on the -b/-e scaffold ranges — meaning you can’t run just this scaffold to test which gap is failing.

Can you send the full log (to me, no need to send to the list)?

b

FYI, usage:

extendClearRanges [opts] -c ckpName -n ckpNumber -g gkpStore
  -c ckpName    Use ckpName as the checkpoint name
  -n ckpNumber  The checkpoint to use
  -g gkpStore   The gatekeeper store
  -C gap#       Start at a specific gap number
  -b scafBeg    Begin at a specific scaffold
  -e scafEnd    End after a specific scaffold (INCLUSIVE)
  -o scafIID    Process only this scaffold
  -s scafIID    Skip this scaffold
  -O gap#       Process only this gap
  -S gap#       Skip this gap
  -i iterNum    The iteration of ECR; either 1 or 2
  -load         Load gkpStore into memory
  -V            Enable VERBOSE_MULTIALIGN for debugging

On 11/7/12 4:50 AM, "Quan, Xueping" <x....@im...> wrote:

Dear All

My hybrid assembly (with about 131GB Illumina reads and 2GB 454 reads) failed at the extendClearRanges stage. But when I checked all the stages, I found that in the 5-consensus/ folder there are 110 jobs, and only 60 of them succeeded (with both err and success output files). Below are the last few lines of job61.err:

"NumColumnsInUnitigs = 261474722
NumGapsInUnitigs = 280587
NumRunsOfGapsInUnitigReads = 11234803
NumColumnsInContigs = 0
NumGapsInContigs = 0
NumRunsOfGapsInContigReads = 0
NumAAMismatches = 0
NumVARRecords = 0
NumVARStringsWithFlankingGaps = 0
NumUnitigRetrySuccess = 0
WARNING: Total number of unitig failures = 1
Consensus did NOT finish successfully."

Should I re-run the consensus, and how can I make the failed consensus jobs succeed?

The failure information in file extendClearRanges-scaffold.0017585.err is

"FRG type R ident 757459763 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 757459764 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 930433904 container 94540021 parent 94540021 hang 61 0 position 2185 2110
UTG type X ident 636535 position 0 2214 num_instances 0
insertMultiAlign()-- ERROR: multialign 636535 C has invalid fragment/unitig layout (length 2214) -- exceeds bounds of consensus sequence (length 2215).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed."

Can I follow this instruction to restart it: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Extend_clear_ranges_failure?

Thanks very much!

Dr. Xueping Quan
Imperial College London
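Applied to the failing job above, the edit looks roughly like this; whether -S can be given twice (one gap is already skipped by runCA's -S 3936) is an assumption, so skipping the scaffold may be the safer route:

  # In 7-1-ECR/extendClearRanges-scaffold.0017585.sh, append to the
  # extendClearRanges command line either
  #     -s <scaffold-iid>     # skip the whole failing scaffold, or
  #     -S <gap-number>       # skip just the failing gap (gap 2089 in the log)
  # then rerun the job by hand:
  sh extendClearRanges-scaffold.0017585.sh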
From: Quan, X. <x....@im...> - 2012-11-07 09:51:18
Dear All

My hybrid assembly (with about 131GB Illumina reads and 2GB 454 reads) failed at the extendClearRanges stage. But when I checked all the stages, I found that in the 5-consensus/ folder there are 110 jobs, and only 60 of them succeeded (with both err and success output files). Below are the last few lines of job61.err:

"NumColumnsInUnitigs = 261474722
NumGapsInUnitigs = 280587
NumRunsOfGapsInUnitigReads = 11234803
NumColumnsInContigs = 0
NumGapsInContigs = 0
NumRunsOfGapsInContigReads = 0
NumAAMismatches = 0
NumVARRecords = 0
NumVARStringsWithFlankingGaps = 0
NumUnitigRetrySuccess = 0
WARNING: Total number of unitig failures = 1
Consensus did NOT finish successfully."

Should I re-run the consensus, and how can I make the failed consensus jobs succeed?

The failure information in file extendClearRanges-scaffold.0017585.err is

"FRG type R ident 757459763 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 757459764 container 94540021 parent 94540021 hang 57 0 position 2106 2185
FRG type R ident 930433904 container 94540021 parent 94540021 hang 61 0 position 2185 2110
UTG type X ident 636535 position 0 2214 num_instances 0
insertMultiAlign()-- ERROR: multialign 636535 C has invalid fragment/unitig layout (length 2214) -- exceeds bounds of consensus sequence (length 2215).
extendClearRanges: MultiAlignStore.C:327: void MultiAlignStore::insertMultiAlign(MultiAlignT*, bool, bool): Assertion `GetMultiAlignLength(ma) <= GetMultiAlignLength(ma, true)' failed."

Can I follow this instruction to restart it: http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Extend_clear_ranges_failure?

Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-11-02 19:23:48
On 11/1/12 2:07 PM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> On 1 November 2012 17:42, Quan, Xueping <x....@im...> wrote:
>> My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:
>> 3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
>>  12G 2012-10-29 11:41 HX.fragmentInfo
>>  75K 2012-10-29 22:32 unitigger.err
>>
>> The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:
>> "BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
>> BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
>> BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
>> BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
>> BestOverlapGraph()-- fra"
>> Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?
>
> I'm not completely certain, but I think it can run for a while sometimes. Bogart can take quite a long time, a couple of weeks depending on your amount of data and genome. It seems that you are using CA 7.0; if you use the CVS version you can take advantage of all those 64 CPUs you have, and complete the bogart stage much, much quicker than with CA 7.0 (which uses only one CPU).
>
> Ole

The step after this is to examine those 625gb of overlaps and pick out the best for each end of each read. There isn't any logging until it finishes this, at which time it will dump the overlaps to 'best.contains', 'best.edges', and 'best.singletons' files. I don't remember this taking days to finish though.

As for speed, on a large 3gb fish with ~1.5 billion reads, bogart took around 10 days. With the parallelization it was down to less than 2.

If you do choose to restart with the CVS version, you can build the overlap graph on disk, then load it for the unitig computation. Building the overlap graph is LOTS of I/O and is the slower part. Add option '-create' to the bogart command. It will stop after the graph is built. Then you can restart bogart (without -create) to do the computation.

b
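A sketch of that two-pass restart; whether your runCA version writes a rerunnable 4-unitigger/unitigger.sh is an assumption (if not, the bogart command line can be copied from unitigger.err):

  cd 4-unitigger

  # Pass 1: add -create to the bogart command; bogart builds the overlap
  # graph on disk and stops.
  sh unitigger.sh

  # Pass 2: remove -create and rerun; bogart loads the saved graph and goes
  # straight to the unitig computation.
  sh unitigger.sh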
From: Ole K. T. <o.k...@bi...> - 2012-11-01 18:31:44
On 1 November 2012 17:42, Quan, Xueping <x....@im...> wrote:
> My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:
> 3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
>  12G 2012-10-29 11:41 HX.fragmentInfo
>  75K 2012-10-29 22:32 unitigger.err
>
> The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:
> "BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
> BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
> BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
> BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
> BestOverlapGraph()-- fra"
> Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?

I'm not completely certain, but I think it can run for a while sometimes. Bogart can take quite a long time, a couple of weeks depending on your amount of data and genome. It seems that you are using CA 7.0; if you use the CVS version you can take advantage of all those 64 CPUs you have, and complete the bogart stage much, much quicker than with CA 7.0 (which uses only one CPU).

Ole
From: Quan, X. <x....@im...> - 2012-11-01 16:43:54
My assembly is now in the unitigger stage. In 4-unitigger/, I see three files:

3.5M 2012-10-30 03:34 HX.001.bestoverlapgraph.log
 12G 2012-10-29 11:41 HX.fragmentInfo
 75K 2012-10-29 22:32 unitigger.err

The bestoverlapgraph.log stopped updating nearly two days ago, while the unitigger (bogart) keeps running, occupying 100% cpu and 624gb memory, but producing no result. Below are the last few lines of HX.001.bestoverlapgraph.log:

"BestOverlapGraph()-- frag 1005759619 is suspicious (174 overlaps).
BestOverlapGraph()-- frag 1005759692 is suspicious (217 overlaps).
BestOverlapGraph()-- frag 1005759967 is suspicious (219 overlaps).
BestOverlapGraph()-- frag 1005760214 is suspicious (22 overlaps).
BestOverlapGraph()-- fra"

Is something going wrong, with bogart not really running (though it appears to be running in top), or is it working but with no output yet? If it is the second case, how long does the unitigger usually take to finish or produce further output?

Thanks very much!

Xueping Quan
Imperial College London
From: Walenz, B. <bw...@jc...> - 2012-10-23 17:50:24
Hi-

The quick answer is to use a much much lower memory limit, like 8192 (8gb). There is no gain to using more memory here.

It seems that the algorithm that partitions the data — already known to be not well balanced with mixes of short and long reads — gets more imbalanced with larger memory sizes. If you still have the *BUILDING store, you can take the size of it, divide by 512 for an excellent memory limit.

The store build, unfortunately, must restart from scratch; remove Howea.ovlStore.BUILDING then restart runCA.

b

On 10/23/12 6:38 AM, "Quan, Xueping" <x....@im...> wrote:

Hi

I am running my job to assemble Illumina and 454 reads. After running runCA 7.0 my job was killed in the ovlstore stage due to out of memory:

"/apps/wgs/7.0/wgs-7.0/Linux-amd64/bin/overlapStore -c /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.BUILDING -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -i 0 -M 655360 -L /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.list > /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.err 2>&1
=>> PBS: job killed: mem 1014599844kb exceeded limit 1006632960kb"

Is it possible to set a memory limit which ovlstore will not exceed? "ovlStoreMemory" is currently set as 655360 and seems not to work. And am I able to resume the job from the ovlstore stage rather than run from the beginning? It really took a very long time to finish the overlapper stage.

Thanks very much!

Dr. Xueping Quan
Imperial College London
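A sketch of that recovery, using the paths from this run; note that the finished overlap outputs are kept, only the store build repeats:

  cd /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1

  # Lower the limit in the spec file; there is no gain from more memory here:
  #   ovlStoreMemory = 8192        (8 GB, the value Brian suggests)

  # The store build must restart from scratch: drop the partial store, then
  # restart runCA with the same -d/-p/-s arguments as before.
  rm -rf Howea.ovlStore.BUILDING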
From: Quan, X. <x....@im...> - 2012-10-23 10:39:18
Hi

I am running my job to assemble Illumina and 454 reads. After running runCA 7.0 my job was killed in the ovlstore stage due to out of memory:

"/apps/wgs/7.0/wgs-7.0/Linux-amd64/bin/overlapStore -c /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.BUILDING -g /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.gkpStore -i 0 -M 655360 -L /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.list > /ax3-savolainenlab/analysis/Plants/Genomics/Celera/fullrun1/Howea.ovlStore.err 2>&1
=>> PBS: job killed: mem 1014599844kb exceeded limit 1006632960kb"

Is it possible to set a memory limit which ovlstore will not exceed? "ovlStoreMemory" is currently set as 655360 and seems not to work. And am I able to resume the job from the ovlstore stage rather than run from the beginning? It really took a very long time to finish the overlapper stage.

Thanks very much!

Dr. Xueping Quan
Imperial College London
From: Audrey N. <aud...@ho...> - 2012-10-22 18:28:59
Hi wgs-users,

I’m trying to generate a hybrid assembly with Illumina and 454 reads for a 520Mb estimated genome. Illumina are 108bp paired-end sequences and 454 are shotgun and 6Kb paired-end sequences. We already used runCA 7.0 on a subset of these same sequences (including the whole Illumina dataset) and we obtained a complete assembly with a N50 consistent with the partial dataset used.

The current “big” assembly is working with 454 shotgun short and long reads, 5 000 000 000 bases; 454 PE, 500 000 000 bases; and Illumina PE, 10 000 000 000 bases, giving a total of 16 603 061 677 bases for 81 460 599 reads.

I understood that ovlHashBits, ovlHashBlockLength and ovlRefBlockSize were critical settings. As set in the specfile below, this runCA started on 4th October on a Silicon Graphics UV 100 with 1TB RAM and 64 CPUs, and is still running (on 22nd October). A total of 14 overlap jobs were created; 8 jobs were done in almost 2 days, 2 others were then completed on the 14th and 17th October, and the 4 remaining are still running. Do you think this is an acceptable run time?! This is just the 0-overlaptrim process! And I know there is still much to do…

What parameters would you suggest with such inputs? Are there any other important settings to consider to optimize the process? I already tried to change some parameter settings, but today I hesitate to abort this run without knowing exactly what to do to improve the run time.

You will find my specfile at the end with the output I got to date.

Thanks,
Audrey Nisole.

#____________________________________________________________
# 20121004
# Spec file
# Sequences from 454 Titanium technology and Illumina
#_____________________________________________________________

# ERROR rates
utgErrorRate = 0.03
utgErrorLimit = 2.5
ovlErrorRate = 0.06
cnsErrorRate = 0.06
cgwErrorRate = 0.10

# Minimum fragment length and minimum overlap length
frgMinLen = 64
ovlMinLen = 40

# OVERLAPPER
overlapper = ovl
obtOverlapper = ovl
ovlOverlapper = ovl
ovlStoreMemory = 100000
saveOverlaps = 1
merSize = 22

# OVL overlapper
ovlThreads = 6
ovlConcurrency = 9
ovlHashBits = 28
ovlHashBlockLength = 1200000000
ovlRefBlockSize = 600000000

# MERYL calculates K-mer seeds
merylMemory = 200000
merylThreads = 24

# ERROR CORRECTION applied to overlaps
frgCorrBatchSize = 200000
frgCorrThreads = 6
frgCorrConcurrency = 9

# UNITIGGER
unitigger = bog
#utgGenomeSize = 520

# SCAFFOLDER
computeInsertSize = 0

# CONSENSUS
cnsConcurrency = 2

# Terminator
closureOverlaps = 0
closurePlacement = 2
createACE = 0

#----------------------------------------------------
# FRG files
#----------------------------------------------------
# 454 shotgun, 22 sff files
/project/…/frg/454shotgun.frg
# Paired ends, 4 sff files + 2 fastq files
/project/…/frg/454Pairend.frg
/project/…/sequences/frg/S1.frg

I got this output:

runCA -d Budworm_wgs_20121004 -p budworm_20121004 -s specfile_Budworm_20121004

----------------------------------------START Thu Oct 4 11:13:39 2012
/prg/wgs/7.0/Linux-amd64/bin/gatekeeper -o /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore.BUILDING -T -F /project/…/frg/Budworm_shotgun.frg /project/…/frg/Budworm_Pairend.frg /project/…/frg/S1.frg > /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore.err 2>&1
----------------------------------------END Thu Oct 4 11:35:38 2012 (1319 seconds)
numFrags = 81460599
----------------------------------------START Thu Oct 4 11:35:40 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -B -C -v -m 22 -memory 200000 -threads 24 -c 0 -L 2 -s /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore:chain -o /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/meryl.err 2>&1
----------------------------------------END Thu Oct 4 13:10:59 2012 (5719 seconds)
----------------------------------------START Thu Oct 4 13:10:59 2012
/prg/wgs/7.0/Linux-amd64/bin/estimate-mer-threshold -g /…/wgs-assembly/Budworm_wgs_20121004/budworm_20121004.gkpStore:chain -m /…/wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0.estMerThresh.out 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0.estMerThresh.err
----------------------------------------END Thu Oct 4 13:10:59 2012 (0 seconds)
----------------------------------------START Thu Oct 4 13:10:59 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -Dt -n 91 -s /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.ovl.fasta 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.ovl.fasta.err
----------------------------------------END Thu Oct 4 13:12:04 2012 (65 seconds)
----------------------------------------START Thu Oct 4 13:12:04 2012
/prg/wgs/7.0/Linux-amd64/bin/meryl -Dt -n 91 -s /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004-C-ms22-cm0 > /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.obt.fasta 2> /.../wgs-assembly/Budworm_wgs_20121004/0-mercounts/budworm_20121004.nmers.obt.fasta.err
----------------------------------------END Thu Oct 4 13:13:10 2012 (66 seconds)
Reset OBT mer threshold from auto to 91.
Reset OVL mer threshold from auto to 91.
----------------------------------------START CONCURRENT Thu Oct 4 13:13:10 2012
/.../wgs-assembly/Budworm_wgs_20121004/0-mertrim/mertrim.sh 1 > /.../wgs-assembly/Budworm_wgs_20121004/0-mertrim/budworm_20121004.0001.err 2>&1
(…)
HASH 13788239- 16725048 REFR 1- 81460599 STRINGS  2936810 BASES 1200000016
HASH 16725049- 27637181 REFR 1- 81460599 STRINGS 10912133 BASES 1200000017
HASH 27637182- 38807450 REFR 1- 81460599 STRINGS 11170269 BASES 1200000061
HASH 38807451- 50000808 REFR 1- 81460599 STRINGS 11193358 BASES 1200000028
HASH 50000809- 61105478 REFR 1- 81460599 STRINGS 11104670 BASES 1200000077
HASH 61105479- 72196104 REFR 1- 81460599 STRINGS 11090626 BASES 1200000008
HASH 72196105- 81460599 REFR 1- 81460599 STRINGS  9264495 BASES 1003058249
----------------------------------------END Fri Oct 5 17:59:40 2012 (49 seconds)
Created 14 overlap jobs. Last batch '001', last job '000014'.
----------------------------------------START CONCURRENT Fri Oct 5 17:59:40 2012
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 1 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000001.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 2 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000002.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 3 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000003.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 4 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000004.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 5 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000005.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 6 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000006.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 7 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000007.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 8 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000008.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 9 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000009.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 10 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000010.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 11 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000011.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 12 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000012.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 13 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000013.out 2>&1
/.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/overlap.sh 14 > /.../wgs-assembly/Budworm_wgs_20121004/0-overlaptrim-overlap/000014.out 2>&1
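No reply to this question survives in the archive. One observation, offered as an assumption rather than list advice: with only 14 jobs for 64 CPUs, a handful of slow jobs dominate the runtime, and smaller hash/ref blocks create more, shorter jobs with better balance:

  # Halving the block sizes roughly doubles the number of overlap jobs
  # (values illustrative, not tuned for this dataset):
  ovlHashBlockLength = 600000000
  ovlRefBlockSize    = 300000000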
From: Walenz, B. <bw...@jc...> - 2012-10-20 02:31:36
Hi, Paul-

The default minimum allowed read length is 64 bases, and so gatekeeper threw out all your reads. Our error reporting on fastq inputs isn't terribly great, I'm afraid.

You can change the minimum with frgMinLen=40 on the command line. The minimum overlap length is 40, and unless you've got great coverage, you might want to drop it as well (ovlMinLen).

The CA7 release is rather old, and we're suggesting people start using the 'unstable' version from CVS.

b

--
Brian Walenz
Senior Software Engineer
J. Craig Venter Institute

________________________________________
From: Paul Cantalupo [pca...@gm...]
Sent: Friday, October 19, 2012 4:01 PM
To: wgs-assembler-users
Subject: [wgs-assembler-users] gatekeeper failed to add fragments

Hi,

I'm trying for the first time to assemble Illumina fastq reads. After running runCA 7.0 with:

runCA -d cabogout -p SRR073769.uniq.bowtie.unmap SRR073769.uniq.bowtie.unmap.frg

I got this output:

----------------------------------------START Fri Oct 19 15:54:42 2012
/Users/pgc92/Public/usr/local/wgs-7.0/Darwin-amd64/bin/gatekeeper -o /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.BUILDING -T -F /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.frg > /Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.err 2>&1
----------------------------------------END Fri Oct 19 15:54:46 2012 (4 seconds)
numFrags = 0
================================================================================
runCA failed.
----------------------------------------
Stack trace:
 at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 1237
 main::caFailure('gatekeeper failed to add fragments', '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabog...') called at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 1698
 main::preoverlap('/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR07...') called at /Users/pgc92/Public/usr/local/wgs/Darwin-i386/bin/runCA line 5874
----------------------------------------
Last few lines of the relevant log file (/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/cabogout/SRR073769.uniq.bowtie.unmap.gkpStore.err):

Starting file '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.frg'.
Processing SINGLE-ENDED SANGER QV encoding reads from: '/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.fq'
GKP finished with no alerts or errors.
----------------------------------------
Failure message: gatekeeper failed to add fragments

What am I doing wrong? My fastq file contains ~1 million 45 bp reads with sanger quality values. Here is head output of the fastq file:

(57989) $ head SRR073769.uniq.bowtie.unmap.fq
@SRR073769.109 PATHBIO-SOLEXA2:2:1:3:1029 length=45
CTGCCCAGGCATAGTTCACCATCTTTCGGGTCCTAACACGTGCGC
+SRR073769.109 PATHBIO-SOLEXA2:2:1:3:1029 length=45
@@?@>@>@7@?9==@B@;@@@29>@6>3950:467>#########
@SRR073769.111 PATHBIO-SOLEXA2:2:1:3:1362 length=45
TGGTTAGTTTCTTCTCCTCCGCTGACTAATATGCTTAAATTCAGA
+SRR073769.111 PATHBIO-SOLEXA2:2:1:3:1362 length=45
CCCCCCC@CCCCCBCCBCCA@ABBCBBBCCBB8AB?6@ACB;?97
@SRR073769.113 PATHBIO-SOLEXA2:2:1:3:1458 length=45
GATCCACGGGGGCCGACCCGGTGACCCGGTTACCCGCCAGGTCCT

Here is the output of the FRG file:

(57990) $ cat *frg
{VER
ver:2
}
{LIB
act:A
acc:SRR073769.uniq.bowtie.unmap
ori:U
mea:0.000
std:0.000
src:
.
nft:16
fea:
forceBOGunitigger=1
isNotRandom=0
doNotTrustHomopolymerRuns=0
doTrim_initialNone=0
doTrim_initialMerBased=1
doTrim_initialFlowBased=0
doTrim_initialQualityBased=0
doRemoveDuplicateReads=1
doTrim_finalLargestCovered=1
doTrim_finalEvidenceBased=0
doRemoveSpurReads=1
doRemoveChimericReads=1
doConsensusCorrection=0
fastqQualityValues=sanger
fastqOrientation=innie
fastqReads=/Users/pgc92/vdiscovery/analysis/Prensner2011/GSM618509/SRR073769.uniq.bowtie.unmap.fq
.
}
{VER
ver:1
}

Thank you,

Paul

Paul Cantalupo
University of Pittsburgh
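A sketch of the rerun Brian suggests; frgMinLen=40 is his value, while ovlMinLen=30 is an illustrative assumption (he only says you might want to drop it):

  # Let the 45 bp reads load (default frgMinLen is 64) and lower the minimum
  # overlap so the short reads can still be overlapped.
  runCA -d cabogout -p SRR073769.uniq.bowtie.unmap \
        frgMinLen=40 ovlMinLen=30 \
        SRR073769.uniq.bowtie.unmap.frg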