wgs-assembler-users Mailing List for Whole-Genome Shotgun Assembler (Page 6)

Brought to you by: brianwalenz, jasonmiller9704, mcschatz, skoren

wgs-assembler-users — Discussion about Celera Assembler


[wgs-assembler-users] overlap job

From: Manjari D. <man...@gm...> - 2014-12-08 06:01:29

Hi

I am running Celera Assembler on 700000 reads, each longer than 200 bp, for a
genome of approximately 1.3 Gb. Celera has generated 438064 overlap jobs.

Is there any way to speed up the assembly? I would like results within 3 or
4 days.

Re: [wgs-assembler-users] error in celera

From: Ole K. T. <o.k...@ib...> - 2014-11-28 13:02:16

Hi,
What settings did you use for fastqToCA? (Alternatively, sffToCA is the tool most commonly used for 454 reads.)

fastqToCA uses Illumina as its default technology (the -technology option), which only accepts reads shorter than 160 bp. Use ‘-technology 454’ or ‘-technology illumina-long’ if you have longer reads.

Ole
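
A quick way to see whether the 160 bp limit mentioned above is the issue is to count how many reads in the FASTQ file reach it. The following is a minimal sketch, assuming a plain four-line-per-record FASTQ file; the function names are hypothetical and the exact cutoff is taken from the reply above rather than from the gatekeeper source.

```python
def read_lengths(fastq_lines):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


def summarize(lengths, limit=160):
    """Return (total reads, reads of at least `limit` bases, shortest, longest)."""
    lengths = list(lengths)
    at_or_over = sum(1 for n in lengths if n >= limit)
    return len(lengths), at_or_over, min(lengths), max(lengths)
```

Calling `summarize(read_lengths(open("all454.fastq")))` (file name assumed) gives the counts; if most reads are at or over the limit, the default technology setting is the likely cause of the truncation warnings.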

On 28 Nov 2014, at 13:29, Manjari Deshmukh <man...@gm...> wrote:

> Hi 
> I am trying to run celera on 454 FLX
> it is giving error as 
> 
> GKP finished with 20417312 alerts or errors:
> 20417312        # ILL Error: seq longer than longer than gkpShortReadLength bases, truncating.
> 
> ERROR: library IID 1 'Celera_10x' has 100.00% errors or warnings.
> 
> what does it mean?
> 
> Manjari
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users

[wgs-assembler-users] error in celera

From: Manjari D. <man...@gm...> - 2014-11-28 12:29:53

Hi
I am trying to run Celera on 454 FLX data, and it reports this error:

GKP finished with 20417312 alerts or errors:
20417312        # ILL Error: seq longer than longer than gkpShortReadLength bases, truncating.

ERROR: library IID 1 'Celera_10x' has 100.00% errors or warnings.

What does it mean?

Manjari

[wgs-assembler-users] Celera failed

From: Manjari D. <man...@gm...> - 2014-11-24 09:30:39

Attachments: all454.fastq.xls

Hi,

We tried to run Celera on a single-end 454 FLX dataset for a 1.4 Gb genome
assembly, using a machine with 250 GB RAM, a 32-core processor, and an 8 TB
hard disk, with mostly default parameters. It ran for 10 days and then ran out
of memory, so we could not get any useful results from it.

I have attached a spreadsheet with the sequence length distribution for this
data set.

Thanks and regards,

Manjari

[wgs-assembler-users] minimum 454FLX read length for celera

From: Manjari D. <man...@gm...> - 2014-11-24 08:06:36

Hi,

I would like to know the minimum read length of single-end 454 FLX data
that can be used with Celera Assembler.
Our read lengths range from 40 to 1700 bp.

Thanks and regards

manjari

Re: [wgs-assembler-users] Viewing .asm files generated by Celera

From: Akshaya R. <ar...@bu...> - 2014-10-20 17:46:53

Ah beautiful! I did eventually get amos installed, but these are much
better options.
Thank you so much,
Akshaya

On Sat, Oct 11, 2014 at 3:53 PM, Brian Walenz <th...@gm...> wrote:

> Hi-
>
> The .asm is unwieldy and terrible to parse.  Use the 'posmap' files;
> specifically, the frgctg and frgscf files list the position of each fragment
> in a contig/scaffold.
>
> http://wgs-assembler.sourceforge.net/wiki/index.php/POSMAP
>
> Another option:
>
> https://sourceforge.net/p/wgs-assembler/mailman/message/31494576/
>
> b
>
>
>
> On Mon, Oct 6, 2014 at 11:13 AM, Akshaya Ramesh <ar...@bu...> wrote:
>
>> Dear All,
>>
>> I would like to get information on coverage and the reads that were used
>> to form a contig/scaffold. Is it right to assume that the .asm file
>> contains this information? And if it does, I was wondering what package you
>> use to view these files?
>>
>> I have been working on installing amos3.1, which has a utility called
>> hawkeye that can be used to view .asm files. However, I am unable to
>> configure amos so that it recognizes CA files. I have sent an e-mail to
>> amos-help and posted on SEQanswers
>> (http://seqanswers.com/forums/showthread.php?t=47221), with no luck.
>>
>> Do any of you have any suggestions or have run into similar problems?
>>
>> I really appreciate your help.
>> Best,
>> Akshaya
>> --
>> Akshaya Ramesh
>> PhD candidate
>> Kepler Lab
>> Laboratory of Computational Immunology
>> Boston University School of Medicine
>> 72 E Concord Street, Room 504D
>> Boston, MA 02118
>>
>>
>>
>>
>


-- 
Akshaya Ramesh
PhD candidate
Kepler Lab
Laboratory of Computational Immunology
Boston University School of Medicine
72 E Concord Street, Room 504D
Boston, MA 02118

Re: [wgs-assembler-users] Viewing .asm files generated by Celera

From: Brian W. <th...@gm...> - 2014-10-11 19:53:26

Hi-

The .asm is unwieldy and terrible to parse.  Use the 'posmap' files;
specifically, the frgctg and frgscf files list the position of each fragment
in a contig/scaffold.

http://wgs-assembler.sourceforge.net/wiki/index.php/POSMAP

Another option:

https://sourceforge.net/p/wgs-assembler/mailman/message/31494576/

b
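
As a rough illustration of the posmap suggestion above, the sketch below groups reads by contig and estimates mean coverage from frgctg lines. It assumes whitespace-separated columns in the order read ID, contig ID, begin, end, orientation (the layout of the excerpt posted elsewhere in this list), with begin smaller than end; the function names are hypothetical.

```python
from collections import defaultdict


def reads_per_contig(posmap_lines):
    """Group (read_id, begin, end, orientation) tuples by contig ID."""
    contigs = defaultdict(list)
    for line in posmap_lines:
        fields = line.split()
        if len(fields) < 5:  # skip blank or malformed lines
            continue
        read_id, contig_id, begin, end, orient = fields[:5]
        contigs[contig_id].append((read_id, int(begin), int(end), orient))
    return contigs


def mean_coverage(placements):
    """Approximate mean depth: placed bases divided by the span the reads cover."""
    span_begin = min(begin for _, begin, _, _ in placements)
    span_end = max(end for _, _, end, _ in placements)
    placed_bases = sum(end - begin for _, begin, end, _ in placements)
    return placed_bases / (span_end - span_begin)
```

For example, `reads_per_contig(open("asm.posmap.frgctg"))` returns the reads behind each contig, and `mean_coverage` on one contig's list gives an approximate depth. The same idea should carry over to the frgscf file with scaffold IDs in place of contig IDs.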



On Mon, Oct 6, 2014 at 11:13 AM, Akshaya Ramesh <ar...@bu...> wrote:

> Dear All,
>
> I would like to get information on coverage and the reads that were used
> to form a contig/scaffold. Is it right to assume that the .asm file
> contains this information? And if it does, I was wondering what package you
> use to view these files?
>
> I have been working on installing amos3.1, which has a utility called
> hawkeye that can be used to view .asm files. However, I am unable to
> configure amos so that it recognizes CA files. I have sent an e-mail to
> amos-help and posted on SEQanswers
> (http://seqanswers.com/forums/showthread.php?t=47221), with no luck.
>
> Do any of you have any suggestions or have run into similar problems?
>
> I really appreciate your help.
> Best,
> Akshaya
> --
> Akshaya Ramesh
> PhD candidate
> Kepler Lab
> Laboratory of Computational Immunology
> Boston University School of Medicine
> 72 E Concord Street, Room 504D
> Boston, MA 02118
>
>
>
>

[wgs-assembler-users] Viewing .asm files generated by Celera

From: Akshaya R. <ar...@bu...> - 2014-10-06 15:14:14

Dear All,

I would like to get information on coverage and the reads that were used to
form a contig/scaffold. Is it right to assume that the .asm file contains
this information? And if it does, I was wondering what package you use to
view these files?

I have been working on installing amos3.1, which has a utility called
hawkeye that can be used to view .asm files. However, I am unable to
configure amos so that it recognizes CA files. I have sent an e-mail to
amos-help and posted on SEQanswers
(http://seqanswers.com/forums/showthread.php?t=47221), with no luck.

Do any of you have any suggestions or have run into similar problems?

I really appreciate your help.
Best,
Akshaya
-- 
Akshaya Ramesh
PhD candidate
Kepler Lab
Laboratory of Computational Immunology
Boston University School of Medicine
72 E Concord Street, Room 504D
Boston, MA 02118

Re: [wgs-assembler-users] Trouble interpreting POSMAP info

From: Brian W. <th...@gm...> - 2014-09-26 20:28:53

Hi, Ivan-

I finally had a chance to look at this.  I see no problems.  I computed the
distance between each read's posmap placement and the placement found by
mapping with 'bwa mem'.  For most reads, the two placements are within 100 bp
of each other.

I suspect you used the fastqUIDmap from the 'trimming' run, and not from
the 'assembly' run.  The two read sets are different; trimming deletes many
reads.  In particular, the second read id (#24884) you list in the posmap
file doesn't exist in my assembly.

b
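
The comparison described above can be sketched roughly as follows. This is not the script used for the check, only a minimal illustration: it assumes two dictionaries keyed by read name, one with posmap begin coordinates (taken to be 0-based) and one with SAM POS values (1-based leftmost position), and all function names are hypothetical.

```python
def placement_distances(posmap_begin, sam_pos):
    """Return {read: |posmap begin - mapped start|} for reads present in both inputs."""
    shared = posmap_begin.keys() & sam_pos.keys()
    # SAM POS is 1-based, so shift it by one before comparing (posmap assumed 0-based).
    return {name: abs(posmap_begin[name] - (sam_pos[name] - 1)) for name in shared}


def fraction_within(distances, tolerance=100):
    """Fraction of compared reads whose two placements differ by at most `tolerance` bp."""
    if not distances:
        return 0.0
    close = sum(1 for d in distances.values() if d <= tolerance)
    return close / len(distances)


def missing_reads(posmap_begin, sam_pos):
    """Reads listed in the posmap but absent from the mapping results."""
    return sorted(posmap_begin.keys() - sam_pos.keys())
```

A low `fraction_within` together with a long `missing_reads` list would be consistent with read names taken from the wrong fastqUIDmap, as suspected above.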


On Tue, Sep 16, 2014 at 9:44 AM, Brian Walenz <th...@gm...> wrote:

> The posmap positions are derived from the unitig/contig multialignments,
> and I doubt they're incorrect.  Too much other stuff would be broken too.
>
> There are some big repeats in this genome, if I remember, one at the start
> of the contig.  Since most reads are in the same contig, can you compute
> the distance between posmap-position and blasr-position?  I don't have
> (yet) this assembly to analyze.
>
> On Sun, Sep 14, 2014 at 2:58 AM, Ivan Sovic <iva...@gm...> wrote:
>
>> Hi Brian!
>>
>> Thank you for your reply, and I apologize for my slow response.
>> It's nice to hear that I'm not the only one with this problem :)
>>
>> I would be happy to share an example.
>> Here is the first 5 lines of the posmap.frg.ctg file, where I have
>> replaced the IDs of reads with their actual names (the relation was taken
>> from asm.gkpStore.fastqUIDmap):
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256
>> ctg7180000000002    0    8435    f
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942
>> ctg7180000000002    1495    8147    f
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244
>> ctg7180000000002    1617    12822    f
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730
>> ctg7180000000002    1699    8847    r
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593
>> ctg7180000000002    1760    8558    r
>>
>> The last two numbers of each read's name roughly gives its length (I
>> think they are subreads, so read 2 should be 9136 bases long).
>> Here is where BLASR placed  them (I copy only the first few fields of the
>> SAM entries, up to the CIGAR string):
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256/0_4256
>> 0    ctg7180000000002    4174066    254
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942/0_9136
>> 16    ctg7180000000002    1510215    254
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244/0_1244
>> 16    ctg7180000000002    881151    254
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730/0_3082
>> 16    ctg7180000000002    4413614    254
>> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593/0_5399
>> 0    ctg7180000000002    1891829    254
>>
>> Contig placement is good, but it's kind of hard to miss that - there are
>> only two contigs, one is the size of the genome (E. Coli,
>> ctg7180000000002), and the other is the size of two reads (7180000000003).
>> I checked manually, none of the listed reads were clipped (according to
>> their CIGAR strings).
>>
>> The assembly is described here, together with the E. Coli datasets
>> (PacBio reads extracted into FASTQ files) and with instructions on how to
>> run the assembly:
>>
>> http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1
>> It runs for about half an hour, and produces a complete assembly of E.
>> Coli.
>>
>> Do you have any ideas what's going on with these files?
>>
>>
>> Thank you and best regards!
>> Ivan
>>
>>
>>
>>
>> On Thu, Sep 11, 2014 at 8:17 PM, Brian Walenz <th...@gm...> wrote:
>>
>>> When evaluating the read trimming used in the uncorrected assemblies, we
>>> had _great_ trouble comparing results from mappings (blasr, nucmer, blast,
>>> whatever) against what CA was doing.  BLASR was probably the worst offender
>>> here, usually failing to map portions of the read that we thought were
>>> good.  I think you're seeing the same effect.
>>>
>>> Are the placements to different contigs, or are they mostly overlapping
>>> but with different end points?  Can you share a small example?  I'll try
>>> the same experiment here.
>>>
>>> Mapping trimmed reads might get closer to what posmap claims, but aside
>>> from a sanity check, there might be little value in it.  Kind of like
>>> validating with only "good" mate pairs, you won't see any mistakes.
>>>
>>> b
>>>
>>>
>>> On Thu, Sep 11, 2014 at 2:08 AM, Ivan Sovic <iva...@gm...>
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> I have trouble with interpreting the POSMAP data of an assembly.
>>>> In short - when I compare the positions of reads that are given in the
>>>> asm.posmap.frgctg file with the positions I obtain after aligning the reads
>>>> to the assembly in asm.ctg.fasta, I can see no relation between the two.
>>>> For alignment, I used both BLASR and BWA-MEM.
>>>>
>>>> Description of what I am doing in more details:
>>>> Following this tutorial (
>>>> http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1)
>>>> I assembled the E. Coli genome from a set of PacBio reads, and the results
>>>> were exactly as described.
>>>> After that, I parsed the asm.posmap.frgctg file to obtain the list of
>>>> reads that were actually used in the assembly.
>>>> I extracted their original headers from the asm.gkpStore.fastqUIDmap
>>>> file, and filtered the initial set of reads, so the resulting set contains
>>>> only those reads listed in the asm.posmap.frgctg file.
>>>> After that, I used both BLASR with default parameters, and BWA-MEM with
>>>> PacBio parameters to align those reads on the contig file asm.ctg.fasta.
>>>> I then compared the positions of obtained alignments to the positions
>>>> that are reported in asm.posmap.frgctg, and I see no correspondence.
>>>>
>>>> Can anyone provide any insight into this?
>>>> Am I missing something?
>>>> Or maybe the POSMAP files weren't updated with the rest of Celera?
>>>>
>>>>
>>>> Thank you for your help!
>>>>
>>>>
>>>> Best regards,
>>>> Ivan Sovic.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

From: Walenz, B. <wa...@nb...> - 2014-09-23 14:41:18

You might still be compiling with the older compiler.  Try adding the new compiler to the start of your path, or set environment variables CC and CXX to point to the new version.

Your glibc package might be ancient, try updating it.

This isn’t a problem specific to the assembler.  Searching for ‘GLIBCXX_3.4.10 not found’ gives lots of other suggestions.  For example: https://bbs.archlinux.org/viewtopic.php?pid=1065388

b
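
To see which libstdc++ actually provides the missing symbol version, one option is to scan the library file for its GLIBCXX version tags. The sketch below does this on raw bytes, in the spirit of searching the file for printable strings; it is only an illustration, and the function names are hypothetical.

```python
import re

# Version tags are embedded in the library as plain text such as "GLIBCXX_3.4.10".
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(library_bytes):
    """Return the sorted GLIBCXX versions found in the raw bytes, as integer tuples."""
    found = {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in GLIBCXX_TAG.finditer(library_bytes)
    }
    return sorted(found)


def provides(library_bytes, wanted="3.4.10"):
    """True if the library bytes contain the requested GLIBCXX version tag."""
    target = tuple(int(part) for part in wanted.split("."))
    return target in glibcxx_versions(library_bytes)
```

For instance, `provides(open("/usr/lib64/libstdc++.so.6", "rb").read())` checks the system library named in the error message; running the same check on the libstdc++ installed with the newer gcc (its location is an assumption here) shows whether that copy is the one the binary needs to find at run time.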


From: wuk...@16... [mailto:wuk...@16...]
Sent: Tuesday, September 23, 2014 9:07 AM
To: Brian Walenz
Cc: wgs-assembler-users
Subject: Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

Hi,
I compiled it from source like this:

bzip2 -dc wgs-8.1.tar.bz2 | tar -xf -
cd wgs-8.1
cd kmer && make install && cd ..
cd samtools && make && cd ..
cd src && make && cd ..
cd ..

The old gcc version is 4.1.2, and I installed a newer version, gcc 4.5.1, under my own account.



________________________________
Best,

Kai Wu

From: Brian Walenz<mailto:th...@gm...>
Date: 2014-09-23 20:58
To: wuk...@16...<mailto:wuk...@16...>
CC: wgs-assembler-users<mailto:wgs...@li...>
Subject: Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)
Hi-

Did you compile this yourself, or is it the pre-compiled version from sourceforge?  Try compiling yourself.  How old is 'too old'?

b

On Sun, Sep 21, 2014 at 2:14 AM, wuk...@16...<mailto:wuk...@16...> <wuk...@16...<mailto:wuk...@16...>> wrote:
Dear colleagues,

When I run the command "runCA -p ipagpj029hmc001 -d ipagpj029hmc001_raw useGrid=1 scriptOnGrid=1 doOBT=1 unitigger=bogart /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_1.1.frg /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_2.1.frg"
After a while, the error massege of ipagpj029hmc001.gkpStore.err is:
/share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/
my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

I don't know why it? I know my gcc version is too old. So, I install a new version of gcc on my own account. And I set the environment variable:
export LD_LIBRARY_PATH=/home/kwu/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/kwu/lib64:$LD_LIBRARY_PATH

But, it seem can't find it.

________________________________
Best,

Kai Wu


Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

From: <wuk...@16...> - 2014-09-23 13:07:15

Hi,
I compiled it from source like this:

bzip2 -dc wgs-8.1.tar.bz2 | tar -xf -
cd wgs-8.1
cd kmer && make install && cd ..
cd samtools && make && cd ..
cd src && make && cd ..
cd ..

The old gcc version is 4.1.2, and I installed a newer version, gcc 4.5.1, under my own account.




Best,
 
Kai Wu
 
From: Brian Walenz
Date: 2014-09-23 20:58
To: wuk...@16...
CC: wgs-assembler-users
Subject: Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)
Hi-

Did you compile this yourself, or is it the pre-compiled version from sourceforge?  Try compiling yourself.  How old is 'too old'?

b


On Sun, Sep 21, 2014 at 2:14 AM, wuk...@16... <wuk...@16...> wrote:
Dear colleagues,

When I run the command "runCA -p ipagpj029hmc001 -d ipagpj029hmc001_raw useGrid=1 scriptOnGrid=1 doOBT=1 unitigger=bogart /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_1.1.frg /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_2.1.frg" 
After a while, the error massege of ipagpj029hmc001.gkpStore.err is: 
/share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/ 
my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

I don't know why it? I know my gcc version is too old. So, I install a new version of gcc on my own account. And I set the environment variable: 
export LD_LIBRARY_PATH=/home/kwu/lib:$LD_LIBRARY_PATH 
export LD_LIBRARY_PATH=/home/kwu/lib64:$LD_LIBRARY_PATH 

But, it seem can't find it.



Best,
 
Kai Wu


Re: [wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

From: Brian W. <th...@gm...> - 2014-09-23 12:58:53

Hi-

Did you compile this yourself, or is it the pre-compiled version from
sourceforge?  Try compiling yourself.  How old is 'too old'?

b


On Sun, Sep 21, 2014 at 2:14 AM, wuk...@16... <wuk...@16...>
wrote:

> Dear colleagues,
>
> When I run the command "runCA -p ipagpj029hmc001 -d ipagpj029hmc001_raw
> useGrid=1 scriptOnGrid=1 doOBT=1 unitigger=bogart
> /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_1.1.frg
> /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_2.1.frg
> "
> After a while, the error massege of ipagpj029hmc001.gkpStore.err is:
> /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper:
> /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by
> /share/work/lhuang/
> my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)
>
> I don't know why it? I know my gcc version is too old. So, I install a new
> version of gcc on my own account. And I set the environment variable:
> export LD_LIBRARY_PATH=/home/kwu/lib:$LD_LIBRARY_PATH
> export LD_LIBRARY_PATH=/home/kwu/lib64:$LD_LIBRARY_PATH
>
> But, it seem can't find it.
>
> ------------------------------
>  Best,
>
> Kai Wu
>
>
>
>

[wgs-assembler-users] /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

From: <wuk...@16...> - 2014-09-21 06:14:37

Dear colleagues,

When I run the command "runCA -p ipagpj029hmc001 -d ipagpj029hmc001_raw useGrid=1 scriptOnGrid=1 doOBT=1 unitigger=bogart /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_1.1.frg /home/kwu/workdir/my_projects/ipag_pj029/data/CA_data/ipagpj029hmc001_2.1.frg",
after a while the error message in ipagpj029hmc001.gkpStore.err is:
/share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.10' not found (required by /share/work/lhuang/my_apps/wgs-8.1/Linux-amd64/bin/gatekeeper)

I don't understand why. I know my gcc version is too old, so I installed a newer version of gcc under my own account and set the environment variable:
export LD_LIBRARY_PATH=/home/kwu/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/kwu/lib64:$LD_LIBRARY_PATH

But the library still does not seem to be found.



Best,
 
Kai Wu

Re: [wgs-assembler-users] Trouble interpreting POSMAP info

From: Brian W. <th...@gm...> - 2014-09-16 13:44:41

The posmap positions are derived from the unitig/contig multialignments, and
I doubt they're incorrect.  Too much other stuff would be broken too.

There are some big repeats in this genome, if I remember, one at the start
of the contig.  Since most reads are in the same contig, can you compute
the distance between posmap-position and blasr-position?  I don't have
(yet) this assembly to analyze.

On Sun, Sep 14, 2014 at 2:58 AM, Ivan Sovic <iva...@gm...> wrote:

> Hi Brian!
>
> Thank you for your reply, and I apologize for my slow response.
> It's nice to hear that I'm not the only one with this problem :)
>
> I would be happy to share an example.
> Here is the first 5 lines of the posmap.frg.ctg file, where I have
> replaced the IDs of reads with their actual names (the relation was taken
> from asm.gkpStore.fastqUIDmap):
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256
> ctg7180000000002    0    8435    f
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942
> ctg7180000000002    1495    8147    f
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244
> ctg7180000000002    1617    12822    f
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730
> ctg7180000000002    1699    8847    r
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593
> ctg7180000000002    1760    8558    r
>
> The last two numbers of each read's name roughly gives its length (I think
> they are subreads, so read 2 should be 9136 bases long).
> Here is where BLASR placed  them (I copy only the first few fields of the
> SAM entries, up to the CIGAR string):
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256/0_4256
> 0    ctg7180000000002    4174066    254
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942/0_9136
> 16    ctg7180000000002    1510215    254
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244/0_1244
> 16    ctg7180000000002    881151    254
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730/0_3082
> 16    ctg7180000000002    4413614    254
> m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593/0_5399
> 0    ctg7180000000002    1891829    254
>
> Contig placement is good, but it's kind of hard to miss that - there are
> only two contigs, one is the size of the genome (E. Coli,
> ctg7180000000002), and the other is the size of two reads (7180000000003).
> I checked manually, none of the listed reads were clipped (according to
> their CIGAR strings).
>
> The assembly is described here, together with the E. Coli datasets (PacBio
> reads extracted into FASTQ files) and with instructions on how to run the
> assembly:
>
> http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1
> It runs for about half an hour, and produces a complete assembly of E.
> Coli.
>
> Do you have any ideas what's going on with these files?
>
>
> Thank you and best regards!
> Ivan
>
>
>
>
> On Thu, Sep 11, 2014 at 8:17 PM, Brian Walenz <th...@gm...> wrote:
>
>> When evaluating the read trimming used in the uncorrected assemblies, we
>> had _great_ trouble comparing results from mappings (blasr, nucmer, blast,
>> whatever) against what CA was doing.  BLASR was probably the worst offender
>> here, usually failing to map portions of the read that we thought were
>> good.  I think you're seeing the same effect.
>>
>> Are the placements to different contigs, or are they mostly overlapping
>> but with different end points?  Can you share a small example?  I'll try
>> the same experiment here.
>>
>> Mapping trimmed reads might get closer to what posmap claims, but aside
>> from a sanity check, there might be little value in it.  Kind of like
>> validating with only "good" mate pairs, you won't see any mistakes.
>>
>> b
>>
>>
>> On Thu, Sep 11, 2014 at 2:08 AM, Ivan Sovic <iva...@gm...> wrote:
>>
>>> Hi everyone!
>>>
>>> I have trouble with interpreting the POSMAP data of an assembly.
>>> In short - when I compare the positions of reads that are given in the
>>> asm.posmap.frgctg file with the positions I obtain after aligning the reads
>>> to the assembly in asm.ctg.fasta, I can see no relation between the two.
>>> For alignment, I used both BLASR and BWA-MEM.
>>>
>>> Description of what I am doing in more details:
>>> Following this tutorial (
>>> http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1)
>>> I assembled the E. Coli genome from a set of PacBio reads, and the results
>>> were exactly as described.
>>> After that, I parsed the asm.posmap.frgctg file to obtain the list of
>>> reads that were actually used in the assembly.
>>> I extracted their original headers from the asm.gkpStore.fastqUIDmap
>>> file, and filtered the initial set of reads, so the resulting set contains
>>> only those reads listed in the asm.posmap.frgctg file.
>>> After that, I used both BLASR with default parameters, and BWA-MEM with
>>> PacBio parameters to align those reads on the contig file asm.ctg.fasta.
>>> I then compared the positions of obtained alignments to the positions
>>> that are reported in asm.posmap.frgctg, and I see no correspondence.
>>>
>>> Can anyone provide any insight into this?
>>> Am I missing something?
>>> Or maybe the POSMAP files weren't updated with the rest of Celera?
>>>
>>>
>>> Thank you for your help!
>>>
>>>
>>> Best regards,
>>> Ivan Sovic.
>>>
>>>
>>>
>>>
>>>
>>
>

Re: [wgs-assembler-users] Trouble interpreting POSMAP info

From: Ivan S. <iva...@gm...> - 2014-09-14 06:58:43

Hi Brian!

Thank you for your reply, and I apologize for my slow response.
It's nice to hear that I'm not the only one with this problem :)

I would be happy to share an example.
Here are the first 5 lines of the asm.posmap.frgctg file, where I have replaced
the IDs of reads with their actual names (the relation was taken from
asm.gkpStore.fastqUIDmap):
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256    ctg7180000000002    0    8435    f
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942    ctg7180000000002    1495    8147    f
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244    ctg7180000000002    1617    12822    f
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730    ctg7180000000002    1699    8847    r
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593    ctg7180000000002    1760    8558    r

The last two numbers of each read's name roughly give its length (I think
they are subreads, so read 2 should be 11942 - 2806 = 9136 bases long).
Here is where BLASR placed them (I copy only the first few fields of the
SAM entries, up to the CIGAR string):
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/9515/0_4256/0_4256    0    ctg7180000000002    4174066    254
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24884/2806_11942/0_9136    16    ctg7180000000002    1510215    254
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/24752/0_1244/0_1244    16    ctg7180000000002    881151    254
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/14271/1648_4730/0_3082    16    ctg7180000000002    4413614    254
m130404_014004_sidney_c100506902550000001823076808221337_s1_p0/10369/11194_16593/0_5399    0    ctg7180000000002    1891829    254

Contig placement agrees, but that is hard to get wrong: there are only two
contigs. One is the size of the genome (E. coli, ctg7180000000002), and the
other is the size of two reads (7180000000003).
I checked manually, and none of the listed reads were clipped (according to
their CIGAR strings).
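
The comparison described above can be sketched in a few lines of Python.
This is only an illustration built from the five rows quoted in this
message. The helper names (subread_length, sam_strand, compare) are
hypothetical, the posmap columns are assumed to be read, contig, begin,
end, orientation with 0-based coordinates, and the SAM fields are taken
as FLAG and 1-based POS.

```python
# Sketch: compare posmap spans with subread lengths and SAM placements.
# Column meanings for the posmap rows are an assumption read off the
# listing in this message, not taken from the Celera documentation.

MOVIE = "m130404_014004_sidney_c100506902550000001823076808221337_s1_p0"

# (read name, posmap begin, posmap end, posmap orientation, SAM FLAG, SAM POS)
rows = [
    (MOVIE + "/9515/0_4256", 0, 8435, "f", 0, 4174066),
    (MOVIE + "/24884/2806_11942", 1495, 8147, "f", 16, 1510215),
    (MOVIE + "/24752/0_1244", 1617, 12822, "f", 16, 881151),
    (MOVIE + "/14271/1648_4730", 1699, 8847, "r", 16, 4413614),
    (MOVIE + "/10369/11194_16593", 1760, 8558, "r", 0, 1891829),
]


def subread_length(name):
    """Length implied by a subread name ending in /<start>_<end>."""
    start, end = name.rsplit("/", 1)[1].split("_")
    return int(end) - int(start)


def sam_strand(flag):
    """Return 'r' if SAM FLAG bit 0x10 (reverse strand) is set, else 'f'."""
    return "r" if flag & 0x10 else "f"


def compare(rows):
    """Build one summary dict per read for side-by-side inspection."""
    report = []
    for name, bgn, end, ori, flag, pos in rows:
        report.append({
            "read": name.split("/", 1)[1],
            "read_len": subread_length(name),
            "posmap_span": end - bgn,
            "same_strand": ori == sam_strand(flag),
            # SAM POS is 1-based; posmap begin is assumed 0-based.
            "start_offset": (pos - 1) - bgn,
        })
    return report


for entry in compare(rows):
    print(entry)
```

For these rows the posmap span differs from the name-derived length in
every case (8435 versus 4256 for the first read), so the mismatch can be
seen from the posmap and the read names alone, before any aligner is
involved.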

The assembly is described here, together with the E. coli datasets (PacBio
reads extracted into FASTQ files) and with instructions on how to run the
assembly:
http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1
It runs for about half an hour and produces a complete assembly of E. coli.

Do you have any ideas what's going on with these files?


Thank you and best regards!
Ivan




Re: [wgs-assembler-users] Trouble interpreting POSMAP info

From: Brian W. <th...@gm...> - 2014-09-11 12:17:54

When evaluating the read trimming used in the uncorrected assemblies, we
had _great_ trouble comparing results from mappings (blasr, nucmer, blast,
whatever) against what CA was doing.  BLASR was probably the worst offender
here, usually failing to map portions of the read that we thought were
good.  I think you're seeing the same effect.

Are the placements to different contigs, or are they mostly overlapping but
with different end points?  Can you share a small example?  I'll try the
same experiment here.

Mapping trimmed reads might get closer to what posmap claims, but aside
from a sanity check, there might be little value in it.  Kind of like
validating with only "good" mate pairs, you won't see any mistakes.

b



[wgs-assembler-users] Trouble interpreting POSMAP info

From: Ivan S. <iva...@gm...> - 2014-09-11 06:08:12

Hi everyone!

I am having trouble interpreting the POSMAP data of an assembly.
In short - when I compare the positions of reads that are given in the
asm.posmap.frgctg file with the positions I obtain after aligning the reads
to the assembly in asm.ctg.fasta, I can see no relation between the two.
For alignment, I used both BLASR and BWA-MEM.

Here is what I am doing in more detail:
Following this tutorial (
http://wgs-assembler.sourceforge.net/wiki/index.php/Escherichia_coli_K12_MG1655,_using_uncorrected_PacBio_reads,_with_CA8.1)
I assembled the E. coli genome from a set of PacBio reads, and the results
were exactly as described.
After that, I parsed the asm.posmap.frgctg file to obtain the list of reads
that were actually used in the assembly.
I extracted their original headers from the asm.gkpStore.fastqUIDmap file,
and filtered the initial set of reads, so the resulting set contains only
those reads listed in the asm.posmap.frgctg file.
After that, I used both BLASR with default parameters, and BWA-MEM with
PacBio parameters to align those reads to the contig file asm.ctg.fasta.
I then compared the positions of the obtained alignments to the positions
that are reported in asm.posmap.frgctg, and I see no correspondence.
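
The filtering step described above might look roughly like the sketch
below. It is not the script actually used. The function names are
hypothetical, the IDs and reads in the demo are made up, and the layout of
asm.gkpStore.fastqUIDmap is assumed to be "UID IID name" per read (three
columns, or six for a mated pair). The read ID is taken from the first
column of each posmap line.

```python
import io


def reads_in_posmap(posmap_lines):
    """Collect read IDs from the first column of posmap lines."""
    return {line.split()[0] for line in posmap_lines if line.strip()}


def uid_to_name(uidmap_lines):
    """Map UID -> original FASTQ name.

    ASSUMPTION: each line holds 'UID IID name' once per read,
    i.e. 3 columns for an unmated read or 6 for a mated pair.
    """
    mapping = {}
    for line in uidmap_lines:
        fields = line.split()
        for i in range(0, len(fields) - 2, 3):
            mapping[fields[i]] = fields[i + 2]
    return mapping


def filter_fastq(fastq, keep_names):
    """Yield 4-line FASTQ records whose name is in keep_names.

    Assumes unwrapped records (exactly 4 lines each).
    """
    while True:
        record = [fastq.readline() for _ in range(4)]
        if not record[0].strip():
            break
        if record[0][1:].split()[0] in keep_names:
            yield "".join(record)


# Made-up demo data, only to show how the pieces fit together.
posmap = [
    "100000000001 ctg7180000000002 0 8435 f",
    "100000000003 ctg7180000000002 1495 8147 f",
]
uidmap = [
    "100000000001 1 readA",
    "100000000002 2 readB",
    "100000000003 3 readC",
]
fastq = io.StringIO(
    "@readA\nACGT\n+\nIIII\n@readB\nGGCC\n+\nIIII\n@readC\nTTAA\n+\nIIII\n"
)

names = {uid_to_name(uidmap)[uid] for uid in reads_in_posmap(posmap)}
kept = list(filter_fastq(fastq, names))
print("".join(kept), end="")
```

Keying the filter on the mapped name rather than on file order means a
wrong ID-to-name mapping shows up as missing or unexpected reads in the
filtered set.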

Can anyone provide any insight into this?
Am I missing something?
Or maybe the POSMAP files weren't updated with the rest of Celera?


Thank you for your help!


Best regards,
Ivan Sovic.