cg-pipeline-users Mailing List for CG-Pipeline (Page 3)

A computational genomics pipeline for prokaryotic sequencing projects

Status: Beta

Brought to you by: jhumphrey6, lskatz

cg-pipeline-users — Help with CG-Pipeline may be available from users on this mailing list


2011	Jan	Feb (6)	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2012	Jan (1)	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec (8)
2013	Jan	Feb	Mar	Apr	May	Jun (12)	Jul (14)	Aug (9)	Sep (1)	Oct (2)	Nov	Dec
2014	Jan	Feb	Mar (2)	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
2015	Jan	Feb	Mar (1)	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec


Re: [Cg-pipeline-users] Output Q

From: Todd Y. <ty...@la...> - 2011-02-24 22:48:02

That's just what I needed, thanks for the quick reply :)


Re: [Cg-pipeline-users] Output Q

From: Jay H. <jhu...@gm...> - 2011-02-24 17:44:28

The code is in scripts/run_annotation_genbank.pl
http://cg-pipeline.svn.sourceforge.net/viewvc/cg-pipeline/cg_pipeline/trunk/scripts/run_annotation_genbank.pl?revision=137&view=markup
At line 183, try adding the following before "my $note=remove_tags($newftr)":

$newftr->remove_tag('uniprot_id');
$newftr->add_tag_value('uniprot_id',$gene);
my $note=remove_tags($newftr); #keep old line 183 here


But then your uniprot_id is gone! Why not just add gene name, instead of
replacing uniprot_id?

$newftr->add_tag_value('gene',$gene);
my $note=remove_tags($newftr); #keep old line 183 here
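
The difference between the two suggestions above can be sketched outside the pipeline. This is a plain-Python model, not BioPerl and not the pipeline's code: feature qualifiers are represented as a dict of lists, the helpers are only named after the BioPerl methods used in the snippets, and the accession "P12345" and gene name "abcD" are made-up placeholders. It assumes remove_tag drops every value of a tag and add_tag_value appends one value.

```python
from typing import Dict, List

# Illustrative model only: a feature's qualifiers as tag -> list of values.
Qualifiers = Dict[str, List[str]]


def remove_tag(qualifiers: Qualifiers, tag: str) -> None:
    """Drop a tag and all of its values (assumed remove_tag behaviour)."""
    qualifiers.pop(tag, None)


def add_tag_value(qualifiers: Qualifiers, tag: str, value: str) -> None:
    """Append one value to a tag, creating the tag if needed."""
    qualifiers.setdefault(tag, []).append(value)


def copy_qualifiers(qualifiers: Qualifiers) -> Qualifiers:
    """Copy the dict and its value lists so the input is left untouched."""
    return {tag: list(values) for tag, values in qualifiers.items()}


def replace_uniprot_id(qualifiers: Qualifiers, gene: str) -> Qualifiers:
    """First suggestion: the gene name overwrites the uniprot_id value."""
    result = copy_qualifiers(qualifiers)
    remove_tag(result, "uniprot_id")
    add_tag_value(result, "uniprot_id", gene)
    return result


def add_gene(qualifiers: Qualifiers, gene: str) -> Qualifiers:
    """Second suggestion: keep uniprot_id and add a separate gene tag."""
    result = copy_qualifiers(qualifiers)
    add_tag_value(result, "gene", gene)
    return result


if __name__ == "__main__":
    before = {"uniprot_id": ["P12345"], "product": ["example protein"]}
    print(replace_uniprot_id(before, "abcD"))
    print(add_gene(before, "abcD"))
```

In this model the first variant loses the original accession, while the second keeps it and records the gene name alongside it, which is the trade-off described in the message.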



[Cg-pipeline-users] Output Q

From: Todd Y. <ty...@la...> - 2011-02-24 17:16:59

Hello all,

There's a change I'd like to make to the output .gb file from the annotation stage: I'd like to replace the uniprot_id in the /note section with the name of the gene. Could you point me to where in the code this file is created?

Thanks,

Todd Yilk
Biosciences Division
Los Alamos National Laboratory

Re: [Cg-pipeline-users] pipeline question

From: Matthew S. <ms...@la...> - 2011-02-16 00:30:32

Jay,

Thanks for the feedback. I have a couple of follow-ups.

We are using 0.3.1:

'run_pipeline makedb -p project' does not appear to work; I would like to see
that output.

I was also wondering whether there is any way to have the GenBank file itself
contain more information than the SwissProt ID, which is cumbersome to
analyze by hand. Perhaps generating the FASTA would be better.

Thanks again,

____________________________
Matthew Scholz
GRA-Los Alamos National Laboratory
ms...@la...
(505) 665-8574


Re: [Cg-pipeline-users] pipeline question

From: Jay H. <jhu...@gm...> - 2011-02-15 23:17:26

Hi Matthew,
We appreciate your interest in the project. I'll try to answer some of 
these issues.

Organism name: 
http://wiki.jordan.biology.gatech.edu/index.php/CG-Pipeline#Configuration
It's not structured, just a simple string assignment to the variable 
'classification'.

Good point about the genes versus ORFs.  We've gone by the guidelines
for GenBank submissions, and that is all.  The pipeline writes all of
the data that was generated into this file.  Genes and their products
are named based on UniProt hits.  If an ORF gets no sufficiently
high-scoring hits, it gets no gene name and no product name.  This
definitely deserves documentation in the wiki, when we find the time!

You can also find a FASTA file in the [project]/database/ directory (as
of version 0.3.1; if you ran an older version, you should be able to
upgrade and run 'run_pipeline makedb -p YourProject' to generate the
FASTA and other files), which contains all of the ORFs.  An ORF with no
known gene or product will have a defline such as this, which may
help you filter them out:

>lcl|M15293_draft_0007|CDS|7202|6354 predicted cds
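
One way to do that filtering is sketched below. It is not part of CG-Pipeline; it is a minimal standard-library Python example that assumes every unannotated ORF carries the literal description "predicted cds" after the first whitespace in its defline, as in the example above. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (defline without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (defline, sequence) pairs from an iterable of FASTA lines."""
    defline = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


def is_unannotated(defline: str) -> bool:
    """True if the description after the ID is exactly 'predicted cds'.

    Assumption: this marker is how unannotated ORFs are labelled, based
    only on the single example defline in the message.
    """
    parts = defline.split(None, 1)
    return len(parts) == 2 and parts[1].strip().lower() == "predicted cds"


def split_by_annotation(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Return (annotated, unannotated) record lists."""
    annotated: List[Record] = []
    unannotated: List[Record] = []
    for record in read_fasta(lines):
        target = unannotated if is_unannotated(record[0]) else annotated
        target.append(record)
    return annotated, unannotated
```

Because split_by_annotation accepts any iterable of lines, an open file handle for the FASTA file in the project's database directory can be passed to it directly.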

As for the rest of your questions -- can you tell us which version you've
got?  We can only support the latest version, which you can get here:
http://sourceforge.net/projects/cg-pipeline/files/.  We'll help you
through the upgrade.  You might need a few new third-party applications,
such as RNAmmer.  There will be a new paper written for these updates,
but I can't say when.

There should be only one pipelinerc, in conf/.  Any other one is a 
mistake, sorry.

Best of luck,
Jay Humphrey


-- 
Jay Humphrey
92 Debden Road
SAFFRON WALDEN
CB11 4AL
UNITED KINGDOM

[Cg-pipeline-users] pipeline question

From: Matthew S. <ms...@la...> - 2011-02-15 17:58:18

Hello,

We have installed the pipeline locally, and I was wondering if you could
answer a few questions.

I have noticed three issues with the annotation.gbk file.

First, the naming structure for the organism. Is there a defined structure
that must be entered into the cgpipelinerc file to make the name show up
appropriately in the entries? (It defaults to Neisseria, and I would like to
know what the structure is so that I can change it, for example, for viruses.)

Second, it is exceedingly difficult at first glance to determine whether an
ORF has been assigned a function/homology or is simply an identified ORF. Is
there any way to modify the write-out to make this distinction more obvious?

Third, I have been attempting to find a way to have annotated genes write
out the functional name to the gbk file, rather than simply the gene_id as
it does now.

Also, I was unclear as to why there are two cgpipelinerc files (one in the
conf directory and one in the lib directory).

Thank you,

____________________________
Matthew Scholz
GRA-Los Alamos National Laboratory
ms...@la...
(505) 665-8574
