Re: [Cg-pipeline-users] pipeline question

Hi Matthew,
We appreciate your interest in the project. I'll try to answer some of 
these issues.

Organism name: see
http://wiki.jordan.biology.gatech.edu/index.php/CG-Pipeline#Configuration
The name has no required structure; it is just a simple string assigned
to the variable 'classification'.

Good point about the genes versus ORFs.  We've simply followed the
guidelines for GenBank submissions, nothing more.  The pipeline writes
all of the data that was generated into this file.  Genes and their
products are named based on UniProt hits.  If an ORF gets no
sufficiently high-scoring hits, it gets no gene name and no product
name.  This definitely deserves documentation in the wiki, when we find
the time!
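
In the meantime, here is a rough sketch of how you could check which CDS
features in annotation.gbk carry a gene or product name.  It is not part
of the pipeline: the function name is made up for illustration, it
assumes the standard GenBank flat-file layout (feature keys indented 5
spaces), and it assumes that an unnamed ORF simply lacks the /gene and
/product qualifiers, which you should verify against your own file.

```python
def cds_annotation_status(gbk_text):
    """Return (location, has_gene, has_product) for every CDS feature.

    Hypothetical helper: assumes unnamed ORFs lack /gene and /product.
    """
    results = []
    current = None          # [location, has_gene, has_product] of the open CDS
    in_features = False
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            # End of the feature table: close any open CDS.
            if current is not None:
                results.append(tuple(current))
                current = None
            in_features = False
            continue
        if not in_features:
            continue
        # A feature key line has exactly 5 leading spaces before the key.
        if len(line) > 5 and line[:5] == "     " and line[5] != " ":
            if current is not None:
                results.append(tuple(current))
            parts = line.split()
            if parts[0] == "CDS" and len(parts) > 1:
                current = [parts[1], False, False]
            else:
                current = None
        elif current is not None:
            qualifier = line.strip()
            if qualifier.startswith("/gene="):
                current[1] = True
            elif qualifier.startswith("/product="):
                current[2] = True
    if current is not None:
        results.append(tuple(current))
    return results
```

Features where both flags are False would be the plain ORFs with no
UniProt-derived name, under the assumptions above.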

You can also find a FASTA file containing all of the ORFs in the
[project]/database/ directory (as of version 0.3.1; if you ran an older
version, you should be able to upgrade and run
'run_pipeline makedb -p YourProject' to generate the FASTA and other
files).  If an ORF has no known gene or product, it will have a defline
such as this, which may help you filter them out:

>lcl|M15293_draft_0007|CDS|7202|6354 predicted cds
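
A minimal sketch of that filtering, in case it helps.  This is not
pipeline code: the function name is invented for illustration, and it
assumes that every unnamed ORF has a defline ending in "predicted cds"
like the example above, while named ORFs end in something else.

```python
def split_orfs_by_annotation(fasta_text):
    """Split FASTA records into (annotated, unannotated) lists.

    Each record is a (defline, sequence) tuple.  Assumption: a defline
    ending in "predicted cds" marks an ORF with no gene/product name.
    """
    annotated, unannotated = [], []
    defline, seq_lines = None, []

    def flush():
        # Store the record collected so far, if there is one.
        if defline is None:
            return
        record = (defline, "".join(seq_lines))
        if defline.rstrip().endswith("predicted cds"):
            unannotated.append(record)
        else:
            annotated.append(record)

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            defline, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    flush()
    return annotated, unannotated
```

Read the FASTA file into a string, pass it in, and the second list
returned holds the ORFs without a known gene or product.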

As for the rest of your questions -- can you tell us which version
you've got?  We can only support the latest version, which you can get
here:
http://sourceforge.net/projects/cg-pipeline/files/
We'll help you through the upgrade.  You might need a few new
third-party applications, such as RNAmmer.  There will be a new paper
written for these updates, but I can't say when.

There should be only one pipelinerc, in conf/.  Any other one is a 
mistake, sorry.

Best of luck,
Jay Humphrey

On 2/15/2011 5:58 PM, Matthew Scholz wrote:
>
> Hello,
>
> We have installed the pipeline locally, and I was wondering if you 
> could answer a couple of questions
>
> First of all, I have noticed two or three issues with the 
> annotation.gbk file.
>
> Firstly, the naming structure for the organism.  Is there a defined
> structure that must be entered into the cgpipelinerc file to make the
> name show up appropriately in the entries?  (It defaults to
> Neisseria, and to change it for viruses, for example, I would like to
> know what the structure is.)
>
> Second, it is exceedingly difficult at first glance to determine if
> an ORF has been assigned a function/homology, or is simply an
> identified ORF.  Is there any way to modify the write-out to make this
> distinction more obvious?
>
> Third, I have been attempting to find a way to have annotated genes 
> write out the functional name to the gbk file, rather than simply the 
> gene_id as it does now.
>
> Also, I was unclear as to why there are two cgpipelinerc files (one in 
> the conf directory and one in the lib directory).
>
> Thank you,
>
> ____________________________
>
> Matthew Scholz
>
> GRA-Los Alamos National Laboratory
>
> ms...@la...
>
> (505) 665-8574
>
>

-- 
Jay Humphrey
92 Debden Road
SAFFRON WALDEN
CB11 4AL
UNITED KINGDOM

A computational genomics pipeline for prokaryotic sequencing projects