From: Jay H. <jhu...@gm...> - 2011-02-15 23:17:26
Hi Matthew,

We appreciate your interest in the project. I'll try to answer some of these issues.

Organism name: http://wiki.jordan.biology.gatech.edu/index.php/CG-Pipeline#Configuration
It's not structured, just a simple string assignment to the variable 'classification'.

Good point about the genes versus ORFs. We've gone by the guidelines for GenBank submissions, and that is all. The pipeline writes all of the data that was generated into this file. Genes and their products are named based on UniProt hits. If an ORF gets no hits that score high enough, it gets no gene name and no product name. This definitely deserves documentation in the wiki, when we find the time!

You can also find a FASTA file in the [project]/database/ directory which contains all of the ORFs (as of version 0.3.1; if you ran an older version, you should be able to upgrade and run 'run_pipeline makedb -p YourProject' to generate the FASTA and other files). If an ORF has no known gene or product, it will have a defline such as this, which may help you filter them out:

    >lcl|M15293_draft_0007|CDS|7202|6354 predicted cds

As for the rest of your questions -- can you tell us which version you've got? We can only support the latest version, which you can get here: http://sourceforge.net/projects/cg-pipeline/files/. We'll help you through the upgrade. You might need a few new third-party applications, such as RNAmmer. There will be a new paper written for these updates, but I can't say when.

There should be only one pipelinerc, in conf/. Any other one is a mistake, sorry.

Best of luck,
Jay Humphrey

On 2/15/2011 5:58 PM, Matthew Scholz wrote:
> Hello,
>
> We have installed the pipeline locally, and I was wondering if you
> could answer a couple of questions.
>
> First of all, I have noticed two or three issues with the
> annotation.gbk file.
>
> Firstly, the naming structure for the organism.
> Is there a defined structure that must be entered into the
> cgpipelinerc file to make the name show up appropriately in the
> entries? (It defaults to Neisseria, and to change it for viruses,
> for example, I would like to know what the structure is.)
>
> Second, it is exceedingly difficult at first glance to determine if
> an ORF has been assigned a function/homology, or is simply an
> identified ORF. Is there any way to modify the write-out to make this
> distinction more obvious?
>
> Third, I have been attempting to find a way to have annotated genes
> write out the functional name to the gbk file, rather than simply the
> gene_id as it does now.
>
> Also, I was unclear as to why there are two cgpipelinerc files (one in
> the conf directory and one in the lib directory).
>
> Thank you,
>
> ____________________________
> Matthew Scholz
> GRA-Los Alamos National Laboratory
> ms...@la...
> (505) 665-8574

-- 
Jay Humphrey
92 Debden Road
SAFFRON WALDEN
CB11 4AL
UNITED KINGDOM
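
For anyone who wants to act on the defline hint in the reply above, here is a minimal sketch of one way to separate unnamed ORFs from named ones in the FASTA file. It is not part of CG-Pipeline. It assumes that every ORF without a gene or product has the description "predicted cds" after the identifier, as in the single example defline quoted in the reply. The function names and the second record in the sample data are hypothetical and are there only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (defline without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (defline, sequence) pairs from FASTA-formatted lines."""
    defline = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is not None:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


def is_unnamed(defline: str) -> bool:
    """True if the description after the identifier is 'predicted cds'.

    Assumption: this is how ORFs without a gene or product are labelled,
    based only on the one example defline in the email.
    """
    parts = defline.split(None, 1)
    return len(parts) == 2 and parts[1].strip().lower() == "predicted cds"


def split_orfs(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Return (named, unnamed) lists of FASTA records."""
    named: List[Record] = []
    unnamed: List[Record] = []
    for defline, seq in read_fasta(lines):
        (unnamed if is_unnamed(defline) else named).append((defline, seq))
    return named, unnamed


# Sample data: the first defline is the one from the email; the second
# record and both sequences are made up for this example.
example = """\
>lcl|M15293_draft_0007|CDS|7202|6354 predicted cds
ATGAAACGT
>lcl|M15293_draft_0008|CDS|7300|8100 hypothetical example product
ATGCCCTTT
GGA
""".splitlines()

named, unnamed = split_orfs(example)
print(len(named), "named ORF(s),", len(unnamed), "unnamed ORF(s)")
```

To run this on a real project, pass an open file handle for the FASTA file in [project]/database/ to split_orfs in place of the sample lines.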