[diyg-dev-l] diya outputs and working directories

Some thoughts on the directory structure for DIYA output:

As of now, diya writes output to two locations: the current working
directory that diya was launched in (CWD), and the time-stamped
directory that diya generates for the run (DWD).  The CWD holds most
of the byproduct files from glimmer (.predict, .icm, .longorfs, etc.),
trnascan, rnammer, and the other tools, while the DWD holds the
time-stamped intermediate files, both pre-parsed and post-parsed.  The
outputs of most interest are arguably the full genome sequence (.fna),
the final annotated genbank file (.gbk), and the translated amino acid
sequences (.faa).

Following a conversation Brian and I had, I think some modifications
are in order:

1) possibly make the directory and/or file time-stamping toggle-able

2) guarantee throughout the pipeline that 2009_XX_XX_XX-diya/MYSEQ.gbk
is always the most recently updated annotated genbank file. The same
goes for the other byproducts

3) symlink ./2009_XX_XX_XX-diya folder -> ./latest (or something  
similar)

4) symlink ./2009_XX_XX_XX-diya/MYSEQ.gbk -> ./MYSEQ.gbk
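Items 3 and 4 could be sketched roughly as follows. This is only an
illustration in Python, not diya code: the function names
(replace_symlink, publish_run) and the link name "latest" are
assumptions made for the example, and the run directory is passed in
rather than derived from any particular time-stamp format. It assumes
a POSIX filesystem where symlinks are available.

```python
import os
from pathlib import Path


def replace_symlink(target: Path, link: Path) -> None:
    """Point `link` at `target`, replacing an existing symlink if present.

    The new link is created under a temporary name and then renamed over
    the old one, so readers never see a moment with no link at all.
    """
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    os.replace(tmp, link)


def publish_run(cwd: Path, run_dir: Path, seq_name: str) -> None:
    """Expose the newest run under predictable names inside `cwd`.

    Hypothetical helper: creates ./latest -> ./<run_dir> and
    ./<seq_name>.gbk -> ./<run_dir>/<seq_name>.gbk using relative
    targets, so the whole tree can be moved without breaking the links.
    """
    replace_symlink(Path(os.path.relpath(run_dir, cwd)), cwd / "latest")
    gbk = run_dir / f"{seq_name}.gbk"
    replace_symlink(Path(os.path.relpath(gbk, cwd)), cwd / gbk.name)
```

Calling publish_run at the end of each run would keep ./latest and
./MYSEQ.gbk pointing at the newest time-stamped directory, while the
older directories stay untouched.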

The reason is that, in a growing number of cases, these products and
byproducts need machine-predictable names, so that parser modules and
scripts that require multiple inputs can be accommodated without
redesigning the generic pipeline control flow for the configuration
files.
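To illustrate what predictable names buy a multi-input script, here is
a minimal sketch, again in Python and again not diya code. The helper
name resolve_products and the "latest" link name are assumptions; the
three extensions are the ones mentioned earlier in this message.

```python
from pathlib import Path


def resolve_products(cwd, seq_name, extensions=("fna", "gbk", "faa")):
    """Return {extension: path} for one sequence's products.

    Hypothetical helper: looks only under <cwd>/latest, so the caller
    never needs to know the name of the time-stamped run directory.
    """
    latest = Path(cwd) / "latest"
    products = {ext: latest / f"{seq_name}.{ext}" for ext in extensions}
    missing = [str(p) for p in products.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing products: " + ", ".join(missing))
    return products
```

A parser module that needs both the .gbk and the .faa file could then
call resolve_products(".", "MYSEQ") and read both paths from the
result, with no per-run configuration.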

Any thoughts / additions / subtractions on these suggestions?

-Andrew