From: Andrew S. <and...@gm...> - 2009-07-08 23:33:21
Some thoughts on the directory structure for DIYA output.

As of now, diya outputs to two locations: the current working directory that diya was launched in (CWD), and the current time-stamped directory that diya generated (DWD). The CWD holds most of the byproduct files from glimmer (.predict, .icm, .longorfs, etc.), trnascan, rnammer, and so on, while the DWD holds time-stamped intermediary files, both pre-parsed and post-parsed. The outputs of most interest are arguably the full genome sequence (.fna), the final annotated genbank file (.gbk), and the translated amino acid sequences (.faa).

Re: a convo Brian and I had, I think some modifications are in order:

1) possibly make the directory and/or file time-stamping toggle-able
2) maintain throughout the pipeline that 2009_XX_XX_XX-diya/MYSEQ.gbk is the most current(ly updated) annotated genbank file. This goes for other byproducts as well
3) symlink ./2009_XX_XX_XX-diya folder -> ./latest (or something similar)
4) symlink ./2009_XX_XX_XX-diya/MYSEQ.gbk -> ./MYSEQ.gbk

The reason is that these products and byproducts need, in many emerging cases, machine-predictable names, so that parser modules and scripts with multi-input requirements can be accommodated without redesigning the generic pipeline control flow for the configuration files.

Any thoughts / additions / subtractions on these suggestions?

-Andrew
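
To make suggestions 3 and 4 concrete, here is a rough sketch of how the two symlinks could be refreshed at the end of a run. It is written in Python purely for illustration and is not DIYA code. The function name publish_latest, its parameters, and the ".tmp" suffix are hypothetical, and it assumes a POSIX filesystem with the time-stamped directory sitting directly inside the launch directory.

```python
import os
from pathlib import Path


def publish_latest(run_dir: Path, base_dir: Path, seq_name: str = "MYSEQ") -> None:
    """Point base_dir/latest and base_dir/<seq_name>.gbk at the given run.

    Hypothetical helper: assumes run_dir is a direct child of base_dir,
    so the link targets can be written as relative paths.
    """
    gbk_name = f"{seq_name}.gbk"
    links = {
        # suggestion 3: ./latest points at the time-stamped directory
        base_dir / "latest": run_dir.name,
        # suggestion 4: ./MYSEQ.gbk points at the file inside that directory
        base_dir / gbk_name: os.path.join(run_dir.name, gbk_name),
    }
    for link, target in links.items():
        # Create the new link under a temporary name, then rename it over
        # the old one, so readers never see a missing link.
        tmp = link.with_name(link.name + ".tmp")
        if tmp.is_symlink() or tmp.exists():
            tmp.unlink()
        os.symlink(target, tmp)
        os.replace(tmp, link)
```

Calling publish_latest(Path("2009_XX_XX_XX-diya"), Path(".")) after each run would then give parser modules fixed names (./latest and ./MYSEQ.gbk) to read from, whatever the time stamp of the newest run happens to be. Relative targets are used so the links keep working if the whole launch directory is moved.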