From: Don G. <gil...@bi...> - 2005-11-30 21:18:11
Scott,
Thanks much for the quick tryout. The preliminary configurations
are critical; I've used ENV{GMOD_ROOT} as a base for that, and I see
your system won't allow you to write there. At the top of each
primary configuration file (e.g. the sample conf/bulkfiles/sgdbulk.xml
or your rice revision), find
<opt
name="sgdbulk"
relid="5"
date="20051129"
ROOT="${GMOD_ROOT}/"
TMP="${GMOD_ROOT}/tmp"
datadir="genomes/Saccharomyces_cerevisiae"
Change ROOT, TMP, and datadir to paths that you want written to.
If you don't have GMOD_ROOT defined in your environment, it will use
the GMODTools/ folder from the software, which should work
with the sample sgdlite data set.
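As a rough illustration of how those values resolve, here is a small Python sketch. It is not the GMODTools code itself: the helper names `resolve_paths` and `unwritable` and the `fallback_root` argument are made up for the example, and it only assumes the `${GMOD_ROOT}` placeholder shown in the config excerpt above.

```python
import os
from string import Template


def resolve_paths(opts, env, fallback_root):
    """Expand ${GMOD_ROOT} in config values such as ROOT and TMP.

    Hypothetical helper: it mimics the behaviour described above (use
    GMOD_ROOT from the environment, else fall back to the software
    folder); it is not the actual GMODTools implementation.
    """
    gmod_root = env.get("GMOD_ROOT") or fallback_root
    values = {"GMOD_ROOT": gmod_root.rstrip("/")}
    return {
        key: Template(val).safe_substitute(values)
        for key, val in opts.items()
    }


def unwritable(paths):
    """Return the paths that are not existing, writable directories."""
    return [
        p for p in paths
        if not (os.path.isdir(p) and os.access(p, os.W_OK))
    ]


if __name__ == "__main__":
    # Values copied from the sgdbulk.xml excerpt; the fallback here is
    # just the current directory, standing in for the GMODTools/ folder.
    opts = {"ROOT": "${GMOD_ROOT}/", "TMP": "${GMOD_ROOT}/tmp"}
    resolved = resolve_paths(opts, os.environ, fallback_root=os.getcwd())
    for key, path in resolved.items():
        print(key, "->", path)
    print("not writable:", unwritable(resolved.values()))
```

Running it before a build shows which of the configured directories would need to be changed or created.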
One aspect I've not stressed well in the documents: proper configuration
for a given data release set is essential to get it working, and this
is an unusual program in that it need only be run once successfully for
such a data release set, then the generated bulk files can be used by all.
So expect to spend some time pondering the meaning of all those configuration
options which are lacking good documentation in order to get it working for
a new data set.
Once a data release set is configured to work, it should work repeatably (given
a solution to things like a writable data root directory).
I'd recommend testing first with the sgdlite data set, and after getting that
to work, move on to a new data set.
I hope to add some pre-make validation checks before long that will help with
basic steps like "is your data output directory there?", "does your chado
genome db have chromosomes/golden_paths that can be found?", "does the
configured sql actually return data?" That way folks can avoid spending time
running it on big datasets and then wondering whether they will get usable outputs.
Take a look in $ROOT/$datadir/$releasedir/tmp/featdump/ (from your config values)
for a 'chromosomes.tsv' file; producing it is an essential first step. If that
file doesn't exist or doesn't look valid for your organism's genome, the rest
won't work.
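Until such checks exist in the tool, something along these lines could do that first check by hand. This is only a Python sketch with made-up helper names; it assumes nothing about the column layout of chromosomes.tsv and just confirms that the file is present and has at least one non-blank line. Whether the contents are right for your genome still needs a look by eye.

```python
import os
import sys


def featdump_chromosomes(root, datadir, releasedir):
    """Build the expected chromosomes.tsv path from the config values.

    Note: if datadir or releasedir is an absolute path, os.path.join
    drops the parts before it.
    """
    return os.path.join(root, datadir, releasedir,
                        "tmp", "featdump", "chromosomes.tsv")


def check_chromosomes(path):
    """Return (ok, message) for a basic presence and non-empty check."""
    if not os.path.isfile(path):
        return False, "missing: " + path
    with open(path) as handle:
        # Count only lines that contain something other than whitespace.
        rows = [line for line in handle if line.strip()]
    if not rows:
        return False, "empty: " + path
    return True, "%d row(s) in %s" % (len(rows), path)


if __name__ == "__main__":
    # Usage: python check_chromosomes.py ROOT datadir releasedir
    if len(sys.argv) != 4:
        sys.exit("usage: check_chromosomes.py ROOT datadir releasedir")
    ok, message = check_chromosomes(featdump_chromosomes(*sys.argv[1:4]))
    print(message)
    sys.exit(0 if ok else 1)
```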
- Don