Project Data Flow

Ian Reid

The purpose of this article is to explain the flow of project data through the application.
First, you should read about the [Application Layout] and [Users and Groups].

Submission of User Data

The SnowyOwl Application requires the user to submit the following files for each project (species):

  • RNA-Seq reads in FASTQ (text) format
  • Assembled genomic sequences in FASTA
  • Assembled transcripts (from the RNA-Seq reads above) in FASTA (optional)
  • Masked genomic sequences or genomic sequence segments to mask in FASTA (optional)

The files may be compressed with gzip.

Decompression

Files are decompressed, if necessary, before any further processing.
Decompressed input files are saved in the root of the project directory: data/project/<project_id>
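
The decompression step can be sketched as follows. This is only an illustration, not the application's actual code: the function name stage_input is hypothetical, and detecting gzip by its magic number and stripping a trailing ".gz" from the file name are assumptions.

```python
import gzip
import shutil
from pathlib import Path


def stage_input(src: Path, project_dir: Path) -> Path:
    """Place src in project_dir, decompressing it first if it is gzip data.

    Hypothetical helper: the real application may name and detect files differently.
    """
    project_dir.mkdir(parents=True, exist_ok=True)
    # Detect gzip by its two-byte magic number rather than by file extension.
    with open(src, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    if is_gzip:
        # Assumption: a trailing ".gz" is dropped from the decompressed name.
        name = src.name[:-3] if src.name.endswith(".gz") else src.name
        dest = project_dir / name
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        dest = project_dir / src.name
        # Skip the copy when the file already sits in the project directory.
        if dest.resolve() != src.resolve():
            shutil.copyfile(src, dest)
    return dest
```

Checking the magic number means a compressed file is handled correctly even if the user uploaded it without a ".gz" suffix.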

RNA-Seq Assembly and Generating Mapping Files

Before the data is passed to the SnowyOwl pipeline, the RNA-Seq reads are assembled (if necessary) and the required mapping files are generated. This is done with external programs such as Trinity and Tophat. These programs can be demanding in both computation and disk usage, and they may fork child processes of their own. The assembly output is saved under asm and the mapping output under map. Both of these directories are created in the project save directory: data/project/<project_id>.

SnowyOwl Pipeline

Now that the input for the SnowyOwl pipeline has been generated, snowyowl.pl is called on this data.
All output is generated in data/project/<project_id>/so. For more details, consult the SnowyOwl pipeline documentation.
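
The working directories described so far can be summarised in a small sketch. The function name project_dirs and the dictionary keys are hypothetical; only the path components (data/project, asm, map, so) come from the text above.

```python
from pathlib import Path


def project_dirs(project_id: str, base: Path = Path("data/project")) -> dict:
    """Return the per-project working directories (hypothetical helper)."""
    root = base / project_id
    return {
        "root": root,        # decompressed input files
        "asm": root / "asm",  # RNA-Seq assembly output
        "map": root / "map",  # mapping output
        "so": root / "so",    # SnowyOwl pipeline output
    }
```

For a project with id 42, project_dirs("42")["so"] is the path data/project/42/so.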

Project Logs

Two logs are generated for each project.
The filenames are prefixed with 'soapp' and 'snowyowl', and contain the date and time the log was generated.
The former log contains messages from the SnowyOwl Web Application, while the latter is the main log file generated by the SnowyOwl pipeline. On success, the two log files are renamed (to exclude the date) and, if selected, moved to the project output directory: data/project/<project_id>/out. On error, the log files are neither moved nor renamed.
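
As a rough sketch of the rename-and-move step on success: the exact log file names are not given here, so the pattern "<prefix>_<timestamp>.log", the target names "soapp.log" and "snowyowl.log", the log_dir parameter, and the function name publish_logs are all assumptions made for illustration. The caller is expected to invoke it only when the project succeeded and the logs were selected.

```python
import shutil
from pathlib import Path

LOG_PREFIXES = ("soapp", "snowyowl")


def publish_logs(log_dir: Path, out_dir: Path) -> list:
    """Move the newest log of each kind to out_dir under a date-free name.

    Hypothetical helper: assumes logs are named "<prefix>_<timestamp>.log"
    and that timestamps sort lexicographically.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for prefix in LOG_PREFIXES:
        candidates = sorted(log_dir.glob(f"{prefix}_*.log"))
        if not candidates:
            continue  # nothing to publish for this prefix
        latest = candidates[-1]
        dest = out_dir / f"{prefix}.log"  # assumed date-free name
        shutil.move(str(latest), str(dest))
        moved.append(dest)
    return moved
```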

Final Results

After the pipeline finishes processing (successful project completion), the results (only those selected by the user, but always at least accepted.gff3) are moved to the project output directory: data/project/<project_id>/out. On failure, the output directory will be empty, and intermediate files are not deleted.
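
The selection rule above (user-selected files plus accepted.gff3) might look like this in code. It is a sketch under assumptions: the function name collect_results is hypothetical, and it assumes the result files sit directly in the so directory.

```python
import shutil
from pathlib import Path

REQUIRED_RESULT = "accepted.gff3"


def collect_results(so_dir: Path, out_dir: Path, selected: set) -> list:
    """Move the selected result files, plus accepted.gff3, into out_dir.

    Hypothetical helper: assumes results are plain files directly under so_dir.
    """
    wanted = set(selected) | {REQUIRED_RESULT}  # accepted.gff3 is always included
    out_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in sorted(wanted):
        src = so_dir / name
        if src.is_file():
            shutil.move(str(src), str(out_dir / name))
            moved.append(name)
    return moved
```

Calling collect_results with an empty selection still moves accepted.gff3, which matches the "at least accepted.gff3" rule.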

Notes

  • If an error occurs during the processing of a project, the intermediate output files (including all logs) are kept for debugging purposes.
  • When processing of a project (re)starts, the "asm", "map", "so", and "out" directories are deleted.
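
The cleanup performed on a (re)start can be sketched as below. The function name reset_project is hypothetical; the directory names are the ones listed in the note above.

```python
import shutil
from pathlib import Path

GENERATED_DIRS = ("asm", "map", "so", "out")


def reset_project(project_dir: Path) -> None:
    """Delete the generated directories; input files in the root are untouched."""
    for name in GENERATED_DIRS:
        # ignore_errors=True makes this a no-op for directories that do not exist yet.
        shutil.rmtree(project_dir / name, ignore_errors=True)
```

Because only the four generated directories are removed, the decompressed input files in the project root survive a restart.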

Related

Wiki: Application Layout
Wiki: Home
Wiki: Users and Groups
