File                  Date        Author                Commit
example               2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
scripts               2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
train                 2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
FGS_gff.py            2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
FragGeneScan          2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
Makefile              2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
README                2014-02-25  Wazim MohammedIsmail  [0edb8d] new changes
hmm.h                 2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
hmm_lib.c             2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
hmm_lib.o             2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
post_process.pl       2014-02-25  Wazim MohammedIsmail  [0edb8d] new changes
processFragOut.py     2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
run_TransGeneScan.pl  2014-02-25  Wazim MohammedIsmail  [0edb8d] new changes
run_hmm.c             2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
run_hmm.o             2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
util_lib.c            2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review
util_lib.h            2014-01-28  Wazim MohammedIsmail  [2c14bd] Initial commit
util_lib.o            2014-02-08  Wazim MohammedIsmail  [b3c3b8] Address code review

Read Me

Installation
=============
To install TransGeneScan, please follow the steps below:

1. Untar the downloaded file "TransGeneScan.tar.gz". This will automatically generate the directory "TransGeneScan".

2. Make sure that you have a C compiler such as "gcc" and a Perl interpreter.

3. Run "make" to compile and build the executable:
	make clean
	make fgs


Running the program
====================
1.  To run TransGeneScan, 

./run_TransGeneScan.pl -in=[seq_file_name] -out=[output_file_name]

[seq_file_name]: sequence file name including the full path
[output_file_name]: output file name including the full path
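
When the script is called from another program, the command above can be assembled as an argument list. The Python sketch below is not part of TransGeneScan: the helper names build_tgs_command and run_tgs are hypothetical, and it assumes that run_TransGeneScan.pl is executable and sits directly in the TransGeneScan home directory.

```python
import os
import subprocess


def build_tgs_command(tgs_home, seq_file, out_file):
    """Return the argument list for run_TransGeneScan.pl (hypothetical helper)."""
    script = os.path.join(tgs_home, "run_TransGeneScan.pl")
    # The README asks for file names including the full path,
    # so both paths are made absolute before they are passed on.
    return [
        script,
        "-in=" + os.path.abspath(seq_file),
        "-out=" + os.path.abspath(out_file),
    ]


def run_tgs(tgs_home, seq_file, out_file):
    """Run the prediction; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_tgs_command(tgs_home, seq_file, out_file), check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.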


Assembly of Transcripts
=======================
1. To assemble transcripts based on read mappings onto a single reference genome,

./scripts/pipeline.sh [reference_file] [reads_prefix] [TGSHome] [n] [k] [t]

[reference_file]: reference sequence file including full path
[reads_prefix]: paired-end reads prefix including full path, without the suffixes _1.fastq and _2.fastq. The script appends these suffixes itself, so the read files must be named [reads_prefix]_1.fastq and [reads_prefix]_2.fastq.
[TGSHome]: Full path of TransGeneScan home directory
[n],[k],[t]: These are bwa parameters (please see the bwa documentation for more information). The values used for testing were 4, 4 and 4.
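
Because the script appends _1.fastq and _2.fastq to the reads prefix, it can help to check that both files exist before starting a long run. The following Python sketch illustrates this. The helper names paired_read_files and build_pipeline_command are hypothetical and not part of TransGeneScan, and the script location is assumed to be scripts/pipeline.sh under the TransGeneScan home directory.

```python
import os


def paired_read_files(reads_prefix):
    """Return the two FASTQ paths derived from the reads prefix."""
    return reads_prefix + "_1.fastq", reads_prefix + "_2.fastq"


def build_pipeline_command(tgs_home, reference_file, reads_prefix, n=4, k=4, t=4):
    """Assemble the pipeline.sh argument list (hypothetical helper).

    The defaults 4, 4, 4 are the values the README reports for testing.
    """
    # Fail early if either paired-end file is missing.
    missing = [p for p in paired_read_files(reads_prefix) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing read files: " + ", ".join(missing))
    script = os.path.join(tgs_home, "scripts", "pipeline.sh")
    # Argument order follows the usage line: reference, prefix, home, n, k, t.
    return [script, reference_file, reads_prefix, tgs_home, str(n), str(k), str(t)]
```

The returned list can be passed to subprocess.run in the same way as any other command.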

Source files included
=====================
1. run_hmm.c, util_lib.c, util_lib.h, hmm.h, hmm_lib.c
These files contain the main Hidden Markov Model (HMM) framework of the prediction system. Most of the code is reused unchanged from FragGeneScan.

2. run_TransGeneScan.pl
This script is the main front end for running predictions (see "Running the program" above).

3. post_process.pl
This script comes from the original FragGeneScan and corrects the predicted start codon positions based on a prediction model (see Reference for more details). It is reused unchanged in TransGeneScan.

4. FGS_gff.py
This script converts the TransGeneScan output format (which is the same as FragGeneScan output format) into gff format. 

5. processFragOut.py
This script writes the predictions for sense transcripts and antisense transcripts to separate files.

6. train/*
These files include the training parameters used by the HMM. 

7. scripts/*
These scripts are used to do the assembly of transcripts based on read mappings (See "Assembly of Transcripts" above). 

Sample files included
=====================
1. example/transcripts.fasta
This file is the transcript assembly output produced by running scripts/pipeline.sh on paired-end reads downloaded from the Sequence Read Archive (SRR442380) and mapped onto E. coli (NC_000913) as the reference.

2. example/TGSout.out, TGSout.ffn, TGSout.faa, TGSout.gff
Prediction output from TransGeneScan in FGS output format (see below), nucleic acid fasta format, amino acid fasta format and gff format, respectively.

3. example/TGSout.sn
Prediction output from TransGeneScan containing only sense transcripts in FGS output format. 

4. example/TGSout.as
Prediction output from TransGeneScan containing only antisense transcripts. Because each listed transcript is antisense in its entirety, no start/stop ranges are given.


FGS output format
=================
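This format lists the coordinates of putative genes. Each block starts with a header line (">" followed by the transcript name), and each following line describes one predicted gene in five columns: start position, end position, strand, frame and score. In the example, every gene line also ends with "I:" and "D:" fields, which FragGeneScan uses to list insertion and deletion positions and which are empty here. For example,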

>ftranscript:1741:5049
217     1059    +       1       1.297925        I:      D:
1061    1993    +       2       1.310458        I:      D:
1994    3280    +       2       1.289984        I:      D:
>ftranscript:6551:6792
1       242     -       3       1.304437        I:      D:
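
To show how this format can be read programmatically, here is a minimal Python sketch. It is not the parser used by FGS_gff.py or processFragOut.py; the names Gene and parse_fgs_output are hypothetical. It reads only the five documented columns and ignores the trailing "I:" and "D:" fields.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    start: int
    end: int
    strand: str
    frame: int
    score: float


def parse_fgs_output(text: str) -> Dict[str, List[Gene]]:
    """Group predicted genes by the header line (without '>') that precedes them."""
    genes: Dict[str, List[Gene]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line opens a new block of gene predictions.
            current = line[1:]
            genes[current] = []
            continue
        if current is None:
            raise ValueError("gene line found before any header line")
        # Columns are whitespace-separated; only the first five are used.
        start, end, strand, frame, score = line.split()[:5]
        genes[current].append(
            Gene(int(start), int(end), strand, int(frame), float(score))
        )
    return genes


# The example from this README, used as test input.
SAMPLE = """>ftranscript:1741:5049
217     1059    +       1       1.297925        I:      D:
1061    1993    +       2       1.310458        I:      D:
1994    3280    +       2       1.289984        I:      D:
>ftranscript:6551:6792
1       242     -       3       1.304437        I:      D:
"""

if __name__ == "__main__":
    for name, predicted in parse_fgs_output(SAMPLE).items():
        print(name, len(predicted))
```

Keeping the header as the dictionary key preserves the link between each prediction and its transcript, which is needed for any later conversion to another format.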


Reference
=========
Rho, M., Tang, H., Ye, Y.: FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Research 38(20), e191 (2010)

License
============
Copyright (C) 2013 Wazim Mohammed Ismail, Yuzhen Ye and Haixu Tang.
You may redistribute this software under the terms of the GNU General Public License.