| Name | Modified | Size |
|---|---|---|
| README.md | 2016-08-18 | 3.4 kB |
| aligngraph.py | 2016-04-21 | 23.9 kB |
| aminoacid.py | 2016-04-21 | 1.9 kB |
| hmmdagcon_pacbio.py | 2016-04-21 | 31.0 kB |
| main_process.py | 2016-04-21 | 3.6 kB |
| preprocess.py | 2016-04-21 | 4.8 kB |
| process.py | 2016-04-21 | 17.8 kB |
Frame-Pro
Frame-Pro is a profile homology search tool for PacBio reads. It corrects sequencing errors and outputs the profile alignments of the corrected sequences against characterized protein families. Our results show that Frame-Pro enables more sensitive homology search and corrects more errors than a popular error correction tool that does not rely on hybrid sequencing.
System Requirements
The Frame-Pro pipeline needs the following packages installed on the computer:
Blasr
Python 2.7:
- We suggest using Anaconda or another distribution package
HMMER3
Biopython
dna2pep
- The code uses dna2pep for six-frame translation. The calling function is in preprocess.py, and you need to change the dna2pep path there to match your installation. You can also call your own tool instead, but make sure its output is in FASTA format and that each sequence id ends with "_rframe1", "_rframe2", and so on.
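If you replace dna2pep with your own translator, the output has to follow the id convention described above. The following is a minimal sketch of a six-frame translator in plain Python, not the code Frame-Pro ships with. It uses the standard genetic code. The forward-frame suffixes "_rframe1" to "_rframe3" follow the README; the reverse-strand suffixes "_rframe-1" to "_rframe-3" are an assumption, so compare them against real dna2pep output before relying on them. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Reverse-complement a DNA string; unknown bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna.upper()))


def translate(dna):
    """Translate from position 0; stops are '*', unknown codons are 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_records(seq_id, dna):
    """Return (id, protein) pairs for the three forward and three reverse frames."""
    records = []
    for offset in range(3):
        name = "{}_rframe{}".format(seq_id, offset + 1)
        records.append((name, translate(dna[offset:])))
    rc = reverse_complement(dna)
    for offset in range(3):
        # ASSUMPTION: reverse-strand frames are labelled with negative numbers.
        name = "{}_rframe-{}".format(seq_id, offset + 1)
        records.append((name, translate(rc[offset:])))
    return records


def to_fasta(records):
    """Format (id, sequence) pairs as FASTA text."""
    return "".join(">{}\n{}\n".format(name, seq) for name, seq in records)
```

For example, `to_fasta(six_frame_records("read_1", "ATGGCCTAA"))` produces six FASTA records, the first of which is `read_1_rframe1` with sequence `MA*`.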
Program Description
To use Frame-Pro, you need a FASTA file that contains all the reads (both the longer seed reads and the other, shorter reads). Note that Frame-Pro assumes each sequence id has the form XXX_XXX, so you may need to convert your ids to this format to avoid errors.
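Before running the pipeline it can help to list the ids that might not fit the expected format. The sketch below is only a checker, and it rests on an assumption: it reads "XXX_XXX" as two non-empty fields joined by a single underscore. The README does not define the format more precisely, so adjust the hypothetical `ID_PATTERN` if the code expects something else.

```python
import re

# ASSUMPTION: "XXX_XXX" means two non-empty fields joined by one underscore.
ID_PATTERN = re.compile(r"^[^_\s]+_[^_\s]+$")


def nonconforming_ids(fasta_lines, pattern=ID_PATTERN):
    """Return the sequence ids (first word of each header) that do not match."""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if not pattern.match(seq_id):
                bad.append(seq_id)
    return bad
```

The function accepts any iterable of lines, so an open file handle on the input FASTA file can be passed directly.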
Before running Frame-Pro
As introduced above, Frame-Pro needs to call Blasr and HMMER3, so check that you can run the following commands on your system, because the code passes these commands to the system to run Blasr and HMMER3:
blasr seed_path short_path -bestn 200 -m 5 -o out_path
hmmscan --domtblout domtbl_path -E 1000 hmm_path consensus_path
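One way to run this check from Python is sketched below. It looks for the `blasr` and `hmmscan` executables on the PATH and builds the two commands shown above as argument lists. The helper names are hypothetical, and this is not how Frame-Pro itself assembles the commands.

```python
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def missing_tools(tools=("blasr", "hmmscan")):
    """Return the executables from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def blasr_command(seed_path, short_path, out_path):
    """Argument list for the Blasr call shown in the README."""
    return ["blasr", seed_path, short_path,
            "-bestn", "200", "-m", "5", "-o", out_path]


def hmmscan_command(domtbl_path, hmm_path, consensus_path):
    """Argument list for the hmmscan call shown in the README."""
    return ["hmmscan", "--domtblout", domtbl_path,
            "-E", "1000", hmm_path, consensus_path]
```

If `missing_tools()` returns an empty list, both programs are reachable. The command lists can then be passed to `subprocess.check_call` to confirm that they run on a small test input.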
To run Frame-Pro:
main_process.py hmm_file input_fasta output_fasta [-h] [--log LOG] [-H HMM_COEF] [-D DAG_COEF] [--network_ext NETWORK_EXT]
positional arguments:
hmm_file        hmm model file path
input_fasta     sequence file path
output_fasta    output fasta file path
optional arguments:
-h, --help                              show this help message and exit
--log LOG                               log file path
-H HMM_COEF, --hmm_coef HMM_COEF        hmm score coefficient when go through network, default is 0.25
-D DAG_COEF, --dag_coef DAG_COEF        consensus network score coefficient when go through network, default is 0.75
--network_ext NETWORK_EXT               extend coefficient for the network of each hmm model, default is 3
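When Frame-Pro is run from another script, the command line can be assembled from the arguments listed above. The following is a minimal sketch. The helper name, the `python` interpreter name, and the example file names are assumptions. Options left as `None` are omitted, so main_process.py falls back to its own defaults.

```python
def frame_pro_command(hmm_file, input_fasta, output_fasta, log=None,
                      hmm_coef=None, dag_coef=None, network_ext=None,
                      interpreter="python", script="main_process.py"):
    """Build the argument list for one main_process.py run."""
    cmd = [interpreter, script, hmm_file, input_fasta, output_fasta]
    if log is not None:
        cmd += ["--log", log]
    if hmm_coef is not None:
        cmd += ["-H", str(hmm_coef)]
    if dag_coef is not None:
        cmd += ["-D", str(dag_coef)]
    if network_ext is not None:
        cmd += ["--network_ext", str(network_ext)]
    return cmd


# Hypothetical file names, for illustration only.
example = frame_pro_command("families.hmm", "reads.fasta", "corrected.fasta",
                            hmm_coef=0.3, dag_coef=0.7)
```

The resulting list can be handed to `subprocess.check_call(example)`, which raises an error if the run exits with a non-zero status.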
This script does not output HMM alignments. If you would like to see the alignments, you can either run HMMER3 on the output sequences or use the code under /SampleCode_Single. To run that code, modify the file paths in dagcon_io.py and then run:
python dagcon_io.py
The output contains the HMM alignments for the forward and reverse directions. For each direction there are two HMMER-format alignments, and the second one is the HMM alignment you need. Note that this "single" program can only take one backbone sequence and one Blasr m5 alignment file.
Citation
Du, N. & Sun, Y.N. (2016). Improve homology search sensitivity of PacBio data by correcting frameshifts. Accepted by ECCB 2016.
For questions, please contact:
Nan Du
Michigan State University, East Lansing, MI 48910
dunan [at] msu [dot] edu