Files in SampleCode

Name                 Modified      Size
README.md            2016-08-18    3.4 kB
aligngraph.py        2016-04-21   23.9 kB
aminoacid.py         2016-04-21    1.9 kB
hmmdagcon_pacbio.py  2016-04-21   31.0 kB
main_process.py      2016-04-21    3.6 kB
preprocess.py        2016-04-21    4.8 kB
process.py           2016-04-21   17.8 kB

Frame-Pro

Frame-Pro is a profile homology search tool for PacBio reads. Frame-Pro corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. In our experiments, Frame-Pro enabled more sensitive homology search and corrected more errors than a popular error correction tool that does not rely on hybrid sequencing.

System Requirements

The Frame-Pro pipeline requires the following packages to be installed:

Blasr

Python 2.7

  • We suggest using Anaconda or another Python distribution

HMMER3

Biopython

dna2pep

  • The code uses dna2pep for six-frame translation. The call to dna2pep is in preprocess.py, so you need to change the path there to point to your dna2pep installation. (Alternatively, you can call your own tool, but make sure its output is in FASTA format and each sequence ID ends with "_rframe1", "_rframe2", and so on.)
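
If you replace dna2pep with your own tool, the sketch below shows one way to do six-frame translation in plain Python with the standard genetic code. It assumes that frames 1 to 3 are the forward strand at offsets 0, 1 and 2, that frames 4 to 6 are the reverse complement at the same offsets, and that IDs end in "_rframe1" to "_rframe6". This frame numbering is our assumption, so check preprocess.py for the exact suffixes Frame-Pro expects. The helper names (six_frame_translate, format_fasta, etc.) are hypothetical.

```python
# Minimal six-frame translation sketch (standard genetic code, stdlib only).
# ASSUMPTION: frames 1-3 = forward strand at offsets 0-2, frames 4-6 = reverse
# complement at offsets 0-2, IDs end in "_rframe1" ... "_rframe6".
# Check preprocess.py for the suffixes Frame-Pro actually expects.

BASES = "TCAG"
# Amino acids for codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    (a + b + c, AMINO[16 * i + 4 * j + k])
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement; any non-ACGT base becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq_id, seq):
    """Return a list of (id, peptide) pairs, one per reading frame."""
    seq = seq.upper().replace("U", "T")
    frames = []
    for strand_index, strand in enumerate([seq, reverse_complement(seq)]):
        for offset in range(3):
            frame_number = 3 * strand_index + offset + 1
            frames.append(("%s_rframe%d" % (seq_id, frame_number),
                           translate(strand[offset:])))
    return frames


def format_fasta(records):
    """Render (id, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (rid, seq) for rid, seq in records)
```

For example, six_frame_translate("read_1", "ATGGCC") returns six records named read_1_rframe1 to read_1_rframe6, whose peptides can be written out with format_fasta.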

Program Description

To use Frame-Pro, you need a FASTA file containing all the reads (including the longer seed reads and the other, shorter reads). Note that Frame-Pro assumes each sequence ID has the format XXX_XXX, so you may need to convert your IDs to this format to avoid errors.
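
One way to meet the ID requirement is to rename every read to a two-field ID and keep a mapping back to the original PacBio read names. The sketch below assumes that "XXX_XXX" means two fields joined by a single underscore; the "read" prefix, the helper names and the mapping are hypothetical, so adapt them if the Frame-Pro code expects something different.

```python
# Sketch: rename FASTA records to "<prefix>_<index>" IDs (stdlib only).
# ASSUMPTION: Frame-Pro's "XXX_XXX" format means two fields joined by one "_".


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rename_records(records, prefix="read"):
    """Return (renamed_records, mapping), where mapping is new ID -> old header."""
    renamed, mapping = [], {}
    for index, (header, seq) in enumerate(records, start=1):
        new_id = "%s_%d" % (prefix, index)
        renamed.append((new_id, seq))
        mapping[new_id] = header
    return renamed, mapping


def format_fasta(records):
    """Render (id, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (rid, seq) for rid, seq in records)
```

read_fasta accepts an open file handle as well as a list of lines, so the renamed records can be written to a new input FASTA file. Saving the mapping (for example as a tab-separated file) lets you trace corrected sequences back to the original reads.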

  • Before running Frame-Pro

    As mentioned above, Frame-Pro calls Blasr and HMMER3 by passing commands to the system, so check that you can run the following commands on your system:

    blasr seed_path short_path -bestn 200 -m 5 -o out_path
    hmmscan --domtblout domtbl_path -E 1000 hmm_path consensus_path
    
  • To run Frame-Pro:

    main_process.py hmm_file input_fasta output_fasta [-h] [--log LOG] [-H HMM_COEF] [-D DAG_COEF] [--network_ext NETWORK_EXT]
    
  • positional arguments:

    hmm_file                   HMM model file path
    input_fasta                input FASTA file path (all reads)
    output_fasta               output FASTA file path
    
  • optional arguments:

    -h, --help                           show this help message and exit
    --log LOG                            log file path
    -H HMM_COEF, --hmm_coef HMM_COEF     HMM score coefficient used when traversing the network, default is 0.25
    -D DAG_COEF, --dag_coef DAG_COEF     consensus network score coefficient used when traversing the network, default is 0.75
    --network_ext NETWORK_EXT            extension coefficient for the network of each HMM model, default is 3
    
  • This script does not output HMM alignments. To see the alignments, either run HMMER3 on the output sequences or use the code under /SampleCode_Single. To run that code, modify the file paths in dagcon_io.py and then run:

    python dagcon_io.py
    

The output contains the HMM alignments for both the forward and the reverse direction. For each direction there are two HMMER-format alignments, and the second one is the HMM alignment you need. Note that this "single" program can only take one backbone sequence and one set of Blasr m5 alignments.
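
If you run HMMER3 on the corrected output sequences yourself (for example with the hmmscan --domtblout command listed under "Before running Frame-Pro"), a small parser for the domain table can help summarize the hits. The sketch below follows the --domtblout column layout documented in the HMMER3 user guide (22 whitespace-separated fields followed by a free-text target description). The function name, the selected fields and the optional E-value cutoff are our own choices, not part of Frame-Pro; the cutoff is useful because that command uses -E 1000 and can report many weak hits.

```python
# Sketch: parse an hmmscan --domtblout table into per-domain hits.
# Column layout follows the HMMER3 user guide; the fields kept here are our choice.
from collections import namedtuple

DomainHit = namedtuple(
    "DomainHit",
    "hmm_name query_name i_evalue domain_score hmm_from hmm_to ali_from ali_to",
)


def parse_domtblout(lines, max_i_evalue=None):
    """Yield DomainHit tuples, optionally keeping only i-Evalue <= max_i_evalue."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 22 fixed columns; the 23rd (target description) may contain spaces.
        fields = line.split(None, 22)
        hit = DomainHit(
            hmm_name=fields[0],            # target = HMM profile for hmmscan
            query_name=fields[3],          # query = translated read
            i_evalue=float(fields[12]),    # independent E-value of the domain
            domain_score=float(fields[13]),
            hmm_from=int(fields[15]),
            hmm_to=int(fields[16]),
            ali_from=int(fields[17]),
            ali_to=int(fields[18]),
        )
        if max_i_evalue is None or hit.i_evalue <= max_i_evalue:
            yield hit
```

Passing an open domtbl file to parse_domtblout with, say, max_i_evalue=1e-5 keeps only the stronger domain hits; choose the threshold to suit your data.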

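The steps under "Program Description" (check that Blasr and HMMER3 are callable, then run main_process.py) can be wrapped in a small driver. This is a sketch under stated assumptions: "python" on PATH is Python 2.7, the driver runs from the SampleCode directory, and the option names and defaults are copied from the usage text above. The helper names (find_executable, build_command, run_frame_pro) are hypothetical and not part of Frame-Pro.

```python
# Sketch: check that Blasr and HMMER3 are on PATH, then run main_process.py.
# ASSUMPTIONS: "python" on PATH is Python 2.7 and this runs from the SampleCode
# directory; option names and defaults are copied from the README.
import os
import subprocess

REQUIRED_TOOLS = ("blasr", "hmmscan")


def find_executable(name, path=None):
    """Return the full path of an executable on PATH, or None."""
    search_path = os.environ.get("PATH", "") if path is None else path
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=REQUIRED_TOOLS):
    """List the required tools that cannot be found on PATH."""
    return [tool for tool in tools if find_executable(tool) is None]


def build_command(hmm_file, input_fasta, output_fasta, log=None,
                  hmm_coef=0.25, dag_coef=0.75, network_ext=3,
                  python="python"):
    """Build the main_process.py argument list (defaults as in the README)."""
    cmd = [python, "main_process.py", hmm_file, input_fasta, output_fasta,
           "-H", str(hmm_coef), "-D", str(dag_coef),
           "--network_ext", str(network_ext)]
    if log is not None:
        cmd += ["--log", log]
    return cmd


def run_frame_pro(hmm_file, input_fasta, output_fasta, **options):
    """Fail early if a required tool is missing, otherwise run Frame-Pro."""
    missing = missing_tools()
    if missing:
        raise RuntimeError("not found on PATH: " + ", ".join(missing))
    subprocess.check_call(build_command(hmm_file, input_fasta, output_fasta,
                                        **options))
```

For example, run_frame_pro("Pfam-A.hmm", "reads.fasta", "corrected.fasta", log="frame_pro.log") would run the pipeline with the default coefficients; the file names here are placeholders. Checking the tools first gives a clear error message instead of a failure partway through the pipeline.
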
Citation

Du, N. & Sun, Y.N. (2016). Improve homology search sensitivity of PacBio data by correcting frameshifts. Accepted by ECCB 2016.

For questions, please contact:

Nan Du

Michigan State University, East Lansing, MI 48910

dunan [at] msu [dot] edu

Source: README.md, updated 2016-08-18