Name                      Modified     Size        Downloads / Week
mitoP.py                  2011-03-04   13.2 kB
h-mito.py                 2011-03-04   6.1 kB
README.txt                2011-03-04   4.6 kB
dicts_V11.py              2011-03-04   1.0 MB
go-h-mito.py              2010-11-04   448 Bytes
clustal-2-fasta_0.9.zip   2010-11-03   4.8 kB
Totals: 6 Items                        1.0 MB      0
README.txt
H-MITO_1.0

----------------------------------------

Authors: Maria Valentini, Matteo Floris, Enrico Pieroni
{maria, floris, ep}@crs4.it

This short note briefly describes a suite of scripts for mtDNA haplogroup prediction
based on PhyloTree.org. The suite is part of an ongoing research project, so it is
not yet fully user-friendly and some manual adjustments and checks will be required. It will
be updated frequently, so keep in touch.

Please cite us if you use the software; a reference publication should be available soon.

----------------------------------------

As a user, there are essentially three possible situations:

1) you already have the mutation list(s) for your sequence(s)
2) you only have the alignment of your sequence(s) to the reference
3) you only have the sequence(s)

Depending on your case:
1) just run h-mito (A).
2) first extract the mutation list(s) using mitoP, then run h-mito (A+B).
3) first align your sequence(s) to the reference, then extract the mutation list(s),
   and finally run h-mito. Note that in this case the alignment is done
   automatically by clustalw2, and a visual check of it may be necessary (A+B+C).

----------------------------------------

A. RUNNING h-mito.py

Note that you MUST use a Python version less than or equal to 2.7.
An ancillary file, dicts_V11.py, is provided with the latest version and must be copied into the same folder as h-mito.py.

Usage is very simple. From a command shell, run:
>python h-mito.py -i "mutation list in txt" -n "FULL"

If you have multiple mutation lists and want to analyze them all automatically,
you can use the script

>python go-h-mito.py

where you only need to set the input filename by changing the line
f = 'test20/mutListF.txt'
The file contains one mutation list per line, and the first word on each line is the
sequence identifier, for instance:
JAP013421.1.fasta 73 150 263 ....
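
The line format described above can be read with a few lines of Python. This is only a minimal sketch, not the code used by go-h-mito.py: the function name parse_mutation_lists is hypothetical, and it assumes that fields are separated by whitespace and that blank lines can be skipped.

```python
def parse_mutation_lists(lines):
    """Map each sequence identifier to its list of mutation tokens."""
    result = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # first word is the sequence identifier, the rest are mutations
        result[fields[0]] = fields[1:]
    return result


if __name__ == "__main__":
    # sample line shortened from the README example above
    sample = ["JAP013421.1.fasta 73 150 263"]
    for seq_id, mutations in parse_mutation_lists(sample).items():
        print("%s: %d mutations" % (seq_id, len(mutations)))
```

With a real file you would pass the open file object, for instance parse_mutation_lists(open('test20/mutListF.txt')).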

----------------------------------------

B. RUNNING mitoP.py

Simply run 
>python mitoP.py

A few manual adjustments are necessary:

[1] Set the HOME directory to your working directory by hand, changing the line:
HOME = '/Users/Shared/at_work/bioinformatics/mtDNA/mitoP/'

[2] Specify the directory containing the pairwise-aligned sequences (reference, sequence) in .fasta format:
DATA = HOME + 'test20/' 

[3] Optionally, change the log file name and the mutation file names.

Two mutation list files are written as output:

mutList.txt
contains all mutations observed by comparing the reference with the aligned sequence.

mutListF.txt
contains the same mutations after filtering out those
known not to be useful for haplogroup prediction.
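
To illustrate what "comparing the reference and the aligned sequence" means, here is a small sketch that walks a pairwise alignment column by column and reports the differences. It is not the mitoP.py code: the function name list_differences and the token notation (73G for a substitution, 249d for a deletion, 315.1C for an inserted base) are assumptions chosen for the example, and no filtering is applied.

```python
def list_differences(ref_aln, seq_aln):
    """Compare two gapped, equal-length strings and return mutation tokens.

    Illustrative notation (an assumption, not necessarily mitoP's):
      '73G'     reference position 73 reads G in the sample
      '249d'    reference position 249 is deleted in the sample
      '315.1C'  first base inserted after reference position 315 is C
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    tokens = []
    ref_pos = 0    # 1-based position in the ungapped reference
    ins_count = 0  # inserted bases seen since the last reference base
    for r, s in zip(ref_aln.upper(), seq_aln.upper()):
        if r == "-":
            if s != "-":
                ins_count += 1
                tokens.append("%d.%d%s" % (ref_pos, ins_count, s))
            continue
        ref_pos += 1
        ins_count = 0
        if s == "-":
            tokens.append("%dd" % ref_pos)
        elif s != r:
            tokens.append("%d%s" % (ref_pos, s))
    return tokens
```

Positions are counted on the ungapped reference, so gaps in the reference row do not advance the numbering.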

----------------------------------------

C. Running the suite clustal-2-fasta.zip
   Preparing data for mutation list extraction and haplogroup prediction

Directory structure required:

$PWD contains all the scripts in clustal-2-fasta.zip (including the file ENTER.txt)

$PWD/seq
contains the mtDNA sequences, each one in fasta format, as downloaded for instance from NCBI.
The file extension must be .fasta

$PWD/ref
contains the reference sequence you want to use for the alignment, in our case
gi|251831106|ref|NC_012920.1| Homo sapiens mitochondrion, complete genome
The file extension must be .fasta and there must be only one reference sequence in the directory.

The output of steps [C1,C2,C3] will be saved in the folders
$PWD/join
$PWD/aln
$PWD/fasta
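
Before running the steps below it can help to check that the input folders follow the layout above. The following is a sketch only; check_layout is a hypothetical helper that is not part of the suite, and it checks only the two input folders, since the README does not say whether the output folders must exist beforehand.

```python
import glob
import os


def check_layout(base):
    """Return a list of problems found in the input folders under base."""
    problems = []
    seqs = glob.glob(os.path.join(base, "seq", "*.fasta"))
    refs = glob.glob(os.path.join(base, "ref", "*.fasta"))
    if not seqs:
        problems.append("no .fasta files found in seq/")
    if len(refs) != 1:
        problems.append(
            "ref/ must hold exactly one .fasta file, found %d" % len(refs))
    return problems
```

An empty list means that both checks passed, for instance when calling check_layout(os.getcwd()) from $PWD.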

C1. From $PWD, execute the script
join2files.x
to concatenate the reference sequence with each sequence, pair by pair, preparing the data for the
subsequent clustalw alignment.
Each resulting (ref, seq) pair gets a J at the beginning of its name and is saved in $PWD/join
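
For readers who prefer Python to the shell script, step C1 can be sketched as follows. This is not the content of join2files.x, which is not reproduced here; join_pairs is a hypothetical name, and the sketch assumes that the J is prepended to the file name and that the reference is written first in each pair file.

```python
import glob
import os


def join_pairs(base):
    """Write one J-prefixed file per sequence: reference first, then the sequence."""
    ref_files = glob.glob(os.path.join(base, "ref", "*.fasta"))
    if len(ref_files) != 1:
        raise ValueError("expected exactly one reference .fasta file")
    with open(ref_files[0]) as handle:
        # make sure the reference ends with exactly one newline
        ref_text = handle.read().rstrip("\n") + "\n"
    out_dir = os.path.join(base, "join")
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    written = []
    for seq_path in sorted(glob.glob(os.path.join(base, "seq", "*.fasta"))):
        with open(seq_path) as handle:
            seq_text = handle.read()
        out_path = os.path.join(out_dir, "J" + os.path.basename(seq_path))
        with open(out_path, "w") as handle:
            handle.write(ref_text + seq_text)
        written.append(out_path)
    return written
```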

C2. Then, from $PWD, execute the script
clustalAll.x
to run clustalw on each (ref, seq) pair. The results get the .aln file extension added,
are in clustalw FORMAT and are saved in $PWD/aln

NB: inside the script you must define the path to your clustalw2 executable.
The clustalw2 executable can be downloaded from http://www.clustal.org/
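
The loop performed in step C2 could look roughly like this in Python. Again this is a sketch under assumptions, not the content of clustalAll.x: the names CLUSTALW2, clustalw_command and align_all are hypothetical, the executable path must be adjusted to your installation, and the sketch relies on the Clustal format being the default output of clustalw2.

```python
import os
import subprocess

CLUSTALW2 = "clustalw2"  # assumption: replace with the path to your executable


def clustalw_command(in_path, out_path, exe=CLUSTALW2):
    """Build the argument list for one clustalw2 alignment run."""
    return [exe, "-INFILE=" + in_path, "-ALIGN", "-OUTFILE=" + out_path]


def align_all(base, exe=CLUSTALW2):
    """Align every file in join/ and write <name>.aln into aln/."""
    join_dir = os.path.join(base, "join")
    aln_dir = os.path.join(base, "aln")
    if not os.path.isdir(aln_dir):
        os.makedirs(aln_dir)
    for name in sorted(os.listdir(join_dir)):
        in_path = os.path.join(join_dir, name)
        out_path = os.path.join(aln_dir, name + ".aln")
        # raises CalledProcessError if clustalw2 exits with an error
        subprocess.check_call(clustalw_command(in_path, out_path, exe))
```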

C3. Finally, run
clustal2fasta.py
to convert the .aln clustalw format to standard .fasta. The files will be saved in $PWD/fasta
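
The conversion in step C3 amounts to collecting the blocks of each sequence from the interleaved clustalw file and writing them out one sequence at a time. The sketch below shows the idea; it is not the code of clustal2fasta.py, the function name clustal_to_fasta is hypothetical, and it assumes that the first line of the file is the CLUSTAL header and that conservation lines start with whitespace.

```python
def clustal_to_fasta(aln_text, width=60):
    """Convert Clustal-format alignment text to aligned FASTA text."""
    order = []   # sequence names in order of first appearance
    chunks = {}  # name -> list of aligned blocks
    lines = aln_text.splitlines()
    for line in lines[1:]:  # skip the CLUSTAL header line
        if not line.strip() or line[0].isspace():
            continue  # blank line or conservation (* : .) line
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "name block" line
        name, block = fields[0], fields[1]
        if name not in chunks:
            order.append(name)
            chunks[name] = []
        chunks[name].append(block)
    out = []
    for name in order:
        seq = "".join(chunks[name])
        out.append(">" + name)
        # wrap the aligned sequence at a fixed line width
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The gaps (-) are kept, so the output is an aligned .fasta pair of the kind expected by mitoP.py.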

IMPORTANT: note that clustalw aligns sequences using its own scoring scheme, while for
haplogrouping purposes it is better to adopt a preferential 3' shift.
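
As an illustration of what a 3' shift means, the sketch below slides each deletion gap in the aligned sequence toward the 3' end for as long as the alignment stays equivalent, for instance inside a run of identical bases. It is one possible reading of the rule, not the procedure implemented in mitoP.py; the function name shift_gaps_3prime is hypothetical and insertions (gaps in the reference) are left untouched.

```python
def shift_gaps_3prime(ref_aln, seq_aln):
    """Slide deletion gaps in seq_aln to the right without changing the match count."""
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    seq = list(seq_aln)
    n = len(seq)
    # scan from the 3' end so gap runs further right are already settled
    for start in range(n - 1, -1, -1):
        is_run_start = seq[start] == "-" and (start == 0 or seq[start - 1] != "-")
        if not is_run_start:
            continue
        i = start
        j = start
        while j < n and seq[j] == "-":
            j += 1  # j is the first column after the gap run
        # the gap may move one column right when the base leaving the gap on
        # the left equals the matched base entering it on the right
        while (j < n and ref_aln[j] != "-"
               and ref_aln[i] == ref_aln[j] and seq[j] == ref_aln[j]):
            seq[i], seq[j] = seq[j], "-"
            i += 1
            j += 1
    return "".join(seq)
```

For example, with the reference ACCCT a deletion aligned as A-CCT is rewritten as ACC-T, so the deleted base is reported at the last C of the run.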

In some cases the mitoP.py script translates the 'wrong' alignment into the 'correct' one,
thus providing the standard mutation code, but this part is not yet complete AND,
in any case, PLEASE CHECK THE MUTATION LIST AND/OR THE ALIGNMENT BY VISUAL INSPECTION TO BE
SURE ABOUT THE RESULTS.
Source: README.txt, updated 2011-03-04