
MITSU is a command line application for the discovery of transcription factor binding site (TFBS) motifs. MITSU is novel in combining a stochastic version of the Expectation-Maximisation algorithm with an improved approximation to the likelihood function which is unconstrained with regard to the distribution of motif occurrences within the input dataset.

---------------------------------------------------------------------
MITSU is run as follows:
MITSU min_w max_w rseeds cuts -qv [-o true_w occ_filename] filename

min_w : the minimum motif width to be tested
max_w : the maximum motif width to be tested; if this is different from min_w, the most likely motif width is predicted using the MCOIN heuristic.
rseeds : the number of random seeds to use; 100 usually gives reasonable results, but increasing this can improve the motif model
cuts : the maximum number of derived sequences to split each input sequence into. If the motif occurrences within the input dataset are known to follow the OOPS (one occurrence per sequence) or ZOOPS (zero or one occurrence per sequence) model, a value of 1 is sufficient. Otherwise, this value may be increased.
-qv : output results in quiet (-q) or verbose (-v) mode. Verbose mode produces a lot of output, so quiet mode is recommended unless more detail is required.
-o [optional] : if the known occurrences are available, perform nucleotide-level and site-level classification on the predicted results. When -o is given, true_w and occ_filename must follow it.
true_w [optional] : the true width of the known occurrences.
occ_filename [optional] : a file containing a list of the known occurrences, in the form sequence,position (one occurrence per line).
filename : a FASTA-formatted file containing the sequences to be tested.
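
The argument order above can be sketched as a small dry-run helper that only prints the command it would run. This is a minimal sketch, not part of MITSU: the function name mitsu_cmd and its checks (min_w no larger than max_w, mode restricted to -q or -v) are assumptions for illustration, and the launcher name ./MITSU.sh is taken from the example further down in this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MITSU): assemble a command line
# in the documented order  min_w max_w rseeds cuts -q|-v filename
# and print it instead of running it.
mitsu_cmd() {
    min_w=$1; max_w=$2; rseeds=$3; cuts=$4; mode=$5; fasta=$6

    # Assumed sanity check: the minimum width should not exceed the maximum.
    if [ "$min_w" -gt "$max_w" ]; then
        echo "min_w must not exceed max_w" >&2
        return 1
    fi

    # Only the two documented output modes are accepted.
    case $mode in
        -q|-v) ;;
        *) echo "mode must be -q or -v" >&2; return 1 ;;
    esac

    echo "./MITSU.sh $min_w $max_w $rseeds $cuts $mode $fasta"
}

# Dry run: print the command for a fixed motif width of 22.
mitsu_cmd 22 22 100 2 -q crp/CRP-dataset.txt
```

The optional -o true_w occ_filename group is left out of the sketch for brevity; it would go between the mode flag and the FASTA filename.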

---------------------------------------------------------------------
As an example, the crp folder contains the well-known CRP binding site dataset (CRP-dataset.txt) and a file containing a list of the known occurrences (CRP-dataset-occ.txt). Running MITSU with the following parameters should return a model similar to the one presented in the paper (the exact result depends on the random numbers chosen by the system):
min_w = max_w = 22; rseeds = 100; cuts = 2.
./MITSU.sh 22 22 100 2 -q -o 22 crp/CRP-dataset-occ.txt crp/CRP-dataset.txt
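
Before running with -o, it may help to check that the occurrences file follows the sequence,position layout described above. The following is a minimal sketch under stated assumptions: check_occ_file is a hypothetical helper, not part of MITSU, and it assumes the position is a non-negative integer with no surrounding whitespace. This README does not say whether "sequence" is a name or an index, or whether positions are 0- or 1-based, so the check does not test either.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with MITSU): report lines that are not
# of the form "sequence,position" with an integer position.
check_occ_file() {
    awk -F',' '
        NF == 0 { next }                      # skip blank lines
        NF != 2 || $2 !~ /^[0-9]+$/ {         # wrong field count or non-integer position
            printf "line %d malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }            # non-zero exit status if any line failed
    ' "$1"
}
```

For example, check_occ_file crp/CRP-dataset-occ.txt would print any malformed lines and return a non-zero exit status if it finds one.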

---------------------------------------------------------------------
MITSU requires and is bundled with:
BioJava (Legacy) 1.8.1
Apache Commons Math 3.0

BioJava (Legacy) 1.8.1 is distributed under the GNU Lesser General Public License, version 2.1 (http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html, see http://biojava.org/wiki/BioJava:License)
Apache Commons Math 3.0 is distributed under the Apache License Version 2.0, January 2004 (http://www.apache.org/licenses/)

---------------------------------------------------------------------
If you use MITSU in your work, please cite our paper:
A. M. Kilpatrick, B. Ward & S. Aitken, Stochastic EM-based TFBS motif discovery with MITSU
Bioinformatics, 30(12):i310-i318, 2014

Alastair M. Kilpatrick (a.m.kilpatrick@sms.ed.ac.uk)
Source: README.txt, updated 2014-10-07