TRFPARSER

trfparser is a Perl program for bioinformatics that parses the output of the freely available program Tandem Repeats Finder (trf).


REQUIREMENTS:

Perl must be installed, and the program MUST be run from a Unix shell, because the code calls system commands.


INPUT & OUTPUT:

Tandem Repeats Finder (trf) is a freely available program for finding tandem repeats in nucleotide sequences. It generates several output files, including:

1) a .dat file, a plain-text data file created when trf is run with the -d option. It contains all the information about the identified repeats, such as start, end, repeat pattern, consensus and entropy.

2) a .txt.html file, which displays the alignment of the repeat pattern with the sequence. If the -f flag is used with trf, this file also contains 500 bp of sequence upstream and downstream of the repeat. These flanking sequences are useful for locating the repeat in the genome and for designing primers for molecular biology experiments.

3) a .html file, which is the same as the .dat file, except in HTML format.

4) a .masked file, generated if the -m option is used.

The .html and .masked files are not used at any step by this parser. The parser also assumes that trf was run with the -f option, so that the .txt.html files contain flanking sequence information, since that is the only situation in which one would want to parse these files. Otherwise, only the .dat file is used.


Since trf generates a single .dat file for a given sequence but multiple .txt.html files, the parser concatenates all available .txt.html files into a single temporary .tmp file.
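
The concatenation step can be illustrated with a short Python sketch. This is not the parser's own Perl code. It assumes that the reports sit next to the .dat file, that they share its name up to the ".dat" suffix, and that each ends in ".N.txt.html" where N is a report number. The function name concatenate_reports is hypothetical.

```python
import glob
import re


def concatenate_reports(dat_path):
    """Join every <prefix>.N.txt.html report into one <prefix>.tmp file."""
    # Assumption: the reports share the .dat file's name up to ".dat".
    prefix = dat_path[:-len(".dat")] if dat_path.endswith(".dat") else dat_path
    reports = glob.glob(glob.escape(prefix) + ".*.txt.html")

    def report_number(path):
        # Sort on the trailing report number so that 10 follows 9, not 1.
        match = re.search(r"\.(\d+)\.txt\.html$", path)
        return int(match.group(1)) if match else 0

    reports.sort(key=report_number)
    tmp_path = prefix + ".tmp"
    # Copy bytes unchanged so the reports' encoding does not matter.
    with open(tmp_path, "wb") as out:
        for report in reports:
            with open(report, "rb") as handle:
                out.write(handle.read())
    return tmp_path, reports
```

Sorting on the report number rather than on the file name keeps the repeats in the order trf reported them once there are ten or more reports.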



The program generates 3 output files:

.dat.parse -> parsed dat file
.txt.parse -> parsed txt/html file
.final.parse -> all parsed information in a single file
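
How the parser combines the two parsed files is not described here. The Python sketch below shows one plausible way, assuming that records from the two files are matched on the repeat's start and end indices. The function name merge_records is hypothetical, and the dictionary keys borrow the field names listed under AVAILABLE FIELDS, without the leading "$".

```python
def merge_records(dat_records, txt_records):
    """Attach flanking sequences to .dat records that share start and end."""
    # Index the flanking-sequence records by (start, end) for quick lookup.
    flanks = {(rec["start"], rec["end"]): rec for rec in txt_records}
    merged = []
    for rec in dat_records:
        flank = flanks.get((rec["rep_start"], rec["rep_end"]), {})
        row = dict(rec)
        # Assumption: a repeat without a matching report gets empty flanks.
        row["left_seq"] = flank.get("left_seq", "")
        row["right_seq"] = flank.get("right_seq", "")
        merged.append(row)
    return merged
```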

NOTE: At present, the program reads its input files from, and writes its output files to, the same directory. Change the path in the code if you need to use a different directory.

USAGE:

 trfparser datfilename flag_value{0 or 1}
 
 Enter 0 as flag_value to use only the .dat file, and 1 to extract information from both the .dat and .txt.html files.
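
For use inside a larger pipeline, the call can be assembled programmatically. The Python sketch below only builds the argument list. It assumes the script is distributed as trfparser_v1.pl and is started through the perl interpreter; the function name build_command is hypothetical.

```python
def build_command(dat_file, parse_flanks, script="trfparser_v1.pl"):
    """Return the argument list for one trfparser run."""
    # flag_value 1 parses both the .dat and .txt.html files, 0 only the .dat.
    flag_value = "1" if parse_flanks else "0"
    # Assumption: the script is run with "perl" rather than as an executable.
    return ["perl", script, dat_file, flag_value]
```

The resulting list can be passed to subprocess.run, started from the directory that holds the trf output files.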


AVAILABLE FIELDS:

By default, the following information is included in the .final.parse file:

.dat file: Repeat Start, Repeat End, Period Size, Copy No., Alignment Score, Consensus
.txt.html file: Left Flanking Sequence, Right Flanking Sequence

However, you can always modify the code to display other information. The following is a list of all available fields and their descriptions:

.dat file

$rep_start, $rep_end      :  Indices of the repeat relative to the start of the sequence
$period_size              :  Period size of the repeat
$copy_no                  :  Number of copies aligned with the consensus pattern
$pattern_size             :  Size of consensus pattern (may differ slightly from the period size)
$percent_match            :  Percent of matches between adjacent copies overall
$percent_indel            :  Percent of indels between adjacent copies overall
$align_score              :  Alignment score
$a_percent, $c_percent,
$g_percent, $t_percent    :  Percent composition for each of the four nucleotides
$entropy                  :  Entropy measure based on percent composition
$consensus                :  Consensus sequence
$repeat                   :  Repeat sequence
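
To show how these fields map onto a .dat file, here is a minimal Python sketch (not the parser's own Perl code). It assumes each repeat occupies one line of whitespace-separated columns in the order listed above, and that header lines can be recognised because they do not start with a number. The names DAT_FIELDS and parse_dat_line are hypothetical.

```python
DAT_FIELDS = [
    "rep_start", "rep_end", "period_size", "copy_no", "pattern_size",
    "percent_match", "percent_indel", "align_score",
    "a_percent", "c_percent", "g_percent", "t_percent",
    "entropy", "consensus", "repeat",
]


def parse_dat_line(line):
    """Return a dict for one repeat line, or None for header/blank lines."""
    parts = line.split()
    # Assumption: data lines start with the numeric repeat start index.
    if len(parts) < len(DAT_FIELDS) or not parts[0].isdigit():
        return None
    record = dict(zip(DAT_FIELDS, parts))
    record["rep_start"] = int(record["rep_start"])
    record["rep_end"] = int(record["rep_end"])
    # The remaining numeric columns are read as floats to be safe.
    for name in DAT_FIELDS[2:13]:
        record[name] = float(record[name])
    return record
```

Returning None for lines that do not look like data lets a caller loop over the whole file and keep only the repeat records.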

.txt.html file

$start, $end              :  Indices of the repeat relative to the start of the sequence
$left_start, $left_end    :  Indices of the left flanking sequence relative to the start of the sequence
$right_start, $right_end  :  Indices of the right flanking sequence relative to the start of the sequence
$left_seq                 :  Left flanking sequence
$right_seq                :  Right flanking sequence
Source: README, updated 2009-08-10