Walkthrough

Filip Wierzbicki

Introduction

This walkthrough shows how to use Manna to compute a multiple sequence alignment of the transposable element (TE) annotations of the piRNA cluster 42AB in five D. melanogaster lines.

Requirements

Preparatory work

Setup directories

We store the FASTA files and the TE library in dedicated directories and create a directory for the RepeatMasker output.

# Move the TE library first so that the *.fasta glob below does not pick it up.
mkdir resources
mv TE_dmel.fasta resources/.

mkdir fasta
mv *.fasta fasta/.

mkdir rm

Annotating TEs with RepeatMasker

The following commands run RepeatMasker on all FASTA files in the fasta directory.

cd fasta
for i in *.fasta; do RepeatMasker -pa 20 -no_is -s -nolow -dir ../rm/ -lib ../resources/TE_dmel.fasta "$i"; done

After RepeatMasker has finished, its output files with the suffix 'fasta.out' are stored in the rm directory.
These are the input files required for the Manna analysis.
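
Before moving on, it can help to confirm that every FASTA file produced a matching output file. The following is a minimal Python sketch, assuming the directory layout used above (inputs in fasta/, outputs in rm/) and that each output file is named after its input with '.out' appended. The function name missing_outputs is hypothetical and not part of Manna or RepeatMasker.

```python
from pathlib import Path


def missing_outputs(fasta_dir="fasta", rm_dir="rm"):
    """Return the expected RepeatMasker '.out' files that are absent from rm_dir.

    Assumption: for an input 'X.fasta', RepeatMasker writes 'X.fasta.out'
    into the directory given with -dir.
    """
    fasta_dir, rm_dir = Path(fasta_dir), Path(rm_dir)
    expected = [rm_dir / (f.name + ".out") for f in sorted(fasta_dir.glob("*.fasta"))]
    return [p for p in expected if not p.is_file()]
```

Called from the directory that contains fasta/ and rm/, missing_outputs() returns an empty list when every input has its annotation file.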

Multiple sequence alignment

The following command computes the multiple sequence alignment of the RepeatMasker annotations.

python manna-code/cluster-msa.py --gap 0.09 --mm 0.1 --match 0.2 --input-format repeatmasker --output-detail long --clusters "Iso1_1.fasta.out,Pi2_1.fasta.out,Canton-S_1.fasta.out,DGRP732_1.fasta.out,A4_1.fasta.out" --sample-IDs "Iso1,Pi2,CS,D732,A4" --cluster-ID "1" > 1.msa
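
In this command the file names given to --clusters and the names given to --sample-IDs appear to be paired by position, so the two lists have to stay in the same order. As a rough sketch, the argument list can be assembled in Python from a single mapping so that the two cannot drift apart. The names CLUSTERS and build_command are hypothetical helpers; only the flags and values shown in the command above are used.

```python
import shlex

# Sample ID -> RepeatMasker output file, in the order used in the command above.
CLUSTERS = {
    "Iso1": "Iso1_1.fasta.out",
    "Pi2": "Pi2_1.fasta.out",
    "CS": "Canton-S_1.fasta.out",
    "D732": "DGRP732_1.fasta.out",
    "A4": "A4_1.fasta.out",
}


def build_command(clusters, cluster_id="1", script="manna-code/cluster-msa.py"):
    """Assemble the cluster-msa.py argument list from a sample -> file mapping."""
    return [
        "python", script,
        "--gap", "0.09", "--mm", "0.1", "--match", "0.2",
        "--input-format", "repeatmasker",
        "--output-detail", "long",
        # Both lists come from the same dict, so their order always matches.
        "--clusters", ",".join(clusters.values()),
        "--sample-IDs", ",".join(clusters.keys()),
        "--cluster-ID", cluster_id,
    ]


# Print the command line; the redirect to 1.msa is added as plain text.
print(shlex.join(build_command(CLUSTERS)) + " > 1.msa")
```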

This writes a detailed output file, 1.msa. Its first 10 lines are shown below.

#Score: 13034.129999999988
#Samples    D732    A4  CS  Iso1    Pi2
#ClusterID  1
#TE-fam clu_start   length  div score   'te_strand:te_start:te_end
ROXELEMENT  487.0   235.0   25.5    964.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    964.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    964.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    964.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    964.0   '-:4115..4357
INE1    722.0   52.0    11.8    232.0   '-:2..45    INE1    722.0   52.0    11.8    232.0   '-:2..45    INE1    722.0   52.0    11.8    232.0   '-:2..45    INE1    722.0   52.0    11.8    232.0   '-:2..45    INE1    722.0   52.0    11.8    232.0   '-:2..45
INE1    966.0   86.0    12.0    382.0   '+:475..566 INE1    966.0   86.0    12.0    382.0   '+:475..566 INE1    966.0   86.0    12.0    382.0   '+:475..566 INE1    966.0   86.0    12.0    382.0   '+:475..566 INE1    966.0   86.0    12.0    382.0   '+:475..566
INE1    1049.0  118.0   21.4    415.0   '-:213..338 INE1    1049.0  118.0   21.4    415.0   '-:213..338 INE1    1049.0  118.0   21.4    415.0   '-:213..338 INE1    1049.0  118.0   21.4    415.0   '-:213..338 INE1    1049.0  118.0   21.4    415.0   '-:213..338
INE1    1101.0  123.0   18.2    460.0   '-:285..395 INE1    1101.0  123.0   18.2    460.0   '-:285..395 INE1    1101.0  123.0   18.2    460.0   '-:285..395 INE1    1101.0  123.0   18.2    460.0   '-:285..395 INE1    1101.0  123.0   18.2    460.0   '-:285..395
BS3 1273.0  170.0   4.7 1455.0  '+:472..641 BS3 1273.0  170.0   4.7 1455.0  '+:472..641 BS3 1273.0  170.0   4.1 1477.0  '+:472..641 BS3 1273.0  170.0   4.1 1477.0  '+:472..641 BS3 1273.0  170.0   4.1 1477.0  '+:472..641

The header consists of the first 4 lines, each starting with a '#':
line1: the alignment score of the multiple sequence alignment
line2: ordered sample IDs
line3: the clusterID
line4: the columns that are printed for each sample ID. In the 'long' output, these are the TE-family name, the start position in the cluster, the length, the divergence, and the Smith-Waterman score of the annotation. The last column reports the orientation of the TE annotation (+/-) together with the first and last matching position in the TE sequence.

The header is followed by the resulting multiple sequence alignment in the order provided by the header.
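
To work with this output programmatically, the layout described above can be read with a few lines of Python. This is a minimal sketch, not part of Manna: it assumes whitespace-separated fields, exactly six fields per sample as listed in header line 4, and TE-family names without spaces. How gaps in the alignment are written is not visible in the excerpt above, so rows with a different field count are rejected rather than guessed at. The names Annotation, parse_annotation and parse_long_msa are hypothetical.

```python
from typing import NamedTuple

FIELDS_PER_SAMPLE = 6  # columns listed in header line 4 of the 'long' output


class Annotation(NamedTuple):
    family: str
    clu_start: float
    length: float
    div: float
    score: float
    strand: str
    te_start: int
    te_end: int


def parse_annotation(fields):
    """Convert the six fields of one sample into an Annotation."""
    family, clu_start, length, div, score, te_pos = fields
    # The last field looks like '-:4115..4357 (leading apostrophe included).
    strand, span = te_pos.lstrip("'").split(":")
    te_start, te_end = span.split("..")
    return Annotation(family, float(clu_start), float(length), float(div),
                      float(score), strand, int(te_start), int(te_end))


def parse_long_msa(text):
    """Return (header, rows); each row maps a sample ID to its Annotation."""
    header, rows = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            parts = line[1:].split()
            key = parts[0].rstrip(":")
            if key == "Score":
                header["score"] = float(parts[1])
            elif key == "Samples":
                header["samples"] = parts[1:]
            elif key == "ClusterID":
                header["cluster_id"] = parts[1]
            continue  # header line 4 (column names) is skipped
        fields = line.split()
        samples = header["samples"]
        if len(fields) != FIELDS_PER_SAMPLE * len(samples):
            # Assumption: rows containing gaps may look different; not handled.
            raise ValueError(f"unexpected number of fields: {len(fields)}")
        rows.append({
            s: parse_annotation(fields[i * FIELDS_PER_SAMPLE:(i + 1) * FIELDS_PER_SAMPLE])
            for i, s in enumerate(samples)
        })
    return header, rows
```

With the file from above, parse_long_msa(open("1.msa").read()) would return the header values and one dictionary per alignment row, which makes it easy to compare, for example, the divergence of the same TE annotation across samples.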


