Recent changes to Manual

Manual modified by Robert Kofler

Robert Kofler — Tue, 30 Nov 2021 15:00:59 -0000

--- v6
+++ v7
@@ -60,7 +60,7 @@
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
 * Line4: column header of the alignment: feature name
-* Line5 - LineN: the alignment in short format. Solely the TE name is shown for each of the samples. Here 3 samples were used. The order of the samples is the same as in Line2.
+* Line5 - LineN: the alignment in short format. Only the TE name is shown for each sample. In this example 3 samples were used. The order of the samples is the same as in Line2.

 #####normal:
@@ -82,7 +82,8 @@
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
 * Line4: column header of the alignment: feature name, length of feature (in bp), divergence from reference (in %)
-From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).
+* Line5 - LineN: the alignment in normal format. For each sample the TE name, the length of the TE insertion and the divergence (see Line4) are shown. In this example 3 samples were used. The order of the samples is the same as in Line2. With 3 samples and 3 features (Line4) each row has 9 columns in total (3 x 3).
+


 ######long:
@@ -104,4 +105,5 @@
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
 * Line4: column header of the alignment: feature name, start position in query (bp, 1-based), length of feature (in bp), divergence from reference repeat (in %), Smith-Waterman score, orientation and position in reference repeat
-From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).
+* Line5 - LineN: the alignment in long format, where all information used by Manna is shown. For each sample the TE name, the start position, the length of the TE insertion, the divergence etc. (see Line4) are shown. In this example 3 samples were used. The order of the samples is the same as in Line2. With 3 samples and 6 features (Line4) each row has 18 columns in total (3 x 6).
+

Manual modified by Robert Kofler

Robert Kofler — Tue, 30 Nov 2021 14:56:37 -0000

--- v5
+++ v6
@@ -60,7 +60,7 @@
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
 * Line4: column header of the alignment: feature name
-* > Line5 the alignment in short format. Solely the TE name is shown for each of the samples. Here 3 samples were used. The order of the samples is the same as in Line2.
+* Line5 - LineN: the alignment in short format. Solely the TE name is shown for each of the samples. Here 3 samples were used. The order of the samples is the same as in Line2.

 #####normal:

Manual modified by Robert Kofler

Robert Kofler — Tue, 30 Nov 2021 14:56:11 -0000

--- v4
+++ v5
@@ -60,7 +60,7 @@
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
 * Line4: column header of the alignment: feature name
-From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).
+* > Line5 the alignment in short format. Solely the TE name is shown for each of the samples. Here 3 samples were used. The order of the samples is the same as in Line2.

 #####normal:

Manual modified by Robert Kofler

Robert Kofler — Tue, 30 Nov 2021 14:54:08 -0000

--- v3
+++ v4
@@ -1,10 +1,10 @@
 #Introduction
-This is the manual for the manna script that allows to align multiple sequences of annotations (e.g. TE annotations of piRNA clusters). 
+This is the manual of Manna, a tool for performing multiple alignments of (repeat) annotations. For example, it may be used to align annotated TE insertions in piRNA clusters.

 ##Prerequisites and requirements
-The script requires input files for each sequence of annotations, ideally Repeatmasker output (suffix '.out'). 
-Alternatively,  a single column input (1 feature per line) will be accepted ('--input-format toy').
-To run manna python is required.
+The script requires a repeat annotation for each sequence of interest, ideally the Repeatmasker output (suffix '.out').
+Alternatively, to test the script with simple toy examples, a single-column input (1 feature per line) is accepted ('--input-format toy').
+Python is required to run Manna.

 ##Installation
 Installation is recommended by using subversion.
@@ -16,7 +16,7 @@
 ~~~

 #Alignment with manna ('cluster-msa.py')
-In the example below, we use repeatmasker outputs of cluster1 from 3 samples to align them with the default parameters.
+In the example below, we use repeatmasker outputs for piRNA cluster 1 from three different samples (e.g. Drosophila strains 1, 2 and 3) and align them with the default parameters.

 ####example call:
 ~~~
@@ -35,7 +35,7 @@
 * --input-format: \[repeatmasker|toy] repeatmasker: prefix.out; toy:  1 column with 1 feature per line
 * --min-len: minimum length of feature to be considered, shorter features will be ignored
 * --cluster-ID: name of aligned sequence
-* --quick-rm: this parameter can be used if, instead of providing separate repeatmasker outputs, we can a concatenated annotation file (e.g. already in the fasta file before repeatmasking). Then, we provide the input file here (e.g. --quick-rm concatenated.fasta.out) but require empty argument for the following two parameters like this: --clusters "" --sample-IDs ""
+* --quick-rm: this is an advanced parameter, mainly for convenience. Instead of providing separate repeatmasker outputs, we may provide a single repeatmasker file obtained by concatenating the sequences from multiple samples (e.g. cluster 42AB for three Drosophila strains) and performing the repeat annotation on this single file. In this case the strains are distinguished by a certain column in the out-file. When using this parameter, the two parameters --clusters and --sample-IDs need to be empty and provided like this: --clusters "" --sample-IDs ""

 ####example outputs:

Manual modified by Filip Wierzbicki

Filip Wierzbicki — Mon, 29 Nov 2021 10:14:04 -0000

--- v2
+++ v3
@@ -31,8 +31,8 @@
 * --mm: mismatch score (float)
 * --match: match score (float)
 * --max-div: maximum divergence of repeatmasker annotations to be considered (float), features with higher divergence will be ignored
-* --output-detail: [short|normal|long]
-* --input-format: [repeatmasker|toy] repeatmasker: prefix.out; toy:  1 column with 1 feature per line
+* --output-detail: \[short|normal|long]
+* --input-format: \[repeatmasker|toy] repeatmasker: prefix.out; toy:  1 column with 1 feature per line
 * --min-len: minimum length of feature to be considered, shorter features will be ignored
 * --cluster-ID: name of aligned sequence
 * --quick-rm: this parameter can be used if, instead of providing separate repeatmasker outputs, we can a concatenated annotation file (e.g. already in the fasta file before repeatmasking). Then, we provide the input file here (e.g. --quick-rm concatenated.fasta.out) but require empty argument for the following two parameters like this: --clusters "" --sample-IDs ""

Manual modified by Filip Wierzbicki

Filip Wierzbicki — Mon, 29 Nov 2021 10:11:03 -0000

--- v1
+++ v2
@@ -26,20 +26,44 @@
 ####parameters:

 * --clusters: input files (comma separated)
-* --sample-IDs: names of samples (comma separated and same order as input files
+* --sample-IDs: names of samples (comma separated and same order as input files)
 * --gap: gap score (float)
 * --mm: mismatch score (float)
 * --match: match score (float)
 * --max-div: maximum divergence of repeatmasker annotations to be considered (float), features with higher divergence will be ignored
-* --output-detail: short: only feature names; normal: name, length, divergence of feature; long: name, starting position in cluster,   length, divergence, RM-score, orientation and position in repeat
-* --input-format: repeatmasker or toy
+* --output-detail: [short|normal|long]
+* --input-format: [repeatmasker|toy] repeatmasker: prefix.out; toy:  1 column with 1 feature per line
 * --min-len: minimum length of feature to be considered, shorter features will be ignored
 * --cluster-ID: name of aligned sequence
 * --quick-rm: this parameter can be used if, instead of providing separate repeatmasker outputs, we can a concatenated annotation file (e.g. already in the fasta file before repeatmasking). Then, we provide the input file here (e.g. --quick-rm concatenated.fasta.out) but require empty argument for the following two parameters like this: --clusters "" --sample-IDs ""

-####example output:
+####example outputs:

-The first 10 lines of the final alignment that is written into 'cluster1.msa' looks like this:
+Depending on the 'output-detail' parameter, the first 10 lines of the final alignment that is written into 'cluster1.msa' look like this:
+
+#####short:
+
+~~~
+#Score: 18098.060000000005
+#Samples   sample1 sample2 sample3
+#ClusterID cluster1
+#TE-fam
+ROXELEMENT ROXELEMENT  ROXELEMENT
+INE1   INE1    -
+INE1   INE1    INE1
+INE1   INE1    INE1
+INE1   INE1    INE1
+BS3    BS3 BS3
+~~~
+
+* Line1: total alignment score
+* Line2: sample-ids in the same order as the alignment
+* Line3: cluster-id
+* Line4: column header of the alignment: feature name
+From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).
+
+
+#####normal:

 ~~~
 #Score: 18098.060000000005
@@ -54,12 +78,30 @@
 BS3    170.0   4.1 BS3 170.0   4.7 BS3 168.0   3.6
 ~~~

-####output:
 * Line1: total alignment score
 * Line2: sample-ids in the same order as the alignment
 * Line3: cluster-id
-* Line4: column header of the alignment
+* Line4: column header of the alignment: feature name, length of feature (in bp), divergence from reference (in %)
 From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).


+######long:

+~~~
+#Score: 18098.060000000005
+#Samples   sample1 sample2 sample3
+#ClusterID cluster1
+#TE-fam    clu_start   length  div score   'te_strand:te_start:te_end
+ROXELEMENT 487.0   235.0   25.5    958.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    958.0   '-:4115..4357   ROXELEMENT  487.0   235.0   25.5    880.0   '-:4115..4357
+INE1   722.0   52.0    11.8    228.0   '-:2..45    INE1    722.0   52.0    11.8    228.0   '-:2..45    -   -   -   -   -   -
+INE1   982.0   70.0    14.7    390.0   '+:499..566 INE1    982.0   70.0    14.7    390.0   '+:499..566 INE1    982.0   70.0    14.7    367.0   '+:499..566
+INE1   1052.0  115.0   21.0    397.0   '-:213..335 INE1    1052.0  115.0   21.0    397.0   '-:213..335 INE1    1052.0  115.0   23.5    365.0   '-:213..335
+INE1   1101.0  123.0   18.2    448.0   '-:285..395 INE1    1101.0  123.0   18.2    448.0   '-:285..395 INE1    1101.0  123.0   18.2    417.0   '-:285..395
+BS3    1273.0  170.0   4.1 1447.0  '+:472..641 BS3 1273.0  170.0   4.7 1427.0  '+:472..641 BS3 1273.0  168.0   3.6 1377.0  '+:472..639
+~~~
+
+* Line1: total alignment score
+* Line2: sample-ids in the same order as the alignment
+* Line3: cluster-id
+* Line4: column header of the alignment: feature name, start position in query (bp, 1-based), length of feature (in bp), divergence from reference repeat (in %), Smith-Waterman score, orientation and position in reference repeat
+From line5 onwards the actual alignment is printed with the ordering shown in line4 (details) and line2 (samples).

Manual modified by Filip Wierzbicki

Filip Wierzbicki — Thu, 25 Nov 2021 12:13:16 -0000

Introduction

This is the manual for the manna script, which aligns multiple sequences of annotations (e.g. TE annotations of piRNA clusters).

Prerequisites and requirements

The script requires an input file for each sequence of annotations, ideally Repeatmasker output (suffix '.out').
Alternatively, a single-column input (1 feature per line) is accepted ('--input-format toy').
Python is required to run manna.

Installation

Installation via subversion is recommended.
Go to the folder where you would like to install the tool and type the command provided at the code tab https://sourceforge.net/p/manna/code

For example:

svn checkout https://svn.code.sf.net/p/manna/code/ manna-code

Alignment with manna ('cluster-msa.py')

In the example below, we use repeatmasker outputs of cluster1 from 3 samples to align them with the default parameters.

example call:

python manna-code/cluster-msa.py --clusters "sample1_cluster1.fasta.out,sample2_cluster1.fasta.out,sample3_cluster1.fasta.out" --sample-IDs "sample1,sample2,sample3" --cluster-ID "cluster1" > cluster1.msa
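When many clusters or samples are processed, the call above can be assembled programmatically. The following is a minimal sketch, not part of manna: the helper name `build_manna_command` is hypothetical, and it only uses the flags shown in the example call. It joins the input files and sample names with commas in matching order and returns an argument list; redirecting the output into 'cluster1.msa' is left to the caller.

```python
import shlex


def build_manna_command(cluster_files, sample_ids, cluster_id,
                        script="manna-code/cluster-msa.py"):
    """Assemble the argument list for one alignment (hypothetical helper)."""
    # --sample-IDs must be in the same order as the input files,
    # so both lists need the same length.
    if len(cluster_files) != len(sample_ids):
        raise ValueError("need exactly one sample id per input file")
    return [
        "python", script,
        "--clusters", ",".join(cluster_files),
        "--sample-IDs", ",".join(sample_ids),
        "--cluster-ID", cluster_id,
    ]


samples = ["sample1", "sample2", "sample3"]
files = [f"{s}_cluster1.fasta.out" for s in samples]
cmd = build_manna_command(files, samples, "cluster1")

# Print a shell-ready command line (shlex.join needs Python 3.8+).
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, stdout=...)` with an open file as `stdout` to reproduce the `> cluster1.msa` redirection.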

parameters:

--clusters: input files (comma separated)
--sample-IDs: names of samples (comma separated and same order as input files)
--gap: gap score (float)
--mm: mismatch score (float)
--match: match score (float)
--max-div: maximum divergence of repeatmasker annotations to be considered (float), features with higher divergence will be ignored
--output-detail: short: only feature names; normal: name, length, divergence of feature; long: name, starting position in cluster, length, divergence, RM-score, orientation and position in repeat
--input-format: repeatmasker or toy
--min-len: minimum length of feature to be considered, shorter features will be ignored
--cluster-ID: name of aligned sequence
--quick-rm: this parameter can be used if, instead of providing separate repeatmasker outputs, we provide a single concatenated annotation file (e.g. sequences already concatenated in the fasta file before repeatmasking). Then, we provide the input file here (e.g. --quick-rm concatenated.fasta.out) and pass empty arguments for the following two parameters like this: --clusters "" --sample-IDs ""
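To illustrate what --min-len and --max-div do, here is a minimal sketch of the filtering they describe. It is not manna's code: the `Feature` class, the function name `filter_features` and the threshold values are assumptions made for the example, and whether manna treats features lying exactly on a threshold as kept or ignored is also assumed here (they are kept).

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One annotated feature (hypothetical container for this sketch)."""
    name: str
    length: float  # length of the feature in bp
    div: float     # divergence from the reference in %


def filter_features(features, min_len, max_div):
    """Keep features that are long enough and not too diverged."""
    # Assumption: values exactly on a threshold are kept.
    return [f for f in features if f.length >= min_len and f.div <= max_div]


features = [
    Feature("ROXELEMENT", 235.0, 25.5),
    Feature("INE1", 52.0, 11.8),
    Feature("BS3", 170.0, 4.1),
]

# Illustrative thresholds only; these are not manna's defaults.
kept = filter_features(features, min_len=60.0, max_div=20.0)
print([f.name for f in kept])
```

With these illustrative thresholds, ROXELEMENT is dropped for its divergence and INE1 for its length, so only BS3 would enter the alignment.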

example output:

The first 10 lines of the final alignment that is written into 'cluster1.msa' look like this:

#Score: 18098.060000000005
#Samples    sample1 sample2 sample3
#ClusterID  cluster1
#TE-fam length  div
ROXELEMENT  235.0   25.5    ROXELEMENT  235.0   25.5    ROXELEMENT  235.0   25.5
INE1    52.0    11.8    INE1    52.0    11.8    -   -   -
INE1    70.0    14.7    INE1    70.0    14.7    INE1    70.0    14.7
INE1    115.0   21.0    INE1    115.0   21.0    INE1    115.0   23.5
INE1    123.0   18.2    INE1    123.0   18.2    INE1    123.0   18.2
BS3 170.0   4.1 BS3 170.0   4.7 BS3 168.0   3.6

output:

Line1: total alignment score
Line2: sample-ids in the same order as the alignment
Line3: cluster-id
Line4: column header of the alignment
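From Line5 onwards the actual alignment is printed, one aligned position per row. Within each row the samples follow the order given in Line2, and for each sample the columns follow the order given in Line4 (details).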
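The layout described above can be read back with a few lines of Python. This is a minimal sketch under stated assumptions, not a parser shipped with manna: the function name `parse_msa` is hypothetical, columns are assumed to be separated by whitespace, and feature names are assumed to contain no spaces. Values are kept as strings so that the gap symbol '-' needs no special handling.

```python
def parse_msa(text):
    """Parse manna alignment output into a dict (hypothetical helper)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    score = float(lines[0].split(":", 1)[1])     # Line1: "#Score: <float>"
    samples = lines[1].split()[1:]               # Line2: "#Samples s1 s2 ..."
    cluster_id = lines[2].split()[1]             # Line3: "#ClusterID <id>"
    features = lines[3].lstrip("#").split()      # Line4: column header
    width = len(features)

    rows = []
    for ln in lines[4:]:                         # Line5 onwards: the alignment
        fields = ln.split()
        if len(fields) != width * len(samples):
            raise ValueError(f"unexpected number of columns: {ln!r}")
        # Slice the row into one block of `width` columns per sample.
        rows.append({
            sample: dict(zip(features, fields[i * width:(i + 1) * width]))
            for i, sample in enumerate(samples)
        })
    return {"score": score, "samples": samples,
            "cluster_id": cluster_id, "features": features, "rows": rows}


example = """\
#Score: 18098.060000000005
#Samples    sample1 sample2 sample3
#ClusterID  cluster1
#TE-fam length  div
ROXELEMENT  235.0   25.5    ROXELEMENT  235.0   25.5    ROXELEMENT  235.0   25.5
INE1    52.0    11.8    INE1    52.0    11.8    -   -   -
BS3 170.0   4.1 BS3 170.0   4.7 BS3 168.0   3.6
"""

parsed = parse_msa(example)
# The second aligned position is a gap in sample3.
print(parsed["rows"][1]["sample3"])
```

Because the number of columns per sample is taken from Line4, the same sketch should also handle output with a different number of features per sample, as long as the whitespace assumption holds.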