validation_msms

Robert Kofler

Introduction

To generate more realistic haplotypes for the population samples, we simulate a neutral coalescent with msms.
In addition to the segregating SNPs, we introduce a number of fixed insertions at random positions.

Material and Methods

Generating a neutral SFS with msms

We used the following command to perform neutral coalescent simulations with msms (https://www.mabs.at/ewing/msms/index.shtml). It simulates 5 haplotypes in 1 replicate with a population mutation rate (theta) of 50.

java -jar msms.jar 5 1 -t 50

We use only the haplotype part of the output:

11100101010101010011001001001010000100000000001100000011010001011000011100011101000100101001101010
11011111000001110000001000110011000001110000010100001101000001011001011111011000110100101010100000
00000000101010001100010100000100111010001100100010010000101110100000000000100010000001010100010101
11100101000001010000001010000010000100000011000100100011000001011000111100011100001110101001101010
11100101000001010000101000000010000100000000000101000011000001011110111100011100000110101001101010

Note that for our purposes 1 indicates the presence of a TE and 0 the absence.
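
As a minimal sketch of how the haplotype block can be read, the following Python snippet counts, for each site, how many haplotypes carry a TE (a 1), and summarises these counts as a site frequency spectrum. The function names and the toy input are assumptions made for illustration; they are not part of msms or of our scripts.

```python
from collections import Counter

def site_counts(haplotypes):
    """Return, for each site, the number of haplotypes carrying a 1 (TE present)."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    return [sum(h[i] == "1" for h in haplotypes) for i in range(length)]

def site_frequency_spectrum(haplotypes):
    """Map 'number of carriers' to 'number of sites with that many carriers'."""
    return Counter(site_counts(haplotypes))

# Toy input (not the msms output shown above): 3 haplotypes, 4 sites
toy = ["1100", "1010", "1000"]
print(site_counts(toy))              # [3, 1, 1, 0]
print(site_frequency_spectrum(toy))
```

The same functions can be applied to the five haplotype strings above after reading them into a list of strings.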

Translating the msms output to a pgd

To use the haplotypes simulated by msms with SimulaTE we need to translate the haplotypes into a pgd file. Furthermore, we may introduce a variable number of fixed insertions. Note that this approach retains the haplotype information (and not just the allele frequency).

The pgd-template can be found here https://sourceforge.net/projects/manna/files/validation/template.pgd
The other files are available in the folder https://sourceforge.net/projects/manna/files/validation/val3-msms/

python pgdg-msms.py --msms states.txt --fixed 100 --template template.pgd > coal.pgd

We have 100 fixed TE insertions.
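
The translation step can be pictured with the following sketch. It is not the code of pgdg-msms.py; it only illustrates the idea under stated assumptions: each msms column becomes one segregating insertion, fixed insertions are added on top, every insertion gets a distinct random position, and TE IDs are drawn at random. The function name and the parameters `chassis_len`, `n_te_ids` and `seed` are hypothetical, and the header taken from the template is omitted.

```python
import random

def haplotypes_to_rows(haplotypes, n_fixed, chassis_len, n_te_ids, seed=0):
    """Build pgd-like body rows: 'position entry_hap1 ... entry_hapN'.

    Segregating rows follow the msms columns (1 -> TE ID, 0 -> '*');
    fixed rows carry the same TE ID in every haplotype.
    """
    rng = random.Random(seed)
    n_hap = len(haplotypes)
    n_seg = len(haplotypes[0])
    # Distinct random positions for all insertions (an assumption of this sketch)
    positions = rng.sample(range(1, chassis_len + 1), n_seg + n_fixed)
    rows = []
    for site, pos in enumerate(positions[:n_seg]):
        te_id = rng.randint(1, n_te_ids)  # hypothetical: random TE per insertion
        entries = [str(te_id) if h[site] == "1" else "*" for h in haplotypes]
        rows.append((pos, entries))
    for pos in positions[n_seg:]:
        te_id = rng.randint(1, n_te_ids)
        rows.append((pos, [str(te_id)] * n_hap))
    rows.sort(key=lambda r: r[0])  # order rows by position in the chassis
    return [" ".join([str(pos)] + entries) for pos, entries in rows]

# Toy example: 3 haplotypes, 2 segregating sites, 2 fixed insertions
for line in haplotypes_to_rows(["10", "01", "11"], n_fixed=2,
                               chassis_len=1000, n_te_ids=5):
    print(line)
```

Because the segregating rows are built column by column from the haplotypes, the linkage between insertions is kept, which is the point of retaining the haplotype information.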

The coal.pgd may now look, for example, like this:

....
424002 * * 57 * *
424321 73 73 73 73 73
428975 24 * * 24 24
430151 * * 2 * *
432074 * * 40 * *
434002 99 99 99 99 99
438997 21 21 21 21 21
443945 12 * * 12 12
444237 * 48 * * *
450511 25 25 25 25 25
451394 101 101 101 101 101
452356 120 120 120 120 120
455739 11 11 * 11 11
456264 * * 101 * *
462011 79 79 79 79 79
464137 * * 35 * *
468605 78 78 * 78 78
477569 * * * 98 *
...

Note the presence of both fixed and segregating insertions. The segregating insertions follow the haplotypes simulated with msms.

Next we generate the fasta sequences for the 5 haplotypes from the pgd file:

python ~/dev/simulates/build-population-genome.py --chassis ../chasis.txt --te-seqs ../teseq.fasta --pgd coal.pgd --output coal.fasta

We run RepeatMasker

RepeatMasker --frag 2000000 -pa 5 -no_is -s -nolow -dir . -lib ../teseq-clean-ml100noS4.fasta coal.fasta &    

and perform a multiple alignment with Manna

python ~/dev/manna/cluster-msa.py --clusters "" --sample-IDs "" --quick-rm coal.fasta.out > coal.manna

Results

In this complex scenario it is difficult to analyse the output automatically, e.g. with a script. In fact, we would probably need an alignment algorithm to compare the observed and the expected output. We therefore opted to show both alignments next to each other, which allows an intuitive comparison of the expected and the observed alignment.

Each row is a TE insertion. The first six columns are the expected alignment (PGD; stars indicate absent insertions; the very first column is the position in the chassis). The last five columns are the observed alignment (Manna; dashes indicate the absence of TEs). The second row gives the sample IDs. Note that the ordering of the samples changes during a progressive alignment (the most closely related samples are aligned first). For better readability we suggest copying these results into Excel or Google Sheets.

EXPECTED (PGD)                              OBSERVED (manna)                
SampleID    hg1 hg2 hg3 hg4 hg5         hg4 hg5 hg1 hg2 hg3
3022    TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4            TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4
4151    *   *   G3  *   *           -   -   -   -   G3
7184    FW3 FW3 FW3 FW3 FW3         FW3 FW3 FW3 FW3 FW3
9552    DMIS297 DMIS297 DMIS297 DMIS297 DMIS297         DMIS297 DMIS297 DMIS297 DMIS297 DMIS297
9598    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA            DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA
9924    IVK IVK IVK IVK IVK         IVK IVK IVK IVK IVK
10976   GYPSY11 GYPSY11 GYPSY11 GYPSY11 GYPSY11         GYPSY11 GYPSY11 GYPSY11 GYPSY11 GYPSY11
11817   M14653  M14653  M14653  M14653  M14653          M14653  M14653  M14653  M14653  M14653
11980   *   INE1    *   *   *           -   -   -   -   FROGGER
17742   *   *   FROGGER *   *           -   -   -   -   DMDM11
18360   *   INVADER *   *   *           -   -   -   INE1    -
21731   *   ACCORD  *   *   *           -   -   -   INVADER -
22874   *   *   DMDM11  *   *           -   -   -   ACCORD  -
23214   *   INVADER2    *   *   *           -   -   -   INVADER2    -
25366   ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR            ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR
25535   *   *   GTWIN   *   *           -   -   -   -   GTWIN
25988   DME487856   DME487856   *   DME487856   DME487856           DME487856   DME487856   DME487856   DME487856   -
26227   GYPSY5  *   *   *   *           -   -   GYPSY5  -   -
26595   STALKER3    STALKER3    *   STALKER3    STALKER3            STALKER3    STALKER3    STALKER3    STALKER3    -
30233   DMCR1A  DMCR1A  DMCR1A  DMCR1A  DMCR1A          DMCR1A  DMCR1A  DMCR1A  DMCR1A  DMCR1A
30875   DMGYPF1A    *   *   DMGYPF1A    DMGYPF1A            -   -   -   DMU89994    -
31608   *   DMU89994    *   *   *           DMGYPF1A    DMGYPF1A    DMGYPF1A    -   -
34516   AF222049    AF222049    AF222049    AF222049    AF222049            AF222049    AF222049    AF222049    AF222049    AF222049
41129   DOC5    DOC5    DOC5    DOC5    DOC5            DOC5    DOC5    DOC5    DOC5    DOC5
44026   DME278684   DME278684   DME278684   DME278684   DME278684           DME278684   DME278684   DME278684   DME278684   DME278684
44457   *   *   AF418572    *   *           -   -   -   -   AF418572
47122   ROVER   ROVER   ROVER   ROVER   ROVER           ROVER   ROVER   ROVER   ROVER   ROVER
47916   JUAN    JUAN    *   JUAN    JUAN            JUAN    JUAN    JUAN    JUAN    -
49098   AF418572    AF418572    *   AF418572    AF418572            AF418572    AF418572    AF418572    AF418572    -
49717   M14653  M14653  M14653  M14653  M14653          M14653  M14653  M14653  M14653  M14653
53328   TABOR   TABOR   TABOR   TABOR   TABOR           TABOR   TABOR   TABOR   TABOR   TABOR
53788   SPRINGER    SPRINGER    SPRINGER    SPRINGER    SPRINGER            SPRINGER    SPRINGER    SPRINGER    SPRINGER    SPRINGER
65165   ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR            ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR
69574   DIVER2  DIVER2  DIVER2  DIVER2  DIVER2          DIVER2  DIVER2  DIVER2  DIVER2  DIVER2
74728   DM33463 DM33463 DM33463 DM33463 DM33463         DM33463 DM33463 DM33463 DM33463 DM33463
78299   HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM            HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM
78465   Tinker  Tinker  *   Tinker  Tinker          Tinker  Tinker  Tinker  Tinker  -
84662   *   INVADER4    *   *   *           -   -   -   INVADER4    -
86937   McCLINTOCK  McCLINTOCK  *   McCLINTOCK  McCLINTOCK          McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK  -
88009   DMIS176 DMIS176 DMIS176 DMIS176 DMIS176         DMIS176 DMIS176 DMIS176 DMIS176 DMIS176
89407   *   *   BAGGINS *   *           -   -   -   -   BAGGINS
91526   IVK IVK IVK IVK IVK         IVK IVK IVK IVK IVK
92638   *   *   GYPSY3  *   *           -   -   -   -   GYPSY3
94029   IVK IVK IVK IVK IVK         IVK IVK IVK IVK IVK
94696   STALKER2    STALKER2    STALKER2    STALKER2    STALKER2            STALKER2    STALKER2    STALKER2    STALKER2    STALKER2
97831   BLOOD   BLOOD   BLOOD   BLOOD   BLOOD           BLOOD   BLOOD   BLOOD   BLOOD   BLOOD
100036  *   GYPSY12 *   *   *           -   -   -   -   DME278684
102727  *   *   DME278684   *   *           -   -   -   GYPSY12 -
103807  GYPSY8  GYPSY8  GYPSY8  GYPSY8  GYPSY8          GYPSY8  GYPSY8  GYPSY8  GYPSY8  GYPSY8
104038  STALKER3    STALKER3    STALKER3    STALKER3    STALKER3            STALKER3    STALKER3    STALKER3    STALKER3    STALKER3
105946  DOC3    DOC3    DOC3    DOC3    DOC3            DOC3    DOC3    DOC3    DOC3    DOC3
108834  *   *   DMBLPP  *   *           -   -   -   -   DMBLPP
109415  DMBARI1 *   *   *   *           -   -   DMBARI1 -   -
111704  S2  S2  S2  S2  S2          S2  S2  S2  S2  S2
112092  OPUS    OPUS    OPUS    OPUS    OPUS            OPUS    OPUS    OPUS    OPUS    OPUS
115183  *   GYPSY4  *   *   *           -   -   -   GYPSY4  -
119099  *   DOC5    *   *   *           -   -   -   DOC5    -
123241  GYPSY10 GYPSY10 GYPSY10 GYPSY10 GYPSY10         GYPSY10 GYPSY10 GYPSY10 GYPSY10 GYPSY10
133622  *   G6_DM   *   *   *           -   -   -   -   DMTNFB
137370  *   *   *   *   DMLINEJA            -   -   -   G6_DM   -
144341  *   GYPSY8  *   *   *           -   -   -   GYPSY8  -
145077  *   *   DMTNFB  *   *           -   DMLINEJA    -   -   -
145727  *   *   *   McCLINTOCK  *           McCLINTOCK  -   -   -   -
150146  RT1C    RT1C    RT1C    RT1C    RT1C            RT1C    RT1C    RT1C    RT1C    RT1C
154053  TC3 TC3 TC3 TC3 TC3         TC3 TC3 TC3 TC3 TC3
154627  DMW1DOC DMW1DOC *   DMW1DOC DMW1DOC         DMW1DOC DMW1DOC DMW1DOC DMW1DOC -
159086  TRANSIB1    TRANSIB1    TRANSIB1    TRANSIB1    TRANSIB1            TRANSIB1    TRANSIB1    TRANSIB1    TRANSIB1    TRANSIB1
164031  DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA            DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA
168879  *   *   DOC3    *   *           -   -   -   -   DOC3
169092  S2  *   *   S2  S2          S2  S2  S2  -   -
173398  TABOR   TABOR   TABOR   TABOR   TABOR           TABOR   TABOR   TABOR   TABOR   TABOR
173768  LOOPER1_DM  *   *   *   *           -   -   LOOPER1_DM  -   -
176426  HOPPER2 HOPPER2 HOPPER2 HOPPER2 HOPPER2         HOPPER2 HOPPER2 HOPPER2 HOPPER2 HOPPER2
180767  *   GYPSY7  *   *   *           -   -   -   GYPSY7  -
181012  DME010298   DME010298   DME010298   DME010298   DME010298           DME010298   DME010298   DME010298   DME010298   DME010298
181956  DMTHB1  DMTHB1  DMTHB1  DMTHB1  DMTHB1          DMTHB1  DMTHB1  DMTHB1  DMTHB1  DMTHB1
183960  FB  FB  FB  FB  FB          FB  FB  FB  FB  FB
186230  HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM            HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM
191147  DMBLPP  DMBLPP  DMBLPP  DMBLPP  DMBLPP          DMBLPP  DMBLPP  DMBLPP  DMBLPP  DMBLPP
191740  *   G4_DM   *   *   *           -   -   -   -   DMIS176
193790  *   INVADER5    *   *   *           -   -   -   G4_DM   -
196046  *   *   DMIS176 *   *           -   -   -   INVADER5    -
197919  INVADER2    INVADER2    INVADER2    INVADER2    INVADER2            INVADER2    INVADER2    INVADER2    INVADER2    INVADER2
198496  STALKER3    STALKER3    *   STALKER3    STALKER3            STALKER3    STALKER3    STALKER3    STALKER3    -
201078  DME9736 DME9736 DME9736 DME9736 DME9736         DME9736 DME9736 DME9736 DME9736 DME9736
201731  DME487856   *   *   DME487856   DME487856           -   -   -   -   G5_DM
202402  *   *   G5_DM   *   *           -   -   -   GYPSY8  -
203348  *   GYPSY8  *   *   *           DME487856   DME487856   DME487856   -   -
214859  *   *   *   DMIS176 *           DMIS176 -   -   -   -
216305  DMHFL1  DMHFL1  DMHFL1  DMHFL1  DMHFL1          DMHFL1  DMHFL1  DMHFL1  DMHFL1  DMHFL1
216313  *   *   AF222049    *   *           -   -   -   -   AF222049
223151  DMRER1DM    DMRER1DM    DMRER1DM    DMRER1DM    DMRER1DM            DMRER1DM    DMRER1DM    DMRER1DM    DMRER1DM    DMRER1DM
223522  TC3 TC3 *   TC3 TC3         TC3 TC3 TC3 TC3 -
225542  DMHFL1  DMHFL1  DMHFL1  DMHFL1  DMHFL1          DMHFL1  DMHFL1  DMHFL1  DMHFL1  DMHFL1
233734  Beagle  Beagle  *   Beagle  Beagle          Beagle  Beagle  Beagle  Beagle  -
241049  DMCR1A  DMCR1A  DMCR1A  DMCR1A  DMCR1A          DMCR1A  DMCR1A  DMCR1A  DMCR1A  DMCR1A
242756  QUASIMODO   QUASIMODO   QUASIMODO   QUASIMODO   QUASIMODO           QUASIMODO   QUASIMODO   QUASIMODO   QUASIMODO   QUASIMODO
243894  TC3 TC3 TC3 TC3 TC3         TC3 TC3 TC3 TC3 TC3
244296  DMZAM   DMZAM   *   DMZAM   DMZAM           DMZAM   DMZAM   DMZAM   DMZAM   -
247346  *   *   TABOR   *   *           -   -   -   -   TABOR
249407  Tinker  Tinker  Tinker  Tinker  Tinker          Tinker  Tinker  Tinker  Tinker  Tinker
249569  HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM            HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM
249819  *   *   *   *   DMRTMGD1            -   DMRTMGD1    -   -   -
252516  DM_ROO  DM_ROO  DM_ROO  DM_ROO  DM_ROO          DM_ROO  DM_ROO  DM_ROO  DM_ROO  DM_ROO
253816  *   *   TRANSIB3    *   *           -   -   -   -   TRANSIB3
255576  DMDM11  DMDM11  DMDM11  DMDM11  DMDM11          DMDM11  DMDM11  DMDM11  DMDM11  DMDM11
258801  *   *   ACCORD  *   *           -   -   -   -   ACCORD
261353  *   *   *   *   HOPPER2         -   -   -   -   STALKER2
262707  LOOPER1_DM  *   *   *   *           -   -   -   DMTN1731    -
266616  *   DMTN1731    *   *   *           -   -   LOOPER1_DM  -   -
267398  *   *   STALKER2    *   *           -   HOPPER2 -   -   -
269657  DME010298   DME010298   *   DME010298   DME010298           DME010298   DME010298   DME010298   DME010298   -
271462  McCLINTOCK  McCLINTOCK  *   McCLINTOCK  McCLINTOCK          McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK  -
287721  *   GYPSY7  *   *   *           -   -   -   GYPSY7  -
289621  *   *   DMIS297 *   *           -   -   -   -   DMIS297
296869  DME010298   DME010298   DME010298   DME010298   DME010298           DME010298   DME010298   DME010298   DME010298   DME010298
300271  *   M14653  *   *   *           -   -   -   M14653  -
300501  GYPSY7  GYPSY7  GYPSY7  GYPSY7  GYPSY7          GYPSY7  GYPSY7  GYPSY7  GYPSY7  GYPSY7
301318  HEL HEL HEL HEL HEL         HEL HEL HEL HEL HEL
303431  *   *   TRANSIB1    *   *           -   -   -   -   TRANSIB1
305874  *   *   *   G6_DM   G6_DM           G6_DM   G6_DM   -   -   -
306279  DMBLPP  DMBLPP  DMBLPP  DMBLPP  DMBLPP          DMBLPP  DMBLPP  DMBLPP  DMBLPP  DMBLPP
312690  DMTHB1  DMTHB1  *   DMTHB1  DMTHB1          DMTHB1  DMTHB1  DMTHB1  DMTHB1  -
315761  INVADER5    INVADER5    INVADER5    INVADER5    INVADER5            INVADER5    INVADER5    INVADER5    INVADER5    INVADER5
318747  *   *   DMIFACA *   *           -   -   -   -   DMIFACA
323746  TC1 TC1 TC1 TC1 TC1         TC1 TC1 TC1 TC1 TC1
325137  *   *   DMCOPIA *   *           -   -   -   -   DMCOPIA
325231  S2  S2  S2  S2  S2          S2  S2  S2  S2  S2
328443  GYPSY12 GYPSY12 GYPSY12 GYPSY12 GYPSY12         GYPSY12 GYPSY12 GYPSY12 GYPSY12 GYPSY12
330136  AF541951    AF541951    AF541951    AF541951    AF541951            AF541951    AF541951    AF541951    AF541951    AF541951
330876  AF222049    AF222049    AF222049    AF222049    AF222049            AF222049    AF222049    AF222049    AF222049    AF222049
342517  DME9736 *   *   *   *           -   -   DME9736 -   -
346889  DMDM11  DMDM11  DMDM11  DMDM11  DMDM11          DMDM11  DMDM11  DMDM11  DMDM11  DMDM11
351166  DMRTMGD1    DMRTMGD1    DMRTMGD1    DMRTMGD1    DMRTMGD1            DMRTMGD1    DMRTMGD1    DMRTMGD1    DMRTMGD1    DMRTMGD1
357098  *   *   *   RT1C    RT1C            RT1C    RT1C    -   -   -
362583  INVADER6    INVADER6    INVADER6    INVADER6    INVADER6            INVADER6    INVADER6    INVADER6    INVADER6    INVADER6
363111  *   GYPSY9  *   *   *           -   -   -   GYPSY9  -
375153  *   *   *   S2  *           S2  -   -   -   -
376951  DMREPG  DMREPG  DMREPG  DMREPG  DMREPG          DMREPG  DMREPG  DMREPG  DMREPG  DMREPG
379081  JUAN    JUAN    *   JUAN    JUAN            JUAN    JUAN    JUAN    JUAN    -
379151  G5A G5A G5A G5A G5A         G5A G5A G5A G5A G5A
382935  HOPPER2 HOPPER2 HOPPER2 HOPPER2 HOPPER2         HOPPER2 HOPPER2 HOPPER2 HOPPER2 HOPPER2
388126  McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK          McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK  -
398825  McCLINTOCK  McCLINTOCK  *   McCLINTOCK  McCLINTOCK          McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK  McCLINTOCK
404944  GYPSY11 *   *   *   *           -   -   GYPSY11 -   -
406993  INVADER3    INVADER3    INVADER3    INVADER3    INVADER3            INVADER3    INVADER3    INVADER3    INVADER3    INVADER3
407458  ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR            ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR    ROOA_LTR
408777  GYPSY4  GYPSY4  GYPSY4  GYPSY4  GYPSY4          GYPSY4  GYPSY4  GYPSY4  GYPSY4  GYPSY4
411977  *   *   INVADER2    *   *           -   -   -   -   INVADER2
417455  HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM            HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM    HELITRON1_DM
420126  DM33463 DM33463 DM33463 DM33463 DM33463         DM33463 DM33463 DM33463 DM33463 DM33463
420518  *   *   DMRER1DM    *   *           -   -   -   -   DMRER1DM
427586  FW3 *   *   *   *           -   -   -   -   TC1-2
428634  HOPPER2 *   *   *   *           -   -   FW3 -   -
429999  *   *   TC1-2   *   *           -   -   HOPPER2 -   -
433192  DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11           DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11
433959  DMCOPIA *   *   DMCOPIA DMCOPIA         DMCOPIA DMCOPIA DMCOPIA -   -
433979  GYPSY11 *   *   GYPSY11 GYPSY11         GYPSY11 GYPSY11 GYPSY11 -   -
442398  *   *   *   INVADER2    *           INVADER2    -   -   -   -
445002  DMW1DOC DMW1DOC *   DMW1DOC DMW1DOC         DMW1DOC DMW1DOC DMW1DOC DMW1DOC -
447801  GYPSY4  GYPSY4  GYPSY4  GYPSY4  GYPSY4          GYPSY4  GYPSY4  GYPSY4  GYPSY4  GYPSY4
448097  *   *   G2  *   *           -   -   -   -   G2
454439  PPI251  *   *   *   *           -   -   PPI251  -   -
455258  RT1C    RT1C    RT1C    RT1C    RT1C            RT1C    RT1C    RT1C    RT1C    RT1C
456628  *   *   TC1-2   *   *           -   -   -   -   TC1-2
460457  GYPSY9  GYPSY9  GYPSY9  GYPSY9  GYPSY9          GYPSY9  GYPSY9  GYPSY9  GYPSY9  GYPSY9
466042  *   *   *   DMIFACA *           DMIFACA -   -   -   -
466224  DME010298   DME010298   *   DME010298   DME010298           DME010298   DME010298   DME010298   DME010298   -
467218  Tinker  Tinker  Tinker  Tinker  Tinker          Tinker  Tinker  Tinker  Tinker  Tinker
468969  DME9736 DME9736 *   DME9736 DME9736         DME9736 DME9736 DME9736 DME9736 -
471716  DMTHB1  DMTHB1  DMTHB1  DMTHB1  DMTHB1          DMTHB1  DMTHB1  DMTHB1  DMTHB1  DMTHB1
471976  AF541951    AF541951    AF541951    AF541951    AF541951            AF541951    AF541951    AF541951    AF541951    AF541951
473252  DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11           DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11
475050  GYPSY6  GYPSY6  GYPSY6  GYPSY6  GYPSY6          GYPSY6  GYPSY6  GYPSY6  GYPSY6  GYPSY6
475591  INVADER4    INVADER4    INVADER4    INVADER4    INVADER4            INVADER4    INVADER4    INVADER4    INVADER4    INVADER4
476267  Tinker  Tinker  Tinker  Tinker  Tinker          Tinker  Tinker  Tinker  Tinker  Tinker
476914  TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4            TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4    TRANSIB4
478084  DOC5    DOC5    DOC5    DOC5    DOC5            DOC5    DOC5    DOC5    DOC5    DOC5
481997  *   *   GYPSY9  *   *           -   -   -   -   GYPSY9
483554  DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA            DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA    DMLINEJA
489278  BS4 BS4 *   BS4 BS4         BS4 BS4 BS4 BS4 -
489390  DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11           DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11   DMPOGOR11
490351  FROGGER *   *   FROGGER FROGGER         FROGGER FROGGER FROGGER -   -
492299  G7  G7  G7  G7  G7          G7  G7  G7  G7  G7
493417  JUAN    JUAN    JUAN    JUAN    JUAN            JUAN    JUAN    JUAN    JUAN    JUAN
500584  *   *   IVK *   *           -   -   -   -   IVK
506326  FW2 FW2 *   FW2 FW2         FW2 FW2 FW2 FW2 -
508802  GYPSY5  GYPSY5  GYPSY5  GYPSY5  GYPSY5          GYPSY5  GYPSY5  GYPSY5  GYPSY5  GYPSY5
508805  GYPSY8  GYPSY8  *   GYPSY8  GYPSY8          GYPSY8  GYPSY8  GYPSY8  GYPSY8  -
511708  JOCKEY2 *   *   JOCKEY2 JOCKEY2         JOCKEY2 JOCKEY2 JOCKEY2 -   -
512188  G6_DM   G6_DM   G6_DM   G6_DM   G6_DM           G6_DM   G6_DM   G6_DM   G6_DM   G6_DM
512631  TRANSIB3    TRANSIB3    TRANSIB3    TRANSIB3    TRANSIB3            TRANSIB3    TRANSIB3    TRANSIB3    TRANSIB3    TRANSIB3
516072  *   *   1360    *   *           -   -   -   -   1360
520001  *   *   *   *   ROOA_LTR            -   ROOA_LTR    -   -   -
524762  AF222049    AF222049    AF222049    AF222049    AF222049            AF222049    AF222049    AF222049    AF222049    AF222049
529348  412 412 412 412 412         412 412 412 412 412
530434  AF418572    AF418572    AF418572    AF418572    AF418572            AF418572    AF418572    AF418572    AF418572    AF418572
533360  RT1B    RT1B    RT1B    RT1B    RT1B            RT1B    RT1B    RT1B    RT1B    RT1B

Conclusion

The observed and the expected alignments are remarkably similar. Only the ordering of segregating insertions is not always accurately reproduced, but this is expected.
To see why, consider the two DNA sequences 'ATG' and 'ACG'. If we perform a pairwise alignment of these two sequences with a low gap penalty, we may get two equally valid alignments:

Alignment1:
A-TG
AC-G

Alignment2:
AT-G
A-CG
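
The tie between the two alignments can be checked with a simple column-wise score. The scoring values below (match 1, mismatch -1, gap -0.25) are assumptions chosen only to represent a low gap penalty; they are not the parameters used by Manna.

```python
def alignment_score(top, bottom, match=1.0, mismatch=-1.0, gap=-0.25):
    """Sum of column scores for two equal-length aligned strings ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # one sequence has a gap in this column
        elif a == b:
            score += match
        else:
            score += mismatch
    return score

print(alignment_score("A-TG", "AC-G"))  # Alignment1: 1.5
print(alignment_score("AT-G", "A-CG"))  # Alignment2: 1.5
print(alignment_score("ATG", "ACG"))    # ungapped, one mismatch: 1.0
```

With these values both gapped alignments score the same and beat the ungapped one, so an aligner has no basis for preferring one column order over the other.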

However, the information that matters for analysing TEs in piRNA clusters (or other repetitive regions), namely the population frequency of the different TE insertions, is accurately reproduced.
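
One way to check this without relying on the row order is to compare, for each insertion, the TE family and the set of samples carrying it. The sketch below does this for the five rows at positions 11980 to 22874 of the table above (positions omitted, sample order taken from the header row). The function name and the data layout are assumptions made for illustration and are not part of Manna.

```python
from collections import Counter

ABSENT = {"*", "-"}  # '*' in the PGD, '-' in the Manna output

def carrier_profile(sample_ids, rows):
    """Count each (TE family, set of carrier samples) pair, ignoring row order."""
    profile = Counter()
    for row in rows:
        families = {e for e in row if e not in ABSENT}
        if len(families) != 1:
            raise ValueError(f"expected exactly one family per row: {row}")
        carriers = frozenset(s for s, e in zip(sample_ids, row) if e not in ABSENT)
        profile[(families.pop(), carriers)] += 1
    return profile

expected = carrier_profile(
    ["hg1", "hg2", "hg3", "hg4", "hg5"],
    [["*", "INE1", "*", "*", "*"],
     ["*", "*", "FROGGER", "*", "*"],
     ["*", "INVADER", "*", "*", "*"],
     ["*", "ACCORD", "*", "*", "*"],
     ["*", "*", "DMDM11", "*", "*"]])
observed = carrier_profile(
    ["hg4", "hg5", "hg1", "hg2", "hg3"],
    [["-", "-", "-", "-", "FROGGER"],
     ["-", "-", "-", "-", "DMDM11"],
     ["-", "-", "-", "INE1", "-"],
     ["-", "-", "-", "INVADER", "-"],
     ["-", "-", "-", "ACCORD", "-"]])
print(expected == observed)  # True
```

The population frequency of an insertion is the size of its carrier set divided by the number of samples, so identical profiles imply identical frequencies. Note that this comparison deliberately discards positions and therefore cannot detect a changed order.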

Final remark

In these simulations all TE insertions were on the plus strand. In real samples, however, TEs may be on both strands. Manna considers the strand of TEs and will therefore never align a P-element on the plus strand with a P-element on the minus strand. Furthermore, Manna only considers overlapping sequences of TEs; it will thus not align a 5'-fragment of a TE (say the first 1000 bp of the P-element) with a 3'-fragment of the same TE (say the last 1000 bp of the P-element; the P-element has a length of 2907 bp).
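
The two rules can be summarised as a small predicate. This is a sketch of the rules as described here, not Manna's implementation; the class `TEFragment`, its fields and the function `may_align` are hypothetical names, and coordinates are assumed to be 1-based and inclusive within the TE sequence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TEFragment:
    family: str   # TE family name, e.g. "P-element"
    strand: str   # "+" or "-"
    start: int    # 1-based start within the TE sequence
    end: int      # 1-based inclusive end within the TE sequence

def may_align(a, b):
    """True if two fragments share family and strand and overlap within the TE."""
    same_family = a.family == b.family
    same_strand = a.strand == b.strand
    overlap = a.start <= b.end and b.start <= a.end
    return same_family and same_strand and overlap

five_prime = TEFragment("P-element", "+", 1, 1000)      # first 1000 bp
three_prime = TEFragment("P-element", "+", 1908, 2907)  # last 1000 bp of 2907 bp
minus_copy = TEFragment("P-element", "-", 1, 1000)

print(may_align(five_prime, three_prime))  # False: no overlap within the TE
print(may_align(five_prime, minus_copy))   # False: different strands
```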

