To allow benchmarking of tools that identify TE insertions from Pool-Seq data, we provide paired-end reads for 1000 TE insertions in a region of an artificial chromosome that is devoid of repeats. The insertions have random positions, families and TE sequences. These data were used to generate the table in the main manuscript in which we compare the performance of PoPoolationTE, PoPoolationTE2 and TEMP.
known TE insertions
The position, family, strand and population frequency of the simulated TE insertions can be found in the following file. The insertions identified with a tool of interest should be compared to this data set.
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/statistics-freqrange001to10.txt/download
**the artificial reference chromosome**
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/chasis1M.fasta.zip/download
the consensus sequences of the TE insertions
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/teseq-clean-ml100noS4.fasta/download
a hierarchy of the TE insertions
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/tehier-ml100noS4.fasta/download
the simulated paired-end reads
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/chi2_1.fastq.zip/download
https://sourceforge.net/projects/popoolation-te2/files/te-benchmark/chi2_2.fastq.zip/download
TE insertions identified with these reads should have the same position, frequency, strand and family as listed in the file of known insertions.
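
The comparison against the known insertions could be scripted along the following lines. This is only a minimal sketch: the `Insertion` record, the `match_insertions` function, the tolerance parameters `max_dist` and `max_freq_diff`, and the example values (chromosome and family names) are all hypothetical. The column layout of the statistics file is not described here, so parsing it and the output of your tool into `Insertion` records is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    # Hypothetical record; fill it from the known-insertions file
    # and from the output of the tool being benchmarked.
    chrom: str
    pos: int
    family: str
    strand: str
    freq: float


def match_insertions(known, found, max_dist=0, max_freq_diff=0.0):
    """Greedy one-to-one matching of found insertions to known insertions.

    A found insertion matches a known one if chromosome, family and strand
    are identical, the positions differ by at most max_dist and the
    frequencies differ by at most max_freq_diff. The defaults demand an
    exact match; the tolerances are an assumption for tools that report
    approximate positions or frequencies.
    """
    unmatched = list(found)
    matches, missed = [], []
    for k in known:
        candidates = [
            f for f in unmatched
            if f.chrom == k.chrom
            and f.family == k.family
            and f.strand == k.strand
            and abs(f.pos - k.pos) <= max_dist
            and abs(f.freq - k.freq) <= max_freq_diff
        ]
        if candidates:
            # Take the closest candidate and remove it so it is used only once.
            best = min(candidates, key=lambda f: abs(f.pos - k.pos))
            unmatched.remove(best)
            matches.append((k, best))
        else:
            missed.append(k)
    # Whatever remains in unmatched has no known counterpart (false positives).
    return matches, missed, unmatched


# Made-up example values, not taken from the benchmark files.
known = [
    Insertion("chrA", 1000, "famA", "+", 0.5),
    Insertion("chrA", 5000, "famB", "-", 0.1),
]
found = [
    Insertion("chrA", 1000, "famA", "+", 0.5),
    Insertion("chrA", 9000, "famC", "+", 0.2),
]
matches, missed, false_positives = match_insertions(known, found)
print(len(matches), len(missed), len(false_positives))
```

With the default tolerances only exact agreement counts as a true positive; loosening `max_dist` or `max_freq_diff` shows how close a tool gets when it does not reproduce the simulated values exactly.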