detectMITE / Blog: Recent posts

[Q2]: What does the description line mean in the output (in FASTA format) of detectMITE?

A: The description line of each MITE sequence (e.g., >6|2367817|2367964|2) has the following format:
>Chromosome|GenomicStartPosition|GenomicStopPosition|TSD_Length
The description line of the representative sequence of each MITE family (e.g., >6|2367817|2367964|2|109) has one additional field:
>Chromosome|GenomicStartPosition|GenomicStopPosition|TSD_Length|CopyNumber
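
For downstream scripts, the two description-line layouts above can be split on the `|` character. The following is a minimal Python sketch, not part of detectMITE: the names `MiteHeader` and `parse_header` are hypothetical, and it assumes the chromosome field is best kept as a string because it may not always be numeric.

```python
from typing import NamedTuple, Optional


class MiteHeader(NamedTuple):
    """Fields of a detectMITE FASTA description line (hypothetical container)."""
    chromosome: str              # kept as text; assumption: may be non-numeric
    start: int                   # GenomicStartPosition
    stop: int                    # GenomicStopPosition
    tsd_length: int              # TSD_Length
    copy_number: Optional[int]   # CopyNumber, only on family representatives


def parse_header(line: str) -> MiteHeader:
    """Parse '>Chr|Start|Stop|TSD' or '>Chr|Start|Stop|TSD|CopyNumber'."""
    # Drop the leading '>' and tolerate stray spaces around each field.
    fields = [f.strip() for f in line.strip().lstrip(">").split("|")]
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 '|'-separated fields: {line!r}")
    start, stop, tsd_length = (int(x) for x in fields[1:4])
    copy_number = int(fields[4]) if len(fields) == 5 else None
    return MiteHeader(fields[0], start, stop, tsd_length, copy_number)


member = parse_header(">6|2367817|2367964|2")
representative = parse_header(">6|2367817|2367964|2|109")
print(member)
print(representative.copy_number)
```

A missing fifth field is reported as `None` rather than 0, so a member sequence is not mistaken for a family with zero copies.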

Posted by Congting Ye 2015-10-13

[Q1]: Why does detectMITE identify a much larger number of MITE families than other tools or databases?

A: In detectMITE, the criteria for forming a family are stricter than those adopted in MITE Digger, MITE-Hunter and RSPB, which results in a larger number of smaller MITE families. Our rationale is to preserve the completeness and validity of candidate MITEs as far as possible, without losing detailed information that may be useful in downstream analyses. More specifically, we did not cluster MITE candidate sequences into fewer, larger families for the following reasons.

(1) Loose clustering criteria cannot guarantee similarity over the entire length of two potential members, and cannot preserve the structural and sequence signatures of MITEs. For example, if two sequences are partially similar in their internal sequences but have different terminal inverted repeats (TIRs), or vice versa, loose clustering criteria are more likely to place them in the same MITE family. Loose criteria will therefore inflate the copy number of many MITE families and increase the false positive rate of MITE detection. As a generic MITE detection tool, detectMITE should ideally generate small families whose members are similar across their full length. If necessary, users can always run another round of clustering with looser criteria to form MITE superfamilies. Such superfamilies can be very informative for studying the evolution of MITEs across different species, but they are not accurate enough for genome annotation.

(2) For a MITE family generated with loose clustering criteria, a representative sequence cannot always faithfully represent the sequence and structural characteristics of all members of that family. In contrast, the representative sequence of a MITE family generated with strict clustering criteria can be used directly to retrieve the members in the genome with precise boundaries and high similarity.
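
The effect of the clustering threshold on copy number can be illustrated with a toy Python sketch. This is not detectMITE's clustering algorithm: the greedy scheme, the `difflib` similarity ratio, the threshold values and the three made-up sequences are all assumptions chosen only to show the behaviour described in the answer. Two sequences share an internal region but carry different TIRs, and a third is a near-identical copy of the first.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in full-length similarity in [0, 1] (assumption, not detectMITE's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first cluster whose first member is similar enough."""
    clusters = []
    for seq in seqs:
        for members in clusters:
            if similarity(members[0], seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing cluster matched, so this sequence founds a new family.
            clusters.append([seq])
    return clusters


# Toy candidates: TIR + internal sequence + reverse-complement TIR.
INTERNAL = "ACGTACGGTCAT"
copy_a = "GGGGGG" + INTERNAL + "CCCCCC"
copy_a2 = "GGGGGG" + "ACGTACAGTCAT" + "CCCCCC"   # one substitution vs copy_a
other = "AAAAAA" + INTERNAL + "TTTTTT"            # same internal, different TIRs

candidates = [copy_a, copy_a2, other]
strict = greedy_cluster(candidates, threshold=0.9)
loose = greedy_cluster(candidates, threshold=0.4)
print("strict:", [len(c) for c in strict])
print("loose: ", [len(c) for c in loose])
```

With these inputs the strict threshold keeps the sequence with different TIRs in its own family (family sizes 2 and 1), while the loose threshold merges all three into one family and so reports a copy number of 3 for it.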

Posted by Congting Ye 2015-10-13