#130 nearly identical sequence in contigs

Assembly_analysis
open
nobody
Test (8)
5
2011-10-27
2011-10-27
YunjieLiu
No

Hi all,

I used the Celera Assembler to create a contig set for the liver fluke genome.
I then ran a self-alignment of the contigs and found some very long, nearly identical sequences. The sorted BLAST m8 result is below.
Our group thinks these sequences may be artificial duplicates, but we have found no report of this case.
How do these sequences come about, and how should we deal with them?

Best wishes!

ctg120293203258 ctg120293209927 99.98 11912 2 0 3255 15166 56050 44139 0.0 2.360e+04
ctg120293209927 ctg120293203258 99.98 11912 2 0 44139 56050 15166 3255 0.0 2.360e+04
ctg120293194963 ctg120293210163 99.99 10009 1 0 57736 67744 16614 6606 0.0 1.983e+04
ctg120293210163 ctg120293194963 99.99 10009 1 0 6606 16614 67744 57736 0.0 1.983e+04
ctg120293192725 ctg120293199964 99.99 9405 1 0 1 9405 17264 26668 0.0 1.864e+04
ctg120293199964 ctg120293192725 99.99 9405 1 0 17264 26668 1 9405 0.0 1.864e+04
ctg120293192725 ctg120293192737 99.99 9404 1 0 2 9405 9404 1 0.0 1.863e+04
ctg120293192737 ctg120293199964 100.00 9404 0 0 1 9404 26668 17265 0.0 1.864e+04
ctg120293192737 ctg120293192725 99.99 9404 1 0 1 9404 9405 2 0.0 1.863e+04
ctg120293199964 ctg120293192737 100.00 9404 0 0 17265 26668 9404 1 0.0 1.864e+04
ctg120293192725 ctg120293199964 99.99 9403 1 0 10337 19739 26666 17264 0.0 1.863e+04
ctg120293199964 ctg120293192725 99.99 9403 1 0 17264 26666 19739 10337 0.0 1.863e+04
ctg120293192725 ctg120293192737 99.99 9402 1 0 10337 19738 3 9404 0.0 1.863e+04
ctg120293192737 ctg120293192725 99.99 9402 1 0 3 9404 10337 19738 0.0 1.863e+04
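
In a self-alignment every hit is reported twice (A vs B and B vs A), so the listing above is easier to read once reciprocal pairs are collapsed and the relative orientation is made explicit. The following is only a minimal sketch, assuming the standard 12-column BLAST m8 layout (query, subject, % identity, length, mismatches, gap opens, qstart, qend, sstart, send, e-value, bit score). The names `Hit`, `parse_m8`, `canonical_key`, `collapse_reciprocal` and `orientation` are hypothetical helpers, not part of BLAST or the Celera Assembler.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One BLAST m8 row, keeping only the fields used here."""
    query: str
    subject: str
    identity: float
    length: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def parse_m8(line: str) -> Hit:
    """Parse one whitespace-separated m8 line (assumed 12-column layout)."""
    f = line.split()
    return Hit(f[0], f[1], float(f[2]), int(f[3]),
               int(f[6]), int(f[7]), int(f[8]), int(f[9]))


def canonical_key(hit: Hit) -> Tuple:
    """Key that is identical for a hit and its reciprocal (A vs B, B vs A)."""
    a = (hit.query, min(hit.q_start, hit.q_end), max(hit.q_start, hit.q_end))
    b = (hit.subject, min(hit.s_start, hit.s_end), max(hit.s_start, hit.s_end))
    return tuple(sorted((a, b)))


def collapse_reciprocal(lines: Iterable[str]) -> List[Hit]:
    """Keep the first hit seen for each pair of aligned intervals."""
    seen = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        hit = parse_m8(line)
        seen.setdefault(canonical_key(hit), hit)
    return list(seen.values())


def orientation(hit: Hit) -> str:
    """'reverse' when query and subject coordinates run in opposite directions."""
    q_forward = hit.q_start <= hit.q_end
    s_forward = hit.s_start <= hit.s_end
    return "forward" if q_forward == s_forward else "reverse"
```

Feeding the rows above through `collapse_reciprocal` and printing `hit.query`, `hit.subject`, `hit.length` and `orientation(hit)` for each result gives one line per distinct pair of aligned intervals. This matches only exact reciprocal coordinates, so hits whose endpoints differ by a base or two (as several rows above do) stay separate.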

Discussion