On 15/07/11 12:47, David Eccles (gringer) wrote:
> I'll have a look at what comes out if I break the circularity of this
> DNA by removing sequence that includes the start and end pieces together.
Done, sort of. Sorry, I think I forgot to link in the circular reads
sequence in my previous email, so here are both of them:
http://user.interface.org.nz/~gringer/hacking/head100k_interleaved_noN_phiX_region1101.fasta
http://user.interface.org.nz/~gringer/hacking/linearised_head100k_interleaved_noN_phiX_region1101.fasta
Curiously, with this connecting sequence removed, Ray finds a seed on an
additional rank (rank 6):
$ mpirun -np 10 ../../code/Ray -i
~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/linearised_head100k_interleaved_noN_phiX_region1101.fasta
Rank 0 is extending seeds [0/0] (completed)
Rank 5 is extending seeds [0/0] (completed)
Rank 5 extended 0 seeds out of 0 (-nan%)
Rank 4 is extending seeds [0/0] (completed)
Rank 4 extended 0 seeds out of 0 (-nan%)
Rank 7 is extending seeds [0/0] (completed)
Rank 7 extended 0 seeds out of 0 (-nan%)
Rank 2 is extending seeds [0/0] (completed)
Rank 2 extended 0 seeds out of 0 (-nan%)
Rank 3 is extending seeds [0/0] (completed)
Rank 3 extended 0 seeds out of 0 (-nan%)
Rank 1 is extending seeds [0/0] (completed)
Rank 1 extended 0 seeds out of 0 (-nan%)
Rank 8 is extending seeds [0/0] (completed)
Rank 8 extended 0 seeds out of 0 (-nan%)
Rank 6 is extending seeds [1/1]
Rank 6 starts on a seed, length is 4063 [0/1]
Rank 6 reached 0 vertices (GCAGGTTGGATACGCCAATCA) from seed 1
Rank 9 is extending seeds [1/1]
Rank 9 starts on a seed, length is 1277 [0/1]
Rank 9 reached 0 vertices (GTGTTAACAGTCGGGAGAGGA) from seed 1
Rank 0 extended 0 seeds out of 0 (-nan%)
Rank 9 reached 1277 vertices from seed 1 (changing direction)
Rank 9 reached 0 vertices (AGTTTTATCGCTTCCATGACG) from seed 1
Rank 9 reached 1279 vertices (CCTCTCCCGACTGTTAACACT) from seed 1
Rank 9 (extension done)
Rank 9 is extending seeds [1/1] (completed)
Rank 9 extended 1 seeds out of 1 (100.00%)
Rank 6 reached 4063 vertices from seed 1 (changing direction)
Rank 6 reached 0 vertices (CTGGTTATATTGACCATGCCG) from seed 1
Rank 6 reached 4064 vertices (TGATTGGCGTATCCAACCTGC) from seed 1
Rank 6 (extension done)
Rank 6 is extending seeds [1/1] (completed)
Rank 6 extended 1 seeds out of 1 (100.00%)
...
Number of contigs: 2
Total length of contigs: 5383
Number of contigs >= 500 nt: 2
Total length of contigs >= 500 nt: 5383
Number of scaffolds: 1
Total length of scaffolds: 5652
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5652
Why don't the other ranks find seeds?
It's still successful at getting a sequence, but it's split into 2
contigs. This doesn't surprise me much, because I eliminated the joining
segment but kept end pairs that spanned either side of it. It's a little
strange that the run of Ns that has been filled in (about 274 Ns) is not
the same as what I would expect from an assembly of the circular genome
(i.e. 2 Ns), but when a lot of repeated Ns appear in an assembled
sequence, it's expected that the distance is unknown.
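
One way to check the size of the gap directly is to count the runs of N
in the scaffold sequence. The following is a minimal sketch in Python,
assuming the scaffolds are available as plain FASTA text; the helper
names (read_fasta, n_runs) and the toy input are made up for
illustration and are not part of Ray or its output.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def n_runs(sequence):
    """Return (start, length) for each run of N/n in a sequence string."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"[Nn]+", sequence)]

# Toy input for illustration only (hypothetical, not real Ray output).
example = [">scaffold-0", "ACGTNNNN", "NNACGT", ">scaffold-1", "ACGT"]
for name, seq in read_fasta(example):
    print(name, n_runs(seq))
```

To run it on a real file, pass an open file handle to read_fasta
instead of the toy list. As a rough cross-check, if the scaffold
consists only of the two contigs plus the gap, the totals reported
above would put the gap at 5652 - 5383 = 269 positions.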
-- David