> ________________________________________
> From: David Eccles (gringer) [dav...@mp...]
> Sent: 15 July 2011 07:27
> To: Sébastien Boisvert; den...@li...
> Subject: Re: Confused about coding -- completed seeds without distributions
>
> On 15/07/11 12:47, David Eccles (gringer) wrote:
>> I'll have a look at what comes out if I break the circularity of this
>> DNA by removing sequence that includes the start and end pieces together.
>
> Done, sort of. Sorry, I think I forgot to link in the circular reads
> sequence in my previous email, so here are both of them:
>
> http://user.interface.org.nz/~gringer/hacking/head100k_interleaved_noN_phiX_region1101.fasta
>
> http://user.interface.org.nz/~gringer/hacking/linearised_head100k_interleaved_noN_phiX_region1101.fasta
>
> Curiously, Ray finds more seeds in another rank (rank 6) by removing
> this connecting sequence:
>
That is simply because Ray is based on peer-to-peer computation -- no two ranks ever do the same computation.
> $ mpirun -np 10 ../../code/Ray -i
> ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/linearised_head100k_interleaved_noN_phiX_region1101.fasta
>
> Rank 0 is extending seeds [0/0] (completed)
> Rank 5 is extending seeds [0/0] (completed)
> Rank 5 extended 0 seeds out of 0 (-nan%)
> Rank 4 is extending seeds [0/0] (completed)
> Rank 4 extended 0 seeds out of 0 (-nan%)
> Rank 7 is extending seeds [0/0] (completed)
> Rank 7 extended 0 seeds out of 0 (-nan%)
> Rank 2 is extending seeds [0/0] (completed)
> Rank 2 extended 0 seeds out of 0 (-nan%)
> Rank 3 is extending seeds [0/0] (completed)
> Rank 3 extended 0 seeds out of 0 (-nan%)
> Rank 1 is extending seeds [0/0] (completed)
> Rank 1 extended 0 seeds out of 0 (-nan%)
> Rank 8 is extending seeds [0/0] (completed)
> Rank 8 extended 0 seeds out of 0 (-nan%)
> Rank 6 is extending seeds [1/1]
> Rank 6 starts on a seed, length is 4063 [0/1]
> Rank 6 reached 0 vertices (GCAGGTTGGATACGCCAATCA) from seed 1
> Rank 9 is extending seeds [1/1]
> Rank 9 starts on a seed, length is 1277 [0/1]
> Rank 9 reached 0 vertices (GTGTTAACAGTCGGGAGAGGA) from seed 1
> Rank 0 extended 0 seeds out of 0 (-nan%)
> Rank 9 reached 1277 vertices from seed 1 (changing direction)
> Rank 9 reached 0 vertices (AGTTTTATCGCTTCCATGACG) from seed 1
> Rank 9 reached 1279 vertices (CCTCTCCCGACTGTTAACACT) from seed 1
> Rank 9 (extension done)
> Rank 9 is extending seeds [1/1] (completed)
> Rank 9 extended 1 seeds out of 1 (100.00%)
> Rank 6 reached 4063 vertices from seed 1 (changing direction)
> Rank 6 reached 0 vertices (CTGGTTATATTGACCATGCCG) from seed 1
> Rank 6 reached 4064 vertices (TGATTGGCGTATCCAACCTGC) from seed 1
> Rank 6 (extension done)
> Rank 6 is extending seeds [1/1] (completed)
> Rank 6 extended 1 seeds out of 1 (100.00%)
> ...
> Number of contigs: 2
> Total length of contigs: 5383
> Number of contigs >= 500 nt: 2
> Total length of contigs >= 500 nt: 5383
> Number of scaffolds: 1
> Total length of scaffolds: 5652
> Number of scaffolds >= 500 nt: 1
> Total length of scaffolds >= 500: 5652
>
> Why don't the other ranks find seeds?
>
> It's still successful at getting a sequence, but it's split into 2
> contigs. This doesn't much surprise me, because I eliminated the joining
> segment, but kept end pairs that spanned either side of the joining
> segment. It's a little strange that the sequence of Ns that has been
> filled in (about 274 Ns) is not the same as what I would expect from an
> assembly of the circular genome (i.e. 2 Ns), but when a lot of repeated
> Ns appear in an assembled sequence, it's expected that the distance is
> unknown.
>
> -- David
>
I think you are right. Basically, to find the start of a seed, Ray looks for a vertex that doesn't have one strong parent and that doesn't have one strong child.
Because your genome is so tiny and has no repeats, that rule leaves Ray unable to find the start of any seed.
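
To make the rule concrete, here is a toy sketch in Python. It is not Ray's code, only an illustration under stated assumptions: `build_graph` and `seed_starts` are hypothetical names, every edge counts as "strong" (no coverage threshold), reverse complements are ignored, and the sequences and k=4 are made up. It shows that in a circular, repeat-free sequence every vertex has exactly one parent and one child, so the rule finds no seed start, while a repeated k-mer does produce one.

```python
from collections import defaultdict


def build_graph(sequence, k, circular):
    """Build parent/child sets for the k-mers of `sequence` (toy model)."""
    children = defaultdict(set)
    parents = defaultdict(set)
    # For a circular sequence, append the first k bases so the path wraps
    # around and the last k-mer links back to the first one.
    text = sequence + sequence[:k] if circular else sequence
    for i in range(len(text) - k):
        a = text[i:i + k]
        b = text[i + 1:i + k + 1]
        children[a].add(b)
        parents[b].add(a)
    return children, parents


def seed_starts(children, parents):
    """Vertices with neither exactly one parent nor exactly one child.

    Assumption: every edge is treated as 'strong' in this toy model.
    """
    vertices = set(children) | set(parents)
    return sorted(
        v for v in vertices
        if len(parents.get(v, ())) != 1 and len(children.get(v, ())) != 1
    )


if __name__ == "__main__":
    # Circular, no repeated 4-mer: each vertex has one parent and one child.
    c, p = build_graph("ACGTTGCAAG", 4, circular=True)
    print(seed_starts(c, p))  # []

    # Linear, with ACGT occurring twice in different contexts (a repeat).
    c, p = build_graph("AACGTCCACGTGG", 4, circular=False)
    print(seed_starts(c, p))  # ['ACGT']
```

In the second case ACGT has two parents (AACG, CACG) and two children (CGTC, CGTG), so it is the only vertex that passes the test.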
I have to think carefully about how to solve this.
Sébastien