> ________________________________________
> From: David Eccles (gringer) [dav...@mp...]
> Sent: 15 July 2011 06:47
> To: Sébastien Boisvert
> Cc: den...@li...
> Subject: Re: Confused about coding -- completed seeds without distributions
>
> On 15/07/11 10:12, David Eccles (gringer) wrote:
>> I'll fetch the input data that failed once I get back home, and also
>> see what happens with real phiX data off our sequencer.
>
> So, for real phiX data, I prepared the sequence by masking out bases
> with quality scores less than 20, and then filtering out the 'N' bases,
> using the attached script (with 'proportion=1'), which produces an
> interleaved FASTA file. Despite all that filtering, I still have 4GB of
> >Q20 phiX data that came off the sequencer.
>
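For readers following along, the masking step described above might look roughly like the Python sketch below. The attached script is not reproduced in this thread, so everything here is an assumption rather than David's actual code: the function names, the Phred+33 quality encoding, and the reading of "filtering out the 'N' bases" as dropping any read that still contains an 'N' after masking.

```python
def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Replace each base whose Phred score is below min_q with 'N'.

    Assumes Phred+33 encoded quality strings (hypothetical default).
    """
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def keep_read(masked_seq):
    """Assumed filter: keep a read only if no base was masked."""
    return "N" not in masked_seq


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    masked = mask_low_quality("ACGT", "II#I")
    print(masked, keep_read(masked))
```

A stricter or looser filter (for example, tolerating some fraction of masked bases) would only change `keep_read`.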
You don't need to filter anything with Ray.
> Based on the presence of the sequence 'ATCCAACCTGCAGAGTTTTATCGCTT' in
> these data, it looks like a circularised phiX genome was used. I put
> the top 100k lines from this interleaved file into Ray, and it seems to
> work fine:
>
> $ mpirun -np 10 ../../code/Ray -i
> ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/head100k_interleaved_noN_phiX_region1101.fasta
> ...
> Rank 0 is creating seeds [1/1072]
> Rank 9 is creating seeds [1/1102]
> Rank 7 is creating seeds [1/1094]
> Rank 3 is creating seeds [1/1182]
> Rank 5 is creating seeds [1/1102]
> Rank 4 is creating seeds [1/1168]
> Rank 2 is creating seeds [1/1090]
> Rank 8 is creating seeds [1/1206]
> Rank 1 is creating seeds [1/1124]
> Rank 6 is creating seeds [1/1164]
> Rank 0 has 0 seeds
> Rank 0 is creating seeds [1072/1072] (completed)
> Rank 0: peak number of workers: 1032, maximum: 30000
> Rank 0: VirtualCommunicator: 172441 pushed messages generated 918
> virtual messages (0.532356%)
> Rank 2 has 0 seeds
> Rank 2 is creating seeds [1090/1090] (completed)
> Rank 2: peak number of workers: 1040, maximum: 30000
> Rank 2: VirtualCommunicator: 172482 pushed messages generated 876
> virtual messages (0.507879%)
> Rank 7 has 0 seeds
> Rank 7 is creating seeds [1094/1094] (completed)
> Rank 7: peak number of workers: 1056, maximum: 30000
> Rank 7: VirtualCommunicator: 172598 pushed messages generated 860
> virtual messages (0.498268%)
> Rank 8 has 0 seeds
> Rank 8 is creating seeds [1206/1206] (completed)
> Rank 8: peak number of workers: 1128, maximum: 30000
> Rank 8: VirtualCommunicator: 173161 pushed messages generated 863
> virtual messages (0.498380%)
> Rank 3 has 0 seeds
> Rank 3 is creating seeds [1182/1182] (completed)
> Rank 3: peak number of workers: 1136, maximum: 30000
> Rank 3: VirtualCommunicator: 173111 pushed messages generated 867
> virtual messages (0.500835%)
> Rank 5 has 0 seeds
> Rank 5 is creating seeds [1102/1102] (completed)
> Rank 5: peak number of workers: 1054, maximum: 30000
> Rank 5: VirtualCommunicator: 172651 pushed messages generated 936
> virtual messages (0.542134%)
> Rank 6 has 0 seeds
> Rank 6 is creating seeds [1164/1164] (completed)
> Rank 6: peak number of workers: 1114, maximum: 30000
> Rank 6: VirtualCommunicator: 173032 pushed messages generated 862
> virtual messages (0.498174%)
> Rank 4 has 0 seeds
> Rank 4 is creating seeds [1168/1168] (completed)
> Rank 4: peak number of workers: 1116, maximum: 30000
> Rank 4: VirtualCommunicator: 173056 pushed messages generated 852
> virtual messages (0.492326%)
> Rank 1 has 0 seeds
> Rank 1 is creating seeds [1124/1124] (completed)
> Rank 1: peak number of workers: 1088, maximum: 30000
> Rank 1: VirtualCommunicator: 183555 pushed messages generated 11665
> virtual messages (6.355043%)
> Rank 9 discovered a seed with 5363 vertices
> Rank 9 has 1 seeds
> Rank 9 is creating seeds [1102/1102] (completed)
> Rank 9: peak number of workers: 1052, maximum: 30000
> Rank 9: VirtualCommunicator: 183307 pushed messages generated 11631
> virtual messages (6.345093%)
> ...
> Rank 1 is extending seeds [0/0] (completed)
> Rank 1 extended 0 seeds out of 0 (-nan%)
> Rank 6 is extending seeds [0/0] (completed)
> Rank 6 extended 0 seeds out of 0 (-nan%)
> Rank 4 is extending seeds [0/0] (completed)
> Rank 4 extended 0 seeds out of 0 (-nan%)
> Rank 5 is extending seeds [0/0] (completed)
> Rank 5 extended 0 seeds out of 0 (-nan%)
> Rank 3 is extending seeds [0/0] (completed)
> Rank 3 extended 0 seeds out of 0 (-nan%)
> Rank 0 is extending seeds [0/0] (completed)
> Rank 0 extended 0 seeds out of 0 (-nan%)
> Rank 8 is extending seeds [0/0] (completed)
> Rank 8 extended 0 seeds out of 0 (-nan%)
> Rank 2 is extending seeds [0/0] (completed)
> Rank 2 extended 0 seeds out of 0 (-nan%)
> Rank 7 is extending seeds [0/0] (completed)
> Rank 7 extended 0 seeds out of 0 (-nan%)
> Rank 9 is extending seeds [1/1]
> Rank 9 starts on a seed, length is 5363 [0/1]
> Rank 9 reached 0 vertices (GTGTTAACAGTCGGGAGAGGA) from seed 1
> Rank 9 reached 5363 vertices from seed 1 (changing direction)
> Rank 9 reached 0 vertices (CTGGTTATATTGACCATGCCG) from seed 1
> Rank 9 reached 5365 vertices (CCTCTCCCGACTGTTAACACT) from seed 1
> Rank 9 (extension done)
> Rank 9 is extending seeds [1/1] (completed)
> Rank 9 extended 1 seeds out of 1 (100.00%)
> ...
> Number of contigs: 1
> Total length of contigs: 5385
> Number of contigs >= 500 nt: 1
> Total length of contigs >= 500 nt: 5385
> Number of scaffolds: 1
> Total length of scaffolds: 5385
> Number of scaffolds >= 500 nt: 1
> Total length of scaffolds >= 500: 5385
>
> I'll have a look at what comes out if I break the circularity of this
> DNA by removing sequence that includes the start and end pieces together.
>
> -- David
So your hypothesis is that when you use all the reads, you get a k-mer graph containing a loop.
But if that were the case, you would get a parallel infinite loop, because SeedWorker.cpp does
not check for circular seeds.
Sébastien
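
As a rough illustration of the kind of check mentioned above, here is a minimal Python sketch of a seed walk that stops when it revisits a vertex. This is not Ray's code and does not reflect how SeedWorker.cpp is written; the function name, the dictionary-based graph, and the rule "follow a vertex only while it has exactly one successor" are all assumptions made for the example.

```python
def walk_seed(start, successors):
    """Follow unique successors from `start`.

    Returns (path, is_circular). The walk stops at a dead end, at a
    branch (more than one successor), or when it would revisit a vertex.
    `successors` is a hypothetical dict: k-mer -> list of next k-mers.
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        nxt = successors.get(current, [])
        if len(nxt) != 1:
            return path, False      # dead end or branch: linear seed
        current = nxt[0]
        if current in seen:
            return path, True       # vertex already visited: circular seed
        seen.add(current)
        path.append(current)


if __name__ == "__main__":
    circular = {"AC": ["CG"], "CG": ["GA"], "GA": ["AC"]}
    print(walk_seed("AC", circular))
```

Without the `seen` set, the same walk on a circular graph would never terminate, which is the failure mode described above.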