On 15/07/11 10:12, David Eccles (gringer) wrote:
> I'll fetch the input data that failed once I get back home, and also
> see what happens with real phiX data off our sequencer.
So, for real phiX data, I prepared the sequence by masking out bases
with quality scores less than 20, and then filtering out the 'N' bases,
using the attached script (with 'proportion=1'), which produces an
interleaved FASTA file. Despite all that filtering, I still have 4GB of
phiX data at Q20 or better that came off the sequencer.
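For readers without the attachment, the masking and filtering step can be sketched roughly as follows. This is not the attached script, only a minimal illustration under stated assumptions: quality strings are Phred+33 encoded (older Illumina pipelines used an offset of 64, so adjust `offset`), "filtering out the 'N' bases" is read as keeping the longest N-free stretch of each read, and a pair is kept only if both mates are at least `min_len` long. All function names and the `min_len` default are hypothetical, and the script's 'proportion' parameter is not reproduced here.

```python
def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Replace each base whose Phred score is below min_q with 'N'."""
    return "".join(
        base if ord(q) - offset >= min_q else "N"
        for base, q in zip(seq, qual)
    )


def longest_clean_run(masked):
    """Return the longest stretch of a masked read that contains no 'N'."""
    return max(masked.split("N"), key=len)


def fastq_records(lines):
    """Yield (name, seq, qual) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def interleave_to_fasta(fq1, fq2, min_len=21, min_q=20, offset=33):
    """Yield interleaved FASTA lines for pairs where both mates survive."""
    for (n1, s1, q1), (n2, s2, q2) in zip(fastq_records(fq1), fastq_records(fq2)):
        c1 = longest_clean_run(mask_low_quality(s1, q1, min_q, offset))
        c2 = longest_clean_run(mask_low_quality(s2, q2, min_q, offset))
        if len(c1) >= min_len and len(c2) >= min_len:
            yield from (">" + n1, c1, ">" + n2, c2)
```

The two inputs can be open file handles for the forward and reverse FASTQ files; the yielded lines can be written out one per line to give an interleaved FASTA file.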
Based on the presence of the sequence 'ATCCAACCTGCAGAGTTTTATCGCTT' in
these data, it looks like a circularised phiX genome was used. I put
the top 100k lines from this interleaved file into Ray, and it seems to
work fine:
$ mpirun -np 10 ../../code/Ray -i \
  ~/illuminadata/110517_H134_0062_A816HKABXX/Data/Intensities/BaseCalls/fastq/ignore/head100k_interleaved_noN_phiX_region1101.fasta
...
Rank 0 is creating seeds [1/1072]
Rank 9 is creating seeds [1/1102]
Rank 7 is creating seeds [1/1094]
Rank 3 is creating seeds [1/1182]
Rank 5 is creating seeds [1/1102]
Rank 4 is creating seeds [1/1168]
Rank 2 is creating seeds [1/1090]
Rank 8 is creating seeds [1/1206]
Rank 1 is creating seeds [1/1124]
Rank 6 is creating seeds [1/1164]
Rank 0 has 0 seeds
Rank 0 is creating seeds [1072/1072] (completed)
Rank 0: peak number of workers: 1032, maximum: 30000
Rank 0: VirtualCommunicator: 172441 pushed messages generated 918 virtual messages (0.532356%)
Rank 2 has 0 seeds
Rank 2 is creating seeds [1090/1090] (completed)
Rank 2: peak number of workers: 1040, maximum: 30000
Rank 2: VirtualCommunicator: 172482 pushed messages generated 876 virtual messages (0.507879%)
Rank 7 has 0 seeds
Rank 7 is creating seeds [1094/1094] (completed)
Rank 7: peak number of workers: 1056, maximum: 30000
Rank 7: VirtualCommunicator: 172598 pushed messages generated 860 virtual messages (0.498268%)
Rank 8 has 0 seeds
Rank 8 is creating seeds [1206/1206] (completed)
Rank 8: peak number of workers: 1128, maximum: 30000
Rank 8: VirtualCommunicator: 173161 pushed messages generated 863 virtual messages (0.498380%)
Rank 3 has 0 seeds
Rank 3 is creating seeds [1182/1182] (completed)
Rank 3: peak number of workers: 1136, maximum: 30000
Rank 3: VirtualCommunicator: 173111 pushed messages generated 867 virtual messages (0.500835%)
Rank 5 has 0 seeds
Rank 5 is creating seeds [1102/1102] (completed)
Rank 5: peak number of workers: 1054, maximum: 30000
Rank 5: VirtualCommunicator: 172651 pushed messages generated 936 virtual messages (0.542134%)
Rank 6 has 0 seeds
Rank 6 is creating seeds [1164/1164] (completed)
Rank 6: peak number of workers: 1114, maximum: 30000
Rank 6: VirtualCommunicator: 173032 pushed messages generated 862 virtual messages (0.498174%)
Rank 4 has 0 seeds
Rank 4 is creating seeds [1168/1168] (completed)
Rank 4: peak number of workers: 1116, maximum: 30000
Rank 4: VirtualCommunicator: 173056 pushed messages generated 852 virtual messages (0.492326%)
Rank 1 has 0 seeds
Rank 1 is creating seeds [1124/1124] (completed)
Rank 1: peak number of workers: 1088, maximum: 30000
Rank 1: VirtualCommunicator: 183555 pushed messages generated 11665 virtual messages (6.355043%)
Rank 9 discovered a seed with 5363 vertices
Rank 9 has 1 seeds
Rank 9 is creating seeds [1102/1102] (completed)
Rank 9: peak number of workers: 1052, maximum: 30000
Rank 9: VirtualCommunicator: 183307 pushed messages generated 11631 virtual messages (6.345093%)
...
Rank 1 is extending seeds [0/0] (completed)
Rank 1 extended 0 seeds out of 0 (-nan%)
Rank 6 is extending seeds [0/0] (completed)
Rank 6 extended 0 seeds out of 0 (-nan%)
Rank 4 is extending seeds [0/0] (completed)
Rank 4 extended 0 seeds out of 0 (-nan%)
Rank 5 is extending seeds [0/0] (completed)
Rank 5 extended 0 seeds out of 0 (-nan%)
Rank 3 is extending seeds [0/0] (completed)
Rank 3 extended 0 seeds out of 0 (-nan%)
Rank 0 is extending seeds [0/0] (completed)
Rank 0 extended 0 seeds out of 0 (-nan%)
Rank 8 is extending seeds [0/0] (completed)
Rank 8 extended 0 seeds out of 0 (-nan%)
Rank 2 is extending seeds [0/0] (completed)
Rank 2 extended 0 seeds out of 0 (-nan%)
Rank 7 is extending seeds [0/0] (completed)
Rank 7 extended 0 seeds out of 0 (-nan%)
Rank 9 is extending seeds [1/1]
Rank 9 starts on a seed, length is 5363 [0/1]
Rank 9 reached 0 vertices (GTGTTAACAGTCGGGAGAGGA) from seed 1
Rank 9 reached 5363 vertices from seed 1 (changing direction)
Rank 9 reached 0 vertices (CTGGTTATATTGACCATGCCG) from seed 1
Rank 9 reached 5365 vertices (CCTCTCCCGACTGTTAACACT) from seed 1
Rank 9 (extension done)
Rank 9 is extending seeds [1/1] (completed)
Rank 9 extended 1 seeds out of 1 (100.00%)
...
Number of contigs: 1
Total length of contigs: 5385
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5385
Number of scaffolds: 1
Total length of scaffolds: 5385
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5385
I'll have a look at what comes out if I break the circularity of this
DNA by removing sequence that includes the start and end pieces together.
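One way that filtering could look is sketched below. It is only an illustration of the idea, not code that has been run on the real data: it assumes the interleaved FASTA has one sequence line per record, treats a caller-supplied marker sequence (for example the 26-mer quoted above) as the start/end junction, and drops a whole pair if either mate contains the marker on either strand. Reads that overlap the junction by less than the full marker would be missed by this simple substring test. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spans_junction(read, marker):
    """True if the read contains the marker on either strand."""
    return marker in read or reverse_complement(marker) in read


def drop_junction_pairs(fasta_lines, marker):
    """Yield interleaved FASTA lines, skipping pairs that touch the marker.

    Assumes records come as header, sequence, header, sequence, with each
    sequence on a single line.
    """
    records = [line.strip() for line in fasta_lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        h1, s1, h2, s2 = records[i:i + 4]
        if not (spans_junction(s1, marker) or spans_junction(s2, marker)):
            yield from (h1, s1, h2, s2)
```

Dropping both mates together keeps the output properly interleaved, which matters if the file is then fed back into an assembler that expects paired reads.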
-- David