On 21/07/11 19:16, Sébastien Boisvert wrote:
>> However, this is probably more interesting, using a colour-space
>> sequence for one end, and a base-space sequence for the other end:
> What do you mean ?
It was a quick test of combining colour-space reads and base-space reads
in one run. The first and second reads of a pair aren't expected to be
in different spaces, but supporting that is a side-effect of allowing
mixed spaces in the input files.
> But mine does not produce the contigs in nucleotide space.
> And I believe yours does !
Yes, it does. The assembly in my fork is done in base-space, while the
internal representation of reads and k-mers is colour-space. This means
that the hash values for k-mers will differ from those in your code, but
otherwise the rest of the code doesn't need to know the difference.
FWIW, the current behaviour of my code could be shoehorned into
sebhtml/git by converting all reads into base-space.
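
As an illustration of the two representations (a minimal sketch, not the code from either fork), the conversion between base-space and colour-space can be written as below. It assumes the usual SOLiD two-base encoding, where each colour is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The function names are hypothetical.

```python
# Sketch only: assumes the standard SOLiD two-base encoding.
# to_colour_space / to_base_space are hypothetical names, not Ray functions.
BASES = "ACGT"
CODE = {base: index for index, base in enumerate(BASES)}


def to_colour_space(seq):
    """Return (first_base, colours) for a base-space sequence."""
    # Each colour is the XOR of the codes of two adjacent bases.
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], "".join(str(c) for c in colours)


def to_base_space(first_base, colours):
    """Rebuild the base-space sequence from a first base and its colours."""
    bases = [first_base]
    for colour in colours:
        # XOR-ing the previous base with the colour gives the next base.
        bases.append(BASES[CODE[bases[-1]] ^ int(colour)])
    return "".join(bases)


first, colours = to_colour_space("ATGGC")
roundtrip = to_base_space(first, colours)
```

Because a k-mer stored this way is a string of colours rather than bases, any hash computed over it will naturally differ from a hash over the base-space k-mer.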
I don't yet use any k-mers with unknown first bases, but I expect to add
that in once I've worked out an appropriate place to do it. I probably
need to get the assembly (or at least seed extension) to happen in
colour-space so that sequences with a different first base but the same
colour-space sequence are lumped together.
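
To make the "lumped together" point concrete, here is a small sketch (again assuming the standard SOLiD two-base encoding, with hypothetical function names) that groups base-space sequences by their colour string alone, ignoring the first base. Each colour string corresponds to four possible base-space sequences, one per starting base.

```python
# Sketch only: groups sequences that share a colour-space string.
# colours_only / group_by_colours are hypothetical names, not Ray functions.
from collections import defaultdict

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def colours_only(seq):
    """Colour string of a base-space sequence, dropping the first base."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:]))


def group_by_colours(seqs):
    """Map each colour string to the sequences that produce it."""
    groups = defaultdict(list)
    for seq in seqs:
        groups[colours_only(seq)].append(seq)
    return dict(groups)


# The first four differ only in their starting base; the last differs
# in its final colour.
groups = group_by_colours(["ATGGC", "CGTTA", "GCAAT", "TACCG", "ATGGA"])
```

Here the first four sequences fall into one group and the fifth into another, which is the behaviour an extension step working purely in colour-space would see.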
> So the next steps is to test your work on the system tests I guess.
The only "system tests" I have at the moment aren't a particularly good
representation of the data Ray has been designed to work on:
  phiX-simulated: small genome, synthetic data
  phiX-sequenced: small genome, circularised genome
  S. mediterranea: transcriptome, rather than genome
  ?E. coli: colour-space data with high error-rate
At the moment, I've only really done any testing on the two phiX datasets.
-- David