Re: [Denovoassembler-devel] Colour-space loading / assembly now working (produces base-space output)


On 21/07/11 19:16, Sébastien Boisvert wrote:
>> However, this is probably more interesting, using a colour-space
>> sequence for one end, and a base-space sequence for the other end:
> What do you mean ?

It was a quick test of combining colour-space reads and base-space reads
in one run. The first and second reads of a pair aren't expected to be
in different spaces, but allowing it is a side-effect of accepting input
files in either space.

> But mine does not produce the contigs in nucleotide space.
> And I believe yours does !

Yes, it does. My fork does the assembly in base-space, while the
internal representation of reads and k-mers is colour-space. This means
that the hash values for k-mers will differ from those in your code, but
otherwise the rest of the code doesn't need to know the difference.
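
For illustration, here is a minimal Python sketch of the conversion between
the two spaces. It is not code from Ray or from either fork. It assumes the
usual SOLiD di-base encoding, where each colour is the XOR of the 2-bit
codes of two adjacent bases (A=0, C=1, G=2, T=3), and the function names are
made up for the example. Because the stored k-mer is a colour string rather
than a base string, any hash computed over it will naturally differ.

```python
# Assumption: standard SOLiD di-base encoding (colour = XOR of 2-bit base codes).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_colour_space(bases):
    """Return (first_base, colours) for a base-space sequence (hypothetical helper)."""
    codes = [BASE_CODE[b] for b in bases]
    # One colour per adjacent pair of bases, so len(colours) == len(bases) - 1.
    colours = "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))
    return bases[0], colours


def to_base_space(first_base, colours):
    """Decode a colour string back to bases, given the first base (hypothetical helper)."""
    current = BASE_CODE[first_base]
    out = [first_base]
    for colour in colours:
        # XOR-ing the previous base code with the colour gives the next base code.
        current ^= int(colour)
        out.append(CODE_BASE[current])
    return "".join(out)


first, colours = to_colour_space("ATGGC")
print(first, colours)                 # A 3103
print(to_base_space(first, colours))  # ATGGC
```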

FWIW, the current behaviour of my code could be shoehorned into 
sebhtml/git by converting all reads into base-space.

I don't yet use any k-mers with unknown first bases, but I expect to add 
that in once I've worked out an appropriate place to do it. I probably 
need to get the assembly (or at least seed extension) to happen in 
colour-space so that sequences with a different first base but the same 
colour-space sequence are lumped together.
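
To make the "lumped together" point concrete, here is a small Python sketch
under the same assumption (SOLiD di-base encoding, colour = XOR of the 2-bit
base codes). The helper names are hypothetical and this is not how Ray or
either fork does it. It shows that one colour string decodes to four
different base-space sequences, one per possible first base, and that
keying on the colour string alone groups them.

```python
from collections import defaultdict

# Assumption: standard SOLiD di-base encoding (A=0, C=1, G=2, T=3, colour = XOR).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def colours_of(bases):
    """Colour string of a base-space sequence, dropping the first base."""
    codes = [BASE_CODE[b] for b in bases]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))


def candidates(colours):
    """All four base-space sequences consistent with one colour string."""
    results = []
    for first in CODE_BASE:
        current = BASE_CODE[first]
        seq = [first]
        for colour in colours:
            current ^= int(colour)
            seq.append(CODE_BASE[current])
        results.append("".join(seq))
    return results


def group_by_colours(sequences):
    """Group base-space sequences that share a colour-space sequence."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[colours_of(seq)].append(seq)
    return dict(groups)


print(candidates("3103"))
# ['ATGGC', 'CGTTA', 'GCAAT', 'TACCG']
print(group_by_colours(["ATGGC", "TACCG", "AAAAA"]))
# {'3103': ['ATGGC', 'TACCG'], '0000': ['AAAAA']}
```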

> So the next step is to test your work on the system tests, I guess.

The only "system tests" I have at the moment aren't a particularly good 
representation of the data Ray has been designed to work on:

phiX-simulated: small genome, synthetic data
phiX-sequenced: small genome, circularised genome
S. mediterranea: transcriptome, rather than genome
?E. coli: colour-space data with high error-rate

At the moment, I've only really tested on the two phiX datasets.

-- David


