So, the colour-space work finally looks like it's getting somewhere.
I've managed to get a good result from a Ray run with colour-space input
(no errors, first bases included), and it might be a tad quicker than the
sebhtml/ray colour-space assembler:
$ time ../../../sebgit/ray/code/Ray -p tests/phix/phix_5k_1.csfasta \
    tests/phix/phix_5k_2.csfasta | grep -A 7 'Number of contigs:'
Number of contigs: 1
Total length of contigs: 5380
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5380
Number of scaffolds: 1
Total length of scaffolds: 5380
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5380
real 0m3.107s
user 0m2.432s
sys 0m0.652s
$ time ../code/Ray -p tests/phix/phix_5k_1.csfasta \
    tests/phix/phix_5k_2.csfasta | grep -A 7 'Number of contigs:'
Number of contigs: 1
Total length of contigs: 5243
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5243
Number of scaffolds: 1
Total length of scaffolds: 5243
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5243
real 0m2.894s
user 0m2.392s
sys 0m0.484s
$ fasta_formatter -i tests/phix/phix.fasta | fastx_reverse_complement |
    grep $(fasta_formatter -i RayOutput.Scaffolds.fasta | grep -v '^>') \
    > /dev/null && echo "success (match in reverse direction)"
success (match in reverse direction)
There's nothing particularly impressive about this (apart from it
actually working...). For what it's worth, sebhtml/ray also produces a
correct sequence, if you feed it the right starting base:
$ fasta_formatter -i tests/phix/phix.fasta | fastx_reverse_complement |
    grep $((echo -n 'G'; tail -n +2 RayOutput.Scaffolds.fasta) |
    ~/scripts/cs2base.pl | fasta_formatter) > /dev/null &&
    echo "success (match in forward direction)"
success (match in forward direction)
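To illustrate why the starting base matters: each colour only encodes the
transition between two adjacent bases, so a colour-space sequence can't be
turned back into bases without knowing where it starts. Here's a minimal
Python sketch of that decoding, assuming the usual two-base encoding where
the colour is the XOR of the 2-bit base codes (A=0, C=1, G=2, T=3). The
function name decode_colour_space is made up for illustration; this is not
the code in cs2base.pl or in Ray, and leaving the starting base out of the
output is an assumption about the convention.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def decode_colour_space(start_base, colours):
    """Decode a string of colours (digits 0-3) into bases.

    start_base is the base preceding the first colour; it is used to
    seed the decoding but is not included in the returned sequence
    (an assumption made for this sketch).
    """
    if start_base not in BASES:
        raise ValueError("unknown starting base: %r" % start_base)
    code = BASES.index(start_base)
    decoded = []
    for colour in colours:
        if colour not in "0123":
            raise ValueError("unknown colour: %r" % colour)
        # Each colour is the XOR of the previous and next base codes,
        # so XOR-ing it into the running code yields the next base.
        code ^= int(colour)
        decoded.append(BASES[code])
    return "".join(decoded)


if __name__ == "__main__":
    # Same colours, different starting bases, different sequences.
    print(decode_colour_space("G", "0123"))  # GTCG
    print(decode_colour_space("A", "0123"))  # ACTA
```

The same colour string decodes to a different base sequence for each
starting base, so only one of the four possible starting bases can
reproduce the reference.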
However, this is probably more interesting: a base-space sequence for
one end, and a colour-space sequence for the other end:
$ time ../../../sebgit/ray/code/Ray -p tests/phix/phix_5k_1.fasta \
    tests/phix/phix_5k_2.csfasta | grep -A 7 'Number of contigs:'
Number of contigs: 3
Total length of contigs: 15713
Number of contigs >= 500 nt: 3
Total length of contigs >= 500 nt: 15713
Number of scaffolds: 3
Total length of scaffolds: 15713
Number of scaffolds >= 500 nt: 3
Total length of scaffolds >= 500: 15713
real 0m4.246s
user 0m3.428s
sys 0m0.796s
$ time ../code/Ray -p tests/phix/phix_5k_1.fasta \
    tests/phix/phix_5k_2.csfasta | grep -A 7 'Number of contigs:'
Number of contigs: 1
Total length of contigs: 5243
Number of contigs >= 500 nt: 1
Total length of contigs >= 500 nt: 5243
Number of scaffolds: 1
Total length of scaffolds: 5243
Number of scaffolds >= 500 nt: 1
Total length of scaffolds >= 500: 5243
real 0m2.892s
user 0m2.412s
sys 0m0.456s
$ fasta_formatter -i tests/phix/phix.fasta | fastx_reverse_complement |
    grep $(fasta_formatter -i RayOutput.Scaffolds.fasta | grep -v '^>') \
    > /dev/null && echo "success (match in reverse direction)"
success (match in reverse direction)
[I haven't tried doing a match with the sebhtml/ray sequence.]
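The fasta_formatter / fastx_reverse_complement / grep checks above just ask
whether the assembled scaffold occurs somewhere in the reference on either
strand. A rough Python equivalent, as a sketch only: match_orientation and
reverse_complement are names invented here, both arguments are assumed to
be plain upper-case base-space strings with headers and line breaks
already removed, and a circular genome such as phiX is not handled across
the wrap-around point.

```python
# Complement table for unambiguous bases (N maps to itself).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a base-space sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def match_orientation(reference, contig):
    """Report where contig occurs as an exact substring of reference.

    Returns "forward" if it is found as-is, "reverse" if it is found
    in the reverse complement of the reference, and None otherwise.
    """
    if contig in reference:
        return "forward"
    if contig in reverse_complement(reference):
        return "reverse"
    return None


if __name__ == "__main__":
    reference = "AAACCCGGT"
    print(match_orientation(reference, "CCCGG"))  # forward
    print(match_orientation(reference, "CCGGG"))  # reverse
    print(match_orientation(reference, "TTTT"))   # None
```

Like the grep version, this is an all-or-nothing exact match: a single
wrong base anywhere in the contig gives no match in either direction.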
The commit that actually got my code working properly with colour-space
is here:
https://github.com/gringer/ray/commit/fbd1efd4c3dbe31df0de820167a068736de648f5
[I forgot to make the same changes for the KmerAcademyBuilder (lines
109-113) that I made for the VerticesExtractor (lines 110-114)]
-- David