I just completed a genome assembly with MIRA. It has 450,000 454 reads and 3 million Illumina read pairs.
When I use tg_index to convert the CAF file to a Gap5 database, I get the following error messages:
With the Illumina reads:
Template name '"HWI-ST813:68:B0202ABXX:2:1102:13848:91869:TTAGGC"' must be a prefix of the sequence name 'HWI-ST813:68:B0202ABXX:2:1102:13848:91869:TTAGGC/1'. Ignoring it.
With the 454 reads:
Template name '"GACYVPL01ELGAJ"' must be a prefix of the sequence name 'GACYVPL01ELGAJ'. Ignoring it.
I think it happens a few million times. Should I be concerned? Can I switch it off? Should I be asking this question on the MIRA list?
Hmm, that's clearly my bug and not MIRA's, as the error is wrong. Template name '"GACYVPL01ELGAJ"' *is* a prefix of the sequence name 'GACYVPL01ELGAJ'.
Are you using the latest svn version? I'll try to construct a CAF file to verify this.
Thanks for the quick response.
My version was checked out from svn on 19 October. I usually look through the source updates about once a week for relevant bug fixes. I noticed that your last change to the tg_index code was on 30 September, and it added a duplicate name search. Perhaps that broke it?
We've managed to reproduce it here too, on one of our data sets, with the latest svn checkout, so we'll investigate why.
Thanks for the bug report.
I did the same thing on my Ubuntu box to make sure it wasn't just a Mac thing. Same result.
It's objecting to the quotes around the template name. In our internal CAF files the template entry looks like this:
but in MIRA's CAF output it looks like this:
I'm going to have to get tg_index to handle quotes in CAF files properly, but in the meantime you could try removing the quotes from the template names.
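For anyone wanting to try that workaround, here is a minimal Python sketch that copies a CAF file and strips the double quotes from template names. It assumes, based only on the error messages above, that each template entry sits on its own line as the keyword `Template`, whitespace, and then a name wrapped in double quotes (for example `Template "GACYVPL01ELGAJ"`). The script name and the helper names `strip_template_quotes` and `fix_caf` are hypothetical, and this is not part of MIRA or the Staden tools.

```python
import re
import sys

# Assumed layout of a MIRA template line (hypothetical, inferred from the errors):
#   Template "GACYVPL01ELGAJ"
# Group 1: keyword plus whitespace, group 2: the unquoted name,
# group 3: any trailing whitespace, including the line ending.
TEMPLATE_RE = re.compile(r'^(\s*Template\s+)"([^"]*)"(\s*)$')


def strip_template_quotes(line: str) -> str:
    """Remove the quotes around a Template name; return other lines unchanged."""
    match = TEMPLATE_RE.match(line)
    if match is None:
        return line
    prefix, name, trailing = match.groups()
    return f"{prefix}{name}{trailing}"


def fix_caf(src_path: str, dst_path: str) -> int:
    """Stream src_path to dst_path, unquoting template names.

    latin-1 with newline="" round-trips every byte and line ending unchanged,
    so only the matching Template lines differ in the output.
    Returns the number of lines that were modified.
    """
    changed = 0
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            fixed = strip_template_quotes(line)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed


# Only runs when given exactly an input and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    count = fix_caf(sys.argv[1], sys.argv[2])
    print(f"unquoted {count} Template lines", file=sys.stderr)
```

Run it as, e.g., `python unquote_caf_templates.py assembly.caf assembly_fixed.caf` (both file names are placeholders) and give the new file to tg_index as before. It streams line by line so a multi-million-read CAF file never has to fit in memory. Lines that do not match the assumed pattern are left untouched, so check a few template entries in the output before relying on it.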