Before entering into a discussion, I would like to congratulate everyone involved in the development of FLASH.
Of all the pipelines and scripts I tried for merging overlapping paired-end reads, FLASH gave the best results.
However, it could be better if the merging process took into account that two overlapping sequences might mismatch because of an insertion or deletion of a base. For example:
ACGTAGATCGATAGATAGTAGATGTAGATATGA
-------------------TAGTAGTAGATGTAGATATGAAGAACACAACGATCGATGCTG
The two reads above seem not to overlap, but if an insertion is allowed for, they could overlap:
ACGTAGATCGATAGATAGTAGATGTAGATATGA
-----------TAG_TAGTAGATGTAGATATGAAGAACACAACGATCGATGCTG
Please consider this suggestion.
Thanks in advance!
Last edit: Renato Oliveira 2016-06-10
Thanks for the feedback. This is a known limitation of FLASH: it only considers substitution errors, not insertions or deletions. As such, it is primarily intended for data where substitution errors are much more common than insertions or deletions. Currently, there are no plans to address this.
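To make the difference between the two error models concrete, here is a minimal Python sketch that scores the two reads from the post both ways. It is not FLASH's code or algorithm: the function names `ungapped_overlap` and `gapped_overlap`, the `min_overlap` parameter, and the "lowest error ratio wins" rule are all assumptions made for illustration. It also assumes the second read has already been reverse-complemented into the orientation of the first, and it ignores base quality scores.

```python
READ1 = "ACGTAGATCGATAGATAGTAGATGTAGATATGA"
READ2 = "TAGTAGTAGATGTAGATATGAAGAACACAACGATCGATGCTG"


def ungapped_overlap(read1, read2, min_overlap=10):
    """Substitution-only model: slide read2 along read1 with no gaps.

    Returns (offset, mismatches, overlap_len) for the placement with the
    lowest mismatch ratio, or None if no placement is long enough.
    """
    best = None
    for offset in range(len(read1) - min_overlap + 1):
        overlap_len = min(len(read1) - offset, len(read2))
        if overlap_len < min_overlap:
            continue
        # zip stops at the shorter sequence, i.e. at the end of the overlap.
        mismatches = sum(a != b for a, b in zip(read1[offset:], read2))
        ratio = mismatches / overlap_len
        if best is None or ratio < best[0]:
            best = (ratio, offset, mismatches, overlap_len)
    return None if best is None else best[1:]


def gapped_overlap(read1, read2, min_overlap=10):
    """Indel-aware model: align a prefix of read2 to a suffix of read1.

    Uses an edit-distance table in which the alignment may start anywhere
    in read1 at no cost but must end at the last base of read1.
    Returns (edits, prefix_len) with the lowest edits / prefix_len ratio.
    """
    n = len(read1)
    prev = [0] * (n + 1)  # empty read2 prefix: free start anywhere in read1
    best = None
    for j in range(1, len(read2) + 1):
        curr = [j] + [0] * n  # j bases of read2 against nothing costs j
        for i in range(1, n + 1):
            substitution = prev[i - 1] + (read1[i - 1] != read2[j - 1])
            curr[i] = min(substitution, prev[i] + 1, curr[i - 1] + 1)
        if j >= min_overlap:
            ratio = curr[n] / j
            if best is None or ratio < best[0]:
                best = (ratio, curr[n], j)
        prev = curr
    return None if best is None else best[1:]


print(ungapped_overlap(READ1, READ2))  # (offset, mismatches, overlap_len)
print(gapped_overlap(READ1, READ2))    # (edits, prefix_len)
```

On the two reads from the post, the ungapped search settles on offset 12 with 3 mismatches across 21 bases, while the gapped search finds that the first 21 bases of the second read match the end of the first read with a single edit. The gapped version fills a table of size len(read1) x len(read2) for every read pair, so it costs more than a simple sliding comparison.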