
#14 Problem with gap handling in output files

4.0.1
closed
None
2015-01-10
2015-01-05
No

I have some doubts about the gap handling in MIRA.
From what I can see in my output files, when an alignment column consists mostly of gaps (for example, in the MAF file, 1 read with "G" and 4 reads with a gap at that position), the consensus sequence is written with a gap at that position.
This means that in the _out.padded.fasta file this position is written as a gap character, and it is removed from the _out.unpadded.fasta file.
However, in some cases MIRA writes the minority base instead of a gap.
Attached is a small FASTQ file you can use as a test case.
Positions 56 and 69 show both cases. At position 56 the base-to-gap ratio is 1:5, and the position is reported as a gap in the FASTA file; at position 69 the ratio is 2:3, and the position is reported as a "C".
Is this by design, or is it a bug in the counting?

Thanks
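
To make the expectation in the question concrete, here is a minimal sketch of a purely count-based consensus call and of the padded-to-unpadded conversion. It is not MIRA code: the function names are hypothetical, and the gap symbol "*" is an assumption, since the actual character is not visible in the post.

```python
from collections import Counter

GAP = "*"  # assumption: the pad/gap symbol; not shown in the original post


def majority_call(column):
    """Return the most frequent symbol in one alignment column (pure counting)."""
    return Counter(column).most_common(1)[0][0]


def unpad(padded_sequence):
    """Drop gap symbols, as the unpadded FASTA is described to do."""
    return padded_sequence.replace(GAP, "")


# The two columns described in the question: 1 base vs 5 gaps, 2 bases vs 3 gaps.
column_1_to_5 = ["G"] + [GAP] * 5
column_2_to_3 = ["C"] * 2 + [GAP] * 3

print(majority_call(column_1_to_5))
print(majority_call(column_2_to_3))
print(unpad("AC" + GAP + "GT"))
```

Under pure counting both columns come out as a gap, which is why the "C" reported for the 2:3 column looks surprising.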

1 Attachments

Discussion

  • Francisco Pina Martins

    Also, here is the manifest file for the assembly.

     
  • Francisco Pina Martins

    Correction: in the example file the positions are 102 and 115, not 56 and 69. Sorry about the mistake.

     
  • Bastien Chevreux

    The consensus generation does not rely primarily on base counts, but uses base and group qualities to determine what the consensus base should be. An early version of that was described in my thesis: http://www.chevreux.org/thesis/node19.html#SECTION00745000000000000000

    The current algorithms have both become simpler and more complex, but the above method is still the main driver for base calling.

    HTH,
    B.
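
As a rough illustration of why a quality-driven call can differ from a count-driven one, the sketch below sums a per-read quality for each symbol in a column and picks the symbol with the highest total. This is a toy model only, not the algorithm from the thesis or the one MIRA currently uses; the function name, the "*" gap symbol, and all quality values are invented for the example.

```python
from collections import defaultdict

GAP = "*"  # assumption: the pad/gap symbol


def quality_weighted_call(column):
    """Pick the symbol with the highest summed quality.

    `column` is a list of (symbol, quality) pairs, one per read.
    Toy model: real consensus callers are considerably more involved.
    """
    totals = defaultdict(int)
    for symbol, quality in column:
        totals[symbol] += quality
    return max(totals, key=totals.get)


# Invented qualities: two confident "C" reads against three weak gap calls.
column_2_to_3 = [("C", 40), ("C", 40), (GAP, 10), (GAP, 10), (GAP, 10)]
# Invented qualities: one mediocre "G" read against five weak gap calls.
column_1_to_5 = [("G", 20)] + [(GAP, 10)] * 5

print(quality_weighted_call(column_2_to_3))
print(quality_weighted_call(column_1_to_5))
```

With these made-up numbers the 2:3 column is called as "C" (80 against 30) and the 1:5 column as a gap (20 against 50), matching the pattern reported in the ticket even though gaps are the majority in both columns.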

     
  • Francisco Pina Martins

    Thanks Bastien.
    I suppose this can be closed now.

    Best,

    Francisco

     
  • Bastien Chevreux

    • status: open --> closed
     
