GAP5: alignment pads in consensus

  • Sònia Casillas

    Sònia Casillas - 2012-08-29


    I'm using GAP5 to edit 454 assemblies from MIRA and I've noticed that GAP5 is very reluctant to put alignment pads in the consensus. In some columns I might have 20 pads and 1 A, and the consensus is an A instead of an alignment pad. I've tried changing the parameters in the "Consensus Algorithm" window, but the behaviour stays the same. Is there a way to tell GAP5 to put just the most frequent character of each column in the consensus? I'm using GAP5 v.1.2.14-r2837M.

    Thank you very much for your help.

    Best regards,


  • James Bonfield

    James Bonfield - 2012-08-29

    Do you know what the confidence values are for the individual * and A characters?

    A base with confidence 100 will force the consensus to be that value, no matter what else exists (unless it's another conflicting base of confidence 100).
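    To illustrate why confidence values can outvote a majority of reads, here is a toy sketch in Python. It is not Gap5's actual consensus algorithm: the function names, the simple "sum of confidences" rule, and the confidence values in the example columns are all assumptions made for illustration. Only the "confidence 100 forces the call" rule is taken from the description above.

```python
from collections import Counter, defaultdict

PAD = "*"  # alignment pad character as displayed in Gap5


def majority_consensus(column):
    """Most frequent character in a column of (base, confidence) pairs."""
    counts = Counter(base for base, _ in column)
    return counts.most_common(1)[0][0]


def weighted_consensus(column, forcing_conf=100):
    """Toy confidence-weighted call (an assumption, NOT Gap5's real method).

    A single distinct base at confidence >= forcing_conf wins outright;
    otherwise the base with the largest summed confidence wins.
    """
    forced = {base for base, conf in column if conf >= forcing_conf}
    if len(forced) == 1:
        return forced.pop()
    totals = defaultdict(int)
    for base, conf in column:
        totals[base] += conf
    return max(totals, key=totals.get)


# Hypothetical columns: 20 pad reads and a single read with an A.
pads_q0_vs_a_q20 = [(PAD, 0)] * 20 + [("A", 20)]
pads_q1_vs_a_q3 = [(PAD, 1)] * 20 + [("A", 3)]

if __name__ == "__main__":
    print(majority_consensus(pads_q0_vs_a_q20))   # *
    print(weighted_consensus(pads_q0_vs_a_q20))   # A
    print(weighted_consensus(pads_q1_vs_a_q3))    # *
```

    Under this simplified rule, twenty pads with confidence 0 carry no weight at all, so a single A with a moderate confidence wins the column even though a plain majority vote would call a pad.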

  • Sònia Casillas

    Sònia Casillas - 2012-09-01

    Yes, the alignment pads have a confidence of 0 or 1 (I'm not sure which right now), while the nucleotide normally has a confidence of ~20. But it is just 1 read with the nucleotide against ~20 reads with pads, in homopolymer regions sequenced by 454. In other places, where the nucleotide has a confidence of ~3, the consensus does get the alignment pad. Is there a way, then, to ignore confidence values in these cases?

    Thank you very much for your help!


  • James Bonfield

    James Bonfield - 2012-09-03

    Hmm. There isn't a way to completely ignore confidence values (as there was in gap4), so this needs to be added to the "to do" list I guess.

    However, more intriguing is how the data ended up with confidence 0 or 1. It's possible we have a bug with a specific type of input file. Can you please tell me what format your data was in prior to running tg_index?

  • Sònia Casillas

    Sònia Casillas - 2012-09-04

    Prior to running tg_index, the data was a CAF file coming from the MIRA assembler. The command we ran to convert it to GAP5 was: tg_index -d seq,qual,name -o NIU -C NIU_out.caf

    Thank you very much for your help!


  • James Bonfield

    James Bonfield - 2012-09-07

    Interesting. Certainly the CAF files I see have quality values for * ("-" in CAF). You could look at the BaseQuality lines to see if there are issues with the qualities. Gap5 requires padded CAF format, which clearly you have or it wouldn't have worked. However, you could try the caftools package and run caf_depad followed by caf_pad to strip out and reintroduce the pads. This should assign each pad a quality value based on the average of the two surrounding bases.
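    As a rough way to check the BaseQuality lines, the Python sketch below pulls out the quality values that sit at pad positions ("-") in a padded CAF file, and shows what a "mean of the two flanking bases" pad quality would look like. It assumes the usual CAF layout of "DNA : name" and "BaseQuality : name" blocks separated by blank lines; the function names and the sample record are hypothetical, and neighbour_average is only one reading of the averaging described above, not caf_pad's actual code.

```python
def parse_caf(text):
    """Collect DNA strings and BaseQuality lists keyed by sequence name."""
    dna, qual = {}, {}
    current = None  # (kind, name) of the block being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:            # a blank line ends the current block
            current = None
            continue
        head, sep, name = line.partition(":")
        head, name = head.strip(), name.strip()
        if sep and head == "DNA":
            current = ("dna", name)
            dna[name] = ""
        elif sep and head == "BaseQuality":
            current = ("qual", name)
            qual[name] = []
        elif sep:               # some other header, e.g. "Sequence : name"
            current = None
        elif current is not None:
            kind, key = current
            if kind == "dna":
                dna[key] += line
            else:
                qual[key].extend(int(tok) for tok in line.split())
    return dna, qual


def pad_qualities(text, pad="-"):
    """Map each sequence name to the qualities found at its pad positions."""
    dna, qual = parse_caf(text)
    report = {}
    for name, seq in dna.items():
        q = qual.get(name)
        if q is None or len(q) != len(seq):
            continue            # no qualities, or lengths disagree
        report[name] = [q[i] for i, base in enumerate(seq) if base == pad]
    return report


def neighbour_average(seq, quals, pad="-"):
    """Replace each pad quality by the mean of the nearest non-pad flanks."""
    out = list(quals)
    for i, base in enumerate(seq):
        if base != pad:
            continue
        left = next((quals[j] for j in range(i - 1, -1, -1) if seq[j] != pad), None)
        right = next((quals[j] for j in range(i + 1, len(seq)) if seq[j] != pad), None)
        flank = [q for q in (left, right) if q is not None]
        if flank:
            out[i] = sum(flank) // len(flank)
    return out


# Hypothetical padded CAF fragment with one pad of quality 1.
SAMPLE = """\
Sequence : read1
Is_read
Padded

DNA : read1
AC-GT

BaseQuality : read1
30 20 1 40 30
"""

if __name__ == "__main__":
    print(pad_qualities(SAMPLE))
    print(neighbour_average("AC-GT", [30, 20, 1, 40, 30]))
```

    If most pads in the real file report 0 or 1 while their neighbours are around 20 or more, that would be consistent with the pad qualities having been lost or never assigned before tg_index was run.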


    PS. You don't actually need -d seq,qual,name, as tg_index does that by default. The -d option was really a debugging thing for me, to experiment with including only specific types of data and see how much space they took up (for the chart in the paper). I left it in because in some very rare cases people may want to see the assembly layout in the template display but not care at all about the actual bases or qualities.

