Commit [r3657]

Many changes to shuffle pads implementation.

The malign code in align_lib now copes with start/end range being
something other than 1..length. This makes it far more efficient at
dealing with sub-sections of a contig.
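As a rough illustration of why a start/end range helps, the sketch below allocates per-column base counts only for the requested region and indexes them relative to its start. It is not the align_lib code: the `region_counts` type, its functions, and the six-way base classification are all hypothetical, and '*' is assumed to be the pad character.

```c
#include <stdlib.h>

/* Hypothetical illustration only; these are not the align_lib structures. */
typedef struct {
    int start;          /* first consensus column covered (1-based) */
    int end;            /* last consensus column covered (inclusive) */
    int (*counts)[6];   /* per-column counts: A, C, G, T, pad, other */
} region_counts;

/* Map a base character to a column of the count table ('*' assumed = pad). */
static int base_index(char b) {
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    case '*':           return 4;
    default:            return 5;
    }
}

/* Allocate counts for columns start..end only, not 1..contig_length. */
region_counts *region_counts_new(int start, int end) {
    region_counts *r;
    if (end < start)
        return NULL;
    r = malloc(sizeof(*r));
    if (!r)
        return NULL;
    r->start = start;
    r->end = end;
    r->counts = calloc((size_t)(end - start + 1), sizeof(*r->counts));
    if (!r->counts) {
        free(r);
        return NULL;
    }
    return r;
}

/* Add a sequence whose first base lies at consensus column 'pos'.
 * Bases falling outside start..end are skipped. */
void region_counts_add(region_counts *r, const char *seq, int len, int pos) {
    int i;
    for (i = 0; i < len; i++) {
        int col = pos + i;
        if (col < r->start || col > r->end)
            continue;
        r->counts[col - r->start][base_index(seq[i])]++;
    }
}

void region_counts_free(region_counts *r) {
    if (r) {
        free(r->counts);
        free(r);
    }
}
```

With this layout the memory and setup cost scale with the size of the region rather than the length of the contig.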

Added an EDGE_GAPS_MAXY cost to block shifting the consensus to match
the sequence (the reverse is still allowed). In essence such a shift
moves the sequence beyond the end of the consensus, which can cause
problems in the BAM world if we get coordinates < 1.
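The asymmetry can be pictured with a small sketch. Everything here is an assumption made for illustration (the enum, the function names, and the per-base cost of 128); it does not reproduce how EDGE_GAPS_MAXY is handled inside align_lib.

```c
/* Hypothetical names; a sketch of an asymmetric edge-gap cost. */
enum edge_shift {
    SHIFT_SEQ_INSIDE_CONS,  /* sequence starts or ends inside the consensus */
    SHIFT_SEQ_BEYOND_CONS   /* sequence overhangs the end of the consensus */
};

#define OVERHANG_COST_PER_BASE 128  /* assumed value, not from the source */

/* Overhanging the consensus is charged per base; the reverse is free. */
int edge_gap_cost(enum edge_shift kind, int n_bases) {
    if (n_bases <= 0)
        return 0;
    return kind == SHIFT_SEQ_BEYOND_CONS
        ? n_bases * OVERHANG_COST_PER_BASE
        : 0;
}

/* Leftmost coordinate of a sequence placed 'overhang' bases before the
 * consensus start. A result below 1 is the case the cost guards against. */
int leftmost_coord(int cons_start, int overhang) {
    return cons_start - overhang;
}
```

For example, a 3-base overhang at a consensus starting at column 1 would place the sequence at coordinate -2, which is why that direction is penalised while the other is not.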

Adjusted the malign scores. They are now hard-coded for DNA only, but
malign was never used on proteins anyway. The scores still range from 0
to 128, but the scaling is now nonlinear rather than directly proportional
to the fraction of each base type in the consensus vector. This was arrived at by

There is also an optional second scoring method for pads, which works
in some cases and not in others. This is selected by specifying
gap_extend to be 0 (gap_extend has been ignored for years, since we
switched from affine alignments). The shuffle pads code now executes
both modes in an attempt to find the best alignments.
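The "try both modes and keep the better one" idea might look roughly like the sketch below. The callback type, the toy cost function, and the convention that a lower cost is better are all assumptions; only the use of gap_extend == 0 as the mode selector comes from the description above.

```c
/* Hypothetical callback: returns an alignment cost, lower assumed better. */
typedef int (*pad_cost_fn)(const char *seq, int len,
                           int gap_open, int gap_extend);

/* Toy stand-in for a real aligner, scoring pads ('*') only.
 * gap_extend == 0 selects the alternative mode, as in the commit text. */
int toy_pad_cost(const char *seq, int len, int gap_open, int gap_extend) {
    int i, cost = 0;
    for (i = 0; i < len; i++) {
        if (seq[i] != '*')
            continue;
        if (gap_extend == 0) {
            /* Alternative mode: charge once per run of pads. */
            if (i == 0 || seq[i - 1] != '*')
                cost += gap_open;
        } else {
            /* Default mode: charge every pad. */
            cost += gap_open;
        }
    }
    return cost;
}

/* Run both modes and return the lower cost. *used_alt is set to 1 only
 * when the alternative mode is strictly better, so ties keep the default. */
int best_of_two_modes(pad_cost_fn fn, const char *seq, int len,
                      int gap_open, int *used_alt) {
    int c_default = fn(seq, len, gap_open, 1);  /* any non-zero gap_extend */
    int c_alt     = fn(seq, len, gap_open, 0);  /* 0 selects second mode */
    if (c_alt < c_default) {
        *used_alt = 1;
        return c_alt;
    }
    *used_alt = 0;
    return c_default;
}
```

Breaking ties in favour of the default mode is a design choice of this sketch, made so that results only change where the second method is clearly better.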

When evaluating our realigned soft-clips we first undo the clipping
before recomputing the new consensus vector. This avoids a circular
argument and means we can only extend into heterozygous cutoffs when
there was already the same heterozygosity in the originally clipped

Finally, adjusted the soft-clip scoring method to promote more
extension before realignment, but to be harsher when validating whether
to keep the alignments.

jkbonfield 2014-05-21

changed /staden/trunk/src/gap5/shuffle_pads.c
changed /staden/trunk/src/seq_utils/align_lib.c
changed /staden/trunk/src/seq_utils/align_lib.h