[Kdiff3-user] Alignment algorithm improvements! (was: Large set of testdata for alignmenttest)

On Tue, Aug 19, 2014 at 07:27:35PM +0200, Maurice van der Pot wrote:
> 3) generate test data systematically
>    Disadvantage: the data set gets big really quickly if you cover all
>    possible combinations of lines, even for small fragments
> 4) generate test data from real-life merges
>    Disadvantage: the data gets big even more quickly than with 3,
>    because there is a lot of overlap in which merged file tests which
>    behaviour of kdiff3 and each file contains many lines that are
>    irrelevant to the test

I went ahead and implemented both 3 and 4. Both scripts are included in
the merge request I created on SourceForge:
https://sourceforge.net/p/kdiff3/code/merge-requests/2/
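To give a feel for why approach 3 gets big so quickly, here is a minimal
sketch of systematic generation. It is not the actual
generate_testdata_from_permutations.py from the merge request: the function
name generate_cases, the two-line alphabet and the length limit are all
made up for illustration, and the real script may enumerate its cases
differently.

```python
from itertools import product


def generate_cases(alphabet, max_len):
    """Yield every (a, b, c) triple of line sequences.

    Hypothetical helper: each of the three files is any sequence of
    0..max_len lines drawn from `alphabet`.
    """
    sequences = [
        seq
        for length in range(max_len + 1)
        for seq in product(alphabet, repeat=length)
    ]
    # One test case per combination of three input files.
    yield from product(sequences, repeat=3)


small = list(generate_cases(["aaa", "bbb"], 2))
larger = list(generate_cases(["aaa", "bbb", "ccc"], 3))
print(len(small), len(larger))
```

With two distinct lines and at most two lines per file there are 7 possible
files and 7**3 = 343 cases; one more distinct line and one more line per file
already gives 40 possible files and 40**3 = 64000 cases.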

This test set was a prerequisite for some improvements to the alignment
algorithm that I have made and am pretty excited about!
(See my moveup branch on SourceForge:
https://sourceforge.net/u/griffon26/kdiff3/ci/moveup/tree/)

The test sets have allowed me not only to catch mistakes I made while
implementing the improvements, but also to find one or two
deficiencies in the existing code. Once I fixed those, the alignment
improved a little more.

The test data I used was generated with:
  generate_testdata_from_permutations.py -r 7 -s 0

The resulting differences are listed here:
  http://www.kfk4ever.com/~griffon26/alignment_changes_after_moveup.txt
In this output, 'actual result' is the output of the code with my improvements
(my moveup branch), while 'expected result' is the output of the original code
(my alignmenttest branch).
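The actual/expected comparison can be pictured roughly as follows. This is
only a sketch under assumptions: compare_results is a hypothetical helper,
not the test runner in the alignmenttest branch, and it simply reports OK
when the two alignment dumps are identical and NOK with a unified diff
otherwise.

```python
import difflib


def compare_results(expected, actual):
    """Return ("OK", []) if both dumps match, else ("NOK", diff_lines).

    Hypothetical helper: `expected` and `actual` are the full text of
    two alignment dumps.
    """
    if expected == actual:
        return "OK", []
    diff = list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    return "NOK", diff


status, diff = compare_results("0 aaa\n1 bbb\n", "0 aaa\n1 ccc\n")
print(status)
print("\n".join(diff))
```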

I have obviously not checked the results of all test cases, but the
samples I took looked like pure improvements. One example:

  Running test with testdata/permutations/perm_02739_*.txt...                            NOK
  Actual result (written to testdata/permutations/perm_02739_actual_result.txt):
  ----------------------------------------------------------------------------------------------
       0 aaa                          0 aaa                          0 aaa
       1 bbb                          1 bbb
                                      2 ccc                          1 ccc
       2 ddd                          3 xxxddd                       2 ddd
                                      4 eee
       3                              5                              3
  ----------------------------------------------------------------------------------------------
  Expected result:
  ----------------------------------------------------------------------------------------------
       0 aaa                          0 aaa                          0 aaa
       1 bbb                          1 bbb
       2 ddd                          2 ccc                          1 ccc
                                      3 xxxddd                       2 ddd
                                      4 eee
       3                              5                              3
  ----------------------------------------------------------------------------------------------

As you can see, in the old version ddd in file A was placed on the same row
as ccc instead of being aligned with ddd in file C.
The new version fixes this.
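One rough way to put a number on this kind of improvement is to count the
rows on which two files show the same line. The sketch below does that for
the perm_02739 example; the rows are transcribed from the two tables above
(None marks a gap), and aligned_matches is a hypothetical helper, not
something kdiff3 or the test scripts provide.

```python
def aligned_matches(rows, left, right):
    """Count rows where columns `left` and `right` hold the same line text."""
    return sum(
        1
        for row in rows
        if row[left] is not None and row[left] == row[right]
    )


# Columns are files A, B and C; None means the file has no line on that row.
new_rows = [
    ("aaa", "aaa", "aaa"),
    ("bbb", "bbb", None),
    (None, "ccc", "ccc"),
    ("ddd", "xxxddd", "ddd"),
    (None, "eee", None),
    ("", "", ""),
]
old_rows = [
    ("aaa", "aaa", "aaa"),
    ("bbb", "bbb", None),
    ("ddd", "ccc", "ccc"),
    (None, "xxxddd", "ddd"),
    (None, "eee", None),
    ("", "", ""),
]

A, C = 0, 2
print(aligned_matches(old_rows, A, C), aligned_matches(new_rows, A, C))
```

For files A and C the count goes from 2 rows in the old alignment to 3 in
the new one; the extra row is the ddd line.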

I also ran the new code against test data generated from the Linux kernel git repo.
I checked a handful of the differences there and they were also either
improvements or neutral changes.

Hoping to hear from you soon,

Maurice.

-- 
Maurice van der Pot

Gnome Planner Developer  gri...@kf...  http://live.gnome.org/Planner
