tkdiff / Bugs / #47 Choose better diff when multiple choices are available

Side-by-side diff viewer, editor and merge preparer

#47 Choose better diff when multiple choices are available

Milestone: v1.0 (example)

Status: closed-wont-fix

Owner: michael-m

Labels: None

Priority: 5

Updated: 2018-06-06

Created: 2013-04-04

Creator: Anonymous

Private: No

I have a file that contains repeated paragraphs along the lines of:

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget social kansas pantry donation exemption" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Knowledge PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Social PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget auto approval" \
  --force \
  delete

In the original file, the first and last paragraphs above are present, along with other preceding paragraphs and some trailing stuff. I inserted the 2nd and 3rd paragraphs.

I inserted 16 lines, and diff has 6 choices as to which 16 lines it thinks I inserted. Currently, diff picks the last available choice. I claim that, when possible, diff should prefer a choice of lines that takes blank lines into account. Specifically, we can consider 4 lines for each choice: the line before the inserted text, the first line of the inserted text, the last line of the inserted text, and the line after the inserted text.
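To make "choices" concrete, here is a minimal Python sketch of how the equivalent placements of a pure insertion could be enumerated. It is not how diff itself works; the function name equivalent_insertions and its arguments are made up for illustration. The idea is that an inserted block can slide up by one line whenever the line just above it equals its last line, and slide down whenever its first line equals the line just below it.

```python
def equivalent_insertions(new_lines, start, length):
    """Return every start index at which the inserted block could sit.

    new_lines: the lines of the edited file.
    start, length: one valid placement, meaning that deleting
    new_lines[start:start + length] yields the original file.
    """
    lo = start
    # Slide up while the line just above the block equals the block's last line.
    while lo > 0 and new_lines[lo - 1] == new_lines[lo + length - 1]:
        lo -= 1
    hi = start
    # Slide down while the block's first line equals the line just below it.
    while hi + length < len(new_lines) and new_lines[hi] == new_lines[hi + length]:
        hi += 1
    return list(range(lo, hi + 1))


# Toy file: "cmd b" plus one blank line was inserted between "cmd a" and "cmd c".
lines = ["cmd a", "", "cmd b", "", "cmd c"]
print(equivalent_insertions(lines, 2, 2))  # [1, 2]
```

Every returned index describes the same edit; they differ only in which lines get reported as inserted.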

As a first approximation, consider counting the number of those 4 lines that are blank and using that as the score for a choice. We will then take a choice with the highest score. In this case, there are two choices that have a score of 2 and 4 choices with a score of zero.
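A rough Python sketch of this first approximation follows. The names is_blank and blank_score are hypothetical, and treating positions outside the file as non-blank is an assumption the description above does not cover.

```python
def is_blank(line):
    """A line counts as blank if it holds only whitespace."""
    return line.strip() == ""


def blank_score(new_lines, start, length):
    """Count how many of the four boundary lines are blank (0 to 4).

    The four lines are: the line before the block, the block's first line,
    the block's last line, and the line after the block.
    Positions outside the file are treated as non-blank (an assumption).
    """
    boundary = [start - 1, start, start + length - 1, start + length]
    return sum(
        1 for i in boundary
        if 0 <= i < len(new_lines) and is_blank(new_lines[i])
    )


# Toy file: the 3-line block "b1", "end", "" was inserted, and it can
# equally be reported as starting at index 1, 2 or 3.
lines = ["a1", "end", "", "b1", "end", "", "c1"]
for start in (1, 2, 3):
    print(start, blank_score(lines, start, 3))
# prints:
# 1 0
# 2 2
# 3 2
```

As in the real file, the two placements that line up with the blank separators tie at 2, and the misaligned one scores 0.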

As a refined approximation, we might slightly prefer a choice that is preceded by a blank line and ends with a blank line to a choice that starts with a blank line and is followed by a blank line. So we might score 10 points for each of the 4 lines that is blank, plus 1 extra point if the line before the inserted text is blank and 1 extra point if the last inserted line is blank. On the other hand, the choice this prefers is also the last of the high-scoring choices.
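The refined scoring could look something like the Python sketch below. The names weighted_score and best_start are hypothetical, the weights are the 10 and 1 suggested above, and lines outside the file are assumed to be non-blank.

```python
def weighted_score(new_lines, start, length):
    """10 points per blank boundary line, plus 1 bonus point each when the
    line before the block is blank and when the block's last line is blank."""
    def blank(i):
        # Positions outside the file count as non-blank (an assumption).
        return 0 <= i < len(new_lines) and new_lines[i].strip() == ""

    before, first = start - 1, start
    last, after = start + length - 1, start + length
    score = 10 * sum(blank(i) for i in (before, first, last, after))
    score += 1 if blank(before) else 0
    score += 1 if blank(last) else 0
    return score


def best_start(new_lines, starts, length):
    """Pick the candidate start index with the highest weighted score.

    max() keeps the first maximal element, so exact ties go to the
    earliest candidate in `starts`.
    """
    return max(starts, key=lambda s: weighted_score(new_lines, s, length))


# Toy file: the 3-line block can be reported as starting at index 1, 2 or 3.
lines = ["a1", "end", "", "b1", "end", "", "c1"]
print([weighted_score(lines, s, 3) for s in (1, 2, 3)])  # [0, 20, 22]
print(best_start(lines, [1, 2, 3], 3))                   # 3
```

The bonus points only break ties between choices with the same number of blank boundary lines; they can never outweigh an extra blank line.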

This pattern occurs sufficiently frequently that diff would work better if it incorporated this logic.

Discussion

michael-m - 2018-05-21

status: open --> pending

assigned_to: michael-m

Group: --> v1.0 (example)

michael-m - 2018-05-21

The key phrase in your description is in your last line, "...frequently that diff would work better...". You are correct, but TkDiff is designed as a "wrapper" (if you will) around an external diff engine. As such, it does not decide where any particular region begins or ends -- that is what the engine does. What you want is a different engine, which TkDiff lets you specify via its "diff cmd" preference setting. There are others out there these days that might do a better job than the classical (and often default) Unix-derived "diff". I believe the Git source management tool suite has one with three or four distinct algorithms you might find helpful, and you need not actually be using Git to use their diff tool. There may be others. In fact, you could even implement your own wrapper that does exactly what you described (in a shell or other scripting language) and use THAT as the engine for TkDiff to invoke. But TkDiff is unlikely to ever get into the business of creating engines itself; the definition of "good" is simply far too subjective.

michael-m - 2018-06-06

status: pending --> closed-wont-fix
