I have a file that contains repeated paragraphs along the lines of:
$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget social kansas pantry donation exemption" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Knowledge PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Social PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget auto approval" \
  --force \
  delete
In the original file, the first and last paragraphs above are present, along with other preceding paragraphs and some trailing material. I inserted the 2nd and 3rd paragraphs.
I inserted 16 lines, and diff has 6 choices as to which 16 lines it thinks I inserted. Currently, diff picks the last available choice. I claim that, when possible, diff should prefer a choice of lines that takes blank lines into account. Specifically, we can consider 4 lines for each choice: the line before the inserted text, the first line of the inserted text, the last line of the inserted text, and the line after the inserted text.
As a first approximation, count how many of those 4 lines are blank and use that count as the score for a choice. We then take a choice with the highest score. In this case, two choices have a score of 2 and four choices have a score of zero.
As a refined approximation, we might slightly prefer a choice that is preceded by a blank line and ends with a blank line over a choice that starts with a blank line and is followed by a blank line. So we might score 10 points for each blank line among the four, plus 1 extra point if the preceding line is blank and 1 extra point if the last inserted line is blank. On the other hand, in this example that choice is also the last high-scoring one, so simply taking the last of the top-scoring choices would give the same result.
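To make the proposal concrete, here is a minimal Python sketch of the scoring. It is not how diff works internally; the function names (`slide_choices`, `score`, `best_choice`), the tie-breaking rule (prefer the later choice), and the shortened sample file are all assumptions made for illustration. Setting `blank_weight=1, bonus=0` gives the first approximation; the defaults give the refined one.

```python
def is_blank(line):
    return line.strip() == ""


def slide_choices(lines, start, length):
    """Return every start index in the NEW file at which an inserted block
    of `length` lines could be placed and still yield the same old file,
    given one valid placement at `start`."""
    first = start
    # Sliding up by one is valid when the line entering the block at the top
    # equals the line leaving it at the bottom.
    while first > 0 and lines[first - 1] == lines[first + length - 1]:
        first -= 1
    last = start
    # Sliding down by one is valid when the line leaving the block at the top
    # equals the line entering it at the bottom.
    while last + length < len(lines) and lines[last] == lines[last + length]:
        last += 1
    return list(range(first, last + 1))


def score(lines, start, length, blank_weight=10, bonus=1):
    """Score one choice from the four boundary lines described above."""
    def blank_at(i):
        return 0 <= i < len(lines) and is_blank(lines[i])

    before, first, last, after = start - 1, start, start + length - 1, start + length
    total = blank_weight * sum(blank_at(i) for i in (before, first, last, after))
    if blank_at(before):   # block is preceded by a blank line
        total += bonus
    if blank_at(last):     # block ends with a blank line
        total += bonus
    return total


def best_choice(lines, start, length):
    # Assumption: ties go to the later choice, mirroring diff's current
    # habit of picking the last available choice.
    return max(slide_choices(lines, start, length),
               key=lambda s: (score(lines, s, length), s))


# Shortened stand-in for the file above: paragraphs b and c were inserted.
new_file = ["cmd a", "edit", "",
            "cmd b", "edit", "",
            "cmd c", "edit", "",
            "cmd d", "delete"]

for s in slide_choices(new_file, 3, 6):
    print(s, score(new_file, s, 6))
print("best:", best_choice(new_file, 3, 6))
```

In this small example there are three choices; the one that is preceded by a blank line and ends with a blank line gets the highest score.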
This pattern occurs sufficiently frequently that diff would work better if it incorporated this logic.
The key phrase in your description is in your last line: "...frequently that diff would work better...". You are correct, but TkDiff is designed as a wrapper around an external diff engine. As such, it does not decide where any particular region begins or ends; that is what the engine does. What you want is a different engine, which TkDiff lets you specify via its "diff cmd" preference setting. There are others available these days that might do a better job than the classical (and often default) Unix-derived "diff". I believe the Git source management tool suite includes one with three or four distinct algorithms you might find helpful, and you need not actually be using Git to use its diff tool. There may be others. In fact, you could even write your own wrapper (in shell or another scripting language) that implements exactly what you described, and use that as the engine for TkDiff to invoke. But TkDiff is unlikely to ever get into the business of creating engines itself; the definition of "good" is simply far too subjective.
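As a rough illustration of the Git suggestion, the Python sketch below builds and runs a `git diff` command on two plain files outside any repository. The `--no-index`, `--diff-algorithm` and `--indent-heuristic` options are real Git options; the helper names (`git_diff_command`, `run_git_diff`) are hypothetical. Whether TkDiff can consume Git's output format directly through its "diff cmd" setting is an open question that this sketch does not settle.

```python
import subprocess

# Algorithm names accepted by git diff's --diff-algorithm option.
GIT_DIFF_ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def git_diff_command(old_path, new_path, algorithm="histogram",
                     indent_heuristic=True):
    """Build the argument list for comparing two files with git diff."""
    if algorithm not in GIT_DIFF_ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    cmd = ["git", "diff", f"--diff-algorithm={algorithm}"]
    if indent_heuristic:
        # Asks Git to shift hunk boundaries to more readable positions.
        cmd.append("--indent-heuristic")
    # --no-index compares two paths that need not be inside a repository.
    cmd += ["--no-index", "--", old_path, new_path]
    return cmd


def run_git_diff(old_path, new_path, **options):
    """Run git diff and return its output as text."""
    result = subprocess.run(git_diff_command(old_path, new_path, **options),
                            capture_output=True, text=True, check=False)
    # With --no-index, exit status 1 only means the files differ.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    return result.stdout
```

Running the command on the old and new versions of the file with each algorithm in turn is a quick way to see whether any of them places the inserted region on paragraph boundaries.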