#47 Choose better diff when multiple choices are available


I have a file that contains repeated paragraphs along the lines of:

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget social kansas pantry donation exemption" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Knowledge PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget allow transfers within Social PA" \
  --overrides_policy_name="budget consumer auto" \
  --force \
  edit

$POLICY_CLI \
  --policy_store_address=$FLAGS_policy_store_address \
  --evaluator_address=$FLAGS_evaluator_address \
  --policy_name="budget auto approval" \
  --force \
  delete

In the original file, the first and last paragraphs above are present, along with other preceding paragraphs and some trailing content. I inserted the second and third paragraphs.

I inserted 16 lines, and diff has 6 choices as to which 16 lines it reports as inserted. Currently, diff chooses the last available choice. I claim that, when possible, diff should prefer a choice that takes blank lines into account. Specifically, we can consider 4 lines for each choice: the line before the inserted text, the first line of the inserted text, the last line of the inserted text, and the line after the inserted text.
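To make "choices" concrete, here is a minimal brute-force sketch in Python that lists every position where a single contiguous insertion turns the old file into the new one. The function name `candidate_starts` and the three-line toy paragraphs are made up for illustration; this is not how diff finds its hunks internally, and the toy input is much smaller than the real file.

```python
def candidate_starts(old, new):
    """Return every index s such that deleting len(new) - len(old) lines
    starting at s in `new` gives back `old` exactly."""
    n = len(new) - len(old)  # number of inserted lines
    if n <= 0:
        return []
    return [s for s in range(len(old) + 1) if new[:s] + new[s + n:] == old]


# Hypothetical toy input: one paragraph ("--name=b") inserted between two others.
old = ["cmd \\", "  --name=a", "", "cmd \\", "  --name=z"]
new = ["cmd \\", "  --name=a", "", "cmd \\", "  --name=b", "", "cmd \\", "  --name=z"]

print(candidate_starts(old, new))  # prints [2, 3, 4]
```

All three start positions describe the same change. Only the one starting at index 3 lines up with the paragraph that was actually typed in.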

As a first approximation, count how many of those 4 lines are blank and use that as the score for a choice. We then take a choice with the highest score. In this case, there are 2 choices with a score of 2 and 4 choices with a score of 0.
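A rough sketch of this first scoring rule in Python, run on a small made-up file rather than the real one. The names `is_blank` and `blank_score` are hypothetical. Two things are my own assumptions: a line counts as blank if it is empty or whitespace-only, and a position past the start or end of the file counts as non-blank.

```python
def is_blank(lines, i):
    # Assumption: positions outside the file count as non-blank.
    return 0 <= i < len(lines) and lines[i].strip() == ""


def blank_score(new, start, length):
    """Count blanks among the 4 boundary lines of an inserted block:
    line before, first inserted, last inserted, line after."""
    positions = [start - 1, start, start + length - 1, start + length]
    return sum(is_blank(new, i) for i in positions)


# Hypothetical toy file after inserting the 3-line "--name=b" paragraph.
new = ["cmd \\", "  --name=a", "", "cmd \\", "  --name=b", "", "cmd \\", "  --name=z"]
candidates = [2, 3, 4]  # equivalent start positions for the 3 inserted lines

scores = {s: blank_score(new, s, 3) for s in candidates}
print(scores)  # prints {2: 2, 3: 2, 4: 0}
```

On this toy input the last available choice (start 4) scores 0, and the two choices that touch blank lines tie at 2, which is the same shape as the tie described above.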

As a refined approximation, we might slightly prefer a choice that is preceded by a blank line and ends with a blank line over a choice that starts with a blank line and is followed by a blank line. So we might score 10 points for each blank line, plus 1 point for a preceding blank line and 1 point for an ending blank line. On the other hand, the preferred choice here is also the last high-scoring choice, so in this case simply taking the last of the highest-scoring choices would give the same result.
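One possible reading of the refined rule, sketched in Python on a small made-up file. The names `is_blank` and `refined_score` are hypothetical, and the weights are interpreted as 10 points for each blank line among the 4 boundary lines, plus 1 bonus point if the line before the block is blank and 1 bonus point if the last inserted line is blank. Out-of-range positions are assumed to be non-blank.

```python
def is_blank(lines, i):
    # Assumption: positions outside the file count as non-blank.
    return 0 <= i < len(lines) and lines[i].strip() == ""


def refined_score(new, start, length):
    """10 points per blank boundary line, plus 1-point bonuses that favor
    'blank before, blank at end' over 'blank at start, blank after'."""
    before, first = start - 1, start
    last, after = start + length - 1, start + length
    score = 10 * sum(is_blank(new, i) for i in (before, first, last, after))
    score += is_blank(new, before)  # bonus: block is preceded by a blank line
    score += is_blank(new, last)    # bonus: block ends with a blank line
    return score


# Hypothetical toy file after inserting the 3-line "--name=b" paragraph.
new = ["cmd \\", "  --name=a", "", "cmd \\", "  --name=b", "", "cmd \\", "  --name=z"]
candidates = [2, 3, 4]  # equivalent start positions for the 3 inserted lines

best = max(candidates, key=lambda s: refined_score(new, s, 3))
print(best, refined_score(new, best, 3))  # prints 3 22
```

Here the scores are 20, 22 and 0 for starts 2, 3 and 4, so the bonus breaks the tie in favor of the block that begins at the paragraph's first line and ends with its trailing blank line. On this toy input that is also the last of the two high-scoring choices.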

This pattern occurs sufficiently frequently that diff would work better if it incorporated this logic.

