#2238 WinMerge's diff algorithm routinely gets confused

Milestone: Branch
Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2021-08-10
Created: 2021-07-03
Creator: Mark Craig
Private: No

I'm belatedly reporting this bug in version 2.16.12.0, but it's been present for at least several years. WinMerge's diff algorithm routinely "gets confused" while comparing complex files, like JSON files, and displays confusing, inconsistent results like those in Screenshot 1. Even though the only difference between the two opposing lines is a numeric value, 200 versus 90, WinMerge offsets the lines as if each were unique to its own file rather than a common line with a minor difference.

In Screenshot 2, the result of simply changing the numeric value to be the same in each file causes WinMerge to suddenly realize they are in fact the same line.

This has been happening for years in this type of complex data-structure file.

Discussion

  • Mark Craig

    Mark Craig - 2021-07-03

    The screenshots referenced above. The SourceForge UI was giving me an either-or choice between attaching WinMerge.txt from one directory or the screenshot files from another, but not both. >:-/

     
  • Takashi Sawanaka

    I think you have "Enable moved block detection" enabled.
    Somewhere on the left side, there is a line for "Volume: 200 ...", so the "Volume: 200 ..." on the right side is separated as a moved block.
    If you don't want to separate the "Volume: 200 ..." line on the right, please disable "Enable moved block detection".

     
  • Mark Craig

    Mark Craig - 2021-07-04

    If that is the reason why, then I should begin seeing an absence of that behavior: I incidentally disabled the moved block detection shortly before I posted. I wasn't suspicious that it was the cause of this behavior specifically, mind you, but I did see that it was having undesirable effects on these file types.

    I will try to remember to return and confirm or rule out this suspicion. I will probably next need to compare and edit these files in a month or two.

     
  • Mark Craig

    Mark Craig - 2021-08-08

    Here's another example of WinMerge getting needlessly confused. It shouldn't even need to be syntax-aware to interpret this correctly, but it doesn't. I have had the Enable Moved Block Detection option disabled since prior to beginning this ticket.

     
  • Takashi Sawanaka

    I'm not sure what kind of comparison result you want, but you may get the results you expect by manually adjusting the comparison position using a feature called Synchronization Points below.

     

    Last edit: Takashi Sawanaka 2021-08-09
  • Mark Craig

    Mark Craig - 2021-08-09

    Why isn't detection of those synchronization points automatic? In the example screenshot, it was the addition of the comment to the YScale line that caused WinMerge to go berserk on that section and (apparently) begin to confuse one instance of Preset: GrassDense with a later one, ignore the existence of the more proximal one, and then mistakenly claim that all the intervening lines are new and added, when all that was added was the comment.

    Its behavior doesn't make as much sense to me as a user as it must to you as a developer. I've been a paid programmer in a bygone era, so I understand more than most about application behaviors, but this behavior doesn't make sense to me.

     
  • Takashi Sawanaka

    The reason for this unexpected result is that most diff tools, in order to speed up the process, calculate the hash value of each line and use that hash value to determine if the lines match. To the human eye, the lines may look similar, but the hash values are completely different, so the diff algorithm does not think to synchronize these lines.

    I'm thinking that perhaps calculating the diff on a word-by-word basis instead of a line-by-line basis and then reverting to a line-by-line basis would give relatively good results even if it is not syntax-aware, but I haven't tried that yet.
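
A minimal sketch of the two ideas in the reply above, not WinMerge's actual code. It shows that a per-line hash sees a line with an added comment as a completely different line, while a word-by-word comparison still scores the two lines as similar. The sample lines and the helper names `line_key` and `word_similarity` are invented for illustration.

```python
import difflib
import hashlib


def line_key(line: str) -> str:
    """Hash a whole line, the way a line-based diff might before matching."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()


def word_similarity(a: str, b: str) -> float:
    """Score two lines in [0, 1] by comparing them word by word."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


# Hypothetical lines: the right side only gained a trailing comment.
left = "YScale: 1.0"
right = "YScale: 1.0 // taller grass"

# The hashes differ, so a hash-only matcher treats the lines as unrelated.
hashes_equal = line_key(left) == line_key(right)

# Two of the words still line up, so the word-level score stays well above zero.
similarity = word_similarity(left, right)

print("hashes equal:", hashes_equal)
print("word similarity:", round(similarity, 2))
```

A diff that used a score like this as a tie-breaker could pair such lines as "changed" instead of "deleted plus added", at the cost of extra comparisons on large files.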

     
  • Mark Craig

    Mark Craig - 2021-08-09

    This is another good example of confusion. Why did it treat the final additional YScale line any differently than the previous several? WinMerge reacted to it as if it was somehow different than the others that it interpreted correctly (as simple line insertions), and the result is anything but simple. This is at the very end of the file, BTW, so there is nothing else past this point.

     
    • Takashi Sawanaka

      Perhaps the last line on the left has a newline character such as CR+LF, but the last line on the right does not have a newline character such as CR+LF, so it is not considered a matching line.
      If you press Enter at the end of the last line on the right to insert a newline character and compare, you'll get the results you expect.
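
The newline explanation above can be illustrated with a short sketch, assuming the comparison keeps each line's terminator as part of the line. The file contents here are made up; only the presence or absence of the final CR+LF matters.

```python
# Hypothetical file contents: the right side lacks the final CR+LF.
left_text = "Preset: GrassDense\r\nYScale: 1.0\r\n"
right_text = "Preset: GrassDense\r\nYScale: 1.0"

# keepends=True keeps the line terminator as part of each line.
left_lines = left_text.splitlines(keepends=True)
right_lines = right_text.splitlines(keepends=True)

# The visible text is identical, but the strings differ by the terminator.
last_lines_match = left_lines[-1] == right_lines[-1]

# Appending the missing newline (pressing Enter at the end) makes them equal.
fixed_lines = (right_text + "\r\n").splitlines(keepends=True)
fixed_match = left_lines == fixed_lines

print("last lines match:", last_lines_match)
print("match after adding newline:", fixed_match)
```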

       
