#55 Broken Subversion's internal diff

Status: closed-wont-fix
Owner: nobody
Labels: None
Priority: 5
Updated: 2007-07-02
Created: 2007-06-29
Private: No

Creating a topic from a Subversion repository can fail if the repository contains files with malformed EOL markers.

I ran into this error when trying to create a topic containing a file with \r\r\n (0D 0D 0A) line endings.

It seems that Subversion counts lines by 0D markers, while Codestriker's unidiff reader counts them by 0A markers, so the reader uses the wrong line count for that file and fails to parse Subversion's diff.

The problem persists with Subversion 1.4.2 and 1.5.0pre, on both Windows and Linux. It can be worked around by using an external diff (at least on Linux), so the Codestriker documentation should probably recommend that.

Here are the first lines of the problematic file's svn diff output.

Subversion's internal diff:
Index: editor/hlp/hid_sc_minimize.htm
===================================================================
--- editor/hlp/hid_sc_minimize.htm (revision 0)
+++ editor/hlp/hid_sc_minimize.htm (revision 4)
@@ -0,0 +1,32 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">^M+^M
+<HTML>^M+^M
+<HEAD>^M+^M

External diff (diff-cmd = diff):
Index: editor/hlp/hid_sc_minimize.htm
===================================================================
--- editor/hlp/hid_sc_minimize.htm (revision 0)
+++ editor/hlp/hid_sc_minimize.htm (revision 4)
@@ -0,0 +1,16 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">^M^M
+<HTML>^M^M
+<HEAD>^M^M
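A minimal Perl sketch of the two counting rules described above. It assumes, as the 32 vs 16 hunk counts suggest, that Subversion's internal diff treats a lone \r as a line terminator in addition to \n and \r\n, while the unidiff reader only splits on \n. The helpers count_lf_lines and count_any_eol_lines are hypothetical names for illustration, not Codestriker or Subversion code.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines the way a reader that only treats
# "\n" (0A) as a terminator would.
sub count_lf_lines {
    my ($text) = @_;
    my $count = () = $text =~ /\n/g;
    return $count;
}

# Hypothetical helper: count lines treating "\r\n", a lone "\r" and a
# lone "\n" as terminators (assumed internal-diff behaviour).
sub count_any_eol_lines {
    my ($text) = @_;
    my $count = () = $text =~ /\r\n|\r|\n/g;
    return $count;
}

my $file = "10\r\r\n20\r\r\n30\r\r\n";   # three lines ending in 0D 0D 0A
printf "LF-only count: %d\n", count_lf_lines($file);       # 3
printf "Any-EOL count: %d\n", count_any_eol_lines($file);  # 6
```

Under this assumption each 0D 0D 0A ending splits into a line ending in "\r" plus a line ending in "\r\n", which is why the internal diff's hunk count is exactly double.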

Discussion

  • David Sitsky
    David Sitsky
    2007-06-29

    Logged In: YES
    user_id=208928
    Originator: NO

    Can you attach an actual diff file with the right line counts? I'd like to add it as a test case so I can fix it. Thanks.

    How did you manage to get \r\r\n line endings? That sounds very strange...

     
  • Repository dump

     
  • Logged In: YES
    user_id=324736
    Originator: YES

    Here is a Subversion dump with two revisions: 1 (a.txt) and 2 (b.txt). Each committed file has three lines, each ending in \r\r\n.

    Creating a Codestriker topic works for revision range 0-1 but fails for range 0-2 (it seems parsing breaks simply because there is something after the first diff chunk).

    I really don't know how my coworker created it. Most probably it was autogenerated by Visual Studio 2003, but I'm not quite sure.
    File Added: 0d0d0a.dump

     
  • David Sitsky
    David Sitsky
    2007-06-30

    Logged In: YES
    user_id=208928
    Originator: NO

    There is a pretty simple fix for this. Go to lib/Codestriker/FileParser/Parser.pm line 62, and change:

    $line =~ s/\r\n/\n/go;

    to

    $line =~ s/\r+\n/\n/go;

    Haven't had time to test it yet, but that should work.
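A small Perl sketch of what the proposed substitution does to hunk lines as the external diff emits them (content followed by 0D 0D 0A). The sample lines are taken from the a.txt example; applying the substitution in a loop over an array is an illustration, not the actual Parser.pm code.

```perl
use strict;
use warnings;

# Hunk lines as the external diff emits them: content plus 0D 0D 0A.
my @lines = ("+10\r\r\n", "+20\r\r\n", "+30\r\r\n");

# The proposed change: collapse any run of "\r" before "\n".
s/\r+\n/\n/g for @lines;

print join('', @lines);
```

For this input each line becomes a plain "\n"-terminated line, so the 3-line hunk header still matches.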

     
  • David Sitsky
    David Sitsky
    2007-06-30

    Logged In: YES
    user_id=208928
    Originator: NO

    Actually, I'm not sure I have really fixed this. Can you upload the full diff file here? Don't paste it as a comment, as I lose important characters. Don't upload an svn dump; I need the actual diff file for testing purposes. Thanks.

     
  • Logged In: YES
    user_id=324736
    Originator: YES

    > Go to lib/Codestriker/FileParser/Parser.pm line 62, and change:
    > $line =~ s/\r\n/\n/go;
    > to
    > $line =~ s/\r+\n/\n/go;
    > Haven't had time to test it yet, but that should work.

    Hmm... I'm quite sure this is not enough. The problem is not in the line endings themselves but in the line counts (marked by @@) that the unidiff reader relies on.
    For the files in the dump, Subversion's internal diff outputs:
    $ svn diff -c 1 file:///tmp/r
    Index: a.txt
    ===================================================================
    --- a.txt (revision 0)
    +++ a.txt (revision 1)
    @@ -0,0 +1,6 @@
    +10
    +20
    +30

    But when an external diff is used, the same command gives:
    $ svn diff -c 1 file:///tmp/r
    Index: a.txt
    ===================================================================
    --- a.txt (revision 0)
    +++ a.txt (revision 1)
    @@ -0,0 +1,3 @@
    +10
    +20
    +30

    That is the problem: Subversion's internal diff _doubles_ the line count for \r\r\n files. I don't think this can be resolved by fixing line endings alone.
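A Perl sketch of why normalising line endings is not enough for the internal diff's output. The byte layout of the hunk is an assumption reconstructed from the ^M rendering in the report (each "10\r\r\n" appears as "+10\r" followed by "+\r\n"), since the pasted comment lost the original characters.

```perl
use strict;
use warnings;

# Assumed internal-diff hunk: the header promises 6 added lines, but
# only 3 of them end in "\n".
my $hunk = '@@ -0,0 +1,6 @@' . "\n"
         . "+10\r+\r\n+20\r+\r\n+30\r+\r\n";

# Read "\n"-terminated lines, as an LF-based unidiff reader would.
my @lines  = $hunk =~ /[^\n]*\n/g;
my $header = shift @lines;

# Number of added lines the hunk header claims.
my ($claimed) = $header =~ /^\@\@ -\d+(?:,\d+)? \+\d+,(\d+) \@\@/;

# Apply the proposed fix to each body line; it only touches "\r" runs
# that directly precede "\n", so the embedded "\r+" survives.
s/\r+\n/\n/ for @lines;

my $found = scalar @lines;
printf "header claims %d lines, reader sees %d\n", $claimed, $found;
```

This prints "header claims 6 lines, reader sees 3": the reader would keep consuming lines past the end of the hunk, consistent with parsing breaking only when something follows the first diff chunk.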

     
  • David Sitsky
    David Sitsky
    2007-07-02

    Logged In: YES
    user_id=208928
    Originator: NO

    I see what you mean now. Maybe we need something like:

    $line =~ s/\r\n/\n/go;
    $line =~ s/\r/\n/go;

    That way, all line endings will be converted to \n, which should give us the right line counts. As usual, I'd need to actually test this. It's not great for your a.txt, though, as it will have extra blank lines in it, since as far as the diff is concerned there are really 6 lines.
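A Perl sketch of the two proposed substitutions applied to the same assumed internal-diff hunk body (byte layout reconstructed from the ^M rendering in the report). It also assumes the substitutions run on the text before it is split into lines; where the fix would actually go in Parser.pm is not established here.

```perl
use strict;
use warnings;

# Assumed internal-diff hunk body for a.txt.
my $body = "+10\r+\r\n+20\r+\r\n+30\r+\r\n";

# The two proposed substitutions: CRLF first, then any remaining CR.
(my $normalised = $body) =~ s/\r\n/\n/g;
$normalised =~ s/\r/\n/g;

my @lines = split /\n/, $normalised;
print scalar(@lines), " lines\n";
print join(' | ', @lines), "\n";
```

This yields 6 lines, matching the "+1,6" hunk header, but every other line is an added line containing only "+", which are the extra blank lines mentioned above.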

     
  • David Sitsky
    David Sitsky
    2007-07-02

    Logged In: YES
    user_id=208928
    Originator: NO

    You should report this to the subversion guys. Ideally, the internal diff should produce the same result as the external diff with the canonical set of flags.

     
  • David Sitsky
    David Sitsky
    2007-07-02

    • status: open --> closed-wont-fix
     
  • David Sitsky
    David Sitsky
    2007-07-02

    Logged In: YES
    user_id=208928
    Originator: NO

    I'm going to close this off: there is no real way we can account for this kind of diff file; it is simply broken. You'll have to get the Subversion guys to fix it. I'd be interested to hear how these diffs were actually created; I haven't seen this happen anywhere else, on either Windows or Unix machines.