#404 Fix GetTextWithoutGhosts for binary files

closed-rejected
Perry
None
5
2003-11-12
2003-11-07
Perry
No

Essentially remove the ASSERT about lines without
zero-termination. This allows me to load a binary text
file successfully in debug mode.

Discussion

  • Perry
    Perry
    2003-11-07

    Altered and original files (GhostTextBuffer.cpp & MergeDoc.cpp)

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    Also I set the readonly flag if the file had embedded zeros.

    Note that both files appear to be readonly in this case,
    although my code is only setting the readonly flag for one
    side -- I've not figured out why the other gets marked as
    readonly ?

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    So, now if there was a way to ask the editor to show the
    first line without eol characters, then we could tell people
    who can't load binary files, to load them this way and find
    out where the embedded zeros are...

     
  • ganier
    ganier
    2003-11-07

    Logged In: YES
    user_id=804270

    Why should we care to load files with zeros inside (or double
    zeros when Unicode) ?
    This character is invalid for text files (we can live we that),
    and it is a very special character for all strings (very very
    annoying).

    For example, my registry export has some 0x4 characters
    inside, as there are binary data. I suppose there may be
    some 0x0 even.
    First I don't really know if there may be some zero in a
    exported registry file, but if it is possible, I would just say
    that it is one more bug from Microsoft.

    What should we do with a zero ? truncate the file : that is
    not what I want. Display as read-only ? But if I want to
    merge. Propose to change the offending zero first ? But if I
    want to keep it.
    And also, according to the bug you found, a new line is
    created when a zero is found. But why ? It is not an EOL. It
    is an invalid character, not a control character (or yes it is a
    control character, the end of string one).
    Also lot of problems with strings : if I want to paste a text
    with a zero inside, the text will be truncated.

    Proposed solution : recognize file as binary and refuse to load
    (I suppose just the unicode code need to be updated). The
    message could be more explicit too.

    PS : for example, notepad replace the zero with a space.
    Why ? It would just change the binary data in my file. And
    not even a warning.

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    re: pasting text with zero inside

    Right now, the zeros are treated as empty line termination,
    but UniMemFile.

    My proposal is to just display the file contents, more or
    less -- this isn't an exact display, because there is no
    obvious way to show zeros. So this just shows approximately
    how the file looks. And you can't edit it, because we're not
    showing it exactly and anyway, we can't handle editing files
    with zeros in them.

    This is really for only one use -- when the user has a file
    that WinMerge says is binary, but the user knows it is
    really mostly text, and they'd like to see it compared.
    Although the other thing I'd like in this case, is a way to
    tell them (or better, show them) where the offending zeros are.

    In summary, my proposal:

    > What should we do with a zero ?

    I suggest what I did already, which is throw it away and
    pretend the line ended.

    > But if I want to merge ?

    Too bad, not allowed.

    Anyway, this is mostly because we can do this right now,
    AFAICT, just by removing that ASSERT and adding the code I
    added here to make it read-only. However, this might cause
    some other problems -- I didn't test this any further than
    that it will display the file.

     
  • ganier
    ganier
    2003-11-07

    Logged In: YES
    user_id=804270

    > just by removing that ASSERT and adding the code I
    > added here to make it read-only
    Some ASSERT are really useful to detect bugs. And the code
    has kept evoluting since three months (EOL, undo/redo,
    MAC/DOS/UNIX) so I prefer to keep this ASSERT.

    > Right now, the zeros are treated as empty line termination,
    > but UniMemFile.
    Rather replace them with a space (like notepad does). This
    will for sure break nothing.

    Also I want to know there is a zero, I won't remind the
    reason why the read-only flag is set. So a dialog is really
    necessary.

    A cleaner solution is to develop an unpacker to replace the
    zero with another character. Then the user may compare the
    files, but only if he makes this choice.
    Even better the plugin could search a character replacement
    for the zero. If it finds none it replaces with space and sets
    the read-only flag.
    (needs a change to the plugin interface to set the read-only
    flag, but that is easy).

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    > Rather replace them with a space (like
    > notepad does). This will for sure break nothing.

    What does pretending they are line terminations break ? I
    mean, I have no hopes of editing such a file anyway -- I
    don't care to even try. I'm not in a hurry to change it to
    space, because that doesn't fit as well into the UniMemFile
    interface -- or at least right at the moment, I'm not sure
    how to do that. Right now you can tell there was a zero by
    what happens from the ReadString call (it has no eol, but it
    isn't done); I don't really want ReadString to silently
    change them to spaces without telling the caller.

    > Also I want to know there is a zero, I won't
    > remind the reason why the read-only flag is
    > set. So a dialog is really necessary.

    I think this brings up a problem we actually have already --
    you can't see why the file is read-only. Perhaps the source
    had the read-only attribute. Perhaps the representation
    conversion (ie, codeset conversion) was lossy. (I don't know
    if there are any other possibilities.) We should file this
    as a -- hmm, a bug or an RFE ? An RFE I guess -- some way to
    tell why the file is read-only.

    > (unpacker idea)

    Over my head, so I can't comment.

    Actually I don't really understand what an unpacker is,
    still -- is the concept that it is unpacking the file into
    the display buffer, or is it something else ? My problem is
    that I'm not sure what the "unpacking" is. I think of
    someone opening a box (unpacking), but I'm not sure what the
    box represents, or what the object we're taking out is. That
    is, I just haven't followed along the plugin stuff, and am
    not yet understanding the basic concept -- which is probably
    a simple analogy that I just don't yet get.

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    Never mind, I read the Plugins/readme.txt, and unpacking
    means transforming the file into a file (or a buffer) for
    display. Pretending that is a zip archive makes the word
    "unpack" make sense to me, and you mentioned that in the
    readme, so I think you may have had that in mind.

    So, I think I can understand that we could make a plugin to
    do this, but, it seems like a lot more work than just
    removing this one ASSERT (which is what this patch does, to
    make it work) ?

    I suppose I could change this patch to instead manufacture
    an actual line termination for the line ending where the
    zero was, and then I wouldn't even have to remove the ASSERT
    -- you want to keep the ASSERT, correct ?

     
  • Kimmo Varis
    Kimmo Varis
    2003-11-07

    Logged In: YES
    user_id=631874

    Why to add some workarounds for binary files. We can't
    compare or edit them. There are many editor capable of
    showing binary files and many binary edit tools. I agree
    messagebox for user should be clearer. Unfortunately I
    couldn't modify texts or add new texts just before 2.0
    release so I used existing one. But it's 2.1 now and all is
    possible. Just tell user file cannot be opened because there
    are embedded zeroes in it yadda yadda.

    OR

    Implement real binary view. One nice binary view/edit
    control is here:
    http://www.codeproject.com/miscctrl/cppdumpctrl.asp

     
  • Perry
    Perry
    2003-11-07

    Logged In: YES
    user_id=60964

    I wasn't proposing adding a workaround; I was proposing
    removing an ASSERT, because we *can* compare binary files,
    right now, if they are mostly text -- that is, we can
    compare the textual parts fine. At least, so it seems to me,
    when I tried it with the sample files I posted for the bug
    (but, they are *very* small sample files!)

    My basic point is that AFAICT, we actually can compare
    binary files right now by treating them as text files. This
    is of course only useful if they have text sections, or if
    they are largely text.

    I do not insist on this however :)

     
  • Perry
    Perry
    2003-11-08

    Logged In: YES
    user_id=60964

    If (as I suspect :)) there is no more discussion on this in
    a couple of days, I'll close this as Rejected.

     
  • Kimmo Varis
    Kimmo Varis
    2003-11-08

    Logged In: YES
    user_id=631874

    > My basic point is that AFAICT, we actually can compare
    > binary files right now by treating them as text files.

    Ok. But how useful this half-way binary support really is?
    Reliable? If we open door for binary file compares, people
    start using it and sending reports about cases it does not
    work; we get a lot of bug reports about feature we are not
    supporting... Who has time to look for those? For me this is
    binary thing (no pun intented): we have real binary file
    support or no support.

    Hmm. Of course this debends on how we define binary files.
    :) Current algorithm searches for zeros.

     
  • Perry
    Perry
    2003-11-08

    Logged In: YES
    user_id=60964

    Lets move this discussion to developer forum. I'll go post
    there now.

     
  • Perry
    Perry
    2003-11-12

    Logged In: YES
    user_id=60964

    Closing this as Rejected.
    Its TTL (time-to-live) field expired :)

     
  • Perry
    Perry
    2003-11-12

    • assigned_to: nobody --> puddle
    • status: open --> closed-rejected