
#52 rendering problem with UTF-8 encoded Devanagari

v1.0_(example)
closed
None
5
2014-08-07
2006-05-24
Bob Eaton
No

I'm attaching the text below as a pdf file in case
the webpage doesn't render things the way I'm
describing below...

I just installed and tried kdiff3 to use as the diff
utility for TortoiseSVN and it seems very promising
(read: definitely better than what is being offered
by default in TortoiseSVN’s diff), but it has a weird
feature when diff’ing two text files that have UTF-8
encoded Devanagari text:

With such runs of text, they typically have to be
rendered as a whole run, rather than character-by-
character, because otherwise, the dependent
diacritics are shown “offset” from the characters on
which they depend. For example, this is the
word ‘book’ in Hindi:

किताब

But this is what I see in kdiff3:

क‌ि‌ताब (this isn't what I mean; see the attached)

Notice the second and fourth characters show up with
their little dotted circles showing how they position
with respect to their dependent character (and in
fact, out of correct order), since Uniscribe either: a)
isn’t being used or b) isn’t being given the
characters to render together as a single run.
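
To make the dependency concrete, here is a minimal C++ sketch (not KDiff3 code) that splits a string into a base character plus the dependent signs that follow it. The names isDependentSign and splitClusters are hypothetical, and the range check is a deliberate simplification: it only treats U+093E..U+094D (the Devanagari dependent vowel signs and the virama) as dependent, and it does not join consonants across a virama the way a real shaping engine would.

```cpp
#include <string>
#include <vector>

// Simplified check (an assumption for this sketch): only U+093E..U+094D,
// the Devanagari dependent vowel signs and the virama, count as signs that
// attach to the preceding base character. Other combining marks are ignored.
bool isDependentSign(char32_t c)
{
    return c >= 0x093E && c <= 0x094D;
}

// Split text into clusters: each base character together with the dependent
// signs that directly follow it. Drawing less than a whole cluster at once
// is what produces the dotted-circle placeholders.
std::vector<std::u32string> splitClusters(const std::u32string& text)
{
    std::vector<std::u32string> clusters;
    for (char32_t c : text) {
        // Start a new cluster on every character that is not a dependent sign
        // (or at the very start, even if the text begins with one).
        if (clusters.empty() || !isDependentSign(c))
            clusters.emplace_back();
        clusters.back().push_back(c);
    }
    return clusters;
}
```

For the word above (U+0915 U+093F U+0924 U+093E U+092C) this gives three clusters, कि, ता and ब, so the second and fourth code points are exactly the ones that cannot be drawn on their own.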

This would be okay if the word was actually different
between the two panes, because in order for you to
mark it with a different color, you probably have to
render it in a character-by-character way (or at
least for the portion of the run that is different),
but it’s not as nice to look at when there is no
difference between the two...

Is there any way you can send strings to the renderer
as whole runs rather than character-by-character when
they are the same in both panes?

I am using the Windows version 9.9.0 and I have it
configured to interpret the data files as UTF-8
encoded (thank you for supporting this!).

By the way, normally I would have preferred to use
Arial Unicode MS as the font since that is a nicer
font to display Unicode-encoded Devanagari, but with
that font (which isn't fixed-width), the display was
even worse: It seemed that every character had a
space (or a virtual space offset) between them so
that the above was rendered as:

क ि त ा ब

Discussion

  • Bob Eaton

    Bob Eaton - 2006-05-24

    pdf image of the above text showing the issue correctly

     
  • Joachim Eibl

    Joachim Eibl - 2006-05-27

    Logged In: YES
    user_id=584435

    Hi Bob,
    I'm aware of your problem and intend to provide a solution in the future.
    But because characters are displayed differently depending
    on the previous or following characters, it won't be
    possible to show character-by-character-differences.
    But I will see what can be done.
    Cheers,
    Joachim

     
  • Bob Eaton

    Bob Eaton - 2006-05-29

    Logged In: YES
    user_id=1327607

    In looking into it, I think I know what's wrong: the
    underlying Qt routines for 'drawText' must render the
    strings of text (at least those which represent common
    text between the two texts being compared) as runs of
    text. It looks like they are rendering the characters of
    the string given one-by-one using GetGlyphOutline. This
    will not work for Indic languages (or at least not for
    Devanagari).
    If you are building KDiff3 as a "wide" application (which
    it looks like you are -- i.e. using the "UNICODE" define),
    then if Qt were to use the DrawText or ExtTextOut Win32
    API instead, I think we'll get the behavior we're looking
    for. Of course, that might make Devanagari (and other
    Indic) scripts work, but break something else, but...

     
  • Nobody/Anonymous

    Logged In: NO

    No... I was wrong.

    There may actually be a problem in Qt as well (it looks
    like it only wants to do runs of text if it is Latin-1,
    which this won't be), but the prior problem is that the
    KDiff3 code itself is calling drawText one character at a
    time.

    So the thing to do would be (optionally, though I don't
    know what the checkbox should be called) to redo
    DiffTextWindowData::writeLine so that it accumulates
    portions of the line that are the same between the two (or
    three) panes and then calls drawText on each portion...

    It might require some finagling in Qt as well (to get it
    to treat it as a run of text rather than glyphs)... but
    this would be the first thing that's necessary.

    Sorry, you probably already knew all this...

    Bob

     
  • Bob Eaton

    Bob Eaton - 2006-05-30

    source changes to diff.h and difftextwindow.cpp for Unicode Devanagari fix

     
  • Bob Eaton

    Bob Eaton - 2006-05-30

    Logged In: YES
    user_id=1327607

    I'm attaching the diff.h and difftextwindow.cpp I've
    modified. I'm not comfortable (nor do I have any more time
    to spend on this) becoming a "real" contributor to this
    project, but the attached files do work for my need.

    Basically, it accumulates a run of characters of the same
    color before doing the drawText. Doing it this way causes
    the runs to be rendered as a unit (which is what many
    non-Roman ranges of Unicode need).
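
The accumulation idea can be sketched roughly as follows. This is not the attached patch and does not use Qt. The Run struct, the accumulateRuns function and the per-character integer color indices are all assumptions made for illustration. Each resulting run would then be passed to a single drawText call instead of one call per character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration; not KDiff3's actual data structure.
struct Run {
    int color;            // color index shared by every character in the run
    std::u32string text;  // characters to hand to the renderer in one call
};

// Group consecutive characters that share a color into runs.
// colors[i] is assumed to be the color already chosen for line[i].
std::vector<Run> accumulateRuns(const std::u32string& line,
                                const std::vector<int>& colors)
{
    assert(line.size() == colors.size());
    std::vector<Run> runs;
    for (std::size_t i = 0; i < line.size(); ++i) {
        // A color change (or the first character) starts a new run.
        if (runs.empty() || runs.back().color != colors[i])
            runs.push_back(Run{colors[i], std::u32string()});
        runs.back().text.push_back(line[i]);
    }
    return runs;
}
```

A line whose characters all have the same color comes out as one run, so the whole line reaches the renderer as a unit. A color change in the middle of a word still splits that word into two runs, which is the remaining limitation for character-level highlighting.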

    I haven't checked that it works for RTL. I'm pretty sure
    the wrapping doesn't work (because it's also not based on
    actual line lengths, but rather the simplification you did
    before about the size of "W"). And the color rectangles
    appear to be slightly off.

    Nevertheless, I think if you don't already have a solution
    for this, this might give you some hints.

    Hope this helps,
    Bob

     
  • Joachim Eibl

    Joachim Eibl - 2006-05-31

    Logged In: YES
    user_id=584435

    Thank you for the patch. The basic idea is correct.
    As you also see, there are quite a few things to fix,
    before it can be really made public. (Word wrap,
    RTL-languages, character highlighting, selections for copy
    and paste.)
    Nevertheless if this already helps you, very good!
    Cheers,
    Joachim

     
  • Joachim Eibl

    Joachim Eibl - 2014-08-07
    • status: open --> closed
    • Group: --> v1.0_(example)
     