Full-width / half-width issue and solution

2011-12-14
2013-05-13
  • Iwasa Kazmi

    Iwasa Kazmi - 2011-12-14

    I have added an improvement to solve the full-width / half-width issue.

    I intend to include this change in the next beta release, but it may
    cause problems for some people.

    You can check the new feature with the snapshot build:
    Poderosa-snapshot-20111211-bin.zip
    Poderosa-snapshot-20111211-src.zip

    If you run into trouble with this change, or have any suggestions, please post here.

    /// What is the problem ? ///

    Poderosa displays full-width (double-width) characters using the CJK font, and
    displays half-width (single-width) characters using the main font.

    But for some characters it is difficult to determine which width should be used.
    Poderosa uses Unicode internally, and Unicode doesn't specify a single display width for such characters.

    Currently, Poderosa treats a character as full-width if it is contained
    in the Japanese character set.
    For example, the box-drawing character U+250C (BOX DRAWINGS LIGHT DOWN AND RIGHT) is
    displayed as double-width.

    However, this causes problems if an application outputs text on the assumption that
    such characters will be displayed as half-width.

    /// What was changed ? ///

    The width used to display a character is now determined according to
    the current encoding setting.

    For example, U+00B0 (DEGREE SIGN) is displayed as half-width if the user is
    using ISO 8859-1.

    If the user is using a CJK character set such as ShiftJIS or GB2312, the degree sign
    is displayed as double-width.
    ShiftJIS and GB2312 define the degree sign as a full-width character, and that character
    is converted to U+00B0.

    UTF-8 is the problem, because Unicode doesn't specify a single display width for these characters.

    So I added another encoding setting, "UTF-8 latin".

    It uses UTF-8 for the encoding, but characters with code points below U+2000
    are always displayed as half-width.

    The previous "UTF-8" encoding setting now works as "UTF-8 CJK".
    It uses UTF-8 for the encoding, and characters that are contained in East Asian character sets,
    including symbols, graphical characters, Greek letters and Cyrillic letters,
    are displayed as full-width.
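
The rule described above can be sketched roughly as follows. This is only an illustration in Python, not Poderosa's code: the profile names, the small `CJK_WIDE_BELOW_U2000` subset, and the fallback to `unicodedata.east_asian_width` for characters at or above U+2000 are all assumptions made for the example.

```python
import unicodedata
from enum import Enum


class Profile(Enum):
    """Hypothetical names for the encoding settings mentioned in the post."""
    ISO_8859_1 = "ISO 8859-1"
    SHIFT_JIS = "ShiftJIS"
    UTF8_LATIN = "UTF-8 latin"
    UTF8_CJK = "UTF-8 CJK"


# Illustrative subset only: a few characters below U+2000 that also exist
# in East Asian character sets (degree sign, plus-minus, multiplication,
# division, Greek capital alpha, Cyrillic capital A).
CJK_WIDE_BELOW_U2000 = {0x00B0, 0x00B1, 0x00D7, 0x00F7, 0x0391, 0x0410}

LATIN_PROFILES = {Profile.ISO_8859_1, Profile.UTF8_LATIN}


def is_full_width(ch: str, profile: Profile) -> bool:
    """Decide the display width of one character for a given setting."""
    cp = ord(ch)
    if cp < 0x2000:
        if profile in LATIN_PROFILES:
            # Latin-style settings: everything below U+2000 is half-width.
            return False
        # CJK-style settings: wide only if the East Asian set contains it.
        return cp in CJK_WIDE_BELOW_U2000
    # Assumption for this sketch: at or above U+2000, use the Unicode
    # East Asian Width property (Wide or Fullwidth means double-width).
    return unicodedata.east_asian_width(ch) in ("W", "F")
```

With this sketch, the degree sign is half-width under `Profile.UTF8_LATIN` and full-width under `Profile.UTF8_CJK`, while an ASCII letter is half-width under every setting.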

    /// Internal changes ///

    Characters below U+2000 that should be displayed as full-width
    are mapped to Unicode's private-use area (U+E000 - U+F8FF).
    The mapped characters are displayed as full-width using the CJK font.

    Other characters below U+2000 are displayed as half-width
    using the main font.

    With this mechanism, the half-width form and the full-width form of a character can coexist
    in the same console buffer.
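
A minimal sketch of the private-use-area idea, in Python. The actual mapping table used by Poderosa is not given in this thread, so the sequential slot assignment, the `WIDE_CANDIDATES` list, and the function names below are hypothetical.

```python
PUA_START = 0xE000
PUA_END = 0xF8FF

# Hypothetical, tiny candidate list; each entry gets one private-use slot.
WIDE_CANDIDATES = [0x00B0, 0x00B1, 0x00D7, 0x00F7, 0x0391, 0x0410]
assert PUA_START + len(WIDE_CANDIDATES) - 1 <= PUA_END

TO_PUA = {cp: PUA_START + i for i, cp in enumerate(WIDE_CANDIDATES)}
FROM_PUA = {pua: cp for cp, pua in TO_PUA.items()}


def to_buffer_char(ch: str, wide: bool) -> str:
    """Return the character to store in the console buffer.

    The wide form of a mapped character is stored as its private-use
    stand-in; everything else is stored unchanged.
    """
    cp = ord(ch)
    if wide and cp in TO_PUA:
        return chr(TO_PUA[cp])
    return ch


def from_buffer_char(stored: str) -> tuple[str, bool]:
    """Return (original character, was_remapped) for a stored character."""
    cp = ord(stored)
    if cp in FROM_PUA:
        return chr(FROM_PUA[cp]), True
    return stored, False
```

Because the wide form is stored under a different code point, a buffer can hold `to_buffer_char("°", True)` next to `to_buffer_char("°", False)` and still tell them apart when drawing or exporting.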

     
  • Elmue

    Elmue - 2011-12-15

    Hello

    I studied your changes.
    Using a separate UTF8-Latin encoding was a good idea to solve the problem.

    But moving the characters to the private-use area and then moving them back is an awkward solution.
    What this trick does is store the information about whether a character is wide or narrow in the Unicode character itself.

    This could be done better:
    I would store this information in the GWord.
    The current design of GWord and GLine is not smart.

    Every GWord should have its own character buffer (I would even remove the character buffer in GLine).
    Every GWord should store the length of its characters as a fixed value, so that it need not be determined anew each time.
    Every GWord should store whether its characters are wide or narrow as a fixed value, so that it need not be determined anew each time.
    I think that even WIDECHAR_PAD, which also was not a smart solution, can be removed completely.

    This would drastically improve drawing speed and remove unnecessary Unicode conversions.
    One could even implement a Render() function directly into GWord.
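
The proposal above could look roughly like this. It is a sketch in Python rather than Poderosa's own code; `Word`, its fields and `line_columns` are hypothetical names, and the sketch assumes that every character in a word has the same width.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Word:
    """A run of characters that share one width (hypothetical structure)."""
    text: str            # the word's own character buffer
    wide: bool           # True if every character is full-width
    columns: int = field(init=False)  # cached column count

    def __post_init__(self) -> None:
        # Computed once at construction, then reused on every repaint.
        cols = len(self.text) * (2 if self.wide else 1)
        object.__setattr__(self, "columns", cols)


def line_columns(words: list[Word]) -> int:
    """Total column count of a line built from words."""
    return sum(w.columns for w in words)
```

Here the width flag lives next to the text instead of being encoded in the characters, so the same degree sign can appear in a wide word and in a narrow word without any remapping.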

    Elmü

     
  • Iwasa Kazmi

    Iwasa Kazmi - 2011-12-16

    That's a good idea.

    But I'm worried about a few points.

    1. The cost of GLineManipulator's Load() and Export() methods will increase.
      These methods may be called more frequently than the OnPaint event occurs.

    GLineManipulator will still need WIDECHAR_PAD.
    Creating more objects for GWords, or expanding GWords into a single buffer, will also be needed.

    2. The major problem caused by removing WIDECHAR_PAD is that
      it is no longer possible to check quickly which side of a full-width character
      is on the specified column.
      This check is required to determine a suitable range of text when the text contains full-width characters.

    It is not required frequently, but additional overhead should be avoided.
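
To illustrate why a pad cell makes this check cheap, here is a minimal Python sketch. It assumes one buffer cell per column, with a pad sentinel stored in the second cell of each full-width character; the sentinel value and the function names are assumptions for the example, not taken from Poderosa.

```python
PAD = "\uffff"  # assumed sentinel value for the second cell of a wide char


def layout(chars: list[tuple[str, bool]]) -> list[str]:
    """Build a column-indexed cell list from (character, is_wide) pairs."""
    cells: list[str] = []
    for ch, wide in chars:
        cells.append(ch)
        if wide:
            cells.append(PAD)
    return cells


def side_at(cells: list[str], col: int) -> str:
    """Return 'single', 'left' or 'right' for the cell at a column."""
    if cells[col] == PAD:
        return "right"
    if col + 1 < len(cells) and cells[col + 1] == PAD:
        return "left"
    return "single"


def snap_range(cells: list[str], start: int, end: int) -> tuple[int, int]:
    """Widen [start, end) so that it never splits a full-width character."""
    if start < len(cells) and cells[start] == PAD:
        start -= 1  # started on a right half: include the left half
    if end < len(cells) and cells[end] == PAD:
        end += 1    # ended after a left half: include the right half
    return start, end
```

Each check looks at one or two cells only. Without the pad cell, the same answer would need a scan from the start of the line (or a per-word column index) to find where the column falls.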

    Anyway, I will try to change GLine later.

     
