#2027 With D2D some simplified chinese characters rendered as traditional chinese

Bug
closed-fixed
nobody
5
2021-04-23
2018-07-30
No

Copied from [#2026] as separate bug.

There is another bug with D2D, originally reported at https://github.com/rizonesoft/Notepad3/issues/191 (see d2d-bug.txt above).

In D2D, some Simplified Chinese characters (those that have new character forms, see https://en.wikipedia.org/wiki/Xin_Zixing) are rendered as Traditional Chinese characters or in their traditional forms.

Related

Bugs: #2026
Bugs: #2080
Feature Requests: #1364

Discussion

1 2 > >> (Page 1 of 2)
  • Zufu Liu

    Zufu Liu - 2018-08-10

    I probably figured out why.

    The problem is that Direct2D preserves rendering state.

    The "en-us" in FontCached::FontCached(const FontParameters &fp) caused the problem:

            HRESULT hr = pIDWriteFactory->CreateTextFormat(wszFace, nullptr,
                static_cast<DWRITE_FONT_WEIGHT>(fp.weight),
                style,
                DWRITE_FONT_STRETCH_NORMAL, fHeight, L"en-us", &pTextFormat);
    

    The screenshots from top to bottom (default font is set to DejaVu Sans Mono):
    en-us.txt: "en-us" using D2D, some unknown font (maybe MingLiU 微軟細明體 or PMingLiU 微軟新細明體) is used for Chinese characters.
    zh-cn.txt: "zh-cn" using D2D, Microsoft YaHei 微软雅黑 (or similar font) is used for Chinese characters.
    zh-tw.txt: "zh-tw" using D2D, Microsoft JhengHei 微軟正黑體 (or similar font) is used for Chinese characters.
    gdi.txt: "en-us" using GDI, SimSun-PUA 宋体-PUA (or similar font) is used for Chinese characters.

    Replacing L"en-us" with an empty string gives the same result as L"zh-cn"; replacing it with NULL crashes.

    A related question is https://stackoverflow.com/questions/28397971/idwritefactorycreatetextformat-failing.
    Another related "unknown font" bug for Korean is at: https://github.com/XhmikosR/notepad2-mod/issues/121.

    A temporary solution is to replace L"en-us" with an empty string to use the "default" fallback font.
    However, even with this the rendering result is still not as good as with GDI: only in the GDI rendering is one Chinese character exactly twice the width of a Latin letter.
    A true fix, I think, is to find some way to set the fallback font to the default system font, not the default system UI font.

    In Windows 10 Notepad with the font set to DejaVu Sans Mono, SimSun 宋体 or SimSun-PUA 宋体-PUA is used as the fallback font.
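
The findings above (NULL crashes, an empty string behaves like the current locale) could be captured in a small wrapper. This is only a sketch: `SafeLocaleName` and `CreateFormatForLocale` are hypothetical names, not Scintilla functions, and the DirectWrite part is guarded so the helper also compiles on other platforms.

```cpp
#ifdef _WIN32
#include <dwrite.h>
#endif

// Hypothetical helper: passing NULL as localeName was reported above to
// crash, so fall back to the empty string (treated as the current locale).
inline const wchar_t *SafeLocaleName(const wchar_t *requested) noexcept {
    return requested ? requested : L"";
}

#ifdef _WIN32
// Hypothetical wrapper around IDWriteFactory::CreateTextFormat that never
// forwards a NULL locale name. Weight and style are fixed for brevity.
inline HRESULT CreateFormatForLocale(IDWriteFactory *factory,
        const wchar_t *face, float height, const wchar_t *localeName,
        IDWriteTextFormat **format) {
    return factory->CreateTextFormat(face, nullptr,
        DWRITE_FONT_WEIGHT_NORMAL, DWRITE_FONT_STYLE_NORMAL,
        DWRITE_FONT_STRETCH_NORMAL, height,
        SafeLocaleName(localeName), format);
}
#endif
```

With this shape, a caller can experiment with L"en-us", L"zh-cn", L"zh-tw" or L"" without risking the NULL case.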

     

    Last edit: Zufu Liu 2018-08-10
  • Neil Hodgson

    Neil Hodgson - 2018-08-10
    • status: open --> open-accepted
     
  • Neil Hodgson

    Neil Hodgson - 2018-08-10

    If I am interpreting this correctly then the text shows correctly if you specifically ask for a font with simplified/traditional characters with the bug only occurring when font substitution is used because the requested font does not contain Chinese characters.

    It might be possible to create a font collection object to use as the second parameter of CreateTextFormat to influence the selection of substitution fonts.

    If there is value in setting the localeName then an API could be implemented.

     
  • Zufu Liu

    Zufu Liu - 2018-08-11

    I think it's related to font substitution; there seem to be some other old bugs related to font substitution too.

    Replacing the localeName with an empty string (current locale), which uses the "default" fallback font for the current locale, at least makes the rendered text readable, without unexpected glyphs.

    From my current observation (for the above Chinese text), the default UI font for the specified locale is used as the fallback font when localeName is set to an empty string, "zh-cn" or "zh-tw", not the default system font as in GDI or Windows Notepad.
    And it's unknown what fallback font is used when localeName is set to "en-us".

    The localeName is closely related to the setlocale() C function; some links about locale names:
    https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setlocale-wsetlocale

    https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings

    https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html

    I think CreateTextFormat with a specified localeName assumes the text belongs to that locale, and then chooses the fallback font preferred for that locale.

     
  • Zufu Liu

    Zufu Liu - 2018-08-11

    I think maybe two APIs need to be implemented:

    One to set the localeName; the default is empty, and the client can set this according to file encoding, GetACP() code page, GetUserDefaultLocaleName(), etc.

    Another to set a fallback font list for IDWriteFontCollection; the default is empty.

     

    Last edit: Zufu Liu 2018-08-11
  • Neil Hodgson

    Neil Hodgson - 2018-08-12

    In libraries, prefer explicit locale parameters in all cases over ambient locale. This allows use and testing of all features on all installations. It's also better when an application can be using multiple locales at once, for example when writing a script in an English locale that operates on a Chinese language file. With an explicit locale parameter, applications can then define their preferred locale as the ambient locale by calling GetACP or GetUserDefaultLocaleName if they want. The application can also have settings that override the ambient locale.
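
A minimal sketch of that division of labour, assuming the library only ever receives an explicit locale string and the application decides where it comes from. `AmbientLocaleName` and `ChooseLocale` are hypothetical application-side helpers, not part of Scintilla.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Application side: query the ambient (user default) locale name.
// Returns an empty string when it is unavailable, for example on
// non-Windows builds.
inline std::wstring AmbientLocaleName() {
#ifdef _WIN32
    wchar_t name[LOCALE_NAME_MAX_LENGTH] = L"";
    if (GetUserDefaultLocaleName(name, LOCALE_NAME_MAX_LENGTH) > 0) {
        return name;
    }
#endif
    return std::wstring();
}

// Application policy (hypothetical): a non-empty user setting overrides
// the ambient locale; otherwise the ambient locale is used.
inline std::wstring ChooseLocale(const std::wstring &userSetting,
        const std::wstring &ambient) {
    return userSetting.empty() ? ambient : userSetting;
}
```

The application would pass `ChooseLocale(setting, AmbientLocaleName())` to whatever explicit locale parameter the library exposes, so the library itself never consults the ambient locale.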

    Before defining a fallback font list API, it needs to be determined how to set one up and whether this will actually do anything. The font collection may only be used for the named font and not for substitution fonts - I haven't found any good documentation about that parameter's effects.

     
  • Zufu Liu

    Zufu Liu - 2018-08-12

    I only found a little documentation about IDWriteFontCollection at
    https://docs.microsoft.com/en-us/windows/desktop/DirectWrite/introducing-directwrite#accessing-the-font-system
    https://docs.microsoft.com/en-us/windows/desktop/DirectWrite/custom-font-collections


    The IDWriteFontCollection object is a collection of font families. DirectWrite provides access to the set of fonts installed on the system through a special font collection called the system font collection. This is obtained by calling the GetSystemFontCollection method of the IDWriteFactory object. An application can also create a custom font collection from a set of fonts enumerated by an application-defined callback, that is, private fonts installed by an application, or fonts embedded in a document.

    This is rather complicated (a font collection loader, for private fonts or fonts not installed system-wide). It does not seem suitable for our purpose.

     

    Last edit: Zufu Liu 2018-08-12
    • Neil Hodgson

      Neil Hodgson - 2018-08-12

      Yes, that sounds like a method for adding rather than excluding or prioritising fonts so probably not useful.

       
  • Zufu Liu

    Zufu Liu - 2018-08-12

    Hi Neil, will a new API like SetFontQuality and GetFontQuality be added to set/get the locale?

     
  • Neil Hodgson

    Neil Hodgson - 2018-08-12

    The first step with the API is to determine if the locale should be set per-style (like StyleSetFont) or over all styles (like SetFontQuality).

     
  • Zufu Liu

    Zufu Liu - 2018-08-13

    Per-style may be useful: for many programming languages, comments and strings may contain non-ASCII characters. For HTML/XML, the inner text often contains non-ASCII characters.

    Maybe a better approach is to define an overall (current document default) locale like font quality, but allow it to be overridden by individual styles, like HTML/CSS style inheritance.

    A document default locale will simplify usage.
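
The inheritance idea could look roughly like this. It is a sketch under assumptions: `StyleLocales` and its members are hypothetical names, and an empty per-style string is taken to mean "inherit the document default".

```cpp
#include <map>
#include <string>

// Hypothetical container: one document-wide default locale plus optional
// per-style overrides, similar in spirit to CSS inheritance.
struct StyleLocales {
    std::string documentDefault;          // empty means "current locale"
    std::map<int, std::string> perStyle;  // style number -> locale override

    // Return the override for a style if one is set and non-empty,
    // otherwise the document default.
    const std::string &LocaleForStyle(int style) const {
        const auto it = perStyle.find(style);
        return (it != perStyle.end() && !it->second.empty())
            ? it->second : documentDefault;
    }
};
```

A lexer style for comments could then carry "ja-JP" while every other style falls back to the document default.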

     
  • Zufu Liu

    Zufu Liu - 2018-08-13

    A document default locale will simplify usage, though it can be implemented by the client application itself, just like defining a base font.

     
    • Neil Hodgson

      Neil Hodgson - 2018-08-14

      OK, per-style locales could be added later.

       
  • Zufu Liu

    Zufu Liu - 2018-08-14

    StyleSetCharacterSet seems related to this, except characterSet is not used in pTextFormat (characterSet can be converted to a locale name, though not in a flexible way).

    What confuses me more is why the font character set is related to the text code page.

    From SurfaceD2D::SetFont():

        codePageText = codePage;
        if (pfm->characterSet) {
            codePageText = Scintilla::CodePageFromCharSet(pfm->characterSet, codePage);
        }
    

    Documentation about lfCharSet:
    https://docs.microsoft.com/en-us/windows/desktop/api/wingdi/ns-wingdi-taglogfonta

     

    Last edit: Zufu Liu 2018-08-14
    • Neil Hodgson

      Neil Hodgson - 2018-08-15

      In Scintilla, the code page (called dbcsCodePage) is used primarily to group bytes into characters and may only take on one of 7 values associated with DBCS, UTF-8 or 0 for single byte. If the text is in Greek, code page is 0 and character set is 161. This is all caused by the early history before Unicode support on Windows 95.
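
To make the Greek example concrete, here is a small sketch. It assumes the "7 values" are 0, UTF-8 (65001) and the five DBCS code pages 932, 936, 949, 950 and 1361; that list is my reading of the Scintilla documentation, not something stated in this thread. `IsDocumentCodePage` and `TextEncoding` are hypothetical names.

```cpp
// Values redefined locally so the sketch is self-contained.
constexpr int kCodePageUtf8 = 65001;  // SC_CP_UTF8
constexpr int kCharSetGreek = 161;    // SC_CHARSET_GREEK

// Hypothetical check: true for the code page values assumed to be accepted
// (0 for single byte, UTF-8, or one of the DBCS code pages).
constexpr bool IsDocumentCodePage(int codePage) {
    return codePage == 0 || codePage == kCodePageUtf8 ||
        codePage == 932 || codePage == 936 || codePage == 949 ||
        codePage == 950 || codePage == 1361;
}

// Hypothetical pairing of the two settings an application has to supply.
struct TextEncoding {
    int codePage;      // how bytes are grouped into characters
    int characterSet;  // which single-byte repertoire the font should use
};

// Greek single-byte text: code page 0, character set 161.
constexpr TextEncoding kGreekSingleByte{0, kCharSetGreek};
```

Note that the Windows code page for Greek (1253) is not a value the document code page takes in this model; it would only be derived from the character set.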

       
  • Zufu Liu

    Zufu Liu - 2018-08-14

    Please disregard my previous comment; lfCharSet (characterSet) is for glyph mapping within the same font, not for the fallback font.

     
    • Neil Hodgson

      Neil Hodgson - 2018-08-14

      The low-level API parameters for encodings and locale overlap and have sometimes subtly different meanings. Scintilla has always exposed both code page and character set although that complicates development by application authors who would often like a simpler single 'encoding' setting. However, with the UTF-8 code page working for all languages, there are language subtleties like better Korean character choices that can be achieved by specifying character set.

      Character set could be used as an approximate indication of locale intent - it does at least allow choosing between the 4 main CJK environments. However, I suspect it will later be found to be inadequate and an additional locale argument will be worth implementing.
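
As a rough illustration of using character set as a locale hint, the mapping for the four CJK environments might look like the following. The function name `LocaleNameFromCharacterSet` and the chosen locale strings are assumptions; the numeric values mirror Scintilla's SC_CHARSET_* constants and are redefined here so the sketch stands alone.

```cpp
#include <string>

constexpr int kCharSetShiftJis = 128;     // SC_CHARSET_SHIFTJIS
constexpr int kCharSetHangul = 129;       // SC_CHARSET_HANGUL
constexpr int kCharSetGb2312 = 134;       // SC_CHARSET_GB2312
constexpr int kCharSetChineseBig5 = 136;  // SC_CHARSET_CHINESEBIG5

// Hypothetical mapping from a character set to a locale name suitable for
// the localeName argument of CreateTextFormat. Anything that is not one of
// the four CJK sets maps to the empty string (current locale).
inline std::wstring LocaleNameFromCharacterSet(int characterSet) {
    switch (characterSet) {
    case kCharSetShiftJis:
        return L"ja-JP";
    case kCharSetHangul:
        return L"ko-KR";
    case kCharSetGb2312:
        return L"zh-CN";
    case kCharSetChineseBig5:
        return L"zh-TW";
    default:
        return L"";
    }
}
```

The mapping is deliberately coarse, which is the limitation noted above: it cannot express, say, Vietnamese or a regional variant, so a separate locale argument would still be needed.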

       
  • Zufu Liu

    Zufu Liu - 2018-08-15

    Indeed, a locale argument is required.

    Most programming (monospace) fonts only support Latin-1; on my system the Windows font chooser dialog doesn't show CJKV in the Script/Region ComboBox for these fonts, and only the first and default "Western European" is useful for me.

     
    • Neil Hodgson

      Neil Hodgson - 2018-08-15

      Despite choosing a Latin font or a European locale, you should still see Asian characters as well as emoji due to font substitution. If only the named font was used then software that draws text would have to perform quite complex queries into font coverage then use that to segment the text and draw each piece in a font that includes all the characters in that piece.
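
For a sense of what that manual segmentation would involve, here is a simplified sketch. It only splits text into runs that the named font does or does not cover; the coverage query is passed in as a callback because the real query is platform specific. `FontRun` and `SegmentByCoverage` are hypothetical names, and choosing a font for each uncovered run is left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One contiguous piece of text that is either fully covered by the named
// font or fully outside it.
struct FontRun {
    std::size_t start;
    std::size_t length;
    bool inNamedFont;
};

// Split text into maximal runs according to a coverage predicate.
inline std::vector<FontRun> SegmentByCoverage(const std::u32string &text,
        const std::function<bool(char32_t)> &namedFontCovers) {
    std::vector<FontRun> runs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool covered = namedFontCovers(text[i]);
        if (runs.empty() || runs.back().inNamedFont != covered) {
            runs.push_back(FontRun{i, 1, covered});  // start a new run
        } else {
            ++runs.back().length;                    // extend current run
        }
    }
    return runs;
}
```

With a predicate that accepts only ASCII, a string of two Latin letters, two Han characters and one more Latin letter yields three runs, and only the middle run would need a substitute font.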

       
  • Zufu Liu

    Zufu Liu - 2019-03-15

    Please test characters in the "Examples of language-dependent glyphs" section from https://en.wikipedia.org/wiki/Han_unification with different localeName values ("zh-Hans", "zh-Hant", "ko", "vi") for CreateTextFormat().

    The title is not suitable; maybe something like "With D2D some Chinese characters (Han ideographs) may be rendered with glyphs that do not match the user's locale" is better.

     
    • Neil Hodgson

      Neil Hodgson - 2019-03-15

      When the localeNames are used, the glyphs change but the results are different from the Wikipedia page. For example, for 76F4 with "zh-Hant", I'm seeing a similar image to "zh-Hans", not the image in the table. Does it require a particular font name choice?

       
      • Zufu Liu

        Zufu Liu - 2019-03-16

        It works for me, except for Vietnamese, where the browser shows the Chinese simplified form, and hard-coded "vi", "vi-vi", "vi-nom" show the Japanese form, even with the Vietnamese supplemental fonts installed.

        From the Wiki, the last sentence before the table:

        This only works for fallback glyph selection if you have CJK fonts installed on your system and the font selected to display this article does not include glyphs for these characters.

         
  • Neil Hodgson

    Neil Hodgson - 2019-03-16

    Here's a rough implementation.

     