
#2849 wxt terminal fails to display some Japanese (EUC-JP) string

open
nobody
None
2026-01-28
2026-01-06
No

I use the EUC-JP Japanese encoding (not Shift_JIS) on my computer. By default, the current wxt terminal displays almost all Japanese characters correctly, except for some EUC-JP strings.

For example, take "2025\xc7\xaf", where "\xc7\xaf" is the Japanese character for "year" in EUC-JP encoding. The wxt terminal cannot display this string correctly, because it may be recognized as the UTF-8 string "2025" + U+01EF.

This is done by gp_cairo_convert() in src/wxterminal/gp_cairo.c:

    if (g_utf8_validate(string, -1, NULL)) {
        string_utf8 = g_strdup(string);
    } else {
        charset = gp_cairo_get_encoding(plot);
        string_utf8 = g_convert(string, -1, "UTF-8", charset, 
          &bytes_read, NULL, &error);
    }

The function g_utf8_validate() may recognize the string "2025\xc7\xaf" as a UTF-8 encoded string. So I would like an option to skip the g_utf8_validate() check. I am not asking for a new encoding name for EUC-JP; that would not solve the problem.
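The ambiguity can be reproduced without glib. The following is a minimal sketch of a structural UTF-8 check; `sketch_is_valid_utf8` is a hypothetical helper written for illustration and is not glib's implementation of g_utf8_validate(), although it should agree with it on the example bytes from this report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal structural UTF-8 check (a sketch, not glib's g_utf8_validate):
 * accepts 1- to 4-byte sequences and rejects overlong forms, surrogates,
 * and code points above U+10FFFF. */
static bool sketch_is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point and its minimum value */

        if (c < 0x80)                { need = 0; cp = c;        min = 0; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return false;       /* sequence is truncated */

        for (size_t k = 1; k <= need; k++) {
            unsigned char t = s[i + k];
            if ((t & 0xC0) != 0x80)
                return false;   /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += need + 1;
    }
    return true;
}
```

The six bytes of "2025\xc7\xaf" pass this check, since 0xC7 0xAF happens to be a well-formed two-byte UTF-8 sequence. An EUC-JP pair such as 0xC6 0xFC fails it, because 0xFC is not a continuation byte, which is why most EUC-JP text still reaches the g_convert() branch.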

Although the code above tries UTF-8 first, this may not be consistent with the "set encoding" mechanism. Whenever g_utf8_validate() accepts the string as UTF-8, the code above always treats it as UTF-8, even if a different encoding was set by "set encoding".

I suggest skipping the g_utf8_validate() check when "set encoding ..." has been used.
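As a rough sketch of that suggestion, the decision could be isolated in a small predicate. The enum and function names below are hypothetical stand-ins, not gnuplot's actual identifiers, and the sketch assumes that "no explicit encoding" and "utf8" are the only cases in which guessing UTF-8 from the bytes is acceptable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for gnuplot's internal encoding identifiers;
 * the real names and values in gnuplot's sources may differ. */
enum sketch_encoding {
    SKETCH_ENC_DEFAULT,  /* no explicit "set encoding" in effect */
    SKETCH_ENC_UTF8,     /* "set encoding utf8" */
    SKETCH_ENC_OTHER     /* any explicitly selected non-UTF-8 encoding */
};

/* Returns true when it is reasonable to guess UTF-8 from the bytes alone.
 * With an explicit non-UTF-8 encoding the guess is skipped, so the string
 * would always go through the conversion branch. */
static bool sketch_may_guess_utf8(enum sketch_encoding enc)
{
    return enc == SKETCH_ENC_DEFAULT || enc == SKETCH_ENC_UTF8;
}

/* Intended use inside the snippet quoted above (not compiled here):
 *
 *     if (sketch_may_guess_utf8(enc) && g_utf8_validate(string, -1, NULL))
 *         string_utf8 = g_strdup(string);
 *     else
 *         ... g_convert(...) as before ...
 */
```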

Discussion

  • Ethan Merritt

    Ethan Merritt - 2026-01-06

    You are correct that g_utf8_validate() should not be called if the current character encoding is known to be something other than utf8. I will need your assistance to figure out how to test for this correctly so that it handles your situation.

    Short answer

    I have added an internal value S_ENC_EUCJP and attempted to modify the code for set encoding locale so that it detects EUC-JP as the current encoding. The change works under linux but I cannot test under Windows. The git commit is d2ee58da8b6. The test case I used is attached below.

    Please test on your machine three times,
    once with no set encoding command,
    once with set encoding locale, and
    once with set encoding default.

    Long answer

    If I am reading the code correctly, gnuplot's function gp_cairo_get_encoding(plot) is intended to report the current character encoding as a string that can be passed to g_convert(). It first looks to see if the encoding has been set by the gnuplot command "set encoding". If not, it calls the glib function g_get_charset(). So there are two cases that we need to handle.

    (1) There is no command set encoding euc-jp, and there is no internal definition S_ENC_EUCJP. There is a command set encoding locale, but I do not know what happens when it is used on a machine where the default locale is EUC-JP. I think on a linux machine the command does not work, because there is no internal value S_ENC_EUCJP to report the result. I can fix this.

    I will need help with Windows, however. It looks like the code tries to translate a Windows codepage into a gnuplot encoding, but again this will fail because there is no value S_ENC_EUCJP to report the result. The information I found is that codepage 20932 indicates EUC-JP. I can add a test for this, but I will need help to test whether it works.

    (2) The glib function g_get_charset(const char **charset) returns TRUE if the current locale is UTF-8 and FALSE otherwise. However on Windows the documentation says that it tests the "system default ANSI code-page" instead of testing the current C locale. I do not know whether that is correct for your situation. It also returns a string to pass to g_convert(), but again I do not know whether the string is correct for your situation.
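For the Windows side of case (1), the codepage lookup might be sketched as a small table. This is only an illustration: `sketch_codepage_to_charset` is a hypothetical helper, only codepage 20932 comes from the discussion above, and the entries for 932 and 65001 are added here as well-known Windows codepage numbers. The returned strings are iconv-style names of the kind g_convert() accepts.

```c
#include <stddef.h>

/* Maps a few Windows code page numbers to iconv-style charset names.
 * Returns NULL for code pages this sketch does not know about. */
static const char *sketch_codepage_to_charset(unsigned int codepage)
{
    switch (codepage) {
    case 20932: return "EUC-JP";  /* mentioned in the discussion above */
    case 932:   return "CP932";   /* Windows variant of Shift_JIS (added here) */
    case 65001: return "UTF-8";   /* added here */
    default:    return NULL;      /* unknown: caller must fall back */
    }
}
```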

     

    Last edit: Ethan Merritt 2026-01-06
  • Shigeharu TAKENO

    Thanks for your quick reply and for adding S_ENC_EUCJP.

    I tested modified code for the script

     set term wxt
     set encoding 
     # set encoding locale
     # set encode default
     set title "2025[\xc7\xaf]"
     plot x
    

    None of these cases displays the Japanese title correctly, and in the "set encoding locale" case gnuplot says

     warning: Error converting locale "ja_JP.eucJP" to codepage number
    

    Only in the case

     set encoding "EUC_JP"
    

    gnuplot displays the Japanese string correctly, because the name "EUC_JP" is added in encoding_names[] of term.c.

    Unfortunately, the locale name for a Japanese encoding depends on the operating system.
    FreeBSD:
    locale name of EUC-JP = ja_JP.eucJP (old FreeBSD: ja_JP.EUC)
    locale name of Shift_JIS = ja_JP.SJIS
    Solaris:
    locale name of EUC-JP = ja_JP.eucJP, or ja (default)
    locale name of Shift_JIS: ja_JP.PCK

    setlocale() returns these names, so we may need to modify

    if (l && (strstr(l, "EUC-JP") || strstr(l, "euc-jp")))
    

    in src/encoding.c to something like

    if (l && (strstr(l, "EUC-JP") || strstr(l, "euc-jp") || strstr(l, "eucJP") || strcmp(l, "ja") == 0))
    

    An entry for EUC-JP may also be needed in set_encoding_tbl[] in term.c.
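The locale-name test above could be factored into a helper along these lines. This is a sketch only: `sketch_locale_is_eucjp` is a hypothetical name, the spellings are the ones listed in this post, and treating a bare "ja" as EUC-JP is an assumption that holds for the Solaris default described above but not necessarily elsewhere. The old FreeBSD spelling ja_JP.EUC is deliberately left out, as in the proposed condition.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the locale name reported by setlocale() looks like an
 * EUC-JP locale, using the spellings listed in the discussion. */
static bool sketch_locale_is_eucjp(const char *l)
{
    static const char *const markers[] = { "EUC-JP", "euc-jp", "eucJP" };

    if (l == NULL)
        return false;

    for (size_t i = 0; i < sizeof markers / sizeof markers[0]; i++) {
        if (strstr(l, markers[i]) != NULL)
            return true;
    }

    /* Assumption: a bare "ja" means EUC-JP (the Solaris default). */
    return strcmp(l, "ja") == 0;
}
```

Note that strcmp() returns 0 on a match, so the comparison with "ja" has to be written as `== 0`.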

    On MS-Windows, codepage 20932 is indeed EUC-JP, but most Japanese Windows users do not use EUC-JP. EUC-JP is mainly for Unix users; Japanese Windows users use only Shift_JIS or UTF-8.

    I reported that some EUC-JP strings may be recognized as UTF-8 strings. The same problem may occur for some Shift_JIS strings, because Shift_JIS is also an 8-bit, 2-byte encoding. For example, the byte sequence

    0xE5 0x81 0x8F 0xE5 0x81 0x9c
    

    is the 2 Japanese characters U+504F U+505C as a UTF-8 string, and the 3 characters [E581][8FE5][819C] as a Shift_JIS string. So g_convert() should be used for the case S_ENC_SJIS too.
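The two readings of that byte sequence can be checked with a pair of counting functions. This is an illustrative sketch with hypothetical names; the Shift_JIS lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are the commonly documented ones, and trail bytes are not validated.

```c
#include <stddef.h>

/* Counts characters in valid UTF-8 by counting every byte that is not a
 * continuation byte (10xxxxxx). */
static size_t sketch_count_utf8(const unsigned char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Counts characters under a Shift_JIS reading: a lead byte in 0x81-0x9F
 * or 0xE0-0xFC consumes the following byte as well. */
static size_t sketch_count_sjis(const unsigned char *s, size_t len)
{
    size_t n = 0, i = 0;

    while (i < len) {
        unsigned char c = s[i];
        int lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

        i += (lead && i + 1 < len) ? 2 : 1;
        n++;
    }
    return n;
}
```

For the six bytes 0xE5 0x81 0x8F 0xE5 0x81 0x9C, the first function counts 2 characters and the second counts 3, matching the description above.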

     
  • Ethan Merritt

    Ethan Merritt - 2026-01-07

    I think I see an additional problem.
    How should we handle Unicode escape sequences?
    For example to create a label containing a Greek letter pi: 2π * x
    it should be possible to use an escape sequence set label "2\U+03C0 * x".

    The program knows how to convert this to a UTF-8 character sequence that would be accepted by the cairo terminals. But with this change in place the call to iconv would try to convert this byte sequence as if it were EUC-JP and give the error
    Unable to convert "2π": the sequence is invalid in the current charset (EUC-JP), falling back to iso_8859_1

    Is there a standard solution for printing Greek characters or other general symbols in an EUC-JP environment?

     
  • Shigeharu TAKENO

    The current gnuplot seems to permit the \U+xxxx sequence only for S_ENC_UTF8. I agree that we cannot use \U+xxxx for S_ENC_EUCJP. Because the EUC-JP table includes the Greek alphabet and some simple symbols, we can enter them directly. A user who wants a special character that is not in the EUC-JP table can use "set encoding utf8" and a UTF-8 script.

    I also found the function strlen_sjis() for gp_strlen() in encoding.c, so a function strlen_eucjp() may be needed for S_ENC_EUCJP. Standard EUC-JP is an 8-bit, 2-byte character code: the first byte of an EUC-JP character is 0x8e or in 0xa1-0xfe, and the second byte is in 0xa1-0xfe. Although an extended 3-byte EUC-JP code exists, it is not implemented in most Japanese environments, so we probably do not need to handle it.
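A possible shape for such a function, following the byte ranges given above, is sketched here. It is not gnuplot's code: the name `sketch_strlen_eucjp` is hypothetical, the 3-byte extension is ignored as suggested, and a lead byte without a valid second byte is simply counted as one character.

```c
#include <stddef.h>

/* Counts characters in a NUL-terminated string under a two-byte EUC-JP
 * reading: a first byte of 0x8E or 0xA1-0xFE followed by a second byte in
 * 0xA1-0xFE forms one character; every other byte counts on its own. */
static size_t sketch_strlen_eucjp(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 0;

    while (*s) {
        int lead = (*s == 0x8E) || (*s >= 0xA1 && *s <= 0xFE);
        /* s[1] is safe to read: at worst it is the terminating NUL. */
        int pair = lead && (s[1] >= 0xA1 && s[1] <= 0xFE);

        s += pair ? 2 : 1;
        n++;
    }
    return n;
}
```

With this sketch, the example string "2025\xc7\xaf" has length 5: four ASCII digits plus one two-byte character.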

     
    • Ethan Merritt

      Ethan Merritt - 2026-01-08

      I was thinking about it the wrong way. The cairo terminals already convert all text to UTF-8. Either it was recognized as UTF-8 or it was run through g_convert to make it UTF-8. So at that point it is OK to expand the Unicode escape sequences \U+acbd into more UTF-8. I made that change, and it seems to work. Your EUC-JP strings can now include Unicode escape sequences if the locale is recognized, but only for the cairo terminals. I will look to see if any other terminals can do something similar.
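Expanding a \U+xxxx escape after the text is already UTF-8 comes down to encoding one code point as UTF-8 bytes. The sketch below shows that step in isolation; `sketch_ucs_to_utf8` is a hypothetical helper and not the function gnuplot actually uses.

```c
#include <stddef.h>

/* Encodes the Unicode code point cp as UTF-8 into out, which must have
 * room for at least 4 bytes. Returns the number of bytes written, or 0
 * if cp is a surrogate or lies above U+10FFFF. */
static size_t sketch_ucs_to_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the Greek pi example, code point 0x03C0 becomes the two bytes 0xCF 0x80.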

      I will also look into strlen_eucjp(). Do you know if there is an existing implementation somewhere?

       
  • Shigeharu TAKENO

    I tested current git version. Then the script all.dem stopped at
    the command "test palette" in pm3d.dem.

    gnuplot> set encoding EUC_JP
                                ^
    line 68: unrecognized encoding specification; see 'help encoding'.
    

    I tested on wxt terminal, then the same message appeared for the case "set encoding locale":

    gnuplot> set encoding default
    gnuplot> test palette          # no problem
    gnuplot> set encoding locale  
    gnuplot> test palette

    gnuplot> set encoding EUC_JP
                                ^
    line 68: unrecognized encoding specification; see 'help encoding'.

    gnuplot>
    

    But the images generated by the above commands are correct.
    The "test" command also fails for "set encoding locale".

    gnuplot> set encoding default
    gnuplot> test
    gnuplot> set encoding locale  
    gnuplot> test               
    Unable to convert "utf8: ": the sequence is invalid in the current 
    charset (EUC-JP), falling back to iso_8859_1
    gnuplot>
    

    The image generated by the last command is incorrect (attached file).
    However, it may not be a bug, because the "test" command outputs a UTF-8 string
    without changing the encoding.

     
  • Shigeharu TAKENO

    The EUC-JP string "2025\xc7\xaf" is now printed correctly with set encoding locale, with set encoding "EUC_JP", and without any "set encoding" command.

    A string combining EUC-JP characters and "\U+xxxx" is also printed correctly in the same situations. Thank you.

     
