gnuplot / Bugs / #2849 wxt terminal fails to display some Japanese (EUC-JP) string

Ethan Merritt - 2026-01-06

You are correct that g_utf8_validate() should not be called if the current character encoding is known to be something other than utf8. I will need your assistance to figure out how to test for this correctly so that it handles your situation.

Short answer

I have added an internal value S_ENC_EUCJP and attempted to modify the code for set encoding locale so that it detects EUC-JP as the current encoding. The change works under linux but I cannot test under Windows. The git commit is d2ee58da8b6. The test case I used is attached below.

Please test on your machine three times,
once with no set encoding command,
once with set encoding locale, and
once with set encoding default.

Long answer

If I am reading the code correctly, gnuplot's function gp_cairo_get_encoding(plot) is intended to report the current character encoding as a string that can be passed to g_convert(). It first looks to see if the encoding has been set by the gnuplot command "set encoding". If not, it calls the glib function g_get_charset(). So there are two cases that we need to handle.

(1) There is no command set encoding euc-jp, and there is no internal definition S_ENC_EUCJP. There is a command set encoding locale, but I do not know what happens when it is used on a machine where the default locale is EUC-JP. I think on a linux machine the command does not work, because there is no internal value S_ENC_EUCJP to report the result. I can fix this.

I will need help with Windows, however. It looks like the code tries to translate a Windows codepage into a gnuplot encoding, but again this will fail because there is no value S_ENC_EUCJP to report the result. The information I found is that codepage 20932 indicates EUC-JP. I can add a test for this, but I will need help to test whether it works.

(2) The glib function g_get_charset(const char **charset) returns TRUE if the current locale is UTF-8 and FALSE otherwise. However on Windows the documentation says that it tests the "system default ANSI code-page" instead of testing the current C locale. I do not know whether that is correct for your situation. It also returns a string to pass to g_convert(), but again I do not know whether the string is correct for your situation.
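
The two cases above can be sketched in plain C. This is only an illustration of the lookup order, not gnuplot's code: the enum, the function names, and the charset strings are hypothetical stand-ins, and the locale charset that gnuplot would obtain from g_get_charset() is passed in as a parameter so that the sketch needs no glib. The codepage numbers are the usual Windows values (65001 for UTF-8, 932 for Shift_JIS, 20932 for EUC-JP).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gnuplot's encoding enum (only a few members). */
enum sketch_encoding { SK_ENC_DEFAULT, SK_ENC_UTF8, SK_ENC_SJIS, SK_ENC_EUCJP };

/* Windows side of case (1): translate a codepage number into an encoding. */
enum sketch_encoding sketch_encoding_from_codepage(unsigned int codepage)
{
    switch (codepage) {
    case 65001: return SK_ENC_UTF8;
    case 932:   return SK_ENC_SJIS;
    case 20932: return SK_ENC_EUCJP;
    default:    return SK_ENC_DEFAULT;   /* unknown: leave undecided */
    }
}

/* Pick the charset name to hand to the converter.
 * Case (1): an explicit encoding is known, so its name wins.
 * Case (2): nothing was set, so fall back to the name reported for the
 *           locale (passed in here instead of calling g_get_charset()). */
const char *sketch_charset_name(enum sketch_encoding enc,
                                const char *locale_charset)
{
    assert(locale_charset != NULL);
    switch (enc) {
    case SK_ENC_UTF8:  return "UTF-8";
    case SK_ENC_SJIS:  return "SHIFT_JIS";
    case SK_ENC_EUCJP: return "EUC-JP";
    default:           return locale_charset;
    }
}

/* Text must be converted, not merely validated as UTF-8, whenever the
 * charset is known to be something other than UTF-8. */
int sketch_needs_conversion(const char *charset)
{
    assert(charset != NULL);
    return strcmp(charset, "UTF-8") != 0;
}
```

With these stand-ins, an explicit EUC-JP setting overrides whatever the locale reports, and a locale that reports EUC-JP is used only when no encoding was set.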

Last edit: Ethan Merritt 2026-01-06

Bug_2849.gp


Shigeharu TAKENO - 2026-01-07

Thanks for your quick reply and for adding S_ENC_EUCJP.

I tested the modified code with the script

set term wxt
set encoding
# set encoding locale
# set encoding default
set title "2025[\xc7\xaf]"
plot x

None of the variants displays the Japanese title correctly, and in the case of "set encoding locale" gnuplot says

warning: Error converting locale "ja_JP.eucJP" to codepage number

Only in the case

set encoding "EUC_JP"

gnuplot displays the Japanese string correctly, because the name "EUC_JP" is added in encoding_names[] of term.c.

Unfortunately, the locale name for a Japanese encoding depends on the operating system.
FreeBSD:
locale name of EUC-JP = ja_JP.eucJP (old FreeBSD: ja_JP.EUC)
locale name of Shift_JIS = ja_JP.SJIS
Solaris:
locale name of EUC-JP = ja_JP.eucJP, or ja (default)
locale name of Shift_JIS = ja_JP.PCK

setlocale() returns these names, so we may need to modify

if (l && (strstr(l, "EUC-JP") || strstr(l, "euc-jp")))

in src/encoding.c to, for example,

if (l && (strstr(l, "EUC-JP") || strstr(l, "euc-jp") || strstr(l, "eucJP") || strcmp(l, "ja") == 0))

An entry for EUC-JP may also be needed in set_encoding_tbl[] in term.c.
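
As an alternative to listing every spelling with strstr(), the locale check could normalize the codeset part of the name first. The following is a sketch only: the function name is hypothetical, it is not the code in src/encoding.c, and treating a bare "EUC" codeset as EUC-JP only for a "ja" locale is an assumption made here to cover the old FreeBSD name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if a setlocale()-style name denotes EUC-JP. */
int sketch_locale_is_eucjp(const char *l)
{
    char codeset[32];
    size_t n = 0;
    const char *p;

    assert(l != NULL);

    /* Solaris default Japanese locale, according to the report above. */
    if (strcmp(l, "ja") == 0)
        return 1;

    p = strchr(l, '.');
    if (p == NULL)
        return 0;               /* no codeset part, e.g. "C" or "ja_JP" */

    /* Copy the codeset in lowercase without '-' or '_',
     * stopping at an "@modifier" if one is present. */
    for (p++; *p != '\0' && *p != '@' && n < sizeof(codeset) - 1; p++) {
        if (*p != '-' && *p != '_')
            codeset[n++] = (char)tolower((unsigned char)*p);
    }
    codeset[n] = '\0';

    if (strcmp(codeset, "eucjp") == 0)
        return 1;               /* eucJP, EUC-JP, euc-jp, EUC_JP */

    /* Old FreeBSD "ja_JP.EUC": accept bare "EUC" only for a "ja" locale. */
    return strcmp(codeset, "euc") == 0 && strncmp(l, "ja", 2) == 0;
}
```

This accepts ja_JP.eucJP, ja_JP.EUC and ja, and rejects ja_JP.SJIS, ja_JP.PCK and ja_JP.UTF-8.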

On MS-Windows, codepage 20932 is certainly EUC-JP, but most Japanese Windows users do not use EUC-JP. EUC-JP is mainly for Unix users. Japanese Windows users use only Shift_JIS or UTF-8.

I reported that some EUC-JP strings may be recognized as UTF-8 strings. The same problem may occur for some Shift_JIS strings, because Shift_JIS is also an 8-bit, 2-byte encoding. For example, the byte sequence

0xE5 0x81 0x8F 0xE5 0x81 0x9C

is two Japanese characters, U+504F U+505C, as a UTF-8 string, and three characters, [E581][8FE5][819C], as a Shift_JIS string. So g_convert() should be used for the case S_ENC_SJIS too.
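
The ambiguity can be checked with a few lines of C. This is a simplified sketch with hypothetical function names: the UTF-8 test looks only at lead and continuation byte patterns (it is not g_utf8_validate() and skips overlong and surrogate checks), and the Shift_JIS counter treats 0x81-0x9F and 0xE0-0xFC as lead bytes of a two-byte character.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if buf[0..len) has the byte structure of UTF-8 (simplified). */
int sketch_looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char c = buf[i++];
        size_t extra;

        if (c < 0x80)                extra = 0;   /* ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;   /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0) extra = 2;   /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0) extra = 3;   /* 4-byte sequence */
        else return 0;                            /* stray continuation etc. */

        for (; extra > 0; extra--) {
            if (i >= len || (buf[i] & 0xC0) != 0x80)
                return 0;                         /* missing continuation */
            i++;
        }
    }
    return 1;
}

/* Count characters under the assumption that buf is Shift_JIS. */
size_t sketch_count_sjis(const unsigned char *buf, size_t len)
{
    size_t i = 0, count = 0;

    assert(buf != NULL);
    while (i < len) {
        unsigned char c = buf[i];
        int is_lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

        /* A lead byte consumes its trail byte if one is available. */
        i += (is_lead && i + 1 < len) ? 2 : 1;
        count++;
    }
    return count;
}
```

The six bytes above pass the UTF-8 structure test and also count as three Shift_JIS characters. The two bytes 0xC7 0xAF from the test script's title pass the UTF-8 structure test as well, so a validity check alone cannot tell these encodings apart.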

Ethan Merritt - 2026-01-07

I think I see an additional problem.
How should we handle Unicode escape sequences?
For example to create a label containing a Greek letter pi: 2π * x
it should be possible to use an escape sequence set label "2\U+03C0 * x".

The program knows how to convert this to a UTF-8 character sequence that would be accepted by the cairo terminals. But with this change in place the call to iconv would try to convert this byte sequence as if it were EUC-JP and give the error
Unable to convert "2π": the sequence is invalid in the current charset (EUC-JP), falling back to iso_8859_1

Is there a standard solution to printing Greek characters or other general symbols in an EUC-JP environment?


Shigeharu TAKENO - 2026-01-08

The current gnuplot seems to permit the \U+xxxx sequence only for S_ENC_UTF8. I agree that we cannot use \U+xxxx for S_ENC_EUCJP. Because the EUC-JP table includes the Greek alphabet and some simple symbols, we can enter them directly. A user who wants a special character that is not in the EUC-JP table can use "set encoding utf8" and a UTF-8 script.

I also found the function strlen_sjis() used by gp_strlen() in encoding.c, so a function strlen_eucjp() may be needed for S_ENC_EUCJP. Standard EUC-JP is an 8-bit, 2-byte character code: the first byte of an EUC-JP character is 0x8e or in the range 0xa1-0xfe, and the second byte is in the range 0xa1-0xfe. Although an extended 3-byte EUC-JP code exists, it is not implemented in most Japanese environments, so we probably do not need to handle it.
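
A character counter following only the two-byte rule described above might look like this. It is a hypothetical sketch, not gnuplot's strlen_sjis() or any existing implementation: the name is made up, the 3-byte extension is deliberately ignored, and the trail byte is not range-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Count characters in a NUL-terminated EUC-JP string.
 * Lead byte: 0x8E or 0xA1-0xFE, followed by one trail byte.
 * Every other byte is counted as a single-byte character. */
size_t sketch_strlen_eucjp(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    assert(s != NULL);
    while (*p != '\0') {
        int is_lead = (*p == 0x8E) || (*p >= 0xA1 && *p <= 0xFE);

        /* Consume the trail byte only if it exists, so a truncated
         * string cannot make the loop step past the terminator. */
        if (is_lead && p[1] != '\0')
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For the title string from the test script, "2025\xc7\xaf", this counts five characters: four ASCII digits and one two-byte character.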

Ethan Merritt - 2026-01-08

I was thinking about it the wrong way. The cairo terminals already convert all text to UTF-8. Either it was recognized as UTF-8 or it was run through g_convert to make it UTF-8. So at that point it is OK to expand the Unicode escape sequences \U+acbd into more UTF-8. I made that change, and it seems to work. Your EUC-JP strings can now include Unicode escape sequences if the locale is recognized, but only for the cairo terminals. I will look to see if any other terminals can do something similar.
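
The expansion step can be illustrated with a small stand-alone routine. This is a sketch and not the committed change: the function names are hypothetical, the escape is assumed to be \U+ followed by 4 or 5 hex digits, and the input is assumed to be UTF-8 already (for EUC-JP text, that means after the g_convert step).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Encode one code point as UTF-8. Returns bytes written (1-4), or 0 if
 * the value is out of range. Surrogates are not rejected (simplified). */
size_t sketch_utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Copy src to dst, replacing each \U+hhhh escape by its UTF-8 bytes.
 * dst needs strlen(src) + 1 bytes: an escape is at least 7 characters
 * long and expands to at most 4 bytes, so the output never grows. */
void sketch_expand_unicode_escapes(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'U' && src[2] == '+') {
            unsigned long cp = 0;
            int digits = 0;

            while (digits < 5 && isxdigit((unsigned char)src[3 + digits])) {
                int ch = tolower((unsigned char)src[3 + digits]);
                cp = cp * 16 + (unsigned long)(isdigit(ch) ? ch - '0'
                                                           : ch - 'a' + 10);
                digits++;
            }
            if (digits >= 4) {
                size_t n = sketch_utf8_encode(cp, dst);
                if (n > 0) {
                    dst += n;
                    src += 3 + digits;
                    continue;
                }
            }
        }
        *dst++ = *src++;        /* ordinary byte, or a malformed escape */
    }
    *dst = '\0';
}
```

Applied to the label text 2\U+03C0 * x, this yields the digit 2, the two bytes 0xCF 0x80 (the UTF-8 form of U+03C0), and the unchanged remainder.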
  
I will also look into strlen_eucjp(). Do you know whether there is an existing implementation somewhere?
  

Shigeharu TAKENO - 2026-01-28

I tested the current git version. The script all.dem stopped at
the command "test palette" in pm3d.dem.

gnuplot> set encoding EUC_JP
^
line 68: unrecognized encoding specification; see 'help encoding'.

I then tested on the wxt terminal, and the same message appeared in the case of "set encoding locale":

gnuplot> set encoding default
gnuplot> test palette # no problem
gnuplot> set encoding locale
gnuplot> test palette

gnuplot> set encoding EUC_JP
^
line 68: unrecognized encoding specification; see 'help encoding'.

gnuplot>

But the images generated by the above commands are correct.
The "test" command also fails for "set encoding locale".

gnuplot> set encoding default
gnuplot> test
gnuplot> set encoding locale
gnuplot> test
Unable to convert "utf8: ": the sequence is invalid in the current charset (EUC-JP), falling back to iso_8859_1
gnuplot>

The image generated by the last command is incorrect (attached file).
However, it may not be a bug, because the "test" command outputs a UTF-8 string
without changing the encoding.

test-wxt-eucjp-1.png

Shigeharu TAKENO - 2026-01-28

The EUC-JP string "2025\xc7\xaf" is printed correctly with set encoding locale, with set encoding "EUC_JP", and without any "set encoding" command.

A string that combines EUC-JP characters and "\U+xxxx" is also printed correctly in the same three situations. Thank you.

