I use the EUC-JP Japanese encoding (not Shift_JIS) on my computer. By default the current wxt terminal displays almost all Japanese characters correctly, but some EUC-JP strings are exceptions.
For example, "2025\xc7\xaf", where "\xc7\xaf" is the Japanese character for "year" in EUC-JP encoding. The wxt terminal cannot display this string correctly, because its bytes can also be read as the UTF-8 string "2025" + <U+01EF>.
This is done by gp_cairo_convert() in src/wxterminal/gp_cairo.c:
    if (g_utf8_validate(string, -1, NULL)) {
        string_utf8 = g_strdup(string);
    } else {
        charset = gp_cairo_get_encoding(plot);
        string_utf8 = g_convert(string, -1, "UTF-8", charset,
                                &bytes_read, NULL, &error);
    }
The function g_utf8_validate() can accept the string "2025\xc7\xaf" as a valid UTF-8 string. So I would like an option to skip the g_utf8_validate() check. I do not want a new encoding name for EUC-JP; that alone would not solve the problem.
The code above tries UTF-8 first, which does not fit the mechanism of "set encoding": whenever g_utf8_validate() accepts a string, the code treats it as UTF-8, even if another encoding was set by "set encoding".
I suggest skipping the g_utf8_validate() check when "set encoding ..." has been used.
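To make the ambiguity concrete, here is a minimal sketch of a UTF-8 well-formedness check. The function looks_like_utf8() is a hypothetical, simplified stand-in for glib's g_utf8_validate(), not its actual implementation. It only illustrates why the bytes C7 AF pass the check while a typical EUC-JP string does not.

```c
#include <stddef.h>

/* Return 1 if the n bytes at s form well-formed UTF-8, else 0.
 * Hypothetical stand-in for g_utf8_validate(); it rejects overlong
 * forms, surrogates, and values above U+10FFFF. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra, k;
        unsigned long cp, min;

        if (c < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or invalid lead byte */
        }

        if (n - i <= extra)
            return 0;                   /* sequence is truncated */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}
```

With this check, the six bytes of "2025\xc7\xaf" are accepted (C7 is a valid 2-byte lead and AF a valid continuation byte), whereas the EUC-JP bytes C6 FC CB DC are rejected because FC is not a continuation byte. That is why only some EUC-JP strings are affected.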
You are correct that g_utf8_validate() should not be called if the current character encoding is known to be something other than utf8. I will need your assistance to figure out how to test for this correctly so that it handles your situation.
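The decision being discussed could be sketched roughly as follows. Everything here is hypothetical: the enum values, the function name, and the charset strings "EUC-JP" and "SHIFT_JIS" are assumptions for illustration and are not gnuplot's actual identifiers. The point is only that UTF-8 validation is consulted when no encoding has been set explicitly.

```c
#include <stddef.h>

/* Hypothetical mirror of a few encoding states; names are illustrative only. */
enum sketch_encoding { SK_ENC_DEFAULT, SK_ENC_UTF8, SK_ENC_EUCJP, SK_ENC_SJIS };

/* Return the charset to convert FROM, or NULL when the string should be
 * used as UTF-8 without conversion.
 * validates_as_utf8: result of a UTF-8 validity check on the string.
 * locale_charset:    charset reported for the current locale (caller-supplied). */
const char *sketch_source_charset(enum sketch_encoding enc,
                                  int validates_as_utf8,
                                  const char *locale_charset)
{
    switch (enc) {
    case SK_ENC_UTF8:
        return NULL;            /* explicitly UTF-8: no conversion */
    case SK_ENC_EUCJP:
        return "EUC-JP";        /* explicit encoding wins over validation */
    case SK_ENC_SJIS:
        return "SHIFT_JIS";     /* assumed iconv-style name */
    default:
        /* No explicit encoding: fall back to the validity heuristic. */
        return validates_as_utf8 ? NULL : locale_charset;
    }
}
```

Under this scheme a string such as "2025\xc7\xaf" would still be converted from EUC-JP when that encoding is in effect, even though it happens to validate as UTF-8.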
Short answer
I have added an internal value S_ENC_EUCJP and attempted to modify the code for "set encoding locale" so that it detects EUC-JP as the current encoding. The change works under linux but I cannot test under Windows. The git commit is d2ee58da8b6. The test case I used is attached below.
Please test on your machine three times: once with no "set encoding" command, once with "set encoding locale", and once with "set encoding default".
Long answer
If I am reading the code correctly, gnuplot's function gp_cairo_get_encoding(plot) is intended to report the current character encoding as a string that can be passed to g_convert(). It first looks to see if the encoding has been set by the gnuplot command "set encoding". If not, it calls the glib function g_get_charset(). So there are two cases that we need to handle.
(1) There is no command "set encoding euc-jp", and there is no internal definition S_ENC_EUCJP. There is a command "set encoding locale", but I do not know what happens when it is used on a machine where the default locale is EUC-JP. I think on a linux machine the command does not work, because there is no internal value S_ENC_EUCJP to report the result. I can fix this.
I will need help with Windows, however. It looks like the code tries to translate a Windows codepage into a gnuplot encoding, but again this will fail because there is no value S_ENC_EUCJP to report the result. The information I found is that codepage 20932 indicates EUC-JP. I can add a test for this, but I will need help to test whether it works.
(2) The glib function g_get_charset(const char **charset) returns TRUE if the current locale is UTF-8 and FALSE otherwise. However on Windows the documentation says that it tests the "system default ANSI code-page" instead of testing the current C locale. I do not know whether that is correct for your situation. It also returns a string to pass to g_convert(), but again I do not know whether the string is correct for your situation.
Last edit: Ethan Merritt 2026-01-06
Thanks for your quick reply and for adding S_ENC_EUCJP.
I tested the modified code with the script
They do not display the Japanese title correctly, and in the case "set encoding locale" gnuplot says
Only in the case
gnuplot displays the Japanese string correctly, because the name "EUC_JP" was added to encoding_names[] in term.c.
Unfortunately, the locale name for a Japanese encoding depends on the operating system.
FreeBSD:
locale name of EUC-JP = ja_JP.eucJP (old FreeBSD: ja_JP.EUC)
locale name of Shift_JIS = ja_JP.SJIS
Solaris:
locale name of EUC-JP = ja_JP.eucJP, or ja (default)
locale name of Shift_JIS = ja_JP.PCK
setlocale() returns these names, so we may need to modify
in src/encoding.c, for example
An entry for EUC-JP may be needed in set_encoding_tbl[] in term.c.
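As an illustration of the locale-name problem, the following sketch classifies the locale names listed above by looking at the codeset after the dot. The enum, the helper, and the function names are hypothetical and do not come from src/encoding.c. Only the names mentioned in this thread are handled, so real systems would likely need more entries.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum ja_codeset { JA_UNKNOWN, JA_EUCJP, JA_SJIS };

/* Case-insensitive comparison of two ASCII names; returns 1 when equal. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Classify a locale name such as one returned by setlocale(). */
enum ja_codeset classify_ja_locale(const char *locale)
{
    const char *codeset;

    if (locale == NULL)
        return JA_UNKNOWN;
    codeset = strchr(locale, '.');
    if (codeset == NULL) {
        /* Solaris: a bare "ja" is the default EUC-JP locale. */
        return strcmp(locale, "ja") == 0 ? JA_EUCJP : JA_UNKNOWN;
    }
    codeset++;          /* skip the '.' */
    if (same_name(codeset, "eucJP") || same_name(codeset, "EUC"))
        return JA_EUCJP;
    if (same_name(codeset, "SJIS") || same_name(codeset, "PCK"))
        return JA_SJIS;
    return JA_UNKNOWN;
}
```

Matching on the codeset suffix rather than the whole locale string keeps the table short, at the cost of not distinguishing languages. Whether that trade-off suits gnuplot is an open question.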
On MS-Windows, codepage 20932 is indeed EUC-JP, but most Japanese Windows users do not use EUC-JP. EUC-JP is mainly for Unix users; Japanese Windows users use only Shift_JIS or UTF-8.
I reported that some EUC-JP strings may be recognized as UTF-8. The same problem can occur for some Shift_JIS strings, because Shift_JIS is also an 8-bit, 2-byte encoding. For example, the byte sequence E5 81 8F E5 81 9C
is two Japanese characters <U+504F> <U+505C> as a UTF-8 string, and three characters [E581][8FE5][819C] as Shift_JIS. So g_convert() should be used for the case S_ENC_SJIS too.
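The two readings of the same bytes can be checked with a small sketch. Both counting functions are hypothetical helpers written for this illustration. The Shift_JIS lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are the commonly cited ones and are an assumption here, not taken from gnuplot's source.

```c
#include <stddef.h>

/* Count characters assuming the bytes are well-formed UTF-8:
 * every byte that is not a continuation byte (10xxxxxx) starts a character. */
size_t count_as_utf8(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Count characters assuming Shift_JIS: a lead byte in 0x81-0x9F or
 * 0xE0-0xFC consumes the following byte as well. */
size_t count_as_sjis(const unsigned char *s, size_t n)
{
    size_t i = 0, count = 0;

    while (i < n) {
        unsigned char c = s[i];
        int lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

        i += (lead && i + 1 < n) ? 2 : 1;
        count++;
    }
    return count;
}
```

For the six bytes E5 81 8F E5 81 9C, the UTF-8 reading yields two characters and the Shift_JIS reading yields three, so neither interpretation can be ruled out from the bytes alone.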
I think I see an additional problem.
How should we handle Unicode escape sequences?
For example, to create a label containing the Greek letter pi:
    2π * x
it should be possible to use an escape sequence
    set label "2\U+03C0 * x".
The program knows how to convert this to a UTF-8 character sequence that would be accepted by the cairo terminals. But with this change in place the call to iconv would try to convert this byte sequence as if it were EUC-JP and give the error
    Unable to convert "2π": the sequence is invalid in the current charset (EUC-JP), falling back to iso_8859_1
Is there a standard solution to printing Greek characters or other general symbols in a EUC-JP environment?
The current gnuplot seems to permit the \U+xxxx sequence only for S_ENC_UTF8. I agree that we cannot use \U+xxxx for S_ENC_EUCJP. Because the EUC-JP table includes the Greek alphabet and some simple symbols, we can enter them directly. A user who wants a special character that is not in the EUC-JP table can use "set encoding utf8" and a UTF-8 script.
I also found the function strlen_sjis() for gp_strlen() in encoding.c, so a function strlen_eucjp() may be needed for S_ENC_EUCJP. Standard EUC-JP is an 8-bit, 2-byte character code: the first byte of an EUC-JP character is 0x8e or in 0xa1-0xfe, and the second byte is in 0xa1-0xfe. There is an extended 3-byte EUC-JP code, but it is not implemented in almost any Japanese environment, so we probably do not need to handle it.
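A character-counting function along the lines described above might look like the following. This is only a sketch built from the byte ranges stated in this post. The name sketch_strlen_eucjp() is hypothetical, it is not modeled on the actual strlen_sjis() code, and it deliberately ignores the 3-byte extension.

```c
#include <stddef.h>

/* Count characters in a NUL-terminated EUC-JP string, using the rule
 * stated above: a first byte of 0x8E or 0xA1-0xFE followed by a byte in
 * 0xA1-0xFE forms one 2-byte character; anything else counts as one byte.
 * The 3-byte extended form is not handled. */
size_t sketch_strlen_eucjp(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        int lead = (*p == 0x8E) || (*p >= 0xA1 && *p <= 0xFE);

        /* p[1] is safe to read: at worst it is the terminating NUL. */
        if (lead && p[1] >= 0xA1 && p[1] <= 0xFE)
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For "2025\xc7\xaf" this counts five characters: four ASCII digits plus one 2-byte character. A malformed lead byte at the end of a string is counted as a single character rather than skipped.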
I was thinking about it the wrong way. The cairo terminals already convert all text to UTF-8. Either it was recognized as UTF-8 or it was run through g_convert to make it UTF-8. So at that point it is OK to expand the Unicode escape sequences \U+acbd into more UTF-8. I made that change, and it seems to work. Your EUC-JP strings can now include Unicode escape sequences if the locale is recognized, but only for the cairo terminals. I will look to see if any other terminals can do something similar.
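Once the text is already UTF-8, expanding an escape such as \U+03C0 amounts to appending the UTF-8 encoding of that code point. The sketch below shows that encoding step only. The function name is hypothetical and this is not the code from the actual change. Parsing the escape itself is left out.

```c
#include <stddef.h>

/* Write the UTF-8 encoding of code point cp into out, which must have
 * room for 4 bytes. Returns the number of bytes written, or 0 if cp is
 * a surrogate or lies above U+10FFFF. */
size_t sketch_utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+03C0 (Greek pi) encodes to the two bytes CF 80, which can be spliced into text that has already been converted from EUC-JP to UTF-8.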
I will also look into strlen_eucjp(). Do you know if there is an existing implementation somewhere?
I tested the current git version. The script all.dem stopped at
the command "test palette" in pm3d.dem.
I then tested on the wxt terminal, and the same message appeared for the case "set encoding locale":
gnuplot> set encoding EUC_JP
^
line 68: unrecognized encoding specification; see 'help encoding'.
But the images generated by the above commands are correct.
The "test" command also fails for "set encoding locale".
The image generated by the last command is incorrect (attached file).
However, this may not be a bug, because the "test" command outputs a UTF-8 string
without changing the encoding.
The EUC-JP string "2025\xc7\xaf" is printed correctly with set encoding locale, with set encoding "EUC_JP", and without any "set encoding" command.
Strings that combine EUC-JP characters and "\U+xxxx" are also printed correctly in the same situations. Thank you.