Adam Borowski submitted this at https://bugs.debian.org/822074
As joe does its own character classification, rather than using glibc's
iswfoo() as everything else does, sometimes its interpretation differs.
In particular, joe fails to display any of the private use area characters
(U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD).
Classification returned by glibc:
width 1 punct graph print
While the Unicode standard says only that codepoints in that range are "not
noncharacters" without defining their properties, there's no way to sanely
give them a control function, thus making "printable" the only remaining
option. That's what glibc does -- and that's how all programs other than
joe treat these characters.
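For anyone who wants to compare against their own C library, here is a rough sketch of a probe. The names is_private_use(), describe_class() and probe_pua() are made up for illustration and are not part of joe or glibc; only the standard isw*() calls are real. Width is left out because wcwidth() needs an X/Open feature-test macro.

```c
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

/* Hypothetical helper: true if cp lies in one of the three private use ranges. */
int is_private_use(unsigned long cp)
{
	return (cp >= 0xE000UL && cp <= 0xF8FFUL) ||
	       (cp >= 0xF0000UL && cp <= 0xFFFFDUL) ||
	       (cp >= 0x100000UL && cp <= 0x10FFFDUL);
}

/* Hypothetical helper: write the C library's classification of cp into buf.
 * Which flags appear depends on the libc and on the current LC_CTYPE locale. */
int describe_class(unsigned long cp, char *buf, size_t len)
{
	wint_t wc = (wint_t)cp;

	return snprintf(buf, len, "U+%04lX:%s%s%s", cp,
	                iswpunct(wc) ? " punct" : "",
	                iswgraph(wc) ? " graph" : "",
	                iswprint(wc) ? " print" : "");
}

/* Take the locale from the environment and print one sample per range. */
void probe_pua(void)
{
	static const unsigned long samples[] = { 0xE000UL, 0xF0000UL, 0x100000UL };
	char line[64];
	size_t i;

	setlocale(LC_CTYPE, "");
	for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
		describe_class(samples[i], line, sizeof line);
		puts(line);
	}
}
```

Calling probe_pua() from main() under a UTF-8 locale should show what the local libc reports. In the plain "C" locale the flags will most likely be empty, so the locale setting matters here.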
Here's a minimal patch that fixes iswprint(PUA):
--- joe-4.1.orig/joe/unicode.c
+++ joe-4.1/joe/unicode.c
@@ -321,6 +321,7 @@ void joe_iswinit()
 	cclass_union(cclass_print, unicode("N"));
 	cclass_union(cclass_print, unicode("P"));
 	cclass_union(cclass_print, unicode("Zs"));
+	cclass_union(cclass_print, unicode("Co"));
 	cclass_opt(cclass_print);
 	/* Graphical characters (no spaces) */
I wonder about iswpunct() -- glibc somehow returns true for PUA characters,
so it might be a good idea to be consistent with it (even if I don't see why
it's set). As for iswgraph(), joe defines this function but never uses it.
Joe's wcwidth() assumes 1 for all characters not explicitly listed, so
that's the same as glibc.
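To spell out the "1 unless listed" behaviour, a minimal sketch follows. The table and the name sketch_wcwidth() are hypothetical and much smaller than anything a real editor would use; this is not joe's actual code, just the general shape of a width lookup with a default.

```c
#include <stddef.h>

/* Hypothetical range table: only code points whose width is not 1 are listed. */
struct width_range {
	unsigned long first, last;
	int width;
};

static const struct width_range width_table[] = {
	{ 0x0300UL, 0x036FUL, 0 },	/* combining diacritical marks */
	{ 0x1100UL, 0x115FUL, 2 },	/* Hangul Jamo leading consonants */
	{ 0x4E00UL, 0x9FFFUL, 2 },	/* CJK unified ideographs */
};

/* Return the listed width, or 1 for anything the table does not mention. */
int sketch_wcwidth(unsigned long cp)
{
	size_t i;

	for (i = 0; i < sizeof width_table / sizeof width_table[0]; i++)
		if (cp >= width_table[i].first && cp <= width_table[i].last)
			return width_table[i].width;
	return 1;	/* default: one column, which covers the whole PUA */
}
```

Since no private use range appears in such a table, PUA characters fall through to the default of 1, which matches the width glibc reports.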
This is now fixed in Mercurial. I'm also marking the PUA as graphical, even though JOE doesn't use this class. I don't understand the rationale for marking them as punctuation, so I'm ignoring that for now.