From: John S. <joh...@ja...> - 2025-07-08 12:41:10
I tried this: const FXString UnicodeXmlHTMLSpecialChars::UNICODE_SUBSCRIPT_y_SYMBOL = unescape("\U0001E05F"); and const FXString UnicodeXmlHTMLSpecialChars::UNICODE_SUBSCRIPT_y_SYMBOL = "\U0001E05F"; In both cases I get this warning when compiling in VS2022: "warning C4566: character represented by universal-character-name '\U0001E05F' cannot be represented in the current code page (1252)" and it displays as '??' in every font I try.

From: Roland Hughes via Foxgui-users <fox...@li...>
Sent: Tuesday, July 8, 2025 4:53 AM
To: fox...@li...
Subject: Re: [Foxgui-users] unicode character has 5 digits

Please define "does not work." Do you get a compilation error? Do you just not see the character? What OS are you on?

I haven't coded with Fox in years, but . . . when it comes to Unicode, the first thing you have to do is ensure the font you are using actually has the character represented. Most fonts only have a tiny subset. Here is an ancient discussion about finding which fonts have what character:
https://graphicdesign.stackexchange.com/questions/63283/how-to-find-browse-fonts-that-include-certain-rare-characters-unicode-internat
and a four-year-old discussion:
https://www.reddit.com/r/Unicode/comments/l3a3t8/what_font_renders_all_unicode_characters/

A bit of barefoot-in-the-snow for you: we should have forced all countries to use American English just so software developers would have an easier life. Internationalization is where it all went to Hell.

Those who are long in the tooth (or now toothless) will remember wide characters:
https://www.geeksforgeeks.org/cpp/wide-char-and-library-functions-in-c/
This was it!!! Instead of 256 ASCII values we could now have 65536. That would rule the world! Please read point 2 at the top of that page: wchar_t could be 2 or 4 bytes DEPENDING ON THE COMPILER USED. Data exchange was basically impossible. Microsoft, in its infinite wisdom, cough cough hack hack, basically got trapped here. They are still trapped here today.
Under the hood they went with the first cut of UTF-16 to avoid having to do multi-unit characters the way UTF-8 forces. In theory it was faster. Keep in mind Windows 3.10 was running on 286 computers, so 16-bit at the time:
https://www.betaarchive.com/forum/viewtopic.php?t=38718

Still, we could not get the population to engage in global nuclear warfare and force it to use the one true language, American English, where we could make do with good ole ASCII and those wonderful code pages. Especially since IBM still thwarts the universe today with EBCDIC:
https://en.wikipedia.org/wiki/Code_page

Guess what? Instead of subjugating all others via global warfare, they chose to promote peace and love, forming a committee churning out an ever larger elephant when the world wanted a mouse. Like all committees, it lacked any real industry knowledge. All they ever had was an x86, so that must be all that exists. Read up on surrogates:
https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF_(surrogates)
Pay attention to the BE (Big Endian) and LE (Little Endian) columns. IBM and Amdahl are Big Endian. Despite Unisys switching to Intel processors, they are still ones' complement.

Now we had a fine, fine pickle brine. The x86 and ARM world needed to support itty-bitty embedded systems having 512MB or less of RAM (think universal remote control for your TV) __AND__ we now had to be able to indicate the width of a character constant. The one true world where everything fit into a single 16-bit box was gone! There are oceans of documentation and legacy code examples out there where \u is always used for Unicode. So now C programmers, who've never touched a shift key in their lives, had to use \U. Just wait for the hack they come up with when the benevolent committee lacking industry knowledge bloats UTF past 32. UTF-64 is already taken:
https://utf64.moreplease.com/

On 7/7/2025 8:27 PM, John Selverian wrote:
I've used Unicode successfully in the past.
For instance, for a subscripted 't' I used: UNICODE_SUBSCRIPT_t_SYMBOL = unescape("\\u209c"); I want to now use a subscripted 'y' and found this: 1E05F CYRILLIC SUBSCRIPT SMALL LETTER U <sub> 0443 y. But this code: unescape("\\u1E05F") does not work. How do I encode a Unicode character with a 5-digit code?

Kind regards,
js

_______________________________________________
Foxgui-users mailing list
Fox...@li...
https://lists.sourceforge.net/lists/listinfo/foxgui-users

--
Roland Hughes, President
Logikal Solutions
(630)-205-1593 (cell)
https://theminimumyouneedtoknow.com
https://infiniteexposure.net
https://johnsmith-book.com