From: <je...@fo...> - 2025-07-08 12:58:09
On 2025-07-08 03:53, Roland Hughes via Foxgui-users wrote:
> Please define "Does not work."
>
> Do you get a compilation error?
>
> Just not see the character?
>
> What OS are you on?
>
> I haven't coded with Fox in years, but . . . when it comes to Unicode, the first thing you have to do is ensure the font you are using actually has the character represented. Most fonts only have a tiny subset.
>
> Here is an ancient discussion about finding which fonts have what character
>
> https://graphicdesign.stackexchange.com/questions/63283/how-to-find-browse-fonts-that-include-certain-rare-characters-unicode-internat
>
> and a 4 year old discussion
>
> https://www.reddit.com/r/Unicode/comments/l3a3t8/what_font_renders_all_unicode_characters/
>
> A bit of barefoot in the snow for you:
>
> We should have forced all countries to use American English just so software developers would have an easier life. Internationalization is where it all went to Hell. Those who are long in the tooth (or now toothless) will remember wide characters.
>
> https://www.geeksforgeeks.org/cpp/wide-char-and-library-functions-in-c/
>
> This was it!!! Instead of 256 ASCII values we could now have 65536. That would rule the world! Please read point 2 at the top of that: wchar_t could be 2 or 4 bytes DEPENDING ON THE COMPILER USED. Data exchange was basically impossible.
>
> Microsoft, in its infinite wisdom, cough cough hack hack, basically got trapped here. They are still trapped here today. Under the hood they went with the first cut of UTF-16 to avoid having to do multi-unit characters like UTF-8 forced. In theory it was faster. Keep in mind Windows 3.10 was running on 286 computers, so 16-bit at the time.
>
> https://www.betaarchive.com/forum/viewtopic.php?t=38718
>
> Still, we could not get the population to engage in global nuclear warfare and force it to use the one true language, American English, where we could make do with good ole ASCII and those wonderful code pages. Especially since IBM still thwarts the universe today with EBCDIC.
>
> https://en.wikipedia.org/wiki/Code_page
>
> Guess what?
>
> Instead of subjugating all others via global warfare, they chose to promote peace and love, forming a committee churning out an ever larger elephant when the world wanted a mouse. Like all committees, it lacked any real industry knowledge. All they ever had was an x86, so that must be all that exists.
>
> Read up on surrogates:
>
> https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF_(surrogates)
>
> Pay attention to the BE (Big Endian) and LE (Little Endian) columns. IBM and Amdahl are Big Endian. Despite Unisys switching to Intel processors, they are still ones' complement.
>
> Now we had a fine, fine pickle brine.
>
> The x86 and ARM world needed to support itty bitty embedded systems having 512MB or less of RAM (think universal remote control for your TV)
>
> __AND__
>
> we now had to be able to indicate the width of a constant.
>
> The one true world where everything fit into a single 16-bit box was gone!
>
> There are oceans of documentation and legacy code examples out there where \u is always used for Unicode.
>
> So now C programmers, who've never touched a shift key in their life, had to use \U.
>
> Just wait for the hack they come up with when the benevolent committee lacking industry knowledge bloats UTF past 32.
>
> UTF-64 is already taken.
>
> https://utf64.moreplease.com/

Thanks for this wonderful background.
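A small aside on the quoted \u versus \U point, as a plain standard-C++ sketch (nothing FOX-specific, purely illustrative): \u takes exactly four hex digits, \U takes exactly eight, and sizeof(wchar_t) really is compiler-dependent, which is exactly why wchar_t never worked for data exchange.

    #include <cstdio>

    int main() {
      // \u names a code point with exactly 4 hex digits, \U with exactly 8.
      char32_t small = U'\u00E9';      // U+00E9, LATIN SMALL LETTER E WITH ACUTE
      char32_t big   = U'\U0001F600';  // U+1F600 does not fit in 16 bits, so \U is required
      // wchar_t is 2 bytes with MSVC but 4 bytes with gcc/clang on Linux,
      // so its size cannot be relied on across platforms.
      std::printf("U+%04X U+%06X sizeof(wchar_t)=%zu\n",
                  (unsigned)small, (unsigned)big, sizeof(wchar_t));
      return 0;
    }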
32-bit wide characters would indeed incur enormous bloat, but thankfully UTF-8 encoding brilliantly keeps most European languages very close to 1 byte per character. Even for Korean, Japanese, and Chinese text, UTF-8 never takes more than a 32-bit wide character would, and all punctuation, digits, and the like are mercifully just 1 byte.

RAM and disk space are cheaper than ever now, but the biggest problem was always software. UTF-8 also lets software deal with wide characters *mostly* without undue pain and suffering. UTF-8 is very clever: you can start a character walk from any point in a string, because the beginning of a character is always recognizable as such; thus you can also walk backwards through UTF-8 very easily. Various other encodings of 32-bit wide characters are not nearly as clever.

So UTF-8 is winning, and all those who gambled on 16-bit characters now have the worst of both worlds: not as compact as UTF-8, yet still stuck with variable-sized characters. UTF-8 is the way to go.

If you don't interpret the characters but just store them, you'll never need to know anything other than 8-bit-safe strings of bytes. In the few cases where you need to traverse a character, not a byte, at a time, you can look for the magic lead byte:

    (ch & 0xC0) != 0x80

This test takes only a couple of clock cycles! (A short sketch is in the P.S. below.)

-- 
JVZ
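P.S. Here is a minimal sketch of walking a UTF-8 string a character at a time with nothing but that lead-byte test; the helper names are my own for illustration, not part of the FOX API:

    #include <cstddef>
    #include <cstdio>
    #include <string>

    // A byte begins a UTF-8 character if it is NOT a continuation byte (10xxxxxx).
    static bool isLead(unsigned char ch) {
      return (ch & 0xC0) != 0x80;
    }

    // Advance to the start of the next character (p must point at a character start).
    static std::size_t next(const std::string& s, std::size_t p) {
      do { ++p; } while (p < s.size() && !isLead((unsigned char)s[p]));
      return p;
    }

    // Back up to the start of the previous character (p must be > 0).
    static std::size_t prev(const std::string& s, std::size_t p) {
      do { --p; } while (p > 0 && !isLead((unsigned char)s[p]));
      return p;
    }

    int main() {
      // "aé中!" spelled out as raw UTF-8 bytes: 1 + 2 + 3 + 1 = 7 bytes, 4 characters.
      std::string s = "a\xC3\xA9\xE4\xB8\xAD!";
      std::size_t chars = 0;
      for (std::size_t p = 0; p < s.size(); p = next(s, p))   // forward walk
        ++chars;
      std::printf("%zu bytes, %zu characters\n", s.size(), chars);
      for (std::size_t p = s.size(); p > 0; ) {               // backward walk
        p = prev(s, p);
        std::printf("character starts at byte offset %zu\n", p);
      }
      return 0;
    }

On a well-formed UTF-8 string this counts characters and finds their starting offsets in either direction without ever decoding a single code point.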