Re: [Foxgui-users] unicode character has 5 digits

On 2025-07-07 20:27, John Selverian wrote:
> I've used Unicode successfully in the past. For instance for a
> subscripted 't' I used:
> 
> UNICODE_SUBSCRIPT_t_SYMBOL = unescape("\\u209c");
> 
> I want to now use a subscripted 'y' and found this:
> 
> 1E05F
> 
> CYRILLIC SUBSCRIPT SMALL LETTER U
> 
> <sub> 0443 y
> 
> But this code:
> 
> unescape(\\u1E05F)
> 
> Does not work. How do I encode a Unicode character with a 5 digit
> code?

Unescape needs to be given a so-called surrogate pair.  A surrogate
pair is two 16-bit code units that together encode one character
(code point) above U+FFFF.

For historical reasons, Unicode started out with 16-bit wide characters,
but as these things go, the number of code points eventually outgrew the
16-bit space, and surrogate pairs were needed to encode characters that
do not fit in 16 bits.

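32-bit (UTF-32) character -> 16-bit (UTF-16) surrogate pair:

   CH = ((U - 0x10000) >> 10) + 0xD800
   CL = (U & 0x03FF) + 0xDC00

For example, U = 0x1E05F gives CH = 0xD838 and CL = 0xDC5F.

16-bit UTF-16 surrogate pair -> 32-bit UTF-32:

   U = (0x10000-(0xD800<<10)-0xDC00) + (CH<<10) + CL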

The magic term in the above compensates for the two constants that were
added to CH and CL in order to package the 32-bit character into what
was, at the time, not yet assigned code space of the 16-bit Unicode
system.
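
As a rough illustration of the two conversions, here is a minimal C++
sketch.  The names SurrogatePair, toSurrogates and fromSurrogates are made
up for this example and are not part of FOX; the code assumes U lies in
the range 0x10000..0x10FFFF and does no range checking.  It subtracts
0x10000 before shifting, which is what the 0x10000 term in the reverse
formula undoes, and it writes the reverse direction in expanded form
rather than with the folded magic constant (the two are algebraically
the same).

```cpp
#include <cstdint>

// Hypothetical helper type for this sketch; not a FOX type.
struct SurrogatePair {
  std::uint16_t high;   // CH, expected in 0xD800..0xDBFF
  std::uint16_t low;    // CL, expected in 0xDC00..0xDFFF
};

// Split a code point U (assumed 0x10000..0x10FFFF) into a surrogate pair.
constexpr SurrogatePair toSurrogates(std::uint32_t u) {
  return SurrogatePair{
    static_cast<std::uint16_t>(((u - 0x10000u) >> 10) + 0xD800u),
    static_cast<std::uint16_t>((u & 0x03FFu) + 0xDC00u)
  };
}

// Recombine a surrogate pair into the original code point.
constexpr std::uint32_t fromSurrogates(SurrogatePair p) {
  return 0x10000u
       + ((static_cast<std::uint32_t>(p.high) - 0xD800u) << 10)
       + (static_cast<std::uint32_t>(p.low) - 0xDC00u);
}

// Compile-time checks for the character from the question, U+1E05F.
static_assert(toSurrogates(0x1E05F).high == 0xD838, "high surrogate");
static_assert(toSurrogates(0x1E05F).low == 0xDC5F, "low surrogate");
static_assert(fromSurrogates(SurrogatePair{0xD838, 0xDC5F}) == 0x1E05F,
              "round trip");
```

So for the subscripted 'y', the escape to try would presumably be
"\\uD838\\uDC5F" in place of "\\u1E05F", assuming unescape accepts the
pair as two consecutive \u escapes.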

   -- JVZ