[Unicon-group] UTF-8 class implementation available


Good morning to all,

I am making available, to anyone who wants it, a copy of the UTF-8 
Unicon classes with some documentation. The basic code is complete and 
its functionality has been tested, but exhaustive testing has NOT yet 
been performed. I still have to fix the comments in the header of the 
file; this will be done over the next few weeks. However, the code is 
available now for use and/or testing by those interested.

Unlike my previous non-class-based code, these classes provide analogues 
of all the built-in string processing functions, along with a means of 
using the class methods within the string scanning environment. There is 
also a small PDF file that describes the various methods and gives a 
simple example of how to use the classes.

If you are interested, please feel free to contact me either directly or 
via this list.

This is my take on processing UTF-8. It is an interim measure until the 
UTF-8 implementation changes are made to the Unicon/Icon runtime system. 
It may or may not suit your purposes.

There is at least one known problem: the code will accept some 
multi-byte sequences as UTF-8 even though they are specifically not 
valid UTF-8. This should only arise where someone has maliciously 
crafted the encoding of a codepoint by inserting an extra continuation 
byte whose lower 6 bits are all 0. The standard specifically states that 
this is not allowed. My code, at this point, doesn't always catch this 
condition, and in one place it has the potential to generate these 
specific continuation bytes. In normal processing, this should not arise.
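
For anyone who wants to test for this condition, the following is a 
minimal sketch of a strict check. It is written in Python, not Unicon, 
and it is not the code from the classes described here; the function 
name decode_one is hypothetical. It assumes that the condition described 
above corresponds to what the Unicode standard calls a non-shortest-form 
(overlong) encoding, and rejects any sequence whose decoded value would 
have fitted in fewer bytes.

```python
def decode_one(data: bytes, i: int = 0):
    """Decode one UTF-8 sequence starting at data[i].

    Returns (codepoint, length). Raises ValueError on malformed input,
    including overlong (non-shortest-form) sequences.
    Illustrative sketch only; decode_one is a hypothetical name.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, 1  # single-byte (ASCII) case

    # Lead byte gives the sequence length, its payload bits, and the
    # smallest codepoint that legitimately needs that many bytes.
    if 0xC0 <= b0 < 0xE0:
        n, cp, min_cp = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 < 0xF0:
        n, cp, min_cp = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 < 0xF8:
        n, cp, min_cp = 4, b0 & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")

    if i + n > len(data):
        raise ValueError("truncated sequence")

    # Each continuation byte must look like 10xxxxxx and adds 6 bits.
    for b in data[i + 1:i + n]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)

    # A value below min_cp means the sequence was padded with zero bits.
    if cp < min_cp:
        raise ValueError("overlong encoding")
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    return cp, n
```

With this check, decode_one(b"\xE0\x80\xAF") raises ValueError because 
the three bytes carry only the value 0x2F, which fits in one byte, while 
the normal three-byte encoding of U+20AC is accepted.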

regards

Bruce Rennie
