Re: [Ocaml-lib-devel] A proposal for Unicode character and UTF-8 modules.

Yamagata Yoriyuki wrote:

> Here are proposals for Unicode character and UTF-8 modules.  They were
> part of my camomile library, and passed the random tests.

Good start here.

> let look s i =
>   let n' =
>     let n = Char.code s.[i] in
>     if n < 0x80 then n else
>     if 0xc2 <= n && n <= 0xdf then
>       look_code s (i + 1) 1 (n - 0xc0)
>     else if 0xe0 <= n && n <= 0xef then
>       look_code s (i + 1) 2 (n - 0xe0)
>     else if 0xf0 <= n && n <= 0xf7 then
>       look_code s (i + 1) 3 (n - 0xf0)
>     else if 0xf8 <= n && n <= 0xfb then
>       look_code s (i + 1) 4 (n - 0xf8)
>     else if 0xfc <= n && n <= 0xfd then
>       look_code s (i + 1) 5 (n - 0xfc)
>     else invalid_arg "UTF8"
>   in
>   uchar_of_int n'

This is inefficient. You want:

	if n <= 0x7F then n else
	if n <= 0xc1 then invalid_arg "UTF8" else
	if n <= 0xdf then ...

i.e. you can assume every earlier range test has
failed, so each branch only needs to check the upper
endpoint of its range, taken in ascending order.
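
Spelled out in full, the ascending-endpoint test might look like the
sketch below. The name classify_lead and the returned pair (number of
continuation bytes, payload bits of the lead byte) are made up for
illustration and are not part of the proposed module. The ranges and
offsets are the same ones used in the quoted look function.

```ocaml
(* Hypothetical helper: classify a UTF-8 lead byte, given as an int in
   0..255, by testing only the upper endpoint of each range in ascending
   order. Returns (continuation byte count, payload bits of lead byte). *)
let classify_lead n =
  if n <= 0x7f then (0, n)                    (* ASCII *)
  else if n <= 0xc1 then invalid_arg "UTF8"   (* 0x80..0xc1 cannot start a sequence *)
  else if n <= 0xdf then (1, n - 0xc0)        (* 0xc2..0xdf *)
  else if n <= 0xef then (2, n - 0xe0)        (* 0xe0..0xef *)
  else if n <= 0xf7 then (3, n - 0xf0)        (* 0xf0..0xf7 *)
  else if n <= 0xfb then (4, n - 0xf8)        (* 0xf8..0xfb *)
  else if n <= 0xfd then (5, n - 0xfc)        (* 0xfc..0xfd *)
  else invalid_arg "UTF8"                     (* 0xfe, 0xff *)
```

Each branch costs one comparison instead of two, and the invalid
bytes 0x80..0xc1 are rejected by an explicit test rather than by
falling through every other branch.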
-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850