From: John M. S. <sk...@oz...> - 2003-06-19 12:51:42
Yamagata Yoriyuki wrote:

> Here are proposals for Unicode character and UTF-8 modules. They were
> part of my camomile library, and passed the random tests.

Good start here.

> let look s i =
>   let n' =
>     let n = Char.code s.[i] in
>     if n < 0x80 then n else
>     if 0xc2 <= n && n <= 0xdf then
>       look_code s (i + 1) 1 (n - 0xc0)
>     else if 0xe0 <= n && n <= 0xef then
>       look_code s (i + 1) 2 (n - 0xe0)
>     else if 0xf0 <= n && n <= 0xf7 then
>       look_code s (i + 1) 3 (n - 0xf0)
>     else if 0xf8 <= n && n <= 0xfb then
>       look_code s (i + 1) 4 (n - 0xf8)
>     else if 0xfc <= n && n <= 0xfd then
>       look_code s (i + 1) 5 (n - 0xfc)
>     else invalid_arg "UTF8"
>   in
>   uchar_of_int n'

This is inefficient. You want:

    if n <= 0x7F then n
    else if n <= 0xc1 then invalid_arg "UTF8"
    else if n <= 0xdf then ...

i.e. each branch can assume the previous range test failed, so you
only need to check the upper endpoint of each range, in ascending
order.

-- 
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
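
For reference, the ascending-endpoint version of the whole chain might look roughly like the sketch below. The original look_code is not shown in this thread, so the one here is a hypothetical stand-in that reads k continuation bytes and shifts in six bits from each; the sketch also returns a plain int rather than calling uchar_of_int, so that it compiles on its own.

```ocaml
(* Hypothetical stand-in for look_code (the original is not shown in the
   thread): read [k] continuation bytes starting at index [i], folding the
   low six bits of each into [acc]. *)
let rec look_code s i k acc =
  if k = 0 then acc
  else
    let c = Char.code s.[i] in
    if c < 0x80 || c > 0xbf then invalid_arg "UTF8"
    else look_code s (i + 1) (k - 1) ((acc lsl 6) lor (c - 0x80))

(* Lead-byte dispatch with one comparison per branch: each test only checks
   the upper endpoint, because all smaller values were handled above. *)
let look s i =
  let n = Char.code s.[i] in
  if n <= 0x7f then n                                   (* ASCII *)
  else if n <= 0xc1 then invalid_arg "UTF8"             (* stray continuation or overlong lead *)
  else if n <= 0xdf then look_code s (i + 1) 1 (n - 0xc0)
  else if n <= 0xef then look_code s (i + 1) 2 (n - 0xe0)
  else if n <= 0xf7 then look_code s (i + 1) 3 (n - 0xf0)
  else if n <= 0xfb then look_code s (i + 1) 4 (n - 0xf8)
  else if n <= 0xfd then look_code s (i + 1) 5 (n - 0xfc)
  else invalid_arg "UTF8"                               (* 0xfe, 0xff *)
```

The accepted lead-byte ranges are the same as in the quoted code (0x80..0xc1 and 0xfe..0xff are rejected), but the worst case drops from two comparisons per range to one.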