From: John M. S. <sk...@oz...> - 2003-06-23 17:30:34
Yamagata Yoriyuki wrote:

> k could be negative. A better way is
>
> if k >= 0 then
>   if k <= 0x7f then ... else
>   if k <= 0x7ff then ... else
>   ...
>   if k <= 0x3ffffff then ... else
>   (*)
> else
>   (*)
>
> but then the code (*) is duplicated. I don't think an extra integer
> comparison is a big deal.

I do, because I may run every input file to my compiler through the
UTF-8 decoder. Since lexical analysis is the slowest part of compilation,
and this routine handles every character individually, blinding speed
is important. Adding 50% more comparisons to handle an ASCII character
may slow the lexer, and thereby the whole compilation process, by a
significant amount.

I'm already thinking of replacing Ocamllex, since the space compaction
on the lookup tables costs performance :-)

I'm also tempted to mmap the input file, to eliminate the end-of-buffer
check needed on each char. After all, the core of a scanner is ultra fast:

    while(state = matrix[state][*p++]);

which should outperform memory easily.

Well, if I go i18n, I want the decoder function to be as fast as
possible (the encoder is less critical).

--
John Max Skaller, mailto:sk...@oz...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850
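For readers who want to see the scanner core quoted above in a form that compiles, here is a minimal sketch in C. It is an illustration only, not code from the compiler under discussion: the state names, the `build_matrix` and `scan_digits` helpers, and the tiny "run of decimal digits" automaton are all hypothetical. It assumes the input is NUL-terminated and that every state maps NUL to the dead state 0, which is what lets the loop run without a separate end-of-buffer check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical three-state automaton: state 0 is the dead state that
   stops the loop, exactly as in the one-line scanner core. */
enum { DEAD = 0, START = 1, IN_NUMBER = 2, NUM_STATES = 3 };

/* Uncompacted transition table: one row per state, one column per byte. */
static unsigned char matrix[NUM_STATES][256];

/* Fill the table: every transition is dead except digit -> IN_NUMBER. */
static void build_matrix(void) {
    memset(matrix, DEAD, sizeof matrix);
    for (int c = '0'; c <= '9'; c++) {
        matrix[START][c] = IN_NUMBER;
        matrix[IN_NUMBER][c] = IN_NUMBER;
    }
}

/* Return the number of leading decimal digits in the NUL-terminated
   string s. Requires build_matrix() to have been called first. */
static size_t scan_digits(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    unsigned state = START;

    /* The whole scanner: one table lookup and one test per input byte.
       The NUL terminator always leads to DEAD, so no bounds check. */
    while ((state = matrix[state][*p++]) != DEAD)
        ;

    /* p has stepped one past the byte that killed the automaton. */
    return (size_t)(p - (const unsigned char *)s) - 1;
}
```

Calling `build_matrix()` once and then `scan_digits("123abc")` yields 3. The table here is deliberately left uncompacted (256 columns per state), which trades memory for a single indexed load per character; a compacted table would add extra lookups to each step.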