Yamagata Yoriyuki wrote:
> k could be negative. A better way is
>
> if k >= 0 then
> if k <= 0x7f then ... else
> if k <= 0x7ff then ... else
> ...
> if k <= 0x3ffffff then ... else
> (*)
> else
> (*)
>
> but then the code (*) is duplicated. I don't think an extra integer
> comparison is a big deal.
I do, because I may UTF-8 decode every input file fed to my compiler.
Since lexical analysis is the slowest part of compilation,
and this routine is handling every character individually,
blinding speed is important. Adding 50% more comparisons
to handle an ASCII character may slow the lexer, and thereby
the whole compilation process, by a significant amount.
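To make the comparison count concrete, here is a minimal sketch in C, not code from this thread. It assumes the original six-byte UTF-8 ranges, and `utf8_len` is a hypothetical helper name. Converting `k` to unsigned first makes any negative value look like a very large one, so it falls through to the last branch and an ASCII character still costs a single comparison. What the `(*)` branch in the quoted code does is not shown, so returning 0 for out-of-range values is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): number of bytes needed to
   encode code point k in the original, up to 6-byte, UTF-8 scheme.
   Returns 0 if k is negative or otherwise out of range (an assumption). */
int utf8_len(int32_t k)
{
    /* A negative k converts to a value >= 0x80000000, so it fails every
       range test below without a separate "k >= 0" comparison. */
    uint32_t u = (uint32_t)k;

    if (u <= 0x7f)       return 1;   /* ASCII: one comparison only */
    if (u <= 0x7ff)      return 2;
    if (u <= 0xffff)     return 3;
    if (u <= 0x1fffff)   return 4;
    if (u <= 0x3ffffff)  return 5;
    if (u <= 0x7fffffff) return 6;
    return 0;                        /* negative or too large */
}
```

This relies on C's unsigned conversion, so it illustrates the idea rather than giving a drop-in fix for the OCaml code under discussion.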
I'm already thinking of replacing Ocamllex, since the
space compaction on the lookup tables costs performance :-)
Also tempted to mmap the input file, to eliminate the
check for end of buffer needed on each char.
After all, the core of a scanner is ultra fast:
while(state = matrix[state][*p++]);
which should easily run faster than memory can deliver the data.
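As a rough, self-contained illustration of that loop (a sketch under assumptions, not the actual lexer): the two-state table below recognises a run of ASCII digits. The names `matrix`, `STOP`, `DIGITS` and `scan_digits` are made up for this example. State 0 means "stop", and the terminating NUL byte maps to it, so the loop needs no separate end-of-buffer check.

```c
#include <stddef.h>

enum { STOP = 0, DIGITS = 1, N_STATES = 2 };

/* Hypothetical transition table: from DIGITS, a digit stays in DIGITS;
   every other byte (including the NUL terminator) goes to STOP.
   Entries not listed are zero-initialised, i.e. STOP. */
static const unsigned char matrix[N_STATES][256] = {
    [DIGITS] = {
        ['0'] = DIGITS, ['1'] = DIGITS, ['2'] = DIGITS, ['3'] = DIGITS,
        ['4'] = DIGITS, ['5'] = DIGITS, ['6'] = DIGITS, ['7'] = DIGITS,
        ['8'] = DIGITS, ['9'] = DIGITS
    }
};

/* Returns the length of the leading run of digits in the
   NUL-terminated string s. */
size_t scan_digits(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned state = DIGITS;

    /* The core scanner loop: one table lookup per input byte. */
    while ((state = matrix[state][*p++]) != STOP)
        ;

    /* p has advanced one past the byte that caused the stop. */
    return (size_t)(p - 1 - (const unsigned char *)s);
}
```

For example, `scan_digits("123abc")` returns 3. A real lexer would have many more states and would record which token was accepted; this only shows the shape of the inner loop.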
Well, if I go i18n, I want the decoder function
as fast as possible (the encoder is less critical).
--
John Max Skaller, mailto:skaller@...
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850