The problem:
The lexer does not correctly interpret the following page:
<html><body>Cool&#x8AE;Stuff</body></html>
The solution:
Most of the code needed to handle this case already exists: EntityTable.cs already contains handling for hexadecimal entities (see line 85).
The problem occurs when the lexer, while reading the entity, reaches a character in [A-F] or [a-f]. These are valid hexadecimal digits, but the lexer does not classify them as digits, so the entity is not read correctly.
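To make the mechanism concrete, here is a minimal C# sketch of a table-driven character classifier in the style of the MapStr calls in Lexer.cs. The flag values, the MapStr body and the ReadDigits helper are hypothetical stand-ins, not the real Lexer.cs code; the point is only that when DIGIT is set for '0'-'9' alone, a digit-run reader started after "&#x" stops at the first hex letter.

```csharp
using System;

// Hypothetical stand-in for the unpatched Lexer.cs character table.
// Flag values, MapStr and ReadDigits are illustrative assumptions.
static class OriginalTable
{
    const short DIGIT = 1, LETTER = 2, NAMECHAR = 4, LOWERCASE = 8, UPPERCASE = 16;

    static readonly short[] Map = new short[128];

    // ORs the given flags into the table entry of every character in 'chars'.
    static void MapStr(string chars, short flags)
    {
        foreach (char c in chars) Map[c] |= flags;
    }

    static OriginalTable()
    {
        // Only '0'-'9' carry DIGIT, as in the unpatched table.
        MapStr("0123456789", (short)(DIGIT | NAMECHAR));
        MapStr("abcdefghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
        MapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));
    }

    static bool IsDigit(char c) => c < 128 && (Map[c] & DIGIT) != 0;

    // Collects consecutive DIGIT characters starting at 'start',
    // the way an entity reader would collect the digits after "&#x".
    public static string ReadDigits(string s, int start)
    {
        int i = start;
        while (i < s.Length && IsDigit(s[i])) i++;
        return s.Substring(start, i - start);
    }
}
```

With this table, OriginalTable.ReadDigits("&#x8AE;", 3) returns "8": the run ends at 'A', so the rest of the reference is never read as part of the number.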
The fix is to edit lines 3160-3161 of Lexer.cs. Change the following:
MapStr("abcdefghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
MapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));
To this:
// a-f and A-F are hexadecimal digits as well as letters
MapStr("abcdef", (short)(DIGIT | LOWERCASE | LETTER | NAMECHAR));
MapStr("ABCDEF", (short)(DIGIT | UPPERCASE | LETTER | NAMECHAR));
MapStr("ghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
MapStr("GHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));
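The sketch below applies the same four MapStr calls to a hypothetical stand-alone table and then decodes a reference like &#x8AE;. As before, the flag values and the DecodeHexEntity helper are illustrative assumptions, not code from Lexer.cs or EntityTable.cs.

```csharp
using System;

// Hypothetical stand-in for the patched Lexer.cs character table.
// Flag values, MapStr and DecodeHexEntity are illustrative assumptions.
static class PatchedTable
{
    const short DIGIT = 1, LETTER = 2, NAMECHAR = 4, LOWERCASE = 8, UPPERCASE = 16;

    static readonly short[] Map = new short[128];

    // ORs the given flags into the table entry of every character in 'chars'.
    static void MapStr(string chars, short flags)
    {
        foreach (char c in chars) Map[c] |= flags;
    }

    static PatchedTable()
    {
        MapStr("0123456789", (short)(DIGIT | NAMECHAR));
        // a-f and A-F now carry DIGIT too, mirroring the patch above.
        MapStr("abcdef", (short)(DIGIT | LOWERCASE | LETTER | NAMECHAR));
        MapStr("ABCDEF", (short)(DIGIT | UPPERCASE | LETTER | NAMECHAR));
        MapStr("ghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
        MapStr("GHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));
    }

    static bool IsDigit(char c) => c < 128 && (Map[c] & DIGIT) != 0;

    // Decodes a reference of the form "&#x<hex>;" starting at 'start'.
    // Returns null if the text does not have that form. Digit runs longer
    // than 6 are rejected; char.ConvertFromUtf32 still throws for values
    // that are not valid code points (surrogates, above 0x10FFFF).
    public static string? DecodeHexEntity(string s, int start)
    {
        if (string.Compare(s, start, "&#x", 0, 3, StringComparison.OrdinalIgnoreCase) != 0)
            return null;
        int digitsStart = start + 3;
        int i = digitsStart;
        while (i < s.Length && IsDigit(s[i])) i++;
        int count = i - digitsStart;
        if (count == 0 || count > 6 || i >= s.Length || s[i] != ';')
            return null;
        int code = Convert.ToInt32(s.Substring(digitsStart, count), 16);
        return char.ConvertFromUtf32(code);
    }
}
```

Here PatchedTable.DecodeHexEntity("Cool&#x8AE;Stuff", 4) returns "\u08AE". One consequence of reusing the DIGIT flag is that it now also matches hex letters, so if Lexer.cs tests the same flag when reading decimal references, it may be worth checking how those are affected, since a run like "12a" would now be read as all digits.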