#4 Lexer's Handling of Hexadecimal Entities

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2010-01-13
Created: 2010-01-13
Creator: Anonymous
Private: No

The problem:

The lexer does not correctly interpret the following page, which contains a hexadecimal character entity:

<html><body>Cool&#x08AE;Stuff</body></html>
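For reference, here is a minimal sketch of what the entity in that page is expected to decode to. It does not use the library's own code: `HexEntityDemo` and `DecodeNumericEntity` are hypothetical names made up for this illustration, and the helper only handles numeric references of the form `&#...;` or `&#x...;`.

```csharp
using System;
using System.Globalization;

public static class HexEntityDemo
{
    // Hypothetical helper (not part of the library): decodes a numeric
    // character reference such as "&#x08AE;" (hex) or "&#2222;" (decimal).
    public static string DecodeNumericEntity(string entity)
    {
        if (!entity.StartsWith("&#", StringComparison.Ordinal) ||
            !entity.EndsWith(";", StringComparison.Ordinal))
        {
            throw new FormatException("Not a numeric character reference: " + entity);
        }

        // Strip the leading "&#" and the trailing ";".
        string body = entity.Substring(2, entity.Length - 3);

        // A leading 'x' or 'X' marks the hexadecimal form.
        bool isHex = body.StartsWith("x", StringComparison.OrdinalIgnoreCase);

        int codePoint = isHex
            ? int.Parse(body.Substring(1), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture)
            : int.Parse(body, NumberStyles.None, CultureInfo.InvariantCulture);

        return char.ConvertFromUtf32(codePoint);
    }
}
```

With this helper, `DecodeNumericEntity("&#x08AE;")` and `DecodeNumericEntity("&#2222;")` both yield the single character U+08AE, so the sample page should come out as "Cool", that character, then "Stuff".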

The solution:

Most of the code needed to handle this case already exists. EntityTable.cs, on line 85, already contains logic for hexadecimal entities.

The problem arises when the parser is reading the entity and reaches one of the characters [A-F] or [a-f]. In hexadecimal these are digits, but the lexer does not recognize them as such.

The solution is to edit lines 3160-3161 in Lexer.cs. Change the following:

MapStr("abcdefghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
MapStr("ABCDEFGHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));

To this:

MapStr("abcdef", (short)(DIGIT | LOWERCASE | LETTER | NAMECHAR));
MapStr("ABCDEF", (short)(DIGIT | UPPERCASE | LETTER | NAMECHAR));
MapStr("ghijklmnopqrstuvwxyz", (short)(LOWERCASE | LETTER | NAMECHAR));
MapStr("GHIJKLMNOPQRSTUVWXYZ", (short)(UPPERCASE | LETTER | NAMECHAR));
