#208 Lexer returns a TagNode with a 'null' name

Milestone: v1.6

Status: closed-fixed

Owner: Derrick Oswald

Labels: Scanner Bug

Priority: 5

Updated: 2006-05-27

Created: 2006-05-23

Creator: Keiron McCammon

Private: No

If you pass the following character sequence (taken from
an actual website) to the lexer: "<!\r\nMSIE->"

then Lexer.getNode(boolean) reads the first two characters
fine. Page.getCharacter() then converts the '\r\n' to a
single '\n' character. Because the next character isn't
'-', the lexer calls Cursor.retreat(), which moves the
cursor back to the '\r' character (not the '!'), and then
calls Lexer.parseTag(). As a result the '!' character is
skipped and the tag has a null name.

If the character sequence doesn't include the '\r'
character, the tag name is '!', which is reasonable all
things considered.
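
The mismatch described above can be reproduced with a small stand-alone sketch. It is not the HTML Parser source: CrlfRetreatDemo and its methods are hypothetical stand-ins for the page and cursor, and it assumes the lexer steps back two positions so that tag parsing can restart at the '!'.

```java
// Hypothetical stand-in for the page/cursor pair; NOT the HTML Parser classes.
final class CrlfRetreatDemo {
    private final String source;
    private int position; // raw index into source

    CrlfRetreatDemo(String source) {
        this.source = source;
    }

    /** Returns the next character, collapsing "\r\n" into a single '\n'. */
    char getCharacter() {
        char c = source.charAt(position++);
        if (c == '\r' && position < source.length()
                && source.charAt(position) == '\n') {
            position++; // two raw characters consumed for one logical character
            return '\n';
        }
        return c;
    }

    /** Naive retreat: steps back exactly one raw character. */
    void retreat() {
        position--;
    }

    /**
     * Reads '<', '!' and the line break, then retreats twice (assumed lexer
     * behaviour) and returns the character the cursor ends up on.
     */
    static char charAfterDoubleRetreat(String input) {
        CrlfRetreatDemo page = new CrlfRetreatDemo(input);
        page.getCharacter(); // '<'
        page.getCharacter(); // '!'
        page.getCharacter(); // line break, returned as '\n'
        page.retreat();
        page.retreat();
        return input.charAt(page.position);
    }
}
```

With "<!\r\nMSIE->" the cursor ends up on '\r', so the '!' is never re-read. With "<!\nMSIE->" it ends up on '!'.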

Discussion

Nobody/Anonymous - 2006-05-27

status: open --> closed

Nobody/Anonymous - 2006-05-27

Logged In: NO

Use a more careful cursor retreat - Page.ungetCharacter().
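
One way such a careful retreat might work is sketched below, assuming the reader collapses "\r\n" into '\n'. CrlfUngetDemo is a hypothetical class written for illustration; only the method name ungetCharacter is borrowed from the comment above, and the body is not taken from the library.

```java
// Hypothetical illustration of a line-break-aware "unget"; NOT library code.
final class CrlfUngetDemo {
    private final String source;
    private int position; // raw index into source

    CrlfUngetDemo(String source) {
        this.source = source;
    }

    /** Returns the next character, collapsing "\r\n" into a single '\n'. */
    char getCharacter() {
        char c = source.charAt(position++);
        if (c == '\r' && position < source.length()
                && source.charAt(position) == '\n') {
            position++; // swallow the '\n' that follows the '\r'
            return '\n';
        }
        return c;
    }

    /** Steps back one logical character, undoing a collapsed "\r\n" as a unit. */
    void ungetCharacter() {
        if (position == 0) {
            return; // nothing to unget
        }
        position--;
        if (position > 0 && source.charAt(position) == '\n'
                && source.charAt(position - 1) == '\r') {
            position--; // the '\n' was the second half of a "\r\n" pair
        }
    }

    /** Reads three logical characters, ungets two, returns the cursor's character. */
    static char charAfterDoubleUnget(String input) {
        CrlfUngetDemo page = new CrlfUngetDemo(input);
        page.getCharacter(); // '<'
        page.getCharacter(); // '!'
        page.getCharacter(); // line break, returned as '\n'
        page.ungetCharacter();
        page.ungetCharacter();
        return input.charAt(page.position);
    }
}
```

In this sketch the cursor lands on '!' for both "<!\r\nMSIE->" and "<!\nMSIE->", because each unget reverses exactly what one getCharacter() call consumed.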


Nobody/Anonymous - 2006-05-27

labels: --> Scanner Bug

assigned_to: nobody --> derrickoswald

status: closed --> closed-fixed
