#936 Invalid entity results in truncated text

open
nobody
5
2010-11-26
2010-11-26
aditsu
No

First noticed in test 1062345. My example:

<div title="hello &# this part will not appear in the output">
<div title="&#nbsp; another test">

Results in:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux (vers 25 March 2009), see www.w3.org">
<title></title>
</head>
<body>
<div title="hello ">
<div title=""></div>
</div>
</body>
</html>

Analysis:
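When "&#" is not followed by a valid number (as in both examples above), EntityInfo still returns yes but sets the character code to 0. That 0 ends up in the attribute value, where it acts as a string terminator, so everything after the invalid entity is dropped.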
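The failure mode can be reproduced outside Tidy with a minimal sketch. It assumes the numeric branch parses the digits with sscanf into a variable initialised to 0, so that a failed match leaves the value at 0. The function name parse_numeric_entity is hypothetical and is not part of Tidy; only the final "c != 0" check mirrors the suggested patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for the numeric-entity branch of EntityInfo.
 * Assumption: the value is parsed with sscanf into a variable that
 * starts at 0, so a missing or malformed number leaves it at 0.
 * Returns 1 when a non-zero code was parsed, 0 otherwise. */
static int parse_numeric_entity(const char *name, unsigned int *code)
{
    unsigned int c = 0; /* stays 0 when sscanf matches nothing */

    if (name[0] != '&' || name[1] != '#')
        return 0; /* not a numeric entity at all */

    /* An 'x' or 'X' prefix denotes a hexadecimal number. */
    if (name[2] == 'x' || name[2] == 'X')
        sscanf(name + 3, "%x", &c);
    else
        sscanf(name + 2, "%u", &c);

    *code = c;

    /* Patched behaviour: a zero code means "no usable entity".
     * Returning 1 unconditionally here would let the caller store
     * a NUL character in the output string. */
    return c != 0;
}
```

With this check, inputs such as "&#" or "&#nbsp;" are reported as invalid instead of yielding code 0, so the caller can keep the original text rather than writing a NUL into the attribute value.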

Suggested patch:

Index: src/entities.c
===================================================================
RCS file: /cvsroot/tidy/tidy/src/entities.c,v
retrieving revision 1.19
diff -u -r1.19 entities.c
--- src/entities.c 9 Aug 2008 11:55:27 -0000 1.19
+++ src/entities.c 26 Nov 2010 08:02:58 -0000
@@ -375,7 +375,7 @@
 
         *code = c;
         *versions = VERS_ALL;
-        return yes;
+        return c != 0;
     }
 
     /* Named entity: name ="&" followed by a name */

Discussion