Tidy turns this input:
<edef>A string of the form &#xdddd or &#dddd</edef>
into this output:
<edef>A string of the form &#xdddd or ~dddd</edef>
where I use ~ (tilde) to stand for the NULL character (0x00).
Can you help?
Friday, March 13, 2009.
Yes, it does add a null character when it finds '&#d'... so why?
On seeing the '&', Tidy starts building an 'entity', and on the '#' it switches to numeric entity mode.
It then stops building the entity name on seeing the 'd', which is not a decimal digit, assumes it has the whole entity, and tries to look it up in EntityInfo()...
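To make that step concrete, here is a simplified, hypothetical model of the scan (not Tidy's actual lexer code; the name collect_decimal_entity is made up for illustration). It copies the '&#' prefix and then only decimal digits, so for '&#dddd' it stops at the first 'd' and the collected name is just '&#'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified model of the entity scan described above
 * (not Tidy's lexer): copy "&#", then keep copying while the next
 * character is a decimal digit. Writes a NUL-terminated name into
 * 'name' (capacity 'cap') and returns its length. */
size_t collect_decimal_entity(const char *in, char *name, size_t cap)
{
    size_t n = 0;

    /* Only handle input starting with "&#" that fits "&#\0". */
    if (cap < 3 || in[0] != '&' || in[1] != '#') {
        if (cap > 0)
            name[0] = '\0';
        return 0;
    }
    name[n++] = '&';
    name[n++] = '#';

    /* Stop at the first non-digit, e.g. the 'd' in "&#dddd",
     * leaving room for the terminating NUL. */
    while (n + 1 < cap && isdigit((unsigned char)in[n])) {
        name[n] = in[n];
        n++;
    }
    name[n] = '\0';
    return n;
}
```

For '&#65;' this model collects '&#65', but for '&#dddd' it collects only '&#', which is exactly the short name EntityInfo() then receives.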
If the second character of the entity name, name[1], is '#', it tries to decode the number starting at the 3rd character, with c defaulting to 0 (zero), using:
sscanf( name+2, "%u", &c );
But in this case name+2 points at the terminating NUL, i.e. an empty string, and WITHOUT checking the result of sscanf(), which here returns -1 (EOF), it goes on to set the 'code' to zero and return 'yes', entity 'found'...
It should return 'no' - no such entity!
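A small sketch of the unpatched numeric branch reproduces the problem. It uses plain C types (unsigned, int) in place of Tidy's uint/Bool, a hypothetical function name, and assumes name always starts with "&#". sscanf() on the empty string name+2 returns EOF, c keeps its default of 0, and the function still reports the entity as found.

```c
#include <stdio.h>

/* Sketch of the UNPATCHED numeric branch described above (hypothetical
 * name, plain C types). Assumes name starts with "&#". Returns 1
 * ("yes, found") even when sscanf() converted nothing. */
int numeric_entity_unpatched(const char *name, int isXml, unsigned *code)
{
    unsigned c = 0;  /* zero on missing/bad number */

    /* 'x' prefix denotes hexadecimal number format */
    if (name[2] == 'x' || (!isXml && name[2] == 'X'))
        sscanf(name + 3, "%x", &c);
    else
        sscanf(name + 2, "%u", &c);  /* "" -> returns EOF, c stays 0 */

    *code = c;   /* for "&#" this is 0, i.e. the NULL character */
    return 1;    /* always reports "found" */
}
```

Calling it with '&#' returns 1 with code 0, which is the NULL character that ends up in the output.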
So, it seems this could be fixed by the following patch :-
--- tidycvs\src\entities.c Thu Sep 18 16:47:12 2008
+++ tidydev\src\entities.c Fri Mar 13 12:22:57 2009
@@ -366,16 +366,18 @@
if ( name[1] == '#' )
uint c = 0; /* zero on missing/bad number */
+ int res;
/* 'x' prefix denotes hexadecimal number format */
if ( name[2] == 'x' || (!isXml && name[2] == 'X') )
- sscanf( name+3, "%x", &c );
+ res = sscanf( name+3, "%x", &c );
else
- sscanf( name+2, "%u", &c );
- *code = c;
- *versions = VERS_ALL;
- return yes;
+ res = sscanf( name+2, "%u", &c );
+ if ( res != -1 )
+ *code = c;
+ *versions = VERS_ALL;
+ return yes;
/* Named entity: name ="&" followed by a name */
That is, check that the result of sscanf() is not EOF (-1). This could be made even tighter by using if ( res == 1 ), since exactly one successful conversion is the expected result...
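Here is a sketch of the tighter res == 1 variant, under the same assumptions as before: plain C types, a hypothetical function name, name starting with "&#", and Tidy's *versions out-parameter left out. Setting the code and returning 'yes' happen together inside one braced block, so a failed parse reports 'no such entity'.

```c
#include <stdio.h>

/* Sketch of the PATCHED numeric branch using the tighter res == 1
 * check (hypothetical name, plain C types, no versions output).
 * Assumes name starts with "&#". */
int numeric_entity_patched(const char *name, int isXml, unsigned *code)
{
    unsigned c = 0;  /* zero on missing/bad number */
    int res;

    /* 'x' prefix denotes hexadecimal number format */
    if (name[2] == 'x' || (!isXml && name[2] == 'X'))
        res = sscanf(name + 3, "%x", &c);
    else
        res = sscanf(name + 2, "%u", &c);

    if (res == 1) {
        /* Exactly one number was decoded: the entity is valid. */
        *code = c;
        return 1;    /* "yes" */
    }

    /* res was EOF (nothing after the prefix) or 0 (no digit matched). */
    *code = 0;
    return 0;        /* "no": no such entity */
}
```

The two checks differ when sscanf() matches nothing but the input is not empty: in XML mode '&#X41' takes the decimal path (uppercase 'X' is not accepted as the hex prefix there), sscanf() returns 0 rather than EOF, and only res == 1 rejects it.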
Alternatively, the code could check the length of 'name' to see whether a sscanf() is even possible, but that minimum length would differ depending on whether an 'x'/'X' comes first...
With this patch applied, the correct XML appears to be output:
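The length-check alternative could look like this hypothetical helper (again plain C types, not Tidy code). The minimum is the prefix plus one digit: 3 characters for '&#' followed by a digit, 4 for '&#x' or '&#X' followed by a digit. It assumes, as in the scan described earlier, that the lexer only collected digits after the prefix, so the length check stands in for checking sscanf()'s result.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper sketching the length-check alternative.
 * Assumes name starts with "&#" and that only digits follow the
 * prefix, so "long enough" means "sscanf() has a digit to read". */
int numeric_entity_length_checked(const char *name, int isXml, unsigned *code)
{
    int hex = name[2] == 'x' || (!isXml && name[2] == 'X');
    size_t min_len = hex ? 4 : 3;   /* prefix plus one digit */
    unsigned c = 0;

    if (strlen(name) < min_len) {
        *code = 0;
        return 0;                   /* "no": nothing to decode */
    }

    /* The length check replaces checking sscanf()'s return value. */
    if (hex)
        sscanf(name + 3, "%x", &c);
    else
        sscanf(name + 2, "%u", &c);

    *code = c;
    return 1;                       /* "yes" */
}
```

The drawback mentioned above shows up directly: the hex test has to run before the length check, because it decides whether the minimum is 3 or 4.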
<edef>A string of the form &#xdddd or &#dddd</edef>
Hope this helps...
EOF - 2683371.doc
Thanks a lot, Geoff!!
Any chance that a new version of Tidy fixing that bug will be available soon?
I have no idea when my patch will make it into the CVS source... if ever ;=()
But in the meantime I have put up a WIN32 executable on my site:
Download the zip, unzip tidy.exe, and give it a try ;=))
Or you can download the full Tidy CVS source, download my patch,
apply it, and build Tidy yourself on just about any system...
Thanks for the report... now long ago... sorry for the delay...
The Tidy source has moved to https://github.com/htacg/tidy-html5, and the website to http://www.html-tidy.org/
Back then I was not a Tidy maintainer, and for whatever reason my patch never made it into the CVS source, so it is not in our current GitHub source either.
I have now raised issue #373 to address this bug, and will add a new patch as soon as it has been tested.
If you find another Tidy bug, please file an issue; and if you find, fix, and test it in a Tidy fork, you can open a Pull Request together with the sample HTML and config used.
Tidy needs your support...
Meanwhile, closing this here as out of date...
If you get a chance, check out the issue-373 branch for the fix... It will be merged to master after testing, and hopefully closes this old bug!