[Htmlparser-announce] htmlparser 1.42
Brought to you by:
derrickoswald
From: Derrick O. <Der...@Ro...> - 2004-07-31 23:39:21
Patch release Version 1.42 (Release Build Jul 27, 2004) of the most popular HTML parser on SourceForge is now available:

http://sourceforge.net/project/showfiles.php?group_id=24399&package_id=17243&release_id=256305

This is the same as Version 1.4 with four bug fixes:

#998195 SiteCatpurer just crashed
#995744 Translate.decode(String)
#995703 Parser Crash
#988846 Linkbean getLinks() segmentation fault (duplicate of above)
#919738 Text has not been extracted correctly using StringBean
#936392 ScriptTag visitor fails for comments with ' (duplicate of above)

One bug was the incorrect decoding of URLs by the Translate.decode() method.

Another bug caused the SiteCapturer program to fail when an EncodingChangeException was raised. This exception is raised when a <META> tag declares a different character set than the one assumed at the start of parsing, and re-reading the stream with the new character set yields different characters than those the client has already consumed. SiteCapturer now handles this exception by resetting the parser and trying again.

Another bug involved an overzealous test for "text/XXX" content, which erroneously assumed that any other content was binary and threw an exception. Experience indicates that numerous web servers return parseable streams with content types that do not indicate text, so the test has been removed.

The last bug caused the parser to return the wrong nodes when a comment inside a <SCRIPT> tag contained a quote character.
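The SiteCapturer fix described above (reset the parser and try again when an EncodingChangeException is raised) can be sketched in Java as follows. This is not the HTML Parser API or the actual SiteCapturer code: EncodingChangeException, SketchParser and CaptureSketch here are hypothetical stand-ins that only model the control flow, and the sketch assumes a single retry is enough.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the parser's EncodingChangeException.
class EncodingChangeException extends Exception {
    EncodingChangeException(String message) { super(message); }
}

// Hypothetical parser over pre-split "nodes". In this sketch a
// <META charset=...> node always triggers one exception the first time it is
// seen; the real parser throws only when re-reading would change characters
// that were already handed to the client.
class SketchParser {
    private static final String META = "<META charset=";
    private final String[] input;
    private String charset = "ISO-8859-1";
    private boolean charsetSwitched = false;
    private int passes = 0;

    SketchParser(String... input) { this.input = input; }

    List<String> parseAll() throws EncodingChangeException {
        passes++;
        List<String> nodes = new ArrayList<>();
        for (String node : input) {
            if (!charsetSwitched && node.startsWith(META)) {
                // Remember the declared charset, then abort this pass.
                charset = node.substring(META.length(), node.length() - 1);
                charsetSwitched = true;
                throw new EncodingChangeException("charset changed to " + charset);
            }
            nodes.add(node);
        }
        return nodes;
    }

    // Nothing to rewind in this sketch: parseAll() always starts from the
    // first node, and the detected charset is kept for the next pass.
    void reset() { }

    String getCharset() { return charset; }
    int getPasses() { return passes; }
}

class CaptureSketch {
    // Recovery strategy: on an encoding change, reset the parser and parse
    // the page again from the start; a second change is treated as an error.
    static List<String> parseWithRetry(SketchParser parser) {
        try {
            return parser.parseAll();
        } catch (EncodingChangeException first) {
            parser.reset();
            try {
                return parser.parseAll();
            } catch (EncodingChangeException second) {
                throw new IllegalStateException("encoding changed twice", second);
            }
        }
    }
}
```

Retrying only once is a choice made for this sketch: after the first reset the parser already knows the declared character set, so a second change would more likely point to conflicting declarations than to something another retry could fix.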
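To make the removed "text/XXX" test described above concrete, here is a hypothetical reconstruction of that kind of check; the class and method names are illustrative, not taken from the library. Any content type not beginning with "text/" was treated as binary, so a page served as, say, application/xhtml+xml or application/octet-stream would have been rejected even though it is parseable.

```java
import java.util.Locale;

class ContentTypeSketch {
    // Hypothetical reconstruction of the removed check: only content types
    // beginning with "text/" were accepted; everything else was treated as
    // binary and rejected.
    static boolean oldTestAccepts(String contentType) {
        return contentType != null
            && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/");
    }
}
```

Since 1.42 the test is gone, so streams like these are parsed rather than rejected.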
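The <SCRIPT> bug described above can be illustrated with a hypothetical quote-aware scanner. It assumes, as the bug description implies, that the scanner tracks quoted strings so that a "</script" inside a string literal does not end the script. If comments are not skipped, the apostrophe in a comment such as // don't opens a "string" that never closes, and the real end tag is missed. ScriptEndSketch, findScriptEnd and its skipComments flag are illustrative names, not HTML Parser methods.

```java
class ScriptEndSketch {
    // Returns the index of the "</script" that closes the script body, or -1.
    // An end tag inside a string literal is ignored; one inside a comment is
    // not. Regular-expression literals are not handled in this sketch.
    static int findScriptEnd(String body, boolean skipComments) {
        char quote = 0;               // open string delimiter, or 0 outside strings
        boolean lineComment = false;  // inside // ... up to end of line
        boolean blockComment = false; // inside /* ... */
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            char next = i + 1 < body.length() ? body.charAt(i + 1) : 0;
            if (quote == 0 && body.regionMatches(true, i, "</script", 0, 8)) {
                return i;
            }
            if (lineComment) {
                if (c == '\n') lineComment = false;
            } else if (blockComment) {
                if (c == '*' && next == '/') { blockComment = false; i++; }
            } else if (quote != 0) {
                if (c == '\\') i++;             // skip the escaped character
                else if (c == quote) quote = 0; // string closed
            } else if (skipComments && c == '/' && next == '/') {
                lineComment = true; i++;
            } else if (skipComments && c == '/' && next == '*') {
                blockComment = true; i++;
            } else if (c == '\'' || c == '"') {
                quote = c;                      // quote outside a comment opens a string
            }
        }
        return -1;
    }
}
```

With skipComments set to false the sketch reproduces the failure: for a body starting with "// don't do this", the apostrophe is taken as the start of a string and no end tag is found.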