I ran into the same issue. It turns out that '<' characters in script
character data are illegal (but very common). There is a global flag,
org.htmlparser.scanners.ScriptScanner.STRICT, which defaults to true. Set
it to false and the parser will accept more of the common illegal
JavaScript, though it still has problems with combinations of quotes,
comments, and '<' characters. If you run into those, you'll need to
override the Lexer yourself and modify its parseCDATA(boolean) method.
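For anyone wondering what the lenient and "quote-smart" scanning amounts to, here is a standalone Java sketch of the idea. It does not use or reproduce HTMLParser's Lexer. The class ScriptScan and the method scriptEnd are hypothetical names made up for illustration. Inside a script, a bare '<' is treated as plain character data, and only "</script" ends the element. With quoteSmart set, a "</script" inside a JavaScript string or comment is skipped as well. My assumption is that this is roughly the distinction the boolean argument to parseCDATA controls, so check the Lexer source before relying on it. Switching the library itself to lenient mode should just be the static assignment ScriptScanner.STRICT = false before you parse.

```java
/**
 * Hypothetical sketch, NOT HTMLParser's Lexer: locate the end of a
 * <script> element's character data.
 */
public class ScriptScan {

    /**
     * Returns the index of the "</script" that closes the script body,
     * scanning from 'from' (just past the opening tag), or -1 if none.
     * A bare '<' is always ordinary data. If quoteSmart is true,
     * "</script" inside a JavaScript string or comment is ignored.
     */
    static int scriptEnd(String s, int from, boolean quoteSmart) {
        int i = from;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (quoteSmart && (c == '"' || c == '\'')) {
                i++; // skip the opening quote
                while (i < s.length() && s.charAt(i) != c) {
                    if (s.charAt(i) == '\\') {
                        i++; // skip the escaped character as well
                    }
                    i++;
                }
                i++; // skip the closing quote
            } else if (quoteSmart && s.startsWith("//", i)) {
                int nl = s.indexOf('\n', i);
                i = (nl < 0) ? s.length() : nl + 1; // skip a line comment
            } else if (quoteSmart && s.startsWith("/*", i)) {
                int close = s.indexOf("*/", i + 2);
                i = (close < 0) ? s.length() : close + 2; // skip a block comment
            } else if (s.regionMatches(true, i, "</script", 0, 8)) {
                return i; // case-insensitive end tag found
            } else {
                i++; // '<' and everything else is plain character data
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<script>for (g=0; g <recursedNodes.length; g++) {"
                + " document.write(\"<img src='line.gif' />\"); }</script><p>after</p>";
        int start = html.indexOf('>') + 1; // just past "<script>"
        int end = scriptEnd(html, start, true);
        // The '<' before recursedNodes and the <img> string stay in the body.
        System.out.println(html.substring(start, end));
    }
}
```

Note that browsers do not honour quotes here: a literal "</script>" inside a JavaScript string still ends the script in a browser. Quote-smart scanning is a convenience for sloppy pages rather than standard behaviour. The sketch also makes no attempt to recognise regular-expression literals, so a quote inside /.../ would confuse it.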
Good luck.
Dave
On 5/24/07, Pandian Annamalai <pan...@ya...> wrote:
>
> Hi,
>
> I have used HTMLParser on HTML files before and it worked fine.
>
> But when I use it to parse JavaScript that has embedded HTML, like the
> example below, the parser adds a closing '>' for each '<' it takes to be
> the start of a tag.
>
> For example, I asked the parser to rewrite the img tag's source URL:
>
> Input:
> ------
>
> for (g=0; g <recursedNodes.length; g++) {
>   if (recursedNodes[g] == 1) document.write("<img src=\"images/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
>   else document.write("<img src=\"images/en_US/empty.gif\" align=\"absbottom\" alt=\"\" />");
> }
>
>
> Output:
> ------
>
> for (g=0; g <recursedNodes.length; g++) {
>   if (recursedNodes[g] == 1) document.write("><img src=\"\root\mages/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
>   else document.write("<img src=\"\root\mages/en_US/line.gif\" align=\"absbottom\" alt=\"\" />");
> }
>
> Everything looks fine except the extra '>' before <img. This is because
> "<recursedNodes" in the for loop is treated as an HTML tag, and the parser
> adds '>' to close it.
>
> Any help on how to make the parser ignore this?
>
> Regards,
> Pandian
>
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user