[Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners ScriptScanner.java,1.62,1.63
From: Derrick O. <der...@us...> - 2005-03-12 17:53:20
Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv25217/scanners

Modified Files:
    ScriptScanner.java
Log Message:
Add STRICT flag to ScriptScanner to revert to legacy handling of broken ETAGO (</).
If STRICT is true, scan according to HTML specification,
else if false, scan with quote smart state machine which heuristically yields the correct parse.

Index: ScriptScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ScriptScanner.java,v
retrieving revision 1.62
retrieving revision 1.63
diff -C2 -d -r1.62 -r1.63
*** ScriptScanner.java  7 Mar 2005 02:18:46 -0000  1.62
--- ScriptScanner.java  12 Mar 2005 17:53:10 -0000  1.63
***************
*** 52,55 ****
--- 52,80 ----
  {
      /**
+      * Strict parsing of CDATA flag.
+      * If this flag is set true, the parsing of script is performed without
+      * regard to quotes. This means that erroneous script such as:
+      * <pre>
+      * document.write("</script>");
+      * </pre>
+      * will be parsed in strict accordance with appendix
+      * <a href="http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data">
+      * B.3.2 Specifying non-HTML data</a> of the
+      * <a href="http://www.w3.org/TR/html4/">HTML 4.01 Specification</a> and
+      * hence will be split into two or more nodes. Correct javascript would
+      * escape the ETAGO:
+      * <pre>
+      * document.write("<\/script>");
+      * </pre>
+      * If true, CDATA parsing will stop at the first ETAGO ("</") no matter
+      * whether it is quoted or not. If false, balanced quotes (either single or
+      * double) will shield an ETAGO. Beacuse of the possibility of quotes within
+      * single or multiline comments, these are also parsed. In most cases,
+      * users prefer non-strict handling since there is so much broken script
+      * out in the wild.
+      */
+     public static boolean STRICT = false;
+ 
+     /**
       * Create a script scanner.
       */
***************
*** 87,91 ****
              }
          }
!         content = lexer.parseCDATA ();
          position = lexer.getPosition ();
          node = lexer.nextNode (false);
--- 112,116 ----
              }
          }
!         content = lexer.parseCDATA (!STRICT);
          position = lexer.getPosition ();
          node = lexer.nextNode (false);
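
For readers who want to see the difference between the two modes, here is a minimal, self-contained sketch. It is not the code from Lexer.parseCDATA; the class name CdataEndSketch, the method findEnd and its state names are hypothetical and exist only for illustration. It assumes that strict mode ends the CDATA at the first "</", that non-strict mode ignores a "</" inside single or double quotes, and that comments are tracked only so that a quote character inside a comment does not open a string. Resetting at a newline inside an unterminated string is a further assumption of this sketch.

```java
public class CdataEndSketch {
    // Hypothetical scanner states; the real lexer may be organized differently.
    private static final int CODE = 0;
    private static final int SINGLE = 1;
    private static final int DOUBLE = 2;
    private static final int LINE_COMMENT = 3;
    private static final int BLOCK_COMMENT = 4;

    /**
     * Returns the index of the "</" that ends the CDATA, or -1 if none is found.
     * strict == true: the first "</" wins, quoted or not.
     * strict == false: a "</" inside balanced quotes is skipped.
     */
    public static int findEnd(String text, boolean strict) {
        if (strict) {
            return text.indexOf("</");
        }
        int state = CODE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char next = (i + 1 < text.length()) ? text.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '<' && next == '/') return i;
                    if (c == '\'') state = SINGLE;
                    else if (c == '"') state = DOUBLE;
                    else if (c == '/' && next == '/') { state = LINE_COMMENT; i++; }
                    else if (c == '/' && next == '*') { state = BLOCK_COMMENT; i++; }
                    break;
                case SINGLE:
                case DOUBLE:
                    if (c == '\\') i++;                      // skip the escaped character
                    else if (c == (state == SINGLE ? '\'' : '"')) state = CODE;
                    else if (c == '\n') state = CODE;        // assumption: unterminated string ends at newline
                    break;
                case LINE_COMMENT:
                    if (c == '<' && next == '/') return i;   // comments do not shield an ETAGO here
                    if (c == '\n') state = CODE;
                    break;
                case BLOCK_COMMENT:
                    if (c == '<' && next == '/') return i;   // comments do not shield an ETAGO here
                    if (c == '*' && next == '/') { state = CODE; i++; }
                    break;
                default:
                    break;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String broken = "document.write(\"</script>\");</script>";
        System.out.println(findEnd(broken, true));   // ends inside the string literal
        System.out.println(findEnd(broken, false));  // ends at the real end tag
    }
}
```

In the library itself the switch is the public static field added by this commit, so legacy behaviour would be selected with ScriptScanner.STRICT = true before parsing.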