Thread: [Htmlparser-developer] RE: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScann
From: Marc N. <ma...@ke...> - 2003-05-27 18:23:03
Derrick,

I was relying on some of the old behavior of ScriptScanner, mostly the fact that its contents were not parsed as HTML. I'm still seeing cases where tags inside of <script> are recognised as "HTML" and modified (e.g. turned into uppercase, auto-closed, etc), for example when there is an HTML tag in a Javascript comment. Also, using "\" to concatenate lines (which is valid in Javascript) is totally messed up now when I try to get the script code using "toHtml()".

However, I think your change was valid and fixes the bug as requested. What I think I'm going to do, though, is make a new scanner class that does what the old ScriptScanner did. That is, do a bare-bones "leave everything inside that tag as-is" parse of the HTML, searching only for the end tag with no knowledge of quotes or anything. I think there are cases where Javascript is written such that any modification at all will break it.

I'll send a note to the list when this class is done (today sometime). I'll call it StrictScriptScanner or something.

Marc

-----Original Message-----
From: der...@us... [mailto:der...@us...]
Sent: Saturday, May 24, 2003 2:05 PM
To: htm...@li...
Subject: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners CompositeTagScanner.java,1.52,1.53 ScriptScanner.java,1.21,1.22

Update of /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners
In directory sc8-pr-cvs1:/tmp/cvs-serv7741/org/htmlparser/scanners

Modified Files:
    CompositeTagScanner.java ScriptScanner.java
Log Message:
Fixed bug #741769 ScriptScanner doesn't handle quoted </script> tags
Major overhaul of ScriptScanner.
It now uses the scan() method of CompositeTagScanner (i.e. doesn't override).
CompositeTagScanner now has a balance_quotes member field that dictates
whether strings tags are scanned honouring single and double quotes.
This affected the call chain through NodeReader and StringScanner which
now have this parameter.
StringScanner now correctly handles quotes if asked.
The ignoreState stuff is removed,
it didn't work anyway since a single StringScanner is used recursively by the NodeReader,
and the member field would have been tromped.
Sorry to all those who have broken code because of this, but it's for the better. Really.

Index: CompositeTagScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/CompositeTagScanner.java,v
retrieving revision 1.52
retrieving revision 1.53
diff -C2 -d -r1.52 -r1.53
*** CompositeTagScanner.java 19 May 2003 02:49:57 -0000 1.52
--- CompositeTagScanner.java 24 May 2003 21:04:44 -0000 1.53
***************
*** 97,100 ****
--- 97,101 ----
      private Set tagEnderSet;
      private Set endTagEnderSet;
+     private boolean balance_quotes;

      public CompositeTagScanner(String [] nameOfTagToMatch) {
***************
*** 125,129 ****
          this(filter,nameOfTagToMatch,tagEnders,new String[] {}, allowSelfChildren);
      }
!
      public CompositeTagScanner(
          String filter,
--- 126,130 ----
          this(filter,nameOfTagToMatch,tagEnders,new String[] {}, allowSelfChildren);
      }
!
      public CompositeTagScanner(
          String filter,
***************
*** 131,138 ****
          String [] tagEnders,
          String [] endTagEnders,
!         boolean allowSelfChildren) {
          super(filter);
          this.nameOfTagToMatch = nameOfTagToMatch;
          this.allowSelfChildren = allowSelfChildren;
          this.tagEnderSet = new HashSet();
          for (int i=0;i<tagEnders.length;i++)
--- 132,172 ----
          String [] tagEnders,
          String [] endTagEnders,
!         boolean allowSelfChildren)
!     {
!         this(filter,nameOfTagToMatch,tagEnders,endTagEnders, allowSelfChildren, false);
!     }
!
!     /**
!      * Constructor specifying all member fields.
!      * @param filter A string that is used to match which tags are to be allowed
!      * to pass through. This can be useful when one wishes to dynamically filter
!      * out all tags except one type which may be programmed later than the parser.
!      * @param nameOfTagToMatch The tag names recognized by this scanner.
!      * @param tagEnders The non-endtag tag names which signal that no closing
!      * end tag was found. For example, encountering <FORM> while
!      * scanning a <A> link tag would mean that no </A> was found
!      * and needs to be corrected.
!      * @param endTagEnders The endtag names which signal that no closing end
!      * tag was found. For example, encountering </HTML> while
!      * scanning a <BODY> tag would mean that no </BODY> was found
!      * and needs to be corrected. These items are not prefixed by a '/'.
!      * @param allowSelfChildren If <code>true</code> a tag of the same name is
!      * allowed within this tag. Used to determine when an endtag is missing.
!      * @param balance_quotes <code>true</code> if scanning string nodes needs to
!      * honour quotes. For example, ScriptScanner defines this <code>true</code>
!      * so that text within <SCRIPT></SCRIPT> ignores tag-like text
!      * within quotes.
!      */
!     public CompositeTagScanner(
!         String filter,
!         String [] nameOfTagToMatch,
!         String [] tagEnders,
!         String [] endTagEnders,
!         boolean allowSelfChildren,
!         boolean balance_quotes) {
          super(filter);
          this.nameOfTagToMatch = nameOfTagToMatch;
          this.allowSelfChildren = allowSelfChildren;
+         this.balance_quotes = balance_quotes;
          this.tagEnderSet = new HashSet();
          for (int i=0;i<tagEnders.length;i++)
***************
*** 145,149 ****
      public Tag scan(Tag tag, String url, NodeReader reader,String currLine) throws ParserException {
          CompositeTagScannerHelper helper =
!             new CompositeTagScannerHelper(this,tag,url,reader,currLine);
          return helper.scan();
      }
--- 179,183 ----
      public Tag scan(Tag tag, String url, NodeReader reader,String currLine) throws ParserException {
          CompositeTagScannerHelper helper =
!             new CompositeTagScannerHelper(this,tag,url,reader,currLine,balance_quotes);
          return helper.scan();
      }
***************
*** 193,196 ****
          return false;
      }
-
  }
--- 227,229 ----

Index: ScriptScanner.java
===================================================================
RCS file: /cvsroot/htmlparser/htmlparser/src/org/htmlparser/scanners/ScriptScanner.java,v
retrieving revision 1.21
retrieving revision 1.22
diff -C2 -d -r1.21 -r1.22
*** ScriptScanner.java 19 May 2003 02:49:57 -0000 1.21
--- ScriptScanner.java 24 May 2003 21:04:44 -0000 1.22
***************
*** 28,64 ****

  package org.htmlparser.scanners;
! /////////////////////////
! // HTML Parser Imports //
! /////////////////////////
! import org.htmlparser.Node;
! import org.htmlparser.NodeReader;
! import org.htmlparser.StringNode;
! import org.htmlparser.tags.EndTag;
  import org.htmlparser.tags.ScriptTag;
  import org.htmlparser.tags.Tag;
  import org.htmlparser.tags.data.CompositeTagData;
  import org.htmlparser.tags.data.TagData;
! import org.htmlparser.util.NodeList;
! import org.htmlparser.util.ParserException;
  /**
   * The HTMLScriptScanner identifies javascript code
   */
-
  public class ScriptScanner extends CompositeTagScanner {
-     private static final String SCRIPT_END_TAG = "</SCRIPT>";
      private static final String MATCH_NAME [] = {"SCRIPT"};
      private static final String ENDERS [] = {"BODY", "HTML"};
      public ScriptScanner() {
!         super("",MATCH_NAME,ENDERS);
      }

      public ScriptScanner(String filter) {
!         super(filter,MATCH_NAME,ENDERS);
      }

!     public ScriptScanner(String filter, String[] nameOfTagToMatch) {
!         super(filter,nameOfTagToMatch,ENDERS);
      }
!
      public String [] getID() {
          return MATCH_NAME;
--- 28,59 ----

  package org.htmlparser.scanners;
!
  import org.htmlparser.tags.ScriptTag;
  import org.htmlparser.tags.Tag;
  import org.htmlparser.tags.data.CompositeTagData;
  import org.htmlparser.tags.data.TagData;
!
  /**
   * The HTMLScriptScanner identifies javascript code
   */
  public class ScriptScanner extends CompositeTagScanner {
      private static final String MATCH_NAME [] = {"SCRIPT"};
      private static final String ENDERS [] = {"BODY", "HTML"};
      public ScriptScanner() {
!         this("");
      }

      public ScriptScanner(String filter) {
!         this(filter,MATCH_NAME,ENDERS);
      }

!     public ScriptScanner(String filter, String[] nameOfTagToMatch, String[] enders) {
!         this(filter,nameOfTagToMatch,enders, new String[0], true, true);
      }
!
!     public ScriptScanner(String filter, String[] nameOfTagToMatch, String[] enders, String[] endtagenders, boolean allowSelfChildren, boolean balance_quotes) {
!         super(filter,nameOfTagToMatch,enders, new String[0], allowSelfChildren, balance_quotes);
!     }
!
      public String [] getID() {
          return MATCH_NAME;
***************
*** 70,205 ****
          return new ScriptTag(tagData,compositeTagData);
      }
-
-     public Tag scan(Tag tag, String url, NodeReader reader, String currLine)
-         throws ParserException {
-         try {
-             int startLine = reader.getLastLineNumber();
-             String line = null;
-             StringBuffer scriptContents =
-                 new StringBuffer();
-             boolean endTagFound = false;
-             Tag startTag = tag;
-             Tag endTag = null;
-             line = currLine;
-             boolean sameLine = true;
-             int startingPos = startTag.elementEnd();
-             do {
-                 int endTagLoc = line.toUpperCase().indexOf(getEndTag(),startingPos);
-                 while (endTagLoc>0 && isScriptEmbeddedInDocumentWrite(line, endTagLoc)) {
-                     startingPos = endTagLoc+getEndTag().length();
-                     endTagLoc = line.toUpperCase().indexOf(getEndTag(), startingPos);
-                 }
-
-                 if (endTagLoc!=-1) {
-                     endTagFound = true;
-                     endTag = (EndTag)EndTag.find(line,endTagLoc);
-                     if (sameLine)
-                         scriptContents.append(
-                             getCodeBetweenStartAndEndTags(
-                                 line,
-                                 startTag,
-                                 endTagLoc)
-                         );
-                     else {
-                         scriptContents.append(Node.getLineSeparator());
-                         scriptContents.append(line.substring(0,endTagLoc));
-                     }
-
-                     reader.setPosInLine(endTag.elementEnd());
-                 } else {
-                     if (sameLine)
-                         scriptContents.append(
-                             line.substring(
-                                 startTag.elementEnd()+1
-                             )
-                         );
-                     else {
-                         scriptContents.append(Node.getLineSeparator());
-                         scriptContents.append(line);
-                     }
-                 }
-                 if (!endTagFound) {
-                     line = reader.getNextLine();
-                     startingPos = 0;
-                 }
-                 if (sameLine)
-                     sameLine = false;
-             }
-             while (line!=null && !endTagFound);
-             if (endTag == null) {
-                 // If end tag doesn't exist, create one
-                 String endTagName = tag.getTagName();
-                 int endTagBegin = reader.getLastReadPosition()+1 ;
-                 int endTagEnd = endTagBegin + endTagName.length() + 2;
-                 endTag = new EndTag(
-                     new TagData(
-                         endTagBegin,
-                         endTagEnd,
-                         endTagName,
-                         currLine
-                     )
-                 );
-             }
-             NodeList childrenNodeList = new NodeList();
-             childrenNodeList.add(
-                 new StringNode(
-                     scriptContents,
-                     startTag.elementEnd(),
-                     endTag.elementBegin()-1
-                 )
-             );
-             return createTag(
-                 new TagData(
-                     startTag.elementBegin(),
-                     endTag.elementEnd(),
-                     startLine,
-                     reader.getLastLineNumber(),
-                     startTag.getText(),
-                     currLine,
-                     url,
-                     false
-                 ), new CompositeTagData(
-                     startTag,endTag,childrenNodeList
-                 )
-             );
-
-         }
-         catch (Exception e) {
-             throw new ParserException("Error in ScriptScanner: ",e);
-         }
-     }
-
-     public String getCodeBetweenStartAndEndTags(
-         String line,
-         Tag startTag,
-         int endTagLoc) throws ParserException {
-         try {
-
-             return line.substring(
-                 startTag.elementEnd()+1,
-                 endTagLoc
-             );
-         }
-         catch (Exception e) {
-             StringBuffer msg = new StringBuffer("Error in getCodeBetweenStartAndEndTags():\n");
-             msg.append("substring starts at: "+(startTag.elementEnd()+1)).append("\n");
-             msg.append("substring ends at: "+(endTagLoc));
-             throw new ParserException(msg.toString(),e);
-         }
-     }
-
-     /**
-      * Gets the end tag that the scanner uses to stop scanning. Subclasses of
-      * <code>ScriptScanner</code> you should override this method.
-      * @return String containing the end tag to search for, i.e. </SCRIPT>
-      */
-     public String getEndTag() {
-         return SCRIPT_END_TAG;
-     }
-
-     private boolean isScriptEmbeddedInDocumentWrite(String line, int endTagLoc) {
-         if (endTagLoc+getEndTag().length() > line.length()-1) return false;
-         return line.charAt(endTagLoc+getEndTag().length())=='"';
-     }
-
  }
--- 65,67 ----

-------------------------------------------------------
This SF.net email is sponsored by: ObjectStore.
If flattening out C++ or Java code to make your application fit in a
relational database is painful, don't do it! Check out ObjectStore.
Now part of Progress Software. http://www.objectstore.net/sourceforge
_______________________________________________
Htmlparser-cvs mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-cvs
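The bare-bones "leave everything inside that tag as-is" scan described in the message above could look roughly like the following. This is only a sketch under stated assumptions: RawScriptExtractor, indexOfIgnoreCase and extract are hypothetical names that do not exist in htmlparser, and the code works on a whole string rather than line by line through a NodeReader. It looks only for the end tag and knows nothing about quotes, comments or line continuations, so the script text comes back exactly as written.

```java
// Hypothetical helper, NOT part of htmlparser: returns the script body verbatim.
final class RawScriptExtractor {
    private RawScriptExtractor() {}

    /** Case-insensitive indexOf; compares in place so indices always refer to the original text. */
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the text between the first script start tag and the next
     * script end tag, unmodified. Returns null if there is no start tag;
     * if the end tag is missing, everything up to the end of input is returned.
     */
    static String extract(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int startTagEnd = html.indexOf('>', open);
        if (startTagEnd < 0) {
            return null;
        }
        int bodyStart = startTagEnd + 1;
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        if (close < 0) {
            close = html.length();
        }
        return html.substring(bodyStart, close);
    }

    public static void main(String[] args) {
        // A backslash line continuation and a tag inside a comment both survive untouched.
        String html = "<SCRIPT>var s = \"a\" + \\\n  \"b\"; // <br> stays as typed\n</SCRIPT>";
        System.out.print(extract(html));
    }
}
```

The trade-off is the one bug #741769 was about: because the search ignores quotes, an end tag inside a string literal terminates the script early.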
From: Derrick O. <Der...@ro...> - 2003-05-27 21:46:44
Marc,

The text within <SCRIPT></SCRIPT> is supposed to be parsed as pure text or remarks. I guess the text scanner goes until it sees a <x... and then stops to defer to a tag scanner. I hadn't thought about tags in comments, or about the \ at the end of lines.

Perhaps, rather than write a new scanner, fix the StringScanner (the remark scanner should be OK), so that it does the correct behaviour when balance_quotes is true. Then the 'balance_quotes' flag could be called 'strict_script' or something.

Derrick

Marc Novakowski wrote:

>Derrick,
>
>I was relying on some of the old behavior of ScriptScanner, mostly the fact that its contents were not parsed as HTML. I'm still seeing cases where tags inside of <script> are recognised as "HTML" and modified (e.g. turned into uppercase, auto-closed, etc), for example when there is an HTML tag in a Javascript comment. Also, using "\" to concatenate lines (which is valid in Javascript) is totally messed up now when I try to get the script code using "toHtml()".
>
>However, I think your change was valid and fixes the bug as requested. What I think I'm going to do, though, is make a new scanner class that does what the old ScriptScanner did. That is, do a bare-bones "leave everything inside that tag as-is" parse of the HTML, searching only for the end tag with no knowledge of quotes or anything. I think there are cases where Javascript is written such that any modification at all will break it.
>
>I'll send a note to the list when this class is done (today sometime). I'll call it StrictScriptScanner or something.
>
>Marc
>
>-----Original Message-----
>From: der...@us...
>[mailto:der...@us...]
>Sent: Saturday, May 24, 2003 2:05 PM
>To: htm...@li...
>Subject: [Htmlparser-cvs] htmlparser/src/org/htmlparser/scanners
>CompositeTagScanner.java,1.52,1.53 ScriptScanner.java,1.21,1.22
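For illustration, the kind of scanning Derrick suggests for the balance_quotes case might be sketched as follows: walk the script text, skip string literals (including a "\" at the end of a line) and Javascript comments, and only accept an end tag found outside them. Everything here is an assumption made for the example. ScriptEndFinder and findEndTag are hypothetical names, not htmlparser API, and the real StringScanner works through NodeReader rather than on a single string. Regular-expression literals are not handled, and browsers themselves stop at the first end tag regardless of quoting.

```java
// Hypothetical helper, NOT part of htmlparser: finds the script end tag
// while ignoring tag-like text inside string literals and comments.
final class ScriptEndFinder {
    private static final String END_TAG = "</script";

    private ScriptEndFinder() {}

    /**
     * Returns the index of the first end tag at or after 'from' that is not
     * inside a quoted string, a // comment or a block comment; -1 if none.
     */
    static int findEndTag(String text, int from) {
        int n = text.length();
        int i = Math.max(from, 0);
        while (i < n) {
            char c = text.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipString(text, i);
            } else if (text.startsWith("//", i)) {
                // Line comment: an apostrophe or tag in here must not count.
                int eol = text.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (text.startsWith("/*", i)) {
                int end = text.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (text.regionMatches(true, i, END_TAG, 0, END_TAG.length())) {
                return i;
            } else {
                i++;
            }
        }
        return -1;
    }

    /** Returns the index just past the string literal that starts at 'start'. */
    private static int skipString(String text, int start) {
        char quote = text.charAt(start);
        int n = text.length();
        int i = start + 1;
        while (i < n) {
            char c = text.charAt(i);
            if (c == '\\') {
                // Skip the escaped character; "\" followed by CRLF is one continuation.
                boolean crlf = i + 2 < n && text.charAt(i + 1) == '\r' && text.charAt(i + 2) == '\n';
                i += crlf ? 3 : 2;
            } else if (c == quote) {
                return i + 1;
            } else if (c == '\n') {
                return i + 1; // unterminated literal: give up at the end of the line
            } else {
                i++;
            }
        }
        return n;
    }
}
```

A caller would take text.substring(from, ScriptEndFinder.findEndTag(text, from)) as the script body and store it as a single unmodified string node, which addresses both the tag-in-comment case and the "\" line continuation raised earlier in the thread.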