I gave the extractor a link and ran it. It was fine at first, but soon it printed the following:
org.htmlparser.util.ParserException: Unexpected Exception occurred while reading http://news.163.com, in nextHTMLNode
at Line 340 : </script>
Previous Line 339 : document.bbslogin.username.value=user;;
org.htmlparser.util.ParserException: HTMLReader.readElement() : Error occurred while trying to read the next element,
at Line 340 : </script>
Previous Line 339 : document.bbslogin.username.value=user;;
org.htmlparser.util.ParserException: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners
Tag being processed : SCRIPT
Current Tag Line : <script>
at Line 340 : </script>
Previous Line 339 : document.bbslogin.username.value=user;;
...
org.htmlparser.util.ParserException: HTMLTag.scan() : Error while scanning tag, tag contents = script, tagLine = <script>;
org.htmlparser.util.ParserException: Exception occurred in CompositeTagScanner.scan(),
current tag being processed is : <SCRIPT>;
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.substring(String.java:1477)
at org.htmlparser.parserHelper.AttributeParser.parseAttributes(AttributeParser.java:113)
at org.htmlparser.tags.Tag.parseAttributes(Tag.java:169)
at org.htmlparser.tags.Tag.getAttributes(Tag.java:206)
at org.htmlparser.tags.Tag.getTagName(Tag.java:212)
at org.htmlparser.tags.Tag.toHtml(Tag.java:358)
at org.htmlparser.tags.CompositeTag.getChildrenHTML(CompositeTag.java:232)
at org.htmlparser.tags.ScriptTag.<init>(ScriptTag.java:55)
at org.htmlparser.scanners.ScriptScanner.createTag(ScriptScanner.java:59)
at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:135)
at org.htmlparser.scanners.TagScanner.createScannedNode(TagScanner.java:194)
at org.htmlparser.tags.Tag.scan(Tag.java:276)
at org.htmlparser.NodeReader.readElement(NodeReader.java:193)
at org.htmlparser.util.IteratorImpl.peek(IteratorImpl.java:60)
at org.htmlparser.util.IteratorImpl.hasMoreNodes(IteratorImpl.java:91)
at com.borland.samples.welcome.LinkExtractor.extractLinks(LinkExtractor.java:33)
at com.borland.samples.welcome.LinkExtractor.main(LinkExtractor.java:49)
Which version of the parser were you using? Also, LinkExtractor is a little out of date. Try the sample programs shown in the docs; they ought to work fine. If not, feel free to file a bug report.
Regards,
Somik
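For anyone reading the trace above: the innermost frame is a `StringIndexOutOfBoundsException` thrown by `String.substring`, called from `AttributeParser.parseAttributes`. I have not looked at that source, so the following is only a minimal sketch of that class of failure (asking for more characters than the string holds) and of a bounds guard that avoids it. The class `SubstringGuard` and the helpers `safePrefix` and `unguardedThrows` are hypothetical names made up for this illustration; they are not part of htmlparser.

```java
public class SubstringGuard {

    // Hypothetical helper: returns the first n characters of s,
    // or all of s when it is shorter than n (never throws for non-null s).
    static String safePrefix(String s, int n) {
        int end = Math.min(Math.max(n, 0), s.length());
        return s.substring(0, end);
    }

    // Reports whether the unguarded substring(0, n) call throws for this input.
    static boolean unguardedThrows(String s, int n) {
        try {
            s.substring(0, n);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An empty string has no first character, so the unguarded call fails.
        System.out.println(unguardedThrows("", 1));
        // The guarded version returns an empty prefix instead.
        System.out.println("[" + safePrefix("", 1) + "]");
        // Normal input behaves the same either way.
        System.out.println(safePrefix("script", 1));
    }
}
```

Whether the parser actually hits an empty string here is an assumption on my part; the version question above still needs answering before anyone can say where the real bug is.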