ParserException: null

Help
Anonymous
2004-03-01
2004-03-02
  • Anonymous

    Anonymous - 2004-03-01

    Hi,
    I am getting the following error when I try to extract links from a web site. Any help, please? Many thanks,
    Shantha

    D:\htmlparser1_4_2>java Robot http://www.keele.ac.uk/depts/cs/dake/vldb2000/panel2020/DeenVLDB2/index.htm
    Crawlin Site http://www.keele.ac.uk/depts/cs/dake/vldb2000/panel2020/DeenVLDB2/index.htm  1
    Exception in thread "main" org.htmlparser.util.ParserException: null;
    sun.io.MalformedInputException
            at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:152)
            at java.io.InputStreamReader.convertInto(InputStreamReader.java:137)
            at java.io.InputStreamReader.fill(InputStreamReader.java:186)
            at java.io.InputStreamReader.read(InputStreamReader.java:249)
            at org.htmlparser.lexer.Source.fill(Source.java:239)
            at org.htmlparser.lexer.Source.read(Source.java:322)
            at org.htmlparser.lexer.Source.read(Source.java:347)
            at org.htmlparser.lexer.Page.setEncoding(Page.java:698)
            at org.htmlparser.tags.MetaTag.doSemanticAction(MetaTag.java:115)
            at org.htmlparser.scanners.TagScanner.scan(TagScanner.java:69)
            at org.htmlparser.scanners.CompositeTagScanner.scan(CompositeTagScanner.java:162)
            at org.htmlparser.util.IteratorImpl.nextNode(IteratorImpl.java:92)
            at Robot.crawl(Robot.java:200)
            at Robot.main(Robot.java:106)

     
    • Derrick Oswald

      Derrick Oswald - 2004-03-02

      From the stack trace, there is a problem trying to interpret the page as UTF-8. The META tag in the HEAD:
          <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      causes the parser to re-read the characters consumed so far (which isn't very many) as UTF-8. This happens in the doSemanticAction() method of the META tag, which calls Page.setEncoding(), and that is where the conversion fails.

      By the way, I don't get this error, so it may be something in your environment, perhaps a language setting or something similar. Alternatively, it could be a bug in your JVM, since the byte stream looks pretty normal.
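
One way to narrow this down is to check, outside the parser, whether the raw bytes of the page really are valid UTF-8. The following is only a minimal sketch using the JDK's own java.nio.charset classes (it needs Java 7 or later for StandardCharsets and does not touch htmlparser at all). The class and method names are hypothetical, and the fallback to ISO-8859-1 is an assumption for illustration, not something the library does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of htmlparser.
public class StrictDecode {

    // Decode as UTF-8 and fail loudly on any malformed byte sequence,
    // instead of silently substituting replacement characters.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Assumed fallback policy: if the bytes are not valid UTF-8,
    // interpret them as ISO-8859-1, which accepts every byte value.
    static String decodeWithFallback(byte[] bytes) {
        try {
            return decodeStrictUtf8(bytes);
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // 0xE9 is "e acute" in ISO-8859-1 but an incomplete sequence in UTF-8.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodeWithFallback(latin1));
    }
}
```

If decodeStrictUtf8 throws on the bytes fetched from the page, the content does not match the charset declared in the META tag; if it succeeds, the problem is more likely in the environment or JVM, as suggested above.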

       
