Re: [Htmlparser-user] Harvester
From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-02-23 14:47:17
Hi, sorry to bother you.

I know that the input tags are inside the HTMLFormTag. However, when I try to parse this page with HTMLFormScanner:

http://developer.java.sun.com/developer/Quizzes/jbasics1-1/

it throws an error and the process terminates. Below is my test code (just to see whether an HTMLFormTag exists in the page):

    public String extractStrings() throws HTMLParserException {
        HTMLParser parser = new HTMLParser(resource);
        parser.addScanner(new HTMLFormScanner(""));
        HTMLNode node;
        String check;
        StringBuffer results = new StringBuffer();
        for (HTMLEnumeration e = parser.elements(); e.hasMoreNodes();) {
            node = e.nextHTMLNode();
            if (node instanceof HTMLFormTag) { // check the existence of HTMLFormTag
                System.out.print(node.toString());
            }
            // collect the plain text of every node
            check = node.toPlainTextString();
            results.append(check);
        }
        return results.toString();
    }

It compiles, but at runtime the following error is printed to the console:

ERROR: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">

ERROR: HTMLReader.readElement() : Error occurred while trying to read the next element,
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">

ERROR: Unexpected Exception occurred while reading http://developer.java.sun.com/developer/Quizzes/jbasics1-1/, in nextHTMLNode
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">

org.htmlparser.util.HTMLParserException: Unexpected Exception occurred while reading http://developer.java.sun.com/developer/Quizzes/jbasics1-1/, in nextHTMLNode
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">;
org.htmlparser.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to read the next element,
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">;
org.htmlparser.util.HTMLParserException: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners
at Line 72 : <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">
Previous Line 71 : <td><table border="0" cellspacing="0" cellpadding="0" width="100%" height="109">;
org.htmlparser.util.HTMLParserException: HTMLTag.scan() : Error while scanning tag, tag contents = form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/", tagLine = <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">;
org.htmlparser.util.HTMLParserException: HTMLFormScanner.scan() : Error while scanning the form tag, current line = <form method="get" action="http://servlet.java.sun.com/logRedirect/frontpage-head/http://search.java.sun.com/search/java/">;
java.lang.NullPointerException
    at org.htmlparser.HTMLParser.addScanner(HTMLParser.java:863)
    at org.htmlparser.scanners.HTMLFormScanner.scan(HTMLFormScanner.java:164)
    at org.htmlparser.scanners.HTMLTagScanner.createScannedNode(HTMLTagScanner.java:193)
    at org.htmlparser.tags.HTMLTag.scan(HTMLTag.java:266)
    at org.htmlparser.HTMLReader.readElement(HTMLReader.java:193)
    at org.htmlparser.util.HTMLEnumerationImpl.peek(HTMLEnumerationImpl.java:60)
    at org.htmlparser.util.HTMLEnumerationImpl.hasMoreNodes(HTMLEnumerationImpl.java:91)
    at StringExtractor.extractStrings(StringExtractor.java:27)
    at StringExtractor.main(StringExtractor.java:49)

There are two forms in the page: one is the site's search form, and the other, the one I'm interested in, is the form with the questions.

Please help me with this. Is this a bug?

Thank you.