[Htmlparser-user] Suggestion on Harvester
From: Mohd-Taqiyuddin Z. <mt...@ec...> - 2003-02-27 00:21:15
Hi there,

I think a form tag should end when the parser sees another form tag, even if no end tag has appeared yet. Another way of determining where a form tag ends is to check for the end of the HTML page, i.e. the end tag of the html tag.

The reason is that a form tag consists of input tags, and the important information about a form is its method, its action, and its input tags. So when the parser first sees a form tag, it should keep parsing nodes until it sees the end tag of that form tag, another form tag, or the end of the HTML document. That way we can logically group the Vector of input tags and the other attributes with the appropriate form tag (if there is more than one form tag).

I hope my explanation can help us improve htmlparser.

Thank you.

Quoting Somik Raha <so...@ya...>:

> This is a known limitation. The problem is in guessing
> when a form tag really should have ended. Can you
> suggest something looking at the page that failed ?
>
> Regards,
> Somik
> --- Mohd-Taqiyuddin Zalfan <mt...@ec...>
> wrote:
> > Hi,
> >
> > I'm doing my harvester to harvest information in the
> > form tag. It works fine
> > when I parse any html pages that I need to parse
> > except for this URL
> > http://developer.java.sun.com/developer/Quizzes/misc/earlyadopterjxta.html.
> > It seems that the page that gives the error does not
> > have an end tag for the
> > form tag and the parser loops back to find the end tag
> > for the form tag. Is this
> > a bug? Do you know a solution so that I can still parse
> > the page and still get
> > the Vector FormInput for further processing? Hope
> > you can help me on this.
> > Below is the generated error.
> > "
> > ERROR: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners
> > Tag being processed : FORM
> > Current Tag Line : <form action="earlyadopterjxtaanswers.jsp" method="POST">
> > at Line 690 : null
> > Previous Line 689 : </HTML>
> > ERROR: HTMLReader.readElement() : Error occurred while trying to read the next element,
> > at Line 690 : null
> > Previous Line 689 : </HTML>
> > ERROR: Unexpected Exception occurred while reading http://developer.java.sun.com/developer/Quizzes/misc/earlyadopterjxta.html, in nextHTMLNode
> > at Line 690 : null
> > Previous Line 689 : </HTML>
> > org.htmlparser.util.ParserException: Unexpected Exception occurred while reading http://developer.java.sun.com/developer/Quizzes/misc/earlyadopterjxta.html, in nextHTMLNode
> > at Line 690 : null
> > Previous Line 689 : </HTML>"
> >
> > _______________________________________________
> > Htmlparser-user mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-user
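
The rule suggested at the top of this message (a form ends at its own end tag, at the next form tag, or at the end of the HTML document) can be sketched roughly as below. This is only an illustration under stated assumptions: it does not use htmlparser's scanners or tag classes at all, the names FormGrouper, Form and group are hypothetical, and tags are picked out with a simple regular expression rather than a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the grouping rule; not part of htmlparser. */
class FormGrouper {

    /** One form: its raw opening tag plus the raw input tags found inside it. */
    static final class Form {
        final String openTag;
        final List<String> inputs = new ArrayList<>();

        Form(String openTag) {
            this.openTag = openTag;
        }
    }

    // Matches opening and closing form, input and html tags.
    // Group 1 is "/" for a closing tag, group 2 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("<(/?)(form|input|html)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Groups input tags under the form they belong to. */
    static List<Form> group(String html) {
        List<Form> forms = new ArrayList<>();
        Form current = null; // the form that is currently open, if any
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (name.equals("form") && !closing) {
                // A new form tag implicitly ends any form that is still open.
                current = new Form(m.group());
                forms.add(current);
            } else if (name.equals("form") || (name.equals("html") && closing)) {
                // </form> or </html> ends the open form.
                current = null;
            } else if (name.equals("input") && !closing && current != null) {
                current.inputs.add(m.group());
            }
        }
        // Reaching the end of the input also ends the open form; it is
        // already in the list, so an unterminated form is not lost.
        return forms;
    }
}
```

For a page with two form tags and no form end tag, FormGrouper.group(html) returns two Form objects, each holding only the input tags that follow its own opening tag, and input tags after the html end tag are ignored. A real fix inside the parser would have to hook into its scanner mechanism instead, which this sketch does not attempt.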