[Htmlparser-developer] Webase test and Form tag scanner?
Brought to you by: derrickoswald
From: Mr L. MA <law...@ya...> - 2003-03-06 17:31:08
One problem I had with the FormTag.toString() method: the form tag should be treated like the body tag, since any other tag can be nested inside it.

The ultimate HTMLParser test would be the WebBase collection from Stanford. What I did was download a website with an offline browser (such as WebStripper). Running StringExtractor on the local collection gives many ParserExceptions. Running JTidy over a page before applying HTMLParser sometimes helps and sometimes does not.

My focus is using HTMLParser for text extraction, so I keep running into "dirty" pages on which HTMLParser throws errors. Is there a way to still get the remaining nodes even when readelements=null?

Ling Ma

--- Somik Raha <so...@ya...> wrote:
> Thanks very much for the sample page. My to-do list
> for this week:
> [1] Refactor the correction logic in the link scanner
> into the composite scanner, so that it becomes
> available for all composite tags. That will solve the
> problem you mention.
>
> [2] Work on Dhaval's suggestion. I have some ideas
> about switching off test cases that require the
> internet.
>
> Regards,
> Somik
> ----- Original Message -----
> From: "Mr LING MA" <law...@ya...>
> To: <htm...@li...>
> Sent: Wednesday, March 05, 2003 10:34 PM
> Subject: [Htmlparser-developer] Form tag should not
> be composite tag?
>
> > Hi all:
> > Do you think the form tag should not be a composite
> > tag? Otherwise it cannot process pages like
> >
> > http://money.cnn.com/services/glossary/a.html
> >
> > which is missing one form end tag.
> >
> > Ling Ma
> >
> > _______________________________________________
> > Htmlparser-developer mailing list
> > Htm...@li...
> > https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
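Somik's plan in this thread is to move the end-tag correction logic from the link scanner into the composite scanner, so that a composite tag such as FORM with a missing end tag no longer breaks the parse. One common way to do that kind of correction is to keep a stack of open elements: when an end tag arrives, implicitly close everything opened after its matching start tag, ignore stray end tags, and leave tags that are still open at end of input in the tree. The Java sketch below illustrates that idea. It is a hypothetical, self-contained example (the class `LenientTreeBuilder`, its `Node` type and the regex tokenizer are invented here), not HTMLParser's actual API or scanner design, and it ignores comments, scripts and a `>` inside quoted attribute values. It needs Java 9 or later for `Set.of`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the HTMLParser API: a lenient tree builder that
// implicitly closes composite tags (such as FORM) whose end tag is missing.
public class LenientTreeBuilder {

    public static final class Node {
        public final String name;                    // lower-case tag name, "#text" or "#root"
        public final String text;                    // content of "#text" nodes, otherwise null
        public final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    // Tags that never have children, so they are never pushed on the open stack.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    // One token per match: an end tag, a start tag, or a run of text.
    private static final Pattern TOKEN =
        Pattern.compile("</\\s*([a-zA-Z0-9]+)[^>]*>|<([a-zA-Z0-9]+)[^>]*>|[^<]+");

    public static Node parse(String html) {
        Node root = new Node("#root", null);
        Deque<Node> open = new ArrayDeque<>();       // stack of currently open elements
        open.push(root);
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {                // end tag
                String name = m.group(1).toLowerCase();
                if (isOpen(open, name)) {
                    // Implicitly close anything opened after the matching start tag
                    // (e.g. an unclosed FORM inside BODY), then close the match itself.
                    while (!open.peek().name.equals(name)) open.pop();
                    open.pop();
                }                                    // stray end tags are ignored
            } else if (m.group(2) != null) {         // start tag
                String name = m.group(2).toLowerCase();
                Node el = new Node(name, null);
                open.peek().children.add(el);
                if (!VOID.contains(name) && !m.group().endsWith("/>")) open.push(el);
            } else {                                 // text between tags
                open.peek().children.add(new Node("#text", m.group()));
            }
        }
        // Elements still open at end of input stay in the tree; nothing is lost.
        return root;
    }

    private static boolean isOpen(Deque<Node> open, String name) {
        for (Node n : open) if (n.name.equals(name)) return true;
        return false;
    }

    // Collects non-blank text in document order, as a text extractor would.
    public static List<String> textNodes(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.text != null && !node.text.trim().isEmpty()) out.add(node.text.trim());
        for (Node child : node.children) collect(child, out);
    }

    public static void main(String[] args) {
        // The FORM end tag is missing; </body> closes it implicitly.
        Node root = parse("<body><form action=\"/search\"><input name=\"q\">Search</body><p>after the form</p>");
        System.out.println(String.join(" | ", textNodes(root)));   // Search | after the form
    }
}
```

In this sketch the text after an unclosed form is still reachable, which is what a text extractor needs. Where exactly an unclosed FORM should be closed (at the enclosing end tag, at the next FORM, or at end of input) is a design choice the thread leaves open.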