Re: [Htmlparser-developer] testStringBeanListener() consistently failing
From: Derrick O. <Der...@ro...> - 2003-02-05 23:07:06
Somik,

It's easily reproducible:

    java -jar ./release/htmlparser1_3/lib/htmlparser.jar http://www.amazon.com

yields:

    Begin Tag : br; begins at : 0; ends at : 3
    WARNING: HTMLTagParser : Encountered > inside inverted commas in line
    <td><a href="/exec/obidos/tg/browse/-/1055940/ref=gw_mafb_/"><img src="http://g-images.amazon.com/images/G/01/merchants/logos/marshall-fields-logo-20.gif" width=87 height=20 border=0 alt="Marshall Field's"></a></td>, location 205
    ^
    Automatically corrected.
    ERROR: HTMLReader.readElement() : Error occurred while trying to decipher the tag using scanners at Line 686 : null

...and then it really starts to have problems.

It seems the "xxxxxx's" pattern causes grief as it reads the > in what it thinks is a single quoted string and 'fixes' it.

Derrick

Somik Raha wrote:

>Hi Ling Ma
> It is very hard for us to help you with a vague
>request for help. Can you pls post your code, and the
>complete exception that you received ?
>
> If possible, submit a testcase showing the problem
>(check
>http://htmlparser.sourceforge.net/design/tests.html -
>Communicate with testcases). Fixes are usually fast,
>and we try to have a new release every week.
>
>Regards,
>Somik
>--- Mr LING MA <law...@ya...> wrote:
>
>
>>Dear Derick:
>>I am a graduate student, my name is Ling.
>>I just begin to use htmlparser 1.3, I try to parse
>>amazon pages, most of the time it gives me parsing
>>error and exited.
>>
>>Do you happen to know what was the reason? Is it
>>just
>>because amazon pages tags are not well closed? or
>>some
>>bugs in htmlparser?
>>
>>Thanks a lot
>>
>>Sincerely yours
>>Ling Ma
>>
>>
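
To illustrate the suspected "xxxxxx's" failure mode, here is a minimal sketch. It is not the htmlparser 1.3 source: the class QuoteScanSketch and both method names are hypothetical, and the naive scanner is only an assumption about how a tag parser could end up treating the apostrophe in alt="Marshall Field's" as the start of a single-quoted string. The second method shows one possible way around it, by remembering which quote character opened the attribute value.

```java
// Hypothetical illustration only; names and logic are assumptions,
// not taken from the htmlparser code base.
class QuoteScanSketch {

    // Naive scan: every ' or " toggles its own "inside quotes" flag,
    // even when it appears inside the other kind of quoted string.
    // Returns the index of the closing '>' or -1 if none is accepted.
    static int findTagEndNaive(String s) {
        boolean inSingle = false;
        boolean inDouble = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\'') {
                inSingle = !inSingle;
            } else if (c == '"') {
                inDouble = !inDouble;
            } else if (c == '>' && !inSingle && !inDouble) {
                return i;
            }
        }
        return -1;
    }

    // Quote-aware scan: remember which character opened the quoted
    // string; only that same character can close it.
    static int findTagEndQuoteAware(String s) {
        char open = 0; // 0 means "not inside a quoted string"
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (open != 0) {
                if (c == open) {
                    open = 0;
                }
            } else if (c == '\'' || c == '"') {
                open = c;
            } else if (c == '>') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String tag = "<img src=\"logo.gif\" border=0 alt=\"Marshall Field's\">";
        System.out.println("naive:       " + findTagEndNaive(tag));
        System.out.println("quote-aware: " + findTagEndQuoteAware(tag));
    }
}
```

With the naive scan the apostrophe leaves the scanner believing it is still inside a single-quoted string when it reaches the closing >, which matches the "Encountered > inside inverted commas" warning above. The quote-aware scan ignores the apostrophe because the value was opened with a double quote.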