Thread: [Htmlparser-developer] testStringBeanListener() consistently failing

Brought to you by: derrickoswald

From: Somik R. <so...@ya...> - 2003-02-04 17:35:39

Hi Derrick,
  Since the last release I had noticed something
peculiar: testStringBeanListener() would fail every
once in a while. I ignored it, but after Kaarle's fix
to the attribute bug today, testStringBeanListener()
fails every time.

  I am not sure whether we caused something to break
(probably one of us did). Could you take a look when
you are free? By the way, I did modify a few of the
tests, including the part that requires a parser
holding a URL to be serializable (I didn't actually
understand why that was necessary). Could that have
caused the problem?

Regards,
Somik

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Derrick O. <Der...@ro...> - 2003-02-05 01:29:11

Hi back,

Sorry, I've just been lurking on the HTMLParser project lately. That's 
because I liked the open source experience I had with HTMLParser so much 
that I started my own project:
    http://sourceforge.net/projects/connector/

I think the problem with testStringBeanListener() (although I didn't 
experience it) is that the fetch of the external URL (slashdot.org) may 
fail. The contents of the bean's string property then don't change, so 
the listener isn't fired. My guess is that you were not getting data 
back from slashdot.org consistently, and then not at all for a period; 
it should be back to OK now.
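
The mechanism can be sketched in isolation. This is only an illustration, 
assuming the bean reports changes through java.beans.PropertyChangeSupport; 
TextBean, ListenerDemo and the property name "strings" are hypothetical 
stand-ins, not the real StringBean code. A failed fetch is modelled as 
setting the same text again, in which case no event reaches the listener.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical stand-in for a bean with a bound text property (not the real StringBean).
class TextBean {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private String strings = "";

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    // Called after each fetch; a failed fetch is modelled as passing the old text again.
    void setStrings(String newValue) {
        String old = strings;
        strings = newValue;
        // PropertyChangeSupport fires nothing when old and new are equal and non-null.
        support.firePropertyChange("strings", old, newValue);
    }
}

class ListenerDemo {
    // Returns how many events a listener sees for the given sequence of values.
    static int countEvents(String... values) {
        TextBean bean = new TextBean();
        int[] count = {0};
        bean.addPropertyChangeListener(event -> count[0]++);
        for (String value : values) {
            bean.setStrings(value);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countEvents("page text")); // fetch succeeded, text changed
        System.out.println(countEvents(""));          // fetch failed, text unchanged
    }
}
```

A test that only asserts "the listener was called" therefore depends on 
the remote site answering, which matches the intermittent failures.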

So I switched to http://htmlparser.sourceforge.org/test/example.html, 
which is something I said I was going to do a long time ago anyway.

The change to a local string instead of a web page for the serialization 
test may have missed the point, since the intent is to be able to 
recover the connection and parse it after rehydration, so I switched it 
back to a URL (the same as above). I hope that's OK, it should go faster 
anyway because the page is smaller.  If it's not, we'll have to 
investigate why it isn't.
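
The intent of that test (recover the connection after rehydration) can be 
sketched as follows. This is a minimal illustration using standard Java 
serialization, not the parser's actual code: UrlSource and RehydrateDemo 
are hypothetical names, and a temporary file: URL stands in for the web 
page so the sketch runs without network access.

```java
import java.io.*;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical holder (not the real parser class): only the URL string is serialized.
class UrlSource implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String url;
    private transient URLConnection connection; // a live connection cannot be serialized

    UrlSource(String url) throws IOException {
        this.url = url;
        this.connection = URI.create(url).toURL().openConnection();
    }

    // Runs during deserialization: restore the fields, then reopen the connection.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        connection = URI.create(url).toURL().openConnection();
    }

    // Reads the whole resource through the (possibly reopened) connection.
    String readAll() throws IOException {
        try (InputStream stream = connection.getInputStream()) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}

class RehydrateDemo {
    // Serializes the object to a byte array and reads it back.
    static UrlSource roundTrip(UrlSource source) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (UrlSource) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path page = Files.createTempFile("example", ".html");
        Files.write(page, "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8));
        UrlSource copy = roundTrip(new UrlSource(page.toUri().toString()));
        System.out.println(copy.readAll());
    }
}
```

A test built on a local string never exercises the reopening step in 
readObject, which is why the URL form matters for this particular test.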

Derrick

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Mr L. MA <law...@ya...> - 2003-02-05 22:11:59

Dear Derrick,
I am a graduate student; my name is Ling.
I have just begun using htmlparser 1.3. When I try to
parse Amazon pages, most of the time it gives me a
parsing error and exits.

Do you happen to know the reason? Is it just because
the tags on Amazon pages are not properly closed, or
is it a bug in htmlparser?

Thanks a lot

Sincerely yours 
Ling Ma

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Somik R. <so...@ya...> - 2003-02-05 22:30:49

Hi Ling Ma,
  It is very hard for us to help with a vague
request. Can you please post your code and the
complete exception that you received?

  If possible, submit a testcase showing the problem
(see
http://htmlparser.sourceforge.net/design/tests.html -
Communicate with testcases). Fixes are usually fast,
and we try to have a new release every week.

Regards,
Somik

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Derrick O. <Der...@ro...> - 2003-02-05 23:07:06

Somik,

It's easily reproducible:

java -jar ./release/htmlparser1_3/lib/htmlparser.jar http://www.amazon.com

yields:

Begin Tag : br; begins at : 0; ends at : 3
WARNING: HTMLTagParser : Encountered > inside inverted commas in line
<td><a href="/exec/obidos/tg/browse/-/1055940/ref=gw_mafb_/"><img 
src="http://g-images.amazon.com/images/G/01/merchants/logos/marshall-fields-logo-20.gif" 
width=87 height=20 border=0 alt="Marshall Field's"></a></td>, location 205

                           ^
Automatically corrected.
ERROR: HTMLReader.readElement() : Error occurred while trying to 
decipher the tag using scanners
at Line 686 : null

...and then it really starts to have problems. It seems the "xxxxxx's" 
pattern causes grief as it reads the > in what it thinks is a single 
quoted string and 'fixes' it.

Derrick


Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Somik R. <so...@ya...> - 2003-02-05 23:18:50

Oh, thanks Derrick! I was actually hoping to get a
testcase that we could plug into the system.

Looks like we need some kind of precedence of
double quotes over single quotes.

Regards,
Somik

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Derrick O. <Der...@ro...> - 2003-02-06 02:16:46

Somik,

hmmm,
Maybe a better paradigm than 'precedence' is to enter the 
in-double-quote state until encountering another double quote.
Similarly, enter the in-single-quote state until encountering another 
single quote.
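
The state idea above can be sketched as a small scanner. This is only an 
illustration under stated assumptions, not the HTMLTagParser code: 
QuoteStateScanner and findTagEnd are hypothetical names, and the sketch 
treats any quote character outside a quoted value as opening one.

```java
// Hypothetical sketch of the in-quote state idea; not the library's tag parser.
class QuoteStateScanner {
    // Returns the index of the '>' that closes the tag starting at `start`, or -1 if none.
    static int findTagEnd(String html, int start) {
        char quote = 0; // 0 = outside quotes, otherwise the character that opened the value
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // only the matching quote character leaves the state
                }
            } else if (c == '"' || c == '\'') {
                quote = c;     // enter the in-double-quote or in-single-quote state
            } else if (c == '>') {
                return i;      // a '>' outside any quoted value ends the tag
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String tag = "<img alt=\"Marshall Field's\">";
        // The apostrophe is inside the double-quoted value, so it does not open a new state.
        System.out.println(findTagEnd(tag, 0) == tag.length() - 1);
    }
}
```

With this approach no precedence rule is needed: whichever quote opens 
the value is the only one that can close it. An apostrophe in an unquoted 
value would still be misread as an opening quote.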

Derrick

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Somik R. <so...@ya...> - 2003-02-09 02:42:49

Derrick Oswald wrote:
> ERROR: HTMLReader.readElement() : Error occurred while trying to
> decipher the tag using scanners
> at Line 686 : null
>
> ...and then it really starts to have problems. It seems the "xxxxxx's"
> pattern causes grief as it reads the > in what it thinks is a single
> quoted string and 'fixes' it.

I doubt that this is the problem. It was because of the TableScanner, Div
Scanner, and Span scanners. I had taken a gamble by putting them in the
current set of registered scanners, but there's a lot of dirty HTML out
there that doesn't close its divs, spans, or tables. Instead of fixing this
issue by adding more code, I want to try refactoring the corresponding
logic out of the link and form scanners and reusing it.

For now, the three scanners mentioned above are not registered by default
(the page gets parsed just fine after that).

Regards,
Somik

Re: [Htmlparser-developer] testStringBeanListener() consistently failing

From: Somik R. <so...@ya...> - 2003-02-09 03:27:58

Hi,
    Here's a testcase that proves there's no bug:

public void testTagWithQuotes() throws Exception {
    // The apostrophe inside the double-quoted alt value must not end the tag early.
    String testHtml =
        "<img src=\"http://g-images.amazon.com/images/G/01/merchants/logos/"
        + "marshall-fields-logo-20.gif\" "
        + "width=87 height=20 border=0 alt=\"Marshall Field's\">";
    createParser(testHtml);
    parseAndAssertNodeCount(1);
    assertType("should be HTMLTag",HTMLTag.class,node[0]);
    HTMLTag tag = (HTMLTag)node[0];
    assertStringEquals("alt","Marshall Field's",tag.getAttribute("ALT"));
    // Expected output, as given in the original assertion.
    assertStringEquals(
        "html",
        "<IMG BORDER=\"0\" ALT=\"Marshall Field's\" WIDTH=\"87\" "
        + "SRC=\"http://g-images.amazon.com/images/G/01/merchants/logos/"
        + "marshall-fields-logo-20.gif\" HEIGHT=\"20\">",
        tag.toHTML()
    );
}

This test is now in org.htmlparser.tests.parserHelperTests.TagParserTest

Regards,
Somik