Thread: [Htmlparser-developer] Charset tests failing

[Htmlparser-developer] Charset tests failing

From: Somik R. <so...@ya...> - 2002-12-27 07:17:56

Hi Derrick,
    I was working on the latest parser and just found that the charset
tests are failing (I didn't modify anything in HTMLParser).
    Could you check what might have gone wrong? Also, I was wondering
if it might be better to host the charset pages on our own domain
(http://htmlparser.sourceforge.net): we could put up encoded pages and
they'd always be there.

Regards
Somik

Re: [Htmlparser-developer] Charset tests failing

From: Derrick O. <Der...@ro...> - 2002-12-29 01:14:58

Somik,

Sorry, just got back from the pilgrimage to Bethlehem (Toronto).

All 268 tests are running OK against the version I took just now.
Perhaps www.ibm.co.jp or www.sony.co.jp weren't available for you when 
you tried it. Try again, please.

The tests can't all live on our own site. Specifically,
testHTTPCharset relies on an HTTP header charset parameter being set (by
the HTTP server) to something other than ISO-8859-1, which the
SourceForge people are unlikely to set up for us. We could put the
testHTMLCharset page under the sourceforge domain, though. Will get to
it as time permits.

Derrick
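
As an illustration of what testHTTPCharset depends on, here is a minimal sketch of reading the charset parameter from an HTTP Content-Type header value and falling back to ISO-8859-1 when the server sends none. The class and method names (ContentTypeCharset, charsetOf) are hypothetical and are not HTMLParser's actual code; the sketch also ignores edge cases such as whitespace around the "=" sign.

```java
import java.util.Locale;

// Hypothetical helper, not part of HTMLParser: extracts the charset
// parameter from a Content-Type header value.
final class ContentTypeCharset {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    /** Returns the charset parameter, or the default when it is absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        // parts[0] is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? DEFAULT_CHARSET : value;
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=Shift_JIS"));
        System.out.println(charsetOf("text/html"));
    }
}
```

A server that sends only "text/html" leaves the result at ISO-8859-1, which is why a test for this path needs a host that actually sets the parameter.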

Re: [Htmlparser-developer] Charset tests failing

From: Somik R. <so...@ya...> - 2002-12-29 01:21:20

Hi Derrick,
    Are you sure? I have the latest version here too, but the two
tests are consistently failing.
Here's what I see:
Charset should be Shift_JIS but was ISO-8859-1

for both failures.

Regards,
Somik
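
The message above means the detected charset never moved off the ISO-8859-1 default. For the HTML side (what testHTMLCharset presumably exercises, since it depends only on page content), a rough sketch of picking the charset out of a META tag might look like the following. The regular expression and the names MetaCharset and charsetOf are assumptions for illustration, not HTMLParser's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser: finds a charset declared
// in a META tag such as
//   <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
final class MetaCharset {
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Deliberately loose: any "charset=" inside a tag that starts with "<meta".
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset, or the default when none is found. */
    static String charsetOf(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=Shift_JIS\"></head></html>";
        System.out.println(charsetOf(page));
    }
}
```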

Re: [Htmlparser-developer] Charset tests failing

From: Derrick O. <Der...@ro...> - 2002-12-29 15:50:41

Somik,

Finally reproduced it by backing down my JVM to 1.2.
Code fix and test modifications dropped.
Sorry about that agent 99.

Derrick

[Htmlparser-developer] AI - to be or not to be

From: Somik R. <so...@ya...> - 2002-12-30 05:19:57

Hi Folks,
    Derrick Oswald just checked in a test case that fails.
Here's the link tag:

<a href="http://cbc.ca/artsCanada/stories/greatnorth271202"
class="lgblacku">Vancouver schools plan 'Great Northern Way'</a>

We are in a quandary now. When we have cases like:

<a href="something.html">Kaarle's Page</a>

we should accept the apostrophe without doing anything special.

When we get script content like:
<script>
    var code = '<sometag>';
</script>

We should not treat the < and > after "code =" as a tag, as they are part of
the string.

Handling the last two cases causes a conflict with the first case, because
the last case is handled by checking if there's a < after a ' and this check
sends the first case into an ignoring mode.

How do we handle this problem? Do we write smart code to handle this
particular situation? From human experience, even if we've not encountered
these cases, we know how to differentiate between a string node and a tag.
Can AI help us here? Also, please feel free to suggest any straightforward
solutions as well.

Regards
Somik
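
To make the conflict described above concrete, here is a small sketch under one possible reading of the heuristic: an apostrophe immediately followed by < is taken to open a quoted run in which tags are ignored. The class and method names are hypothetical, and the real check in HTMLParser may differ.

```java
// Hypothetical illustration of the heuristic, not HTMLParser's code.
final class ApostropheHeuristic {

    /** Naive rule: an apostrophe directly followed by '<' opens a quoted run. */
    static boolean opensQuotedRun(String html, int apostropheIndex) {
        int next = apostropheIndex + 1;
        return html.charAt(apostropheIndex) == '\''
                && next < html.length()
                && html.charAt(next) == '<';
    }

    /** True if any apostrophe in the input would trigger the naive rule. */
    static boolean wouldIgnoreTags(String html) {
        for (int i = html.indexOf('\''); i >= 0; i = html.indexOf('\'', i + 1)) {
            if (opensQuotedRun(html, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Script case: ignoring is what we want here.
        System.out.println(wouldIgnoreTags("var code = '<sometag>';"));
        // Link text case: the closing apostrophe is followed by </a>,
        // so the same rule fires where it should not.
        System.out.println(wouldIgnoreTags("plan 'Great Northern Way'</a>"));
        // Possessive apostrophe: the rule does not fire.
        System.out.println(wouldIgnoreTags("Kaarle's Page</a>"));
    }
}
```

The rule cannot tell the script string from the quoted link text because it looks only at adjacent characters and not at which element the text sits in.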

Re: [Htmlparser-developer] AI - to be or not to be

From: Sam J. <ga...@yh...> - 2002-12-30 05:54:14

Hi Somik,

I would have thought the solution to this would be to define under which 
circumstances a pair of apostrophes will indicate text.  In the first 
two examples you are inside an ANCHOR tag, and in the third you are 
inside a SCRIPT tag.  It seems to me that you should only be using 
apostrophes to indicate text strings inside a SCRIPT tag, no?  If
that's true then set your parsing behaviour differently depending on the 
tag type.

CHEERS> SAM

Re: [Htmlparser-developer] AI - to be or not to be

From: Somik R. <so...@ya...> - 2002-12-30 06:14:51

Sam Joseph wrote:
> I would have thought the solution to this would be to define under which
> circumstances a pair of apostrophes will indicate text.  In the first
> two examples you are inside an ANCHOR tag, and in the third you are
> inside a SCRIPT tag.  It seems to me that you should only be using
> apostrophes to indicate text strings inside a SCRIPT tag, no?  If
> that's true then set your parsing behaviour differently depending on the
> tag type.

You are right. The script scanner could set a static variable in
HTMLStringNode that tells it to move into the ignore state when it
encounters an apostrophe, and clear the flag when it's done. I'll probably
do that for now.

Regards,
Somik
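
A minimal sketch of the context-dependent approach discussed above: apostrophes delimit a string only while the scanner is inside a SCRIPT element, and are ordinary text everywhere else. ContextAwareScanner and its tags method are hypothetical names, not HTMLParser classes. The sketch keeps the flag as local state rather than the static variable mentioned in the message, and it ignores escaped quotes, double-quoted strings, and a bare < in script code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not HTMLParser's scanner.
final class ContextAwareScanner {

    /** Returns the tags found, honouring '...' strings only inside SCRIPT. */
    static List<String> tags(String html) {
        List<String> found = new ArrayList<>();
        boolean inScript = false; // set while between <script> and </script>
        boolean inQuote = false;  // set while inside a '...' string in a script
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (inScript && c == '\'') {
                inQuote = !inQuote; // apostrophes matter only in scripts
                i++;
            } else if (c == '<' && !inQuote) {
                int end = html.indexOf('>', i);
                if (end < 0) {
                    break; // unterminated tag: stop scanning
                }
                String tag = html.substring(i + 1, end).trim();
                found.add(tag);
                String name = tag.toLowerCase(Locale.ROOT);
                if (name.equals("script") || name.startsWith("script ")) {
                    inScript = true;
                } else if (name.equals("/script")) {
                    inScript = false;
                }
                i = end + 1;
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(tags("<script>\n    var code = '<sometag>';\n</script>"));
        System.out.println(tags("<a href=\"something.html\">Kaarle's Page</a>"));
    }
}
```

With this rule the quoted '<sometag>' in the script is skipped, while the apostrophes in the two link examples have no effect on tag recognition.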