htmlparser-developer Mailing List for HTML Parser (Page 27)

Brought to you by: derrickoswald

htmlparser-developer — The developer mailing list of the htmlparser project


RE: [Htmlparser-developer] Writing test cases

From: <dha...@or...> - 2002-08-16 09:07:58

Attachments: BDY.RTF

Hey Somik,

That's a lovely idea. We'll definitely try it out.

Dhaval



-------------------------------------------------------
This sf.net email is sponsored by: Dice - The leading online job board
for high-tech professionals. Search and apply for tech jobs today!
http://seeker.dice.com/seeker.epl?rel_code=31
_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer

Re: [Htmlparser-developer] Writing test cases

From: Somik R. <so...@ya...> - 2002-08-14 15:25:00

Hi Dhaval,
     A word of advice: don't break your head over CVS & SSH (unless you have a
couple of aspirins).
    Switch to Eclipse, a free open source IDE and the best. :)
(www.eclipse.org) It integrates with CVS, which suits team programming,
and it also has a vast number of winning features (not the least of which is
refactoring).

    If you can't make the switch, then go for Tortoise CVS. It integrates
with Windows Explorer, and using CVS with SSH gets a lot easier
after that.

    Another incentive for using Eclipse, though: I have developed a pair
programming plugin for Eclipse at http://sangam.sourceforge.net, and I am
releasing the next version this weekend. If you use Eclipse, then we can
pair program over the internet on htmlparser :). The future is here!

Cheers,
Somik

RE: [Htmlparser-developer] HTMLStringNode

From: <dha...@or...> - 2002-08-14 12:13:49

Attachments: BDY.RTF

Hi Somik,

> be good to have a user-defined controlling mechanism, to choose if it
> should be system autodetected, or the particular end of line char to be
> used.

Yeah, I think that's a good idea.

> By the way, I am finding very little time to devote at present (for a
> couple of weeks) on the parser. If you or other developers can volunteer
> to work on this, it will really benefit the product and the community.

I'm up for it as much as work here allows me to.

Dhaval

Re: [Htmlparser-developer] Writing test cases

From: Somik R. <so...@ya...> - 2002-08-14 12:00:32

Hi Dhaval,
> Do let me know if you are interested in including it as a
> part of the standard library and what I will need to do for that
> purpose. I will feel a little bit more confident if someone goes through
> my code since this is my first time. I have changed the tag-scanners
> quite a fair bit from the time I last sent them to you.
>

That would be good. In fact, if you can check it into CVS, then I can work
on it the moment I find some time, maybe sometime tomorrow.

You need to sign up as a developer at
http://sourceforge.net/account/register.php and send me your id. I will then
add you as a developer for htmlparser. You can then check in your code
directly with CVS.

By the way, thank you for asking so many questions. I have been wanting to put
all this info in the docs, but now that you've brought it all out, it's
there in the mail archives for others. Of course, we still need good docs :)

Cheers,
Somik

Re: [Htmlparser-developer] HTMLStringNode

From: Somik R. <so...@ya...> - 2002-08-14 10:30:59

> I am observing one more strange occurrence in a HTMLStringNode.
>
> Whenever I have a string between 2 tags whose last character is \n it is
> returned to me appended by \r\n.

Yes, this is indeed a bug. I am not even sure if we should autodetect the
system default. What if you are creating the parsed file in Linux and
sending it to windoze machines? We'd still have the nice squares. It would
be good to have a user-defined controlling mechanism, to choose if it should
be system autodetected, or the particular end of line char to be used.

By the way, I am finding very little time to devote at present (for a couple of
weeks) on the parser. If you or other developers can volunteer to work on
this, it will really benefit the product and the community.
The process is now streamlined, so you can easily make releases by simply
using the ant file (in CVS).

Also - it is not good for a project that is used by so many to rest on one
person.


Regards,
Somik
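
The user-defined end-of-line control suggested in this message could look roughly like the sketch below. It is only an illustration: the names LineEndingDemo, Mode and normalize are assumptions made up for this example and are not part of htmlparser. The caller either asks for the system default or names the terminator explicitly, and every CR, LF or CRLF in the node text is rewritten to that choice.

```java
import java.util.regex.Matcher;

public class LineEndingDemo {
    /** Hypothetical setting: autodetect from the system, or force one terminator. */
    enum Mode { SYSTEM, LF, CRLF }

    /** Maps the chosen mode to the terminator string it stands for. */
    static String separatorFor(Mode mode) {
        switch (mode) {
            case LF:   return "\n";
            case CRLF: return "\r\n";
            default:   return System.lineSeparator(); // platform default
        }
    }

    /** Rewrites every CRLF, lone CR or lone LF in text to the chosen terminator. */
    static String normalize(String text, Mode mode) {
        // CRLF is listed first so it is replaced as one unit, not as CR followed by LF.
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(separatorFor(mode)));
    }

    public static void main(String[] args) {
        // A string node ending in "\n" stays "\n" when LF is requested,
        // regardless of the platform the parser runs on.
        String node = "AltaVista\n";
        System.out.println(normalize(node, Mode.LF).equals("AltaVista\n"));
        System.out.println(normalize(node, Mode.CRLF).equals("AltaVista\r\n"));
    }
}
```

With an explicit mode, a file parsed on Linux and sent to Windows machines can be written with CRLF on purpose instead of inheriting whatever the parsing machine uses.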

Re: [Htmlparser-developer] Writing OPTION tag

From: Somik R. <so...@ya...> - 2002-08-14 08:49:53

Hi Dhaval,
> That's exactly what happens. Everything inside <OPTION ..> will be a tag
> and outside it will be an HTMLStringNode. However, when I have to read
> another <OPTION ....> tag where the previous OPTION tag did not have a
> closing </OPTION>, the later <OPTION....> tag gets read, and since it has
> already been read it is unavailable for scanning again as a new Option tag.
> Anyway, I seem to have made my test cases work by storing the previous
> node value, and in case </OPTION> is not present I take care of it
> accordingly. I have just added some more test cases to validate its
> robustness. For the time being I think it's done.

Good question. I faced the same thing with several other tags. To counter
this issue, you will find a variable in the evaluate() method:
previousOpenScanner. Suppose you are searching for </OPTION> and
encounter an <OPTION> instead; evaluate() then allows you to do
something about it. At that point, you must fool the open scanner into
believing that the previous tag got closed. This is exactly what's done in
HTMLLinkScanner. On seeing that there was a previousOpenScanner, we accept it as
true. And in scan(), the end tag (which wasn't there) is returned, putting in
a correction so that the next tag still gets parsed (in elementEnd()
positioning).

Let me know if you need more help. (You simply can't do this without
test cases.)

Cheers
Somik
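
The idea of treating a new <OPTION> as the missing close of the previous one can be sketched independently of the parser internals. The code below is an assumption-laden illustration, not the HTMLLinkScanner or evaluate() logic itself: OptionScanDemo and its methods are hypothetical, and the input is a list of pre-split tokens rather than a real HTMLReader. The key point is that the scanner only peeks at the next token, so an opening <OPTION> that ends the current option is left in the input and is scanned again as a new option.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class OptionScanDemo {
    static boolean isOpenOption(String token) {
        return token.toUpperCase().startsWith("<OPTION");
    }

    static boolean isClosingTag(String token) {
        String upper = token.toUpperCase();
        return upper.startsWith("</OPTION") || upper.startsWith("</SELECT");
    }

    /** Returns one "tag -> text" entry per OPTION, with or without </OPTION>. */
    static List<String> scanOptions(List<String> tokens) {
        Deque<String> input = new ArrayDeque<>(tokens);
        List<String> options = new ArrayList<>();
        while (!input.isEmpty()) {
            String open = input.poll();
            if (!isOpenOption(open)) {
                continue; // not the start of an option, skip it
            }
            StringBuilder text = new StringBuilder();
            while (!input.isEmpty()) {
                String next = input.peek();      // look, but do not consume yet
                if (isOpenOption(next)) {
                    break;                       // implicit close: leave the tag for the outer loop
                }
                input.poll();
                if (isClosingTag(next)) {
                    break;                       // explicit </OPTION> or </SELECT>
                }
                if (!next.startsWith("<")) {
                    text.append(next);           // plain text between the tags
                }
            }
            options.add(open + " -> " + text.toString().trim());
        }
        return options;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "<OPTION value=\"AltaVista Search\">", "AltaVista",
                "<OPTION value=\"Lycos Search\">", "</OPTION>");
        for (String option : scanOptions(tokens)) {
            System.out.println(option);
        }
    }
}
```

Both options from the example in this thread are reported, because the second opening tag is never consumed while the first option is being closed.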


RE: [Htmlparser-developer] HTMLStringNode

From: <dha...@or...> - 2002-08-14 08:30:30

Attachments: BDY.RTF

I am observing one more strange occurrence in a HTMLStringNode. 

Whenever I have a string between 2 tags whose last character is \n it is
returned to me appended by \r\n.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457




RE: [Htmlparser-developer] Writing OPTION tag

From: <dha...@or...> - 2002-08-14 08:16:58

Attachments: BDY.RTF

Hi Somik,

That's exactly what happens. Everything inside <OPTION ..> will be a tag
and outside it will be an HTMLStringNode. However, when I have to read
another <OPTION ....> tag where the previous OPTION tag did not have a
closing </OPTION>, the later <OPTION....> tag gets read, and since it has
already been read it is unavailable for scanning again as a new Option tag.
Anyway, I seem to have made my test cases work by storing the previous
node value, and in case </OPTION> is not present I take care of it
accordingly. I have just added some more test cases to validate its
robustness. For the time being I think it's done.

Thanks for the response nevertheless.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457




Re: [Htmlparser-developer] Writing OPTION tag

From: Somik R. <so...@ya...> - 2002-08-14 07:43:29

Hi Dhaval,
    Sorry, I've been really swamped.
> The problem with my input is that <OPTION value="AltaVista Search">
> would be read as an OptionTag, AltaVista would be read as the StringNode
> and then <OPTION value="Lycos Search"> would be read and since it is
> neither a StringNode nor an EndTag an OptionTag would be created for the
> above 2 values. ..

This idea is incorrect. <OPTION. .... > is a tag. Nothing inside the Option
tag is a string node.
<OPTION ... >  (this is HTMLTag)
    some text here sdjklsdjk   (this is HTMLStringNode)
</OPTION> (this is HTMLEndTag)

HTH.

Cheers,
Somik

RE: [Htmlparser-developer] Writing OPTION tag

From: <dha...@or...> - 2002-08-14 07:06:41

Attachments: BDY.RTF

Hi guys,

I am still trying to solve my problem with the scanner of my OPTION tag. I
would really appreciate any help from the developers of the parsing
engine. I think a solution may lie in knowing certain internals of the
parser.

Let me explain my problem in detail.

Assume the following 2 OPTION tags :
<OPTION value="AltaVista Search">AltaVista
<OPTION value="Lycos Search"></OPTION>

The OPTION tag does not explicitly require an end tag. Hence the first
line is valid.
My parsing logic in scan is as follows :
1. Disable existing parsers
2. Read elements from the Reader.
3. Check whether it is an EndTag for OPTION or SELECT (since OPTION tags
are always under SELECT). If so create an OptionTag object with
necessary values
4. If it is not an EndTag, check whether it is a StringNode (this would
be for the value between <OPTION> and </OPTION> tags). If so it is the
text of the OPTION tag and store it temporarily. (This will be later
used in the constructor).
5. If it is neither, it could be an error or the beginning of another tag
(possibly another <OPTION> tag as above), and hence the current loop must
be terminated and the option object must be constructed.


The problem with my input is that <OPTION value="AltaVista Search">
would be read as an OptionTag, AltaVista would be read as the StringNode
and then <OPTION value="Lycos Search"> would be read and since it is
neither a StringNode nor an EndTag an OptionTag would be created for the
above 2 values. However since this tag is already read it will not
qualify as a new OptionTag and hence I am missing out this tag in my
parsing. I hope I have been able to explain my problem clearly. If not,
I would be glad to clarify any points which are not
understood.


A snippet of code from scan() of HTMLOptionTagScanner is given below

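// Step 1: disable the other scanners while the OPTION body is read.
Vector lScannerVector = HTMLParserUtils.adjustScanners(pReader);
do
{
      lNode = pReader.readElement();
      System.out.println(lNode.toHTML());   // debug output
      if (lNode instanceof HTMLEndTag)
      {
            // </OPTION> or </SELECT> closes the current option.
            lEndTag = (HTMLEndTag)lNode;
            String lEndTagString = lEndTag.getText().toUpperCase();
            if (lEndTagString.equals("OPTION") || lEndTagString.equals("SELECT"))
            {
                  endTagFound = true;
            }
      }
      else if (lNode instanceof HTMLStringNode)
      {
            // Text between <OPTION> and the next tag is the option's label.
            lText.append(lNode.toHTML());
      }
      else if (lNode instanceof HTMLTag)
      {
            // Any other tag (for example the next <OPTION>) also ends this
            // option, but it has already been consumed from the reader here.
            endTagFound = true;
      }
}
while (!endTagFound);

HTMLOptionTag lOptionTag = new HTMLOptionTag(0, lNode.elementEnd(),
      pTag.getText(), lText.toString(), pCurrLine);
// Re-enable the scanners that were switched off above.
HTMLParserUtils.restoreScanners(pReader, lScannerVector);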

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

[Htmlparser-developer] Writing test cases

From: <dha...@or...> - 2002-08-12 12:49:15

Attachments: BDY.RTF

Hi Somik,

My test case writing has become quite simplified by basically following
your template code. I should be done with all test cases by tomorrow. I
have finished writing tag-scanner pairs for INPUT, TEXTAREA, OPTION &
SELECT tags. Do let me know if you are interested in including them as a
part of the standard library and what I will need to do for that
purpose. I will feel a little bit more confident if someone goes through
my code since this is my first time. I have changed the tag-scanners
quite a fair bit from the time I last sent them to you.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

[Htmlparser-developer] Re: [Htmlparser-user] Change in Layout

From: Somik R. <so...@ya...> - 2002-08-10 08:22:39

Dhaval Udani wrote :

> My team is building a framework which is used by many projects in my
> organization. All the other projects create HTML with their own
> look-and-feel. To use the framework, they need to convert these files
> into a JSP (using a tool developed by my team). The tool, apart from just
> changing the extension ;), also adds lots of JSP code and makes certain
> modifications to the HTML tags (not the presentation tags though). After
> the JSP is created, if the layout changes, they will have to again spend
> time correcting this anomaly and will need to keep doing it every time
> they change their HTML page or the tool is updated. Now I guess you can
> understand why I feel so strongly about maintaining layout.

I am not sure I fully understand. The other teams are creating HTML with
their own look and feel. You are converting it to a JSP. Naturally the
alignment would have changed through your additions alone. Now, if the
original HTML is preserved in functionality but not in the exact layout in
which it arrived, I do not understand how that causes a problem for your other
teams. Are they reading your JSP file through some program?

If you can give some more details, a clearer picture might emerge.

Regards,
Somik

Re: [Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

From: Somik R. <so...@ya...> - 2002-08-10 08:17:21

Hi Claude,
    You've again raised a good point. I will look into this for next
week's release.

Regards
Somik
  ----- Original Message -----
  From: Claude Duguay
  To: htm...@li...
  Sent: Friday, August 09, 2002 12:58 AM
  Subject: RE: [Htmlparser-developer] Re: [Htmlparser-user] Another
  Ill-Formed Example

  Based on your description there is a risk that calling hasMoreNodes
  without calling nextHTMLNode a few times in a row will not have the
  desired API semantics. If the parsing takes place in the call to
  hasMoreNodes, then the parser moves forward, regardless of whether the
  nextHTMLNode method was called. This suggests that the method should be
  called something else, more indicative of this behavior, or the behavior
  should be changed.

[Htmlparser-developer] Next integration release is out

From: Somik R. <so...@ya...> - 2002-08-10 08:15:00

Hi Folks,
    Next release (v1.2-2002-08-11) is out.
    From the change log :

[1] Fixed bug 590703 - Empty values dont get parsed
[2] Fixed bug 591435 - Missing values cause keys to be missed
[3] Removed all infinite loops in scanners, replaced with throwing
HTMLParserException
[4] Fixed bug in HTMLTitleScanner, allowing certain malformed title tags to
be parsed
[5] Modified HTMLReader - now accepts Reader instead of BufferedReader
[6] HTMLParser constructor now throws HTMLParserException.
[7] Fixed bug 592355 - Empty tags throw exceptions from some scanners. Now,
if the tag is empty, it is not passed down to scanners. Also, fixed the
related issue in HTMLStringNode, causing empty tags to be treated as tags
and not strings.

A very significant fix is #3 - I would highly recommend upgrading your
copies asap.
Also, following suggestions of Amit Rana, the constructor itself throws
HTMLParserException.

You can expect some more API changes in the coming weeks, as we attempt to
integrate Claude's other contributions (Parser Feedback). We've got over 150
tests and all passing.

Regards,
Somik
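
Change [3] replaces loops that could spin forever with an exception. A minimal sketch of that pattern follows, under stated assumptions: BoundedScanDemo, readUntil and ScanException are hypothetical names, ScanException is unchecked only to keep the example short (the change log says HTMLParserException is declared with "throws", so the real one is presumably checked), and the input is a plain iterator of lines rather than an HTMLReader. The loop ends either when the end tag is found or when the input runs out, and the second case is reported instead of being retried.

```java
import java.util.Arrays;
import java.util.Iterator;

public class BoundedScanDemo {
    /** Hypothetical stand-in for HTMLParserException. */
    static class ScanException extends RuntimeException {
        ScanException(String message) {
            super(message);
        }
    }

    /** Case-insensitive indexOf that never changes string lengths. */
    static int indexOfIgnoreCase(String line, String wanted) {
        for (int i = 0; i + wanted.length() <= line.length(); i++) {
            if (line.regionMatches(true, i, wanted, 0, wanted.length())) {
                return i;
            }
        }
        return -1;
    }

    /** Collects text up to endTag, or throws once the input is exhausted. */
    static String readUntil(Iterator<String> lines, String endTag) {
        StringBuilder text = new StringBuilder();
        while (lines.hasNext()) {               // bounded by the input, never endless
            String line = lines.next();
            int at = indexOfIgnoreCase(line, endTag);
            if (at >= 0) {
                return text.append(line, 0, at).toString();
            }
            text.append(line).append('\n');
        }
        throw new ScanException("input ended before " + endTag);
    }

    public static void main(String[] args) {
        System.out.println(readUntil(
                Arrays.asList("KRP VALIDATION</TITLE>").iterator(), "</TITLE>"));
        try {
            // Malformed title of the kind reported on the list: no real end tag.
            readUntil(Arrays.asList("KRP VALIDATION<PROCESS/TITLE>").iterator(), "</TITLE>");
        } catch (ScanException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```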

RE: [Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

From: Claude D. <CD...@ar...> - 2002-08-08 15:58:20

Based on your description there is a risk that calling hasMoreNodes
without calling nextHTMLNode a few times in a row will not have the
desired API semantics. If the parsing takes place in the call to
hasMoreNodes, then the parser moves forward, regardless of whether the
nextHTMLNode method was called. This suggests that the method should be
called something else, more indicative of this behavior, or the behavior
should be changed.

Re: [Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

From: Somik R. <so...@ya...> - 2002-08-08 07:14:19

Hi Claude,
    Thanks for the kind words.

> BTW: I was giving some thought to the calls that take place in
> HTMLEnumeration. As far as I could tell, many internal calls were made
> twice, by virtue of the hasMoreNodes/nextHTMLNode pattern. An alternate
> pattern is repeated calls to nextHTMLNode which should stop when a null
> response is returned. This pattern is used by the
> BufferedReader.readLine method, by the JDBC ResultSet.next method, etc.
> Based on the simple observation that calls to hasMoreNodes AND
> nextHTMLNode run some of the same underlying code, it seems that the
> speed of the parser could be positively influenced by reducing the
> interface to a single call. Any thoughts?

I am not so sure this would be a good idea, because then we'd have to
compromise on the API. Users would then have to check for null
values. The iterator interface is also a popular one, so we have
a familiarity factor here.

As far as optimization goes, nextHTMLNode doesn't do any parsing; it
simply returns the node that was parsed internally when hasMoreNodes()
was called. So the only speedup would be the elimination of one call, and
I am not so sure that this would be the best place for such a speedup.

By the way, talking about speedups, the last release and the next one should
see some tweaks, and the performance ought to have gotten better. Are
you still doing the performance testing? Any results to share?

Cheers,
Somik
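
One way to keep the hasMoreNodes/nextHTMLNode pair while making repeated hasMoreNodes() calls harmless is to cache the parsed node until it is handed out. The sketch below is an assumption, not the actual HTMLEnumeration code: NodeEnumeration is a hypothetical class, nodes are plain strings, and a java.util.Iterator stands in for the real parsing step (it is assumed never to yield null).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LookaheadDemo {
    /** Hypothetical stand-in for HTMLEnumeration. */
    static final class NodeEnumeration {
        private final Iterator<String> source; // stands in for the parser
        private String pending;                // parsed but not yet returned

        NodeEnumeration(Iterator<String> source) {
            this.source = source;
        }

        /** Parses at most one node ahead; calling it again changes nothing. */
        boolean hasMoreNodes() {
            if (pending == null && source.hasNext()) {
                pending = source.next();
            }
            return pending != null;
        }

        /** Hands out the cached node, parsing one first if needed. */
        String nextHTMLNode() {
            if (!hasMoreNodes()) {
                throw new NoSuchElementException("no more nodes");
            }
            String node = pending;
            pending = null;
            return node;
        }
    }

    public static void main(String[] args) {
        NodeEnumeration nodes =
                new NodeEnumeration(Arrays.asList("<TITLE>", "text", "</TITLE>").iterator());
        nodes.hasMoreNodes();
        nodes.hasMoreNodes(); // does not skip a node
        while (nodes.hasMoreNodes()) {
            System.out.println(nodes.nextHTMLNode());
        }
    }
}
```

The parsing work still happens once per node, so this keeps the familiar iterator shape without the null checks that a single-call interface would require.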

RE: [Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

From: Claude D. <CD...@ar...> - 2002-08-07 15:48:15

You are not only talented but very kind! Thanks.

BTW: I was giving some thought to the calls that take place in
HTMLEnumeration. As far as I could tell, many internal calls were made
twice, by virtue of the hasMoreNodes/nextHTMLNode pattern. An alternate
pattern is repeated calls to nextHTMLNode which should stop when a null
response is returned. This pattern is used by the
BufferedReader.readLine method, by the JDBC ResultSet.next method, etc.
Based on the simple observation that calls to hasMoreNodes AND
nextHTMLNode run some of the same underlying code, it seems that the
speed of the parser could be positively influenced by reducing the
interface to a single call. Any thoughts?

[Htmlparser-developer] Re: [Htmlparser-user] Another Ill-Formed Example

From: Somik R. <so...@ya...> - 2002-08-07 05:02:31

Hi Claude,
    This has been handled, related to the earlier fix. All potential
infinite loops have been removed, and there will be no more hangings -
only HTMLParserExceptions from now on.
    There will be a release having all these fixes this weekend.

Regards,
Somik
  ----- Original Message -----
  From: Claude Duguay
  To: htm...@li...
  Sent: Wednesday, August 07, 2002 3:35 AM
  Subject: [Htmlparser-user] Another Ill-Formed Example


  Here's some markup we found in another document that causes the
  HTMLParser to hang.

  "<TITLE>KRP VALIDATION<PROCESS/TITLE>"

  So far, we've had 4 documents cause our process to come to a grinding
  halt. I would much prefer a policy of exception throwing to hangs asap,
  followed by consideration of whether unusual markup can be handled more
  elegantly in a subsequent phase. Thanks to everyone, as always.

Re: [Htmlparser-developer] Malformed HTML

From: Somik R. <so...@ya...> - 2002-08-07 04:57:40

Hi Claude,
    This bug has been fixed. By the way, a request: please enter bug
reports through the site http://htmlparser.sourceforge.net.

Regards,
Somik
  ----- Original Message -----
  From: Claude Duguay
  To: htm...@li...
  Sent: Tuesday, August 06, 2002 4:02 AM
  Subject: [Htmlparser-developer] Malformed HTML

  If the parser (1.2 integration build) encounters the following code it
  hangs:

  <html><head><TITLE>
  <html><head><TITLE>
  Double tags can hang the code
  </TITLE></head><body>
  <body><html>

  I have created this reproducible source document but I am still trying
  to isolate the source of the problem.

  BTW: The exception handling is excellent this way, Somik. There are a
  few conditions that hang the parser which should throw exceptions, but
  the framework is in place to get there now. Thanks.

RE: [Htmlparser-developer] Language Support

From: Claude D. <CD...@ar...> - 2002-08-06 17:02:04

I should mention that it would not (in my view) be a good idea to tie
this project to the org.xml.sax package just for the InputSource object.
I would presume that a new InputSource object with the same semantics
would be created for the HTMLParser project.

-----Original Message-----
From: Claude Duguay
Sent: Tuesday, August 06, 2002 9:42 AM
To: htm...@li...
Subject: RE: [Htmlparser-developer] Language Support

I've been reading further on the use of InputSource in XML and recall
your interest in using a similar mechanism. In practice it seems easy
enough to do this and I've provided some sample code to illustrate. The
InputSource provides either a Reader, an InputStream or a System ID
(usually a local file name), and they can be checked for existence, in
that order.

I raise this issue (in this context) because one of the reasons the XML
community adopted the InputSource was that it could contain
additional information about the character set encoding (which is not
used here but could be). This becomes much more important if you start
considering internationalization.

Here's some sample code that is compilable (but untested - though it
should work):

import java.io.*;
import org.xml.sax.*;
import com.kizna.html.util.HTMLParserException;

public class InputSourceReader
  extends BufferedReader
{
  public InputSourceReader(InputSource source)
    throws HTMLParserException
  {
    super(getReaderFromInputSource(source));
  }

  // Tries the character stream, then the byte stream, then the system ID.
  protected static Reader getReaderFromInputSource(InputSource source)
    throws HTMLParserException
  {
    Reader reader = source.getCharacterStream();
    if (reader != null)
    {
      return reader;
    }

    InputStream input = source.getByteStream();
    if (input != null)
    {
      // Uses the platform default charset; source.getEncoding() is ignored.
      return new InputStreamReader(input);
    }

    String systemId = source.getSystemId();
    if (systemId != null)
    {
      try
      {
        // Treats the system ID as a local file name.
        return new FileReader(systemId);
      }
      catch (FileNotFoundException e)
      {
        throw new HTMLParserException("Invalid InputSource", e);
      }
    }
    throw new HTMLParserException("Invalid InputSource");
  }
}

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:07 PM
To: htm...@li...
Subject: [Htmlparser-developer] Language Support

Hi Folks,
    Amit Rana is a new developer on HTMLParser. He has considerable
experience in internationalization - and he is currently working to
enable language support and switching. Two languages high on my list are
- French and Finnish, considering we've had French and Finnish
developers on this project. We also want to do Japanese support.
    The architecture that Amit is trying is nice - it will simply
require publishing of a standard English properties file - and for any
language support, a corresponding translated properties file will be
loaded up.
    Amit --> you can probably give a more detailed explanation here.

Regards,
Somik

RE: [Htmlparser-developer] Language Support

From: Claude D. <CD...@ar...> - 2002-08-06 16:41:59

I've been reading further on the use of InputSource in XML and recall
your interest in using a similar mechanism. In practice it seems easy
enough to do this and I've provided some sample code to illustrate. The
InputSource provides either a Reader, an InputStream or a System ID
(usually a local file name), and they can be checked for existence, in
that order.

I raise this issue (in this context) because one of the reasons the XML
community adopted the InputSource was that it could contain
additional information about the character set encoding (which is not
used here but could be). This becomes much more important if you start
considering internationalization.

Here's some sample code that is compilable (but untested - though it
should work):

import java.io.*;
import org.xml.sax.*;
import com.kizna.html.util.HTMLParserException;

public class InputSourceReader
  extends BufferedReader
{
  public InputSourceReader(InputSource source)
    throws HTMLParserException
  {
    super(getReaderFromInputSource(source));
  }

  // Tries the character stream, then the byte stream, then the system ID.
  protected static Reader getReaderFromInputSource(InputSource source)
    throws HTMLParserException
  {
    Reader reader = source.getCharacterStream();
    if (reader != null)
    {
      return reader;
    }

    InputStream input = source.getByteStream();
    if (input != null)
    {
      // Uses the platform default charset; source.getEncoding() is ignored.
      return new InputStreamReader(input);
    }

    String systemId = source.getSystemId();
    if (systemId != null)
    {
      try
      {
        // Treats the system ID as a local file name.
        return new FileReader(systemId);
      }
      catch (FileNotFoundException e)
      {
        throw new HTMLParserException("Invalid InputSource", e);
      }
    }
    throw new HTMLParserException("Invalid InputSource");
  }
}
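
The message notes that an InputSource can carry character set encoding information that the sample does not use. Below is a minimal sketch of how the byte-stream branch might honor it through the standard org.xml.sax.InputSource.getEncoding() accessor. The class name EncodingAwareReaders is hypothetical, and IOException stands in for HTMLParserException only to keep the sketch self-contained.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of HTMLParser.
class EncodingAwareReaders
{
  // Picks the charset named by the InputSource, or the platform default.
  static Charset charsetOf(InputSource source)
  {
    String encoding = source.getEncoding();
    return (encoding != null) ? Charset.forName(encoding) : Charset.defaultCharset();
  }

  static Reader open(InputSource source) throws IOException
  {
    Reader reader = source.getCharacterStream();
    if (reader != null)
    {
      return reader; // already decoded, so the encoding is irrelevant
    }
    InputStream input = source.getByteStream();
    if (input == null && source.getSystemId() != null)
    {
      // Assumption: the system ID is a local file name, as in the sample.
      input = new FileInputStream(source.getSystemId());
    }
    if (input == null)
    {
      throw new IOException("Invalid InputSource");
    }
    return new InputStreamReader(input, charsetOf(source));
  }
}
```

A caller would set the encoding on the InputSource (for example with setEncoding("ISO-8859-1")) before handing it over; when no encoding is set, the behaviour falls back to the platform default, as in the original sample.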


-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:07 PM
To: htm...@li...
Subject: [Htmlparser-developer] Language Support


Hi Folks,
    Amit Rana is a new developer on HTMLParser. He has considerable
experience in internationalization - and he is currently working to
enable language support and switching. Two languages high on my list are
- French and Finnish, considering we've had French and Finnish
developers on this project. We also want to do Japanese support.
    The architecture that Amit is trying is nice - it will simply
require publishing of a standard English properties file - and for any
language support, a corresponding translated properties file will be
loaded up.
    Amit --> you can probably give a more detailed explanation here.

Regards,
Somik

RE: [Htmlparser-developer] Update (Claude - ur feedback needed)

From: Claude D. <CD...@ar...> - 2002-08-06 16:19:06

My own expectations are fairly simple.

1) If the page is unparsable because it is ill-formed, the parser should
throw an exception. This is a priority behavior in that it is better for
the parser to report problems than it is for it to hang because the
internal logic to handle ill-formed documents has gotten too complicated
or unpredictable.

2) If it is possible for the parser to handle certain types of
ill-formed documents, this should be considered a desirable feature, but
never at the expense of handling properly formed documents or notifying
the library user that something went wrong if it couldn't.

It may be best to consider these separate issues. Since item 1 is
imperative and item 2 is a feature, you may want to consider making item
2 a selectable feature. That is to say, there may be a need to have a
'strict' mode that never handles ill-formed documents (which has plenty
of value in and of itself, given that some folks actually want to
recognize bad HTML), and another 'liberal' mode that does its best to
compensate for flaws in the document.

The problem with compensating for ill-formed documents will always be
that to handle it one way may interfere with an alternate
interpretation, which in some cases may also be correct. In cases where
there is no alternate interpretation, the solution is simple. In cases
where an alternate interpretation is possible, the code is inevitably
wrong to someone who wanted to see the alternate behavior. It's probably
best, then, to further separate the compensation criteria to handle ONLY
those cases where the interpretation is unambiguous.
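
The 'strict' versus 'liberal' split described above could look roughly like the sketch below, applied to a single attribute token. All names (AttributeModeSketch, Mode, MalformedTagException) are hypothetical and are not taken from the HTMLParser code base. The liberal mode repairs only the b"c" form discussed elsewhere in this thread, and anything else that is ill-formed raises an exception in either mode.

```java
import java.util.Map;

// Hypothetical sketch; none of these names exist in HTMLParser.
class AttributeModeSketch
{
  enum Mode { STRICT, LIBERAL }

  static class MalformedTagException extends Exception
  {
    MalformedTagException(String message) { super(message); }
  }

  // Parses one attribute token such as b="c", b or b"c" into the given map.
  static void parseAttribute(String token, Mode mode, Map<String, String> out)
    throws MalformedTagException
  {
    int eq = token.indexOf('=');
    int quote = token.indexOf('"');
    if (eq > 0 && (quote < 0 || eq < quote))
    {
      // Well formed: name=value, with optional surrounding quotes.
      out.put(token.substring(0, eq), stripQuotes(token.substring(eq + 1)));
    }
    else if (quote < 0 && eq < 0)
    {
      // Valueless attribute; treated as name="".
      out.put(token, "");
    }
    else if (mode == Mode.LIBERAL && quote > 0 && eq < 0)
    {
      // Ill-formed b"c": the liberal mode assumes a missing '='.
      out.put(token.substring(0, quote), stripQuotes(token.substring(quote)));
    }
    else
    {
      // Never hang or guess silently: report the problem to the caller.
      throw new MalformedTagException("Malformed attribute: " + token);
    }
  }

  static String stripQuotes(String value)
  {
    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\""))
    {
      return value.substring(1, value.length() - 1);
    }
    return value;
  }
}
```

With Mode.LIBERAL the token b"c" is stored as b with the value c, while Mode.STRICT throws for the same input, so a caller that wants to detect bad HTML still can.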
-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Tuesday, August 06, 2002 12:11 AM
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update (Claude - ur feedback needed)



Hi Kaarle,
    It seems like we may have acted hastily in correcting this (even in
HTMLImageScanner). I just tried Claude's page again, and I find that the
image is not parsed. Amit also mentioned sometime back that we ought to
flag some kind of error.
    Of course IE does not collapse - it continues parsing.
    So - I think you should not put in this fix to parseParameters(). I
should also rollback my fix and throw an error (?) - or probably throw a
bad image tag, where you cannot retrieve the data.
    OTOH - the other side of the coin is - if someday people decide to
kick IE out, and write a new browser with this parser, such pages would
work fine. In which case, it would be good to have fixes like this.

    I find myself tilting to the former argument, however attractive the
latter may sound. Amit, Claude--> what are your comments ?
    Claude - as this bug was reported by you - I'd like to ask what do
you expect ?

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila <mailto:kaa...@kk...>
To: so...@ya... ; htm...@li...
Sent: Tuesday, August 06, 2002 4:07 PM
Subject: Re: [Htmlparser-developer] Update

I still had a look at the code and made a small addition
that would accept <a b"c"> as <a b="c">
Would it be useful to have it inserted into CVS?
or is it OK as it is?

regards
Kaarle

PS! I can't access CVS until the evening


---- Original Message ----
From: so...@ya...
To: htm...@li...
Subject: Re: [Htmlparser-developer] Update
Date: Tue, 6 Aug 2002 15:42:29 +0900

>Hi Kaarle,
>    Thanks for the clarification.
>
>Regards,
>Somik
>
>  >I did not really do that I think. I just made a testcase that
>  >seems to verify
>  >that <a b"c"> will be assumed to be <a b>   , same as <a b="">
>  >
>  >Oh - then what happens to c, is it ignored?
>  >
>  >
>
>  Yes! That's what seems to happen. As I said I only added a testcase
>  to verify what happens. I did not change the code for this purpose.
>
>  regards
>  Kaarle
>
>
>
>  >Cheers,
>  >Somik
>  >
>  -----------------------------
>  Kaarle Kaila
>  http://www.iki.fi/kaila
>  mailto:kaa...@ik...
>
>
>
>  -------------------------------------------------------
>  This sf.net email is sponsored by:ThinkGeek
>  Welcome to geek heaven.
>  http://thinkgeek.com/sf
>  _______________________________________________
>  Htmlparser-developer mailing list
>  Htm...@li...
>  https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
>
-----------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...

RE: [Htmlparser-developer] HTMLParserFeedback

From: Claude D. <CD...@ar...> - 2002-08-06 16:05:56

Great! Check out the Log class in the JavaDocs for the Apache Commons
project:

http://jakarta.apache.org/commons/logging.html

It's intended to provide an abstraction that maps onto various logging
libraries (Log4J, JDK14 logging, etc). The API for the Log class looks
similar to the one I proposed. The main distinctions are that they've
used Object types for the message (I'd presume they count on the
toString method for logging) and they have more methods. I think there's
room for adding methods in the Feedback API, but I'd be inclined to do
it on an as-needed basis.

-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Monday, August 05, 2002 7:04 PM
To: htm...@li...
Subject: Re: [Htmlparser-developer] HTMLParserFeedback



Hi Claude,
    No no, I wasn't planning to use log4j for the parser :)
    Just mentioning that the model is so similar. J2SDK 1.4.x of course
has the same logging stuff in their APIs.
    I agree with your reasoning - we'll start putting in the feedback
classes down the line. Let me see if I can find some time in the weekend
to analyze this. If anyone else wants to try this integration - pls feel
free.

Regards,
Somik

----- Original Message -----
From: Claude Duguay <mailto:CD...@ar...>
To: htm...@li...
Sent: Monday, August 05, 2002 1:04 PM
Subject: RE: [Htmlparser-developer] HTMLParserFeedback

Please don't introduce any dependencies on other libraries. The Feedback
model is intended to allow users to redirect output to wherever they see
fit for their application. The default sends output to the console but
it's easy for implementers to make more local decisions based on their
context, by replacing the default implementation, so long as the
interface is valid. The whole idea of a library/framework is that the
input/output is controllable by the developer using it. You don't want
any coupling to other libraries. Let developers decide what's suitable
for their application. It's similar to the ErrorHandler in SAX, though
in their case, the output goes nowhere by default. It's up to users to
decide what to do.

You'll notice that the Feedback classes introduce a model that library
developers can use to direct output to a place that won't interfere with
the library user/developer's notion of where things could go. I've been
meaning to write something more specific about this design pattern but
things just keep getting in the way. In any case, use the Feedback
mechanism as a way of allowing users to decide where the output should
go or whether it should be ignored. Consider it a replacement for
System.out and System.err. Users can later decide whether the output
(which falls into simple categories) should be logged, sent to the
console, written to a GUI, rerouted to sockets, filtered by pipelines or
simply ignored. The beauty of this design is all in the uncoupling, such
that the library user decides what's relevant in their application.
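
The decoupling described above can be sketched as follows. The interface and class names are hypothetical, and the real HTMLParserFeedback interface may declare different methods; the three categories (info, warning, error) are assumed from the description in this thread. The library code talks only to the interface, and the application chooses the implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; the real HTMLParserFeedback may differ.
interface FeedbackSketch
{
  void info(String message);
  void warning(String message);
  void error(String message, Exception cause);
}

// Default behaviour described above: send everything to the console.
class ConsoleFeedback implements FeedbackSketch
{
  public void info(String message) { System.out.println("INFO: " + message); }
  public void warning(String message) { System.out.println("WARNING: " + message); }
  public void error(String message, Exception cause)
  {
    System.err.println("ERROR: " + message);
    if (cause != null) { cause.printStackTrace(); }
  }
}

// An application-supplied replacement that collects messages, e.g. for a GUI.
class CollectingFeedback implements FeedbackSketch
{
  final List<String> messages = new ArrayList<String>();
  public void info(String message) { messages.add("INFO: " + message); }
  public void warning(String message) { messages.add("WARNING: " + message); }
  public void error(String message, Exception cause) { messages.add("ERROR: " + message); }
}

// Stand-in for library code: it only ever talks to the interface.
class ParserStub
{
  private final FeedbackSketch feedback;

  ParserStub(FeedbackSketch feedback) { this.feedback = feedback; }

  void parse(String html)
  {
    feedback.info("parsing " + html.length() + " characters");
    if (html.indexOf('<') < 0) { feedback.warning("no tags found"); }
  }
}
```

Passing new ConsoleFeedback() reproduces the console default, while passing a CollectingFeedback lets the application decide later what to do with the messages, without the library depending on any logging package.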
-----Original Message-----
From: Somik Raha [mailto:so...@ya...]
Sent: Sun 8/4/2002 12:34 AM
To: htm...@li...
Cc:
Subject: [Htmlparser-developer] HTMLParserFeedback



Hi Developers,
    This is to initiate a discussion on the next step: integrating
feedback into the parser. Claude had submitted the HTMLParserFeedback
interface (in the util package), which allows us to log the activity of
the parser, inform when errors occur, and show warnings.

    I am familiar with log4j, and this sounds pretty similar - in terms
of functionality, it sounds good. But in terms of performance, my
question is :
[1] Will this result in an unacceptable performance hit ?
[2] Should we provide alternate constructors or modify existing API ? If
we provide alternates, then what default behaviour would be best ? Are
we talking about default callback objects - if yes, the strings created
for each call would slow down the parser.
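
One common way to limit the cost raised in question [2] is a do-nothing default callback combined with a guard, so that message strings are built only when someone is listening. The sketch below illustrates that idea under assumed, hypothetical names (GuardedFeedback, SilentFeedback, GuardedScanner). It does not measure anything and is not the HTMLParser API.

```java
// Hypothetical names throughout; this is not the HTMLParser API.
interface GuardedFeedback
{
  boolean isInfoEnabled();
  void info(String message);
}

// Default callback that does nothing and reports itself as disabled.
class SilentFeedback implements GuardedFeedback
{
  public boolean isInfoEnabled() { return false; }
  public void info(String message) { }
}

// A listener that simply counts the messages it receives.
class CountingFeedback implements GuardedFeedback
{
  int received = 0;
  public boolean isInfoEnabled() { return true; }
  public void info(String message) { received++; }
}

class GuardedScanner
{
  private final GuardedFeedback feedback;
  int stringsBuilt = 0; // instrumentation for this sketch only

  GuardedScanner(GuardedFeedback feedback) { this.feedback = feedback; }

  void scan(String[] tags)
  {
    for (int i = 0; i < tags.length; i++)
    {
      // The guard skips the concatenation entirely when nobody listens.
      if (feedback.isInfoEnabled())
      {
        stringsBuilt++;
        feedback.info("tag " + i + ": " + tags[i]);
      }
    }
  }
}
```

With SilentFeedback as the default, the per-call cost is one interface call and a branch; an alternate constructor could accept a caller-supplied implementation when output is wanted.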

    It would be great to have some thoughts on this.

Regards,
Somik

[Htmlparser-developer] Re: [Htmlparser-user] Parsing query

From: Somik R. <so...@ya...> - 2002-08-06 07:23:36

Hi Dhaval,

I would like to know how "checked" would be reflected in the HTMLTag
during the parsing procedure.

This is what "should" happen - the tag will be treated the same as
<INPUT type="checkbox" name="Authorize" value="Y" checked="">

However, this is not what actually happens - I've written a testcase to
demonstrate this, and we should be fixing it soon.
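
The intended behaviour, where a valueless attribute such as checked is treated like checked="", can be illustrated with the small sketch below. It is a hypothetical stand-alone attribute parser written for this illustration (BooleanAttributeSketch is an assumed name), not the HTMLTag implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the HTMLTag implementation.
class BooleanAttributeSketch
{
  // Parses the attribute part of a tag, e.g. type="checkbox" checked.
  static Map<String, String> parse(String attributes)
  {
    Map<String, String> result = new LinkedHashMap<String, String>();
    int i = 0;
    int n = attributes.length();
    while (i < n)
    {
      while (i < n && Character.isWhitespace(attributes.charAt(i))) { i++; }
      int nameStart = i;
      while (i < n && attributes.charAt(i) != '='
             && !Character.isWhitespace(attributes.charAt(i))) { i++; }
      if (i == nameStart) { break; } // end of input, or a stray '=' without a name
      String name = attributes.substring(nameStart, i);
      String value = ""; // a valueless attribute such as checked maps to ""
      if (i < n && attributes.charAt(i) == '=')
      {
        i++; // skip '='
        if (i < n && attributes.charAt(i) == '"')
        {
          int close = attributes.indexOf('"', i + 1);
          if (close < 0) { close = n; } // unterminated quote: take the rest
          value = attributes.substring(i + 1, close);
          i = Math.min(close + 1, n);
        }
        else
        {
          int valueStart = i;
          while (i < n && !Character.isWhitespace(attributes.charAt(i))) { i++; }
          value = attributes.substring(valueStart, i);
        }
      }
      result.put(name, value);
    }
    return result;
  }
}
```

Parsing the attribute text of the two INPUT forms quoted in this message yields equal maps, each containing checked mapped to the empty string. Every loop iteration either consumes at least one character or exits, so the sketch cannot hang on odd input.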

Kaarle - I've opened a bug report, can you check this ?
Thanks a lot.

Regards,
Somik
  ----- Original Message -----
  From: dha...@or...
  To: htm...@li...
  Sent: Tuesday, August 06, 2002 4:04 PM
  Subject: RE: [Htmlparser-user] Parsing query


  Hi,

  I have a small doubt. For a checkbox or a radio button the following
  kind of tag is very normal.

  <INPUT type="checkbox" name="Authorize" value="Y" checked>

  I would like to know how "checked" would be reflected in the HTMLTag
  during the parsing procedure.

  Thanx in advance,
  Dhaval

Re: [Htmlparser-developer] Update (Claude - ur feedback needed)

From: Somik R. <so...@ya...> - 2002-08-06 07:17:34

Hi Kaarle,
    It seems like we may have acted hastily in correcting this (even in
HTMLImageScanner). I just tried Claude's page again, and I find that the
image is not parsed. Amit also mentioned sometime back that we ought to
flag some kind of error.
    Of course IE does not collapse - it continues parsing.
    So - I think you should not put in this fix to parseParameters(). I
should also rollback my fix and throw an error (?) - or probably throw a
bad image tag, where you cannot retrieve the data.
    OTOH - the other side of the coin is - if someday people decide to
kick IE out, and write a new browser with this parser, such pages would
work fine. In which case, it would be good to have fixes like this.

    I find myself tilting to the former argument, however attractive the
latter may sound. Amit, Claude--> what are your comments ?
    Claude - as this bug was reported by you - I'd like to ask what do
you expect ?

Regards,
Somik


  ----- Original Message -----
  From: Kaarle Kaila
  To: so...@ya... ; htm...@li...
  Sent: Tuesday, August 06, 2002 4:07 PM
  Subject: Re: [Htmlparser-developer] Update


  I still had a look at the code and made a small addition
  that would accept <a b"c"> as <a b="c">
  Would it be useful to have it inserted into CVS?
  or is it OK as it is?

  regards
  Kaarle

  PS! I can't access CVS until the evening


  ---- Original Message ----
  From: so...@ya...
  To: htm...@li...
  Subject: Re: [Htmlparser-developer] Update
  Date: Tue, 6 Aug 2002 15:42:29 +0900

  >Hi Kaarle,
  >    Thanks for the clarification.
  >
  >Regards,
  >Somik
  >
  >  >I did not really do that I think. I just made a testcase that
  >  >seems to verify
  >  >that <a b"c"> will be assumed to be <a b>   , same as <a b="">
  >  >
  >  >Oh - then what happens to c, is it ignored?
  >  >
  >  >
  >
  >  Yes! That's what seems to happen. As I said I only added a testcase
  >  to verify what happens. I did not change the code for this purpose.
  >
  >  regards
  >  Kaarle
  >
  >
  >
  >  >Cheers,
  >  >Somik
  >  >
  >  -----------------------------
  >  Kaarle Kaila
  >  http://www.iki.fi/kaila
  >  mailto:kaa...@ik...
  >
  >
  >
  >  -------------------------------------------------------
  >  This sf.net email is sponsored by:ThinkGeek
  >  Welcome to geek heaven.
  >  http://thinkgeek.com/sf
  >  _______________________________________________
  >  Htmlparser-developer mailing list
  >  Htm...@li...
  >  https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
  >
  -----------------------------
  Kaarle Kaila
  http://www.iki.fi/kaila
  mailto:kaa...@ik...

14 messages has been excluded from this view by a project administrator.
