Thread: RE: [Htmlparser-user] Quick beginner question...
From: <dha...@or...> - 2002-09-05 05:18:26
Attachments:
BDY.RTF
Hi Joe,

In HTMLParser, every tag is printed on a new line, so the output you are getting is the expected output. We have discussed this problem on the list; you can probably find the discussion in the archives. More importantly, though, HTMLParser changes the HTML file given to it, which it must not do (whether the rendering difference is a browser bug or not should be immaterial to the parser). In other words, the parser's input and output must match, especially in the aspects that affect presentation. I too think something should be done about it. What do you say, Somik?

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: jryburn [mailto:jr...@ya...]
Sent: Thursday, September 05, 2002 12:48 AM
To: htmlparser-user
Cc: jryburn
Subject: [Htmlparser-user] Quick beginner question...

I'm not sure if this is a browser bug or a parser bug, but the following code...

<TABLE width="100%" cellspacing="0" cellpadding="1" border="0">
<TR>
<TD valign="top">
<FONT face="arial" size="-1"><B>•</B> </FONT>
</TD>
<TD>
<A href="s/15341"><FONT face="arial" size="-1">Bush vows to seek Congress' OK on Iraq</FONT></A>
</TD>
</TR>
</TABLE>

when parsed by the HTML parser and rewritten, is output with the first <TD> element broken up as follows...

<FONT face="arial" size="-1">
<B>
•
</B>
</FONT>

This renders differently than when they are joined. I didn't think whitespace was supposed to affect presentation, but here it seems to be significant in both Internet Explorer and Mozilla. This is from the news headline table on the right of the main 'www.yahoo.com' page. Is there a way to output this so that it renders correctly using HTMLParser?

Regards,
Joe Ryburn

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
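The mismatch Dhaval describes can be reproduced with a toy example. The following is a minimal sketch, not HTMLParser's code: the class `RoundTripDemo` and its methods `split`, `onePerLine` and `faithful` are hypothetical, and the regex tokenizer ignores comments, scripts and `>` inside attribute values. Printing each tag and trimmed text chunk on its own line inserts line breaks inside the inline `<FONT>`/`<B>` content (which a browser treats as whitespace) and drops the space before `</FONT>`; concatenating the chunks unchanged reproduces the input exactly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RoundTripDemo {
    // Hypothetical toy tokenizer: a tag is "<...>", anything else is text.
    // Not HTMLParser's API; it ignores comments, scripts and '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static List<String> split(String html) {
        List<String> chunks = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                chunks.add(html.substring(last, m.start())); // text between tags, whitespace kept
            }
            chunks.add(m.group()); // the tag itself
            last = m.end();
        }
        if (last < html.length()) {
            chunks.add(html.substring(last)); // trailing text
        }
        return chunks;
    }

    // Puts every non-blank chunk on its own line, trimming surrounding whitespace.
    static String onePerLine(List<String> chunks) {
        StringBuilder out = new StringBuilder();
        for (String c : chunks) {
            String t = c.trim();
            if (!t.isEmpty()) {
                out.append(t).append('\n');
            }
        }
        return out.toString();
    }

    // Emits every chunk exactly as it was read.
    static String faithful(List<String> chunks) {
        return String.join("", chunks);
    }

    public static void main(String[] args) {
        String in = "<FONT face=\"arial\" size=\"-1\"><B>\u2022</B> </FONT>";
        List<String> chunks = split(in);
        System.out.println(onePerLine(chunks).equals(in)); // false
        System.out.println(faithful(chunks).equals(in));   // true
    }
}
```

The one-per-line printer is only an approximation of the behaviour described in the thread; the point is that any printer which adds or trims whitespace between inline tags cannot guarantee identical input and output.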
From: <dha...@or...> - 2002-09-06 05:00:48
Attachments:
BDY.RTF
Yeah, that's the point. However, it is not only newlines but also any spaces, any tabs, etc. A well-formed HTML document given to the parser must come out the same at the output of the parser.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

-----Original Message-----
From: jryburn [mailto:jr...@ya...]
Sent: Thursday, September 05, 2002 7:14 PM
To: htmlparser-user
Cc: jryburn
Subject: RE: [Htmlparser-user] Quick beginner question...

I'd suggest that we also parse newlines on the input, and perhaps store them as a tag as well. Then we can walk through the parsed code and print all other tags without newlines, and the newline tag would then be printed to output as a newline.

Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of dha...@or...
Sent: Thursday, September 05, 2002 12:16 AM
To: htm...@li...
Subject: RE: [Htmlparser-user] Quick beginner question...
From: Somik R. <so...@ya...> - 2002-09-06 09:49:19
Hi Folks,

I am on the road now, in Singapore. I will be travelling for the next 2 weeks, so I cannot be regular with mail until then.

Dhaval and Joe: indeed, I understand the issue, that sometimes things get messed up. But storing all newlines would make the parser a whole lot more complex than it is right now. I am, however, open to examining it when I find some time (from October). This does not stop anyone from doing an analysis to see what can be done. That would be a good contribution to the community. So please go ahead, and let us know what you think.

Cheers,
Somik

----- Original Message -----
From: <dha...@or...>
To: <htm...@li...>
Sent: Friday, September 06, 2002 1:00 PM
Subject: RE: [Htmlparser-user] Quick beginner question...

Yeah, that's the point. However, it is not only newlines but also any spaces, any tabs, etc. A well-formed HTML document given to the parser must come out the same at the output of the parser.
From: Joe R. <jr...@ya...> - 2002-09-05 13:43:42
I'd suggest that we also parse newlines on the input, and perhaps store them as a tag as well. Then we can walk through the parsed code and print all other tags without newlines, and the newline tag would then be printed to output as a newline.

Joe Ryburn
Technical Director
Lead Router LLC
Office 501-221-8865
Mobile 501-249-5015

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of dha...@or...
Sent: Thursday, September 05, 2002 12:16 AM
To: htm...@li...
Subject: RE: [Htmlparser-user] Quick beginner question...
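Joe's proposal can be sketched as follows, assuming a simple three-kind node model (`TAG`, `TEXT`, `NEWLINE`). The class `NewlineNodeDemo` and its `parse`/`print` methods are hypothetical and not part of HTMLParser, and the regex tokenizer is a toy that ignores comments, scripts and `>` inside attribute values. Line breaks in the input become `NEWLINE` nodes; when printing, tags and text are emitted with no added line breaks, and each `NEWLINE` node is written back as a newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NewlineNodeDemo {
    // Hypothetical node model for illustration only; not HTMLParser's classes.
    enum Kind { TAG, TEXT, NEWLINE }

    static final class Node {
        final Kind kind;
        final String value;
        Node(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static List<Node> parse(String html) {
        List<Node> nodes = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            addText(nodes, html.substring(last, m.start()));
            nodes.add(new Node(Kind.TAG, m.group()));
            last = m.end();
        }
        addText(nodes, html.substring(last));
        return nodes;
    }

    // Splits text on '\n', recording each line break as its own NEWLINE node.
    // Other characters (including spaces and tabs) stay in TEXT nodes verbatim.
    private static void addText(List<Node> nodes, String text) {
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                if (i > start) nodes.add(new Node(Kind.TEXT, text.substring(start, i)));
                nodes.add(new Node(Kind.NEWLINE, "\n"));
                start = i + 1;
            }
        }
        if (start < text.length()) nodes.add(new Node(Kind.TEXT, text.substring(start)));
    }

    // Tags and text are printed with no added line breaks; only NEWLINE nodes produce one.
    static String print(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node n : nodes) {
            out.append(n.kind == Kind.NEWLINE ? "\n" : n.value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<TD valign=\"top\">\n<FONT face=\"arial\" size=\"-1\"><B>\u2022</B> </FONT>\n</TD>";
        System.out.println(print(parse(in)).equals(in)); // true
    }
}
```

Because this sketch keeps text chunks verbatim, spaces and tabs survive as well, which covers Dhaval's point that newlines are not the only whitespace that matters; a printer that trimmed text nodes would still lose them.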