RE: [Htmlparser-user] Change in Layout

Hi,

I would definitely appreciate replacing the hard-coded end-of-line
character with the end-of-line character detected from the system
property. Currently I read the entire file and replace the hard-coded
EOL with the system property EOL.

I think the last EOL for toHTML() should be removed, and instead every
"\n" should be parsed and reproduced exactly as it appeared in the
source. Preserving layout should be as important as performance. Also,
my feeling is that this tool will be used mostly by developers during
development and not at runtime (though that is always possible), and
hence performance may not be an issue here.

Please feel free to criticize my opinion.

My predicament is typically as follows:

My team is building a framework which is used by many projects in my
organization. All the other projects create HTML with their own
look-and-feel. To use the framework, they need to convert these files
into JSPs (using a tool developed by my team). Apart from just
changing the extension ;), the tool also adds lots of JSP code and makes
certain modifications to the HTML tags (not the presentation tags,
though). If the layout changes after the JSP is created, they will have
to spend time correcting this anomaly again, and will need to keep doing
so every time they change their HTML page or the tool is updated. Now I
guess you can understand why I feel so strongly about maintaining layout.

At the same time, I am aware that the parser is here for everyone's needs
and will be driven accordingly. Hence I am just presenting my point of
view.
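
For reference, the workaround mentioned at the top of this message (reading the whole output and swapping the hard-coded EOL for the system one) could be sketched roughly as below. This is only an illustration under my own assumptions: the class name EolNormalizer and the method normalizeEol are made up for this example and are not part of HTMLParser. The only standard piece is System.getProperty("line.separator"), which returns the platform's line separator.

```java
import java.util.regex.Matcher;

// Hypothetical helper, not part of HTMLParser: rewrites every line ending
// in a string to one chosen separator.
class EolNormalizer {

    // Replaces "\r\n", a lone "\r", or a lone "\n" with the given separator.
    // "\r\n" is listed first so a Windows line ending is treated as one EOL.
    public static String normalizeEol(String text, String eol) {
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(eol));
    }

    public static void main(String[] args) {
        // Platform line separator, e.g. "\n" on Unix or "\r\n" on Windows.
        String systemEol = System.getProperty("line.separator");

        // Stand-in for text produced with a hard-coded "\n".
        String html = "<TD align=\"left\">\n<img src=\"images/right_h1.gif\">\n</TD>\n";

        System.out.print(normalizeEol(html, systemEol));
    }
}
```

Note that this only changes which EOL character is written. It does not remove the extra line breaks that are added between tags, so it does not address the layout change itself.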

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-8290019 Extn. 1457

   -----Original Message-----
   From: somik [mailto:so...@ya...]
   Sent: Thursday, August 08, 2002 12:33 PM
   To: htmlparser-user
   Cc: somik
   Subject: Re: [Htmlparser-user] Change in Layout

   Hi Dhaval,

   This is actually a feature. If we tried to reproduce exactly the
   output as originally parsed, the performance of the parser could be
   compromised. Hence, producing corresponding output with slightly
   different formatting was chosen, in order to keep the design of the
   parser simple.

   However, related to this is an interesting issue, on which
   community feedback would be valuable. Currently, the formatting of
   toHTML() is rather arbitrary (in my opinion). By this I am
   particularly referring to the usage of end-of-line characters.
   Considering that end-of-line characters differ between operating
   systems, would it be a good idea to replace the hard-coded
   end-of-line characters with the detected end-of-line character for
   the particular OS?

   Regards,
   Somik

      ----- Original Message -----
      From: dha...@or...
      To: htm...@li...
      Sent: Thursday, August 08, 2002 3:52 PM
      Subject: [Htmlparser-user] Change in Layout

      Hi,

      I have an HTML page which I am trying to modify. During this
      process, I have come across a quirk. I don't know whether the
      problem is browser related or parser related.

      The following HTML code:
      <TD align="left" valign="top" width="18"><img
      src="images/right_h1.gif"
      width="18" height="22"></TD>

      gets converted to
      <TD align="left" valign="top" width="18">
      <img src="images/right_h1.gif" width="18" height="22">
      </TD>

      This happens whenever I print back the parsed data using
      tag.toHTML().

      These two seem to be the same, but presentation-wise I see
      different output. Is it right for tag.toHTML() to print out the
      EOL character at the end of the tag?

      Regards,

      Dhaval Udani
      Senior Analyst
      M-Line, QPEG
      OrbiTech Solutions Ltd.
      +91-22-8290019 Extn. 1457

         -----Original Message-----
         From: somik [mailto:so...@ya...]
         Sent: Wednesday, August 07, 2002 10:26 AM
         To: htmlparser-user
         Cc: somik; htmlparser-developer
         Subject: Re: [Htmlparser-user] Another Ill-Formed Example

         Hi Claude,
         This has been handled, related to the earlier fix. All potential
         infinite loops have been removed, and there will be no more
         hangs, only HTMLParserExceptions from now on.
         There will be a release with all these fixes this weekend.

         Regards,
         Somik

            ----- Original Message -----
            From: Claude Duguay
            To: htm...@li...
            Sent: Wednesday, August 07, 2002 3:35 AM
            Subject: [Htmlparser-user] Another Ill-Formed Example

            Here's some markup we found in another document that causes
            the HTMLParser to hang.

            "<TITLE>KRP VALIDATION<PROCESS/TITLE>"

            So far, we've had 4 documents cause our process to come to a
            grinding halt. I would much prefer a policy of throwing
            exceptions rather than hanging, asap, followed by
            consideration of whether unusual markup can be handled more
            elegantly in a subsequent phase. Thanks to everyone, as
            always.
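
The "exception rather than hang" policy requested above can be illustrated with a scanning loop that either moves forward on every pass or throws. This is a minimal sketch under stated assumptions, not HTMLParser's actual implementation: the names TagScanException and SafeTagScanner are hypothetical, and the scanner only splits out "<...>" runs without interpreting them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type for this sketch (HTMLParser itself uses
// HTMLParserException, per the reply above).
class TagScanException extends Exception {
    TagScanException(String message) {
        super(message);
    }
}

// Hypothetical scanner: collects every "<...>" run in the input.
class SafeTagScanner {

    public static List<String> scanTags(String html) throws TagScanException {
        List<String> tags = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int open = html.indexOf('<', pos);
            if (open < 0) {
                break; // no more tags, normal termination
            }
            int close = html.indexOf('>', open + 1);
            if (close < 0) {
                // Malformed input: report it instead of retrying forever.
                throw new TagScanException("Unterminated tag at offset " + open);
            }
            tags.add(html.substring(open, close + 1));
            pos = close + 1; // strictly greater than before, so the loop ends
        }
        return tags;
    }

    public static void main(String[] args) throws TagScanException {
        // The markup from the report above.
        System.out.println(scanTags("<TITLE>KRP VALIDATION<PROCESS/TITLE>"));
    }
}
```

Because pos increases on every pass and the only other exits are the break and the throw, the loop cannot spin on ill-formed input; the worst case is an exception the caller can catch and log.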