Hi,
Derrick has opened a lovely thread of discussion and I would like to add some
of my own thoughts.
Currently the parser does not store any tabs or newlines that may be present on
the HTML page. However, if one wants to parse the page and reproduce it, it is
imperative that the formatting remains the same, i.e. the parsed page and the
unparsed page should look identical (apart from anything deliberately added
during the parsing routine).
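
To make the requirement concrete, here is a minimal sketch of what I mean by a
lossless round trip. It is not the htmlparser API: the class RoundTripSketch
and its methods are hypothetical names, the tag detection is deliberately naive
(it ignores comments and a '>' inside attribute values), and it uses current
Java syntax. The only point is that every character, including tabs and
newlines, ends up in some token, so joining the tokens gives back the page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the htmlparser API: split a page into tag tokens
// and text tokens without discarding any character.
class RoundTripSketch {

    // Returns alternating tag and text tokens; whitespace stays inside the text tokens.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int end;
            if (html.charAt(pos) == '<') {
                // Tag token: runs up to and including the next '>'.
                int close = html.indexOf('>', pos);
                end = (close < 0) ? html.length() : close + 1;
            } else {
                // Text token: runs up to, but not including, the next '<'.
                int open = html.indexOf('<', pos);
                end = (open < 0) ? html.length() : open;
            }
            tokens.add(html.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    // Concatenates the tokens; for unmodified tokens this equals the input.
    static String reproduce(List<String> tokens) {
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            out.append(token);
        }
        return out.toString();
    }
}
```

A parsing routine could then insert or change individual tokens and everything
it did not touch would be written back unchanged.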
I think this is worth some thought. I may be selfish in suggesting it, since my
usage requires reproducing the HTML page after parsing it and adding some
information depending on the tags.
Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457
-----Original Message-----
From: DerrickOswald [mailto:Der...@ro...]
Sent: Monday, December 16, 2002 7:02 PM
To: htmlparser-developer
Cc: DerrickOswald
Subject: [Htmlparser-developer] version 1.3
This message is just to open discussion.
Here are some enhancements that might best be left till the next version.
POST constructor.
The two basic constructors that HTMLParser has take either a string
URL or a HTMLReader. This shifts the onus of performing HTTP onto the API
user for POST operations. It might be good to have a constructor taking a
HttpURLConnection or URLConnection argument, where a primed and loaded
connection is passed to the parser.
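
As a rough illustration of a "primed and loaded" connection, the sketch below
prepares a POST request using only java.net classes. The class
PostConnectionSketch, its method names and the form-encoded body are
assumptions made for the example, and it uses current Java syntax. A
constructor taking the connection does not exist yet, so it only appears in a
comment.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a connection the caller could hand to the parser.
class PostConnectionSketch {

    // Encodes name/value pairs as application/x-www-form-urlencoded.
    static String formBody(String[][] fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (String[] field : fields) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field[0], "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field[1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available; wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    // Configures the request but does not open the network connection yet.
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            return conn;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sends the body; this is the step that actually connects to the server.
    static void send(HttpURLConnection conn, String body) throws IOException {
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // Proposed, not existing: new HTMLParser(conn) would read the response.
    }
}
```

With something like this the API user still decides what is posted, but the
parser would take over reading the response from the connection.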
Tables
The current version flattens tables, putting the onus on the API user to
syntactically walk through the table data to get to a certain table
entry. It may be useful to nest table entries, similar to what the
FORM tag does now, but have it correctly generate rows and columns.
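
To show the kind of result I have in mind, here is a small sketch that groups a
flat token sequence into rows and columns. Everything in it is hypothetical:
the class TableNestingSketch, and the convention that the token "TR" starts a
row, "TD" starts a cell and any other token is cell text. The real tag classes
would look different, and nested tables are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns a flattened table into rows of cell texts.
class TableNestingSketch {

    static List<List<String>> nest(List<String> flat) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = null;
        boolean cellOpen = false;
        for (String token : flat) {
            if (token.equals("TR")) {
                // A new row begins; any open cell ends with the previous row.
                row = new ArrayList<>();
                rows.add(row);
                cellOpen = false;
            } else if (token.equals("TD")) {
                // Tolerate a cell that appears before any explicit row.
                if (row == null) {
                    row = new ArrayList<>();
                    rows.add(row);
                }
                row.add("");
                cellOpen = true;
            } else if (cellOpen) {
                // Text belongs to the most recently opened cell.
                int last = row.size() - 1;
                row.set(last, row.get(last) + token);
            }
            // Text outside any cell is ignored in this sketch.
        }
        return rows;
    }
}
```

The API user could then ask for row 1, column 0 directly instead of counting
tags.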
Logging
The use of a feedback object is adequate, but JDK version 1.4 has a rich
API, java.util.logging, that we might want to emulate (presuming we
don't want to force JDK 1.4 usage).
charset
Currently the charset directive within the HTML page is ignored. There
may be a need to honour this parameter on the Content-Type field.
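
As a sketch of what honouring it could involve, the helper below pulls the
charset parameter out of a Content-Type value such as
"text/html; charset=ISO-8859-1". The class CharsetSketch, the method name and
the caller-supplied fallback are assumptions for the example, not existing
parser code.

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type value.
class CharsetSketch {

    // Returns the charset named in contentType, or fallback if none is present.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // Parameter names are matched case-insensitively.
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

The returned name could then be given to an InputStreamReader when the page
stream is read, although deciding when to switch encodings part way through a
page is a separate question.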
beans
It might be nice to create one or more JavaBeans that can be used
within GUI IDEs. The predefined behavior might be what the
parserapplications do now, but exposing some accessors on HTMLParser and
providing a zero-arg constructor may also prove useful.
executable jar
There is no default application for the htmlparser.jar, i.e. java -jar
htmlparser.jar doesn't do anything at the moment. A little GUI
application might be nice. I'm not talking a browser, but rather a demo
of the applications (i.e. a tree view of the links a la robot, a text
view a la StringExtractor, a list of mail addresses a la ripper, etc.).
This would utilize the beans mentioned above.