RE: [Htmlparser-developer] version 1.3


Hi,

Derrick has opened a lovely thread of discussion and I would like to add some 
of my own thoughts.

Currently the parser does not store any tabs or newlines that may be present on 
the HTML page. However, if one wants to parse the page and reproduce it, it is 
imperative that the formatting remains the same, i.e. the parsed page and the 
unparsed page should look identical (apart, obviously, from anything added 
during the parsing routine).
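
To make the requirement concrete, here is a rough sketch of a lossless 
round trip. It is not the parser's code: RoundTripSketch, split and join are 
hypothetical names, and the tag detection is deliberately naive (it ignores 
comments, scripts and '>' inside attribute values). The only point is that 
every character, including tabs and newlines, ends up in some piece.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HTMLParser.
class RoundTripSketch {

    /** Splits the input into alternating tag and text pieces, dropping nothing. */
    static List<String> split(String html) {
        List<String> pieces = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int end;
            if (html.charAt(pos) == '<') {
                // A tag runs up to and including the next '>'.
                int close = html.indexOf('>', pos);
                end = (close < 0) ? html.length() : close + 1;
            } else {
                // Text (whitespace included) runs up to the next '<'.
                int open = html.indexOf('<', pos);
                end = (open < 0) ? html.length() : open;
            }
            pieces.add(html.substring(pos, end));
            pos = end;
        }
        return pieces;
    }

    /** Reassembles the pieces; the result equals the input of split(). */
    static String join(List<String> pieces) {
        return String.join("", pieces);
    }
}
```

Information added during parsing would then be inserted as extra pieces, 
leaving the original formatting untouched.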

I think this is worth some thought. I may be very selfish in suggesting it, 
since my usage requires reproducing the HTML page after parsing it and adding 
some information depending on the tags.

Regards,

Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

-----Original Message-----
From: DerrickOswald [mailto:Der...@ro...]
Sent: Monday, December 16, 2002 7:02 PM
To: htmlparser-developer
Cc: DerrickOswald
Subject: [Htmlparser-developer] version 1.3

This message is just to open discussion.
Here are some enhancements that might best be left till the next version.

POST constructor.
The two basic constructors that HTMLParser has take either a string 
URL or an HTMLReader.  This shifts the onus of performing HTTP onto the API 
user for POST operations.  It might be good to have a constructor taking an 
HttpURLConnection or URLConnection argument, where a primed and loaded 
connection is passed to the parser.
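
As a sketch of what "primed and loaded" could mean on the caller's side, 
the following uses only the standard java.net API. PostConnectionSketch and 
prepare are hypothetical names, and the parser constructor that would accept 
the result is the proposal itself, so it is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical caller-side helper; not part of HTMLParser.
class PostConnectionSketch {

    /** Configures a connection for a form POST. Nothing is sent yet. */
    static HttpURLConnection prepare(String address) {
        try {
            HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
            connection.setRequestMethod("POST");
            connection.setDoOutput(true); // a request body will be written
            connection.setRequestProperty(
                "Content-Type", "application/x-www-form-urlencoded");
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller would then write the form data to connection.getOutputStream() 
and hand the connection over, leaving the parser to read the response.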

Tables
The current version flattens tables, putting the onus on the API user to 
syntactically walk through the table data to get to a certain table 
entry.  It may be useful to nest table entries, similar to what the 
FORM tag does now, but have it correctly generate rows and columns.
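
One possible shape for such a nested result, purely as an illustration: 
TableSketch and its methods are made-up names, cells are reduced to plain 
strings, and row or column spans are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data structure; not an existing HTMLParser class.
class TableSketch {
    private final List<List<String>> rows = new ArrayList<>();

    /** Called when a TR tag is encountered. */
    void startRow() {
        rows.add(new ArrayList<>());
    }

    /** Called for each TD or TH; opens a row if the markup omitted TR. */
    void addCell(String text) {
        if (rows.isEmpty()) {
            startRow();
        }
        rows.get(rows.size() - 1).add(text);
    }

    int rowCount() {
        return rows.size();
    }

    /** Direct access by zero-based row and column. */
    String cell(int row, int column) {
        return rows.get(row).get(column);
    }
}
```

With something like this the API user could ask for cell(1, 0) directly 
instead of counting tags in the flattened output.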

Logging
The use of a feedback object is adequate, but JDK version 1.4 has a rich 
API, java.util.logging, that we might want to emulate (presuming we 
don't want to force JDK 1.4 usage).
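
A minimal sketch of what emulating it might involve, without depending on 
JDK 1.4: the level names are borrowed from java.util.logging.Level, but 
LeveledFeedback and everything else here is hypothetical and unrelated to 
the current feedback object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical leveled feedback object; does not use java.util.logging.
class LeveledFeedback {
    static final int INFO = 0;
    static final int WARNING = 1;
    static final int SEVERE = 2;

    private int threshold = INFO;
    private final List<String> messages = new ArrayList<>();

    /** Messages below this level are discarded. */
    void setLevel(int level) {
        threshold = level;
    }

    void log(int level, String message) {
        if (level >= threshold) {
            messages.add(message);
        }
    }

    List<String> messages() {
        return messages;
    }
}
```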

charset
Currently the charset directive within the HTML page is ignored. There 
may be a need to honour this parameter on the Content-Type field.
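
For illustration, extracting the parameter from a Content-Type value (as 
found in the HTTP header or in a META tag's content attribute) could look 
roughly like this. CharsetSketch and charsetOf are hypothetical names, and 
the parsing is simplified: it does not handle spaces around the '=' sign.

```java
import java.util.Locale;

// Hypothetical helper; not part of HTMLParser.
class CharsetSketch {

    /** Returns the charset parameter of a Content-Type value, or the fallback. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters follow it.
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = part.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

The returned name could then be used when constructing the reader, e.g. via 
java.io.InputStreamReader(InputStream, String).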

beans
It might be nice to create one or more java beans that can be used 
within GUI IDEs. The predefined behavior might be what the 
parserapplications do now, but exposing some accessors on HTMLParser and 
providing a zero arg constructor may also prove useful.
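
A bare-bones sketch of the bean conventions involved (zero arg constructor, 
get/set accessors, property change events), using only java.beans. 
ParserBeanSketch and its single "url" property are hypothetical; a real bean 
would be a public class in its own file and would actually drive the parser.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

// Hypothetical bean; it does not parse anything.
class ParserBeanSketch implements Serializable {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String url;

    /** Zero arg constructor, as GUI builders expect. */
    public ParserBeanSketch() {
    }

    public String getUrl() {
        return url;
    }

    /** Sets the page to parse and notifies listeners of the change. */
    public void setUrl(String newUrl) {
        String old = url;
        url = newUrl;
        changes.firePropertyChange("url", old, newUrl);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```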

executable jar
There is no default application for the htmlparser.jar, i.e. java -jar 
htmlparser.jar doesn't do anything at the moment. A little GUI 
application might be nice. I'm not talking about a browser, but rather a demo 
of the applications (i.e. a tree view of the links a la robot, a text 
view a la StringExtractor, a list of mail addresses a la ripper, etc.). 
This would utilize the beans mentioned above.

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer