RE: [Htmlparser-developer] version 1.3
From: <dha...@or...> - 2002-12-16 13:35:22
Hi,

Derrick has opened a lovely thread of discussion and I would like to add some of my own thoughts.

Currently the parser does not store any tabs or newlines that may be present on the HTML page. However, if one wants to parse the page and reproduce it, it is imperative that the formatting remains the same, i.e. the parsed page and the unparsed page should not differ in look and feel (obviously, unless something is added during the parsing routine). I think it is worthwhile giving a thought to this. I may be very selfish in suggesting it, since my usage requires producing the HTML page after parsing it and adding some information depending on the tags.

Regards,
Dhaval Udani
Senior Analyst
M-Line, QPEG
OrbiTech Solutions Ltd.
+91-22-28290019 Extn. 1457

-----Original Message-----
From: DerrickOswald [mailto:Der...@ro...]
Sent: Monday, December 16, 2002 7:02 PM
To: htmlparser-developer
Cc: DerrickOswald
Subject: [Htmlparser-developer] version 1.3

This message is just to open discussion. Here are some enhancements that might best be left till the next version.

POST constructor.
The two basic constructors that HTMLParser has take either a string URL or an HTMLReader. This shifts the onus of performing HTTP onto the API user for POST operations. It might be good to have a constructor that takes an HttpURLConnection or URLConnection argument, where a primed and loaded connection is passed to the parser.

Tables
The current version flattens tables, pushing the onus on the API user to syntactically walk through the table data to get to a certain table entry. It may be useful to nest table entries, similar to what the FORM tag does now, but have it correctly generate rows and columns.

Logging
The use of a feedback object is adequate, but JDK version 1.4 has a rich API, java.util.logging, that we might want to emulate (presuming we don't want to force JDK 1.4 usage).

charset
Currently the charset directive within the HTML page is ignored.
There may be a need to honour this parameter in the Content-Type field.

beans
It might be nice to create one or more Java beans that can be used within GUI IDEs. The predefined behavior might be what the parserapplications do now, but exposing some accessors on HTMLParser and providing a zero-arg constructor may also prove useful.

executable jar
There is no default application for the htmlparser.jar, i.e. java -jar htmlparser.jar doesn't do anything at the moment. A little GUI application might be nice. I'm not talking about a browser, but rather a demo of the applications (i.e. a tree view of the links a la robot, a text view a la StringExtractor, a list of mail addresses a la ripper, etc.). This would utilize the beans mentioned above.

_______________________________________________
Htmlparser-developer mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-developer
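
On the charset item above, here is a minimal sketch of how the charset parameter might be pulled out of a Content-Type value, whether that value comes from the HTTP header or from a META tag in the page. The class name ContentTypeCharset, the method charsetOf and the caller-supplied fallback are hypothetical names chosen for illustration; nothing here is part of the existing HTMLParser API, and how the parser would obtain the Content-Type value is left open.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of HTMLParser.
public class ContentTypeCharset {

    /**
     * Returns the charset parameter of a Content-Type value such as
     * "text/html; charset=ISO-8859-1", or the fallback if none is present.
     */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            // Parameter names are compared case-insensitively.
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = part.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            if (!value.isEmpty()) {
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // prints ISO-8859-1
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", "UTF-8"));
    }
}
```

The fallback is passed in by the caller so that the sketch does not hard-code a default encoding; which default the parser should use when no charset is declared is a separate decision.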