I agree with Derrick: usage should decide how the library is split into
jars, and a poll, Derrick, would definitely be in order. The groupings you
have suggested are quite appropriate.
I would also like to suggest incremental libraries, i.e. the entire
parser_core.jar would be incorporated in parser_edit.jar, and similarly for
parser_applications.jar. The developer should not have to keep three
things in the CLASSPATH!
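To make the incremental idea concrete, here is a rough sketch in Java of what
each jar would hold if every jar also carried the layers below it. The layer
labels ("core", "edit", "applications") and the class name JarLayers are
placeholders for illustration, not real package names, and the sketch assumes
parser_applications.jar would include parser_edit.jar as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: models the proposed jars, it does not build or read real jar files.
class JarLayers {
    // Jars ordered from the lowest layer to the highest.
    static final List<String> JARS = Arrays.asList(
            "parser_core.jar", "parser_edit.jar", "parser_applications.jar");

    // Hypothetical layer labels owned by each jar, in the same order as JARS.
    static final List<List<String>> OWN_LAYERS = Arrays.asList(
            Arrays.asList("core"),
            Arrays.asList("edit"),
            Arrays.asList("applications"));

    // Contents of a jar under the incremental scheme: its own layers plus
    // everything owned by the jars below it.
    static List<String> incrementalContents(String jar) {
        int index = JARS.indexOf(jar);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown jar: " + jar);
        }
        List<String> contents = new ArrayList<>();
        for (int i = 0; i <= index; i++) {
            contents.addAll(OWN_LAYERS.get(i));
        }
        return contents;
    }

    public static void main(String[] args) {
        for (String jar : JARS) {
            System.out.println(jar + " -> " + incrementalContents(jar));
        }
    }
}
```

With this layout a developer picks exactly one jar for the CLASSPATH, at the
cost of the higher jars repeating the classes of the lower ones.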
Dhaval
-----Original Message-----
From: DerrickOswald [mailto:Der...@ro...]
Sent: Saturday, May 03, 2003 5:47 PM
To: htmlparser-developer
Cc: DerrickOswald
Subject: [Htmlparser-developer] configuration items
Since it's a library incorporated within other applications, size is
always an issue.
There are two aspects though, disk footprint (jar size) and memory
usage.
Usually, there is a speed/memory usage trade-off to be made, which is
only sometimes reflected in the disk footprint size.
With current desktop hardware, people usually trade off memory for
speed.
It's only with embedded or mobile applications you concentrate on disk
size and memory consumption.
Regarding your picture, the layers won't necessarily follow the current
package structure.
For example, logging is integral to the core parser to report problems,
and the beans layer removes all HTML tags so it can't be used by upper
layers. In order to decide the breakdown in layers, a poll of users
regarding typical use-cases might be in order.
Let's say there are two major groupings:
1) extraction of all or part of the information on a page to be consumed
by another application.
2) rewriting URLs, content, specific tags, clean-up, reformatting or
pretty printing HTML text
This would suggest three configuration items (jars):
parser_applications.jar - Sample applications, GUI tools, beans, tests
parser_edit.jar - Rewriting tools, DOM type hierarchical editing,
visitors, smart tags
parser_core.jar - Read-only core parser, stream of undifferentiated tags
If a program's parser usage involves extraction, it need only use the
parser_core.jar and pass through the data in a stream-like fashion. But
if rewriting is in order, they use both parser_core.jar and
parser_edit.jar and the parser presents the full HTML document as a
hierarchy of tag specific nodes. All else goes into parser_applications.
We could probably get parser_core.jar below 25KB, or in that range.
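As a rough illustration of the two usage styles, the Java sketch below passes
over the tags once for extraction, and builds a hierarchy of nodes for
rewriting. It is not the htmlparser API: the class TagUsageSketch, its
methods, and the regex-based tag matching are all assumptions made for the
example, and it only copes with well-nested markup in which every tag is
closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a regex is not a real HTML parser.
class TagUsageSketch {
    // Captures the optional closing slash (group 1) and the tag name (group 2).
    static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Extraction style: one pass over the tags, nothing is kept but the result.
    static List<String> streamTagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {   // opening tags only
                names.add(m.group(2));
            }
        }
        return names;
    }

    // Rewriting style: each node keeps its children so the document can be edited.
    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        // Compact text form of the subtree, e.g. "body(p p)".
        String outline() {
            StringBuilder sb = new StringBuilder(name);
            if (!children.isEmpty()) {
                sb.append('(');
                for (int i = 0; i < children.size(); i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append(children.get(i).outline());
                }
                sb.append(')');
            }
            return sb.toString();
        }
    }

    // Builds the hierarchy with a stack of currently open nodes.
    static Node buildTree(String html) {
        Node root = new Node("#document");
        Deque<Node> open = new ArrayDeque<>();
        open.push(root);
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                Node node = new Node(name);
                open.peek().children.add(node);
                open.push(node);
            } else if (open.size() > 1 && open.peek().name.equalsIgnoreCase(name)) {
                open.pop();               // closing tag matches the innermost open node
            }
        }
        return root;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hi</p><p>Bye</p></body></html>";
        System.out.println(streamTagNames(html));
        System.out.println(buildTree(html).outline());
    }
}
```

The streaming method needs no node classes at all, which is the kind of code
that could stay in a small core jar, while the Node type and tree builder
stand in for what the edit jar would add on top.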
Derrick
Somik Raha wrote:
<snip>
> [1] I find the parser's differentiating factor is its size - time and
> time again the feedback I've received is that folks love its being
> below 100K. Size almost directly maps on to simplicity. And that
> impacts the other important area - performance.
>
> [2] I hate to pay for what I don't need - when folks get tons of stuff
> that they don't need, they are paying for the needs of a few.
>
> At the same time, I think it is a challenge to be able to accommodate
> new requests and still keep the parser light. I see a natural layer
> forming:
>
>
> +------------------------------------------+
> | Sample Applications, GUI                 |
> | +--------------------------------------+ |
> | | Logging Mechanism                    | |
> | | +----------------------------------+ | |
> | | | Beans                            | | |
> | | | +------------------------------+ | | |
> | | | | Scanners                     | | | |
> | | | | +--------------------------+ | | | |
> | | | | | Core Parser              | | | | |
> | | | | +--------------------------+ | | | |
> | | | +------------------------------+ | | |
> | | +----------------------------------+ | |
> | | default, log4j, jdk1.4               | |
> | +--------------------------------------+ |
> +------------------------------------------+
>
> If we can perform this separation in the design and the packaging, it
> might allow people to choose what they need. We don't have to follow
> the "one size fits all" policy.
>
> What are your thoughts? I am not sure how we'd achieve this separation
> or whether it really makes sense - so please jump in with your two
> cents.
>
> Regards,
> Somik
>
<snip>