Re: [Htmlparser-developer] configuration items

Oops, I should have said that parser_core.jar outputs a stream of 
undifferentiated *nodes*, not tags.

Derrick Oswald wrote:

>
> Since it's a library incorporated within other applications, size is 
> always an issue.
> There are two aspects though, disk footprint (jar size) and memory usage.
> Usually, there is a speed/memory usage trade-off to be made, which is 
> only sometimes reflected in the disk footprint size.
> With current desktop hardware, people usually trade off memory for speed.
> It's only with embedded or mobile applications that you concentrate on 
> disk size and memory consumption.
>
> Regarding your picture, the layers won't necessarily follow the 
> current package structure.
> For example, logging is integral to the core parser to report 
> problems, and the beans layer removes all HTML tags so it can't be 
> used by upper layers. In order to decide the breakdown in layers, a 
> poll of users regarding typical use-cases might be in order.
>
> Let's say there are two major groupings:
>
> 1) extraction of all or part of the information on a page to be 
> consumed by another application.
> 2) rewriting URLs, content, specific tags, clean-up, reformatting or 
> pretty printing HTML text
>
> This would suggest three configuration items (jars):
>
> parser_applications.jar - Sample applications, GUI tools, beans, tests
> parser_edit.jar - Rewriting tools, DOM-type hierarchical editing, 
> visitors, smart tags
> parser_core.jar - Read-only core parser, stream of undifferentiated tags
>
> If a program's parser usage involves only extraction, it need only use 
> parser_core.jar and pass through the data in a stream-like fashion. 
> But if rewriting is in order, it uses both parser_core.jar and 
> parser_edit.jar, and the parser presents the full HTML document as a 
> hierarchy of tag-specific nodes. All else goes into parser_applications.jar.
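
To make the two use-cases above concrete, here is a rough Java sketch of how 
a core-only client and a core-plus-edit client might differ. It is purely 
illustrative: JarSplitSketch, Node, TreeNode, extractText and buildTree are 
hypothetical names, not the parser's actual classes, and the input is assumed 
to be well-formed (every start tag has a matching end tag).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only: none of these types are the real htmlparser API.
final class JarSplitSketch {

    /** "Core" layer: a flat, read-only node with no tag-specific subclasses. */
    static final class Node {
        final String name; // e.g. "a", "/a", or "#text"
        final String text; // raw text for #text nodes, attribute string otherwise
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    /** "Edit" layer: a core node plus children, built on top of the stream. */
    static final class TreeNode {
        final Node node;
        final List<TreeNode> children = new ArrayList<>();
        TreeNode(Node node) { this.node = node; }
    }

    /** Extraction use-case: a single pass over the stream, no tree retained. */
    static List<String> extractText(Iterator<Node> stream) {
        List<String> out = new ArrayList<>();
        while (stream.hasNext()) {
            Node n = stream.next();
            if (n.name.equals("#text")) out.add(n.text);
        }
        return out;
    }

    /** Rewriting use-case: materialise the whole document as a hierarchy. */
    static TreeNode buildTree(Iterator<Node> stream) {
        TreeNode root = new TreeNode(new Node("#document", ""));
        Deque<TreeNode> open = new ArrayDeque<>();
        open.push(root);
        while (stream.hasNext()) {
            Node n = stream.next();
            if (n.name.startsWith("/")) {
                // End tag: close the innermost open element, never the root.
                // Assumes well-formed input, so tag names are not compared.
                if (open.size() > 1) open.pop();
            } else {
                TreeNode t = new TreeNode(n);
                open.peek().children.add(t);
                if (!n.name.equals("#text")) open.push(t);
            }
        }
        return root;
    }

    /** Node stream for: <p>Hello <a href="x">world</a></p> */
    static List<Node> sample() {
        return Arrays.asList(
            new Node("p", ""),
            new Node("#text", "Hello "),
            new Node("a", "href=\"x\""),
            new Node("#text", "world"),
            new Node("/a", ""),
            new Node("/p", ""));
    }

    public static void main(String[] args) {
        System.out.println(extractText(sample().iterator()));
        System.out.println(buildTree(sample().iterator()).children.size());
    }
}
```

The point of the sketch is only the dependency direction: extractText needs 
nothing but the flat Node type, whereas buildTree (and anything that edits 
the result) is the part that would live in the second jar.
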
>
> We could probably get parser_core.jar below 25KB, or in that range.
>
> Derrick
>
> Somik Raha wrote:
> <snip>
>
>> [1] I find the parser's differentiating factor is its size - time and 
>> time again the feedback I've received is that folks love its being 
>> below 100K. Size almost directly maps on to simplicity. And that 
>> impacts the other important area - performance.
>>  
>> [2] I hate to pay for what I don't need - when folks get tons of 
>> stuff that they don't need, they are paying for the needs of a few.
>>  
>> At the same time, I think it is a challenge to be able to accommodate 
>> new requests and still keep the parser light. I see a natural layer 
>> forming:
>>  
>>
>>   ,----------------------------------------.
>>   |        Sample Applications, GUI        |
>>   |   ,'''''''''''''''''''''''''''''''`.   |
>>   |   |        Logging Mechanism       |   |
>>   |   |  ,''''''''''''''''''''''''''|  |   |
>>   |   |  |        Beans             |  |   |
>>   |   |  |  +--------------------b  |  |   |
>>   |   |  |  |    Scanners        |  |  |   |
>>   |   |  |  | ,---------------Y  |  |  |   |
>>   |   |  |  | |  Core Parser  |  |  |  |   |
>>   |   |  |  | `.............../  |  |  |   |
>>   |   |  |  L____________________|  |  |   |
>>   |   |  |                          |  |   |
>>   |   |  '`''''''''''''''''''''''''''  |   |
>>   |   |     default, log4j, jdk1.4     |   |
>>   |   `................................/   |
>>   |________________________________________|
>>  
>> If we can perform this separation in the design and the packaging, it 
>> might allow people to choose what they need. We don't have to follow 
>> the "one size fits all" policy.
>>  
>> What are your thoughts? I am not sure how we'd achieve this 
>> separation or whether it really makes sense - so please jump in with 
>> your two cents..
>>  
>> Regards,
>> Somik
>>  
>
>
> <snip>