Re: [Htmlparser-user] How to extract more than one tag in a single parse?

Jesse,

From your example, you can also get all the div tags at once and filter
on class in a second pass:

// recursive = true so that nested divs are collected as well
NodeList divs = nodelist.extractAllNodesThatMatch (new TagNameFilter
("DIV"), true);
Div div_a = (Div) divs.extractAllNodesThatMatch (new HasAttributeFilter
("class", "A")).elementAt (0); // presuming there is only one
Div div_d = (Div) divs.extractAllNodesThatMatch (new HasAttributeFilter
("class", "D")).elementAt (0); // presuming there is only one

and this may be faster than searching the entire page each time.

Derrick
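
On the OrFilter question in the quoted message below (how to tell which tag a
match came from when everything is collected in one go), one option is to walk
the parsed tree once and file each tag under the name of every rule it
satisfies. The sketch below is library-independent Java, not htmlparser code:
SketchTag, OnePassBuckets, classify, samplePage and sampleRules are
hypothetical names invented for illustration, and the predicates only stand in
for filters such as TagNameFilter or HasAttributeFilter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for a parsed tag; this is not an htmlparser class. */
final class SketchTag {
    final String name;      // tag name, e.g. "div"
    final String cssClass;  // value of the class attribute, or "" if absent
    final String text;      // string of interest: inner text, or an input's value
    final List<SketchTag> children = new ArrayList<>();

    SketchTag(String name, String cssClass, String text, SketchTag... kids) {
        this.name = name;
        this.cssClass = cssClass;
        this.text = text;
        for (SketchTag kid : kids) {
            children.add(kid);
        }
    }
}

final class OnePassBuckets {
    /** Walks the tree once and files every tag under each rule it satisfies. */
    static Map<String, List<SketchTag>> classify(
            List<SketchTag> roots, Map<String, Predicate<SketchTag>> rules) {
        Map<String, List<SketchTag>> buckets = new LinkedHashMap<>();
        for (String ruleName : rules.keySet()) {
            buckets.put(ruleName, new ArrayList<>());
        }
        for (SketchTag root : roots) {
            visit(root, rules, buckets);
        }
        return buckets;
    }

    private static void visit(SketchTag tag,
                              Map<String, Predicate<SketchTag>> rules,
                              Map<String, List<SketchTag>> buckets) {
        // Test the current tag against every rule; the rule name is the bucket key.
        for (Map.Entry<String, Predicate<SketchTag>> rule : rules.entrySet()) {
            if (rule.getValue().test(tag)) {
                buckets.get(rule.getKey()).add(tag);
            }
        }
        // Descend into the children so nested tags are seen in the same pass.
        for (SketchTag child : tag.children) {
            visit(child, rules, buckets);
        }
    }

    /** Builds a small tree modelled on part of the page in the question. */
    static List<SketchTag> samplePage() {
        List<SketchTag> page = new ArrayList<>();
        page.add(new SketchTag("title", "", "this is a test"));
        page.add(new SketchTag("div", "A", "",
                new SketchTag("span", "B", "", new SketchTag("a", "", "google"))));
        page.add(new SketchTag("div", "C", "",
                new SketchTag("div", "D", "", new SketchTag("input", "", "msn"))));
        return page;
    }

    /** Example rules; each name identifies which rule a tag matched. */
    static Map<String, Predicate<SketchTag>> sampleRules() {
        Map<String, Predicate<SketchTag>> rules = new LinkedHashMap<>();
        rules.put("title", t -> t.name.equals("title"));
        rules.put("links", t -> t.name.equals("a"));
        rules.put("div.D", t -> t.name.equals("div") && t.cssClass.equals("D"));
        return rules;
    }
}
```

With the sample data, classify(samplePage(), sampleRules()) returns a map with
one tag in each of the "title", "links" and "div.D" buckets, so each result
can be stored under its own key after a single traversal. A tag that satisfies
several rules lands in several buckets, which may or may not be what you want.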

Ian Macfarlane wrote:

>As long as you keep the original reference to the NodeList created by
>Parser.parse, and you haven't modified that NodeList, you should be
>able to reuse it, I think.
>
>Ian
>
>On 8/3/06, Jesse Hou <hp...@gm...> wrote:
>  
>
>>Hi All, I have run into a difficulty using the htmlparser library. An HTML
>>page contains many tags, such as title, div, input, span, and so on. For
>>example:
>>
>><title>this is a test </title>
>>
>>
>>//...... any other tags
>>
>><div class="A">
>>       <span class="B"><a href=" www.google.com ">google</a></span>
>></div>
>>
>>
>>//...... any other tags
>>
>><div class="C">
>>       <div class="D"><input type="text" id="E" value="msn" /></div>
>></div>
>>
>>//...... any other tags
>>
>>
>><div class="C">
>>       <div class="E"><span class="B"><input type="text" id="E" value="aol"
>>/><a href=" www.live.com ">live</a></span></div>
>></div>
>>
>>A real page may contain many more tags than this example. If I want the
>>content 'this is a test', I can use a TagNameFilter, but I have to parse the
>>whole HTML. If I then want 'google' or ' www.google.com', I have to parse the
>>whole HTML a second time, and to get 'msn', 'aol' and 'live' I have to parse
>>it several more times. This gets me the content I need, but I expect it to
>>hurt performance. Is there another way to do this? I could also use an
>>OrFilter to get all the nodes at once, but then how can I tell which tag a
>>given match came from? I want to store the results in a database, and I do
>>not know how to do that while parsing the HTML only once (for the best
>>performance). I would appreciate your help. :-)
>>
>>Thanks and Best Regards
>>
>>Jesse
>>
>  
>