Re: [Htmlparser-user] How to extract more than one tag by only once parsering?
From: Derrick O. <Der...@Ro...> - 2006-08-04 11:42:35
Jesse,
Going by your example, you can also get all the div tags at once and
filter on class in a secondary pass:
    // 'true' recurses into child nodes, so nested divs are collected too
    NodeList divs = nodelist.extractAllNodesThatMatch (new TagNameFilter ("DIV"), true);
    // elementAt returns a Node, hence the cast to org.htmlparser.tags.Div
    Div div_a = (Div) divs.extractAllNodesThatMatch (new HasAttributeFilter ("class", "A")).elementAt (0); // presuming there is only one
    Div div_b = (Div) divs.extractAllNodesThatMatch (new HasAttributeFilter ("class", "B")).elementAt (0); // presuming there is only one
This may be faster than searching the entire page each time.
Derrick
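
The "collect once, then filter the smaller list" idea in the reply above can also be sketched without any library calls. This is only an illustration under stated assumptions: `Tag` and `OnePassBuckets` are hypothetical names, not htmlparser classes, and the `Tag` record simply stands in for whatever node objects the parser returned. The list is walked once and each tag is grouped under a "NAME.class" key, so every later lookup knows which tag a piece of text came from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an already-parsed tag; NOT an htmlparser class.
record Tag(String name, String cssClass, String text) {}

final class OnePassBuckets {
    // Walk the parsed tags exactly once and group them by "NAME.class".
    static Map<String, List<Tag>> bucket(List<Tag> tags) {
        Map<String, List<Tag>> buckets = new LinkedHashMap<>();
        for (Tag tag : tags) {
            // Tag names are upper-cased so "div" and "DIV" share a bucket.
            String key = tag.name().toUpperCase(Locale.ROOT) + "." + tag.cssClass();
            buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(tag);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Sample data loosely modelled on the HTML in the quoted question.
        List<Tag> parsed = List.of(
            new Tag("title", "", "this is a test"),
            new Tag("div", "A", "google"),
            new Tag("div", "C", "msn"),
            new Tag("div", "C", "aol live"));
        Map<String, List<Tag>> buckets = bucket(parsed);
        // Each lookup now reads a small bucket instead of rescanning the page.
        for (Tag tag : buckets.get("DIV.C")) {
            System.out.println(tag.text());
        }
    }
}
```

With real htmlparser nodes the key would have to be built from the tag's name and its class attribute; how best to read those is left out here rather than guessed.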
Ian Macfarlane wrote:
>As long as you keep the original reference to the NodeList created by
>Parser.parse, and you haven't modified that NodeList, you should be
>able to reuse it, I think.
>
>Ian
>
>On 8/3/06, Jesse Hou <hp...@gm...> wrote:
>
>
>>Hi all. I ran into a difficulty while using the htmlparser library. An
>>HTML page contains many tags, such as title, div, input, span and so on.
>>For example:
>>
>><title>this is a test </title>
>>
>>
>>//...... any other tags
>>
>><div class="A">
>> <span class="B"><a href=" www.google.com ">google</a></span>
>></div>
>>
>>
>>//...... any other tags
>>
>><div class="C">
>> <div class="D"><input type="text" id="E" value="msn" /></div>
>></div>
>>
>>//...... any other tags
>>
>>
>><div class="C">
>> <div class="E"><span class="B"><input type="text" id="E" value="aol" /><a href=" www.live.com ">live</a></span></div>
>></div>
>>
>>The whole HTML page may contain many more tags than this example. To get
>>the content 'this is a test' I can use a TagNameFilter, but that means
>>parsing the whole page. To get 'google' or 'www.google.com' I have to parse
>>the whole page a second time, and to get 'msn', 'aol' and 'live' I may have
>>to parse it several more times. This gets me the content I need, but I
>>expect it to hurt performance. Is there another way? I could use an
>>OrFilter to get all the nodes at once, but then how do I tell which tag
>>each piece of text matched? I want to store the results in a database, and
>>I don't know how to do that while parsing the HTML only once (for the best
>>performance). Any help would be appreciated. :-)
>>
>>Thanks and Best Regards
>>
>>Jesse
>>
>
>