Thread: [Htmlparser-user] How to extract more than one tag by only once parsering?


Hi All,

I have run into a difficulty while using the htmlparser library. An HTML
page contains many tags, such as title, div, input, span and so on. For
example:

<title>this is a test </title>

//...... any other tags

<div class="A">
       <span class="B"><a href=" www.google.com ">google</a></span>
</div>

//...... any other tags

<div class="C">
       <div class="D"><input type="text" id="E" value="msn" /></div>
</div>

//...... any other tags

<div class="C">
       <div class="E"><span class="B"><input type="text" id="E" value="aol" />
       <a href=" www.live.com ">live</a></span></div>
</div>

The real page may contain many more tags than this example. To get the
content 'this is a test', I can use a TagNameFilter, but that means parsing
the whole HTML once. To get 'google' or 'www.google.com', I have to parse
the whole HTML a second time, and to get 'msn', 'aol' and 'live' I would
have to parse it several more times. This way I can get the content I need,
but the repeated parsing will probably hurt performance. Is there another
way to do this? Maybe I could use an OrFilter to get all the nodes at once,
but then how can I tell which tag each matched text came from? I want to
store the results in a database, and I have no idea how to do that while
parsing the HTML only once (for the best performance). I would be grateful
for any help. :-)

Thanks and Best Regards

Jesse
