[Htmlparser-user] How to extract more than one tag in a single parse?
From: Jesse H. <hp...@gm...> - 2006-08-03 02:21:56
Hi All,

I have run into a difficulty while using the htmlparser library. An HTML page contains many tags, such as title, div, input, span and so on. For example:

    <title>this is a test </title>
    //...... any other tags
    <div class="A">
      <span class="B"><a href=" www.google.com ">google</a></span>
    </div>
    //...... any other tags
    <div class="C">
      <div class="D"><input type="text" id="E" value="msn" /></div>
    </div>
    //...... any other tags
    <div class="C">
      <div class="E"><span class="B"><input type="text" id="E" value="aol" /><a href=" www.live.com ">live</a></span></div>
    </div>

A real page may contain many more tags than this example. If I want the content 'this is a test', I can use a TagNameFilter, but I have to parse the whole HTML. If I then want 'google' or 'www.google.com', I have to parse the whole HTML a second time, and to get 'msn', 'aol' and 'live' I would have to parse it several more times. This gets me the content I need, but I expect it to hurt performance.

Is there another way to do this? I could use an OrFilter to get all the nodes at once, but then how can I tell which tag a given piece of text matched? I want to store the values in a DB, and I have no idea how to do that while parsing the HTML only once (for the best performance).

I would appreciate your help. :-)

Thanks and Best Regards,
Jesse
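
P.S. To make the single-pass idea concrete, here is a minimal sketch of the behaviour I am after. It is written in Python with the standard-library html.parser module purely as a stand-in, so it does not use htmlparser's own API. The names SinglePassExtractor, extract, SAMPLE and the (kind, value) record layout are made up for illustration. Each extracted value is labelled with the kind of tag it came from, which is the part I do not know how to do with an OrFilter.

```python
from html.parser import HTMLParser

# Sample markup taken from the example above (the elided tags are left out).
SAMPLE = """
<title>this is a test </title>
<div class="A"> <span class="B"><a href=" www.google.com ">google</a></span> </div>
<div class="C"> <div class="D"><input type="text" id="E" value="msn" /></div> </div>
<div class="C"> <div class="E"><span class="B"><input type="text" id="E" value="aol" />
<a href=" www.live.com ">live</a></span></div> </div>
"""


class SinglePassExtractor(HTMLParser):
    """Hypothetical helper: collects labelled values while the parser walks the page once."""

    def __init__(self):
        super().__init__()
        self.records = []     # (kind, value) tuples in document order
        self._capture = None  # (tag, kind, text_parts) while inside <title> or <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            # <input> has no text content, so its value attribute is recorded directly.
            self.records.append(("input_value", attrs.get("value") or ""))
        elif tag == "a":
            self.records.append(("link_href", (attrs.get("href") or "").strip()))
            self._capture = ("a", "link_text", [])
        elif tag == "title":
            self._capture = ("title", "title", [])

    def handle_data(self, data):
        # Text is only kept while we are inside a tag of interest.
        if self._capture is not None:
            self._capture[2].append(data)

    def handle_endtag(self, tag):
        # Closing the captured tag turns the buffered text into one record.
        if self._capture is not None and tag == self._capture[0]:
            _, kind, parts = self._capture
            self.records.append((kind, "".join(parts).strip()))
            self._capture = None


def extract(html):
    """Parse the markup once and return every labelled value found."""
    parser = SinglePassExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for kind, value in extract(SAMPLE):
        print(kind, value)
```

Because every record carries its kind, the rows could be written to the DB straight from the returned list without parsing the page again. I assume a similar shape is possible in htmlparser with a visitor or by checking each node returned by an OrFilter, but I have not found out how.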