[Htmlparser-user] Htmlparser does not parse <div> tag
From: Henry T. <htr...@ya...> - 2008-06-16 11:07:47
Hi All,

I am having difficulty parsing the following table using htmlparser table data filter statements:

    <table border="0" cellpadding="0" cellspacing="0" width="782" id="main-content">
      <tr>
        <td valign="top" class="top">
          <table border="0" cellpadding="0" cellspacing="0">
            <tr>
              <td valign="top" class="top">
                <!-- un-delay results 14/10/2004 .................................. --->
                <div class="greyBorder">
                  <table border="0" cellspacing="0" cellpadding="2" width="100%">
                    <tr>
                      <td class="propType"> </td>
                      <td class="propType"><b>Patient</b></td>
                      <td class="propType"><b>Firstname</b></td>
                      <td class="propType"><b>Surname</b></td>
                      <td class="propType" align="right"><b>Date of birth</b></td>
                      <td class="propType">Sex</td>
                    </tr>
                    <tr class="smallnarrow">
                      <td class="even" width="10" align="left"></td>
                      <td class="even" style="vertical-align: middle;">Clinic</td>
                      <td class="even" style="vertical-align: middle;">John</td>
                      <td class="even" style="vertical-align: middle;">Smith</td>
                      <td class="even" align="right" style="vertical-align: middle;">10/02/1940</td>
                      <td class="even" width="10" style="vertical-align: middle;">M</td>
                    </tr>
                  </table>
                </div>
                <div style="margin-top:10px;">
                  <br> <br> <br>
                </div>
                <div align="center" style="margin-bottom: 20px;">
                .........
    </td></tr></table></td></tr></table>

The table data filter statements below pick up every line shown above, which is more than I wanted:

    (1)  new AndFilter ( new TagNameFilter ("table"),
    (2)  new AndFilter ( new HasAttributeFilter ("border","0"),
    (3)  new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
    (4)  new AndFilter ( new HasAttributeFilter ("cellpadding"),
    (5)  new AndFilter ( new HasAttributeFilter ("width","782"),
    (6)  new AndFilter ( new HasAttributeFilter ("id","main-content"),
    (7)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
    (8)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
    (9)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
    (10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
    (11) new HasChildFilter ( new TagNameFilter ("td"),true)),true)),true)),true)),true)))))));

However, I would like to narrow down the parsing by extracting only the Patient table data (the table inside <div class="greyBorder"> above). The additional filter statements below have not been successful:

    (1)  new AndFilter ( new TagNameFilter ("table"),
    (2)  new AndFilter ( new HasAttributeFilter ("border","0"),
    (3)  new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
    (4)  new AndFilter ( new HasAttributeFilter ("cellpadding"),
    (5)  new AndFilter ( new HasAttributeFilter ("width","782"),
    (6)  new AndFilter ( new HasAttributeFilter ("id","main-content"),
    (7)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
    (8)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
    (9)  new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
    (10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
    (11) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
    (12) new HasChildFilter ( new AndFilter ( new TagNameFilter ("div"),
    (13) new HasAttributeFilter ("class","greyBorder")),true)),true)),true)),true)),true)),true)))))));

Lines 12-13 search for the <div> with attribute class=greyBorder, but the filter did not pick up the Patient table at all.

Any idea where the last filter statement went wrong? It appears that htmlparser does not treat <div> as a nested tag around the Patient table.

Many thanks,
Henry
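A possible reading of the first result, offered as a sketch and not as a diagnosis of the second: a node filter accepts or rejects one node at a time, and the search returns the accepted nodes themselves. A filter whose outermost test is the table with id="main-content" can therefore only ever return that outer table, with everything inside it; the nested HasChildFilter tests narrow which outer tables qualify, not what is extracted. The model below uses only the Java standard library. Node, tag, attr, hasChild, hasParent and collect are hypothetical stand-ins written for this illustration, not the org.htmlparser classes, and the markup is reduced to the elements that matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Stand-in model, NOT the org.htmlparser classes: a node is accepted or
// rejected as a whole, and the search returns the accepted nodes themselves.
class FilterSketch {
    static final class Node {
        final String tag;
        final Map<String, String> attrs;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String tag, Map<String, String> attrs, Node... kids) {
            this.tag = tag;
            this.attrs = attrs;
            for (Node k : kids) { k.parent = this; children.add(k); }
        }
    }

    static Predicate<Node> tag(String name) { return n -> n.tag.equals(name); }

    static Predicate<Node> attr(String key, String value) {
        return n -> value.equals(n.attrs.get(key));
    }

    // True when some child (or, if recursive, some descendant) passes f.
    static Predicate<Node> hasChild(Predicate<Node> f, boolean recursive) {
        return n -> {
            for (Node c : n.children) {
                if (f.test(c) || (recursive && hasChild(f, true).test(c))) return true;
            }
            return false;
        };
    }

    // True when the direct parent passes f.
    static Predicate<Node> hasParent(Predicate<Node> f) {
        return n -> n.parent != null && f.test(n.parent);
    }

    // Depth-first walk returning every node the filter accepts.
    static List<Node> collect(Node root, Predicate<Node> f) {
        List<Node> out = new ArrayList<>();
        if (f.test(root)) out.add(root);
        for (Node c : root.children) out.addAll(collect(c, f));
        return out;
    }

    // Reduced markup: outer table > tr > td > div.greyBorder > Patient table.
    static Node sample() {
        Node patient = new Node("table", Map.of("width", "100%"),
                new Node("tr", Map.of(), new Node("td", Map.of("class", "propType"))));
        Node div = new Node("div", Map.of("class", "greyBorder"), patient);
        return new Node("table", Map.of("id", "main-content"),
                new Node("tr", Map.of(), new Node("td", Map.of("class", "top"), div)));
    }

    // Rooted at the outer table: the div test only constrains the match.
    static Predicate<Node> outerRooted() {
        return tag("table").and(attr("id", "main-content"))
                .and(hasChild(tag("div").and(attr("class", "greyBorder")), true));
    }

    // Rooted at the table whose parent is the greyBorder div.
    static Predicate<Node> innerRooted() {
        return tag("table").and(hasParent(tag("div").and(attr("class", "greyBorder"))));
    }

    public static void main(String[] args) {
        Node root = sample();
        for (Node n : collect(root, outerRooted())) System.out.println("outer-rooted: " + n.attrs);
        for (Node n : collect(root, innerRooted())) System.out.println("inner-rooted: " + n.attrs);
    }
}
```

In this model the outer-rooted filter returns the main-content table and the inner-rooted filter returns only the Patient table. With htmlparser itself, the analogous approach would be to make the Patient table the outermost test and constrain it by its parent, for example with the library's HasParentFilter combined with TagNameFilter and HasAttributeFilter; that combination is untested against this page and is an assumption here.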