[Htmlparser-user] Htmlparser does not parse <div> tag
From: Henry T. <htr...@ya...> - 2008-06-16 11:07:47
Hi All,
I am having difficulty parsing the following table using htmlparser table data filter statements:
<table border="0" cellpadding="0" cellspacing="0" width="782" id="main-content">
<tr>
<td valign="top" class="top">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" class="top">
<!-- un-delay results 14/10/2004 .................................. --->
<div class="greyBorder">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr>
<td class="propType"> </td>
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of birth</b></td>
<td class="propType">Sex</td>
</tr>
<tr class="smallnarrow">
<td class="even" width="10" align="left"></td>
<td class="even" style="vertical-align: middle;">Clinic</td>
<td class="even" style="vertical-align: middle;">John</td>
<td class="even" style="vertical-align: middle;">Smith</td>
<td class="even" align="right" style="vertical-align: middle;">10/02/1940</td>
<td class="even" width="10" style="vertical-align: middle;">M</td>
</tr>
</table>
</div>
<div style="margin-top:10px;">
<br> <br>
<br>
</div>
<div align="center" style="margin-bottom: 20px;">
.........
</td></tr></table></td></tr></table>
The filter statements below pick up every line shown above, which is more than I want:
(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new TagNameFilter ("td"),true)),true)),true)),true)),true)))))));
I would like to narrow the result down to only the Patient table data above, that is, the table nested inside <div class="greyBorder">. However, the extended filter statements below have not been successful:
(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(12) new HasChildFilter ( new AndFilter ( new TagNameFilter ("div"),
(13) new HasAttributeFilter ("class","greyBorder")),true)),true)),true)),true)),true)),true)))))));
Lines 12-13 search for the <div> with the attribute class="greyBorder", but the filter does not pick up the Patient table at all. Any idea where the last filter statement went wrong? It appears that htmlparser does not treat the <div> as a tag nested around the Patient table.
Many thanks,
Henry