htmlparser-user Mailing List for HTML Parser (Page 19)
Brought to you by:
derrickoswald
From: Henry T. <htr...@ya...> - 2008-06-16 11:07:47
|
Hi All,
I am having difficulty parsing the following table using htmlparser table data filter statements:
<table border="0" cellpadding="0" cellspacing="0" width="782" id="main-content">
<tr>
<td valign="top" class="top">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" class="top">
<!-- un-delay results 14/10/2004 .................................. --->
<div class="greyBorder">
<table border="0" cellspacing="0" cellpadding="2" width="100%">
<tr>
<td class="propType"> </td>
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of birth</b></td>
<td class="propType">Sex</td>
</tr>
<tr class="smallnarrow">
<td class="even" width="10" align="left"></td>
<td class="even" style="vertical-align: middle;">Clinic</td>
<td class="even" style="vertical-align: middle;">John</td>
<td class="even" style="vertical-align: middle;">Smith</td>
<td class="even" align="right" style="vertical-align: middle;">10/02/1940</td>
<td class="even" width="10" style="vertical-align: middle;">M</td>
</tr>
</table>
</div>
<div style="margin-top:10px;">
<br> <br>
<br>
</div>
<div align="center" style="margin-bottom: 20px;">
.........
</td></tr></table></td></tr></table>
The table data filter below picks up every line shown above, which is more than I want:
(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new TagNameFilter ("td"),true)),true)),true)),true)),true)))))));
However, I would like to narrow the parsing down to extract only the Patient table data shown in bold above, but the extended filter below has not been successful:
(1) new AndFilter ( new TagNameFilter ("table"),
(2) new AndFilter ( new HasAttributeFilter ("border","0"),
(3) new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
(4) new AndFilter ( new HasAttributeFilter ("cellpadding"),
(5) new AndFilter ( new HasAttributeFilter ("width","782"),
(6) new AndFilter ( new HasAttributeFilter ("id","main-content"),
(7) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(8) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(9) new HasChildFilter ( new AndFilter ( new TagNameFilter ("table"),
(10) new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
(11) new HasChildFilter ( new AndFilter ( new TagNameFilter ("td"),
(12) new HasChildFilter ( new AndFilter ( new TagNameFilter ("div"),
(13) new HasAttributeFilter ("class","greyBorder")),true)),true)),true)),true)),true)),true)))))));
Lines 12-13 search for the <div> with attribute class="greyBorder", but the filter does not pick up the Patient table at all. Any idea where the last filter went wrong? It appears that htmlparser does not treat the <div> as a tag nested around the Patient table.
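Whatever is stopping the div test from matching on the real page, note that every condition in these filters hangs off the outer main-content table, so even a successful match returns that outer table together with everything inside it. The sketch below uses a hypothetical, heavily simplified node tree and predicate-based filters (loose stand-ins for TagNameFilter, HasAttributeFilter, HasChildFilter and HasParentFilter; this is not htmlparser code) to compare anchoring the conditions on the outer table with anchoring them on the Patient table itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical mini DOM node; a stand-in for illustration, not htmlparser's Node class.
final class Node {
    final String tag;
    final Map<String, String> attrs;
    final List<Node> children = new ArrayList<>();
    Node parent;

    Node(String tag, Map<String, String> attrs, Node... kids) {
        this.tag = tag;
        this.attrs = attrs;
        for (Node kid : kids) {
            kid.parent = this;
            children.add(kid);
        }
    }
}

final class PatientTableDemo {
    // Predicate-based filters loosely mirroring TagNameFilter, HasAttributeFilter,
    // HasChildFilter (direct children only) and HasParentFilter.
    static Predicate<Node> tag(String name) {
        return n -> n.tag.equalsIgnoreCase(name);
    }

    static Predicate<Node> attr(String key, String value) {
        return n -> value.equals(n.attrs.get(key));
    }

    static Predicate<Node> hasChild(Predicate<Node> f) {
        return n -> n.children.stream().anyMatch(f);
    }

    static Predicate<Node> hasParent(Predicate<Node> f) {
        return n -> n.parent != null && f.test(n.parent);
    }

    // Depth-first walk collecting every node the filter accepts.
    static List<Node> extractAll(Node root, Predicate<Node> f) {
        List<Node> out = new ArrayList<>();
        collect(root, f, out);
        return out;
    }

    private static void collect(Node n, Predicate<Node> f, List<Node> out) {
        if (f.test(n)) {
            out.add(n);
        }
        for (Node c : n.children) {
            collect(c, f, out);
        }
    }

    // table#main-content > tr > td > table > tr > td > div.greyBorder > Patient table
    static Node buildPage() {
        Node patient = new Node("table", Map.of("cellpadding", "2"),
                new Node("tr", Map.of(), new Node("td", Map.of("class", "propType"))));
        Node grey = new Node("div", Map.of("class", "greyBorder"), patient);
        return new Node("table", Map.of("id", "main-content"),
                new Node("tr", Map.of(),
                        new Node("td", Map.of(),
                                new Node("table", Map.of(),
                                        new Node("tr", Map.of(),
                                                new Node("td", Map.of(), grey))))));
    }

    // Same shape as the 13-line filter: every test hangs off the outer table.
    static Predicate<Node> outerAnchored() {
        Predicate<Node> innerTd = tag("td").and(hasChild(tag("div").and(attr("class", "greyBorder"))));
        Predicate<Node> innerTable = tag("table").and(hasChild(tag("tr").and(hasChild(innerTd))));
        Predicate<Node> outerTd = tag("td").and(hasChild(innerTable));
        return tag("table").and(attr("id", "main-content"))
                .and(hasChild(tag("tr").and(hasChild(outerTd))));
    }

    // Conditions anchored on the Patient table itself: a table whose parent is div.greyBorder.
    static Predicate<Node> innerAnchored() {
        return tag("table").and(hasParent(tag("div").and(attr("class", "greyBorder"))));
    }

    public static void main(String[] args) {
        Node page = buildPage();
        System.out.println("outer-anchored match: " + extractAll(page, outerAnchored()).get(0).attrs);
        System.out.println("inner-anchored match: " + extractAll(page, innerAnchored()).get(0).attrs);
    }
}
```

In this model the outer-anchored filter returns only the main-content table, while the inner-anchored one returns only the Patient table. In htmlparser terms the inner-anchored version would presumably look like new AndFilter (new TagNameFilter ("table"), new HasParentFilter (new AndFilter (new TagNameFilter ("div"), new HasAttributeFilter ("class", "greyBorder")))); that has not been tried against the real page.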
Many thanks,
Henry
|
From: Henry T. <htr...@ya...> - 2008-06-11 23:36:22
|
Hi All,
Could anyone help out with this possible issue?
I still cannot parse the cellpadding attribute.
Thanks,
Henry
----- Forwarded Message ----
From: Henry Tran <htr...@ya...>
To: Htm...@li...
Sent: Monday, 9 June, 2008 8:40:39 PM
Subject: Does htmlparser Support cellpadding?
Hi forum members,
I am having difficulty parsing the content of the table below, apparently because the HasAttributeFilter() class does not recognise the "cellpadding" attribute:
<table border="0" cellspacing="0" cellpadding="2" width="100%">
Here are the table data filters that I have tried without much luck:
(i) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellpadding","2"));
(ii) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellspacing","0"));
(iii) new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter("cellspacing","0"),
new HasAttributeFilter("width","100%")));
(iv) new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter("cellspacing","0"),
new AndFilter ( new HasAttributeFilter("cellpadding","2"),
new HasAttributeFilter("width","100%"))));
Filters (i) and (iv) did not pick up anything, while (ii) and (iii) worked but also picked up other tables that I do not need. Filter (iv) would be perfect if only it worked. So I have the following questions about this issue:
(a) Does HasAttributeFilter() support cellpadding?
(b) Is there a limit on how many attributes HasAttributeFilter() can pick up in a table?
(c) Can HasAttributeFilter() pick up attributes in nested tables? This table is nested inside another table.
(d) Does the search for attributes follow a certain order? If so, the order of the HasAttributeFilter() calls may need to be altered to achieve the desired search.
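For questions (b) and (d), it may help to think of an AND of attribute tests as a conjunction over one tag's attributes: listing the tests in a different order does not change the result, and nothing in the logic itself limits how many are chained. The sketch below models that with a plain attribute map, a hypothetical stand-in rather than htmlparser's implementation. If filter (iv) still matches nothing on the real page, it may be worth printing the candidate table tags with toHtml() to see which attribute values the parser actually holds.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model: a tag reduced to its attribute map. Not htmlparser's implementation.
final class AttributeAndDemo {
    static Predicate<Map<String, String>> has(String key, String value) {
        return attrs -> value.equals(attrs.get(key));
    }

    // ANDs any number of attribute tests; the result is true only if every test passes.
    @SafeVarargs
    static Predicate<Map<String, String>> all(Predicate<Map<String, String>>... tests) {
        Predicate<Map<String, String>> combined = attrs -> true;
        for (Predicate<Map<String, String>> t : tests) {
            combined = combined.and(t);
        }
        return combined;
    }

    // The nested table from the post, and an outer layout table with cellpadding="0".
    static final Map<String, String> PATIENT_TABLE =
            Map.of("border", "0", "cellspacing", "0", "cellpadding", "2", "width", "100%");
    static final Map<String, String> LAYOUT_TABLE =
            Map.of("border", "0", "cellspacing", "0", "cellpadding", "0");

    public static void main(String[] args) {
        Predicate<Map<String, String>> forward =
                all(has("cellspacing", "0"), has("cellpadding", "2"), has("width", "100%"));
        Predicate<Map<String, String>> reversed =
                all(has("width", "100%"), has("cellpadding", "2"), has("cellspacing", "0"));
        for (Map<String, String> table : List.of(PATIENT_TABLE, LAYOUT_TABLE)) {
            // Same answer whichever order the tests were listed in.
            System.out.println(forward.test(table) + " " + reversed.test(table));
        }
    }
}
```

Running it prints "true true" for the Patient table and "false false" for the layout table.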
Many thanks,
Henry
|
From: Henry T. <htr...@ya...> - 2008-06-11 05:08:42
|
Hi Derrick,
I have tried the following two attempts, wrapping HasChildFilter() around "new AndFilter ( new TagNameFilter ("tr"), ", still with little success:
(a) new AndFilter ( new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter ("border","0"),
new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
new HasAttributeFilter ("width","100%")))),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("tr")),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new OrFilter ( new HasAttributeFilter ("class","propType"),
new HasAttributeFilter ("class","even")))));
(b) new AndFilter ( new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter ("border","0"),
new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
new HasAttributeFilter ("width","100%")))),
new HasChildFilter ( new AndFilter ( new TagNameFilter ("tr"),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new OrFilter ( new HasAttributeFilter ("class","propType"),
new HasAttributeFilter ("class","even"))))));
What is missing this time?
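One thing worth checking in attempt (b): the OrFilter on the class attribute is ANDed with TagNameFilter ("tr"), so it tests the class of the <tr> row, whereas in the markup the "propType" and "even" classes sit on the <td> cells (the rows have no class or class="smallnarrow"). The sketch below uses a hypothetical mini DOM and predicate filters (not htmlparser classes) to compare that placement with one where the class test sits inside the <td> test.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical mini DOM element (not htmlparser's Node class).
final class El {
    final String tag;
    final Map<String, String> attrs;
    final List<El> kids;

    El(String tag, Map<String, String> attrs, El... kids) {
        this.tag = tag;
        this.attrs = attrs;
        this.kids = List.of(kids);
    }
}

final class ClassLevelDemo {
    static Predicate<El> tag(String name) { return e -> e.tag.equalsIgnoreCase(name); }
    static Predicate<El> attr(String k, String v) { return e -> v.equals(e.attrs.get(k)); }
    static Predicate<El> hasChild(Predicate<El> f) { return e -> e.kids.stream().anyMatch(f); }

    // The Patient table: the class attribute is on the <td> cells, not on the <tr> rows.
    static El patientTable() {
        return new El("table", Map.of("border", "0", "cellspacing", "0", "width", "100%"),
                new El("tr", Map.of(), new El("td", Map.of("class", "propType"))),
                new El("tr", Map.of("class", "smallnarrow"), new El("td", Map.of("class", "even"))));
    }

    static final Predicate<El> TABLE = tag("table").and(attr("border", "0"))
            .and(attr("cellspacing", "0")).and(attr("width", "100%"));
    static final Predicate<El> CELL_CLASS = attr("class", "propType").or(attr("class", "even"));

    // Like attempt (b): the class test is ANDed at the <tr> level, so it checks the row's class.
    static final Predicate<El> ATTEMPT_B =
            TABLE.and(hasChild(tag("tr").and(hasChild(tag("td"))).and(CELL_CLASS)));

    // Class test moved inside the <td> test, so it checks the cell's class.
    static final Predicate<El> CELL_LEVEL =
            TABLE.and(hasChild(tag("tr").and(hasChild(tag("td").and(CELL_CLASS)))));

    public static void main(String[] args) {
        El table = patientTable();
        System.out.println("attempt (b): " + ATTEMPT_B.test(table));
        System.out.println("cell level:  " + CELL_LEVEL.test(table));
    }
}
```

In this model attempt (b) prints false and the cell-level version prints true. In htmlparser terms that would mean moving the OrFilter into an AndFilter with TagNameFilter ("td") inside the innermost HasChildFilter; this is untested against the real page.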
Do you have any instructions for putting the FilterBuilder (SVN trunk\src\org\htmlparser\parserapplications\filterbuilder\FilterBuilder.java) application together? It appears to depend on other classes. Is this the tutorial you were referring to?
Your guidance has been invaluable so far.
Thanks again,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, 11 June, 2008 10:14:09 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Yes, you are missing a HasChildFilter around the "new AndFilter ( new TagNameFilter ("tr"),"
Please try the FilterBuilder tool. All will become clear I think.
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Tuesday, June 10, 2008 7:43:51 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
I have tried the following table data filter, taking into account your suggestion to use TagNameFilter() to look for both <table> and <tr> rather than TagNameFilter() for <table> and HasChildFilter() for <tr>, but it still does not parse anything:
new AndFilter ( new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter ("border","0"),
new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
new HasAttributeFilter ("width","100%")))),
new AndFilter ( new TagNameFilter ("tr"),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new OrFilter ( new HasAttributeFilter ("class","propType"),
new HasAttributeFilter ("class","even")))));
I still don't understand why we should treat <table> and <tr> at the same level, even though <tr> is a child of <table>, which makes <td> a grandchild of <table>.
The class attribute test should now pick up either the "propType" or the "even" value, but not both.
Thanks,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Tuesday, 10 June, 2008 9:12:02 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
The FilterBuilder project is off the trunk in SVN and I believe it is included in the download.
Visitors are in the parser tree in SVN trunk\parser\src\main\java\org\htmlparser\visitors.
The link filters operate on the href text of a link <a href="KKK">.
NodeClassFilter is like TagNameFilter but uses the tag class instead of the tag name.
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Tuesday, June 10, 2008 12:03:01 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
Where can I find the tutorials for FilterBuilder, visitors, and custom tags used with PrototypicalNodeFactory?
I am also not sure how LinkRegexFilter, LinkStringFilter and NodeClassFilter work.
By the way, I have worked out how to do question (ii) from earlier.
Thanks,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Monday, 9 June, 2008 8:56:47 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
It looks like you've got two HasAttribute filters looking for two different values in the same "class" attribute.
How can a tag have a "class=proType" *and* a "class=even" at the same time?
GrandParents and GrandChildren are handled with subfilters.
Here's an example for 'TABLE has a grand child TD'.
new AndFilter (new TagNameFilter ("TABLE"), new AndFilter (new TagNameFilter ("TR"), new HasChildFilter (new TagNameFilter ("TD"))));
You should probably play with the FilterBuilder application - it has a tutorial - to get the hang of it.
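The point about the class attribute can be seen in a few lines: an AND of two different values for the same attribute can never be true for a single tag, while an OR accepts either. The sketch below uses a plain attribute map as a hypothetical stand-in for a <td> tag; it is not htmlparser code. Separately, filters (a) to (c) spell the class as "proType" while the markup uses "propType", so the header cells would be missed even with an OR until the spelling matches.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model: a <td> reduced to its attribute map; not htmlparser's filter classes.
final class SameAttributeDemo {
    static Predicate<Map<String, String>> classIs(String value) {
        return attrs -> value.equals(attrs.get("class"));
    }

    // Like the AND in filter (a): one class attribute would have to hold both values at once.
    static final Predicate<Map<String, String>> BOTH = classIs("propType").and(classIs("even"));

    // Either value is acceptable, which is what the header and data cells need.
    static final Predicate<Map<String, String>> EITHER = classIs("propType").or(classIs("even"));

    public static void main(String[] args) {
        Map<String, String> headerCell = Map.of("class", "propType");
        Map<String, String> dataCell = Map.of("class", "even");
        System.out.println("AND: " + BOTH.test(headerCell) + " " + BOTH.test(dataCell));
        System.out.println("OR:  " + EITHER.test(headerCell) + " " + EITHER.test(dataCell));
    }
}
```

Running it prints "AND: false false" followed by "OR:  true true".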
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Saturday, June 7, 2008 8:45:01 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
It appears that I have made one step forward but two steps back in terms of parsing some of these html tables.
I would like to read the following table:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of Birth</b></td>
<td class="propType">Sex</td>
</tr>
</table>
Below are the various table data filters I used in an attempt to pick out the table I want to read, without success:
(a) new AndFilter ( new TagNameFilter ("td"),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new HasAttributeFilter("class", "even")));
(b) new AndFilter ( new TagNameFilter ("td"),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new AndFilter ( new HasAttributeFilter("class", "even"),
new AndFilter ( new HasParentFilter ( new TagNameFilter ("tr")),
new AndFilter ( new HasParentFilter ( new TagNameFilter ("table")),
new AndFilter ( new HasAttributeFilter ("border","0"),
( new HasAttributeFilter("width", "100%"))))))));
(c) new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter("border","0"),
new AndFilter ( new HasAttributeFilter("width", "100%"),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("tr")),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new HasAttributeFilter("class", "even")))))));
None of the above filters extract the table data I want. Where have I gone wrong?
(i) By the way, does htmlparser support HasGrandParent() and HasGrandChild(), which would allow me to parse:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
(ii) I would also like to save the entire content of the same web page to a file and then read it back, so I can test various parsing needs without connecting to this site for every parsing test. Is this possible? If so, any idea how this can be done?
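This should be possible with plain Java I/O: download the page once to a file, then parse the local copy as often as needed. The sketch below uses only the JDK; the class name PageCache is illustrative, and the UTF-8 charset is an assumption that should be replaced with the page's real encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for caching a page locally; uses only the JDK.
final class PageCache {
    // Copies a stream (e.g. an HTTP response body) to a file, replacing any older copy.
    static void save(InputStream in, Path file) throws IOException {
        try (InputStream source = in) {
            Files.copy(source, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Fetches the page once; run this while online.
    static void download(String url, Path file) throws IOException {
        save(URI.create(url).toURL().openStream(), file);
    }

    // Reads the cached copy back; the charset must match the page's real encoding.
    static String load(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageCache <url> <file>");
            return;
        }
        Path file = Paths.get(args[1]);
        download(args[0], file);
        // UTF-8 is an assumption here; use the charset the site actually sends.
        String html = load(file, StandardCharsets.UTF_8);
        System.out.println(html.length() + " characters cached in " + file);
    }
}
```

If your htmlparser version's Parser(String) constructor accepts a local file name or a file: URL as well as an http:// address (worth checking in its javadoc), the cached copy can then be parsed in place of the live page.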
Many thanks again,
Jack
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Thursday, 5 June, 2008 9:08:32 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Create a node list:
NodeList results = new NodeList ();
Then in your loop over each result, add the nodes to the list instead of printing them out:
for (int i=0; i<len; i+=1)
{
TagNode tag = (TagNode)a1.elementAt(i);
results.add (tag);
}
Then when you've collected all the tables using whatever currenttabledatafilter values you have, all the tables will be in your results NodeList and you can iterate over them with the same type of loop that you have:
int len = results.size();
for (int i=0; i<len; i+=1)
{
TagNode tag = (TagNode)results.elementAt(i);
// do what you want
}
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, June 4, 2008 5:40:29 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
Can you explain a little more, perhaps with a few lines of example, if it is not too much of an effort?
I thought I already had a NodeList a1, but the challenge is to tell which <TD> comes from which table.
I am very new to using htmlparser and would appreciate a little guidance.
Thanks very much again,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, 4 June, 2008 10:56:07 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
You should just add the tags you want to a NodeList of your own.
Then later on process all the nodes in the list... filing them to a database for instance.
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: Htm...@li...
Cc: htm...@li...
Sent: Wednesday, June 4, 2008 8:43:09 AM
Subject: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi All,
I have been successful in extracting almost all the table data using the following htmlparser statements in Java:
try {
Parser parser = new Parser ("http://www.abc.com/...");
NodeList nl = parser.parse(null);
NodeFilter currenttabledatafilter =
new AndFilter (
new TagNameFilter ("td"),
new OrFilter (
new HasAttributeFilter("class","even"),
new OrFilter (
new HasAttributeFilter("class", "odd"),
new AndFilter (
new HasAttributeFilter("colspan","6"),
new HasChildFilter(new TagNameFilter ("Strong"))))));
NodeList a1 = nl.extractAllNodesThatMatch(currenttabledatafilter,true);
int len = a1.size();
for (int i=0; i<len; i+=1)
{
TagNode tag = (TagNode)a1.elementAt(i);
System.out.println(tag.toPlainTextString());
// System.out.println(tag.toHtml());
}
} catch(Exception pe) {
pe.printStackTrace();
}
This works well for retrieving all of this table data. However, I would like to save the value of each <td> to a unique variable so that the values can be used in the program and ultimately saved to a database. So I am looking to structure a program that assigns each value to a unique variable (or inserts it into the database, which I can do once the values are available) for the many HTML tables on a web page. Each table has some distinct attributes, but the number of <td> cells in each varies. In other words, I am looking for something similar to looping through a text file, as follows:
While not end of file
(i) identify a new table based on its unique attributes.
(ii) assign the value/content of each <td> in the current table to a unique variable for instance.
(iii) repeat step (i) and (ii) for remaining tables.
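Rather than one variable per <td>, which gets awkward when the number of cells varies, the steps above map naturally onto one collection per table, in the spirit of Derrick's suggestion to gather the nodes in your own NodeList and process them afterwards. The sketch below shows that loop shape on a hypothetical, already-extracted representation: ParsedTable and TableCollector are illustrative names, not htmlparser classes. With htmlparser, the attribute map and cell texts would presumably come from the matched table tags, e.g. via toPlainTextString() on each <td>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for one extracted table: its attributes plus
// the text of its cells, row by row. Not an htmlparser class.
final class ParsedTable {
    final Map<String, String> attrs;
    final List<List<String>> rows;

    ParsedTable(Map<String, String> attrs, List<List<String>> rows) {
        this.attrs = attrs;
        this.rows = rows;
    }
}

final class TableCollector {
    // (i) identify each table by its distinguishing attributes, (ii) keep its cell values
    // together under that table's name, (iii) repeat for every table on the page.
    // Pass a LinkedHashMap so identifiers are tried in a predictable order.
    static Map<String, List<List<String>>> collect(
            List<ParsedTable> tables, Map<String, Predicate<Map<String, String>>> identifiers) {
        Map<String, List<List<String>>> byName = new LinkedHashMap<>();
        for (ParsedTable table : tables) {
            for (Map.Entry<String, Predicate<Map<String, String>>> id : identifiers.entrySet()) {
                if (id.getValue().test(table.attrs)) {
                    byName.computeIfAbsent(id.getKey(), k -> new ArrayList<>()).addAll(table.rows);
                    break; // the first matching identifier wins
                }
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        // Cell texts from the Patient table in the thread; the layout table is a made-up example.
        List<ParsedTable> page = List.of(
                new ParsedTable(Map.of("cellpadding", "0", "width", "782"),
                        List.of(List.of("layout"))),
                new ParsedTable(Map.of("cellpadding", "2", "width", "100%"),
                        List.of(List.of("", "Clinic", "John", "Smith", "10/02/1940", "M"))));

        Map<String, Predicate<Map<String, String>>> identifiers = new LinkedHashMap<>();
        identifiers.put("patients", attrs -> "2".equals(attrs.get("cellpadding")));

        for (List<String> row : collect(page, identifiers).get("patients")) {
            // Values are read by column position instead of from separate variables.
            String firstName = row.get(2);
            String surname = row.get(3);
            System.out.println(firstName + " " + surname);
        }
    }
}
```

Running it prints "John Smith". Keeping each table's rows together also makes the later database insert a simple loop over rows.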
Thanks a lot,
Henry
|
From: Derrick O. <der...@ro...> - 2008-06-11 00:14:18
|
Yes, you are missing a HasChildFilter around the "new AndFilter ( new TagNameFilter ("tr"),"
Please try the FilterBuilder tool. All will become clear I think.
|
From: Henry T. <htr...@ya...> - 2008-06-10 23:43:59
|
Hi Derrick,
I have tried the following table data filter, taking into account your suggestion to use TagNameFilter() to look for both <table> and <tr> rather than TagNameFilter() for <table> and HasChildFilter() for <tr>, but it still does not parse anything:
new AndFilter ( new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter ("border","0"),
new AndFilter ( new HasAttributeFilter ("cellspacing","0"),
new HasAttributeFilter ("width","100%")))),
new AndFilter ( new TagNameFilter ("tr"),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new OrFilter ( new HasAttributeFilter ("class","propType"),
new HasAttributeFilter ("class","even")))));
I still don't understand why we should treat <table> and <tr> at the same level, even though <tr> is a child of <table>, which makes <td> a grandchild of <table>.
The class attribute test should now pick up either the "propType" or the "even" value, but not both.
Thanks,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Tuesday, 10 June, 2008 9:12:02 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
The FilterBuilder project is off the trunk in SVN and I believe it is included in the download.
Visitors are in the parser tree in SVN trunk\parser\src\main\java\org\htmlparser\visitors.
The link filters operate on the href text of a link <a href="KKK">.
NodeClassFilter is like TagNameFilter but uses the tag class instead of the tag name.
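A small sketch contrasting the three filters asked about: NodeClassFilter selects nodes by Java class (here TableTag), LinkStringFilter matches a substring of a link's href, and LinkRegexFilter matches a regular expression against the href. The helper class FilterKinds and the example URLs are made up for illustration; the regex is written as ".*\.pdf$" so it works whether the filter uses a full or partial match.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.LinkRegexFilter;
import org.htmlparser.filters.LinkStringFilter;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.TableTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

class FilterKinds {
    // Every <table>, selected by the Java class the parser creates for it
    static final NodeFilter TABLES = new NodeClassFilter(TableTag.class);
    // <a> tags whose href contains "example.com"
    static final NodeFilter LINKS_TO_EXAMPLE = new LinkStringFilter("example.com");
    // <a> tags whose href ends in ".pdf"
    static final NodeFilter PDF_LINKS = new LinkRegexFilter(".*\\.pdf$");

    // Number of nodes anywhere in the document accepted by the filter
    static int count(String html, NodeFilter filter) throws ParserException {
        NodeList all = Parser.createParser(html, "UTF-8").parse(null);
        return all.extractAllNodesThatMatch(filter, true).size();
    }
}
```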
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Tuesday, June 10, 2008 12:03:01 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
Where can I find the tutorials for the FilterBuilder, visitors, and custom tags used with the PrototypicalNodeFactory?
I am also not sure how LinkRegexFilter, LinkStringFilter and NodeClassFilter work.
Btw, I have worked out how to do question (ii) from earlier.
Thanks,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Monday, 9 June, 2008 8:56:47 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
It looks like you've got two HasAttribute filters looking for two different values in the same "class" attribute.
How can a tag have a "class=proType" *and* a "class=even" at the same time?
GrandParents and GrandChildren are handled with subfilters.
Here's an example for 'TABLE has a grand child TD'.
new AndFilter (new TagNameFilter ("TABLE"), new AndFilter (new TagNameFilter ("TR"), new HasChildFilter (new TagNameFilter ("TD"))));
You should probably play with the FilterBuilder application - it has a tutorial - to get the hang of it.
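As written, that example ANDs TagNameFilter("TABLE") and TagNameFilter("TR") on the same node, and no single tag can be both. One way to express "a TABLE that has a TR child, which in turn has a TD child" is to nest HasChildFilter so that each level is tested on the child rather than on the table itself. A sketch; GrandChildFilter is a hypothetical name.

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasChildFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.ParserException;

class GrandChildFilter {
    // TABLE -> (child) TR -> (child) TD
    static NodeFilter tableWithTdGrandchild() {
        return new AndFilter(
            new TagNameFilter("table"),
            new HasChildFilter(
                new AndFilter(
                    new TagNameFilter("tr"),
                    new HasChildFilter(new TagNameFilter("td")))));
    }

    static int count(String html, NodeFilter filter) throws ParserException {
        return Parser.createParser(html, "UTF-8").parse(null)
                     .extractAllNodesThatMatch(filter, true).size();
    }
}
```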
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Saturday, June 7, 2008 8:45:01 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
It appears that I have taken one step forward but two steps back in parsing some of these HTML tables.
I would like to read the following table:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of Birth</b></td>
<td class="propType">Sex</td>
</tr>
</table>
Below are the various table data filters I used in an attempt to select the table I wanted to read, without success:
(a) new AndFilter ( new TagNameFilter ("td"),
        new AndFilter ( new HasAttributeFilter("class", "proType"),
            new HasAttributeFilter("class", "even")));
(b) new AndFilter ( new TagNameFilter ("td"),
        new AndFilter ( new HasAttributeFilter("class", "proType"),
            new AndFilter ( new HasAttributeFilter("class", "even"),
                new AndFilter ( new HasParentFilter ( new TagNameFilter ("tr")),
                    new AndFilter ( new HasParentFilter ( new TagNameFilter ("table")),
                        new AndFilter ( new HasAttributeFilter ("border","0"),
                            ( new HasAttributeFilter("width", "100%"))))))));
(c) new AndFilter ( new TagNameFilter ("table"),
        new AndFilter ( new HasAttributeFilter("border","0"),
            new AndFilter ( new HasAttributeFilter("width", "100%"),
                new AndFilter ( new HasChildFilter ( new TagNameFilter ("tr")),
                    new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
                        new AndFilter ( new HasAttributeFilter("class", "proType"),
                            new HasAttributeFilter("class", "even")))))));
None of the above filters parse the table data I wanted. Where have I gone wrong?
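Running candidate filters against the sample table from a string makes it easy to see which part fails. In the sketch below (hypothetical class ClassFilterDemo, using an abbreviated copy of the sample table), two things stand out: an AND of two different class values can never match, because a tag carries only one class value, and the filters above test for "proType" while the sample markup says "propType".

```java
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.ParserException;

class ClassFilterDemo {
    // Abbreviated copy of the sample table (three of its six cells)
    static final String SAMPLE =
        "<table border=\"0\" cellspacing=\"0\" cellpadding=\"2\" width=\"100%\"><tr>"
        + "<td class=\"propType\"><b>Patient</b></td>"
        + "<td class=\"propType\"><b>Firstname</b></td>"
        + "<td class=\"propType\">Sex</td>"
        + "</tr></table>";

    static NodeFilter td(NodeFilter classTest) {
        return new AndFilter(new TagNameFilter("td"), classTest);
    }

    // Like filter (a): both class values required on one tag, so nothing can match
    static final NodeFilter BOTH = td(new AndFilter(
        new HasAttributeFilter("class", "propType"), new HasAttributeFilter("class", "even")));
    // Either class value is enough
    static final NodeFilter EITHER = td(new OrFilter(
        new HasAttributeFilter("class", "propType"), new HasAttributeFilter("class", "even")));
    // The spelling used in filters (a) to (c)
    static final NodeFilter MISSPELT = td(new HasAttributeFilter("class", "proType"));

    static int count(NodeFilter filter) throws ParserException {
        return Parser.createParser(SAMPLE, "UTF-8").parse(null)
                     .extractAllNodesThatMatch(filter, true).size();
    }
}
```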
(i) Btw, does htmlparser support HasGrandParent() and HasGrandChild() which would allow me to parse:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
(ii) I would also like to save all the content of the same web page to a file and then read it back, so that I can test various parsing needs without connecting to this site for every parsing test. Is this possible? If so, any idea how this can be done?
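One way to do (ii), sketched under these assumptions: the page is fetched once with new Parser(url), its parse tree is written out with NodeList.toHtml() (which regenerates the markup from the tree, so it should closely but not necessarily byte-for-byte match the original), the file is stored as UTF-8, and later runs read the file into a String and parse it with Parser.createParser(). OfflineCopy is a hypothetical class name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

class OfflineCopy {
    // Run once while online: fetch the page and save its HTML locally.
    static void save(String url, Path file) throws ParserException, IOException {
        NodeList page = new Parser(url).parse(null);
        Files.write(file, page.toHtml().getBytes(StandardCharsets.UTF_8));
    }

    // Later test runs: parse the saved copy without touching the network.
    static NodeList load(Path file) throws ParserException, IOException {
        String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Parser.createParser(html, "UTF-8").parse(null);
    }
}
```

The NodeList returned by load() can be filtered with extractAllNodesThatMatch() exactly like one parsed from the live URL.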
Many thanks again,
Jack
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Thursday, 5 June, 2008 9:08:32 AM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Create a node list:
NodeList results = new NodeList ();
Then in your loop over each result, add the nodes to the list instead of printing them out:
for (int i=0; i<len; i+=1)
{
    TagNode tag = (TagNode)a1.elementAt(i);
    results.add (tag);
}
Then when you've collected all the tables using whatever currenttabledatafilter values you have, all the tables will be in your results NodeList and you can iterate over them with the same type of loop that you have:
int count = results.size();
for (int i=0; i<count; i+=1)
{
    TagNode tag = (TagNode)results.elementAt(i);
    // do what you want
}
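Building on this collect-then-iterate idea, a sketch that keeps each <td> tied to its table: select the <table> nodes themselves (NodeClassFilter(TableTag.class) plus whatever attribute filter identifies them), then walk TableTag.getRows() and TableRow.getColumns(). This assumes the parser's default PrototypicalNodeFactory, which creates TableTag, TableRow and TableColumn nodes; TablesToRows and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.TableColumn;
import org.htmlparser.tags.TableRow;
import org.htmlparser.tags.TableTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

class TablesToRows {
    // Cell texts of one table: one inner list per <tr>, one string per <td>
    static List<List<String>> rowsOf(TableTag table) {
        List<List<String>> rows = new ArrayList<>();
        for (TableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (TableColumn cell : row.getColumns()) {
                cells.add(cell.toPlainTextString().trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    // Every table accepted by tableFilter, kept separate so each cell stays with its table
    static List<List<List<String>>> tables(String html, NodeFilter tableFilter) throws ParserException {
        NodeList matches = Parser.createParser(html, "UTF-8").parse(null)
            .extractAllNodesThatMatch(
                new AndFilter(new NodeClassFilter(TableTag.class), tableFilter), true);
        List<List<List<String>>> result = new ArrayList<>();
        for (int i = 0; i < matches.size(); i++) {
            result.add(rowsOf((TableTag) matches.elementAt(i)));
        }
        return result;
    }
}
```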
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, June 4, 2008 5:40:29 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi Derrick,
Can you explain a little more, perhaps with a few lines of example, if it is not too much of an effort?
I thought I already had a NodeList (a1), but the challenge is to tell which <TD> came from which table.
I am very new to using htmlparser and would appreciate a little guidance.
Thanks very much again,
Henry
----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, 4 June, 2008 10:56:07 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
You should just add the tags you want to a NodeList of your own.
Then later on process all the nodes in the list... filing them to a database for instance.
----- Original Message ----
From: Henry Tran <htr...@ya...>
To: Htm...@li...
Cc: htm...@li...
Sent: Wednesday, June 4, 2008 8:43:09 AM
Subject: [Htmlparser-user] How to save <TD> value to unique variables from html tables
Hi All,
I have been successful in extracting almost all the table data using the following htmlparser statements in Java:
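// imports used by the snippet below
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.HasChildFilter;
import org.htmlparser.filters.OrFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;

try {
    Parser parser = new Parser ("http://www.abc.com/...");
    NodeList nl = parser.parse(null);
    // <td> cells with class="even" or class="odd", or with colspan="6" and a <strong> child
    NodeFilter currenttabledatafilter =
        new AndFilter (
            new TagNameFilter ("td"),
            new OrFilter (
                new HasAttributeFilter("class","even"),
                new OrFilter (
                    new HasAttributeFilter("class", "odd"),
                    new AndFilter (
                        new HasAttributeFilter("colspan","6"),
                        new HasChildFilter(new TagNameFilter ("Strong"))))));
    // true = also search nested nodes
    NodeList a1 = nl.extractAllNodesThatMatch(currenttabledatafilter,true);
    int len = a1.size();
    for (int i=0; i<len; i+=1)
    {
        TagNode tag = (TagNode)a1.elementAt(i);
        System.out.println(tag.toPlainTextString());
        // System.out.println(tag.toHtml());
    }
} catch(Exception pe) {
    pe.printStackTrace();
}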
This is great for retrieving all of this table data. However, I would like to save the value of each <td> to a unique variable so that the values can be used in the program and ultimately saved to a database. So I am looking to structure a program that assigns each value to a unique variable (or inserts it into the database, which I can do once the values are available) across the many HTML tables on a web page. Each table has some distinct attributes but varies in the number of <td> cells it contains. In other words, I am looking for something similar to looping through a text file, as follows:
While not end of line
(i) identify a new table based on its unique attributes.
(ii) assign the value/content of each <td> in the current table to a unique variable for instance.
(iii) repeat step (i) and (ii) for remaining tables.
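Rather than one Java variable per <td>, a common alternative is one Map per table row, keyed by column heading, which copes with tables that have different numbers of cells. A minimal pure-Java sketch, assuming the cell texts of one table have already been extracted as a list of rows (for example with TableTag.getRows() and TableRow.getColumns()) and that the first row holds the headings; RowMapper is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowMapper {
    // First row = headings; each later row becomes heading -> cell text.
    static List<Map<String, String>> toRecords(List<List<String>> rows) {
        List<Map<String, String>> records = new ArrayList<>();
        if (rows.isEmpty()) {
            return records;
        }
        List<String> header = rows.get(0);
        for (List<String> row : rows.subList(1, rows.size())) {
            Map<String, String> record = new LinkedHashMap<>();
            // Stop at the shorter of heading row and data row
            for (int c = 0; c < header.size() && c < row.size(); c++) {
                record.put(header.get(c), row.get(c));
            }
            records.add(record);
        }
        return records;
    }
}
```

Blank or duplicate headings would collide as map keys, so real tables may need those columns renamed first. Each record maps naturally onto one database insert.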
Thanks a lot,
Henry
From: Derrick O. <der...@ro...> - 2008-06-10 11:12:20
The FilterBuilder project is off the trunk in SVN and I believe it is included in the download.
Visitors are in the parser tree in SVN trunk\parser\src\main\java\org\htmlparser\visitors.
The link filters operate on the href text of a link <a href="KKK">.
NodeClassFilter is like TagNameFilter but uses the tag class instead of the tag name.
From: Henry T. <htr...@ya...> - 2008-06-10 04:03:10
Hi Derrick,
Where can I find the tutorials for the FilterBuilder, visitors, and custom tags used with the PrototypicalNodeFactory?
I am also not sure how LinkRegexFilter, LinkStringFilter and NodeClassFilter work.
Btw, I have worked out how to do question (ii) from earlier.
Thanks,
Henry
From: Derrick O. <der...@ro...> - 2008-06-09 10:56:59
It looks like you've got two HasAttribute filters looking for two different values in the same "class" attribute.
How can a tag have a "class=proType" *and* a "class=even" at the same time?
GrandParents and GrandChildren are handled with subfilters.
Here's an example for 'TABLE has a grand child TD'.
new AndFilter (new TagNameFilter ("TABLE"), new AndFilter (new TagNameFilter ("TR"), new HasChildFilter (new TagNameFilter ("TD"))));
You should probably play with the FilterBuilder application - it has a tutorial - to get the hang of it.
From: Henry T. <htr...@ya...> - 2008-06-09 10:40:47
Hi forum members,
I am having difficulty parsing the content of the table below; it appears that the HasAttributeFilter() class does not recognise the "cellpadding" attribute:
<table border="0" cellspacing="0" cellpadding="2" width="100%">
Here are the table data filters that I have tried without much luck:
(i) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellpadding","2"));
(ii) new AndFilter ( new TagNameFilter ("table"), new HasAttributeFilter("cellspacing","0"));
(iii) new AndFilter ( new TagNameFilter ("table"),
          new AndFilter ( new HasAttributeFilter("cellspacing","0"),
              new HasAttributeFilter("width","100%")));
(iv) new AndFilter ( new TagNameFilter ("table"),
         new AndFilter ( new HasAttributeFilter("cellspacing","0"),
             new AndFilter ( new HasAttributeFilter("cellpadding","2"),
                 new HasAttributeFilter("width","100%"))));
Table data filters (i) and (iv) did not pick up anything, while (ii) and (iii) worked but also included other tables that were not needed. Filter (iv) would be perfect if only it worked. So I would like to ask the following questions about this issue:
(a) Does HasAttributeFilter() support cellpadding?
(b) Is there a limit on how many attributes HasAttributeFilter() can match in a table?
(c) Can HasAttributeFilter() pick up attributes in nested tables? This table is nested inside another table.
(d) Does the search for the attributes follow a certain order? If so, the order of the HasAttributeFilter() calls may need to be altered to achieve the desired search.
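Questions (b) to (d) can be reasoned about with a small plain-Java model. The Element class and the helper methods in FilterModel below are hypothetical stand-ins, not htmlparser classes. The model assumes, as htmlparser's AndFilter appears to behave, that a node is accepted only when every predicate accepts it. Under that assumption there is no fixed limit on the number of predicates, their order only changes how soon a non-matching node is rejected (not the result), and a recursive search reaches nested tables as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical minimal model of an HTML element (not an htmlparser class).
final class Element {
    final String tag;
    final Map<String, String> attrs = new HashMap<>();
    final List<Element> children = new ArrayList<>();

    // keyValues alternates attribute names and values: "width", "100%", ...
    Element(String tag, String... keyValues) {
        this.tag = tag;
        for (int i = 0; i + 1 < keyValues.length; i += 2)
            attrs.put(keyValues[i], keyValues[i + 1]);
    }

    Element add(Element child) {
        children.add(child);
        return this;
    }
}

final class FilterModel {
    // Accepts elements with the given tag name, ignoring case.
    static Predicate<Element> tagName(String name) {
        return e -> e.tag.equalsIgnoreCase(name);
    }

    // Accepts elements whose attribute is present and equal to the value.
    static Predicate<Element> hasAttribute(String name, String value) {
        return e -> value.equals(e.attrs.get(name));
    }

    // Accepts an element only if every predicate accepts it; stops at the first rejection.
    @SafeVarargs
    static Predicate<Element> and(Predicate<Element>... predicates) {
        return e -> {
            for (Predicate<Element> p : predicates)
                if (!p.test(e)) return false;
            return true;
        };
    }

    // Depth-first search that also looks inside matching elements, so nested tables are visited.
    static List<Element> findAll(Element root, Predicate<Element> filter) {
        List<Element> found = new ArrayList<>();
        collect(root, filter, found);
        return found;
    }

    private static void collect(Element e, Predicate<Element> filter, List<Element> found) {
        if (filter.test(e)) found.add(e);
        for (Element child : e.children) collect(child, filter, found);
    }
}
```

In this model a looser filter such as (ii) matches both the outer table and the nested one, which mirrors the extra tables described above; each additional attribute test narrows the result.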
Many thanks,
Henry
Get the name you always wanted with the new y7mail email address.
www.yahoo7.com.au/mail |
|
From: Henry T. <htr...@ya...> - 2008-06-08 00:45:12
|
Hi Derrick,
It appears that I have made one step forward but two steps back in terms of parsing some of these html tables.
I would like to read the following table:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
<td class="propType"><b>Patient</b></td>
<td class="propType"><b>Firstname</b></td>
<td class="propType"><b>Surname</b></td>
<td class="propType" align="right"><b>Date of Birth</b></td>
<td class="propType">Sex</td>
</tr>
</table>
Below are the various table data filters I used in an attempt to pick out the table I want to read, without success:
(a) new AndFilter ( new TagNameFilter ("td"),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new HasAttributeFilter("class", "even")));
(b) new AndFilter ( new TagNameFilter ("td"),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new AndFilter ( new HasAttributeFilter("class", "even"),
new AndFilter ( new HasParentFilter ( new TagNameFilter ("tr")),
new AndFilter ( new HasParentFilter ( new TagNameFilter ("table")),
new AndFilter ( new HasAttributeFilter ("border","0"),
( new HasAttributeFilter("width", "100%"))))))));
(c) new AndFilter ( new TagNameFilter ("table"),
new AndFilter ( new HasAttributeFilter("border","0"),
new AndFilter ( new HasAttributeFilter("width", "100%"),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("tr")),
new AndFilter ( new HasChildFilter ( new TagNameFilter ("td")),
new AndFilter ( new HasAttributeFilter("class", "proType"),
new HasAttributeFilter("class", "even")))))));
None of the above filters parse the table data I wanted. Where have I gone wrong?
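Comparing the filters with the HTML above, two details stand out: the cells use class "propType" while the filters ask for "proType", and each AndFilter requires the class attribute to equal both "proType" and "even" at the same time (no cell in this table has class "even" either). The plain-Java sketch below is a hypothetical model, not htmlparser code; it assumes HasAttributeFilter compares the whole attribute value with a case-sensitive equality test, which is how it appears to behave, and shows why such filters can never match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of whole-value attribute matching (not htmlparser code).
final class ExactMatch {
    // True only if the attribute exists and its entire value equals 'value'.
    static boolean hasAttribute(Map<String, String> attrs, String name, String value) {
        return value.equals(attrs.get(name));
    }

    // Both tests on the same attribute; with two different values this is always false,
    // because an attribute holds a single value.
    static boolean both(Map<String, String> attrs, String name, String v1, String v2) {
        return hasAttribute(attrs, name, v1) && hasAttribute(attrs, name, v2);
    }

    // Builds the attribute map of a <td> with the given class.
    static Map<String, String> td(String cssClass) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("class", cssClass);
        return attrs;
    }
}
```

If that assumption holds, a single test such as new AndFilter (new TagNameFilter ("td"), new HasAttributeFilter ("class", "propType")) would be the natural first thing to try.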
(i) By the way, does htmlparser support HasGrandParent() and HasGrandChild() filters, which would allow me to parse:
<table border="0" cellspacing="0" cellpadding="2" width="100%"> // HasGrandParent()...
<tr> // HasParent()...
<td class="propType"> </td> // HasAttributeFilter()...
(ii) I would also like to save the content of the same web page to a file and then read it back, so that I can test various parsing needs without connecting to this site for every parsing test. Is this possible? If so, any idea how this can be done?
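One way to do (ii) is to download the page once with the standard Java library and then parse the local copy. The PageCache class below is a hypothetical helper, not part of htmlparser. As far as I know, htmlparser's Parser (String) constructor accepts either a URL or a file name, and Parser.createParser (html, charset) parses an HTML string, so either the saved file or the text read back from it could be handed to the parser; check the javadoc of your version to be sure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: keep a local copy of a page so that parsing can be tested offline.
final class PageCache {
    // Copies the raw bytes of a stream to a file, replacing any previous copy, then closes the stream.
    static void save(InputStream in, Path file) throws IOException {
        try (InputStream source = in) {
            Files.copy(source, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Downloads the page once; this is the only step that needs network access.
    static void download(String url, Path file) throws IOException {
        save(new URL(url).openStream(), file);
    }

    // Reads the saved copy back as text; the charset should match the page's encoding.
    static String load(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }
}
```

Usage: call PageCache.download (url, Paths.get ("page.html")) once while online (the file name is only an example), then have each parsing test read the saved file instead of fetching the URL again.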
Many thanks again,
Jack
|
|
From: Derrick O. <der...@ro...> - 2008-06-04 23:08:40
|
Create a node list:
NodeList results = new NodeList ();
Then in your loop over each result, add the nodes to the list instead of printing them out:
for (int i=0; i<len; i+=1)
{
TagNode tag = (TagNode)a1.elementAt(i);
results.add (tag);
}
Then when you've collected all the tables using whatever currenttabledatafilter values you have, all the tables will be in your results NodeList and you can iterate over them with the same type of loop that you have:
int len = results.size();
for (int i=0; i<len; i+=1)
{
TagNode tag = (TagNode)results.elementAt(i);
// do what you want
}
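To tell which table a collected <td> came from, one option is to walk up from each cell to its nearest enclosing <table>; htmlparser nodes should allow this through getParent (). The sketch below models that idea in plain Java with a hypothetical CellNode class (not org.htmlparser.Node) and groups cell texts by their enclosing table, keeping the tables in document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node with a parent link (a stand-in, not org.htmlparser.Node).
final class CellNode {
    final String tag;
    final String text;
    final CellNode parent;

    CellNode(String tag, String text, CellNode parent) {
        this.tag = tag;
        this.text = text;
        this.parent = parent;
    }
}

final class GroupByTable {
    // Climbs parent links until a <table> is found; returns null if there is none.
    static CellNode enclosingTable(CellNode node) {
        for (CellNode p = node.parent; p != null; p = p.parent)
            if (p.tag.equalsIgnoreCase("table")) return p;
        return null;
    }

    // Groups cell texts by the identity of their nearest enclosing table,
    // in order of first appearance; cells outside any table are grouped under null.
    static Map<CellNode, List<String>> group(List<CellNode> cells) {
        Map<CellNode, List<String>> byTable = new LinkedHashMap<>();
        for (CellNode cell : cells) {
            CellNode table = enclosingTable(cell);
            byTable.computeIfAbsent(table, t -> new ArrayList<>()).add(cell.text);
        }
        return byTable;
    }
}
```

Because the nearest table is used, a cell inside a nested table is grouped with the inner table, not the outer one.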
|
|
From: Henry T. <htr...@ya...> - 2008-06-04 21:40:37
|
Hi Derrick,
Can you explain a little more, perhaps with a few lines of example, if it is not too much of an effort?
I thought I already had a NodeList a1, but the challenge is to distinguish which <TD> comes from which table.
I am very new to using htmlparser and would appreciate a little guidance.
Thanks very much again,
Henry
|
|
From: Derrick O. <der...@ro...> - 2008-06-04 12:56:14
|
You should just add the tags you want to a NodeList of your own.
Then later on process all the nodes in the list... filing them to a database for instance.
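Once the cells are in a list of your own, filing them to a database could look roughly like the sketch below. It is plain Java (java.sql), not part of htmlparser, and it assumes each table has a fixed number of columns and that the collected nodes have already been turned into strings (for example with toPlainTextString ()). The CellRows class, the SQL statement and the connection are placeholders you would supply.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for storing collected cell texts row by row.
final class CellRows {
    // Splits the flat list of cell texts into rows of 'columns' cells. Throws if the
    // cells do not fill a whole number of rows, which usually means the filter
    // picked up extra or missing cells.
    static List<List<String>> toRows(List<String> cells, int columns) {
        if (columns <= 0 || cells.size() % columns != 0)
            throw new IllegalArgumentException(cells.size() + " cells do not fit rows of " + columns);
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += columns)
            rows.add(new ArrayList<>(cells.subList(i, i + columns)));
        return rows;
    }

    // Inserts every row in one batch; insertSql is a placeholder INSERT with one '?' per column.
    static void insertAll(Connection conn, String insertSql, List<List<String>> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            for (List<String> row : rows) {
                for (int c = 0; c < row.size(); c++)
                    ps.setString(c + 1, row.get(c)); // JDBC parameters are 1-based
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```

For example, with a hypothetical two-column table: CellRows.insertAll (conn, "INSERT INTO patients (firstname, surname) VALUES (?, ?)", CellRows.toRows (texts, 2)).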
|
|
From: Henry T. <htr...@ya...> - 2008-06-04 12:43:16
|
Hi All,
I have been successful in extracting almost all the table data using the following htmlparser statements in Java:
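// imports needed by this fragment
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.*;
import org.htmlparser.nodes.TagNode;
import org.htmlparser.util.NodeList;

try {
    Parser parser = new Parser ("http://www.abc.com/...");
    NodeList nl = parser.parse (null);
    // <td> cells with class "even" or "odd", or with colspan="6" and a <strong> child
    NodeFilter currenttabledatafilter =
        new AndFilter (
            new TagNameFilter ("td"),
            new OrFilter (
                new HasAttributeFilter ("class", "even"),
                new OrFilter (
                    new HasAttributeFilter ("class", "odd"),
                    new AndFilter (
                        new HasAttributeFilter ("colspan", "6"),
                        new HasChildFilter (new TagNameFilter ("Strong"))))));
    // true = search recursively through all children
    NodeList a1 = nl.extractAllNodesThatMatch (currenttabledatafilter, true);
    int len = a1.size ();
    for (int i = 0; i < len; i += 1)
    {
        TagNode tag = (TagNode)a1.elementAt (i);
        System.out.println (tag.toPlainTextString ());
        // System.out.println (tag.toHtml ());
    }
} catch (Exception pe) {
    pe.printStackTrace ();
}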
This works well for retrieving all the table data. However, I would like to save the value of each <td> to a unique variable so that the values can be used in the program and ultimately saved to a database. I am therefore looking to structure a program that assigns each value to a unique variable (or inserts it into the database, which I can do once the values are available) for as many HTML tables as there are on a web page. Each table has some distinct attributes, but the number of <td> cells in each varies. In other words, I am looking for something similar to looping through a text file, as follows:
While not end of file
(i) identify a new table based on its unique attributes.
(ii) assign the value/content of each <td> in the current table to a unique variable, for instance.
(iii) repeat steps (i) and (ii) for the remaining tables.
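The loop in steps (i) to (iii) could be sketched as follows, assuming each table has already been reduced to its attributes and its cell texts. ParsedTable, TableLoop and the signature names are hypothetical, not htmlparser types. Since Java cannot create a new variable per cell at run time, each table's cells are stored in a map under a name you choose; a table is identified when every attribute in its signature is present with the same value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container for one parsed table: its attributes and its cell texts in order.
final class ParsedTable {
    final Map<String, String> attrs;
    final List<String> cells;

    ParsedTable(Map<String, String> attrs, List<String> cells) {
        this.attrs = attrs;
        this.cells = cells;
    }
}

final class TableLoop {
    // For each table, finds the first named signature whose attributes all match and stores
    // the table's cells under that name. Unmatched tables are skipped; if several tables
    // match the same name, the last one wins.
    static Map<String, List<String>> assign(List<ParsedTable> tables,
                                            Map<String, Map<String, String>> signatures) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (ParsedTable table : tables) {                                   // (iii) every table
            for (Map.Entry<String, Map<String, String>> sig : signatures.entrySet()) {
                if (table.attrs.entrySet().containsAll(sig.getValue().entrySet())) { // (i) identify
                    byName.put(sig.getKey(), new ArrayList<>(table.cells));         // (ii) assign
                    break;
                }
            }
        }
        return byName;
    }
}
```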
Thanks a lot,
Henry
|
|
From: neethu j. <nee...@gm...> - 2008-06-04 01:25:15
|
Parser joburlparser = new Parser ("http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={5e86df59-eb37-4e01-864a-e7662b31e44b}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1");
NodeList jobidList = joburlparser.parse (new HasAttributeFilter ("class", "FormContentFieldValue"));
// extractAllNodesThatMatch returns a new list, so keep the result
jobidList = jobidList.extractAllNodesThatMatch (new TagNameFilter ("TD"));
System.out.println (jobidList.toHtml ());
NodeList jobid_child = jobidList.elementAt (3).getChildren ();
System.out.println (jobid_child.toHtml ());
This gives me the Job ID, but I do not want to use elementAt(3).
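Rather than relying on position (elementAt(3)), the value can be looked up by its label. Assuming the label and value cells alternate as in the HTML from the original question, the plain-Java sketch below pairs each FormContentFieldLabel cell with the FormContentFieldValue cell that follows it. The LabelValues class and its input, a list of {class, text} pairs, are hypothetical; you would fill that list from the parsed <td> tags, for example with getAttribute ("class") and toPlainTextString ().

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LabelValues {
    // Pairs each "FormContentFieldLabel" cell with the next "FormContentFieldValue" cell.
    // Each input entry is a hypothetical {cssClass, text} pair, in document order.
    static Map<String, String> pair(List<String[]> cells) {
        Map<String, String> fields = new LinkedHashMap<>();
        String pendingLabel = null;
        for (String[] cell : cells) {
            String cssClass = cell[0];
            String text = cell[1].trim();
            if ("FormContentFieldLabel".equals(cssClass)) {
                pendingLabel = text;
            } else if ("FormContentFieldValue".equals(cssClass) && pendingLabel != null) {
                fields.put(pendingLabel, text);
                pendingLabel = null;
            }
        }
        return fields;
    }
}
```

With the map built, fields.get ("Job ID") keeps working even if rows are added or reordered on the page.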
On Tue, Jun 3, 2008 at 5:17 PM, Derrick Oswald <der...@ro...>
wrote:
> I used the FilterBuilder application to quickly generate the filter you
> need:
>
> import org.htmlparser.*;
> import org.htmlparser.filters.*;
> import org.htmlparser.beans.*;
> import org.htmlparser.util.*;
>
> public class JobId
> {
> public static void main (String args[])
> {
> HasAttributeFilter filter0 = new HasAttributeFilter ();
> filter0.setAttributeName ("class");
> filter0.setAttributeValue ("FormContentFieldValue");
> StringFilter filter1 = new StringFilter ();
> filter1.setCaseSensitive (true);
> filter1.setLocale (new java.util.Locale ("en", "US", ""));
> filter1.setPattern ("Job ID");
> HasChildFilter filter2 = new HasChildFilter ();
> filter2.setRecursive (false);
> filter2.setChildFilter (filter1);
> HasSiblingFilter filter3 = new HasSiblingFilter ();
> filter3.setSiblingFilter (filter2);
> NodeFilter[] array0 = new NodeFilter[2];
> array0[0] = filter0;
> array0[1] = filter3;
> AndFilter filter4 = new AndFilter ();
> filter4.setPredicates (array0);
> NodeFilter[] array1 = new NodeFilter[1];
> array1[0] = filter4;
> FilterBean bean = new FilterBean ();
> bean.setFilters (array1);
> if (0 != args.length)
> {
> bean.setURL (args[0]);
> System.out.println (bean.getNodes ().toHtml ());
> }
> else
> System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar JobId <url>");
> }
> }
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
|
|
From: neethu j. <nee...@gm...> - 2008-06-03 20:38:46
|
Thanks Derrick!! I tried using the AndFilter but no luck!! It gives me a null pointer exception. Here is the page that I'm trying to read: http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={ce1c851e-f6ee-4194-ad6d-c020f94be177}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1 |
|
From: Derrick O. <der...@ro...> - 2008-05-30 01:43:45
|
The results of applying new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true))) would give you the <td class="FormContentFieldValue">524</td> tag, so you could ask for toPlainTextString() and convert the resulting string into an integer value if you want.
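The conversion step might look like the sketch below. It assumes the text has already been taken from the matched <td> (for example via toPlainTextString ()) and guards against surrounding whitespace and non-numeric content, which a changing page could introduce; JobIdParser is a hypothetical name.

```java
import java.util.OptionalInt;

final class JobIdParser {
    // Converts cell text such as " 524 " to an int; returns empty if the text is not a whole number.
    static OptionalInt parseJobId(String cellText) {
        if (cellText == null) return OptionalInt.empty();
        String trimmed = cellText.trim();
        try {
            return OptionalInt.of(Integer.parseInt(trimmed));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```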
|
|
From: neethu j. <nee...@gm...> - 2008-05-29 05:07:29
|
Thanks for your reply... Could you please explain a little more on this one?
Well, ultimately I'm interested in the field value of the Job ID, i.e. 524.
|
|
From: Derrick O. <der...@ro...> - 2008-05-29 00:53:12
|
You should be able to construct a filter using the FilterBuilder application to look for the "Job ID" in the adjacent TD.
It will be something like:
new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true)))
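Here is a rough sketch of the idea behind such a filter, for readers who don't have HtmlParser at hand. Instead of counting cells, find the cell whose text is the label ("Job ID") and take the cell that follows it in the same row. This is plain JDK regex code, not HtmlParser's filter API. The class and method names are made up for illustration, and the code assumes simple rows of <td> cells without nested tables, like the snippet in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * JDK-only sketch, not HtmlParser's API: find a value cell by the text of the
 * label cell next to it in the same row, instead of by its position in the table.
 * Assumes simple rows of <td> cells without nested tables, as in the question.
 */
public class LabelValueLookup {

    // One table row, and one cell inside a row (case-insensitive, spanning lines).
    private static final Pattern ROW =
        Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern CELL =
        Pattern.compile("<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text of the cell after the one reading {@code label}, or null if none. */
    public static String valueFor(String html, String label) {
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            Matcher cell = CELL.matcher(row.group(1));
            boolean labelSeen = false;      // reset for every row
            while (cell.find()) {
                // Drop any tags inside the cell and the surrounding whitespace.
                String text = cell.group(1).replaceAll("<[^>]*>", "").trim();
                if (labelSeen) {
                    return text;            // the sibling cell right after the label
                }
                labelSeen = text.equals(label);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html =
            "</tr>\n"
            + "<tr class=\"FormContent\">\n"
            + "<td class=\"FormContentFieldLabel\">Job Title</td>\n"
            + "<td class=\"FormContentFieldValue\">Director, Graduate Studies in IS Management</td>\n"
            + "</tr>\n"
            + "<tr class=\"FormContent\">\n"
            + "<td class=\"FormContentFieldLabel\">Job ID</td>\n"
            + "<td class=\"FormContentFieldValue\">524</td>\n"
            + "</tr>\n";
        System.out.println(valueFor(html, "Job ID")); // prints 524
    }
}
```

The lookup keys on the label text, so it should keep working when rows are added or reordered. That was the problem with taking the 3rd element.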
|
From: neethu j. <nee...@gm...> - 2008-05-28 17:06:01
|
Hi, I'm new to HtmlParser. Could you please help me extract the Job ID from the table? I was trying to locate it as the 3rd element of the table, but the page changes from day to day, so I need to work out another way to find the Job ID.

</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">City</td>
<td class="FormContentFieldValue">St. Louis</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">State/Province</td>
<td class="FormContentFieldValue">Missouri [MO]</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job Title</td>
<td class="FormContentFieldValue">Director, Graduate Studies in IS Management</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job ID</td>
<td class="FormContentFieldValue">524</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job Type</td>
<td class="FormContentFieldValue">Director</td>
</tr>

regards
NAT
|
From: answers s. <fas...@gm...> - 2008-05-26 09:15:16
|
hi
I am seeing the same behaviour for the div tag too.
I am using it like this:
NodeFilter filterStyleClass = new HasAttributeFilter("class", (String) list.get(j));
NodeList listStyleClass = parse.extractAllNodesThatMatch(filterStyleClass);
I want to extract a div tag with the attribute "class" together with the child tags inside that div tag.
On 5/24/08, Derrick Oswald <der...@ro...> wrote:
>
> The HtmlParser doesn't come with a Font tag that is composite.
> If you want this behaviour you need to define your own tag as described
> here: http://htmlparser.sourceforge.net/faq.html#composite
|
|
From: <Sri...@ba...> - 2008-05-26 07:08:41
|
Hi Abdullah and everyone else,

Thank you for looking into my request for help. I have attached an example of the HTML file I want to parse using HTMLParser.

Regards,
Sridhar Venkataraman
Summer Analyst, Global Technology (Asia-Pacific)
Barclays Capital Services Ltd
60B Orchard Road #10-00, TheAtrium@Orchard, Singapore - 238891
+ (65) 6828 4609 (O)
+ (65) 9871 0076 (m) | sri...@ba...

-----Original Message-----
From: htm...@li... [mailto:htm...@li...] On Behalf Of htm...@li...
Sent: 22 May 2008 21:14
To: htm...@li...
Subject: Htmlparser-user Digest, Vol 23, Issue 3

Today's Topics:

   1. Help with a link extraction program (Sri...@ba...)
   2. Replacing attributes of DOCTYPE tag (?? ??)
   3. Re: Help with a link extraction program (abdullah)
   4. How to extract table without a nested table in it (answers solutions)
   5. Re: How to extract table without a nested table in it (Derrick Oswald)

----------------------------------------------------------------------

Message: 1
Date: Tue, 20 May 2008 15:13:39 +0800
From: <Sri...@ba...>
Subject: [Htmlparser-user] Help with a link extraction program

Hi everyone,

I am a new user of the HTMLParser API. I have found the link extraction features to be very useful even in this short space of time.

I would like to ask for help with a program that I have to write. It involves link extraction, but the logic is slightly more convoluted.

Currently, I know how to use the LinkExtractor to take an HTML document as input and output the links in that document to either the command prompt or a text file (with suitable modifications where required, of course). I have an HTML document in which there is a hierarchy of links in the form of lists. I would like the link information output by LinkExtractor to reflect this hierarchy in some way.

For example, I have a list of items in a <ul> tag. Each of these items may or may not contain its own sub-items with their own links, so the HTML looks something like:

<ul>
  <li> <a href="...."> Item 1 </a>
    <ul>
      <li> <a href="...."> Sub-Item 1 </a> </li>
      <li> <a href="...."> Sub-Item 2 </a> </li>
    </ul>
  <li> Item 2 </li>
</ul>

I would like to know how I can parse a document full of lists like these and extract the links while keeping some indication of the hierarchy. This could take one of two forms:
- the "tree path" of the link. For example, if I extract the link underlying Sub-Item 1, my text file should contain something along the lines of "Item 1 > Sub-Item 1" before printing the actual link path;
- a page identical to the one I am parsing, but with the full path of the link printed beside each of those list items.

Thanks for all your help in this regard.

Warm Regards,
Sridhar Venkataraman

_______________________________________________

This e-mail may contain information that is confidential, privileged or otherwise protected from disclosure. If you are not an intended recipient of this e-mail, do not duplicate or redistribute it by any means. Please delete it and any attachments and notify the sender that you have received it in error. Unless specifically indicated, this e-mail is not an offer to buy or sell or a solicitation to buy or sell any securities, investment products or other financial product or service, an official confirmation of any transaction, or an official statement of Barclays. Any views or opinions presented are solely those of the author and do not necessarily represent those of Barclays. This e-mail is subject to terms available at the following link: www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the foregoing. Barclays Capital is the investment banking division of Barclays Bank PLC, a company registered in England (number 1026167) with its registered office at 1 Churchill Place, London, E14 5HP. This email may relate to or be sent from other members of the Barclays Group.
_______________________________________________

------------------------------

Message: 2
Date: Tue, 20 May 2008 17:34:15 +0900
From: ?? ?? <nag...@by...>
Subject: [Htmlparser-user] Replacing attributes of DOCTYPE tag

Dear All,

I am new to HTML Parser, and I don't understand how to handle the !DOCTYPE tag. In short, I'd like to replace a tag like this:

<!DOCTYPE html PUBLIC "XXXX" "AAAA">

with:

<!DOCTYPE html PUBLIC "YYYY" "BBBB">

I went through a lot of trial and error, but it didn't work. I'd appreciate any advice.

(My e-mail address has changed.)

------------------------------

Message: 3
Date: Tue, 20 May 2008 15:37:18 +0300
From: abdullah <abd...@id...>
Subject: Re: [Htmlparser-user] Help with a link extraction program

You don't need a LinkExtractor, you need a list extractor. If all the links are inside lists, you should get each list and navigate to its children, which contain the links.

For this case I suggest you parse the page with a filter as follows:

    Parser parser = new Parser(url); // url: the address or file name of the page to parse
    NodeList lists = parser.parse(new NodeClassFilter(BulletList.class));
    for (int i = 0; i < lists.size(); i++) {
        BulletList list = (BulletList) lists.elementAt(i);
        NodeList children = list.getChildren(); // the <li> items (and text) directly inside this list
        // Do whatever you want with the links. They sit inside the <li> items, and every
        // child is a plain Node, so check its type (e.g. instanceof LinkTag) before casting.
    }

I didn't test this code, but hopefully it will work. If you give me a specific example of the HTML page you want to parse, I may be able to help more.

Good luck :)
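The "tree path" that Sridhar asks for can be built by tracking which list items are still open while walking the markup. The sketch below shows that bookkeeping in plain JDK code. It is not HtmlParser's API; with HtmlParser you would walk the BulletList and Bullet children instead. All names are hypothetical, and the code assumes simple markup with quoted href values. It tolerates an omitted </li>, as in Sridhar's example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * JDK-only sketch, not HtmlParser's API: list each link in nested <ul>/<ol>
 * lists with its "tree path", e.g. "Item 1 > Sub-Item 1 : sub1.html".
 */
public class ListLinkPaths {

    // Either a tag (1 = "/" for end tags, 2 = name, 3 = attributes) or a run of text (4).
    private static final Pattern TOKEN =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)");
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** An open <li>: the list depth it belongs to and its label (first text inside it). */
    private static final class Item {
        final int depth;
        String label;
        Item(int depth) { this.depth = depth; }
    }

    public static List<String> linkPaths(String html) {
        List<String> out = new ArrayList<>();
        Deque<Item> open = new ArrayDeque<>(); // innermost open item at the head
        int depth = 0;                         // number of currently open lists
        String href = null;                    // href of the open <a>, if any
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(4) != null) {          // text: the first non-blank text names the item
                String text = m.group(4).trim();
                if (!text.isEmpty() && !open.isEmpty() && open.peek().label == null) {
                    open.peek().label = text;
                }
                continue;
            }
            boolean end = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            if (name.equals("ul") || name.equals("ol")) {
                if (end) {
                    closeItems(open, depth);
                    depth--;
                } else {
                    depth++;
                }
            } else if (name.equals("li")) {
                closeItems(open, depth);       // also closes a sibling whose </li> was omitted
                if (!end) {
                    open.push(new Item(depth));
                }
            } else if (name.equals("a")) {
                if (!end) {
                    Matcher h = HREF.matcher(m.group(3));
                    href = h.find() ? h.group(1) : null;
                } else if (href != null) {
                    out.add(path(open) + " : " + href);
                    href = null;
                }
            }
        }
        return out;
    }

    // Pops the open items that belong to the list at this depth or a deeper one.
    private static void closeItems(Deque<Item> open, int depth) {
        while (!open.isEmpty() && open.peek().depth >= depth) {
            open.pop();
        }
    }

    // Joins item labels from the outermost to the innermost open item.
    private static String path(Deque<Item> open) {
        List<String> labels = new ArrayList<>();
        for (Iterator<Item> it = open.descendingIterator(); it.hasNext(); ) {
            labels.add(String.valueOf(it.next().label));
        }
        return String.join(" > ", labels);
    }

    public static void main(String[] args) {
        String html = "<ul>\n"
            + "  <li> <a href=\"item1.html\"> Item 1 </a>\n"
            + "    <ul>\n"
            + "      <li> <a href=\"sub1.html\"> Sub-Item 1 </a> </li>\n"
            + "      <li> <a href=\"sub2.html\"> Sub-Item 2 </a> </li>\n"
            + "    </ul>\n"
            + "  <li> Item 2 </li>\n"
            + "</ul>\n";
        linkPaths(html).forEach(System.out::println);
    }
}
```

With the example list (the "...." hrefs replaced by made-up file names), main prints:

Item 1 : item1.html
Item 1 > Sub-Item 1 : sub1.html
Item 1 > Sub-Item 2 : sub2.html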
------------------------------

Message: 4
Date: Thu, 22 May 2008 18:06:00 +0530
From: "answers solutions" <fas...@gm...>
Subject: [Htmlparser-user] How to extract table without a nested table in it

Hi, I am trying to extract a table that does not have a nested table inside it:

    NodeFilter filterTable = new AndFilter(new HasParentFilter(new TagNameFilter("table")), new NotFilter(new HasChildFilter(new TagNameFilter("table"))));

Still, the output shows a table with a nested table in it.

------------------------------

Message: 5
Date: Thu, 22 May 2008 06:14:19 -0700 (PDT)
From: Derrick Oswald <der...@ro...>
Subject: Re: [Htmlparser-user] How to extract table without a nested table in it

You probably catch these because the inner tables are not direct children of the outer table. You need the HasChildFilter (NodeFilter filter, boolean recursive) constructor with recursive set to true.

------------------------------

End of Htmlparser-user Digest, Vol 23, Issue 3
**********************************************
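Here is why recursive=true matters. A nested table sits inside a <tr> and a <td>, so it is never a direct child of the outer table, and a non-recursive child test never sees it. The JDK-only sketch below is not HtmlParser's API; its names are hypothetical, and it assumes every <table> has its </table>. It keeps the open tables on a stack, marks the enclosing table whenever another table starts inside it, and returns only the unmarked tables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * JDK-only sketch, not HtmlParser's API: return the tables that contain no other
 * table at any depth (the idea behind a recursive "has child table" test).
 */
public class LeafTables {

    // A <table ...> start tag (group 1 empty) or a </table> end tag (group 1 = "/").
    private static final Pattern TABLE_TAG =
        Pattern.compile("<(/?)table\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** An open table: where its start tag begins and whether a table opened inside it. */
    private static final class OpenTable {
        final int start;
        boolean containsTable;
        OpenTable(int start) { this.start = start; }
    }

    public static List<String> tablesWithoutNestedTables(String html) {
        List<String> result = new ArrayList<>();
        Deque<OpenTable> open = new ArrayDeque<>();
        Matcher m = TABLE_TAG.matcher(html);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                // Mark the nearest enclosing table, however deep inside its rows and
                // cells this one starts. Tables further out were already marked when
                // their own inner table opened.
                if (!open.isEmpty()) {
                    open.peek().containsTable = true;
                }
                open.push(new OpenTable(m.start()));
            } else if (!open.isEmpty()) {
                OpenTable t = open.pop();
                if (!t.containsTable) {
                    result.add(html.substring(t.start, m.end()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table id=\"outer\"><tr><td>"
            + "<table id=\"inner\"><tr><td>x</td></tr></table>"
            + "</td></tr></table>"
            + "<table id=\"flat\"><tr><td>y</td></tr></table>";
        tablesWithoutNestedTables(html).forEach(System.out::println);
    }
}
```

For the sample in main, this prints the inner table and the separate flat table, but not the outer one.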
|
From: Derrick O. <der...@ro...> - 2008-05-23 21:28:00
|
The HtmlParser doesn't come with a Font tag that is composite. If you want this behaviour you need to define your own tag as described here: http://htmlparser.sourceforge.net/faq.html#composite
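A composite tag is what makes the parser return the start tag together with everything up to its balancing end tag. The inner <font class=leftnavi color=black>...</font> must therefore not end the match early. The sketch below illustrates only that matching, in plain JDK code. It is not the custom-tag mechanism from the FAQ, and the names are hypothetical. It assumes the element is closed and that no attribute value contains '>'. The same idea should apply to the div question above, e.g. outerHtml(html, "div", "class", (String) list.get(j)).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * JDK-only sketch, not HtmlParser's API: return a whole element (start tag,
 * children and matching end tag) for the first start tag with the given name
 * and attribute value, counting nested tags of the same name along the way.
 */
public class WholeElement {

    public static String outerHtml(String html, String tagName, String attrName, String attrValue) {
        // Start or end tag of this name; group 1 is "/" for end tags, group 2 the attributes.
        Pattern tag = Pattern.compile(
            "<(/?)" + Pattern.quote(tagName) + "\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
        // attrName=attrValue, quoted or not, e.g. class=leftnavi or class="leftnavi".
        Pattern attr = Pattern.compile(
            "\\b" + Pattern.quote(attrName) + "\\s*=\\s*[\"']?" + Pattern.quote(attrValue)
                + "(?=[\"'\\s]|$)", Pattern.CASE_INSENSITIVE);
        Matcher m = tag.matcher(html);
        int start = -1;   // where the wanted start tag begins
        int depth = 0;    // tags of this name still open since then, itself included
        while (m.find()) {
            boolean isEnd = !m.group(1).isEmpty();
            if (start < 0) {
                if (!isEnd && attr.matcher(m.group(2)).find()) {
                    start = m.start();
                    depth = 1;
                }
            } else if (!isEnd) {
                depth++;                               // e.g. the inner <font ... color=black>
            } else if (--depth == 0) {
                return html.substring(start, m.end()); // the balancing </FONT>
            }
        }
        return null;  // not found, or never closed
    }

    public static void main(String[] args) {
        String html = "<td><FONT class=leftnavi size=2>\n"
            + "<A href=\"02hdline.htm\">National</A><BR>\n"
            + "<font class=leftnavi color=black>States:</font><br>\n"
            + "<A href=\"26hdline.htm\">Engagements</A><BR>\n"
            + "</FONT></td>";
        // Prints the outer FONT element, with the inner font and the links inside it.
        System.out.println(outerHtml(html, "font", "class", "leftnavi"));
    }
}
```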
|
From: abdullah <abd...@id...> - 2008-05-23 13:47:45
|
What I've understood is that you want the child tags of the FONT tag, and you've already been able to get the FONT tag itself.
So just call the getChildren() function on the Node you've extracted, e.g.:
NodeList children = fontTag.getChildren();
|
|
From: answers s. <fas...@gm...> - 2008-05-23 09:00:02
|
hi
I am filtering like this:
NodeFilter filterClass = new AndFilter(new TagNameFilter("font"), new HasAttributeFilter("class", "leftnavi"));
I am using this filter against this text:
<FONT class=leftnavi size=2>
<a href="http://epaper.thehindu.com">ePaper</a><br>
<A href="01hdline.htm">Front Page</A><BR>
<A href="02hdline.htm">National</A><BR>
<font class=leftnavi color=black>States:</font><br>
• <A href="23hdline.htm">Tamil Nadu</A><BR>
• <A href="21hdline.htm">Andhra Pradesh</A><BR>
• <A href="22hdline.htm">Karnataka</A><BR>
• <A href="25hdline.htm">Kerala</A><BR>
• <A href="24hdline.htm">New Delhi</A><BR>
• <A href="14hdline.htm">Other States</A><BR>
<A href="03hdline.htm">International</A><BR>
<A href="05hdline.htm">Opinion</A><BR>
<A href="06hdline.htm">Business</A><BR>
<A href="07hdline.htm">Sport</A><BR>
<A href="10hdline.htm">Miscellaneous</A><BR>
• <A href="10hdline.htm#019">Cartoons</A><BR>
<A href="26hdline.htm">Engagements</A><BR>
</FONT>
but all I am getting is <FONT class=leftnavi size=2>
but I want the output to be the whole font tag:
<FONT class=leftnavi size=2>
<a href="http://epaper.thehindu.com">ePaper</a><br>
<A href="01hdline.htm">Front Page</A><BR>
<A href="02hdline.htm">National</A><BR>
<font class=leftnavi color=black>States:</font><br>
• <A href="23hdline.htm">Tamil Nadu</A><BR>
• <A href="21hdline.htm">Andhra Pradesh</A><BR>
• <A href="22hdline.htm">Karnataka</A><BR>
• <A href="25hdline.htm">Kerala</A><BR>
• <A href="24hdline.htm">New Delhi</A><BR>
• <A href="14hdline.htm">Other States</A><BR>
<A href="03hdline.htm">International</A><BR>
<A href="05hdline.htm">Opinion</A><BR>
<A href="06hdline.htm">Business</A><BR>
<A href="07hdline.htm">Sport</A><BR>
<A href="10hdline.htm">Miscellaneous</A><BR>
• <A href="10hdline.htm#019">Cartoons</A><BR>
<A href="26hdline.htm">Engagements</A><BR>
</FONT>
|