Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables
From: Derrick O. <der...@ro...> - 2008-06-04 23:08:40
Create a node list:

    NodeList results = new NodeList ();

Then in your loop over each result, add the nodes to the list instead of printing them out:

    for (int i=0; i<len; i+=1) {
        TagNode tag = (TagNode)a1.elementAt(i);
        results.add (tag);
    }

Once you have collected the nodes for every table, using whatever currenttabledatafilter values you have, they will all be in your results NodeList and you can iterate over them with the same type of loop that you already have:

    int len = results.size();
    for (int i=0; i<len; i+=1) {
        TagNode tag = (TagNode)results.elementAt(i);
        // do what you want
    }

----- Original Message ----
From: Henry Tran <htr...@ya...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, June 4, 2008 5:40:29 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables

Hi Derrick,

Can you explain a little more, perhaps with a few lines of example, if it is not too much of an effort? I thought I already had a NodeList, a1, but the challenge is to distinguish which <TD> comes from which table. I am very new to using htmlparser and would appreciate a little guidance.

Thanks very much again,
Henry

----- Original Message ----
From: Derrick Oswald <der...@ro...>
To: htmlparser user list <htm...@li...>
Sent: Wednesday, 4 June, 2008 10:56:07 PM
Subject: Re: [Htmlparser-user] How to save <TD> value to unique variables from html tables

You should just add the tags you want to a NodeList of your own. Then later on process all the nodes in the list... filing them to a database for instance.

----- Original Message ----
From: Henry Tran <htr...@ya...>
To: Htm...@li...
Cc: htm...@li...
Sent: Wednesday, June 4, 2008 8:43:09 AM
Subject: [Htmlparser-user] How to save <TD> value to unique variables from html tables

Hi All,

I have been successful in extracting almost all the table data using the following htmlparser statements in Java:

    try {
        Parser parser = new Parser ("http://www.abc.com/...");
        NodeList nl = parser.parse(null);
        NodeFilter currenttabledatafilter =
            new AndFilter (
                new TagNameFilter ("td"),
                new OrFilter (
                    new HasAttributeFilter("class","even"),
                    new OrFilter (
                        new HasAttributeFilter("class", "odd"),
                        new AndFilter (
                            new HasAttributeFilter("colspan","6"),
                            new HasChildFilter(new TagNameFilter ("Strong"))))));
        NodeList a1 = nl.extractAllNodesThatMatch(currenttabledatafilter,true);
        int len = a1.size();
        for (int i=0; i<len; i+=1) {
            TagNode tag = (TagNode)a1.elementAt(i);
            System.out.println(tag.toPlainTextString());
            // System.out.println(tag.toHtml());
        }
    } catch(Exception pe) {
        pe.printStackTrace();
    }

This is great for retrieving all the table data. However, I would like to save the value of each <td> to a unique variable so that the values can be used in the program and ultimately saved to a database. So I am looking to structure a program that assigns each value to a unique variable (or inserts it into the database, which I can do once the values are available) for however many HTML tables there are on a web page. Each table has some distinct attributes, but the number of <td> cells in each one varies. In other words, I am looking for something similar to a loop through a text file, as follows:

    While not end of file
        (i)   identify a new table based on its unique attributes.
        (ii)  assign the value/content of each <td> in the current table to a unique variable, for instance.
        (iii) repeat steps (i) and (ii) for the remaining tables.

Thanks a lot,
Henry
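As a rough illustration of the "collect now, process later" idea in this thread, the following Java sketch groups cell text by table in a map, so no separate variable per <td> is needed. It deliberately does not use the HTMLParser API: the Cell class, the table ids and the groupByTable helper are all hypothetical stand-ins, and it assumes you have already worked out which table each cell came from (for example by running one filter per table).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TableCellCollector {

    /** Hypothetical stand-in for a parsed <td>: the table it belongs to and its plain text. */
    static final class Cell {
        final String tableId;
        final String text;

        Cell(String tableId, String text) {
            this.tableId = tableId;
            this.text = text;
        }
    }

    /**
     * Groups cell text by table id. A LinkedHashMap keeps the tables in the
     * order they were first seen, and each list keeps its cells in order.
     */
    static Map<String, List<String>> groupByTable(List<Cell> cells) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        for (Cell cell : cells) {
            byTable.computeIfAbsent(cell.tableId, k -> new ArrayList<>()).add(cell.text);
        }
        return byTable;
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice these would come from the parsed page.
        List<Cell> cells = Arrays.asList(
            new Cell("prices", "Widget"),
            new Cell("prices", "9.99"),
            new Cell("stock", "Widget"),
            new Cell("stock", "12"));

        // Later processing step: here just printing, but this is where a
        // database insert per table or per cell could go.
        for (Map.Entry<String, List<String>> entry : groupByTable(cells).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each value is then addressable as "table id plus position in the list", which plays the role of the unique variable asked for above and works however many cells a table has.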