Re: [Htmlparser-user] how to extract content from the html tag
From: neethu j. <nee...@gm...> - 2008-06-04 01:25:15
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
Parser joburlparser = new Parser("http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={5e86df59-eb37-4e01-864a-e7662b31e44b}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1");
NodeList jobidList = joburlparser.parse(new HasAttributeFilter("class", "FormContentFieldValue"));
// extractAllNodesThatMatch returns a new list, so keep the result
jobidList = jobidList.extractAllNodesThatMatch(new TagNameFilter("TD"));
System.out.println(jobidList.toHtml());
// index 3 is the job ID cell only as long as the page layout stays the same
NodeList jobid_child = jobidList.elementAt(3).getChildren();
System.out.println(jobid_child.toHtml());

This gives me the job ID, but I do not want to use elementAt(3).

On Tue, Jun 3, 2008 at 5:17 PM, Derrick Oswald <der...@ro...> wrote:
> I used the FilterBuilder application to quickly generate the filter you
> need:
>
> import org.htmlparser.*;
> import org.htmlparser.filters.*;
> import org.htmlparser.beans.*;
> import org.htmlparser.util.*;
>
> public class JobId
> {
>     public static void main (String args[])
>     {
>         // value cells: <td class="FormContentFieldValue">
>         HasAttributeFilter filter0 = new HasAttributeFilter ();
>         filter0.setAttributeName ("class");
>         filter0.setAttributeValue ("FormContentFieldValue");
>         // text nodes containing "Job ID" (case-sensitive)
>         StringFilter filter1 = new StringFilter ();
>         filter1.setCaseSensitive (true);
>         filter1.setLocale (new java.util.Locale ("en", "US", ""));
>         filter1.setPattern ("Job ID");
>         // nodes whose immediate child is that text, i.e. the label <td>
>         HasChildFilter filter2 = new HasChildFilter ();
>         filter2.setRecursive (false);
>         filter2.setChildFilter (filter1);
>         // nodes that have a sibling matching filter2
>         HasSiblingFilter filter3 = new HasSiblingFilter ();
>         filter3.setSiblingFilter (filter2);
>         // both conditions must hold
>         NodeFilter[] array0 = new NodeFilter[2];
>         array0[0] = filter0;
>         array0[1] = filter3;
>         AndFilter filter4 = new AndFilter ();
>         filter4.setPredicates (array0);
>         NodeFilter[] array1 = new NodeFilter[1];
>         array1[0] = filter4;
>         FilterBean bean = new FilterBean ();
>         bean.setFilters (array1);
>         if (0 != args.length)
>         {
>             // fetch and filter the page given on the command line
>             bean.setURL (args[0]);
>             System.out.println (bean.getNodes ().toHtml ());
>         }
>         else
>             System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar JobId <url>");
>     }
> }
>
>
> ----- Original Message ----
> From: neethu joseph <nee...@gm...>
> To: htmlparser user list <htm...@li...>
> Sent: Tuesday, June 3, 2008 4:38:43 PM
> Subject: Re: [Htmlparser-user] how to extract content from the html tag
>
> Thanks Derrick!! I tried using the AndFilter but no luck!! It gives me a
> NullPointerException.
> Here is the page that I'm trying to read:
> http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={ce1c851e-f6ee-4194-ad6d-c020f94be177}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1
>
> On Thu, May 29, 2008 at 8:43 PM, Derrick Oswald <der...@ro...>
> wrote:
>
>> The results of applying new AndFilter (new TagNameFilter ("TD"), new
>> HasSiblingFilter (new StringFilter ("Job ID", true))) would give you the
>> <td class="FormContentFieldValue">524</td> tag, so you could ask for
>> toPlainTextString() and convert the resulting string into an integer value
>> if you want.
>>
>> ----- Original Message ----
>> From: neethu joseph <nee...@gm...>
>> To: htmlparser user list <htm...@li...>
>> Sent: Thursday, May 29, 2008 1:07:26 AM
>> Subject: Re: [Htmlparser-user] how to extract content from the html tag
>>
>> Thanks for your reply. Could you please explain this one a little more?
>> Ultimately I'm interested in the field value of the job ID, i.e. 524.
>>
>> On Wed, May 28, 2008 at 7:53 PM, Derrick Oswald <der...@ro...>
>> wrote:
>>
>>> You should be able to construct a filter using the FilterBuilder
>>> application to look for the "Job ID" in the adjacent TD.
>>> It will be something like:
>>> new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new
>>> StringFilter ("Job ID", true)))
>>>
>>>
>>> ----- Original Message ----
>>> From: neethu joseph <nee...@gm...>
>>> To: htm...@li...
>>> Sent: Wednesday, May 28, 2008 1:06:00 PM
>>> Subject: [Htmlparser-user] how to extract content from the html tag
>>>
>>> Hi, I'm new to HtmlParser. Could you please help me extract the *Job ID*
>>> from the table? I was trying to locate it as the 3rd element of the
>>> table, but the page is getting modified day by day, so I need to work out
>>> an alternative way to find the job ID.
>>>
>>> </tr>
>>> <tr class="FormContent">
>>>   <td class="FormContentFieldLabel">City</td>
>>>   <td class="FormContentFieldValue">St. Louis</td>
>>> </tr>
>>> <tr class="FormContent">
>>>   <td class="FormContentFieldLabel">State/Province</td>
>>>   <td class="FormContentFieldValue">Missouri [MO]</td>
>>> </tr>
>>> <tr class="FormContent">
>>>   <td class="FormContentFieldLabel">Job Title</td>
>>>   <td class="FormContentFieldValue">Director, Graduate Studies in IS Management</td>
>>> </tr>
>>> <tr class="FormContent">
>>>   <td class="FormContentFieldLabel">Job ID</td>
>>>   <td class="FormContentFieldValue">524</td>
>>> </tr>
>>> <tr class="FormContent">
>>>   <td class="FormContentFieldLabel">Job Type</td>
>>>   <td class="FormContentFieldValue">Director</td>
>>> </tr>
>>>
>>> regards
>>>
>>> NAT
>>>
>>> _______________________________________________
>>> Htmlparser-user mailing list
>>> Htm...@li...
>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
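The label-anchored lookup discussed in this thread (find the cell whose text is "Job ID", take the FormContentFieldValue cell beside it, then turn its plain text into an integer) can be sketched without HTMLParser at all. This is a swapped-in technique, not the library's API: it uses the JDK's built-in XML DOM parser, so it assumes the markup is well-formed XML. The SAMPLE string is a hand-cleaned copy of three rows from the posted table, and the names JobIdLookup and findValue are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class JobIdLookup {
    // Hypothetical sample: three rows from the table in the thread, wrapped in
    // <table> so that it is well-formed XML (a live page usually is not).
    public static final String SAMPLE =
          "<table>"
        + "<tr class=\"FormContent\"><td class=\"FormContentFieldLabel\">City</td>"
        + "<td class=\"FormContentFieldValue\">St. Louis</td></tr>"
        + "<tr class=\"FormContent\"><td class=\"FormContentFieldLabel\">Job ID</td>"
        + "<td class=\"FormContentFieldValue\">524</td></tr>"
        + "<tr class=\"FormContent\"><td class=\"FormContentFieldLabel\">Job Type</td>"
        + "<td class=\"FormContentFieldValue\">Director</td></tr>"
        + "</table>";

    /**
     * Returns the trimmed text of the first FormContentFieldValue cell that
     * follows the FormContentFieldLabel cell whose text equals label,
     * or null if no such label exists.
     */
    public static String findValue(String xhtml, String label) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // ParserConfigurationException, SAXException (markup not well-formed) or IOException
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        NodeList cells = doc.getElementsByTagName("td");
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            // Anchor on the label text, not on the cell's position in the page
            if (!"FormContentFieldLabel".equals(cell.getAttribute("class"))
                    || !label.equals(cell.getTextContent().trim())) {
                continue;
            }
            // Scan the label's following siblings for the value cell
            for (Node sib = cell.getNextSibling(); sib != null; sib = sib.getNextSibling()) {
                if (sib instanceof Element && "td".equals(sib.getNodeName())
                        && "FormContentFieldValue".equals(((Element) sib).getAttribute("class"))) {
                    return sib.getTextContent().trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = findValue(SAMPLE, "Job ID");
        if (text == null) {
            System.out.println("Job ID not found");
            return;
        }
        int jobId = Integer.parseInt(text);  // "524" -> 524
        System.out.println("Job ID: " + jobId);
    }
}
```

Because the lookup keys on the label text rather than on a position, it does not depend on elementAt(3) and keeps working if rows are added or reordered. A live page is unlikely to be well-formed XML, though, so for real pages the HTMLParser filter approach from the thread is the more realistic route.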