Thread: Re: [Htmlparser-user] how to extract content from the html tag
From: Derrick O. <der...@ro...> - 2008-05-29 00:53:12
You should be able to construct a filter using the FilterBuilder application to look for the "Job ID" in the adjacent TD. It will be something like:

new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true)))

----- Original Message ----
From: neethu joseph <nee...@gm...>
To: htm...@li...
Sent: Wednesday, May 28, 2008 1:06:00 PM
Subject: [Htmlparser-user] how to extract content from the html tag

Hi,

I'm new to HtmlParser. Could you please help me extract the Job ID from the table? I was trying to locate it as the 3rd element of the table, but the page is modified day by day, so I need to work out an alternative way to find the Job ID.

</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">City</td>
<td class="FormContentFieldValue">St. Louis</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">State/Province</td>
<td class="FormContentFieldValue">Missouri [MO]</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job Title</td>
<td class="FormContentFieldValue">Director, Graduate Studies in IS Management</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job ID</td>
<td class="FormContentFieldValue">524</td>
</tr>
<tr class="FormContent">
<td class="FormContentFieldLabel">Job Type</td>
<td class="FormContentFieldValue">Director</td>
</tr>

regards
NAT
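The idea in the reply above is to key on the label cell ("Job ID") rather than on the row's position, so the lookup survives rows being added or reordered. A minimal sketch of that idea follows. It does not use HtmlParser: it swaps in the JDK's built-in XML parser and XPath, and it assumes the table rows are well-formed XML, which real-world HTML often is not. The class name LabelLookup and the method valueFor are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: finds a value cell by the text of the label cell before it.
final class LabelLookup {

    /**
     * Returns the trimmed text of the td that directly follows the td whose
     * text equals the given label, or null if no such cell exists (an empty
     * value cell is also reported as null).
     * Assumption: rows is a well-formed XML fragment of tr/td elements.
     */
    static String valueFor(String rows, String label) {
        try {
            // Wrap the fragment in a single root element so it parses as one document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<table>" + rows + "</table>")));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Every XPath variable resolves to the label; this avoids quoting issues.
            xpath.setXPathVariableResolver(name -> label);

            // Select the td whose nearest preceding sibling td holds the label text.
            String text = xpath.evaluate(
                    "//td[preceding-sibling::td[1][normalize-space(.) = $label]]", doc).trim();
            return text.isEmpty() ? null : text;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse or query the rows", e);
        }
    }

    public static void main(String[] args) {
        String rows =
                "<tr class=\"FormContent\">"
              + "<td class=\"FormContentFieldLabel\">Job Title</td>"
              + "<td class=\"FormContentFieldValue\">Director</td></tr>"
              + "<tr class=\"FormContent\">"
              + "<td class=\"FormContentFieldLabel\">Job ID</td>"
              + "<td class=\"FormContentFieldValue\">524</td></tr>";
        System.out.println(valueFor(rows, "Job ID")); // prints 524
    }
}
```

Because the lookup depends only on the label text, inserting or removing other rows does not change the result. For pages that are not well-formed XML, a tolerant HTML parser such as HtmlParser is the better fit.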
From: Derrick O. <der...@ro...> - 2008-05-30 01:43:45
The results of applying

new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new StringFilter ("Job ID", true)))

would give you the <td class="FormContentFieldValue">524</td> tag, so you could ask for toPlainText() and convert the resulting string into an integer value if you want.

----- Original Message ----
From: neethu joseph <nee...@gm...>
To: htmlparser user list <htm...@li...>
Sent: Thursday, May 29, 2008 1:07:26 AM
Subject: Re: [Htmlparser-user] how to extract content from the html tag

Thanks for your reply. Could you please explain a little more on this one? Ultimately I'm interested in the field value of the Job ID, i.e. 524.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
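The last step mentioned above, turning the cell's plain text into an integer, can be sketched with the standard library alone. This is a minimal sketch rather than part of HtmlParser: the class name JobIdText and the method parseJobId are hypothetical, and it assumes the cell text may carry surrounding whitespace or non-breaking spaces and may not be numeric at all.

```java
import java.util.OptionalInt;

// Hypothetical helper: converts the plain text of a value cell into an int.
final class JobIdText {

    /**
     * Parses text such as " 524 " into 524.
     * Returns an empty OptionalInt for null, blank, or non-numeric text
     * instead of throwing, so callers can handle a missing or odd cell.
     */
    static OptionalInt parseJobId(String cellText) {
        if (cellText == null) {
            return OptionalInt.empty();
        }
        // Treat non-breaking spaces like ordinary whitespace, then trim.
        String cleaned = cellText.replace('\u00A0', ' ').trim();
        try {
            return OptionalInt.of(Integer.parseInt(cleaned));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseJobId(" 524 ").getAsInt()); // prints 524
        System.out.println(parseJobId("n/a").isPresent());  // prints false
    }
}
```

Returning an OptionalInt rather than throwing keeps the caller's code simple on a page whose layout changes often, where the cell may be absent or hold unexpected text.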
From: neethu j. <nee...@gm...> - 2008-06-03 20:38:46
Thanks Derrick! I tried using the AndFilter but had no luck: it gives me a null pointer exception. Here is the page that I'm trying to read:

http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={ce1c851e-f6ee-4194-ad6d-c020f94be177}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1
From: neethu j. <nee...@gm...> - 2008-05-29 05:07:29
Thanks for your reply. Could you please explain a little more on this one? Ultimately I'm interested in the field value of the Job ID, i.e. 524.