Re: [Htmlparser-user] how to extract content from the html tag
From: neethu j. <nee...@gm...> - 2008-06-04 01:25:15
Parser joburlparser = new Parser("http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={5e86df59-eb37-4e01-864a-e7662b31e44b}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1");
// Keep only the elements whose class attribute is "FormContentFieldValue".
NodeList jobidList = joburlparser.parse(new HasAttributeFilter("class", "FormContentFieldValue"));
// extractAllNodesThatMatch returns a new list, so its result has to be assigned.
jobidList = jobidList.extractAllNodesThatMatch(new TagNameFilter("TD"));
System.out.println(jobidList.toHtml());
// Fragile: this depends on the position of the Job ID cell in the list.
NodeList jobid_child = jobidList.elementAt(3).getChildren();
System.out.println(jobid_child.toHtml());
This gives me the job ID, but I do not want to use elementAt(3).
On Tue, Jun 3, 2008 at 5:17 PM, Derrick Oswald <der...@ro...>
wrote:
> I used the FilterBuilder application to quickly generate the filter you
> need:
>
> import org.htmlparser.*;
> import org.htmlparser.filters.*;
> import org.htmlparser.beans.*;
> import org.htmlparser.util.*;
>
> public class JobId
> {
>     public static void main (String args[])
>     {
>         HasAttributeFilter filter0 = new HasAttributeFilter ();
>         filter0.setAttributeName ("class");
>         filter0.setAttributeValue ("FormContentFieldValue");
>         StringFilter filter1 = new StringFilter ();
>         filter1.setCaseSensitive (true);
>         filter1.setLocale (new java.util.Locale ("en", "US", ""));
>         filter1.setPattern ("Job ID");
>         HasChildFilter filter2 = new HasChildFilter ();
>         filter2.setRecursive (false);
>         filter2.setChildFilter (filter1);
>         HasSiblingFilter filter3 = new HasSiblingFilter ();
>         filter3.setSiblingFilter (filter2);
>         NodeFilter[] array0 = new NodeFilter[2];
>         array0[0] = filter0;
>         array0[1] = filter3;
>         AndFilter filter4 = new AndFilter ();
>         filter4.setPredicates (array0);
>         NodeFilter[] array1 = new NodeFilter[1];
>         array1[0] = filter4;
>         FilterBean bean = new FilterBean ();
>         bean.setFilters (array1);
>         if (0 != args.length)
>         {
>             bean.setURL (args[0]);
>             System.out.println (bean.getNodes ().toHtml ());
>         }
>         else
>             System.out.println ("Usage: java -classpath .;htmlparser.jar;htmllexer.jar JobId <url>");
>     }
> }
>
>
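To make the logic of the generated filter easier to follow, here is a minimal plain-Java sketch of the same condition: keep a cell when its class is FormContentFieldValue and another cell in the same row contains the text "Job ID". It deliberately does not use HtmlParser. The Cell class and the valueForLabel and sampleRows helpers are hypothetical stand-ins for parsed nodes, and the rows are hard-coded from the sample table posted in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class SiblingLabelSketch {
    // Hypothetical stand-in for a parsed <td>: its class attribute and its text.
    public static final class Cell {
        final String cssClass;
        final String text;

        Cell(String cssClass, String text) {
            this.cssClass = cssClass;
            this.text = text;
        }
    }

    // Returns the text of the first cell that has the wanted class AND shares
    // its row with a different cell whose text contains the label.
    // Returns null when no row satisfies both conditions.
    public static String valueForLabel(List<List<Cell>> rows, String wantedClass, String label) {
        for (List<Cell> row : rows) {
            for (Cell cell : row) {
                if (!wantedClass.equals(cell.cssClass)) {
                    continue; // class condition failed
                }
                for (Cell sibling : row) {
                    if (sibling != cell && sibling.text.contains(label)) {
                        return cell.text; // sibling condition satisfied
                    }
                }
            }
        }
        return null;
    }

    // Three of the rows from the sample table in this thread.
    public static List<List<Cell>> sampleRows() {
        return Arrays.asList(
            Arrays.asList(new Cell("FormContentFieldLabel", "City"),
                          new Cell("FormContentFieldValue", "St. Louis")),
            Arrays.asList(new Cell("FormContentFieldLabel", "Job ID"),
                          new Cell("FormContentFieldValue", "524")),
            Arrays.asList(new Cell("FormContentFieldLabel", "Job Type"),
                          new Cell("FormContentFieldValue", "Director")));
    }

    public static void main(String[] args) {
        // Looks the value up by its label, not by its position in the table.
        System.out.println(valueForLabel(sampleRows(), "FormContentFieldValue", "Job ID"));
    }
}
```

Because the lookup is keyed on the label text and not on a row index, reordering or adding rows should not change the result, which is the property wanted in place of elementAt(3).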
> ----- Original Message ----
> From: neethu joseph <nee...@gm...>
> To: htmlparser user list <htm...@li...>
> Sent: Tuesday, June 3, 2008 4:38:43 PM
> Subject: Re: [Htmlparser-user] how to extract content from the html tag
>
> Thanks, Derrick! I tried using the AndFilter but had no luck: it gives me a
> null pointer exception.
> Here is the page that I'm trying to read:
> http://careers2.hiredesk.net/viewjobs/jobdetail.asp?Comp=oci&PROJ_ID={ce1c851e-f6ee-4194-ad6d-c020f94be177}&sCOMP_ID={D3C729A8-A506-438B-8840-C1615DD4E822}&sPers_ID=&tp_id=1
>
> On Thu, May 29, 2008 at 8:43 PM, Derrick Oswald <der...@ro...>
> wrote:
>
>> The result of applying new AndFilter (new TagNameFilter ("TD"), new
>> HasSiblingFilter (new StringFilter ("Job ID", true))) would give you the
>> <td class="FormContentFieldValue">524</td> tag, so you could ask for
>> toPlainTextString() and convert the resulting string into an integer value
>> if you want.
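The last step mentioned above, turning the text of the matched cell into a number, might look like the sketch below. It uses only the standard library; the parseJobId helper is a hypothetical name, and the input string stands in for whatever plain text the matched cell yields.

```java
public class JobIdText {
    // Hypothetical helper: converts the plain text of the matched cell to an int.
    // trim() strips the surrounding whitespace and line breaks that table cells
    // often carry; non-numeric text makes Integer.parseInt throw
    // NumberFormatException.
    public static int parseJobId(String cellText) {
        return Integer.parseInt(cellText.trim());
    }

    public static void main(String[] args) {
        System.out.println(parseJobId("  524\n")); // prints 524
    }
}
```

If the cell can be empty or contain non-digits on some pages, catching NumberFormatException around the call is probably worthwhile.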
>>
>> ----- Original Message ----
>> From: neethu joseph <nee...@gm...>
>> To: htmlparser user list <htm...@li...>
>> Sent: Thursday, May 29, 2008 1:07:26 AM
>> Subject: Re: [Htmlparser-user] how to extract content from the html tag
>>
>> Thanks for your reply. Could you please explain a little more on this
>> one? Ultimately, I'm interested in the field value of the job ID, i.e. 524.
>>
>> On Wed, May 28, 2008 at 7:53 PM, Derrick Oswald <der...@ro...>
>> wrote:
>>
>>>
>>> You should be able to construct a filter using the FilterBuilder
>>> application to look for the "Job ID" in the adjacent TD.
>>> It will be something like:
>>> new AndFilter (new TagNameFilter ("TD"), new HasSiblingFilter (new
>>> StringFilter ("Job ID", true)))
>>>
>>>
>>> ----- Original Message ----
>>> From: neethu joseph <nee...@gm...>
>>> To: htm...@li...
>>> Sent: Wednesday, May 28, 2008 1:06:00 PM
>>> Subject: [Htmlparser-user] how to extract content from the html tag
>>>
>>> Hi, I'm new to HtmlParser. Could you please help me extract the *Job ID*
>>> from the table? I was trying to locate it as the 3rd element of the
>>> table, but the page is modified from day to day, so I need to work out an
>>> alternative way to find the job ID.
>>>
>>>
>>> </tr>
>>> <tr class="FormContent">
>>> <td class="FormContentFieldLabel">City</td>
>>> <td class="FormContentFieldValue">St. Louis</td>
>>> </tr>
>>> <tr class="FormContent">
>>> <td class="FormContentFieldLabel">State/Province</td>
>>> <td class="FormContentFieldValue">Missouri [MO]</td>
>>> </tr>
>>> <tr class="FormContent">
>>> <td class="FormContentFieldLabel">Job Title</td>
>>> <td class="FormContentFieldValue">Director, Graduate Studies in IS Management</td>
>>> </tr>
>>> <tr class="FormContent">
>>> <td class="FormContentFieldLabel">Job ID</td>
>>> <td class="FormContentFieldValue">524</td>
>>> </tr>
>>> <tr class="FormContent">
>>> <td class="FormContentFieldLabel">Job Type</td>
>>> <td class="FormContentFieldValue">Director</td>
>>> </tr>
>>>
>>>
>>> regards
>>>
>>> NAT
>>>
>>>
>>>
>>> -------------------------------------------------------------------------
>>> This SF.net email is sponsored by: Microsoft
>>> Defy all challenges. Microsoft(R) Visual Studio 2008.
>>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>>> _______________________________________________
>>> Htmlparser-user mailing list
>>> Htm...@li...
>>> https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>>>
>>>