Thread: [Htmlparser-developer] Return all table rows
From: S A. <sah...@gm...> - 2010-04-13 03:55:40
Hi,

I'm sending my question here because the user mailing list seems to be filled with spam.

I have an HTML page, and the part of the page that I want to focus on looks like:

    <table>
    <tr><td class="prod">....
    </tr>
    <tr><td class="prod">....
    </tr>
    <tr><td class="prod">....
    </tr>
    <tr><td class="prod">....
    </tr>
    </table>

I want to extract all the <tr> elements.

I have used htmlParser.extractAllNodesThatMatch(...) in the past, but in this case it seems the only way to get a NodeList of all the <tr> groupings in this table is to use the value of the first <td> in each <tr>, which has a class of "prod".

(class="prod" is unique to the entire HTML page.)

Is this possible to do?
From: Derrick O. <der...@gm...> - 2010-04-13 05:38:45
You should be able to create a filter that finds all TR nodes that have TD child nodes with the class="prod" attribute.
See the FilterBuilder application.
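The composition described above (a TR test combined with a "has a TD child with class=prod" test) can be sketched as follows. This is only a stand-in model written against the JDK: the Node type, the NodeFilter interface and the helper methods tagName, hasAttribute, and, hasChild and prodRowFilter are hypothetical names invented for illustration, not the HTMLParser classes themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in model only: these types imitate the shape of the filters
// discussed in the thread; they are NOT the HTMLParser library classes.
class RowFilterSketch {

    /** Minimal element node: tag name, attributes, children. */
    static final class Node {
        final String tag;
        final Map<String, String> attrs = new HashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        Node attr(String name, String value) { attrs.put(name, value); return this; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** A filter accepts or rejects a single node. */
    interface NodeFilter { boolean accept(Node node); }

    static NodeFilter tagName(String name) {
        return n -> n.tag.equalsIgnoreCase(name);
    }

    static NodeFilter hasAttribute(String name, String value) {
        return n -> value.equals(n.attrs.get(name));
    }

    static NodeFilter and(NodeFilter a, NodeFilter b) {
        return n -> a.accept(n) && b.accept(n);
    }

    /** Accepts a node when at least one direct child passes the inner filter. */
    static NodeFilter hasChild(NodeFilter inner) {
        return n -> n.children.stream().anyMatch(inner::accept);
    }

    /** TR nodes that have a TD child carrying class="prod". */
    static NodeFilter prodRowFilter() {
        return and(tagName("tr"),
                   hasChild(and(tagName("td"), hasAttribute("class", "prod"))));
    }

    /** Walks the whole tree and collects every node the filter accepts. */
    static List<Node> extractAllNodesThatMatch(Node root, NodeFilter filter) {
        List<Node> out = new ArrayList<>();
        collect(root, filter, out);
        return out;
    }

    private static void collect(Node node, NodeFilter filter, List<Node> out) {
        if (filter.accept(node)) out.add(node);
        for (Node child : node.children) collect(child, filter, out);
    }

    /** Sample tree: two product rows and one unrelated row. */
    static Node sampleTable() {
        return new Node("table")
            .add(new Node("tr").add(new Node("td").attr("class", "prod")))
            .add(new Node("tr").add(new Node("td").attr("class", "other")))
            .add(new Node("tr").add(new Node("td").attr("class", "prod")));
    }

    public static void main(String[] args) {
        List<Node> rows = extractAllNodesThatMatch(sampleTable(), prodRowFilter());
        System.out.println(rows.size() + " matching rows");
    }
}
```

The point of the sketch is the nesting: the outer "and" pairs the row test with a child test, and the child test is itself an "and" of a tag test and an attribute test. The library's org.htmlparser.filters package has filters with corresponding roles (TagNameFilter, HasAttributeFilter, HasChildFilter, AndFilter); check their constructors in the Javadoc, or let FilterBuilder generate the code, rather than relying on this model.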
From: S A. <sah...@gm...> - 2010-04-14 10:23:30
Sorry, I'm kind of new to Java. The FilterBuilder seems to be a .jar?

I have this so far:

    nlRows = htmlParser.extractAllNodesThatMatch(
        new AndFilter(
            new HasChildFilter("class","prod")
        )
    );

How do I get all elements with <tr>?
From: Derrick O. <der...@gm...> - 2010-04-14 17:06:23
FilterBuilder is a program that helps you build filters.
See http://htmlparser.sourceforge.net/samples.html
You can even run it online.
After playing with the FilterBuilder (there is Help), you can save the filter as a small executable class.
Then include the generated code that creates the filter in your application.
From: S A. <sah...@gm...> - 2010-04-18 01:41:58
Is there a way to put in my own HTML so I can see whether the filter is applied correctly?
From: Derrick O. <der...@gm...> - 2010-04-18 05:23:36
Any NodeList can be output with toHtml().
If you've first caught the whole page in a NodeList before applying your changes, they should be reflected in the output from that list.