Thread: [Htmlparser-user] finding meta data
From: kavorka <the...@gm...> - 2006-07-19 11:44:23
Hi all,

I'm new to HTMLParser. I went through the sample programs to understand how to find the meta data of a page, but I couldn't get it to work. Do you have any code samples that find the meta data of a page using HTMLParser?

Thank you,
best regards
From: Derrick O. <Der...@Ro...> - 2006-07-24 03:25:32
Kavorka,

This should give you the meta tag, from which you can get the information you want:

    NodeList nodes = parser.parse (null);
    NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter ("META"));
    MetaTag meta = (MetaTag)metas.elementAt (0);
    System.out.println (meta);

Derrick
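For readers who do not have the HTMLParser jar on the classpath, the same idea (walk the page's tags, keep only META, read its name and content attributes) can be sketched with the JDK's built-in Swing HTML parser. This is a swapped-in technique, not the HTMLParser API used in the reply above, and the names MetaTagLister and metaTags are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: uses the JDK's Swing HTML parser as a stand-in,
// NOT the org.htmlparser classes discussed in this thread.
class MetaTagLister {

    /** Returns lower-cased name -> content for each META tag that has both attributes. */
    static Map<String, String> metaTags(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // META has no end tag, so it is reported as a "simple" tag.
                if (!HTML.Tag.META.equals(tag)) {
                    return;
                }
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name != null && content != null) {
                    result.put(name.toString().toLowerCase(Locale.ROOT), content.toString());
                }
            }
        };
        try {
            // 'true' tells the parser to ignore charset declarations in META tags.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"description\" content=\"A test page\">"
                + "</head><body><p>Hello</p></body></html>";
        metaTags(html).forEach((name, content) -> System.out.println(name + " = " + content));
    }
}
```

META tags that carry only http-equiv (no name) are skipped by this sketch; reading HTML.Attribute.HTTPEQUIV in the same callback would cover them.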
From: kavorka <the...@gm...> - 2006-07-25 08:49:52
Hi Oswald,

Thanks a lot for your help.

Murat

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
From: kavorka <the...@gm...> - 2006-07-28 20:53:56
Hi Oswald,

I have another question. In HTMLParser, is it possible to extract only the text of a web page? The stringextractor program also extracts the link text in the page; I want to extract "pure" text. Can I do that?

Thanks,
Murat
From: Derrick O. <Der...@Ro...> - 2006-07-29 11:14:28
Murat,

I'm not sure what you mean by 'pure' text.

The stringextractor program uses the StringBean under the hood. It only collects text which would be presented in a browser - or at least it's supposed to.

The stringextractor program has an option (-links) to output the links within angle brackets. Make sure this is not used.

If you want to remove text within <a></a> pairs you will need to override the default LinkTag to not do this and register it with the PrototypicalNodeFactory.

Derrick
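One way to collect page text with the link text left out can be sketched as follows. It uses the JDK's Swing HTML parser rather than StringBean or a custom LinkTag (a deliberate stand-in, since it needs no extra jar): the callback counts how deep it is inside <a> elements and keeps only text seen at depth zero. TextWithoutLinks and extract are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: JDK Swing parser used as a stand-in,
// NOT the StringBean / LinkTag classes from HTMLParser.
class TextWithoutLinks {

    /** Returns the text reported by the parser, skipping anything inside <a>...</a>. */
    static String extract(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int anchorDepth = 0; // > 0 while inside an <a> element

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (HTML.Tag.A.equals(tag)) {
                    anchorDepth++;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (HTML.Tag.A.equals(tag) && anchorDepth > 0) {
                    anchorDepth--;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (anchorDepth == 0) {
                    // Separate text chunks with a single space.
                    if (text.length() > 0) {
                        text.append(' ');
                    }
                    text.append(data);
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Read the <a href=\"doc.html\">manual</a> first.</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Exact whitespace around the removed link depends on how the parser chunks the text, so callers that care about spacing may want to normalize the result.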
From: kavorka <the...@gm...> - 2006-07-29 13:07:11
Hi Oswald,

Yes, I want to remove the text within <a></a>. I'll try to do what you said, but I'm a newbie Java coder and I didn't clearly understand it. I tried to override LinkTag so that it does not take the text within <a></a>; now myLinkTag doesn't find links. But how can I now get the text other than what is inside <a></a>?

I'm sorry if I'm asking too much.

Thanks a lot,
Murat
From: Derrick O. <Der...@Ro...> - 2006-07-30 12:12:21
Kavorka,

Maybe if you just want to remove the whole link, use something like:

    getParent ().getChildren ().remove (this);

in the doSemanticAction() override of your custom LinkTag class. That will remove the current link tag from the enclosing parent tag by altering the children list.

Derrick
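The effect of altering the children list can be illustrated without the library. The sketch below uses a toy Node class (hypothetical, not HTMLParser's Node or LinkTag) and prunes every A element from its parent's child list before collecting the text. Unlike the doSemanticAction() suggestion above, it runs after the tree is fully built, so it says nothing about when HTMLParser sets a tag's parent during parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; none of these classes come from HTMLParser.
class PruneLinks {

    /** A node is either a tag (tagName set, text null) or a text node (text set, tagName null). */
    static final class Node {
        final String tagName;
        final String text;
        final List<Node> children = new ArrayList<>();

        private Node(String tagName, String text) {
            this.tagName = tagName;
            this.text = text;
        }

        static Node tag(String name, Node... kids) {
            Node node = new Node(name, null);
            for (Node kid : kids) {
                node.children.add(kid);
            }
            return node;
        }

        static Node text(String value) {
            return new Node(null, value);
        }
    }

    /** Removes every A element (and its subtree) from the children lists below 'node'. */
    static void removeLinks(Node node) {
        // removeIf avoids modifying the list while iterating over it by hand.
        node.children.removeIf(child -> "A".equalsIgnoreCase(child.tagName));
        for (Node child : node.children) {
            removeLinks(child);
        }
    }

    /** Concatenates the text nodes that remain under 'node'. */
    static String text(Node node) {
        if (node.text != null) {
            return node.text;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            sb.append(text(child));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node paragraph = Node.tag("P",
                Node.text("Read the "),
                Node.tag("A", Node.text("manual")),
                Node.text(" first."));
        removeLinks(paragraph);
        System.out.println(text(paragraph));
    }
}
```

Removing the link node drops its text along with it, which is why the surrounding text keeps the whitespace that stood on either side of the link.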