Thread: [Htmlparser-user] finding meta data

[Htmlparser-user] finding meta data

From: kavorka <the...@gm...> - 2006-07-19 11:44:23

Hi all,
I'm new to HTMLParser. I went through the sample programs to understand how
to find the meta data of a page, but I couldn't get it to work. Do you have
any code samples that find the meta data of a page using HTMLParser?
Thank you,
best regards

Re: [Htmlparser-user] finding meta data

From: Derrick O. <Der...@Ro...> - 2006-07-24 03:25:32

Kavorka,

This should give you the meta tag, from which you can get the 
information you want:

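NodeList nodes = parser.parse (null);
// true = also search nested tags; META normally sits inside HTML and HEAD
NodeList metas = nodes.extractAllNodesThatMatch (new TagNameFilter ("META"), true);
// first META tag; check metas.size () first if the page may have none
MetaTag meta = (MetaTag)metas.elementAt (0);
System.out.println (meta);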

Derrick
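
A detail worth knowing when adapting this snippet: going from memory of the HTMLParser Javadoc (worth verifying against your version), NodeList.extractAllNodesThatMatch (filter) examines only the nodes in the list itself, while the two-argument form with true also descends into children. Since META tags usually sit inside HTML and HEAD, that difference decides whether anything is found. The sketch below is not HTMLParser code. MiniNode, match and samplePage are hypothetical names that model the tree in plain Java, only to show how the two search modes differ.

```java
import java.util.ArrayList;
import java.util.List;

public class RecursiveMatchSketch {
    /** Hypothetical stand-in for a parsed tag; not an HTMLParser class. */
    static final class MiniNode {
        final String name;
        final List<MiniNode> children = new ArrayList<>();

        MiniNode(String name, MiniNode... kids) {
            this.name = name;
            for (MiniNode kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Collects nodes whose name matches, optionally descending into children. */
    static List<MiniNode> match(List<MiniNode> nodes, String name, boolean recursive) {
        List<MiniNode> found = new ArrayList<>();
        for (MiniNode node : nodes) {
            if (node.name.equalsIgnoreCase(name)) {
                found.add(node);
            }
            if (recursive) {
                found.addAll(match(node.children, name, true));
            }
        }
        return found;
    }

    /** Top-level list modelling <html><head><meta><meta></head><body></body></html>. */
    static List<MiniNode> samplePage() {
        MiniNode head = new MiniNode("HEAD", new MiniNode("META"), new MiniNode("META"));
        MiniNode html = new MiniNode("HTML", head, new MiniNode("BODY"));
        List<MiniNode> top = new ArrayList<>();
        top.add(html);
        return top;
    }

    public static void main(String[] args) {
        List<MiniNode> top = samplePage();
        System.out.println("top level only: " + match(top, "META", false).size()); // 0
        System.out.println("recursive:      " + match(top, "META", true).size());  // 2
    }
}
```

With the real library, the same effect should come from passing true as the second argument, or from handing the filter to the parser when parsing.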

Re: [Htmlparser-user] finding meta data

From: kavorka <the...@gm...> - 2006-07-25 08:49:52

Hi Oswald,

Thanks a lot for your help.

Murat


Re: [Htmlparser-user] finding meta data

From: kavorka <the...@gm...> - 2006-07-28 20:53:56

Hi Oswald,
I have another question. In HTMLParser, is it possible to extract only the
text of a web page? The stringextractor program also extracts the link text
in the page; I want to extract "pure" text. Can I do that?
Thanks,
Murat


Re: [Htmlparser-user] finding meta data

From: Derrick O. <Der...@Ro...> - 2006-07-29 11:14:28

Murat,

I'm not sure what you mean by 'pure' text.
The stringextractor program uses the StringBean under the hood.
It only collects text which would be presented in a browser - or at 
least it's supposed to.
The stringextractor program has an option (-links) to output the links 
within angle brackets. Make sure this is not used.
If you want to remove text within <a></a> pairs, you will need to 
replace the default LinkTag with a subclass that discards that text, 
and register the subclass with the PrototypicalNodeFactory.

Derrick
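
For readers unsure what "text outside <a></a> pairs" amounts to, here is a minimal sketch of the idea as a tree walk: collect text leaves, but skip any subtree rooted at an A tag. This is a swapped-in illustration in plain Java, not the StringBean or LinkTag mechanism described above. MiniNode, ofTag, ofText and textOutsideLinks are hypothetical names, not HTMLParser API.

```java
import java.util.ArrayList;
import java.util.List;

public class TextOutsideLinksSketch {
    /** Hypothetical stand-in for a parsed node: either a tag with children or a text leaf. */
    static final class MiniNode {
        final String tag;   // null for a text leaf
        final String text;  // null for a tag
        final List<MiniNode> children = new ArrayList<>();

        private MiniNode(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static MiniNode ofTag(String name, MiniNode... kids) {
            MiniNode node = new MiniNode(name, null);
            for (MiniNode kid : kids) {
                node.children.add(kid);
            }
            return node;
        }

        static MiniNode ofText(String s) {
            return new MiniNode(null, s);
        }
    }

    /** Appends text leaves to out, skipping every subtree rooted at an A tag. */
    static void collect(MiniNode node, StringBuilder out) {
        if (node.text != null) {
            out.append(node.text);
            return;
        }
        if ("A".equalsIgnoreCase(node.tag)) {
            return; // drop the link and everything inside it
        }
        for (MiniNode child : node.children) {
            collect(child, out);
        }
    }

    static String textOutsideLinks(MiniNode root) {
        StringBuilder out = new StringBuilder();
        collect(root, out);
        return out.toString();
    }

    /** Models <p>Read the <a>manual</a> first.</p> */
    static MiniNode sample() {
        return MiniNode.ofTag("P",
                MiniNode.ofText("Read the "),
                MiniNode.ofTag("A", MiniNode.ofText("manual")),
                MiniNode.ofText(" first."));
    }

    public static void main(String[] args) {
        // The link text "manual" is dropped; the surrounding text is kept.
        System.out.println(textOutsideLinks(sample()));
    }
}
```

The walk leaves the tree untouched, so the same parsed page can still be used to list links afterwards.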

Re: [Htmlparser-user] finding meta data

From: kavorka <the...@gm...> - 2006-07-29 13:07:11

Hi Oswald,
Yes, I want to remove the text within <a></a>. I'll try to do what you said,
but I'm a newbie Java coder and I didn't understand it clearly. I tried to
override LinkTag so that it doesn't take the text within <a></a>; now
myLinkTag doesn't find links. But how can I get the text other than what is
inside <a></a>?
If I ask too much, I'm sorry.
Thanks a lot,
Murat


Re: [Htmlparser-user] finding meta data

From: Derrick O. <Der...@Ro...> - 2006-07-30 12:12:21

Kavorka,

Maybe if you just want to remove the whole link, use something like:
   getParent ().getChildren ().remove (this);
in the doSemanticAction() override of your custom LinkTag class.
That will remove the current link tag from the enclosing parent tag by 
altering the children list.

Derrick
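
The effect of getParent ().getChildren ().remove (this) can be pictured with a small stand-alone model: each node knows its parent, and detaching a node means deleting it from the parent's child list. The sketch below is plain Java with hypothetical names (MiniNode, detach, pruneLinks), not HTMLParser classes. It prunes after the tree is complete, which sidesteps any question of when the parent link is set during parsing, and it gathers the links before removing them so that no list is modified while it is being iterated.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneLinksSketch {
    /** Hypothetical stand-in for a parsed tag with a parent link; not an HTMLParser class. */
    static final class MiniNode {
        final String name;
        MiniNode parent;
        final List<MiniNode> children = new ArrayList<>();

        MiniNode(String name) {
            this.name = name;
        }

        MiniNode add(MiniNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        /** Same idea as getParent().getChildren().remove(this). */
        void detach() {
            if (parent != null) {
                parent.children.remove(this);
                parent = null;
            }
        }
    }

    /** Detaches every A node below root and returns how many were removed. */
    static int pruneLinks(MiniNode root) {
        List<MiniNode> links = new ArrayList<>();
        gather(root, links);
        for (MiniNode link : links) {
            link.detach();
        }
        return links.size();
    }

    private static void gather(MiniNode node, List<MiniNode> links) {
        for (MiniNode child : node.children) {
            if ("A".equalsIgnoreCase(child.name)) {
                links.add(child);
            } else {
                gather(child, links);
            }
        }
    }

    /** Models <body><p><a></a><b></b></p><a></a></body>. */
    static MiniNode sample() {
        MiniNode p = new MiniNode("P").add(new MiniNode("A")).add(new MiniNode("B"));
        return new MiniNode("BODY").add(p).add(new MiniNode("A"));
    }

    public static void main(String[] args) {
        MiniNode body = sample();
        System.out.println("removed " + pruneLinks(body) + " links");       // removed 2 links
        System.out.println("body children left: " + body.children.size()); // 1
    }
}
```

Once the links are gone from the child lists, any later text extraction over the tree no longer sees their contents.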
