
#63 question on extracting text

Status: open
Owner: nobody
Priority: 5
Updated: 2008-08-02
Created: 2008-08-02
Creator: westofpluto
Private: No

Hello,
I'm trying to understand how to use htmlparser to perform a specific set of tasks.
Suppose I have a URL like http://whatever.
Given that URL, I want to:
1) Extract the text (i.e. strip the HTML tags) from the <BODY>..</BODY> portion of the HTML document
2) Extract the TITLE from the HEAD element
3) Extract the content of the META KEYWORDS and META DESCRIPTION tags

These seem like they would be very common tasks, but I'm finding it very difficult to dig through the code to see how to do them. Does anyone have code snippets that would work for this?

Thanks
Marc
marc@westofpluto.com

Discussion
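
The three tasks in the question (body text, TITLE, and the META KEYWORDS / META DESCRIPTION contents) can be sketched in a library-agnostic way. The example below is only an illustration and does not use htmlparser's own API: it relies on Python's standard-library html.parser module instead. The names PageSummary, summarize, and summarize_url are hypothetical and were chosen for this sketch. It also assumes the page is reasonably well formed and that text inside SCRIPT and STYLE elements should be left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageSummary(HTMLParser):
    """Collects the title, selected META contents, and body text of a page."""

    SKIPPED = {"script", "style"}  # elements whose contents are not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self.meta = {}           # lower-cased META name -> content string
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0     # > 0 while inside SCRIPT or STYLE

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attr.get("content") or ""
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body and self._skip_depth == 0:
            self.body_parts.append(data)


def _squash(parts):
    """Joins text fragments and collapses runs of whitespace to one space."""
    return " ".join("".join(parts).split())


def summarize(html):
    """Returns title, keywords, description, and body text for an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return {
        "title": _squash(parser.title_parts),
        "keywords": parser.meta.get("keywords", ""),
        "description": parser.meta.get("description", ""),
        "text": _squash(parser.body_parts),
    }


def summarize_url(url):
    """Fetches a URL and summarizes it (falls back to UTF-8 if no charset)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return summarize(resp.read().decode(charset, errors="replace"))
```

Calling summarize_url with the page's address returns a dictionary with the keys "title", "keywords", "description", and "text". Text fragments are concatenated as they appear in the source, so adjacent block elements with no whitespace between them in the markup will run together; a fuller version would probably insert a separator at block-level tags.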

