question on extracting text

Brought to you by: derrickoswald

#63 question on extracting text

Status: open

Owner: nobody

Labels: Programming Problem (39)

Priority: 5

Updated: 2008-08-02

Created: 2008-08-02

Creator: westofpluto

Private: No

Hello,
I'm trying to understand how to use htmlparser for a specific set of tasks.
Suppose I have a URL like http://whatever.
I want to do the following:
1) Extract the text (i.e. strip the HTML tags) from the <BODY>..</BODY> portion of the HTML document
2) Extract the TITLE from the HEAD element
3) Extract the content of the META KEYWORDS and META DESCRIPTION tags

These seem like very common tasks, but I'm finding it hard to dig through the code to see how to do them. Does anyone have code snippets that would work for this?

Thanks
Marc
marc@westofpluto.com
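
For illustration only, here is a minimal sketch of the three tasks above. Note that it does not use htmlparser: it swaps in the HTML parser that ships with the JDK (javax.swing.text.html), so that it runs with no extra jars. The class name PageSummary and its methods are hypothetical names made up for this example. Text chunks are joined with single spaces, which is an assumption that suits simple pages but would split a word that is broken up by inline tags.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: collects TITLE, META keywords/description, and BODY text. */
class PageSummary extends HTMLEditorKit.ParserCallback {
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder body = new StringBuilder();
    private String keywords = "";
    private String description = "";
    private boolean inTitle;
    private boolean inBody;
    private int skipDepth; // greater than 0 while inside <script> or <style>

    /** Parses HTML from any Reader (for example, one wrapped around a URL stream). */
    static PageSummary parse(Reader html) throws IOException {
        PageSummary summary = new PageSummary();
        // true = ignore charset declarations in META tags rather than aborting the parse
        new ParserDelegator().parse(html, summary, true);
        return summary;
    }

    /** Convenience overload for HTML that is already held in a String. */
    static PageSummary parse(String html) {
        try {
            return parse(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = true;
        else if (tag == HTML.Tag.BODY) inBody = true;
        else if (tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) skipDepth++;
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        if (tag == HTML.Tag.TITLE) inTitle = false;
        else if (tag == HTML.Tag.BODY) inBody = false;
        else if ((tag == HTML.Tag.SCRIPT || tag == HTML.Tag.STYLE) && skipDepth > 0) skipDepth--;
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Task 3: META tags have no closing tag, so they are reported here.
        if (tag != HTML.Tag.META) return;
        Object name = attrs.getAttribute(HTML.Attribute.NAME);
        Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
        if (name == null || content == null) return;
        String key = name.toString();
        if (key.equalsIgnoreCase("keywords")) keywords = content.toString();
        else if (key.equalsIgnoreCase("description")) description = content.toString();
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title.append(data);                   // Task 2: text inside <TITLE>
        } else if (inBody && skipDepth == 0) {
            body.append(data).append(' ');        // Task 1: tag-free text inside <BODY>
        }
    }

    String title() { return title.toString().trim(); }
    String keywords() { return keywords; }
    String description() { return description; }

    /** Body text with runs of whitespace collapsed to single spaces. */
    String bodyText() { return body.toString().replaceAll("\\s+", " ").trim(); }
}
```

To run this against a live page, the Reader overload could be given something like new InputStreamReader(new URL("http://whatever").openStream()), after which title(), keywords(), description() and bodyText() hold the results. The same visitor-style approach (react to tags, accumulate text) should carry over to htmlparser, but the exact htmlparser classes to use are not shown here.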
