Question on extracting text
Hello
I'm trying to understand how to use htmlparser for a specific set of tasks. Given a URL such as http://whatever, I want to:
1) Extract the text (i.e. strip the HTML tags) from the <BODY>..</BODY> portion of the HTML document
2) Extract the TITLE from the HEAD element
3) Extract the content of the META KEYWORDS and META DESCRIPTION tags
These seem like very common tasks, but I'm finding it hard to work out from the source code how to do them. Does anyone have code snippets that would handle them?
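For comparison, here is a minimal sketch of the three tasks. Note that it uses the HTML parser bundled with the JDK (javax.swing.text.html.parser) rather than htmlparser itself, so it runs without extra jars and does not show htmlparser's own API. The names PageInfoExtractor, PageInfo, extract, extractFromString and extractFromUrl are hypothetical. It assumes the page has explicit <BODY> tags and, when fetched by URL, is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser, NOT on the htmlparser library.
class PageInfoExtractor {

    /** Holds the results of the three tasks: body text, title, and META name/content pairs. */
    static class PageInfo {
        String title = "";
        final StringBuilder bodyText = new StringBuilder();
        final Map<String, String> meta = new HashMap<>(); // lower-cased META name -> content
    }

    /** Parses HTML from a Reader, collecting title, tag-free body text and META name/content pairs. */
    static PageInfo extract(Reader in) throws IOException {
        PageInfo info = new PageInfo();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;
            private boolean inBody = false;

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = true;
                else if (t == HTML.Tag.BODY) inBody = true;
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.TITLE) inTitle = false;
                else if (t == HTML.Tag.BODY) inBody = false;
            }

            // META is an empty element, so this parser reports it here rather than in handleStartTag.
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.META) {
                    Object name = a.getAttribute(HTML.Attribute.NAME);
                    Object content = a.getAttribute(HTML.Attribute.CONTENT);
                    if (name != null && content != null) {
                        info.meta.put(name.toString().toLowerCase(), content.toString());
                    }
                }
            }

            // Text runs arrive with the tags already removed; keep only TITLE and BODY text.
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (text.isEmpty()) return;
                if (inTitle) {
                    info.title = text;
                } else if (inBody) {
                    if (info.bodyText.length() > 0) info.bodyText.append(' ');
                    info.bodyText.append(text);
                }
            }
        };
        new ParserDelegator().parse(in, callback, true); // true = ignore charset declarations
        return info;
    }

    /** Convenience wrapper for HTML already held in a String. */
    static PageInfo extractFromString(String html) {
        try {
            return extract(new StringReader(html));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fetches and parses a page; assumes UTF-8 for simplicity. */
    static PageInfo extractFromUrl(String url) throws IOException {
        try (Reader in = new InputStreamReader(URI.create(url).toURL().openStream(), StandardCharsets.UTF_8)) {
            return extract(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PageInfoExtractor <url>");
            return;
        }
        PageInfo info = extractFromUrl(args[0]);
        System.out.println("Title: " + info.title);
        System.out.println("Keywords: " + info.meta.get("keywords"));
        System.out.println("Description: " + info.meta.get("description"));
        System.out.println("Body text: " + info.bodyText);
    }
}
```

The callback style keeps memory use low because no document tree is built, but the Swing parser follows an old HTML 3.2 DTD, so a dedicated library such as htmlparser is likely to cope better with messy real-world pages.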
Thanks
Marc
marc@westofpluto.com