Thread: Re: [Htmlparser-user] Help on extracting clean body content from web page


You probably want StringBean (org.htmlparser.beans.StringBean).
Its main() method is an example of how to use it.
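
To make the goal concrete, here is a rough, library-free sketch of the kind of cleanup being asked for. It is not StringBean and not how HTMLParser works internally; it is only a regex-based approximation using java.util.regex, and the class name BodyText and method extract() are made up for illustration. It assumes small, reasonably well-formed input and will misbehave on edge cases (for example, a "?>" inside a PHP string literal), which is why a real parser is the better choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of HTMLParser: a regex-only approximation.
final class BodyText {

    // (?is) = case-insensitive, and '.' also matches line breaks.
    // Captures everything after <body ...> up to </body>, or to the end of
    // the input if the closing tag is missing.
    private static final Pattern BODY =
            Pattern.compile("(?is)<body\\b[^>]*>(.*?)(</body>|$)");

    static String extract(String html) {
        Matcher m = BODY.matcher(html);
        // Fall back to the whole input when there is no <body> tag.
        String text = m.find() ? m.group(1) : html;

        text = text.replaceAll("(?is)<script\\b.*?</script>", " "); // script sections
        text = text.replaceAll("(?is)<style\\b.*?</style>", " ");   // inline stylesheets
        text = text.replaceAll("(?s)<\\?.*?\\?>", " ");             // <?php ... ?> blocks
        text = text.replaceAll("(?s)<!--.*?-->", " ");              // comments
        text = text.replaceAll("(?s)<[^>]*>", " ");                 // any remaining tags

        // Collapse runs of whitespace and trim the ends.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html>\n<head><title>title</title>\n"
                + "<style>\ncss style\n</style>\n</head>\n\n"
                + "<body>\nHello world\n\n<?php\nphpinfo()\n?>\n</body>\n";
        System.out.println(extract(html)); // prints "Hello world"
    }
}
```

The order of the replacements matters: script, style and PHP blocks are removed together with their contents before the generic tag pattern runs, otherwise only the tags would be stripped and the code inside them would be left behind as text.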

----- Original Message ----
From: cash cash <ca...@ya...>
To: htm...@li...
Sent: Tuesday, November 13, 2007 1:07:33 AM
Subject: [Htmlparser-user] Help on extracting clean body content from web page

Hi all,

I am new to htmlparser. I have downloaded it and tried a few examples.
However, I am having trouble finding the "correct way" to achieve my goal.
I'm looking for a way to extract the body content from a web page, excluding
all script sections.

For example, given the following text:

<html>
<head><title>title</title>
<style>
css style
</style>
</head>

<body>
Hello world

<?php
phpinfo()
?>
</body>

The correct code should extract only "Hello world".

Can anyone help me with this?

Thanks in advance.
