Re: [Htmlparser-user] Web Crawler Thesis Project Using HTML Parser To collect links

Have a look at org.htmlparser.beans.HTMLLinkBean
<http://htmlparser.sourceforge.net/javadoc/index.html>.

At the bottom of the source file
<http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup>
is a commented-out main program to get you started.
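
To illustrate the general idea of letting a parser, rather than a regular
expression, find the links, here is a minimal sketch. It does not use the
HTML Parser API at all: it swaps in the HTML parser that ships with the JDK
(javax.swing.text.html) so that it runs without any extra jar. The class
name LinkSketch, the method extractLinks, and the example.com URLs are
hypothetical names chosen for this illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not part of HTML Parser.
class LinkSketch {

    // Returns the href of every <a> tag in the given HTML text,
    // resolved against baseUrl so relative links become absolute.
    static List<String> extractLinks(String html, String baseUrl) throws IOException {
        final URI base = URI.create(baseUrl);
        final List<String> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return; // only anchor tags carry the links we want
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return; // e.g. <a name="..."> has no href
                }
                try {
                    links.add(base.resolve(href.toString().trim()).toString());
                } catch (IllegalArgumentException e) {
                    // href is not a valid URI; skip it in this sketch
                }
            }
        };

        // 'true' tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>"
                + "<a href=\"/about\">About</a> "
                + "<a href=\"http://other.org/x\">Elsewhere</a>"
                + "</body></html>";
        for (String link : extractLinks(html, "http://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```

A crawler would feed each extracted link back into a queue of pages still
to fetch; the bean mentioned above is the place to look for the HTML Parser
way of doing the same extraction.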

On Fri, Aug 21, 2009 at 7:42 PM, Neftali Papelleras <
pap...@ya...> wrote:

> Hi everyone.
>
> I am Neftali Papelleras, an Engineering student from the University of San
> Carlos, Cebu City, Philippines. I am currently working on my thesis project,
> which involves web crawling. The project is titled "A Web Extraction Tool to
> Monitor Websites" and is being implemented in Java. I am still in the first
> month of this one-year thesis project, and still at the
> information-gathering stage.
>
> The first question I need to answer is how to create a Java-based web
> crawler. The next is how to retrieve the contents of every web page, and
> the last is how to retrieve the links from a given web page source. The
> first thing that came to mind was to use Java regular expressions to
> retrieve the links, but I now understand that this is not the right way to
> do it. That is why I came to HTML Parser, which I believe is the right way.
>
> I know Java, but not at an advanced level; I mostly know the concepts.
> Although I have written several programs already (the last was a chat
> system), I am still not confident in my Java skills. But I am very eager to
> learn, and I am starting now, again.
>
> I have already downloaded version 1.6 of HTML Parser and have browsed
> through its folders and files. I attempted to create a very simple parser
> program using the HTML Parser API, but I was unsure where and how to
> start. I am hoping that this organization can provide a simple program that
> illustrates how to retrieve a link given a web page source/HTML text. I
> could then follow the program, which would eventually lead me to an
> understanding of how to use this API.
>
> Looking forward to a good response from this organization.
>
> Respectfully,
> neftali