Re: [Htmlparser-user] Web Crawler Thesis Project Using HTML Parser To collect links
From: Derrick O. <der...@gm...> - 2009-08-21 20:56:24
Have a look at org.htmlparser.beans.HTMLLinkBean
<http://htmlparser.sourceforge.net/javadoc/index.html>.

At the bottom of the source file
<http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup>
is a commented-out main program to get you started.

On Fri, Aug 21, 2009 at 7:42 PM, Neftali Papelleras <pap...@ya...> wrote:

> Hi everyone.
>
> I am Neftali Papelleras, an Engineering student from the University of
> San Carlos, Cebu City, Philippines. I am currently working on my thesis
> project, which involves web crawling. The title of my project is "A Web
> Extraction Tool to Monitor Websites", and it is implemented in Java. I am
> still in the first month of this one-year thesis project, and still at
> the information-gathering stage.
>
> The first question I need to answer is how to create a Java-based web
> crawler. The next is how to retrieve the web content of every web page.
> And lastly, how to retrieve links from a given web source. The first
> thing that came to my mind was to use Java regex to retrieve the links
> from a given web source. But now I understand it's not the right way to
> do it. That's why I came to HTML Parser, because I knew this is the
> right way.
>
> I know Java, but not at an advanced level; I just know the concepts.
> Though I have created several programs already (the last was a chat
> system), I am still not confident in my Java skills. But I am very eager
> to learn, and I am starting now, again.
>
> I have already downloaded version 1.6 of HTML Parser and have browsed
> through its folders and files. I attempted to create a very simple
> parser program using the HTML Parser API, but unfortunately I was
> confused about where and how to start. I am hoping that this
> organization can provide a simple program that illustrates how to
> retrieve a link given a web page source/HTML text. I can follow the
> program, and it will eventually lead me to an understanding of how to
> use this API.
>
> Looking forward to a good response from this organization.
>
> Respectfully,
> neftali
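
On the first question in the quoted message (how to structure a Java-based crawler), here is a minimal sketch of a breadth-first crawl loop. It is not taken from HTML Parser and uses only the JDK (Java 9+). The names CrawlSketch, crawl, and fakeWeb and the example.com URLs are all made up for illustration. Link extraction is passed in as a function so that a real parser, such as the link bean mentioned in the reply, could be plugged in later; an in-memory map stands in for real pages here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class CrawlSketch {

    /**
     * Breadth-first crawl from seed, visiting at most maxPages pages.
     * extractLinks maps a page URL to the link URLs found on that page;
     * how it does so (HTML Parser, another library) is up to the caller.
     * Returns the URLs in the order they were visited.
     */
    public static List<String> crawl(String seed, int maxPages,
                                     Function<String, List<String>> extractLinks) {
        Set<String> seen = new HashSet<>();           // URLs already queued
        Deque<String> frontier = new ArrayDeque<>();  // URLs waiting to be visited
        List<String> visited = new ArrayList<>();

        seen.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : extractLinks.apply(url)) {
                // Set.add returns false for a URL that was queued before,
                // which keeps the crawler from looping on cyclic links.
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical four-page site used instead of real network access.
        Map<String, List<String>> fakeWeb = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://example.com/b"),
            "http://example.com/a", List.of("http://example.com/", "http://example.com/c"),
            "http://example.com/b", List.of("http://example.com/c"),
            "http://example.com/c", List.of());

        List<String> order = crawl("http://example.com/", 10,
            url -> fakeWeb.getOrDefault(url, List.of()));
        order.forEach(System.out::println);
    }
}
```

The seen set is what separates a crawler from a plain link lister: without it, two pages that link to each other would be fetched forever. A crawler for real sites would also need URL normalization, politeness delays, and error handling, none of which are shown here.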