Re: [Htmlparser-user] Htmlparser-user Digest, Vol 35, Issue 4
From: Neftali P. <pap...@ya...> - 2009-08-22 00:59:41
Good day!

I just woke up; it is 8:30 in the morning. I'm very glad to have already received a reply from this organization, with very helpful information. I will look at this later this morning, as I have a seminar to attend at the university.

Thank you very much! I really appreciate this help :) I will check back here from time to time if I get stuck on a problem regarding this topic.

Respectfully,
neftali

________________________________
From: "htm...@li..." <htm...@li...>
To: htm...@li...
Sent: Saturday, August 22, 2009 4:56:24 AM
Subject: Htmlparser-user Digest, Vol 35, Issue 4

Send Htmlparser-user mailing list submissions to
htm...@li...

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
or, via email, send a message with subject or body 'help' to
htm...@li...urceforge..net

You can reach the person managing the list at
htm...@li...

When replying, please edit your Subject line so it is more specific than "Re: Contents of Htmlparser-user digest...."

Today's Topics:

1. Need Suggestions to get Started in HTML parsing (tamizh vendan)
2. Re: Need Suggestions to get Started in HTML parsing (Derrick Oswald)
3. Web Crawler Thesis Project Using HTML Parser To collect links (Neftali Papelleras)
4. Web Crawler Thesis Project Using HTML Parser To collect links (Neftali Papelleras)
5. Re: Web Crawler Thesis Project Using HTML Parser To collect links (Derrick Oswald)

----------------------------------------------------------------------

Message: 1
Date: Wed, 19 Aug 2009 20:42:04 +0530
From: tamizh vendan <tam...@gm...>
Subject: [Htmlparser-user] Need Suggestions to get Started in HTML parsing
To: htm...@li...
Message-ID: <b98...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

I am a newbie to HTML parsing. I know both Java and HTML well. I would like to construct a DOM tree from the HTML code of a web page. It would be helpful if someone could explain how to get started and kindly provide some tutorial or article links. Please provide sample programs if possible. Thanks in advance.

------------------------------

Message: 2
Date: Wed, 19 Aug 2009 19:18:39 +0200
From: Derrick Oswald <der...@gm...>
Subject: Re: [Htmlparser-user] Need Suggestions to get Started in HTML parsing
To: htmlparser user list <htm...@li...>
Message-ID: <16a...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Have a look at the mainline in Parser.java:
http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/Parser.java?revision=8&view=markup

That program prints it out, but the result of parser.parse(filter) is a NodeList, which is your (nested) DOM tree. Also have a look for other main methods in the code.

On Wed, Aug 19, 2009 at 5:12 PM, tamizh vendan <tam...@gm...> wrote:

> I am a newbie to HTML parsing. I know both Java and HTML well. I would like
> to construct a DOM tree from the HTML code of a web page. It would be
> helpful if someone could explain how to get started and kindly provide some
> tutorial or article links. Please provide sample programs if possible.
> Thanks in advance.
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user

------------------------------

Message: 3
Date: Fri, 21 Aug 2009 10:40:19 -0700 (PDT)
From: Neftali Papelleras <pap...@ya...>
Subject: [Htmlparser-user] Web Crawler Thesis Project Using HTML Parser To collect links
To: htm...@li...
Cc: pap...@ya...
Message-ID: <661...@we...>
Content-Type: text/plain; charset="utf-8"

Hi everyone.

I am Neftali Papelleras, an Engineering student from University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The project is titled "A Web Extraction Tool to Monitor Websites" and is implemented in Java. I am still in the first month of this one-year thesis project, and still in the information-gathering stage.

The first question I need to answer is how to create a Java-based web crawler. The next is how to retrieve the web contents of every web page. And lastly, how to retrieve links from a given web source. The first thing that came to my mind was to use Java regular expressions to retrieve the links from a given web source, but I now understand that this is not the right way to do it. That is why I came to HTML Parser, because I know this is the right way.

I know Java, but not at an advanced level; I just know the concepts. Though I have already created several programs (the last was a chat system), I am still not confident in my Java skills. But I am very eager to learn, and I am starting now, again.

I have already downloaded version 1.6 of HTML Parser and have browsed through its folders and files. I attempted to create a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve a link from a web page's source (HTML text). I can then follow the program, which should eventually lead me to an understanding of how to use this API.

Looking forward to a good response from this organization.

Respectfully,
neftali

------------------------------

Message: 4
Date: Fri, 21 Aug 2009 10:42:32 -0700 (PDT)
From: Neftali Papelleras <pap...@ya...>
Subject: [Htmlparser-user] Web Crawler Thesis Project Using HTML Parser To collect links
To: htm...@li...
Cc: pap...@ya...
Message-ID: <269...@we...>
Content-Type: text/plain; charset="utf-8"

Hi everyone.

I am Neftali Papelleras, an Engineering student from University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The project is titled "A Web Extraction Tool to Monitor Websites" and is implemented in Java. I am still in the first month of this one-year thesis project, and still in the information-gathering stage.

The first question I need to answer is how to create a Java-based web crawler. The next is how to retrieve the web contents of every web page. And lastly, how to retrieve links from a given web source. The first thing that came to my mind was to use Java regular expressions to retrieve the links from a given web source, but I now understand that this is not the right way to do it. That is why I came to HTML Parser, because I know this is the right way.

I know Java, but not at an advanced level; I just know the concepts. Though I have already created several programs (the last was a chat system), I am still not confident in my Java skills. But I am very eager to learn, and I am starting now, again.

I have already downloaded version 1.6 of HTML Parser and have browsed through its folders and files. I attempted to create a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve a link from a web page's source (HTML text). I can then follow the program, which should eventually lead me to an understanding of how to use this API.

Looking forward to a good response from this organization.

Respectfully,
neftali

------------------------------

Message: 5
Date: Fri, 21 Aug 2009 22:56:14 +0200
From: Derrick Oswald <der...@gm...>
Subject: Re: [Htmlparser-user] Web Crawler Thesis Project Using HTML Parser To collect links
To: htmlparser user list <htm...@li...>
Message-ID: <16a...@ma...>
Content-Type: text/plain; charset="iso-8859-1"

Have a look at org.htmlparser.beans.HTMLLinkBean <http://htmlparser.sourceforge.net/javadoc/index.html>.

At the bottom of the source file <http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup> is a commented-out main program to get you started.

On Fri, Aug 21, 2009 at 7:42 PM, Neftali Papelleras <pap...@ya...> wrote:

> Hi everyone.
>
> I am Neftali Papelleras, an Engineering student from University of San
> Carlos, Cebu City, Philippines. I am currently working on my thesis project,
> which involves web crawling. The project is titled "A Web Extraction Tool to
> Monitor Websites" and is implemented in Java. I am still in the first month
> of this one-year thesis project, and still in the information-gathering
> stage.
>
> The first question I need to answer is how to create a Java-based web
> crawler. The next is how to retrieve the web contents of every web page.
> And lastly, how to retrieve links from a given web source. The first thing
> that came to my mind was to use Java regular expressions to retrieve the
> links from a given web source, but I now understand that this is not the
> right way to do it. That is why I came to HTML Parser, because I know this
> is the right way.
>
> I know Java, but not at an advanced level; I just know the concepts. Though
> I have already created several programs (the last was a chat system), I am
> still not confident in my Java skills. But I am very eager to learn, and I
> am starting now, again.
>
> I have already downloaded version 1.6 of HTML Parser and have browsed
> through its folders and files. I attempted to create a very simple parser
> program using the HTML Parser API, but unfortunately I was confused about
> where and how to start. I am hoping that this organization can provide a
> simple program that illustrates how to retrieve a link from a web page's
> source (HTML text). I can then follow the program, which should eventually
> lead me to an understanding of how to use this API.
>
> Looking forward to a good response from this organization.
>
> Respectfully,
> neftali

------------------------------

_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user

End of Htmlparser-user Digest, Vol 35, Issue 4
**********************************************
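
For the question raised in Messages 3 and 4 (retrieving links from HTML text with a parser rather than a regular expression), here is a minimal sketch of the idea. Please note that it does not use the HTML Parser library or the HTMLLinkBean class recommended in Message 5. To stay runnable without extra jars, it swaps in the HTML parser that ships with the JDK (javax.swing.text.html). The class name LinkExtractor and the method name extractLinks are hypothetical, chosen only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class: collects href values from <a> tags using the
// JDK's bundled HTML parser (an assumption for this sketch, NOT the
// HTML Parser library discussed in this thread).
class LinkExtractor {

    // Returns the href attribute of every <a> start tag, in document order.
    static List<String> extractLinks(String html) {
        final List<String> links = new ArrayList<>();

        // The parser calls back once per start tag; we only keep anchors.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body><p><a href=\"a.html\">A</a> "
                + "<a href=\"http://example.com/b\">B</a></p></body></html>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

The callback style means the page is never matched against a pattern by hand: the parser decides what counts as a tag and an attribute, and the code only reacts to anchors. The same structure should carry over when moving to HTMLLinkBean, but the exact calls there are best taken from the commented-out main program mentioned in Message 5.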