htmlparser-user Mailing List for HTML Parser (Page 15)
Brought to you by: derrickoswald
From: Derrick O. <der...@gm...> - 2009-10-09 04:48:06

The Page class maintains this type of information. When the source is exhausted, it could record the cursor position, but I don't think it does now. The PageIndex has the position of each end-of-line, except possibly the last. I think adding one for the end-of-file wouldn't hurt; then the size (in characters) would be that last cursor position. The size in bytes depends on the encoding, as mentioned here: http://htmlparser.sourceforge.net/faq.html#byte

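A minimal sketch of Derrick's point that characters and bytes are different measures, in plain Java and independent of HTMLParser. The class and method names are illustrative only. It counts the characters of already-decoded page text and the bytes the same text occupies in a chosen charset; the two agree only for single-byte encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (not part of HTMLParser): page size in characters vs. bytes. */
public final class PageSize {
    private PageSize() {}

    /** Number of UTF-16 char units in the decoded text. */
    public static int charCount(String page) {
        return page.length();
    }

    /** Number of bytes the same text occupies when encoded with the given charset. */
    public static int byteCount(String page, Charset charset) {
        return page.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String page = "<p>caf\u00e9</p>"; // one non-ASCII character (e with acute accent)
        System.out.println("chars:      " + charCount(page));                                // 11
        System.out.println("ISO-8859-1: " + byteCount(page, StandardCharsets.ISO_8859_1));  // 11
        System.out.println("UTF-8:      " + byteCount(page, StandardCharsets.UTF_8));       // 12
    }
}
```

The byte count only matches what came over the wire if you use the charset the server actually used, which is why the encoding matters.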
From: Graham B. <gra...@wh...> - 2009-10-08 23:47:34

Incidentally, I was passing the HTML to the parser as a string because I wanted to know the size of the page in bytes, and I couldn't see how to get this from HTMLParser when passing in an InputStream. I can't rely on the Content-Type header, as it is sometimes missing. How do I get the compressed and/or uncompressed size of the stream? Is this possible, or is it a feature that could be added?

Regards,
Graham

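One stdlib-only way to get both numbers is to count the bytes yourself as they are read. The sketch below is an assumption-laden illustration, not an HTMLParser feature: `CountingInputStream` is a hypothetical class written here (not the Guava or Commons IO class of the same name). One counter sits on the raw (compressed) stream and another sits on the decompressed output of `GZIPInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative FilterInputStream that counts every byte read (or skipped) through it. */
public class CountingInputStream extends FilterInputStream {
    private long count;

    public CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    public long getCount() { return count; }

    /** Reads a gzip stream to the end; returns {compressed bytes, uncompressed bytes}. */
    public static long[] gzipSizes(InputStream raw) throws IOException {
        CountingInputStream compressed = new CountingInputStream(raw);
        try (CountingInputStream plain = new CountingInputStream(new GZIPInputStream(compressed))) {
            byte[] buf = new byte[8192];
            while (plain.read(buf) != -1) {
                // in real use these bytes would go on to the parser instead of being discarded
            }
            return new long[] { compressed.getCount(), plain.getCount() };
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body>hello hello hello</body></html>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) { gz.write(html); }
        long[] sizes = gzipSizes(new ByteArrayInputStream(bos.toByteArray()));
        System.out.println("compressed=" + sizes[0] + " uncompressed=" + sizes[1]);
    }
}
```

With an HttpURLConnection, `raw` would be `connection.getInputStream()`, and the gzip layer is only needed when `getContentEncoding()` reports gzip. If your HTMLParser version has a `Page` constructor taking an `InputStream` and a charset name, the counting stream could in principle sit underneath the parser instead of being drained as above.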
From: Graham B. <gra...@wh...> - 2009-10-07 12:48:23

Yes, you are correct. I was downloading the page separately into a String using BufferedReader.readLine, which strips the line terminator. Sorry to bother you, and thanks for your prompt response. Works fine now!

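The fix Graham describes amounts to reading characters rather than lines, so that CR and LF survive into the String handed to the Lexer. A minimal sketch with hypothetical names, contrasting a full read with the `readLine()` pattern that loses the terminators:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Illustrative helpers: read page text with and without its line terminators. */
public final class ReadAll {
    private ReadAll() {}

    /** Copies every character, including CR and LF, into a String. */
    public static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** The pattern that loses line breaks: readLine() returns each line without its terminator. */
    public static String readLinesNaively(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line); // terminator already stripped, so all lines run together
        }
        return sb.toString();
    }
}
```

For a download, the Reader would typically be `new InputStreamReader(connection.getInputStream(), charset)`, with the charset taken from the response where available.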
From: Derrick O. <der...@gm...> - 2009-10-04 05:12:32

Does your input text string contain the newline (0x0A) or carriage return-newline (0x0D, 0x0A) end-of-line characters?

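Why the question matters: a line number is derived from the end-of-line characters that precede a position, so text without any newlines puts every tag on the first line. The sketch below is a simplified illustration of that idea, not HTMLParser's PageIndex. It uses 0-based lines, which is an assumption, and counts only LF, so a CR LF pair counts once.

```java
/** Illustrative only: 0-based line number of a character offset, counting LF (0x0A). */
public final class LineOf {
    private LineOf() {}

    public static int lineOf(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 0;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') line++; // CR LF also ends in LF, so it counts once
        }
        return line;
    }

    public static void main(String[] args) {
        String flat  = "<a href=\"a\">a</a><a href=\"b\">b</a>";
        String lines = "<a href=\"a\">a</a>\n<a href=\"b\">b</a>";
        System.out.println(lineOf(flat, flat.lastIndexOf("<a")));   // 0: no newline anywhere
        System.out.println(lineOf(lines, lines.lastIndexOf("<a"))); // 1
    }
}
```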
From: Graham B. <gra...@wh...> - 2009-10-04 03:34:49

Hi there,

I'm using the parser to extract links and am passing the HTML in as a string source, i.e.:

    Parser oParser = new Parser(new Lexer(sHTML));

However, I've noticed that I can get the starting position of the extracted nodes (LinkTag.getStartPosition) but not the starting line number: LinkTag.getStartingLineNumber is always 0.

It works fine and gives the line number if I pass in an input stream from HttpURLConnection or the URLConnection itself, so I'm a bit confused: is this a bug, or is it not possible to get the line numbers when using the string source? I could really do with the line numbers if there is a correct way of doing this.

All working really well apart from that; thanks for the library.

Regards,
Graham

From: Derrick O. <der...@gm...> - 2009-09-30 20:02:40
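
So, you need an in-memory image of your web page; for that, HTMLParser provides a NodeList.
Get the file or text string, pass it to the parser, and use a null filter to get everything:

    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.NodeClassFilter;
    import org.htmlparser.tags.InputTag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException; // thrown by setResource and parse

    Parser parser = new Parser ();
    parser.setResource (...); // URL or file name; for a text string use setInputHTML (...)
    NodeList list = parser.parse (null);

Then you need to find your input fields:

    NodeFilter filter = new NodeClassFilter (InputTag.class);
    NodeList inputs = list.extractAllNodesThatMatch (filter, true /* recursive */);

Now cycle through your list looking at the attributes of the input tags:

    for (int i = 0; i < inputs.size (); i++)
    {
        InputTag tag = (InputTag)inputs.elementAt (i);
        String id = tag.getAttribute ("id"); // which sensor, setpoint or relay
        // ... look up the current value for that id, then
        // tag.setAttribute ("value", currentValue);
    }

Then output the whole page:

    System.out.println (list.toHtml ());
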
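The steps in the reply above can be gathered into one method. This is a sketch assuming the HTMLParser 1.6/2.0 API (`Parser.setInputHTML`, `Parser.parse`, `NodeList.extractAllNodesThatMatch`, `Tag.getAttribute`/`setAttribute`, `NodeList.toHtml`). `InputFiller` and the map of values keyed by ID are hypothetical names for illustration.

```java
import java.util.Map;

import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.InputTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

/** Sketch: fill each <input> VALUE attribute from a map keyed by the input's ID. */
public final class InputFiller {
    private InputFiller() {}

    public static String fill(String html, Map<String, String> valuesById) throws ParserException {
        Parser parser = new Parser();
        parser.setInputHTML(html);             // page text already held in memory
        NodeList page = parser.parse(null);    // null filter: keep every node

        NodeFilter filter = new NodeClassFilter(InputTag.class);
        NodeList inputs = page.extractAllNodesThatMatch(filter, true); // also searches inside FORMs
        for (int i = 0; i < inputs.size(); i++) {
            InputTag tag = (InputTag) inputs.elementAt(i);
            String id = tag.getAttribute("id");
            if (id != null && valuesById.containsKey(id)) {
                tag.setAttribute("value", valuesById.get(id));
            }
        }
        return page.toHtml();                  // regenerate the whole page with the new values
    }
}
```

For continual updates, one design choice is to parse once and keep `page` and `inputs` around, then call `setAttribute` and `toHtml()` on each refresh instead of re-parsing the file every time.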
From: Ray J. <ra...@mc...> - 2009-09-30 18:19:13
|
I am working on a project and am having a bit of trouble figuring out how to do what I want with the HTML parser. Basically, what I want to do is parse an HTML file and find all the text boxes. I then want to read the ID attribute of each box, load a value into its value attribute, and reload the HTML file.
I am working on a program (Program A) that currently communicates with a microcontroller over RS-485 or Ethernet. The communication part is complete and working: I make requests to the micro for information and also send updates to it. Program A currently displays the information it receives from the micro in a tabular format; I created the GUI in Swing to display the information and allow the user to make changes to the system.
What I am currently trying to accomplish is this. We would like to display the information in a web page with a graphic representation of the HVAC system being monitored. The client will produce an HTML page with said graphic representation and text boxes that are to be updated with particular sensor, setpoint and relay values of the client's choice. The current values are to be obtained from the micro. The ID of the sensor, setpoint or relay to be shown in each text box will be stored in the input tag's ID attribute. The HTML page will be stored locally on the client's computer, and Program A will load it into a web browser when the client hits a button in Program A (which is connected to the micro).
Before loading the HTML into a web browser, I need Program A to parse the HTML and determine which value to load into which text box. Once Program A has determined the values to be loaded and where to put them, it continually sends requests to the micro for the most up-to-date values. Program A then updates each text box's VALUE attribute with the correct value. I do need to continually update the text boxes and display the current value of the sensor, etc.
I am having trouble figuring out how to parse the HTML page to find the <input> tags and get the IDs I need, as well as how to continually update the VALUE attribute of each <input> tag from Program A. I think that about covers it. If you have any questions, feel free to email me.
Thank you in advance,
Ray
Ray A. Jaramillo
Software Engineer
Micro Control Systems
5877 Enterprise Parkway
Fort Myers, Florida 33905
Phone: (239) 694-0089
Fax: (239) 694-0031
Web: www.mcscontrols.com |
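One way to do the parse-and-fill step with HTML Parser, as a minimal sketch: it assumes htmlparser.jar (1.6 API) is on the classpath, and the class name `InputFiller` and the id-to-value map are hypothetical. It parses the page once, finds every `<input>` tag (including ones nested inside forms or tables), reads its `id`, and sets `value` from the map. Because the filtered tags are the same objects as those in the parsed tree, calling `toHtml ()` on the tree returns the updated page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper: reads <input> ids and fills in their value attributes.
public final class InputFiller {

    // All <input> start tags in the parsed page, searching nested tags too.
    private static NodeList inputs(NodeList page) {
        return page.extractAllNodesThatMatch(new TagNameFilter("INPUT"), true);
    }

    /** Ids of all <input> tags that have one, in document order. */
    public static List<String> inputIds(String html) throws ParserException {
        NodeList found = inputs(Parser.createParser(html, null).parse(null));
        List<String> ids = new ArrayList<String>();
        for (int i = 0; i < found.size(); i++) {
            String id = ((Tag) found.elementAt(i)).getAttribute("id");
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Copy of the page with value set on each <input> whose id is a key in the map. */
    public static String fill(String html, Map<String, String> valuesById) throws ParserException {
        NodeList page = Parser.createParser(html, null).parse(null);
        NodeList found = inputs(page);
        for (int i = 0; i < found.size(); i++) {
            Tag input = (Tag) found.elementAt(i);
            String id = input.getAttribute("id");
            if (id != null && valuesById.containsKey(id)) {
                // Replaces an existing value attribute, or appends one if absent.
                input.setAttribute("value", valuesById.get(id));
            }
        }
        // The edited tags live inside 'page', so its HTML reflects the new values.
        return page.toHtml();
    }
}
```

On each polling cycle Program A could call `fill` on the original template with the latest readings and write the result out before the browser reloads it. This sketch has only been reasoned through for plain HTML `<input>` tags, not XHTML-style `<input ... />`.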
|
From: Derrick O. <der...@gm...> - 2009-09-27 06:02:51
|
With both jars (htmlparser.jar and htmllexer.jar) in WEB-INF/lib, it should work. |
|
From: Rahul T. <rah...@gm...> - 2009-09-27 03:24:27
|
Hi All,
My JSP looks like this (it starts with this):
<%@ page import="org.htmlparser.Parser" %>
<%@ page import="org.htmlparser.util.NodeList" %>
<%@ page import="org.htmlparser.util.ParserException" %>
<%@ page import="org.htmlparser.filters.TagNameFilter" %>
<%@ page import="org.htmlparser.filters.NodeClassFilter" %>
I copied htmlparser.jar into my WEB-INF/lib folder, but I get an exception:
cannot find symbol
symbol : class NodeList
location: package org.htmlparser.util
import org.htmlparser.util.NodeList;
cannot find symbol
symbol : class ParserException
location: package org.htmlparser.util
import org.htmlparser.util.ParserException;
So after that I copied htmllexer.jar into my WEB-INF/lib folder as well, only to get this error:
JspServlet: unable to dispatch to requested page: Exception:java.lang.NoClassDefFoundError: org/htmlparser/Parser
Does anyone have any clue as to where I might be messing up?
thanks
rahul |
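When a JSP compiles but then fails with `NoClassDefFoundError`, it can help to ask the running web application where each class is actually loaded from. A stdlib-only sketch; the class name `ClassOrigin` is hypothetical. Calling `ClassOrigin.describe ("org.htmlparser.Parser")` and `ClassOrigin.describe ("org.htmlparser.util.NodeList")` from a scriptlet (with the class placed in WEB-INF/classes) should show whether both resolve from the jars in WEB-INF/lib or from somewhere else on the server's classpath.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic: reports where a class is loaded from, or why it cannot be.
public final class ClassOrigin {

    private ClassOrigin() {
    }

    public static String describe(String className) {
        // In a servlet container the context class loader is normally the web app's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource source = c.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return className + " -> " + (location == null ? "(no code source: JDK/bootstrap class)" : location.toString());
        } catch (ClassNotFoundException e) {
            return className + " -> NOT FOUND";
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when a class it depends on is missing
            return className + " -> found but could not be loaded: " + e;
        }
    }
}
```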
|
From: Caryn M. <htm...@li...> - 2009-09-12 12:45:29
|
|
From: Neftali P. <pap...@ya...> - 2009-09-04 13:18:37
|
Yes, that's what I did: I was able to extract the source using Java's URLConnection class. However, I think HtmlParser still needs this content to begin with in order to parse it, so I traced where the contents are actually read, so that I can reuse that code and modify it a little to return just the content as a String.
________________________________
From: "htm...@li..." <htm...@li...>
To: htm...@li...
Sent: Monday, August 31, 2009 6:06:48 PM
Subject: Htmlparser-user Digest, Vol 35, Issue 7 |
|
From: semeera B <sem...@ya...> - 2009-08-31 10:06:48
|
What is HTML Parser? How do I create one? |
|
From: Lee G. <20...@le...> - 2009-08-31 08:03:01
|
Thanks - it was a version error, on my side, I think. |
|
From: Derrick O. <der...@gm...> - 2009-08-30 14:43:36
|
The tools.jar file comes with the JDK; it is in the JDK's lib directory. It's probably a version issue: the build is looking for an older version of the JDK tools than the one you have. You may be able to edit the build.xml and change the version. |
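On JDK 6 through 8, tools.jar sits in the JDK's own lib directory, and the `java.home` system property normally points at the `jre` directory inside the JDK, which is why Maven setups commonly reference it as `${java.home}/../lib/tools.jar`. A quick stdlib-only check (the class name `ToolsJarCheck` is hypothetical) that prints which Java is actually running and whether tools.jar is where that layout expects it; note that JDK 9 and later no longer ship a tools.jar at all.

```java
import java.io.File;

// Hypothetical check: where should this JVM's tools.jar be, and is it there?
public final class ToolsJarCheck {

    /** Expected location for a JDK 6-8 layout: ${java.home}/../lib/tools.jar */
    public static File expectedToolsJar() {
        File javaHome = new File(System.getProperty("java.home"));
        File jdkHome = javaHome.getParentFile(); // java.home is normally <jdk>/jre on JDK 6-8
        return new File(new File(jdkHome, "lib"), "tools.jar");
    }

    public static void main(String[] args) {
        File tools = expectedToolsJar();
        System.out.println("java.version: " + System.getProperty("java.version"));
        System.out.println("java.home:    " + System.getProperty("java.home"));
        System.out.println(tools + (tools.isFile() ? " exists" : " not found (JRE-only install, or JDK 9+)"));
    }
}
```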
|
From: Lee G. <20...@le...> - 2009-08-30 08:24:06
|
Sorry if this is a FAQ; I couldn't see it mentioned on the site. Since using HTML Parser, I've been getting the following from Maven: "Missing artifact com.sun:tools:jar:1.6.0:system". I've tried adding the extra build profile mentioned on the Maven FAQ page, to no avail. Could someone please help? |
|
From: tamizh v. <tam...@gm...> - 2009-08-29 13:57:05
|
I am going to do a project based on HtmlParser, so as a first step I need to learn HtmlParser. I have gone through the documentation, but it is difficult to keep track of the context while learning from the documentation alone. Could you please provide some HtmlParser tutorials so that I can learn it well? Thanks in advance. |
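As a possible first exercise, a minimal sketch that parses an HTML string and counts start tags by name, assuming htmlparser.jar (1.6 API) is on the classpath; `TagCounter` is a hypothetical name. `Parser.createParser` builds a parser over a string, and `visitAllNodesWith` walks every node in the page, calling the visitor's `visitTag` for each start tag (tag names come back in upper case).

```java
import java.util.Map;
import java.util.TreeMap;

import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

// Hypothetical first program: count how often each tag name occurs in a page.
public final class TagCounter extends NodeVisitor {

    private final Map<String, Integer> counts = new TreeMap<String, Integer>();

    @Override
    public void visitTag(Tag tag) {
        // Called once per start tag; end tags go to visitEndTag, which is not overridden here.
        String name = tag.getTagName();
        Integer n = counts.get(name);
        counts.put(name, n == null ? 1 : n + 1);
    }

    public static Map<String, Integer> count(String html) throws ParserException {
        TagCounter counter = new TagCounter();
        Parser.createParser(html, null).visitAllNodesWith(counter);
        return counter.counts;
    }

    public static void main(String[] args) throws ParserException {
        System.out.println(count("<html><body><p>One</p><p>Two <b>bold</b></p></body></html>"));
    }
}
```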
|
From: Derrick O. <der...@gm...> - 2009-08-28 08:48:05
|
You don't need a parser. Just get the text directly:
    URL url = new URL ("http://www.example.com/"); // the page you want
    URLConnection con = url.openConnection ();
    con.connect ();
    InputStream in = con.getInputStream ();
then do what you want with the contents. |
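A self-contained version of these steps that returns the whole page as a String, along the lines of the getHTMLSource function asked for in this thread. It uses only java.net and java.io; the class name `HtmlSource` is hypothetical, and falling back to ISO-8859-1 when the Content-Type header carries no charset is an assumption (a page that declares its encoding only in a `<meta>` tag would need extra handling).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: fetch a URL and return its contents as a String.
public final class HtmlSource {

    /** Reads the whole resource at the URL and decodes it to a String. */
    public static String getHTMLSource(URL url) throws IOException {
        URLConnection con = url.openConnection();
        con.connect();
        Charset charset = charsetOf(con.getContentType());
        InputStream in = con.getInputStream();
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } finally {
            in.close();
        }
    }

    /** Picks the charset from a Content-Type value such as "text/html; charset=UTF-8". */
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException unknownOrIllegalName) {
                        break; // fall back to the default below
                    }
                }
            }
        }
        // Assumption: ISO-8859-1, the HTTP/1.1 (RFC 2616) default for text types.
        return Charset.forName("ISO-8859-1");
    }
}
```

Because it works from any `URL`, the same method also reads local files through `file:` URLs.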
|
From: Neftali P. <pap...@ya...> - 2009-08-28 07:43:34
|
Hi Good Day,
I've been looking for a function in this library that returns the HTML text of a web page as a string. I know java.net.URLConnection can provide it, but I'd prefer a single function, say getHTMLSource, that returns the HTML text of a URL. Please let me know if that's possible here, with sample code :) Thanks in advance,
Kind Regards,
nef |
|
From: Derrick O. <der...@gm...> - 2009-08-24 15:42:55
|
You probably want the text that you can get from the StringBean (see http://htmlparser.sourceforge.net/javadoc/index.html). Or, if you really want the tags too, you can use toHtml(). |
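A sketch, assuming htmlparser.jar (1.6 API) on the classpath, of getting the text inside each `<strong>` element without any cast. In 1.6, STRONG does not appear to be one of the tags built as a composite (parent) node, so the nodes collected by the loop in Ashish's question would likely be bare start tags with no children; this hypothetical `StrongTextCollector` therefore uses a `NodeVisitor` to gather the text that appears between each `<strong>` and its matching `</strong>`.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.util.ParserException;
import org.htmlparser.visitors.NodeVisitor;

// Hypothetical helper: collects the text between each <strong> and its </strong>.
public final class StrongTextCollector extends NodeVisitor {

    private final List<String> texts = new ArrayList<String>();
    private StringBuilder current; // non-null while inside a <strong> element
    private int depth;             // copes with <strong> nested inside <strong>

    @Override
    public void visitTag(Tag tag) {
        if ("STRONG".equals(tag.getTagName()) && depth++ == 0) {
            current = new StringBuilder();
        }
    }

    @Override
    public void visitStringNode(Text string) {
        if (current != null) {
            current.append(string.getText()); // raw text: entities such as &amp; are not decoded
        }
    }

    @Override
    public void visitEndTag(Tag tag) {
        if ("STRONG".equals(tag.getTagName()) && depth > 0 && --depth == 0) {
            texts.add(current.toString());
            current = null;
        }
    }

    public static List<String> strongTexts(String html) throws ParserException {
        StrongTextCollector collector = new StrongTextCollector();
        Parser.createParser(html, null).visitAllNodesWith(collector);
        return collector.texts;
    }
}
```

For a live page, `new Parser (urlString).visitAllNodesWith (collector)` would be used in place of `createParser`.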
|
From: Agrawal A. <agr...@st...> - 2009-08-24 12:53:30
|
Dear Users,
I am quite new to this library. I want to use the getStringText() function from the CompositeParser class, but I don't know how to use it. I am doing the following:
Parser parser = new Parser (urlString);
NodeList list = new NodeList ();
NodeFilter filter = new TagNameFilter ("STRONG");
for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
(e.nextNode ()).collectInto (list, filter);
Can you help me find a way to typecast (or something similar) so that the getStringText() function works?
Thank you very much
Ashish
|
|
From: Neftali P. <pap...@ya...> - 2009-08-22 00:59:41
|
Good Day!
I just woke up; it's 8:30 in the morning. I'm very glad to have already received a reply from this organization with very helpful information. I will look at it later this morning, as I have a seminar to attend at the university.
Thank you very much! I really appreciate this help :)
I will check here from time to time if I get hung up on a problem regarding this topic.
Respectfully,
neftali
________________________________
From: "htm...@li..." <htm...@li...>
To: htm...@li...
Sent: Saturday, August 22, 2009 4:56:24 AM
Subject: Htmlparser-user Digest, Vol 35, Issue 4
Send Htmlparser-user mailing list submissions to
htm...@li...
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/htmlparser-user
or, via email, send a message with subject or body 'help' to
htm...@li...
You can reach the person managing the list at
htm...@li...
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Htmlparser-user digest..."
Today's Topics:
1. Need Suggestions to get Started in HTML parsing (tamizh vendan)
2. Re: Need Suggestions to get Started in HTML parsing
(Derrick Oswald)
3. Web Crawler Thesis Project Using HTML Parser To collect links
(Neftali Papelleras)
4. Web Crawler Thesis Project Using HTML Parser To collect links
(Neftali Papelleras)
5. Re: Web Crawler Thesis Project Using HTML Parser To collect
links (Derrick Oswald)
----------------------------------------------------------------------
Message: 1
Date: Wed, 19 Aug 2009 20:42:04 +0530
From: tamizh vendan <tam...@gm...>
Subject: [Htmlparser-user] Need Suggestions to get Started in HTML
parsing
To: htm...@li...
Message-ID:
<b98...@ma...>
Content-Type: text/plain; charset="iso-8859-1"
I am a newbie to HTML parsing. I know both Java and HTML well. I would like to
construct a DOM tree from the HTML code of a web page. It would be helpful
if someone could explain how to get started and provide some tutorial or
article links, and sample programs if possible. Thanks in advance.
-------------- next part --------------
An HTML attachment was scrubbed...
------------------------------
Message: 2
Date: Wed, 19 Aug 2009 19:18:39 +0200
From: Derrick Oswald <der...@gm...>
Subject: Re: [Htmlparser-user] Need Suggestions to get Started in HTML
parsing
To: htmlparser user list <htm...@li...>
Message-ID:
<16a...@ma...>
Content-Type: text/plain; charset="iso-8859-1"
Have a look at the mainline in Parser.java:
http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/Parser.java?revision=8&view=markup
That program prints it out, but the result of parser.parse (filter) is a
NodeList, which is your (nested) DOM tree.
Also have a look for other main methods in the code.
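To see that nesting, a sketch assuming htmlparser.jar (1.6 API): `parse (null)` returns the top-level nodes, and each node's `getChildren ()` returns its nested nodes (or null when it has none), so a short recursive method can print the tree with indentation. `TreePrinter` is a hypothetical name.

```java
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.Tag;
import org.htmlparser.Text;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

// Hypothetical helper: render the parsed node tree as an indented outline.
public final class TreePrinter {

    public static String outline(String html) throws ParserException {
        StringBuilder out = new StringBuilder();
        // A null filter makes parse() return every top-level node.
        print(Parser.createParser(html, null).parse(null), 0, out);
        return out.toString();
    }

    private static void print(NodeList nodes, int depth, StringBuilder out) {
        if (nodes == null) {
            return; // leaf node: no children
        }
        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.elementAt(i);
            for (int d = 0; d < depth; d++) {
                out.append("  ");
            }
            // Tags print their name, text nodes their trimmed text, anything else its class.
            if (node instanceof Tag) {
                out.append(((Tag) node).getTagName());
            } else if (node instanceof Text) {
                out.append('"').append(node.getText().trim()).append('"');
            } else {
                out.append(node.getClass().getSimpleName());
            }
            out.append('\n');
            print(node.getChildren(), depth + 1, out);
        }
    }
}
```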
-------------- next part --------------
An HTML attachment was scrubbed...
------------------------------
Message: 3
Date: Fri, 21 Aug 2009 10:40:19 -0700 (PDT)
From: Neftali Papelleras <pap...@ya...>
Subject: [Htmlparser-user] Web Crawler Thesis Project Using HTML
Parser To collect links
To: htm...@li...
Cc: pap...@ya...
Message-ID: <661...@we...>
Content-Type: text/plain; charset="utf-8"
Hi everyone.
I am Neftali Papelleras, an engineering student at the University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The title of my project is "A Web Extraction Tool to Monitor Websites", and it is implemented in Java. I am still in the first month of this one-year thesis project, in the information-gathering stage.
The first question I need to answer is how to create a Java-based web crawler; the next is how to retrieve the web contents of every web page; and lastly, how to retrieve the links from a given web source. The first thing that came to mind was to use Java regular expressions to retrieve the links from a web source, but now I understand that's not the right way to do it. That's why I came to HTML Parser, because I knew this was the right way.
I know Java, but not at an advanced level; I mostly know the concepts. Though I have already created several programs (the last was a chat system), I am still not confident in my Java skills. But I am very eager to learn, and I am starting again now.
I have already downloaded version 1.6 of HTML Parser and browsed through its folders and files. I attempted to create a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping this organization can provide a simple program that illustrates how to retrieve the links from a web page's source/HTML text. I can follow the program, and it will eventually lead me to an understanding of how to use this API.
Looking forward to a good response from this organization.
Respectfully,
neftali
------------------------------
Message: 4
Date: Fri, 21 Aug 2009 10:42:32 -0700 (PDT)
From: Neftali Papelleras <pap...@ya...>
Subject: [Htmlparser-user] Web Crawler Thesis Project Using HTML
Parser To collect links
To: htm...@li...
Cc: pap...@ya...
Message-ID: <269...@we...>
Content-Type: text/plain; charset="utf-8"
Hi everyone.
I am Neftali Papelleras, an Engineering student from the University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The project, titled "A Web Extraction Tool to Monitor Websites", is implemented in Java. I am still in the first month of this one-year thesis project, in the information-gathering stage.
The first question I need to answer is how to create a Java-based web crawler. The next is how to retrieve the web content of every page. And lastly, how to retrieve the links from a given page source. The first thing that came to mind was to use Java regular expressions to extract the links from the page source, but I now understand that is not the right way to do it. That's why I came to HTML Parser, because I know this is the right way.
I know Java, but not at an advanced level; I mainly know the concepts. Although I have already written several programs, the latest being a chat system, I am still not confident in my Java skills. But I am very eager to learn, and I am starting again now.
I have already downloaded version 1.6 of HTML Parser and browsed its folders and files. I attempted to write a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve the links from a web page's source/HTML text. I can follow the program, and it will eventually lead me to an understanding of this API.
I look forward to a good response from this organization.
Respectfully,
neftali
------------------------------
Message: 5
Date: Fri, 21 Aug 2009 22:56:14 +0200
From: Derrick Oswald <der...@gm...>
Subject: Re: [Htmlparser-user] Web Crawler Thesis Project Using HTML
Parser To collect links
To: htmlparser user list <htm...@li...>
Message-ID:
<16a...@ma...>
Content-Type: text/plain; charset="iso-8859-1"
Have a look at org.htmlparser.beans.HTMLLinkBean <http://htmlparser.sourceforge.net/javadoc/index.html>.
At the bottom of the source file <http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup> is
a commented-out main program to get you started.
On Fri, Aug 21, 2009 at 7:42 PM, Neftali Papelleras <pap...@ya...> wrote:
> Hi everyone.
>
> I am Neftali Papelleras, an Engineering student from the University of
> San Carlos, Cebu City, Philippines. I am currently working on my thesis
> project, which involves web crawling. The project, titled "A Web
> Extraction Tool to Monitor Websites", is implemented in Java. I am still
> in the first month of this one-year thesis project, in the
> information-gathering stage.
>
> The first question I need to answer is how to create a Java-based web
> crawler. The next is how to retrieve the web content of every page. And
> lastly, how to retrieve the links from a given page source. The first
> thing that came to mind was to use Java regular expressions to extract
> the links from the page source, but I now understand that is not the
> right way to do it. That's why I came to HTML Parser, because I know this
> is the right way.
>
> I know Java, but not at an advanced level; I mainly know the concepts.
> Although I have already written several programs, the latest being a chat
> system, I am still not confident in my Java skills. But I am very eager to
> learn, and I am starting again now.
>
> I have already downloaded version 1.6 of HTML Parser and browsed its
> folders and files. I attempted to write a very simple parser program using
> the HTML Parser API, but unfortunately I was confused about where and how
> to start. I am hoping that this organization can provide a simple program
> that illustrates how to retrieve the links from a web page's source/HTML
> text. I can follow the program, and it will eventually lead me to an
> understanding of this API.
>
> I look forward to a good response from this organization.
>
> Respectfully,
> neftali
------------------------------
End of Htmlparser-user Digest, Vol 35, Issue 4
********************************************** |
|
From: Derrick O. <der...@gm...> - 2009-08-21 20:56:24
|
Have a look at org.htmlparser.beans.HTMLLinkBean <http://htmlparser.sourceforge.net/javadoc/index.html>.
At the bottom of the source file <http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/beans/HTMLLinkBean.java?revision=4&view=markup> is
a commented-out main program to get you started.
On Fri, Aug 21, 2009 at 7:42 PM, Neftali Papelleras <pap...@ya...> wrote:
> Hi everyone.
>
> I am Neftali Papelleras, an Engineering student from the University of
> San Carlos, Cebu City, Philippines. I am currently working on my thesis
> project, which involves web crawling. The project, titled "A Web
> Extraction Tool to Monitor Websites", is implemented in Java. I am still
> in the first month of this one-year thesis project, in the
> information-gathering stage.
>
> The first question I need to answer is how to create a Java-based web
> crawler. The next is how to retrieve the web content of every page. And
> lastly, how to retrieve the links from a given page source. The first
> thing that came to mind was to use Java regular expressions to extract
> the links from the page source, but I now understand that is not the
> right way to do it. That's why I came to HTML Parser, because I know this
> is the right way.
>
> I know Java, but not at an advanced level; I mainly know the concepts.
> Although I have already written several programs, the latest being a chat
> system, I am still not confident in my Java skills. But I am very eager to
> learn, and I am starting again now.
>
> I have already downloaded version 1.6 of HTML Parser and browsed its
> folders and files. I attempted to write a very simple parser program using
> the HTML Parser API, but unfortunately I was confused about where and how
> to start. I am hoping that this organization can provide a simple program
> that illustrates how to retrieve the links from a web page's source/HTML
> text. I can follow the program, and it will eventually lead me to an
> understanding of this API.
>
> I look forward to a good response from this organization.
>
> Respectfully,
> neftali |
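Derrick's pointer (HTMLLinkBean) is the HTML Parser answer to "give me the links on a page". To show the same idea as a program that compiles with nothing but the JDK, the sketch below swaps in the JDK's own `javax.swing.text.html` parser (`ParserDelegator` plus an `HTMLEditorKit.ParserCallback`); it is not HTML Parser's API, and the class `LinkExtractor` and method `extractLinks` are hypothetical names. It collects the `href` of every `<a>` tag in document order and resolves relative links against the page URL with `java.net.URI.resolve`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; uses the JDK's built-in HTML parser, not HTML Parser.
class LinkExtractor {

    /** Returns the href of every <a> tag in document order, resolved against pageUrl. */
    static List<String> extractLinks(String html, String pageUrl) throws IOException {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return;
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return; // e.g. <a name="..."> anchors carry no link
                }
                try {
                    // Turn relative links such as "a.html" or "/b.html" into absolute URLs.
                    links.add(base.resolve(href.toString().trim()).toString());
                } catch (IllegalArgumentException e) {
                    // Malformed href (for example one containing spaces): skip it.
                }
            }
        };
        // true: ignore <meta> charset declarations, since the text is already a decoded String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<!-- <a href=\"old.html\"> -->\n"
                + "<a href=\"a.html\">A</a>\n"
                + "<A HREF='/b.html'>B</A>\n"
                + "<a href=c.html>C</a>\n"
                + "<a name=\"top\">anchor without href</a>";
        for (String link : extractLinks(html, "http://example.com/dir/index.html")) {
            System.out.println(link);
        }
    }
}
```

Because the markup is actually parsed, the link inside the HTML comment is ignored, while the uppercase `HREF`, the single-quoted value and the unquoted value are all picked up. Those are exactly the cases where a hand-written pattern such as `href="([^"]*)"` tends to go wrong, which is why a parser is the safer route for the thesis crawler.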
|
From: Neftali P. <pap...@ya...> - 2009-08-21 17:42:44
|
Hi everyone.
I am Neftali Papelleras, an Engineering student from the University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The project, titled "A Web Extraction Tool to Monitor Websites", is implemented in Java. I am still in the first month of this one-year thesis project, in the information-gathering stage.
The first question I need to answer is how to create a Java-based web crawler. The next is how to retrieve the web content of every page. And lastly, how to retrieve the links from a given page source. The first thing that came to mind was to use Java regular expressions to extract the links from the page source, but I now understand that is not the right way to do it. That's why I came to HTML Parser, because I know this is the right way.
I know Java, but not at an advanced level; I mainly know the concepts. Although I have already written several programs, the latest being a chat system, I am still not confident in my Java skills. But I am very eager to learn, and I am starting again now.
I have already downloaded version 1.6 of HTML Parser and browsed its folders and files. I attempted to write a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve the links from a web page's source/HTML text. I can follow the program, and it will eventually lead me to an understanding of this API.
I look forward to a good response from this organization.
Respectfully,
neftali |
|
From: Neftali P. <pap...@ya...> - 2009-08-21 17:40:41
|
Hi everyone.
I am Neftali Papelleras, an Engineering student from the University of San Carlos, Cebu City, Philippines. I am currently working on my thesis project, which involves web crawling. The project, titled "A Web Extraction Tool to Monitor Websites", is implemented in Java. I am still in the first month of this one-year thesis project, in the information-gathering stage.
The first question I need to answer is how to create a Java-based web crawler. The next is how to retrieve the web content of every page. And lastly, how to retrieve the links from a given page source. The first thing that came to mind was to use Java regular expressions to extract the links from the page source, but I now understand that is not the right way to do it. That's why I came to HTML Parser, because I know this is the right way.
I know Java, but not at an advanced level; I mainly know the concepts. Although I have already written several programs, the latest being a chat system, I am still not confident in my Java skills. But I am very eager to learn, and I am starting again now.
I have already downloaded version 1.6 of HTML Parser and browsed its folders and files. I attempted to write a very simple parser program using the HTML Parser API, but unfortunately I was confused about where and how to start. I am hoping that this organization can provide a simple program that illustrates how to retrieve the links from a web page's source/HTML text. I can follow the program, and it will eventually lead me to an understanding of this API.
I look forward to a good response from this organization.
Respectfully,
neftali |
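Neftali's three questions (how to crawl, how to fetch each page, how to pull out its links) fit together as a breadth-first loop. The sketch below shows only that loop, under stated assumptions: `CrawlerSketch` is a hypothetical name, and `fetchLinks` is a hypothetical hook that would download a page (for example with `java.net.HttpURLConnection`) and return its absolute link URLs, extracted with HTML Parser or another real parser, or `null` if the page could not be fetched. robots.txt handling, politeness delays and URL normalisation are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical breadth-first crawler loop; fetching and parsing are delegated to fetchLinks.
class CrawlerSketch {

    /**
     * Visits pages breadth-first starting at seed.
     *
     * @param fetchLinks hypothetical hook: downloads a page and returns the absolute URLs
     *                   of its links, or null if the page could not be fetched
     * @return the URLs actually visited, in visiting order (at most maxPages of them)
     */
    static List<String> crawl(String seed, int maxPages, Function<String, List<String>> fetchLinks) {
        Deque<String> frontier = new ArrayDeque<>(); // FIFO queue of URLs still to visit
        Set<String> seen = new HashSet<>();          // every URL ever queued, so none is queued twice
        List<String> visited = new ArrayList<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            List<String> links = fetchLinks.apply(url);
            if (links == null) {
                continue; // unreachable page: skip it and move on
            }
            visited.add(url);
            for (String link : links) {
                if (seen.add(link)) { // add() returns false if the URL was already seen
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" standing in for real pages.
        Map<String, List<String>> web = Map.of(
                "http://example.com/", List.of("http://example.com/a", "http://example.com/b"),
                "http://example.com/a", List.of("http://example.com/", "http://example.com/c"),
                "http://example.com/b", List.of(),
                "http://example.com/c", List.of("http://example.com/a"));
        // prints [http://example.com/, http://example.com/a, http://example.com/b, http://example.com/c]
        System.out.println(crawl("http://example.com/", 10, web::get));
    }
}
```

The FIFO queue gives breadth-first order, so pages close to the seed are visited first, and the `seen` set stops pages that link to each other from being queued forever. For monitoring one particular site, `fetchLinks` would typically also drop links that point to other hosts.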
|
From: Derrick O. <der...@gm...> - 2009-08-19 17:18:53
|
Have a look at the mainline in Parser.java:
http://htmlparser.svn.sourceforge.net/viewvc/htmlparser/trunk/parser/src/main/java/org/htmlparser/Parser.java?revision=8&view=markup
That program prints it out, but the result of parser.parse (filter) is a
NodeList, which is your (nested) DOM tree. Also look for the other main
methods in the code.
On Wed, Aug 19, 2009 at 5:12 PM, tamizh vendan <tam...@gm...> wrote:
>
> I am a newbie to HTML parsing. I know both Java and HTML well. I would like
> to construct a DOM tree from the HTML code of a web page. It would be
> helpful if someone could explain how to get started and point me to some
> tutorials or articles. Please provide sample programs if possible. Thanks in
> advance. |
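Derrick's answer is that HTML Parser's `parse (filter)` already hands back the nested NodeList. To make concrete what such a nested tree looks like, here is a minimal sketch that builds one by hand from parser events. It is self-contained, so it uses the JDK's `javax.swing.text.html` parser rather than HTML Parser, and `TreeNode` and `TreeBuilder` are hypothetical names. Start tags open a child node, end tags close back to the matching open node, and text and empty tags (such as `<br>`) become leaves.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical minimal tree node: an element name, or "#text" with its text.
class TreeNode {
    final String name;
    final String text; // only non-null for "#text" nodes
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Builds a TreeNode tree from parser callbacks (JDK parser, not HTML Parser).
class TreeBuilder {

    static TreeNode build(String html) throws IOException {
        TreeNode root = new TreeNode("#document", null);
        Deque<TreeNode> open = new ArrayDeque<>(); // stack of currently open elements
        open.push(root);
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                TreeNode node = new TreeNode(tag.toString(), null);
                open.peek().children.add(node); // child of the innermost open element
                open.push(node);                // and now the innermost open element itself
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                String name = tag.toString();
                // Close back to the matching element; ignore end tags for elements that are not open.
                if (open.stream().anyMatch(n -> n.name.equals(name))) {
                    TreeNode popped;
                    do {
                        popped = open.pop();
                    } while (!popped.name.equals(name));
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                open.peek().children.add(new TreeNode(tag.toString(), null)); // empty tags, e.g. <br>
            }

            @Override
            public void handleText(char[] data, int pos) {
                open.peek().children.add(new TreeNode("#text", new String(data)));
            }
        };
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return root;
    }

    // Depth-first search for the first node with the given name, or null.
    static TreeNode find(TreeNode node, String name) {
        if (node.name.equals(name)) {
            return node;
        }
        for (TreeNode child : node.children) {
            TreeNode hit = find(child, name);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Prints one node per line, indented by depth; text nodes are shown in quotes.
    static void print(TreeNode node, String indent) {
        System.out.println(indent + (node.text != null ? '"' + node.text + '"' : node.name));
        for (TreeNode child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) throws IOException {
        print(build("<ul><li>one<li>two</ul>"), "");
    }
}
```

Running `main` should print a `#document` node wrapping the `html`/`body` elements the parser infers for a bare fragment, with a `ul` holding two `li` children: the second `<li>` closes the first even though the markup never says `</li>`. Per Derrick's reply, HTML Parser gives you an equivalent nested NodeList directly, so this code is only meant to illustrate the structure.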