Re: [Htmlparser-user] Htmlparser-user Digest, Vol 37, Issue 3
From: Graham B. <gra...@wh...> - 2009-10-08 23:47:34
Incidentally, I was passing the HTML to the parser as a string because I wanted to know the size of the page in bytes. I couldn't see how to get this from htmlparser when passing in an InputStream, and I can't rely on the Content-Type header, as it is sometimes missing.

How do I get the compressed and/or uncompressed size of the stream? Is this possible, or is it a feature that could be added?

Regards,
Graham

htm...@li... wrote:
> Message: 2
> Date: Sun, 04 Oct 2009 04:04:26 +0100
> From: Graham Bentley <gra...@wh...>
> Subject: [Htmlparser-user] No line numbers if using a string source
>     for parser
>
> Hi there,
>
> I'm using the parser to extract links and am passing the HTML in as a
> string source, i.e.:
>     Parser oParser = new Parser(new Lexer(sHTML));
> However, I've noticed that I can get the starting position of the
> extracted nodes:
>     LinkTag.getStartPosition
> but cannot get the starting line number:
>     LinkTag.getStartingLineNumber
> is always 0.
>
> It works fine and gives the line number if I pass in an input stream
> from HttpURLConnection or the URLConnection itself, so I'm a bit
> confused. Is this a bug, or is it not possible to get the line numbers
> when using the string source? I could really do with the line numbers
> if there is a correct way of doing this, thanks.
>
> All working really well apart from that; thanks for the library.
>
> Regards,
> Graham
>
> ------------------------------
>
> Message: 3
> Date: Sun, 4 Oct 2009 07:12:21 +0200
> From: Derrick Oswald <der...@gm...>
> Subject: Re: [Htmlparser-user] No line numbers if using a string
>     source for parser
>
> Does your input text string contain the newline (0x0A) or carriage
> return-newline (0x0D, 0x0A) end-of-line characters?
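On the size question at the top of this message, one possible approach is to count bytes as they pass through the stream rather than asking the parser for them. The sketch below is plain JDK code and uses no htmlparser API: StreamSizes, CountingInputStream, measure and gzip are hypothetical names invented for illustration, and it assumes the response body is either gzip-compressed or not compressed at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; nothing here is part of htmlparser.
final class StreamSizes {

    /** Counts every byte read through it (bytes passed over with skip() are not counted). */
    static final class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /** Returns { bytes read from the raw stream, bytes after decompression }. */
    static long[] measure(InputStream raw, boolean gzipped) {
        try {
            CountingInputStream wire = new CountingInputStream(raw);
            InputStream body = gzipped ? new GZIPInputStream(wire) : wire;
            CountingInputStream plain = new CountingInputStream(body);
            byte[] buf = new byte[8192];
            // Drain the stream; in real use the parser would read from 'plain' instead.
            while (plain.read(buf) != -1) {
                // nothing to do, the wrappers do the counting
            }
            return new long[] { wire.getCount(), plain.getCount() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Gzips a byte array so the demo below has compressed input to measure. */
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(data);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html><body><a href=\"a.html\">a</a></body></html>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] compressed = gzip(html);
        long[] sizes = measure(new ByteArrayInputStream(compressed), true);
        System.out.println("compressed=" + sizes[0] + " uncompressed=" + sizes[1]);
    }
}
```

The two counters sit on either side of the decompression step, so both sizes are available once the stream has been read to the end. The counts are only complete after the consumer has reached end of stream.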
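For the line-number question quoted above, a possible workaround while using a string source is to derive the line from the start position yourself. This is only a sketch: it assumes the value reported by LinkTag.getStartPosition is a character offset into the same string that was handed to the Lexer, which this thread does not confirm. LineNumbers and lineOf are hypothetical names, and the 0-based result is a choice made here, not necessarily the library's convention.

```java
// Hypothetical helper; uses only the JDK, no htmlparser calls.
final class LineNumbers {

    /**
     * Returns the 0-based line index of the character at 'offset' in 'text'.
     * Only '\n' is counted, which covers both LF and CR-LF line endings.
     */
    static int lineOf(String text, int offset) {
        if (offset < 0 || offset > text.length()) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        int line = 0;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String html = "<html>\r\n<body>\n<a href=\"a.html\">a</a>\n</body></html>";
        int start = html.indexOf("<a ");   // stands in for the tag's start position
        System.out.println("line " + lineOf(html, start));
    }
}
```

If the string contains no newline characters at all, every offset maps to line 0, which is why the question of whether the input contains end-of-line characters matters.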