Re: [Htmlparser-user] Htmlparser-user Digest, Vol 37, Issue 3
From: Derrick O. <der...@gm...> - 2009-10-09 04:48:06
The Page class maintains this type of information. When the source is exhausted it could record the cursor position, but I don't think it does now. The PageIndex has the position of each end-of-line except possibly the last. I think that adding one for the end-of-file wouldn't hurt. Then the size (in characters) would be that last cursor position. The size in bytes depends on the encoding, as mentioned here:
http://htmlparser.sourceforge.net/faq.html#byte

On Fri, Oct 9, 2009 at 1:47 AM, Graham Bentley <gra...@wh...> wrote:

> Incidentally, I was passing the HTML to the parser as a string because I
> wanted to know the size of the page in bytes, and I couldn't see how to
> get this from htmlparser when passing in an InputStream. I can't rely on
> the Content-Type header, as it is sometimes missing.
> How do I get the compressed and/or uncompressed size of the stream? Is
> this possible, or is it a feature that could be added?
> regards,
> Graham
>
> htm...@li... wrote:
> > Message: 2
> > Date: Sun, 04 Oct 2009 04:04:26 +0100
> > From: Graham Bentley <gra...@wh...>
> > Subject: [Htmlparser-user] No line numbers if using a string source
> >     for parser
> > To: htm...@li...
> > Message-ID: <4AC...@wh...>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Hi there,
> >
> > I'm using the parser to extract links and am passing the HTML in as a
> > string source, i.e.:
> > Parser oParser = new Parser(new Lexer(sHTML));
> > However, I've noticed that I can get the starting position of the
> > extracted nodes:
> > LinkTag.getStartPosition
> > but cannot get the starting line number:
> > LinkTag.getStartingLineNumber
> > is always 0
> >
> > It works fine and gives the line number if I pass in an input stream
> > from HttpURLConnection or the URLConnection itself.
> > So I'm a bit confused: is this a bug, or is it not possible to get the
> > line numbers when using the string source?
> > I could really do with the line numbers if there is a correct way of
> > doing this, thanks.
> >
> > All working really well apart from that, thanks for the library.
> >
> > regards,
> > Graham
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 4 Oct 2009 07:12:21 +0200
> > From: Derrick Oswald <der...@gm...>
> > Subject: Re: [Htmlparser-user] No line numbers if using a string
> >     source for parser
> > To: htmlparser user list <htm...@li...>
> > Message-ID: <16a...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Does your input text string contain the newline (0x0A) or carriage
> > return-newline (0x0D,0x0A) end of line characters?
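
The idea in the reply (an index of end-of-line positions with one extra entry for end-of-file, and a byte size that depends on the encoding) can be sketched in plain Java. This is only an illustration under stated assumptions: LineIndexSketch, lineOf, sizeInChars and sizeInBytes are hypothetical names invented here and are not part of HTMLParser's Page or PageIndex classes, and line numbers in this sketch are zero-based.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the idea discussed above; NOT HTMLParser's PageIndex.
final class LineIndexSketch {
    // Cursor positions just past each '\n', ascending, ending with the end-of-file position.
    private final List<Integer> ends = new ArrayList<>();

    LineIndexSketch(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                ends.add(i + 1);
            }
        }
        // The extra entry suggested in the reply: record where the source was exhausted,
        // unless the text already ends with a newline (same position).
        if (ends.isEmpty() || ends.get(ends.size() - 1) != text.length()) {
            ends.add(text.length());
        }
    }

    // Size in characters is the last recorded cursor position.
    int sizeInChars() {
        return ends.get(ends.size() - 1);
    }

    // Zero-based line containing the given character position.
    int lineOf(int position) {
        if (position < 0 || position >= sizeInChars()) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        int i = Collections.binarySearch(ends, position);
        // Found: position is the first character of the next line.
        // Not found: the insertion point equals the number of line ends before it.
        return i >= 0 ? i + 1 : -i - 1;
    }

    // Size in bytes depends on the encoding used to serialize the characters.
    static int sizeInBytes(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p>\n<a href=\"x\">link</a>";
        LineIndexSketch index = new LineIndexSketch(html);
        System.out.println("chars: " + index.sizeInChars());
        System.out.println("line of <a>: " + index.lineOf(html.indexOf("<a")));
        System.out.println("UTF-8 bytes: " + sizeInBytes(html, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + sizeInBytes(html, StandardCharsets.ISO_8859_1));
    }
}
```

Because the character count and the byte count diverge as soon as the text contains a non-ASCII character, the byte size of a page can only be stated relative to a specific encoding, which is the point the FAQ link makes.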