htmlparser-user Mailing List for HTML Parser (Page 15)

The Page class maintains this type of information.
When the source is exhausted it could record the cursor position. I don't
think it does now.
The PageIndex has the position of each end-of-line except possibly the
last. I think that adding one for the end-of-file wouldn't hurt.
Then the size (in characters) would be that last cursor position.
The size in bytes depends on the encoding, as mentioned here:
http://htmlparser.sourceforge.net/faq.html#byte
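
To make the character/byte distinction concrete, here is a minimal sketch in plain Java, assuming the page text is already held in a String. It does not use the library's Page or PageIndex classes; TextStats and its methods are hypothetical names made up for this illustration. It records the offset of each end-of-line, reports the size in characters, and reports the size in bytes for whichever encoding you pass in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of htmlparser: indexes end-of-line
// offsets in a string and reports its size in characters and bytes.
public class TextStats {
    private final String text;
    private final List<Integer> eolOffsets = new ArrayList<>();

    public TextStats(String text) {
        this.text = text;
        for (int i = 0; i < text.length(); i++) {
            // Counting '\n' covers both LF and CR-LF line endings.
            if (text.charAt(i) == '\n') {
                eolOffsets.add(i);
            }
        }
    }

    // Size in characters (UTF-16 code units, which is how Java counts).
    public int sizeInChars() {
        return text.length();
    }

    // Size in bytes depends on the encoding used to serialize the text.
    public int sizeInBytes(Charset charset) {
        return text.getBytes(charset).length;
    }

    // Zero-based line number of the line containing the given offset.
    // A newline character itself counts as part of the line it ends.
    public int lineOf(int offset) {
        int line = 0;
        for (int eol : eolOffsets) {
            if (eol < offset) {
                line++;
            } else {
                break;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        TextStats stats = new TextStats("<p>caf\u00e9</p>\r\n<a href=\"x\">x</a>\n");
        System.out.println("chars:       " + stats.sizeInChars());
        System.out.println("UTF-8 bytes: " + stats.sizeInBytes(StandardCharsets.UTF_8));
        System.out.println("Latin-1:     " + stats.sizeInBytes(StandardCharsets.ISO_8859_1));
        System.out.println("line of <a>: " + stats.lineOf(13));
    }
}
```

The byte count only matches the character count for single-byte encodings; with UTF-8 any non-ASCII character adds extra bytes, which is why the size in bytes cannot be read off the cursor position alone.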

On Fri, Oct 9, 2009 at 1:47 AM, Graham Bentley <
gra...@wh...> wrote:

> Incidentally, I was passing the HTML to the parser as a string because I
> wanted to know the size of the page in bytes, and I couldn't see
> how to get this from the htmlparser when passing in an InputStream. I
> can't rely on the contenttype header as it is sometimes missing.
> How do I get the compressed and/or uncompressed size of the stream? Is
> this possible, or is it a feature that could be added?
> regards,
> Graham
>
> htm...@li... wrote:
> > Send Htmlparser-user mailing list submissions to
> >       htm...@li...
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >       https://lists.sourceforge.net/lists/listinfo/htmlparser-user
> > or, via email, send a message with subject or body 'help' to
> >       htm...@li...
> >
> > You can reach the person managing the list at
> >       htm...@li...
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Htmlparser-user digest..."
> >
> >
> > Today's Topics:
> >
> >    2. No line numbers if using a string source for parser
> >       (Graham Bentley)
> >    3. Re: No line numbers if using a string source for parser
> >       (Derrick Oswald)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 2
> > Date: Sun, 04 Oct 2009 04:04:26 +0100
> > From: Graham Bentley <gra...@wh...>
> > Subject: [Htmlparser-user] No line numbers if using a string source
> >       for parser
> > To: htm...@li...
> > Message-ID: <4AC...@wh...>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Hi there,
> >
> > I'm using the parser to extract links and am passing the HTML in as a
> > string source, i.e.:
> > Parser oParser = new Parser(new Lexer(sHTML));
> > However, I've noticed that I can get the starting position of the
> > extracted nodes:
> > LinkTag.getStartPosition
> > but cannot get the starting line number:
> > LinkTag.getStartingLineNumber
> > is always 0.
> >
> > It works fine and gives the line number if I pass in an input stream
> > from HttpURLConnection or the URLConnection itself.
> > So I'm a bit confused: is this a bug, or is it not possible to get the line
> > numbers when using the string source?
> > I could really do with the line numbers if there is a correct way of doing
> > this, thanks.
> >
> > All working really well apart from that, thanks for the library.
> >
> > regards,
> > Graham
> >
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 4 Oct 2009 07:12:21 +0200
> > From: Derrick Oswald <der...@gm...>
> > Subject: Re: [Htmlparser-user] No line numbers if using a string
> >       source for parser
> > To: htmlparser user list <htm...@li...>
> > Message-ID:
> >       <16a...@ma...>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Does your input text string contain the newline (0x0A) or carriage
> > return-newline (0x0D,0x0A) end of line characters?
> >
> > ------------------------------
> >
> > End of Htmlparser-user Digest, Vol 37, Issue 3
> > **********************************************
> >


htmlparser-user — The user mailing list for users of the htmlparser library