htmlparser-user Mailing List for HTML Parser (Page 80)

Brought to you by: derrickoswald

htmlparser-user — The user mailing list for users of the htmlparser library


Re: [Htmlparser-user] Integration Release 1.3-20030330 is out

From: Rich W. <ri...@wi...> - 2003-04-02 22:05:56

What is needed is cookie jar functionality: something that will send and
accept cookies when making requests. There is no other way around it. Many
sites use cookies to deter spidering.

rw
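
A minimal sketch of the cookie jar idea, assuming a Java runtime with java.net.CookieManager (added in Java 6, so not available when this thread was written). The class name CookieJarSketch, its two methods, the example.com URL, and the cookie name and value are all made up for illustration; none of this is part of HTMLParser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class CookieJarSketch {
    // In-memory cookie store that accepts every cookie it is offered.
    private final CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

    /** Store the Set-Cookie header values that came back from a request to url. */
    public void remember(String url, String... setCookieValues) {
        Map<String, List<String>> headers =
                Collections.singletonMap("Set-Cookie", Arrays.asList(setCookieValues));
        try {
            jar.put(URI.create(url), headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Return the value for the Cookie request header for url, or "" if none apply. */
    public String cookieHeaderFor(String url) {
        try {
            Map<String, List<String>> out =
                    jar.get(URI.create(url), Collections.<String, List<String>>emptyMap());
            List<String> cookies = out.get("Cookie");
            return cookies == null ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieJarSketch sketch = new CookieJarSketch();
        // Pretend the first response carried this header (name and value are made up).
        sketch.remember("http://www.example.com/product.asp", "session=abc123; path=/");
        // On the next request to the same site, send the stored cookie back.
        System.out.println(sketch.cookieHeaderFor("http://www.example.com/product.asp?id=1"));
    }
}
```

On such a runtime, calling CookieHandler.setDefault(new CookieManager()) once at startup should make HttpURLConnection store and resend cookies on its own, so a hand-rolled helper like the one above is only needed when the headers are handled manually.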



Re: [Htmlparser-user] Integration Release 1.3-20030330 is out

From: Navid H.L. <na...@ya...> - 2003-04-02 20:45:44

Hi Somik and everybody else,
Things are moving fast here, and getting interesting. It
is a great job. I hope that once my program is completed, I
can share it with others.

I ran into a new problem yesterday. It may not be
closely related to HTMLParser, but I would appreciate it
if anyone could give me a hint.

My program uses HTMLParser classes to access sites and
extract all URLs, and then, in another run, it uses those
URLs to extract data from the pages they point to.

There is a site that uses Microsoft Commerce Server
2000 and attaches the cookie to the URL if the request is
not from a browser, something like this:

http://www.shoemall.com/product.asp?family%5Fid=2543&type=0&cat%5Fid=0&MSCSProfile=61E4CECF7275066FD87B9817DA5865CBE5EA506A04C53D8558451EC3D02BB577327CA398F52348946BD1631D503EA92FF120A8E45A336FAD8E7E4E31B1356470B79DDD041A4F98A5B403FC86D8A52985761A9F6CEA80

When I try to access the same page with the same URL,
I get a different page every time!

Can anybody tell me why this is so, and how I can
change my Java program to avoid it or receive the
correct page?

I am also using

connectionnew.setRequestProperty("User-Agent", "Mozilla/3.0(Windows NT 4.0; U) Opera 6.0 [en]");

but this still does not help!

Thank you
Navid


[Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #228 - 1 msg

From: ope t. <op...@ho...> - 2003-03-31 21:08:04

Thanks a lot, it worked!
Sincerely,
Ope

[Htmlparser-user] Integration Release 1.3-20030330 is out

From: Somik R. <so...@ya...> - 2003-03-31 04:43:54

Hi Folks,
    This week's integration release is packed with goodies!

From the change log:
Integration Build 1.3 - 20030330
--------------------------------
[1] fixed bug (an enhancement really) 694477 quotes in content-type header
[2] fixed bugs #699886 and #707447 by using a buffered stream reader with infinite mark
[3] fixed bug in CompositeTagScanner, filter not being set correctly
[4] fixed thread safety issue in TagParser (bug 711073)
[5] fixed out of memory error when parsing custom composite tags (bug 709152)
[6] fixed bugs 701159 and 696455 by redesigning the script scanner. JavaScript parsing is now much more robust.

As you can see, a lot of bug fixes have gone in. Three of them are major.
The first, by Derrick Oswald (#2), addresses the charset issue: the parser
should now be able to handle different charsets dynamically. We hope you can
test this and give us feedback.

The second is a redesign of the way JavaScript is handled by the parser (#6).
It had been riddled with problems for some time, so we've changed its
internals. The new implementation is much more robust, and hopefully we
can get some feedback on that too.

The third addresses some thread safety issues (#4; thanks to Joe Robbins for
reporting this). These have been fixed in this release, and the parser should
be totally thread-safe now.

Regards,
Somik

Re: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs

From: Somik R. <so...@ya...> - 2003-03-30 06:16:42

FYI, I've just found that the CompositeTagScanner had a bug, due to which
the filters were not being set. Ope, the call

node.collectInto(nodeList, LinkTag.LINK_TAG_FILTER);

will work in the next integration release.

Regards,
Somik

RE: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs

From: Somik R. <so...@ya...> - 2003-03-27 22:38:36

Instead of this,
> node.collectInto(nodeList,LinkTag.LINK_TAG_FILTER);
use:

node.collectInto(nodeList,LinkTag.class);

Regards,
Somik

RE: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs

From: Marc N. <ma...@ke...> - 2003-03-27 22:19:58

Try removing the following line from your code:

nodeList.add(node);

It's most likely adding non-LinkTag nodes into nodeList, which causes the
ClassCastException later on.

Marc
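
A rough sketch of why that cast fails, and of the filter-by-type idea. Node, DoctypeTag, and LinkTag below are hypothetical stand-ins declared inside the example so that it compiles on its own; they are not the real htmlparser classes, and collectLinks is not a library method.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectLinksSketch {
    // Hypothetical stand-ins for the library's node types; NOT the real htmlparser classes.
    static class Node {}
    static class DoctypeTag extends Node {}
    static class LinkTag extends Node {
        final String link;
        LinkTag(String link) { this.link = link; }
    }

    /** Collect only LinkTag instances, so a later cast to LinkTag cannot fail. */
    static List<LinkTag> collectLinks(List<Node> parsed) {
        List<LinkTag> links = new ArrayList<>();
        for (Node node : parsed) {
            if (node instanceof LinkTag) {
                links.add((LinkTag) node);
            }
        }
        return links;
    }

    /** Mirrors the failing pattern: every node is kept, then each one is cast to LinkTag. */
    static boolean castEveryNodeFails(List<Node> parsed) {
        try {
            for (Node node : parsed) {
                LinkTag ignored = (LinkTag) node; // throws when node is a DoctypeTag
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Node> parsed = new ArrayList<>();
        parsed.add(new DoctypeTag());
        parsed.add(new LinkTag("http://www.yahoo.com"));
        System.out.println("links kept: " + collectLinks(parsed).size());
        System.out.println("blind cast fails: " + castEveryNodeFails(parsed));
    }
}
```

The same reasoning applies to the quoted code: once nodeList.add(node) puts a DoctypeTag into the list, the later (LinkTag) cast has to fail, whatever the filter does.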

-----Original Message-----
From: ope tomori [mailto:op...@ho...]
Sent: Thursday, March 27, 2003 1:31 PM
To: htm...@li...
Subject: [Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2
msgs


I figured out the part using the nodeList.collectInto. My debug output shows
the right output, but when I try to process the link information, I get this
error (this is part of the error):

Exception occurred during event dispatching:
java.lang.ClassCastException: org.htmlparser.tags.DoctypeTag


Thanks in advance for your help

Sincerely,
Ope T.


This is my code below:
try{
    //create the parser with the url to be parsed
    parser = new Parser(urlAddressComplete,new DefaultParserFeedback());
    parser.registerScanners();
    nodeList = new NodeList();

    //to extract all the embedded links and images

    for (NodeIterator e = parser.elements();e.hasMoreNodes();) {
        Node node = (Node)e.nextNode();
        nodeList.add(node);
        //node.collectInto(nodeList,ImageTag.IMAGE_TAG_FILTER);
        node.collectInto(nodeList,LinkTag.LINK_TAG_FILTER);

    }//for

    System.out.print("CHECKING NODES.. " + nodeList.toString()+ "\n");

    //now process the links and images
    //this is the part that doesnt seem to work

    for (SimpleNodeIterator e = nodeList.elements();e.hasMoreNodes();) {
        LinkTag linkTag = (LinkTag)e.nextNode();

        //put the links and their texts into vectors
        allTextLinkVector.addElement(linkTag.getLinkText());
        allLinkVector.addElement(linkTag.getLink());
    }
    // System.out.print( "All Links " + "Size: "+ allTextLinkVector.size() + " "+ allTextLinkVector.toString()+ "\n");

}//inner try

catch (ParserException e) {
    System.err.println("Error, could not create parser object");
    e.printStackTrace();
}//catch
}// outer try
catch(IOException ex) { ex.printStackTrace(); }






>From: htm...@li...
>Reply-To: htm...@li...
>To: htm...@li...
>Subject: Htmlparser-user digest, Vol 1 #226 - 2 msgs
>Date: Thu, 27 Mar 2003 12:49:39 -0800
>
>Today's Topics:
>
>1. Help with method --> node.collectInto() (ope tomori)
>2. RE: Help with method --> node.collectInto() (Marc Novakowski)
>
>--__--__--
>
>Message: 1
>From: "ope tomori"
>To: htm...@li...
>Date: Thu, 27 Mar 2003 15:00:17 +0000
>Subject: [Htmlparser-user] Help with method --> node.collectInto()
>Reply-To: htm...@li...
>
>Hi, I'm trying to use the method node.collectInto(...) to extract embedded
>links and images on webpages. I'm using the latest integration release, which
>means it's now Parser, not HTMLParser, nodeIterator, etc. and all the other
>changes.
>
>I followed the sample code:
>
>HTMLParser parser = new HTMLParser("http://www.yahoo.com");
>parser.registerScanners();
>int i = 0;
>Vector collectionVector = new Vector();
>HTMLNode node;
>for (HTMLEnumeration e = parser.elements();e.hasMoreNodes();) {
>    node = e.nextHTMLNode();
>    node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
>}
>// All items in the collection vector should be links
>for (Enumeration e = collectionVector.elements();e.hasMoreElements();) {
>    HTMLLinkTag linkTag = (HTMLLinkTag)e.nextElement();
>    // you can now process the links as you like
>}
>
>I'm getting an error because this line:
>
>node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
>
>requires a NodeList and not a Vector. I've tried changing it, creating a
>NodeList instead of a Vector, without any success.
>
>Can you please help me!!
>
>Thanks
>Ope
>
>--__--__--
>
>Message: 2
>Subject: RE: [Htmlparser-user] Help with method --> node.collectInto()
>Date: Thu, 27 Mar 2003 08:30:54 -0800
>From: "Marc Novakowski"
>To:
>Reply-To: htm...@li...
>
>If you can paste the actual code you're trying to compile, I'd be more
>than happy to take a look at it.
>
>Marc
>
>-----Original Message----- From: ope tomori [mailto:op...@ho...] =

>Sent: Thursday, March 27, 2003 7:00 AM To:=20
>htm...@li... Subject: [Htmlparser-user] Help =
with=20
>method --> node.collectInto()
>
>
>
>Hi Im trying to use the method node.collectInto(...) to extract =
embedded =3D
>
>links and images on webpages. Im using the latest integration release =
which=20
>means its now Parser, not=3D20 HTMLParser, nodeIterator, etc and all =
the=20
>other changes.
>
>
>
>I followed the sample code:
>
>HTMLParser parser =3D3D new HTMLParser("http://www.yahoo.com");=20
>parser.registerScanners(); int i =3D3D 0; Vector collectionVector =3D3D =
new=20
>Vector(); HTMLNode node; for (HTMLEnumeration e =3D3D=20
>parser.elements();e.hasMoreNodes();) { node =3D3D e.nextHTMLNode();=20
>node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER); } // =
All=20
>items in the collection vector should be links for (Enumeration e =3D3D =
=3D=20
>collectionVector.elements();e.hasMoreElements();) { HTMLLinkTag linkTag =
=3D3D=20
>(HTMLLinkTag)e.nextElement(); // you can now process the links as you =
like=20
>} ***********************************************************
>
>
>Im getting an error because this line:
>
>node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER); =
requires a=20
>nodeList and not a vector, ive tried changing it without any=3D20 =
success:=20
>Creating a nodelist instead of a vector,
>
>can u please help me!!
>
>Thanks Ope
>
>
>_________________________________________________________________ The =
new=20
>MSN 8: advanced junk mail protection and 2 months FREE* =3D20=20
>http://join.msn.com/?page=3D3Dfeatures/junkmail
>
>
>
>------------------------------------------------------- This SF.net =
email=20
>is sponsored by: The Definitive IT and Networking Event. Be There!=20
>NetWorld+Interop Las Vegas 2003 -- Register today!=20
>http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en=20
>_______________________________________________ Htmlparser-user mailing =

>list Htm...@li...=20
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
>
>--__--__--
>
>_______________________________________________ Htmlparser-user mailing =

>list Htm...@li...=20
>https://lists.sourceforge.net/lists/listinfo/htmlparser-user
>
>
>End of Htmlparser-user Digest

_________________________________________________________________
The new MSN 8: advanced junk mail protection and 2 months FREE* =20
http://join.msn.com/?page=3Dfeatures/junkmail



-------------------------------------------------------
This SF.net email is sponsored by:
The Definitive IT and Networking Event. Be There!
NetWorld+Interop Las Vegas 2003 -- Register today!
http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en
_______________________________________________
Htmlparser-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlparser-user

[Htmlparser-user] Re: Htmlparser-user digest, Vol 1 #226 - 2 msgs

From: ope t. <op...@ho...> - 2003-03-27 21:30:53

I figured out the part using nodeList.collectInto. My debug output shows 
the right output, but when I try to process the link information, I get this 
error (this is part of the error):

Exception occurred during event dispatching:
java.lang.ClassCastException: org.htmlparser.tags.DoctypeTag


Thanks in advance for your help

Sincerely,
Ope T.


This is my code below:
try{
    //create the parser with the url to be parsed
    parser = new Parser(urlAddressComplete,new DefaultParserFeedback());
    parser.registerScanners();
    nodeList = new NodeList();

    //to extract all the embedded links and images

    for (NodeIterator e = parser.elements();e.hasMoreNodes();) {
        Node node = (Node)e.nextNode();
        nodeList.add(node);
        //node.collectInto(nodeList,ImageTag.IMAGE_TAG_FILTER);
        node.collectInto(nodeList,LinkTag.LINK_TAG_FILTER);

    }//for

    System.out.print("CHECKING NODES.. " + nodeList.toString()+ "\n");

    //now process the links and images
    //this is the part that doesn't seem to work

    for (SimpleNodeIterator e = nodeList.elements();e.hasMoreNodes();) {
        LinkTag linkTag = (LinkTag)e.nextNode();

        //put the links and their texts into vectors
        allTextLinkVector.addElement(linkTag.getLinkText());
        allLinkVector.addElement(linkTag.getLink());
    }
    // System.out.print( "All Links " + "Size: "+ allTextLinkVector.size() + " "+ allTextLinkVector.toString()+ "\n");

}//inner try

catch (ParserException e) {
    System.err.println("Error, could not create parser object");
    e.printStackTrace();
}//catch
}// outer try
catch(IOException ex) { ex.printStackTrace(); }
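
A note on the ClassCastException above: the first loop calls nodeList.add(node) for every node the parser returns, so nodeList most likely ends up holding non-link nodes (such as the DoctypeTag named in the error) next to whatever collectInto adds, and the unconditional cast to LinkTag in the second loop then fails. The following is only a minimal sketch of guarding the cast. The classes Node, DoctypeTag, LinkTag and LinkCollector here are hypothetical stand-ins written for this example, not the org.htmlparser classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the library's node types; they are NOT the
// org.htmlparser classes and exist only to make the sketch runnable.
class Node { }

class DoctypeTag extends Node { }

class LinkTag extends Node {
    private final String link;
    private final String text;

    LinkTag(String link, String text) {
        this.link = link;
        this.text = text;
    }

    String getLink() { return link; }
    String getLinkText() { return text; }
}

class LinkCollector {
    // Returns the link URLs found in a mixed node list, skipping every
    // node that is not a LinkTag instead of casting it blindly.
    static List<String> linksOf(List<Node> nodes) {
        List<String> links = new ArrayList<>();
        for (Node node : nodes) {
            if (node instanceof LinkTag) {            // guard before the cast
                links.add(((LinkTag) node).getLink());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new DoctypeTag());                  // the kind of node that broke the cast
        nodes.add(new LinkTag("http://www.yahoo.com", "Yahoo"));
        System.out.println(linksOf(nodes));
    }
}
```

In the posted code, dropping the nodeList.add(node) line would probably be enough on its own, assuming collectInto only adds nodes that match LINK_TAG_FILTER.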







RE: [Htmlparser-user] Help with method --> node.collectInto()

From: Marc N. <ma...@ke...> - 2003-03-27 16:31:00

If you can paste the actual code you're trying to compile, I'd be more
than happy to take a look at it.

Marc


[Htmlparser-user] Help with method --> node.collectInto()

From: ope t. <op...@ho...> - 2003-03-27 15:00:29

Hi, I'm trying to use the method node.collectInto(...) to extract embedded 
links and images on web pages.
I'm using the latest integration release, which means the classes are now 
Parser (not HTMLParser), NodeIterator, etc., along with all the other changes.



I followed the sample code:

HTMLParser parser = new HTMLParser("http://www.yahoo.com");
  parser.registerScanners();
  int i = 0;
  Vector collectionVector = new Vector();
  HTMLNode node;
  for (HTMLEnumeration e = parser.elements();e.hasMoreNodes();) {
    node = e.nextHTMLNode();
    node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
  }
  // All items in the collection vector should be links
  for (Enumeration e = collectionVector.elements();e.hasMoreElements();) {
    HTMLLinkTag linkTag = (HTMLLinkTag)e.nextElement();
    // you can now process the links as you like
  }
***********************************************************


I'm getting an error because this line:

node.collectInto(collectionVector,HTMLLinkTag.LINK_TAG_FILTER);
requires a NodeList and not a Vector. I've tried changing it, creating a 
NodeList instead of a Vector, without any success.

Can you please help me?

Thanks
Ope



RE: [Htmlparser-user] Integration Release 1.3-20030323 is out

From: Marc N. <ma...@ke...> - 2003-03-24 23:23:45

Somik,

Thanks for fixing 702614!  Unfortunately I can't seem to get the latest
build to work.  It's throwing an OOM exception in my own code when using
the NodeIterator returned by parser.elements().  I'm looking into this
to make sure I'm not doing something stupid in my code.  However, the
library seems to be acting differently than previous releases even
out-of-the-box.  For example, the following used to return a list of the
links on Yahoo (in the 0302 release):

java -jar ./htmlparser.jar http://www.yahoo.com -l

In the 0323 release, however, it returns nothing.

Marc


[Htmlparser-user] Re: Re: META-Charset-crash

From: mohammad a. <re...@em...> - 2003-03-24 11:45:15

I didn't mean to jump on you or anyone else, or even to complain. I totally understand your situation; it's the same for me, I only have my free time to work on my personal projects.

What I meant was that this kind of bug, if you can call it a bug, should be easier and faster to fix, but that's only how I see it; it may be more complicated.

What I've understood from the source code is that this only happens once, when the meta-scanning starts, and therefore it should be easy to fix so that the meta tag can use different charsets. When I say "stupid bug", I mean it shouldn't be there at all; I can't understand why the designers and developers would assume every page uses ISO charsets when there are so many of them. I hope you didn't misunderstand me about the "put everything down and fix the bug" thing; it's just that I see it as "easy to fix" and it would really help me, but that's just my opinion.

I've seen a new integration release, but what a disappointment that the charset bug is not fixed!

I hope everyone has noticed the bug report for the META-charset bug. As I said before, my solution was just temporary and is not a good one, for two reasons: I don't have enough skills in this matter to come up with good solutions, and I haven't yet checked through the whole code, which I consider important to be able to suggest fixes.

I hope the bug report is enough to fix the problem.

rezamotori, Sweden

[Htmlparser-user] Integration Release 1.3-20030323 is out

From: Somik R. <so...@ya...> - 2003-03-24 01:22:13

Hi Folks,
    This week's integration release has two important fixes :

Integration build 1.3 - 20030323
--------------------------------
[1] Fixed bug 702547 - single quotes parsed more robustly now
[2] Fixed bug 702614 - empty tags handled correctly now. Tag now has a
method isEmptyXmlTag().

#2 refers to tags like <tag/>.

Thanks to Joe Robbins for a fine bug report that helped in putting in the
fix for #1 faster. Thanks also to Marc Novakowski for the other report.

Thanks are also due to Huang-Chun Yu for uncovering a serious bug with the
script scanning mechanism. The parser can currently handle script tags like:

<script>
<!--
    code here
-->
</script>

But when the tags are like:
<script>
    code here
</script>

the parser is unable to identify the code and treats it like regular tags.
Such pages are quite widespread and ought to be supported. I was curious if
anyone has ideas on solving this - given the existing design - fresh ideas
often lead to a better perspective. If you have some ideas, feel free to
join the developer list
(http://lists.sourceforge.net/lists/listinfo/htmlparser-developer) and post.

Regards,
Somik
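
One possible direction for the unwrapped <script> case described above is to stop tokenizing as soon as a <script> start tag is seen and to treat everything up to the next </script> as raw text. The sketch below only illustrates that idea on a plain string; it is not the library's scanner design, and the class and method names (ScriptBodySketch, firstScriptBody, indexOfIgnoreCase) are made up for this example. It also ignores corner cases such as a '>' inside an attribute value of the start tag.

```java
class ScriptBodySketch {
    // Case-insensitive indexOf that keeps indexes valid for the original string.
    static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the raw text of the first script element, or null if no
    // complete <script>...</script> pair is found.
    static String firstScriptBody(String html) {
        int open = indexOfIgnoreCase(html, "<script", 0);
        if (open < 0) {
            return null;
        }
        int bodyStart = html.indexOf('>', open);      // end of the start tag
        if (bodyStart < 0) {
            return null;
        }
        bodyStart++;
        int close = indexOfIgnoreCase(html, "</script", bodyStart);
        if (close < 0) {
            return null;
        }
        // Everything in between is script code, even if it contains '<' or '>'.
        return html.substring(bodyStart, close);
    }

    public static void main(String[] args) {
        String page = "<script>\n    if (a < b) { go(); }\n</script>";
        System.out.println(firstScriptBody(page));
    }
}
```

The point of the design is that the content of a script element is never handed to the tag scanners, so a '<' in the code cannot be mistaken for the start of a tag.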

Re: [Htmlparser-user] META-Charset-crash

From: Somik R. <so...@ya...> - 2003-03-23 16:41:07

mohammad azadi wrote:
> I really think it's a stupid bug that all pages must use an ISO charset!
> Can't you just fix the damn thing and release it as a patch so we can
> continue with our work?

If you're objecting to my request to file a bug report, then please note that
I cannot devote weekdays to the project, only my personal time on weekends.
And when I do get the time, I prefer not to search all emails on the user
list to find what bugs need to be tackled. As for the bug in question
being stupid: all bugs are stupid; it's just that one person does not have
the time to find them all, and code is often written by more than one
person. There are also development priorities: certain bugs take
precedence, in my opinion, which I often base on feedback. Since this is
not a paid project, you cannot expect me or any other developer to jump on
an incomplete bug report; the least we expect is for the community to help out.

However, if a certain bug hurts you, and needs fixing, you could always make
a polite request. Or solve it yourself and give it to the community, for
which all of us will be grateful.

> My suggestion is to have a String[] containing all the common charsets,
> and make it expandable for new charsets.
> I don't think it should take long to fix; I've tried myself, but it
> was just a temporary fix.

Thank you for the suggestion. Perhaps you can give us the patch in question.
And just so you don't think I am being sarcastic, I'd be happy to have you
on our developer team - anyone who wants to improve the system earns a right
to be on the dev team.

In general, I think it will be good to have guidelines for posting
questions to make us a more effective community. I try to follow Eric
Raymond's well-written paper:
http://www.catb.org/%7Eesr/faqs/smart-questions.html

Regards,
Somik

[Htmlparser-user] META-Charset-crash

From: mohammad a. <re...@em...> - 2003-03-23 14:09:14

I really think it's a stupid bug that all pages must use an ISO charset! Can't you just fix the damn thing and release it as a patch so we can continue with our work?

My suggestion is to have a String[] containing all the common charsets, and make it expandable for new charsets.

I don't think it should take long to fix; I've tried myself, but it was just a temporary fix.

Rezamotori, Sweden
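
As an alternative to the fixed String[] of charset names suggested above, one could read the charset name out of the META content value and ask the JVM whether it supports it, using the standard java.nio.charset.Charset.isSupported call. This is only a sketch of that different technique, not a patch for the library: MetaCharsetSketch and charsetFrom are hypothetical names, and the ISO-8859-1 fallback is an assumption based on the behaviour described in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSketch {
    // Assumed fallback, taken from the "ISO charset" behaviour described in the thread.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Matches charset=NAME inside a content value such as "text/html; charset=UTF-8".
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"']+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the META content value if the JVM
    // supports it, otherwise the default.
    static String charsetFrom(String content) {
        if (content == null) {
            return DEFAULT_CHARSET;
        }
        Matcher m = CHARSET.matcher(content);
        if (!m.find()) {
            return DEFAULT_CHARSET;
        }
        String name = m.group(1);
        try {
            return Charset.isSupported(name) ? name : DEFAULT_CHARSET;
        } catch (IllegalCharsetNameException e) {
            return DEFAULT_CHARSET;                   // syntactically invalid name
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=UTF-8"));
        System.out.println(charsetFrom("text/html"));
    }
}
```

Asking the JVM avoids maintaining a list by hand: any charset the runtime can decode is accepted, and unknown names fall back to the default instead of crashing.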

Re: [Htmlparser-user] Re: How to filter the "WARNING:" message!

From: Somik R. <so...@ya...> - 2003-03-21 19:48:40

You should be able to suppress all the feedback. Check
http://htmlparser.sourceforge.net/docs/index.php/FeedbackMechanism

Regards,
Somik
--- Sean_Syslab <se...@sy...> wrote:
>   Sorry, I misunderstood the return strings. The
> WARNING messages are not within the return strings
> of the methods, but are shown after that.
> 
>   Dear all:
> 
>   When I used the sample program to extract links or
> strings, there were sometimes WARNING messages shown
> within the return strings. I don't want those
> WARNING strings accompanying the return value.
> What should I do...
> 
>                  Yours, Sean
> 
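
To illustrate the idea behind suppressing feedback, here is a minimal sketch: the parser is handed a feedback object whose callbacks do nothing, so warnings never reach the console. FeedbackSketch, SilentFeedback and FeedbackDemo are hypothetical stand-ins written for this example, not htmlparser classes; the library's actual feedback interface is the one described on the FeedbackMechanism page linked above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser feedback callback; consult the linked
// FeedbackMechanism page for the library's real interface.
interface FeedbackSketch {
    void info(String message);
    void warning(String message);
    void error(String message, Exception cause);
}

// Discards every message, so nothing is printed next to the extracted data.
class SilentFeedback implements FeedbackSketch {
    public void info(String message) { }
    public void warning(String message) { }
    public void error(String message, Exception cause) { }
}

class FeedbackDemo {
    // Stand-in for a parse run that reports a warning while producing results.
    static List<String> extractLinks(FeedbackSketch feedback) {
        feedback.warning("WARNING: malformed tag skipped");
        List<String> links = new ArrayList<>();
        links.add("http://www.yahoo.com");
        return links;
    }

    public static void main(String[] args) {
        // With the silent implementation only the links themselves are printed.
        System.out.println(extractLinks(new SilentFeedback()));
    }
}
```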


[Htmlparser-user] Re: How to filter the "WARNING:" message!

From: Sean_Syslab <se...@sy...> - 2003-03-21 19:14:46

  Sorry, I misunderstood the return strings. The WARNING messages are not within the return strings of the methods, but are shown after that.

  Dear all:

  When I used the sample program to extract links or strings, there were sometimes WARNING messages shown within the return strings. I don't want those WARNING strings accompanying the return value. What should I do...

  Yours, Sean

[Htmlparser-user] How to filter the "WARNING:" message!

From: Sean_Syslab <se...@sy...> - 2003-03-21 18:30:17

Dear all:

When I used the sample program to extract links or strings, there were sometimes WARNING messages shown within the return strings. I don't want those WARNING strings accompanying the return value. What should I do...

Yours, Sean