Thread: Re: [Htmlparser-user] Hints on how to change image tag locations and write out document
From: Raghavender S. <kin...@ho...> - 2002-05-08 19:54:09
Attachments:
yahoo.txt
Parser.java
Hi Somik,

I was using version 1.1 of htmlparser. I saved the www.yahoo.com content in a flat file, yahoo.txt, and ran the parser against it. It throws a NullPointerException in HTMLScriptScanner, which seems to be a new addition in 1.1. I am sending the main program and yahoo.txt. I cannot send the stack trace, because I made some changes and the line numbers don't match, but if you run this program you will see the NullPointerException.

Thanks,
Raghav

> ----- Original Message -----
> From: "Somik Raha" <so...@ya...>
> Date: Mon, 6 May 2002 13:59:11 +0900
> Subject: Re: [Htmlparser-user] Hints on how to change image tag locations and write out document
>
> Hi Raghav,
> I sent you another mail some time back:
>
> "HTMLLinkTag.linkData() - this gives you an enumeration - and in the
> enumeration will be your HTMLImageTag."
>
> HTMLNode node;
> HTMLImageTag imageTag;
> for (Enumeration e = linkTag.linkData(); e.hasMoreElements();) {
>     node = (HTMLNode) e.nextElement();
>     if (node instanceof HTMLImageTag) {
>         imageTag = (HTMLImageTag) node;
>         // your code here
>     }
> }
>
> Regards,
> Somik
>
> ----- Original Message -----
> From: "Raghavender Srimantula" <kin...@ho...>
> Sent: Monday, May 06, 2002 10:43 AM
>
> Hi Somik,
> This question is about "not all images are being retrieved", I mean the images inside <a> tags. I did try to open the attachment you sent me, but I could not find anything. From the previous mails I understand that it is not a bug, but if I do want to retrieve all the images, how do I do it?
> Thanks,
> Raghav
>
> ----- Original Message -----
> From: "Somik Raha" <so...@ya...>
> Date: Tue, 30 Apr 2002 11:37:26 +0900
>
> Hi Raghav,
> Ah, this was a question by Annette Doyle (titled "Not all image tags are returned"). I am attaching my reply.
>
> Regards,
> Somik
>
> << [Htmlparser-developer]Re_[Htmlparser-user]Notallimagetagsarereturned[NotaBug].eml >>
>
> ----- Original Message -----
> From: "Raghavender Srimantula" <kin...@ho...>
> Sent: Tuesday, April 30, 2002 11:16 AM
>
> Hi Somik,
> I found one more interesting thing here. When I try to get all the images, the image scanner gives me images like
>
> <img src="http://us.i1.yimg.com/us.yimg.com/i/mntl/sh/mom02/title4.gif" width=296 height=27 border=0 usemap=#tm>
>
> so if I do imagetag.getImageLocation(), I get
> http://us.i1.yimg.com/us.yimg.com/i/mntl/sh/mom02/title4.gif
>
> But if the HTML content is like this
>
> <a href=s/6006><img src=http://us.i1.yimg.com/us.yimg.com/i/us/hj/hjys.gif border=0 width=70 height=22></a>
>
> which starts with <a and ends with </a>, then the image scanner will not give me http://us.i1.yimg.com/us.yimg.com/i/us/hj/hjys.gif when I do imagetag.getImageLocation(). It is not even classified as an ImageTag; it is classified as a LinkTag. How do I get this image?
>
> The above content is from www.yahoo.com. In the Netscape browser, if you go to View --> Page Info, you will see a bunch of images, but when you run htmlparser you get only one image.
>
> Thanks,
> Raghav
>
> ----- Original Message -----
> From: "Somik Raha" <so...@ya...>
> Date: Tue, 30 Apr 2002 09:15:38 +0900
>
> Can you describe your application? Was it parsing a single page when the problem occurred?
>
> Regards,
> Somik
>
> ----- Original Message -----
> From: "Raghavender Srimantula" <kin...@ho...>
> Sent: Tuesday, April 30, 2002 8:36 AM
>
> Hi Somik,
> I encountered a strange problem today. While I was running htmlparser, I got a java.lang.OutOfMemoryError. It seems that a lot of objects are being allocated. Where exactly is this happening? Could you give me an idea of where, or in which file, the potential problem could be?
> Raghav
>
> ----- Original Message -----
> From: "Somik Raha" <so...@ya...>
> Date: Sat, 27 Apr 2002 18:22:34 +0900
>
> Hi Annette,
> Please find attached a program to get you started. This program will do what you want; you will need to modify the construct that checks for the image tag and replace it with the location of your choice.
> Also, I found one bug thanks to this requirement: image tag params were not being correctly put in. Though it needs a deeper look, I have done a quick fix for now, and all test cases are passing (with one test case in HTMLImageScannerTest trapping this bug).
> Please check out the latest html parser source code from CVS.
>
> Regards,
> Somik
>
> << ImageTagRetriever.java >>
>
> ----- Original Message -----
> From: Doyle, Annette
> To: htm...@li...
> Sent: Friday, April 26, 2002 10:08 PM
> Subject: [Htmlparser-user] Hints on how to change image tag locations and write out document
>
> Could you please give me some hints on how to change only image tag locations and then (or at the same time) write out the HTML document to a file (with the new image tag locations)?
>
> Thanks-
> Annette Doyle
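Somik's 6 May snippet walks the children of a single link tag, while Raghav and Annette both need to visit every image in the page, including ones nested inside <a> tags, and then write the page back out. The sketch below illustrates that idea under stated assumptions: `Node`, `TextNode`, `ImageTag`, `LinkTag`, `relocateImages` and `writeOut` are hypothetical, simplified stand-ins that only mimic the shape of htmlparser's `HTMLNode`, `HTMLImageTag` and `HTMLLinkTag` (with `linkData()` returning an `Enumeration`); they are not the library's actual API. The key step is recursing into a link's contents instead of stopping at the link node.

```java
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;
import java.util.function.UnaryOperator;

// Hypothetical, simplified node model; NOT the real htmlparser classes.
public class ImageRelocator {

    interface Node {
        String toHtml();
    }

    static class TextNode implements Node {
        final String text;
        TextNode(String text) { this.text = text; }
        public String toHtml() { return text; }
    }

    static class ImageTag implements Node {
        private String location;
        ImageTag(String location) { this.location = location; }
        String getImageLocation() { return location; }
        void setImageLocation(String location) { this.location = location; }
        public String toHtml() { return "<img src=\"" + location + "\">"; }
    }

    static class LinkTag implements Node {
        final String href;
        private final Vector<Node> children = new Vector<>();
        LinkTag(String href, Node... kids) {
            this.href = href;
            for (Node k : kids) children.add(k);
        }
        // Mimics the idea of HTMLLinkTag.linkData(): the nodes between <a> and </a>.
        Enumeration<Node> linkData() { return children.elements(); }
        public String toHtml() {
            StringBuilder sb = new StringBuilder("<a href=\"" + href + "\">");
            for (Node n : children) sb.append(n.toHtml());
            return sb.append("</a>").toString();
        }
    }

    // Rewrites the location of every image, top-level or nested inside a link.
    static void relocateImages(Enumeration<Node> nodes, UnaryOperator<String> relocate) {
        while (nodes.hasMoreElements()) {
            Node node = nodes.nextElement();
            if (node instanceof ImageTag) {
                ImageTag img = (ImageTag) node;
                img.setImageLocation(relocate.apply(img.getImageLocation()));
            } else if (node instanceof LinkTag) {
                // Descend into the link so images under <a> are not missed.
                relocateImages(((LinkTag) node).linkData(), relocate);
            }
        }
    }

    // Serializes the (modified) node list back to HTML text.
    static String writeOut(List<Node> document) {
        StringBuilder sb = new StringBuilder();
        for (Node n : document) sb.append(n.toHtml());
        return sb.toString();
    }

    public static void main(String[] args) {
        Vector<Node> doc = new Vector<>();
        doc.add(new ImageTag("http://us.i1.yimg.com/us.yimg.com/i/mntl/sh/mom02/title4.gif"));
        doc.add(new TextNode(" "));
        doc.add(new LinkTag("s/6006",
                new ImageTag("http://us.i1.yimg.com/us.yimg.com/i/us/hj/hjys.gif")));
        // Move every image to a local "images/" directory, keeping the file name.
        relocateImages(doc.elements(),
                loc -> "images/" + loc.substring(loc.lastIndexOf('/') + 1));
        System.out.println(writeOut(doc));
    }
}
```

Running `main` prints `<img src="images/title4.gif"> <a href="s/6006"><img src="images/hjys.gif"></a>`: both the top-level image and the one inside the link are relocated. Passing the relocation rule as a function keeps the traversal separate from the policy, which is roughly where Somik suggests Annette plug in "the location of your choice".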
From: Somik R. <so...@ya...> - 2002-05-09 15:43:29
Hi Raghav,

On analyzing yahoo.txt, I found that it contains incorrect HTML: there is a script tag that has not been closed, so naturally the script scanner goes bonkers. Rename the extension to .html and open the file in IE, and you will find that IE can't handle it either.

I checked www.yahoo.com and found that the live page does provide the correct </script> tag, so I guess your yahoo.txt file is faulty.

Regards,
Somik
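Somik's diagnosis can be checked mechanically before a saved page is handed to the parser. A rough sketch, not part of htmlparser: `ScriptTagCheck`, `findUnclosedScript` and `indexOfIgnoreCase` are hypothetical names. It is a plain case-insensitive text search that, similar in spirit to how a script scanner treats script contents, jumps from each `<script` straight to the next `</script`, so `<script` text inside a script body is ignored. It only detects an unclosed script element, not other kinds of malformed markup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical pre-parse check for an unclosed <script> element (heuristic only).
public class ScriptTagCheck {

    // Case-insensitive indexOf that works on the original string's offsets.
    static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /** Offset of the first "<script" with no later "</script", or -1 if all are closed. */
    static int findUnclosedScript(String html) {
        int pos = 0;
        while (true) {
            int open = indexOfIgnoreCase(html, "<script", pos);
            if (open < 0) return -1;                // no more script elements
            int close = indexOfIgnoreCase(html, "</script", open);
            if (close < 0) return open;             // opened but never closed
            pos = close + "</script".length();      // skip past the script body
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ScriptTagCheck <saved-page-file>");
            return;
        }
        String html = new String(Files.readAllBytes(Paths.get(args[0])),
                StandardCharsets.ISO_8859_1);
        int offset = findUnclosedScript(html);
        System.out.println(offset < 0
                ? "all <script> elements are closed"
                : "unclosed <script> at character offset " + offset);
    }
}
```

For example, `findUnclosedScript("<html><script>var x = 1;<p>oops</p></html>")` returns 6, the offset of the `<script` that is never closed, while the same page with a `</script>` added returns -1.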