Re: [Htmlparser-user] RE: Hints on how to change image tag locations and write o
From: Somik R. <so...@ya...> - 2002-06-17 06:32:11
Dear Rob,

From your first mail:

> Am I correct in thinking that in both an HTMLTag and an HTMLImageTag object
> are created for each image tag encountered when using HTMLImageScanner? If
> so, does the HTMLTag object get populated with the usual data?

Functionally, you only get one tag object. If you haven't registered the relevant scanner (HTMLImageScanner in this case), you will get an HTMLTag object. If you have, you will get an HTMLImageTag object.

Technically, internally, an HTMLTag object gets created first. Control then passes to the registered scanners to see whether this tag can be upgraded. If so, the new subclassed tag object (HTMLImageTag, for example) gets created and returned in place of the original HTMLTag.

> I didn't give it a
> filter because I don't understand what the filter does.

A filter is not required. It is only for use from the command line, where it lets us check parse results easily and dump them to a file. You can ignore it for your app; the following will work:

    parser.addScanner(new HTMLImageScanner(""));

> HTMLLinkProcessor linkProcessor = new HTMLLinkProcessor();

Why are you declaring a linkProcessor?

> HTMLImageTag imgtag = (HTMLImageTag) node;
> String imgsrc = imgtag.getImageLocation();
> if(imgsrc.indexOf("http://") == -1){
> //relative src
> imgsrc = base.toString() + imgsrc;
> }

This is not necessary. The base URL that you specify in the parser will automatically be used to resolve relative links. Check out the test cases testRelativeImageScan, testRelativeImageScan2, and testRelativeImageScan3 in com.kizna.htmlTests.scannerTests.HTMLImageScannerTest.

I can also see that you are trying to reconstruct the HTML tag without changing its contents. You can do this with imageTag.toRawString() if you are using HTMLParser v1.2 or later. However, this will give you the relative link, not the resolved absolute link. Perhaps, if you need it, we can modify the toRawString() method to return absolute links?
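
If you ever do need to build an absolute link yourself, here is a minimal sketch that uses only the standard java.net.URI class. It is not HTMLParser's own resolution code. The class name RelativeImageLinks, the resolve helper, and the example.com URLs are all made up for illustration. Unlike an indexOf("http://") check followed by string concatenation, it also copes with https links, root-relative paths, and "../" segments.

```java
import java.net.URI;

public class RelativeImageLinks {

    /**
     * Resolves an image src against the URL of the page it came from.
     * Hypothetical helper for illustration; not part of HTMLParser.
     * Throws IllegalArgumentException if either string is not a valid URI
     * (for example, if the src contains unescaped spaces).
     */
    public static String resolve(String pageUrl, String imgSrc) {
        return URI.create(pageUrl).resolve(imgSrc).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html";

        // Relative to the page's directory
        System.out.println(resolve(page, "images/logo.gif"));
        // Relative to the server root
        System.out.println(resolve(page, "/images/logo.gif"));
        // Parent-directory reference
        System.out.println(resolve(page, "../logo.gif"));
        // Already absolute: returned unchanged
        System.out.println(resolve(page, "https://other.example.org/a.gif"));
    }
}
```

Note that the src is resolved against the page URL, not simply appended to it, so the file name at the end of the page URL (index.html here) is dropped.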

> 1) is this the only way to get all the attributes in the img tag?

No. There's a much easier way. Just do:

    imageTag.getParameter("alt");

If you want to get the keys, I think this should work:

    imageTag.getParsed().keys()

[Maybe the name of this method should be changed to be easier to figure out.]

> I need to do this or can I just omit the Content-length field and avoid
> using the StringBuffer?

Hmm. It's not mandatory to send the Content-Length, but some servers expect it.

To make life easier, you should use toRawString() to get the HTML tags out uniformly. Since this applies to any node, you don't have to write code for different types of nodes. So sb.append(node.toRawString()) is (perhaps) good enough for all nodes. The only one where there might be an issue is HTMLImageTag, for the reasons I mentioned above. You can probably rewrite the toRawString() method in HTMLImageTag for your purposes, and that should solve your problem neatly.

Feel free to post any further questions that you have.

Regards,
Somik
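
P.S. If you do decide to send Content-Length, here is a rough sketch of the buffering approach using only the standard library. The class name ContentLengthSketch, the contentLength helper, and the sample markup are invented for illustration, and I am assuming the body goes out as UTF-8. In your loop the appended strings would come from node.toRawString(). The thing to watch is that the header counts bytes, not characters, so the buffer's length() is only right for pure ASCII.

```java
import java.nio.charset.StandardCharsets;

public class ContentLengthSketch {

    /**
     * Returns the value to send in Content-Length for a body encoded as UTF-8.
     * Hypothetical helper for illustration; the encoding is an assumption.
     */
    public static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();

        // Stand-ins for the strings that node.toRawString() would return.
        sb.append("<html><body>");
        sb.append("<img src=\"images/logo.gif\" alt=\"caf\u00e9\">");
        sb.append("</body></html>");

        String body = sb.toString();

        // Character count and byte count differ once non-ASCII text appears.
        System.out.println("chars: " + body.length());
        System.out.println("Content-Length: " + contentLength(body));
    }
}
```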