[Htmlparser-user] RE: Hints on how to change image tag locations and write o
From: Rob S. <bob...@ho...> - 2002-06-16 22:05:30
Hi,

I managed to get something working like this. sb is a StringBuffer and base is a URL (the source of the document). I'm using just a single scanner, HTMLImageScanner. I didn't give it a filter because I don't understand what the filter does.

    HTMLNode node;
    // Run through an enumeration of html elements
    HTMLLinkProcessor linkProcessor = new HTMLLinkProcessor();
    for (Enumeration e=parser.elements();e.hasMoreElements();) {
        node = (HTMLNode)e.nextElement(); // Cast the element to HTMLNode
        if (node instanceof HTMLStringNode) {
            HTMLStringNode stringNode = (HTMLStringNode)node;
            sb.append(stringNode.getText());
        } else if (node instanceof HTMLTag){
            HTMLTag tag = (HTMLTag) node;
            if (node instanceof HTMLImageTag) {
                HTMLImageTag imgtag = (HTMLImageTag) node;
                String imgsrc = imgtag.getImageLocation();
                if(imgsrc.indexOf("http://") == -1){ //relative src
                    imgsrc = base.toString() + imgsrc;
                }
                sb.append("<img src=\"" + imgsrc + "\"");
                Hashtable h = imgtag.parseParameters();
                for (Enumeration e2=h.keys();e2.hasMoreElements();) {
                    String key = (String)e2.nextElement();
                    sb.append(" " + key + "=\"" + h.get(key) + "\"");
                }
                sb.append(">");
            } else {
                sb.append("<" + tag.getText() + ">");
            }
        } else if (node instanceof HTMLEndTag){
            HTMLEndTag tag = (HTMLEndTag) node;
            sb.append("</" + tag.getContents() + ">");
        }
    }

Just a couple of questions, if you don't mind.

1) Is this the only way to get all the attributes in the img tag?

2) Can you see any problems or suggest improvements?

3) (HTTP question) I'm adding all the output to a StringBuffer so that I can convert it to a byte array using sb.toString().getBytes(). I need the length of the byte array for the Content-length HTTP header field (the output is sent back to a browser). Do I need to do this, or can I just omit the Content-length field and avoid using the StringBuffer?
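On questions 2 and 3, here is a minimal sketch using only the standard library; it is not taken from the original post or from HTMLParser. The class and method names (AbsoluteImgSrc, toAbsolute, contentLength) and the example.com addresses are hypothetical. It shows two things. First, resolving src against the base address handles cases that string concatenation gets wrong: with a base that points at a page, base.toString() + imgsrc would give "http://example.com/dir/page.htmlimages/a.gif", and the indexOf("http://") test treats https addresses as relative. Second, the Content-length value has to be counted in the encoding that is actually sent, whereas getBytes() with no argument uses the platform default charset. UTF-8 is an assumption here.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; these names are illustrative and not part of HTMLParser.
class AbsoluteImgSrc {

    // Resolve an img src against the address of the document it came from.
    // Values that are already absolute (http, https, ...) come back unchanged;
    // relative values are resolved against the base's directory.
    static String toAbsolute(String base, String src) {
        return URI.create(base).resolve(src).toString();
    }

    // Number of bytes the body occupies once encoded.
    // UTF-8 is assumed; use whatever charset the response declares.
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String base = "http://example.com/dir/page.html";
        System.out.println(toAbsolute(base, "images/a.gif"));            // http://example.com/dir/images/a.gif
        System.out.println(toAbsolute(base, "/logo.gif"));               // http://example.com/logo.gif
        System.out.println(toAbsolute(base, "https://other.org/x.png")); // https://other.org/x.png
        System.out.println(contentLength("h\u00e9llo"));                 // 6 (the accented letter takes two bytes)
    }
}
```

URI.create throws IllegalArgumentException on values that are not valid URIs (a src containing spaces, for instance), so real pages would need a fallback. With a java.net.URL base as in the post, new URL(base, imgsrc) performs the same kind of resolution. As for omitting Content-length: a server that does not send it can signal the end of the body by closing the connection, so buffering the whole document is needed only if the header is wanted.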
Another thing: I was testing the app on google.com and noticed it has a strange image tag with no SRC attribute:

    < img width=1 height=1 alt="" >

Although the parser recognised it as an image tag, it didn't seem to pick up the attributes. Is this a bug?

>
>Hi all,
>
>I'm new to the list today after following the thread 'Hints on how to
>change image tag locations and write out document' in the archives. I'm
>trying to make an application that changes all relative img src attributes
>to absolute before writing out the entire document. I'd be very interested
>to see some of the code from the attachments from Somik Raha if somebody
>could post them. The archives don't seem to keep attachments.
>
>I just started using HTMLParser today and I'm currently stuck trying to figure
>out how to get the complete IMG tag string when using an HTMLImageScanner.
>Am I correct in thinking that both an HTMLTag and an HTMLImageTag object
>are created for each image tag encountered when using HTMLImageScanner? If
>so, does the HTMLTag object get populated with the usual data?
>
>
>Thanks and regards,
>Rob Shields
>