Thread: [Htmlparser-user] RE: Hints on how to change image tag locations and write o


Hi,

I managed to get something working like this. sb is a StringBuffer and base
is a URL (the source of the document). I'm using just a single scanner,
HTMLImageScanner. I didn't give it a filter because I don't understand what
the filter does.

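    // Assumes: sb is a StringBuffer, base is the java.net.URL the page came
    // from, and parser has an HTMLImageScanner registered.
    // Needs java.net.URL, java.net.MalformedURLException,
    // java.util.Enumeration and java.util.Hashtable.
    HTMLNode node;
    // Run through an enumeration of html elements
    for (Enumeration e = parser.elements(); e.hasMoreElements();) {
      node = (HTMLNode) e.nextElement();    // Cast the element to HTMLNode
      if (node instanceof HTMLStringNode) {
        // Text between tags is copied through unchanged
        HTMLStringNode stringNode = (HTMLStringNode) node;
        sb.append(stringNode.getText());
      } else if (node instanceof HTMLEndTag) {
        // Tested before HTMLTag in case HTMLEndTag subclasses it in your version
        HTMLEndTag endTag = (HTMLEndTag) node;
        sb.append("</" + endTag.getContents() + ">");
      } else if (node instanceof HTMLImageTag) {
        HTMLImageTag imgtag = (HTMLImageTag) node;
        String imgsrc = imgtag.getImageLocation();
        sb.append("<img");
        // Only write src if the tag had one (some pages have <img> without it)
        if (imgsrc != null && imgsrc.length() > 0) {
          try {
            // Resolves "a.gif", "/a.gif" and "../a.gif" against the page URL;
            // absolute http:// and https:// URLs stay absolute
            imgsrc = new URL(base, imgsrc).toString();
          } catch (MalformedURLException ex) {
            // Unparseable src: keep it as written
          }
          sb.append(" src=\"" + imgsrc + "\"");
        }
        Hashtable h = imgtag.parseParameters();
        for (Enumeration e2 = h.keys(); e2.hasMoreElements();) {
          String key = (String) e2.nextElement();
          // src was handled above; skip it so it is not written twice
          if (key.equalsIgnoreCase("SRC")) continue;
          sb.append(" " + key + "=\"" + h.get(key) + "\"");
        }
        sb.append(">");
      } else if (node instanceof HTMLTag) {
        // Any other tag is written back as it was parsed
        HTMLTag tag = (HTMLTag) node;
        sb.append("<" + tag.getText() + ">");
      }
    }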
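Concatenating base.toString() + imgsrc only works when the page URL ends in a
directory slash and the src is a plain file name, and the indexOf("http://")
test treats https:// URLs as relative. The standard JDK constructor
java.net.URL(URL context, String spec) resolves a spec the way a browser does.
A minimal standalone comparison, assuming a made-up page URL and image paths
(resolve is a hypothetical helper, not part of HTMLParser):

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveImageSrc {
    // Resolve an img src against the page URL; on a malformed value,
    // return the src unchanged (hypothetical fallback policy).
    static String resolve(String base, String src) {
        try {
            return new URL(new URL(base), src).toString();
        } catch (MalformedURLException e) {
            return src;
        }
    }

    public static void main(String[] args) {
        // Hypothetical page URL: note it ends in a file name, not a slash
        String base = "http://www.example.com/docs/page.html";
        String[] srcs = {"logo.gif", "/images/logo.gif", "../logo.gif",
                         "https://cdn.example.com/logo.gif"};
        for (String src : srcs) {
            // Naive string concatenation versus proper URL resolution
            System.out.println("naive:    " + base + src);
            System.out.println("resolved: " + resolve(base, src));
        }
    }
}
```

With this base, "logo.gif" resolves to http://www.example.com/docs/logo.gif,
while concatenation gives http://www.example.com/docs/page.htmllogo.gif.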

Just a couple of questions if you don't mind.

1) Is this the only way to get all the attributes in the img tag?
2) Can you see any problems or suggest improvements?
3) (HTTP question) I'm adding all the output to a StringBuffer so that I can
convert it to a byte array using sb.toString().getBytes(). I need to do
this so that I can get the length of the byte array for the
Content-Length HTTP header field (the output is sent back to a browser). Do
I need to do this, or can I just omit the Content-Length field and avoid
using the StringBuffer?
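Content-Length counts bytes of the encoded body, not characters, and
getBytes() with no argument uses the platform default charset, so the count is
only right if the bytes actually sent are produced with that same encoding. A
minimal sketch, assuming the page is sent as UTF-8 (the sample string and the
contentLength helper are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentLengthDemo {
    // Content-Length is the length of the encoded byte array, not String.length()
    static int contentLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p>";   // one non-ASCII character
        System.out.println("chars:            " + html.length());
        System.out.println("UTF-8 bytes:      " + contentLength(html, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes: " + contentLength(html, StandardCharsets.ISO_8859_1));
    }
}
```

Writing the same byte array you measured keeps the header and the body in
agreement. Omitting the header is also permitted: an HTTP/1.0 response can mark
the end of the body by closing the connection, and HTTP/1.1 offers chunked
transfer encoding, though then the browser does not know the size in advance.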

Another thing: I was testing the app on google.com and noticed it has a
strange image tag, <img width=1 height=1 alt=""> (no SRC attribute).
Although the parser recognised it as an image tag, it didn't seem to pick up
the attributes. Is this a bug?

>
>Hi all,
>
>I'm new to the list today after following the thread 'Hints on how to 
>change image tag locations and write out document' in the archives. I'm 
>trying to make an application that changes all relative img src attributes 
>to absolute before writing out the entire document. I'd be very interested 
>to see some of the code from the attachments from Somik Raha if somebody 
>could post them. The archives don't seem to keep attachments.
>
>I just started using HTMLParser today and I'm currently stuck trying to figure
>out how to get the complete IMG tag string when using an HTMLImageScanner.
>Am I correct in thinking that both an HTMLTag and an HTMLImageTag object
>are created for each image tag encountered when using HTMLImageScanner? If
>so, does the HTMLTag object get populated with the usual data?
>
>
>Thanks and regards,
>Rob Shields
>

