Thread: [Htmlparser-user] Bug in HTMLFrameTag ?
From: Elodie T. <et...@in...> - 2003-02-05 08:24:01
Hi,

I want to modify the attributes of some of the HTMLTags that I get. I'm using the following code just to see how the parser works. The problem is that it doesn't detect the <frame> tags, although it does get the other tags (a, img, frameset). I tested it with an HTML document that has 2 or 3 frame tags, and it sees none of them!

What can I do?

Thanks in advance for your help.

    HTMLReader htmlReader = new HTMLReader(buffer, len);
    HTMLParser parser = new HTMLParser(htmlReader);
    parser.registerScanners();

    HTMLEnumeration e = parser.elements();
    while (e.hasMoreNodes()) {
        HTMLNode node = e.nextHTMLNode();

        if (node instanceof HTMLLinkTag) {
            logger.debug(" href = " + ((HTMLLinkTag) node).getLink());
        } else if (node instanceof HTMLImageTag) {
            logger.debug(" srcImg = " + ((HTMLImageTag) node).getImageURL());
        } else if (node instanceof HTMLFrameSetTag) {
            logger.debug(" srcFrameSet = " + ((HTMLFrameSetTag) node).getFrameLocation());
        } else if (node instanceof HTMLFrameTag) {
            // Cast to HTMLFrameTag here; the original cast to HTMLFrameSetTag would fail at runtime.
            logger.debug(" srcFrame = " + ((HTMLFrameTag) node).getFrameLocation());
        } else if (node instanceof HTMLTag) {
            logger.debug(" HTMLTag = " + ((HTMLTag) node).toHTML());
        }
    }
From: Elodie T. <et...@in...> - 2003-02-05 14:36:23
I'm answering my own question ;o) I think I've found the source of my problem. I added:

    parser.addScanner(new HTMLFrameSetScanner());
    parser.addScanner(new HTMLFrameScanner());

to my code, and it seems to work now.

But I've discovered another problem: if one or more <frameset> tags are nested inside a <frameset>, they are not detected (only the <frame> tags are).

Could someone confirm this, or am I totally wrong?

Thanks.
From: Somik R. <so...@ya...> - 2003-02-05 18:37:07
--- Elodie Tasia <et...@in...> wrote:
> I answer to myself ;o) I think I've found the source of my problem.
> I've added:
>
>     parser.addScanner(new HTMLFrameSetScanner());
>     parser.addScanner(new HTMLFrameScanner());
>
> to my code, and it seems to work now.
> But I've discovered another problem: if there are one or more <frameset> tags included in a <frameset>, they are not detected (only the <frame> tags).
>
> Could someone confirm that, or am I totally wrong?

There are many ways to get to the child tags. Here are some.

Assuming you have already got the first frameset tag:

    for (SimpleEnumeration e = firstFrameSetTag.children(); e.hasMoreNodes(); ) {
        HTMLNode node = e.nextNode();
        if (node instanceof HTMLFrameSetTag) {
            HTMLFrameSetTag frameSetTag = (HTMLFrameSetTag) node;
        }
        if (node instanceof HTMLFrameTag) {
            HTMLFrameTag frameTag = (HTMLFrameTag) node;
        }
    }

ALTERNATIVELY: If you are only interested in frameset tags,

    parser = new HTMLParser(..);
    parser.registerScanners();
    HTMLNode[] frameSetTags = parser.extractAllNodesThatAre(HTMLFrameSetTag.class);

If you are interested in both frameset and frame tags, a cleaner approach is:

    public class MyParserVisitor extends HTMLVisitor {
        public void visitTag(HTMLTag tag) {
            if (tag.getTagName().equals("FRAMESET") || tag.getTagName().equals("FRAME")) {
                // Do what you want to do.
            }
        }

        public void get..() { }
    }

    parser = new HTMLParser(..);
    MyParserVisitor myParserVisitor = new MyParserVisitor();
    parser.visitAllNodesWith(myParserVisitor);
    myParserVisitor.get..();

Regards,
Somik
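The nested-frameset case discussed above comes down to walking the tag tree recursively rather than looking only at direct children. The following is a minimal, self-contained sketch of that idea. It does not use the htmlparser classes: `TagNode` and `FrameWalker` are hypothetical stand-ins invented for illustration, so treat it as a picture of the traversal, not as the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a parsed tag; NOT part of the htmlparser API.
final class TagNode {
    final String name;
    final List<TagNode> children = new ArrayList<>();

    TagNode(String name, TagNode... kids) {
        this.name = name;
        for (TagNode kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical helper that finds FRAME and FRAMESET tags at any nesting depth.
final class FrameWalker {
    // Returns entries of the form "depth:TAGNAME" in document (depth-first) order.
    static List<String> collectFrames(TagNode root) {
        List<String> found = new ArrayList<>();
        walk(root, 0, found);
        return found;
    }

    private static void walk(TagNode node, int depth, List<String> found) {
        String upper = node.name.toUpperCase(Locale.ROOT);
        if (upper.equals("FRAMESET") || upper.equals("FRAME")) {
            found.add(depth + ":" + upper);
        }
        // Recurse into every child so that nested framesets are visited too.
        for (TagNode child : node.children) {
            walk(child, depth + 1, found);
        }
    }

    public static void main(String[] args) {
        TagNode outer = new TagNode("FRAMESET",
                new TagNode("FRAME"),
                new TagNode("FRAMESET", new TagNode("FRAME"), new TagNode("FRAME")));
        System.out.println(collectFrames(outer));
    }
}
```

With the real parser, the same recursion would presumably sit on top of the `children()` enumeration shown in the reply above, calling itself whenever a child is itself a frameset tag.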
From: Elodie T. <et...@in...> - 2003-02-06 08:04:40
Hi,

I noticed that some of the Tag classes have methods that let you modify (or I guess they do) the "source" attribute (such as href or src). These methods are, for example: setBaseURL, setImageURL, setLink...

That seems perfect to me, as I have to modify all the relative paths in an HTML document... but I can't find a method that sets the source location in a frame tag, nor in an input tag (when type=image).

What can I do? Would it be too complex for me to add such a method to the HTMLFrameTag class?

Regards,
Elodie
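Whichever setter ends up being used, the new attribute value still has to be computed from the relative path. A minimal sketch of that step, using only the JDK's `java.net.URI` and no htmlparser classes, is given here. The class name `SrcRewriter`, the method `toAbsolute`, and the example URLs are made up for illustration.

```java
import java.net.URI;

// Hypothetical helper; only java.net.URI is a real API here.
final class SrcRewriter {
    // Resolves a possibly relative src/href value against the URL of the page it came from.
    // Values that are already absolute are returned unchanged.
    static String toAbsolute(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html"; // example page URL
        System.out.println(toAbsolute(page, "menu.html"));
        System.out.println(toAbsolute(page, "../img/go.gif"));
        System.out.println(toAbsolute(page, "http://other.example.org/a.html"));
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for values that are not valid URIs (for example, ones containing unescaped spaces), so real-world attribute values may need escaping first.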
From: Aminudin K. <ami...@mi...> - 2003-02-06 09:15:02
Hi,

Is there any way, or any class, to strip all comments from HTML source and produce plain, clean HTML without any comments?

Thanks
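For simple documents, comment stripping can be approximated without a parser. The sketch below uses a regular expression from the JDK rather than any htmlparser class; `CommentStripper` is a hypothetical name. It is only a rough approach: it will also remove comment-like text inside <script> or <style> blocks, and conditional comments, which may not be what is wanted.

```java
import java.util.regex.Pattern;

// Hypothetical helper; a regex-based approximation, not a real HTML parser.
final class CommentStripper {
    // Reluctant match from "<!--" to the nearest "-->"; DOTALL lets '.' span line breaks.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    static String stripComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><!-- header -->\n<body><p>Hi</p><!--\n multi-line\n--></body></html>";
        System.out.println(stripComments(html));
    }
}
```

A parser-based solution would instead skip the comment nodes while re-emitting every other node, which avoids the script and style pitfalls mentioned above.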