Anonymous - 2003-10-19
Anonymous - 2003-10-19
Anonymous - 2003-10-20
Anonymous - 2003-10-19
I just have a question about the NodeVisitor.
Why are ImageTag and LinkTag treated so specially in the NodeVisitor? They have their own methods (visitImageTag() and visitLinkTag()). Please clarify this for me.
Thank you for providing this package; it really eases my work.
The visitImageTag, visitLinkTag and visitTitleTag methods were apparently added later, after the initial visitor pattern was in place, just because of their usefulness.
They may be removed in the ongoing refactoring.
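For readers wondering what tag-specific visit methods buy you, here is a minimal, self-contained sketch of the general idea. Every class in it (SketchTag, SketchImageTag, SketchVisitor, ImageSourceCollector) is a hypothetical stand-in made up for this illustration, not one of the library's classes, and the dispatch shown is an assumption about how such convenience methods can work, not a description of the library's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the library's classes.
class VisitorSketch {

    // A generic tag: by default it reports itself through the generic callback.
    static class SketchTag {
        final String name;
        SketchTag(String name) { this.name = name; }
        void accept(SketchVisitor visitor) { visitor.visitTag(this); }
    }

    // An image tag overrides the dispatch so that it reaches the specialised callback.
    static class SketchImageTag extends SketchTag {
        final String src;
        SketchImageTag(String src) { super("IMG"); this.src = src; }
        @Override
        void accept(SketchVisitor visitor) { visitor.visitImageTag(this); }
    }

    // Visitor base class with no-op defaults; subclasses override only what they need.
    static class SketchVisitor {
        void visitTag(SketchTag tag) { }
        void visitImageTag(SketchImageTag tag) { }
    }

    // Collects image sources without instanceof checks or casts.
    static class ImageSourceCollector extends SketchVisitor {
        final List<String> sources = new ArrayList<>();
        @Override
        void visitImageTag(SketchImageTag tag) { sources.add(tag.src); }
    }

    // Runs the collector over all tags and returns the image sources it saw.
    static List<String> collectImageSources(List<SketchTag> tags) {
        ImageSourceCollector collector = new ImageSourceCollector();
        for (SketchTag tag : tags) {
            tag.accept(collector);
        }
        return collector.sources;
    }

    public static void main(String[] args) {
        List<SketchTag> tags = new ArrayList<>();
        tags.add(new SketchTag("P"));
        tags.add(new SketchImageTag("logo.png"));
        tags.add(new SketchImageTag("photo.jpg"));
        System.out.println(collectImageSources(tags)); // [logo.png, photo.jpg]
    }
}
```

The convenience is that a visitor interested only in images overrides a single method and receives an already-typed tag. The cost is that the visitor grows one method per special-cased tag type.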
Is it intentional, or a bug, that NodeIterator.nextNode() returns the Node and also removes it from the node list? I cannot go through the nodes with two visitors; the second visitor seems to visit no nodes.
The iterator, like all iterators, is used up by the operations that step through it, and is useless after stepping through all nodes.
It does not remove the nodes from the list. If you make another iterator for the same list you can traverse them again.
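NodeList list = new NodeList(); // in practice, a list already filled with nodes
NodeIterator iterator1 = list.elements(); // independent iterator over the list
NodeIterator iterator2 = list.elements(); // second, equally independent iterator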
Both iterators will be able to step through all nodes.
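The same behaviour can be seen with the standard java.util collections, which makes for an easy experiment without the parser on the classpath. This is only an analogy: it uses java.util.List and java.util.Iterator rather than NodeList and NodeIterator, and the class name IteratorSketch and the helper drain() are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorSketch {

    // Counts how many elements an iterator yields, consuming it in the process.
    static int drain(Iterator<String> iterator) {
        int count = 0;
        while (iterator.hasNext()) {
            iterator.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>(
                Arrays.asList("<html>", "<body>", "</body>", "</html>"));

        Iterator<String> iterator1 = nodes.iterator();
        Iterator<String> iterator2 = nodes.iterator();

        System.out.println(drain(iterator1)); // 4: first pass sees every element
        System.out.println(drain(iterator1)); // 0: the same iterator is used up
        System.out.println(drain(iterator2)); // 4: an independent iterator starts afresh
        System.out.println(nodes.size());     // 4: the list itself was never modified
    }
}
```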
String newHtml = "";
parser.visitAllNodesWith(copyImageVisitor); // first pass: the visitor reads all nodes
for (NodeIterator i = parser.elements(); i.hasMoreNodes();) {
    System.out.println("In the for loop"); // for testing
    Node aNode = i.nextNode();
    newHtml = newHtml + aNode.toHtml();
}
This is the code I use, with a visitor, to copy all the images from the page (and change their src) and then do the reverse HTML rendering to get a local page. However, after the visitor has visited the nodes, the following for loop processes no nodes (nothing is printed); if I comment out the visitor call, I get my page back. Inspecting the code, I found that new nodes are returned by NodeReader, and that each call to elements() on the parser returns a new IteratorImpl, but they all share the same reader from which they get the nodes (apparently via the peek() method), and peek() returns null when no nodes are left. So I think the problem comes from there. Still, I'm not very confident; can you check that, please?
Vinh
Yes, you'll have to reopen the URL with a new Parser, or at a minimum, reopen the URL and pass it in to the existing parser.
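The effect Vinh describes, and why reopening helps, can be reproduced with plain java.io classes. This is an analogy only: a StringReader stands in for the page source, and the names ReaderSketch and countLines() are invented for the illustration; it does not use the parser's own reader or iterator classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderSketch {

    // Reads lines until the reader is exhausted and reports how many were seen.
    static int countLines(BufferedReader reader) {
        int count = 0;
        try {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String page = "<html>\n<img src=\"a.png\">\n</html>";

        // Two passes over ONE reader: the second pass finds nothing left to read.
        BufferedReader shared = new BufferedReader(new StringReader(page));
        System.out.println(countLines(shared)); // 3
        System.out.println(countLines(shared)); // 0

        // "Reopening" the source gives the second pass a fresh reader.
        BufferedReader reopened = new BufferedReader(new StringReader(page));
        System.out.println(countLines(reopened)); // 3
    }
}
```

An alternative, if reopening the URL twice is undesirable, would be to collect the nodes into a list during a single pass and then run both steps over that list, since a list can hand out as many independent iterators as needed.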
ImageTag.getImageURL() doesn't seem to have been written for the case where a base tag exists; it seems to rely only on the URL of the HTML page.
The base tag should be honoured if you've called parser.registerScanners(), since the link scanner registers the BaseHrefScanner, which sets the link processor's base URL, against which all relative URLs are anchored.
If it isn't, yes it's a bug. Please file a testcase if possible with your bug report.
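To make the difference concrete, here is a small sketch of how the same relative src resolves against the page URL versus a base href. It uses only java.net.URI from the JDK, not the library's link processor, and the URLs and the class name BaseHrefSketch are made-up examples.

```java
import java.net.URI;

class BaseHrefSketch {

    // Resolves a possibly relative reference against a base using standard URI rules.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.example.com/docs/index.html"; // where the page was fetched
        String baseHref = "http://static.example.com/assets/";     // value of a <base href> tag
        String imageSrc = "images/logo.png";                       // relative src attribute

        // Without a base tag, the page URL is the anchor.
        System.out.println(resolve(pageUrl, imageSrc));
        // http://www.example.com/docs/images/logo.png

        // With a base tag, the base href is the anchor instead.
        System.out.println(resolve(baseHref, imageSrc));
        // http://static.example.com/assets/images/logo.png
    }
}
```

If the image URL you get back looks like the first result even though the page declares a base href, that would be the symptom worth putting in a test case.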
If I use registerScanners(), then the base URL is honoured, but then I cannot get my image tags, since they are wrapped in a link tag or sometimes a <TD> tag.
I tried
addScanner(new BaseHrefScanner());
addScanner(new ImageScanner());
but it throws a ParserException when parsing the base tag.
Any suggestions?
Vinh
This method doesn't work for me.
You're the second person that reported that.
Try the static constructor instead:
public static Parser createParser(String inputHTML);
Thank you :)