HTML Parser / Discussion / Help: Just a question

Anonymous - 2003-10-18

I just have a question about the NodeVisitor.

Why are ImageTag and LinkTag treated so specially in the NodeVisitor? They have their own methods (visitImageTag() and visitLinkTag()). Please clarify this for me.

Thank you for providing this package; it really eases my work.

- Derrick Oswald - 2003-10-18
  
  The visitImageTag, visitLinkTag and visitTitleTag methods were apparently added later, after the initial visitor pattern was added, just because of their usefulness.
  
  They may be removed in the ongoing refactoring.
  
- Anonymous - 2003-10-19
  
  Is it intentional, or a bug, that NodeIterator.nextNode() returns the Node and also removes it from the node list? I cannot go through the nodes with two visitors: the second visitor seems to visit no nodes.
  
  - Derrick Oswald - 2003-10-19
    
    The iterator, like all iterators, is used up by the operations that step through it, and is useless after stepping through all nodes.
    
    It does not remove the nodes from the list. If you make another iterator for the same list you can traverse them again.
    
    NodeList list;
    NodeIterator iterator1 = list.elements();
    NodeIterator iterator2 = list.elements();
    
    Both iterators will be able to step through all nodes.
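
    The same behaviour can be seen with the standard collections. The sketch below is only an analogy: it uses java.util.List and java.util.Iterator rather than NodeList and NodeIterator, and the class name IteratorReuseDemo and the helper countRemaining are made up for illustration.

    ```java
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class IteratorReuseDemo {
        // Steps through whatever the iterator has left and returns how many items it yielded.
        static int countRemaining(Iterator<String> it) {
            int count = 0;
            while (it.hasNext()) {
                it.next();
                count++;
            }
            return count;
        }

        public static void main(String[] args) {
            List<String> nodes = Arrays.asList("<html>", "<body>", "<img>");
            Iterator<String> iterator1 = nodes.iterator();
            Iterator<String> iterator2 = nodes.iterator();

            System.out.println(countRemaining(iterator1)); // first iterator sees every element
            System.out.println(countRemaining(iterator1)); // same iterator again: already used up
            System.out.println(countRemaining(iterator2)); // independent iterator still sees every element
            System.out.println(nodes.size());              // the list itself is unchanged
        }
    }
    ```

    An exhausted iterator yields nothing more, but the list it came from still holds every element, so a second iterator starts from the beginning.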
    
    - Anonymous - 2003-10-19
      
      String newHtml = "";
      
      theParser.visitAllNodesWith(copyImageVisitor);
      
      for (NodeIterator i = theParser.elements(); i.hasMoreNodes(); ) {
      
          System.out.println("In the for loop"); // for testing
          Node aNode = i.nextNode();
          newHtml = newHtml + aNode.toHtml();
      }
      
      This is the code I use: a visitor copies all the images from the page (and changes their src), and then I do the reverse HTML rendering to get a local page. However, after the visitor has visited the nodes, the following for loop processes no nodes (nothing is printed). If I comment out the visitor call, I get my page back. Inspecting the code, I found that new nodes are returned by NodeReader, and that each call to elements() on the parser returns a new IteratorImpl, but they all share the same reader, from which they get their nodes (apparently through the peek() method), and peek() returns null when no nodes are left. So I think the problem comes from there. I'm still not very confident, though; can you check, please?
      
      Vinh
      
      - Derrick Oswald - 2003-10-19
        
        Yes, you'll have to reopen the URL with a new Parser, or at a minimum, reopen the URL and pass it in to the existing parser.
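
        The effect of several iterators sharing one reader can be sketched with plain java.io classes. This is only an analogy, not the parser's own code: SharedReaderDemo, openSource and readAll are hypothetical names, and a StringReader stands in for the connection to the URL.

        ```java
        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.Reader;
        import java.io.StringReader;
        import java.io.UncheckedIOException;
        import java.util.ArrayList;
        import java.util.List;

        public class SharedReaderDemo {
            // Hypothetical stand-in for opening the page again; a real program would reopen the URL.
            static Reader openSource(String html) {
                return new StringReader(html);
            }

            // Reads every remaining line from the reader, the way one full traversal would.
            static List<String> readAll(Reader source) {
                BufferedReader reader = new BufferedReader(source);
                List<String> lines = new ArrayList<>();
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return lines;
            }

            public static void main(String[] args) {
                String html = "<html>\n<img src=\"a.gif\">\n</html>";

                Reader shared = openSource(html);
                System.out.println(readAll(shared).size()); // first pass consumes the stream
                System.out.println(readAll(shared).size()); // second pass over the same stream finds nothing

                System.out.println(readAll(openSource(html)).size()); // a fresh source can be traversed again
            }
        }
        ```

        Once the first traversal has drained the underlying stream, any later traversal that reads from the same stream has nothing left to return, which is why the source has to be opened again.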
        
- Anonymous - 2003-10-19
  
  ImageTag.getImageURL() hasn't been written to handle the case where a base tag exists; it seems to rely only on the URL of the HTML page.
  
  - Derrick Oswald - 2003-10-19
    
    The base tag should be honoured if you've called parser.registerScanners(), since the link scanner registers the BaseHrefScanner, which sets the link processor's base URL, to which all relative URLs are anchored.
    
    If it isn't, yes it's a bug. Please file a testcase if possible with your bug report.
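
    As a rough illustration of what honouring the base tag means for a relative image URL, the sketch below resolves a reference with java.net.URI. It does not use the parser's link processor; the class BaseHrefDemo, the method resolve and the example.com and example.org addresses are assumptions made up for the example.

    ```java
    import java.net.URI;

    public class BaseHrefDemo {
        // Resolves a relative reference against the <BASE HREF> value when one is present,
        // otherwise against the URL of the page itself.
        static String resolve(String pageUrl, String baseHref, String relative) {
            URI anchor = URI.create(baseHref != null ? baseHref : pageUrl);
            return anchor.resolve(relative).toString();
        }

        public static void main(String[] args) {
            String page = "http://www.example.com/docs/index.html";
            // No base tag: the image is anchored at the page's own directory.
            System.out.println(resolve(page, null, "images/logo.gif"));
            // Base tag present: the image is anchored at the base URL instead.
            System.out.println(resolve(page, "http://static.example.org/site/", "images/logo.gif"));
        }
    }
    ```

    If getImageURL() returns the first form for a page that declares a base tag, the base is being ignored.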
    
    - Anonymous - 2003-10-20
      
      If I use registerScanners(), the base URL is honoured, but then I cannot get at my image tags, since they are wrapped in a link tag or sometimes a <TD> tag.
      
      I tried
      addScanner(new BaseHrefScanner());
      addScanner(new ImageScanner());
      
      but it throws a ParserException when parsing the base tag.
      
      Any suggestions?
      
      Vinh
      
- Anonymous - 2003-10-19
  
  This method doesn't work for me.
  
  - Derrick Oswald - 2003-10-19
    
    You're the second person to report that.
    Try the static constructor instead:
    
    public static Parser createParser(String inputHTML);
    
- Anonymous - 2003-10-19
  
  Thank you :)
  