HTML Parser / Discussion / Help: How to retrieve Nodes event based

cult_of_excellence - 2006-03-29

I need to retrieve nodes in sequential order and render them back. Is there an easy way to do this? Is it possible to get all the content in a paragraph or a list and then process each of the children? A paragraph may contain bold text, images, etc. All I need is an aggregator that collects all the text inside the paragraph or list and stores the start and end positions of each new item in the paragraph. If anyone who has worked with this parser substantially can help, it would be appreciated.

- Derrick Oswald - 2006-03-29
  
  From the Parser, each tag contains all of its children. You can recursively examine the children in depth-first order, looking for TextNodes. The TextNodes also have their start and end positions (as does every node from the parser). See the StringBean class for an example.
  
  If you want a purely sequential list of basic nodes you can use the Lexer. This sounds more like what you want.
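  
  The depth-first walk described above can be sketched as follows. This is a minimal illustration, not code from the library: `SketchNode`, `TextSpan`, and `DepthFirstTextCollector` are hypothetical stand-ins so the example runs without the parser on the classpath. With the real parser you would walk its own node and child-list types instead, and the offsets would come from the nodes rather than being typed in by hand.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;
  
  // Hypothetical stand-in for a parsed node; NOT an org.htmlparser class.
  class SketchNode {
      final String tagName;  // null for text nodes
      final String text;     // null for tag nodes
      final int start;       // offset of the first character in the source
      final int end;         // offset just past the last character
      final List<SketchNode> children = new ArrayList<>();
  
      private SketchNode(String tagName, String text, int start, int end) {
          this.tagName = tagName;
          this.text = text;
          this.start = start;
          this.end = end;
      }
  
      static SketchNode ofTag(String name, int start, int end, SketchNode... kids) {
          SketchNode node = new SketchNode(name, null, start, end);
          for (SketchNode kid : kids) {
              node.children.add(kid);
          }
          return node;
      }
  
      static SketchNode ofText(String text, int start) {
          return new SketchNode(null, text, start, start + text.length());
      }
  }
  
  // One piece of text plus its position in the source (hypothetical helper).
  class TextSpan {
      final String text;
      final int start;
      final int end;
  
      TextSpan(String text, int start, int end) {
          this.text = text;
          this.start = start;
          this.end = end;
      }
  }
  
  class DepthFirstTextCollector {
      // Visit children in document order, recording every text node found.
      static void collect(SketchNode node, List<TextSpan> out) {
          if (node.text != null) {
              out.add(new TextSpan(node.text, node.start, node.end));
              return;
          }
          for (SketchNode child : node.children) {
              collect(child, out);
          }
      }
  
      // Hand-built tree for the source: <p>Hello <b>bold</b> world</p>
      static List<TextSpan> demo() {
          SketchNode bold = SketchNode.ofTag("b", 9, 20, SketchNode.ofText("bold", 12));
          SketchNode para = SketchNode.ofTag("p", 0, 30,
                  SketchNode.ofText("Hello ", 3), bold, SketchNode.ofText(" world", 20));
          List<TextSpan> spans = new ArrayList<>();
          collect(para, spans);
          return spans;
      }
  }
  ```
  
  Calling `DepthFirstTextCollector.demo()` returns three spans in document order, and each span's offsets index back into the original source string.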
  
- cult_of_excellence - 2006-03-30
  
  Hi Derrick,
  Thanks a lot for your reply. Is there a sample application or some code for the Lexer that I can look at? For rendering, I need to know which tag I am currently handling, and it has to be event based.
  
  For example, if I encounter a standalone <b> tag, I just need to call renderBoldString();
  
  If I encounter a <p> tag, it can contain children, and I need to assemble a paragraph structure that holds information about all the tags nested inside that <p> tag.
  
  Similarly, a <ul> can contain multiple <LI> items, which I need to build.
  
  Could you help me with a sample code fragment that shows how to go about this?
  
  - Derrick Oswald - 2006-03-31
    
    Sorry, I don't have any similar code.
    Sounds like you are writing a browser.
    Based on your requirements, I would think you want to stick with the parser or, if it must be event based, look at the partially complete SAX support in the org.htmlparser.sax package. There is a related test case in src/org/htmlparser/tests/SAXTest.java, but you may not need it if you know how SAX works.
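    
    For readers who do not already know how SAX works, the callback style can be sketched with the SAX parser that ships with the JDK. This is an illustration under stated assumptions, not the org.htmlparser.sax API: it only accepts well-formed markup, and `ParagraphHandler`, its fields, and the `tag:start-end` span format are hypothetical names. It collects the text inside a <p> and records where each nested tag starts and ends within that collected text.
    
    ```java
    import java.io.StringReader;
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;
    
    // Hypothetical handler; uses the JDK XML SAX parser as a stand-in.
    class ParagraphHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();   // aggregated <p> text
        final List<String> spans = new ArrayList<>();     // "tag:start-end" entries
        private final Deque<Integer> openStarts = new ArrayDeque<>();
        private boolean inParagraph = false;
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equalsIgnoreCase("p")) {
                inParagraph = true;
            } else if (inParagraph) {
                // Remember where this nested tag begins in the aggregated text.
                openStarts.push(text.length());
            }
        }
    
        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("p")) {
                inParagraph = false;
            } else if (inParagraph) {
                spans.add(qName + ":" + openStarts.pop() + "-" + text.length());
            }
        }
    
        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one run of text; appending handles that.
            if (inParagraph) {
                text.append(ch, start, length);
            }
        }
    
        // Parse a well-formed fragment and return the populated handler.
        static ParagraphHandler parse(String xhtml) {
            ParagraphHandler handler = new ParagraphHandler();
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xhtml)), handler);
            } catch (Exception e) {
                throw new IllegalStateException("could not parse fragment", e);
            }
            return handler;
        }
    }
    ```
    
    A renderer could replace the `spans.add(...)` line with a call such as renderBoldString() when the closing tag is a <b>. Real-world HTML is often not well-formed, which is the reason to use an HTML-aware parser rather than a plain XML one for anything beyond a sketch like this.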
    