HTML Parser / Discussion / Help: Problem about parsing composittag

wjsjw - 2007-11-15

Hi all

 I am new to HtmlParser and don't know how to clean up the contents of a composite tag, e.g. a paragraph tag containing "", " " or "&nbsp", like this: &nbsp text1 &nbsp text2 . Can anybody tell me how to do that? I tried my best but failed.
Here is my code:
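import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.Div;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public void getText() {
    try {
        Parser parser = new Parser("http://money.finance.sina.com.cn/corp/view/vCB_AllBulletinDetail.php?stockid=000002&id={3DAAE7D0-EB20-66F9-E040-640A12016145}");
        parser.setEncoding("gb2312");
        // Select only <div> elements whose id attribute is "content".
        HasAttributeFilter attributeFilter = new HasAttributeFilter("id", "content");
        NodeFilter filter = new AndFilter(new TagNameFilter("div"), attributeFilter);
        NodeList nodeList = parser.extractAllNodesThatMatch(filter);
        for (int i = 0; i < nodeList.size(); i++) {
            // getStringText() returns the raw markup between the div's tags,
            // so nested tags and character references are still present.
            String notices = ((Div) nodeList.elementAt(i)).getStringText();
            System.out.println(notices);
        }
    } catch (ParserException e) {
        e.printStackTrace();
    }
}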

- wjsjw - 2007-11-15
 
 Any help would be appreciated!
 wjsjw
 2007-11-15
 
- Derrick Oswald - 2007-11-15
 
 Use a StringBean as a NodeVisitor on the nodeList to eliminate the tags.
 Then apply Translate.decode to change the character references to real characters.
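
 To illustrate what the second step does to the text in the question, here is a minimal, library-free sketch. It does not use HtmlParser: decodeBasicEntities is a hypothetical stand-in for Translate.decode that handles only a handful of named references, and normalizeWhitespace is an assumed follow-up step that is not part of the advice above. The point it shows is that "&nbsp;" decodes to the non-breaking space U+00A0, which String.trim() does not remove, so the sketch maps it to an ordinary space before collapsing runs of whitespace.

```java
class NbspCleanup {
    // Hypothetical stand-in for the decoding step (NOT HtmlParser's implementation):
    // it only knows a few named character references.
    static String decodeBasicEntities(String s) {
        return s.replace("&nbsp;", "\u00a0")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // last, so "&amp;lt;" decodes once, to "&lt;"
    }

    // Assumed follow-up step: turn non-breaking spaces into ordinary spaces,
    // collapse runs of whitespace to a single space, and trim both ends.
    static String normalizeWhitespace(String s) {
        return s.replace('\u00a0', ' ').replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "&nbsp; text1 &nbsp;  text2 ";
        // prints "text1 text2"
        System.out.println(normalizeWhitespace(decodeBasicEntities(raw)));
    }
}
```

 In real code the decoding would be left to the library; only the whitespace handling would be worth keeping, and only if the extracted text still contains non-breaking spaces.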
 
- wjsjw - 2007-11-16
 
 Dear Derrick Oswald:
 Thank you for your help. It works well. :)
 code:
 StringBean sb = new StringBean();
 for (int i = 0; i < nodeList.size(); i++) {
     // String notices = ((Div)nodeList.elementAt(i)).getStringText();
     nodeList.visitAllNodesWith(sb);
     System.out.println(sb.getStrings());
 }
 
 - Derrick Oswald - 2007-11-16
 
 You shouldn't need to loop over the nodeList; a single call to nodeList.visitAllNodesWith(sb) should do it.
 
- wjsjw - 2007-11-17
 
 Dear Oswald:
 Thank you very much. I get it.
 
 code
 
 StringBean sb = new StringBean();
 nodeList.visitAllNodesWith(sb);
 System.out.println(sb.getStrings());
 
 Best wishes
 
 wjsjw
 2007-11-17
 