Problem about parsing composite tags

Help
wjsjw
2007-11-15
2013-04-27
  • wjsjw

    wjsjw - 2007-11-15

    Hi all

       I am a newbie to HtmlParser. I really don't know how to clean up composite tags such as the paragraph tag "<p>", together with "<br>" and "&nbsp", as in: <p>&nbsp text1<br>&nbsp text2</p>. Can anybody tell me how to do that? I tried my best but failed.
    Here is my code:
    public void getText() {
        try {
            Parser parser = new Parser("http://money.finance.sina.com.cn/corp/view/vCB_AllBulletinDetail.php?stockid=000002&id={3DAAE7D0-EB20-66F9-E040-640A12016145}");
            parser.setEncoding("gb2312");
            HasAttributeFilter attributeFilter = new HasAttributeFilter("id", "content");
            NodeFilter filter = new AndFilter(new TagNameFilter("div"), attributeFilter);
            NodeList nodeList = parser.extractAllNodesThatMatch(filter);
            for (int i = 0; i < nodeList.size(); i++) {
                String notices = ((Div) nodeList.elementAt(i)).getStringText();
                System.out.println(notices);
            }
        } catch (ParserException e) {
            e.printStackTrace();
        }
    }
      

     
    • wjsjw

      wjsjw - 2007-11-15

      Any help would be appreciated!
      wjsjw
      2007-11-15

       
    • Derrick Oswald

      Derrick Oswald - 2007-11-15

      Use a StringBean as a NodeVisitor on the nodeList to eliminate the tags.
      Then apply Translate.decode to change the character references to real characters.

       
    • wjsjw

      wjsjw - 2007-11-16

      Dear Derrick Oswald:
         Thank you for your help. It works well. :)
      Code:
      StringBean sb = new StringBean();
      for (int i = 0; i < nodeList.size(); i++) {
          // String notices = ((Div)nodeList.elementAt(i)).getStringText();
          nodeList.visitAllNodesWith(sb);
          System.out.println(sb.getStrings());
      }

       
      • Derrick Oswald

        Derrick Oswald - 2007-11-16

        You shouldn't need to loop over the nodeList; a single call to nodeList.visitAllNodesWith(sb) should do it.

         
    • wjsjw

      wjsjw - 2007-11-17

      Dear Oswald:
        Thank you very much. I get it.

      Code:

      StringBean sb = new StringBean();
      nodeList.visitAllNodesWith(sb);
      System.out.println(sb.getStrings());

      best wishes

      wjsjw
      2007-11-17
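
For readers who want to see what the cleanup amounts to on the sample from the question (<p>&nbsp text1<br>&nbsp text2</p>), here is a rough plain-Java sketch. It does not use HtmlParser and is not how StringBean or Translate.decode are implemented. It is a regex stand-in that only handles this kind of simple markup. The class name TagCleanupSketch and its methods are made up for illustration, and replacing &nbsp with an ordinary space (rather than U+00A0) is an assumption made to keep the output easy to read.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: a regex stand-in for the
// "strip the tags, then decode the character references" idea in this thread.
class TagCleanupSketch {
    // Anything that looks like a tag, e.g. <p>, </p>, <br>, <br/>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Tags that should end a line of text: <br> variants and closing </p>.
    private static final Pattern BREAK = Pattern.compile("(?i)<br\\s*/?>|</p>");

    // Step 1: turn line-ending tags into newlines, then drop all other tags.
    static String stripTags(String html) {
        String withBreaks = BREAK.matcher(html).replaceAll("\n");
        return TAG.matcher(withBreaks).replaceAll("");
    }

    // Step 2: replace the &nbsp reference (with or without the semicolon)
    // by an ordinary space. Other character references are left untouched.
    static String decodeNbsp(String text) {
        return text.replaceAll("&nbsp;?", " ");
    }

    // Both steps, followed by trimming each line and skipping empty ones.
    static String clean(String html) {
        String text = decodeNbsp(stripTags(html));
        StringBuilder out = new StringBuilder();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(trimmed);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "text1" and "text2" on separate lines.
        System.out.println(clean("<p>&nbsp text1<br>&nbsp text2</p>"));
    }
}
```

For real pages, the StringBean approach shown in the thread is the better choice, since a regex like this will break on scripts, comments, and attributes that contain ">".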

       
