Extract Strings - Pass html String to parser?

Help
mscwd
2007-09-05
2013-04-27
  • mscwd

    mscwd - 2007-09-05

    I want to extract all the words from a webpage (like the StringExtractor example does). However, instead of passing a URL to the parser, I want to pass it a String containing the full HTML of the page.

    Something like: Parser parser = new Parser("<html><div><b>text</b><img src=\"ytuy\"/></div></html>");

    Would return: text

    Is this possible or can it only take a URL or a file location?

    If it is possible, how do I get the Strings back? Which method of the Parser class returns the words from the page?

    Thanks

     
    • mscwd

      mscwd - 2007-09-05

      Just to add to my previous post: if it is not possible to pass the HTML String to the parser, can someone give me an idea of how to specify a file location and retrieve all the words in a page? I need the simplest possible way to pass either an HTML String or a file location and retrieve all the words on the page.

      Many thanks

       
    • Dejan Miljkovic

      Dejan Miljkovic - 2008-03-28

      Hi There,

      Did you figure out how to pass a String to the parser, i.e. not a URL or file? I would really appreciate the help.

      Thanks,

      Dejan

       
    • Dejan Miljkovic

      Dejan Miljkovic - 2008-03-28

      Hi,

      Here is the solution that was posted on this forum:

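      import org.htmlparser.Parser;
      import org.htmlparser.beans.StringBean;

      String htmlString = "<html><div><b>text</b><img src=\"ytuy\"/></div></html>";
      Parser par = new Parser(htmlString);
      // Alternative: create an empty parser and set the HTML afterwards.
      //Parser par = new Parser();
      //par.setInputHTML(htmlString);
      StringBean sb = new StringBean();
      sb.setLinks(false); // do not include link URLs in the extracted text
      par.visitAllNodesWith(sb); // may throw ParserException
      System.out.println(sb.getStrings());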
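
The original question asked for the individual words, while getStrings() returns the page text as a single String. One possible way to split that String into words with the plain JDK is sketched below. The class name WordSplitter and the method splitWords are hypothetical names used only for illustration, and the sample text stands in for whatever getStrings() actually returns.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitter {
    // Hypothetical helper: splits extracted page text into words.
    public static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty leading token; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by sb.getStrings() (an assumption).
        String extracted = "text\nmore text, here";
        System.out.println(splitWords(extracted)); // prints [text, more, text, here]
    }
}
```

Note that this simple pattern also splits on apostrophes and hyphens, so "don't" becomes two tokens; adjust the regular expression if that matters for your pages.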

       
