I want to extract all the words from a webpage (like the stringextractor example does). However, instead of passing a URL to the parser, I want to send a full String representation of the HTML page to the parser.
Something like: Parser parser = new Parser("<html><div><b>text</b><img src=\"ytuy\"/></div></html>");
Would return: text
Is this possible or can it only take a URL or a file location?
If it is possible, how do I get the Strings returned? Which method of the Parser class returns the words from the page?
Thanks
Just to add to my previous post: if it is not possible to pass the HTML String to the parser, can someone give me an idea of how I would specify a file location and retrieve all words in a page? I need an "as simple as possible" method to pass either an HTML String or a file location and retrieve all words on a page.
Many thanks
Hi there,
Did you figure out how to pass a String to the parser, i.e. not a URL or a file? I would really appreciate the help.
Thanks,
Dejan
Hi,
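Here is the solution, which was posted on this forum.
import org.htmlparser.Parser;
import org.htmlparser.beans.StringBean;

// The HTML to parse, held in a String rather than fetched from a URL.
String htmlString = "<html><div><b>text</b><img src=\"ytuy\"/></div></html>";
Parser par = new Parser(htmlString);
// Alternatively, create an empty parser and set the HTML afterwards:
//Parser par = new Parser();
//par.setInputHTML(htmlString);
StringBean sb = new StringBean();
sb.setLinks(false); // leave link URLs out of the extracted text
par.visitAllNodesWith(sb); // may throw ParserException
System.out.println(sb.getStrings()); // getStrings() returns the collected text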
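For comparison, here is a rough sketch of the same idea using only the JDK, in case adding the HTMLParser jar is not an option. Note that this swaps in a different technique: it uses the Swing HTML parser (ParserDelegator with an HTMLEditorKit.ParserCallback), not HTMLParser's Parser or StringBean. The class name HtmlWords and the methods extractText and extractTextFromFile are made up for this illustration, and the file variant simply assumes the file is UTF-8 and reads it into a String first.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class for illustration; not part of HTMLParser.
class HtmlWords {

    // Collects the non-blank text chunks of an HTML string, in document order.
    static List<String> extractText(String html) {
        final List<String> chunks = new ArrayList<>();
        HTMLEditorKit.ParserCallback collector = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (!text.isEmpty()) {
                    chunks.add(text);
                }
            }
        };
        try {
            // Third argument: ignore any charset declared inside the markup.
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    // File variant: read the whole file (assumed UTF-8) and reuse extractText.
    static List<String> extractTextFromFile(Path file) {
        try {
            String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return extractText(html);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><div><b>text</b><img src=\"ytuy\"/></div></html>";
        System.out.println(extractText(html));
    }
}
```

Reading the file into a String first keeps a single code path for both cases, which is about as simple as it gets. The same trick should carry over to the HTMLParser version above: load the file contents into htmlString and pass that to the parser.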