I am using HtmlUnit for text extraction.
final WebClient webClient = new WebClient();
final HtmlPage page = webClient.getPage("http://www.latimes.com/topic/politics/government/barack-obama-PEPLT007408-topic.html");
final String pageAsText = page.asText();
System.out.println(pageAsText);
It gives me this error:
Exception in thread "main" ======= EXCEPTION START ========
EcmaError: lineNumber=[11] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[http://www.trbas.com/jive/prod/common/javascripts/mainInit.1q2w3_13a2f8e11fc6a6ec0e18d71a6f3e8dd9.min.js] message=[TypeError: Cannot find function addEventListener in object [object HTMLDocument]. (http://www.trbas.com/jive/prod/common/javascripts/mainInit.1q2w3_13a2f8e11fc6a6ec0e18d71a6f3e8dd9.min.js#11)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function addEventListener in object [object HTMLDocument]. (http://www.trbas.com/jive/prod/common/javascripts/mainInit.1q2w3_13a2f8e11fc6a6ec0e18d71a6f3e8dd9.min.js#11)
at
IE8 doesn't support addEventListener.
Please use:
Hey,
Thanks for the prompt reply. When I try with
WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_11);
final HtmlPage page = webClient.getPage("http://www.latimes.com/topic/politics/government/barack-obama-PEPLT007408-topic.html");
System.out.println(page.asText());
It returns:
Please update to a modern browser for the best Los Angeles Times viewing experience.
or, you can view an alternate view of this site on your current browser by clicking here.
Is it the case that HtmlUnit only works for particular browsers?
I see. It would be better if you could isolate the root cause, as described in http://htmlunit.sourceforge.net/submittingJSBugs.html
Have done a test with the latest code from SVN. Now HtmlUnit is able to get the content of the page.
I am still getting the error above. Using 2.19-SNAPSHOT. Using the same test page you used above. Looks like this is still a bug.
Please consider reopening this ticket.
Or if you like I will open another.
Last edit: Vernon Singleton 2015-11-10
You have to specify a different browser (see Ahmed's comment above).
I have tried with INTERNET_EXPLORER_11 and FIREFOX_31, as noted below, both fail with the same error noted above using 2.19-SNAPSHOT.
Last edit: Vernon Singleton 2015-11-10
Sorry Vernon, but I can't reproduce your problem with your code. Your sample code works fine for me. Can you please check your classpath?
Wow, please excuse my stupidity. It was my fault. In addition to the above code, I also overlooked this block of code in my class:
Obviously, this causes the issue noted in this thread. As you said, you need to use a browser other than the default. Thank you for the quick response. So please ignore my previous request and keep this ticket closed as solved.
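For anyone landing on this ticket later, here is a minimal sketch of the approach discussed above: pass a browser version explicitly instead of relying on the default. It assumes an HtmlUnit release from around 2.19 on the classpath (the com.gargoylesoftware.htmlunit package, with the INTERNET_EXPLORER_11 constant used earlier in this thread). The class name TextExtraction and the helper newClient() are made up for illustration, and whether the site serves its full content to the emulated browser is a separate question.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class TextExtraction {

    // Hypothetical helper: build a client that emulates a browser other than
    // the default, so that document.addEventListener is available to page scripts.
    static WebClient newClient() {
        return new WebClient(BrowserVersion.INTERNET_EXPLORER_11);
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the client and its windows when done.
        try (WebClient webClient = newClient()) {
            HtmlPage page = webClient.getPage(
                "http://www.latimes.com/topic/politics/government/barack-obama-PEPLT007408-topic.html");
            System.out.println(page.asText());
        }
    }
}
```

If the error still appears, it may be worth searching the rest of the class for another WebClient that is created without a browser version, which is what happened in my case.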
Ok no problem. Enjoy using HtmlUnit.
Related
Bugs:
#1615