I am attempting to scrape a third-party website that uses Polyfill.
As reported in https://sourceforge.net/p/htmlunit/bugs/1930/, HtmlUnit does not appear to support web pages that render Polyfill components.
The following minimal test (with HtmlUnit 2.33) illustrates the issue:

    import static com.gargoylesoftware.htmlunit.BrowserVersion.BEST_SUPPORTED;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.io.IOException;
    import static org.hamcrest.CoreMatchers.containsString;
    import static org.junit.Assert.assertThat;
    import org.junit.Test;

    public final class PolymerTest {

        @Test
        public void homePage() {
            try (final WebClient wc = new WebClient(BEST_SUPPORTED)) {
                wc.setAjaxController(new NicelyResynchronizingAjaxController());
                final HtmlPage result = wc.getPage(
                        "http://webcomponents.github.io/hello-world-polymer/bower_components/hello-world-polymer/");
                wc.waitForBackgroundJavaScript(10000);
                wc.waitForBackgroundJavaScriptStartingBefore(10000);
                assertThat(result.asText(), containsString("Hello Unicorn :)"));
            } catch (final IOException ex) {
                throw new IllegalStateException(ex);
            }
        }
    }

Any advice much appreciated.
Can I just ask the developers if this is something likely to be resolvable in the near term? The other third-party sites scraped by this project all use HtmlUnit with great success, so I'm reluctant to add another dependency if it might be feasible with HtmlUnit. Thanks.
This is a duplicate of https://github.com/HtmlUnit/htmlunit/issues/23. Will track the status on GitHub.