[Htmlunit-user] Getting text from AngularJS site

Hello,

We extract text from various websites, including a Danish site for recipes. With HtmlUnit 2.34.1 we were first blocked by a NullPointerException, but this is now fixed [2].

However, we still cannot get the text from the HTML. With debug logging on, we do see the recipe text of [1] being downloaded from an API [3]. It is then up to JavaScript to inject that text into the DOM, which does not appear to happen.

I have tried many variations of the code, with different waits and very long waits, but nothing seems to work. There are exceptions, but they seem unrelated.

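    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebResponse;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setCssEnabled(false);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setDownloadImages(false);
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);
    client.getOptions().setPrintContentOnFailingStatusCode(false);
    client.getOptions().setUseInsecureSSL(true);
    client.getOptions().setRedirectEnabled(false);
    client.setJavaScriptTimeout(15000);
    // No page has been loaded yet, so these two waits have nothing to wait for.
    client.waitForBackgroundJavaScript(10000L);
    client.waitForBackgroundJavaScriptStartingBefore(10000L);
    HtmlPage page = client.getPage(url);
    // Pause for the configured timeout before inspecting the page.
    synchronized (page) {
      try {
        page.wait(conf.getInt("htmlunit.javascript.timeout", 15000));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
    }
    // Give pending background JavaScript jobs time to finish.
    client.waitForBackgroundJavaScript(10000L);
    client.waitForBackgroundJavaScriptStartingBefore(10000L);
    WebResponse webResponse = page.getWebResponse();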

The text I am interested in is in the element <section id="sectionDetailsMain">, but that element is never created or added to the DOM. Can anyone help me get the HTML properly populated by JavaScript?
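One alternative to a fixed-length `page.wait(...)` is to poll until the target element actually exists, letting background jobs run between checks. Below is a minimal, stdlib-only sketch of that polling loop. `PollUntil` and `waitUntil` are hypothetical names, not part of HtmlUnit. The assumed HtmlUnit wiring would pass `() -> page.getElementById("sectionDetailsMain") != null` as the condition and `() -> client.waitForBackgroundJavaScript(500)` as the pump. Whether the AngularJS app ever creates the element under HtmlUnit is exactly the open question, so this only bounds the wait and does not guarantee a result.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: pump background work until a condition holds or a deadline passes. */
public final class PollUntil {
    private PollUntil() {}

    /**
     * Returns true as soon as {@code condition} holds, false if {@code timeoutMillis}
     * elapses first. {@code pump} runs between checks so pending work can progress;
     * {@code stepMillis} is a short sleep that avoids a busy loop when nothing is pending.
     */
    public static boolean waitUntil(BooleanSupplier condition, Runnable pump,
                                    long timeoutMillis, long stepMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the condition never became true
            }
            pump.run();
            try {
                Thread.sleep(stepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and give up
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for a DOM that only gets populated after three rounds of background work.
        int[] rounds = {0};
        boolean found = waitUntil(() -> rounds[0] >= 3, () -> rounds[0]++, 1_000L, 10L);
        System.out.println("found=" + found + " after " + rounds[0] + " rounds");
    }
}
```

Checking the condition before each pump means the loop returns immediately if the element is already present, and the deadline caps the total wait however many jobs the page keeps scheduling.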

Many thanks,
Markus

[1] https://www.aarstiderne.com/find-din-maaltidskasse/kvikkassen
[2] https://sourceforge.net/p/htmlunit/bugs/2008/
[3] https://www.aarstiderne.com/umbraco/api/productapi/Products?url=maaltidskasser
