Thread: [Htmlunit-user] Previously working connection no longer works


Hello,

I have been trying to build a web scraping tool to obtain a string from a
dynamically-loaded webpage.

My objective was to obtain the string COC1=CC2=C(C=C1)NC3=C2CCNC3 from the
section titled "Canonical SMILES" from the following website:
https://pubchem.ncbi.nlm.nih.gov/compound/1868

Previously, I had working HtmlUnit code to do this (thanks, Ronald), but it
has stopped working. Before, it printed my target string
(COC1=CC2=C(C=C1)NC3=C2CCNC3, from the "Canonical SMILES" section, which the
website displays dynamically). Now HtmlUnit reports the following error
instead:

*SEVERE: ReferenceError: "fetch" is not defined*

Looking this error up, "fetch" appears to be the JavaScript API that a
page's scripts use to make network requests and read the responses. So the
call seems to come from the page's own scripts, which are controlled by the
owner of the website and are not something I could have modified
inadvertently. In a normal browser the page still looks fine, just as it
always has.

Would someone please tell me what is going on here?  The code was working
perfectly one minute, but then yielded the above fetch error the next.

Here is the main method of my previously functional code:

public static void main(String[] args) throws Exception {
    String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";

    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // do not stop on js errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // do not log js errors (usually using this is a bad idea, at least
        // if you are hunting for problems).
        webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

        HtmlPage page = webClient.getPage(uri);
        webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

        final DomNodeList<DomNode> divs = page.querySelectorAll(
                "#Canonical-SMILES .section-content .section-content-item p");
        for (DomNode div : divs) {
            System.out.println("----------------");
            System.out.println(div.asXml());
            System.out.println("----------------");
            System.out.println(div.asText());
            System.out.println("----------------");
        }
    }
}
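
As a possible fallback while the fetch error is unresolved, here is a
minimal sketch that skips the page's JavaScript entirely and asks PubChem's
PUG REST service for the value with a plain HTTP GET. It uses only the JDK's
java.net.http client (Java 11 or later). The class name SmilesLookup and the
helpers buildUrl and fetchSmiles are hypothetical names of my own, and I am
assuming that the URL layout and the property name CanonicalSMILES are still
accepted by the service. I have not verified what it returns.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SmilesLookup {

    // Assumed PUG REST URL layout; the property name may differ on the
    // current service, so treat this as a starting point only.
    static String buildUrl(int cid) {
        return "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
                + cid + "/property/CanonicalSMILES/TXT";
    }

    // Plain HTTP GET; no JavaScript is executed, so "fetch" is never needed.
    static String fetchSmiles(int cid) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(cid)))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        // The TXT format is expected to be a bare value plus a newline.
        return response.body().trim();
    }

    public static void main(String[] args) throws Exception {
        // 1868 is the compound ID from the page URL above.
        System.out.println(fetchSmiles(1868));
    }
}
```

This does not explain why HtmlUnit fails on the page, so I would still like
to understand the error itself.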

Oscar B.
