From: Oscar B. <oba...@um...> - 2020-03-21 06:29:28
Hello,

I have been trying to build a web scraping tool that extracts a string from a dynamically loaded web page. My goal is to obtain the string COC1=CC2=C(C=C1)NC3=C2CCNC3 from the section titled "Canonical SMILES" on the following page:

https://pubchem.ncbi.nlm.nih.gov/compound/1868

Previously I had working HtmlUnit code for this (thanks, Ronald), but now it no longer works. Before, the target string (which the page loads dynamically) was printed to the screen. Now HtmlUnit instead reports the following error:

*SEVERE: ReferenceError: "fetch" is not defined*

Looking this error up, it seems that "fetch" has something to do with making requests and receiving responses over the network when the page is loaded. In short, it appears to be something implemented by the owner of the web page, not something I could have inadvertently modified. In a normal browser the page looks fine, just as it always has.

Would someone please tell me what is going on here? The code was working perfectly one minute and produced the above fetch error the next.

Here is the main method of my previously functional code:

    public static void main(String[] args) throws Exception {
        String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";

        try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // do not stop on js errors
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // do not log js errors (usually using this is a bad idea, at least
            // if you are hunting for problems).
            webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

            HtmlPage page = webClient.getPage(uri);
            webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

            final DomNodeList<DomNode> divs = page.querySelectorAll("#Canonical-SMILES .section-content .section-content-item p");
            for (DomNode div : divs) {
                System.out.println("----------------");
                System.out.println(div.asXml());
                System.out.println("----------------");
                System.out.println(div.asText());
                System.out.println("----------------");
            }
        }
    }

Oscar B.