From: Ronald B. <rb...@rb...> - 2020-03-22 12:55:09
Hi Oscar,

that is not so uncommon. From time to time providers change their web pages by switching to another JS library or to a newer version of one. In your case it looks like the page's JS code has changed and now uses the Fetch API. This API is not supported by HtmlUnit at the moment.

With a bit of luck you can ignore this. Add

    webClient.getOptions().setThrowExceptionOnScriptError(false);

directly after the webClient creation.

Hope that helps. If not, please open an issue on GitHub and I will have a deeper look.

RBRi

On Sat, 21 Mar 2020 01:27:07 -0500 Oscar Bastidas wrote:
>Hello,
>
>I have been trying to build a web scraping tool to obtain a string from a
>dynamically-loaded webpage.
>
>My objective was to obtain the string COC1=CC2=C(C=C1)NC3=C2CCNC3 from the
>section titled "Canonical SMILES" from the following website:
>https://pubchem.ncbi.nlm.nih.gov/compound/1868
>
>Previously, I had working HTMLUnit code to access the above (thanks
>Ronald), but now the code is not working! Whereas before I would get
>printouts to the screen of my target information (COC1=CC2=C(C=C1)NC3=C2CCNC3
>from the "Canonical Smiles" section of the above website - this information
>is displayed dynamically on the website), now instead, HTMLUnit returns the
>following error:
>
>*SEVERE: ReferenceError: "fetch" is not defined*
>
>Looking this error up, it seems this "fetch" has something to do with
>requests and responses across the network when accessing the webpage. In
>short, it seems like it's something implemented on the end of the owner of
>the web page, not something I can inadvertently modify, and viewing the
>webpage on a normal browser, the website looks fine, just as it always has
>in the past.
>
>Would someone please tell me what is going on here? The code was working
>perfectly one minute, but then yielded the above fetch error the next.
>
>Here is the main method of my previously functional code:
>
>public static void main(String[] args) throws Exception {
>    String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";
>
>    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
>        // do not stop on js errors
>        webClient.getOptions().setThrowExceptionOnScriptError(false);
>        // do not log js errors (usually using this is a bad idea, at least
>        // if you are hunting for problems).
>        webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
>
>        HtmlPage page = webClient.getPage(uri);
>        webClient.waitForBackgroundJavaScriptStartingBefore(10_000);
>
>        final DomNodeList<DomNode> divs =
>            page.querySelectorAll("#Canonical-SMILES .section-content .section-content-item p");
>        for (DomNode div : divs) {
>            System.out.println("----------------");
>            System.out.println(div.asXml());
>            System.out.println("----------------");
>            System.out.println(div.asText());
>            System.out.println("----------------");
>        }
>    }
>}
>
>Oscar B.
>
>_______________________________________________
>Htmlunit-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlunit-user
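
For reference, the suggestion in the reply (set the option directly after the webClient creation) can be sketched in isolation as below. This is only a minimal sketch, not a fix for the missing Fetch API. It assumes an HtmlUnit 2.x jar on the classpath, where the classes live in com.gargoylesoftware.htmlunit (newer 3.x releases use org.htmlunit instead). The class name LenientWebClientSketch and the helper createLenientClient are made up for illustration. Note that the quoted main method already contains the same call.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;

// Hypothetical class name, used only for this sketch.
class LenientWebClientSketch {

    // Hypothetical helper: creates the client and immediately relaxes
    // script error handling, as suggested in the reply.
    static WebClient createLenientClient() {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
        // Script errors (such as the undefined "fetch") should no longer be
        // thrown as exceptions while a page is being loaded.
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        return webClient;
    }

    public static void main(String[] args) {
        // WebClient is AutoCloseable, so try-with-resources releases it.
        try (WebClient webClient = createLenientClient()) {
            System.out.println("throwExceptionOnScriptError = "
                    + webClient.getOptions().isThrowExceptionOnScriptError());
        }
    }
}
```

The sketch makes no network request. It only shows where the option is set relative to the client creation. Whether the page content still loads without fetch support depends on the page, which is why the reply says "with a bit of luck".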