From: Ronald B. <rb...@rb...> - 2020-03-15 14:22:49
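Hi Oscar,

the page builds that section with JavaScript, so the string is not in the
raw HTML. This code works for me; it loads the page, waits for the
background JavaScript, and then queries the rendered DOM:
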
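    // Package names as used by HtmlUnit 2.x.
    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomNode;
    import com.gargoylesoftware.htmlunit.html.DomNodeList;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.javascript.SilentJavaScriptErrorListener;

    // The class name is arbitrary.
    public class PubChemScrape {
        public static void main(String[] args) throws Exception {
            String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";
            try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
                // do not stop on js errors
                webClient.getOptions().setThrowExceptionOnScriptError(false);
                // do not log js errors (usually using this is a bad idea, at least
                // if you are hunting for problems).
                webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

                HtmlPage page = webClient.getPage(uri);
                // give background JavaScript up to 10 seconds to render the content
                webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

                // the selector matches <p> elements inside the Canonical SMILES section
                final DomNodeList<DomNode> paragraphs = page.querySelectorAll(
                        "#Canonical-SMILES .section-content .section-content-item p");
                for (DomNode paragraph : paragraphs) {
                    System.out.println("----------------");
                    System.out.println(paragraph.asXml());
                    System.out.println("----------------");
                    System.out.println(paragraph.asText());
                    System.out.println("----------------");
                }
            }
        }
    }
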
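
If the fixed 10-second wait turns out to be too short on a slow connection, one option is to poll for the element instead of waiting once. The following is only a minimal sketch in plain Java; `WaitUntil` and `waitUntil` are hypothetical names, not part of HtmlUnit, and the demo condition in `main` merely stands in for a real check against the page.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: polls a condition until it
// becomes true or the timeout expires.
public class WaitUntil {

    /**
     * @return true if the condition became true before the timeout,
     *         false on timeout or if the thread was interrupted
     */
    public static boolean waitUntil(BooleanSupplier condition,
                                    long timeoutMillis,
                                    long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true roughly 300 ms after start.
        final long start = System.currentTimeMillis();
        boolean met = waitUntil(
                () -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println("condition met: " + met);
    }
}
```

With HtmlUnit, the condition could be something like `() -> !page.querySelectorAll(selector).isEmpty()`, where `selector` is the CSS selector from the code above. Whether polling is actually needed depends on how the page schedules its scripts.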
RBRi
On Tue, 10 Mar 2020 12:54:31 -0500 Oscar Bastidas wrote:
>
>Hello,
>
>I am trying to obtain a string that appears on a webpage when the page
>loads in my browser, but when I look at the HTML code of the webpage in
>question, I do not see the string at all (it is nowhere to be found in
>the HTML code).
>
>Here is the URL:
>https://pubchem.ncbi.nlm.nih.gov/compound/1868
>
>and here is my target string:
>COC1=CC2=C(C=C1)NC3=C2CCNC3
>
>The above target string is found under the heading of "2.1.4 Canonical
>SMILES" (this heading doesn't appear either in the HTML code).
>
>Could someone please tell me if this is a special case that cannot be
>scraped? Thanks.
>
>Oscar B.