Re: [Htmlunit-user] Characters on Webpage not Appearing in HTM

Hi Oscar,

this code works for me:

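    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomNode;
    import com.gargoylesoftware.htmlunit.html.DomNodeList;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.javascript.SilentJavaScriptErrorListener;

    public class PubChemSmiles {

        public static void main(String[] args) throws Exception {
            String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";

            try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
                // do not stop on js errors
                webClient.getOptions().setThrowExceptionOnScriptError(false);
                // do not log js errors (usually a bad idea, at least while
                // you are hunting for problems)
                webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());

                HtmlPage page = webClient.getPage(uri);
                // the SMILES section is not in the static HTML; it is filled in
                // by JavaScript, so give background scripts up to 10 s to run
                webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

                final DomNodeList<DomNode> paragraphs =
                        page.querySelectorAll("#Canonical-SMILES .section-content .section-content-item p");
                for (DomNode paragraph : paragraphs) {
                    System.out.println("----------------");
                    System.out.println(paragraph.asXml());
                    System.out.println("----------------");
                    System.out.println(paragraph.asText());
                    System.out.println("----------------");
                }
            }
        }
    }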
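The SMILES string is presumably missing from the HTML source because the page fills that section in with JavaScript after the initial HTML arrives, which is why the code above lets HtmlUnit run the scripts and waits before querying. The effect can be reproduced offline with a minimal sketch. It assumes HtmlUnit 2.x (the `com.gargoylesoftware.htmlunit` packages) and uses its `MockWebConnection` to serve two in-memory responses; the class `ScriptRenderedDemo`, the placeholder page, and the `/data.txt` URL are hypothetical and only mimic the pattern, not PubChem's actual markup or requests.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;

public class ScriptRenderedDemo {

    static final String SMILES = "COC1=CC2=C(C=C1)NC3=C2CCNC3";

    // Hypothetical stand-in page: the HTML only holds an empty placeholder,
    // and an inline script fetches the value from a second (mocked) URL.
    static final String PAGE_HTML = "<html><head><title>demo</title></head><body>"
            + "<section id='Canonical-SMILES'><p></p></section>"
            + "<script>"
            + "var xhr = new XMLHttpRequest();"
            + "xhr.open('GET', '/data.txt');"
            + "xhr.onload = function() {"
            + "  document.querySelector('#Canonical-SMILES p').textContent = xhr.responseText;"
            + "};"
            + "xhr.send();"
            + "</script>"
            + "</body></html>";

    /** Loads the mocked page, lets its scripts run, and returns the text of the first match. */
    static String renderedText(String selector) {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // Serve both URLs from memory instead of the network.
            MockWebConnection connection = new MockWebConnection();
            URL pageUrl = new URL("http://localhost/demo.html");
            connection.setResponse(pageUrl, PAGE_HTML);
            connection.setResponse(new URL("http://localhost/data.txt"), SMILES, "text/plain");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(pageUrl);
            // Give the asynchronous XMLHttpRequest a chance to complete.
            webClient.waitForBackgroundJavaScriptStartingBefore(1_000);

            DomNode node = page.querySelector(selector);
            return node == null ? null : node.asText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The HTML source alone does not contain the value...
        System.out.println("in source:   " + PAGE_HTML.contains(SMILES));
        // ...but the DOM does, once JavaScript has run.
        System.out.println("in live DOM: " + renderedText("#Canonical-SMILES p"));
    }
}
```

Running `main` should report that the value is absent from the source but present in the live DOM. Against the real site, the differences are the URL, a longer wait, and the selector used in the reply.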

	RBRi

On Tue, 10 Mar 2020 12:54:31 -0500 Oscar Bastidas wrote:
>
>Hello,
>
>I am trying to obtain a string that appears on a webpage when the page
>loads in my browser, but when I look at the HTML source of the page, I
>cannot find the string anywhere.
>
>Here is the URL:
>https://pubchem.ncbi.nlm.nih.gov/compound/1868
>
>and here is my target string:
>COC1=CC2=C(C=C1)NC3=C2CCNC3
>
>The above target string is found under the heading "2.1.4 Canonical
>SMILES" (this heading does not appear in the HTML source either).
>
>Could someone please tell me if this is a special case that cannot be
>scraped?  Thanks.
>
>Oscar B.
>
>_______________________________________________
>Htmlunit-user mailing list
>Htm...@li...
>https://lists.sourceforge.net/lists/listinfo/htmlunit-user
>
>
