Re: [Htmlunit-user] Previously working connection no longer works

Hi Oscar,

I will try a simple answer: if you want to automate a web page, it helps to know as much as possible about all the strange technologies that work together to bring the content to your screen, at least
* HTML
* CSS
* JavaScript
* XML

Usually I use the developer tools from Firefox (or Chrome, if you like to sponsor this company ;-) to analyze the page a bit. You can start with a right click on the content you are interested in and select Inspect Element. The inspector shows the page as it is after the JavaScript has run, so you will see elements there that are not in the raw HTML source.
Then you have to decide on a way to locate the element. The different options are introduced at http://htmlunit.sourceforge.net/gettingStarted.html under the "Finding a specific element" topic.
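
To make the idea concrete, here is a minimal sketch of how a selector follows from what the inspector shows. The markup below is an assumption: a simplified, hypothetical fragment shaped the way the selector "#Canonical-SMILES .section-content .section-content-item p" implies, not a copy of the real page. To keep it runnable without HtmlUnit or a network connection it uses only the JDK's XML and XPath classes, so it is a stand-in for page.querySelectorAll(...), and the class name LocateByIdSketch is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocateByIdSketch {

    // Hypothetical, simplified fragment of the rendered DOM as the
    // inspector might show it (assumed structure, not the real page).
    static final String SAMPLE =
          "<section id=\"Canonical-SMILES\">"
        + "<h4>2.1.4 Canonical SMILES</h4>"
        + "<div class=\"section-content\">"
        + "<div class=\"section-content-item\">"
        + "<p>COC1=CC2=C(C=C1)NC3=C2CCNC3</p>"
        + "</div></div></section>";

    // Walks from the element with the id down to the paragraph,
    // mirroring the CSS selector step by step.
    static String findSmiles(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // XPath counterpart of the CSS selector. Simplification: @class is
        // compared as a whole string, which only works here because each
        // element in the sample carries a single class.
        String xpath = "//*[@id='Canonical-SMILES']"
                + "//*[@class='section-content']"
                + "//*[@class='section-content-item']//p";

        // evaluate(String, Object) returns the text of the first match.
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findSmiles(SAMPLE));
    }
}
```

The point is only the workflow: find an id or class near your content in the inspector, then build the path from that anchor down to the text you want.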

Hope that helps

	RBRi

On Sat, 21 Mar 2020 01:27:07 -0500 Oscar Bastidas wrote:
>
>Hi Ronald,
>
>I had one other quick question if it's not too much trouble: would you
>please tell me how you knew to search for "#Canonical-SMILES" in the code
>you sent me?  Since the word "SMILES" is nowhere to be found in the source
>HTML, I was curious as to how you knew to search specifically for
>"#Canonical-SMILES"
>in the actual Java code (knowing this would help me scrape for other
>strings on the dynamic webpage).
>
>Lastly, are there any resources you specifically recommend or liked for
>learning how to do the kind of HTMLUnit webscraping you've helped me with
>here?  If it's just reading general tutorials online, that's ok, it's where
>I'm starting now.  Thanks again.
>
>Oscar B.
>
>On Sun, Mar 15, 2020, 9:22 AM Ronald Brill <rb...@rb...> wrote:
>
>> Hi Oscar,
>>
>> this code works for me
>>
>>     public static void main(String[] args) throws Exception {
>>         String uri = "https://pubchem.ncbi.nlm.nih.gov/compound/1868";
>>
>>         try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
>>           // do not stop on js errors
>>           webClient.getOptions().setThrowExceptionOnScriptError(false);
>>           // do not log js errors (usually using this is a bad idea,
>>           // at least if you are hunting for problems).
>>           webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
>>
>>           HtmlPage page = webClient.getPage(uri);
>>           webClient.waitForBackgroundJavaScriptStartingBefore(10_000);
>>
>>           final DomNodeList<DomNode> divs = page.querySelectorAll(
>>               "#Canonical-SMILES .section-content .section-content-item p");
>>           for (DomNode div : divs) {
>>               System.out.println("----------------");
>>               System.out.println(div.asXml());
>>               System.out.println("----------------");
>>               System.out.println(div.asText());
>>               System.out.println("----------------");
>>           }
>>         }
>>     }
>>
>>         RBRi
>>
>>
>> On Tue, 10 Mar 2020 12:54:31 -0500 Oscar Bastidas wrote:
>> >
>> >Hello,
>> >
>> >I am trying to make a copy of/obtain a string that appears on a webpage
>> >when the webpage loads on my browser but when I look at the HTML code of
>> >the webpage in question, I do not see the string at all (it is nowhere to
>> >be found in the HTML code).
>> >
>> >Here is the URL:
>> >https://pubchem.ncbi.nlm.nih.gov/compound/1868
>> >
>> >and here is my target string:
>> >COC1=CC2=C(C=C1)NC3=C2CCNC3
>> >
>> >The above target string is found under the heading of "2.1.4 Canonical
>> >SMILES" (this heading doesn't appear either in the HTML code).
>> >
>> >Could someone please tell me if this is a special case that cannot be
>> >scraped?  Thanks.
>> >
>> >Oscar B.
>> >
>> >_______________________________________________
>> >Htmlunit-user mailing list
>> >Htm...@li...
>> >https://lists.sourceforge.net/lists/listinfo/htmlunit-user
>> >
>> >
>>
>>
>
>
