[Htmlunit-user] Nested Frames with ASPX


Hello,

This is my first contribution to the list. I am a hobbyist and am trying to scrape a site that shows a table with some information I am interested in.

As you can see in the code below, I load a page, click some links and submit a form. This works well.

Then things get complicated:

In a browser you see a button, which is created by a server-side script (ASPX). Clicking the button executes another server-side script. This is how I get the link to that second server-side script:
link = page.getByXPath("//a[@onclick]")[0]
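Indexing the result of getByXPath with [0] or [1] fails with a bare IndexError as soon as the site changes and nothing matches. A small helper can report which XPath failed. This is only a sketch: first_match is a hypothetical helper name (not part of HtmlUnit), and it assumes nothing beyond getByXPath returning an indexable list, which a java.util.List is in Jython.

```python
def first_match(page, xpath, index=0):
    """Return the index-th node matching xpath, or raise a descriptive error.

    first_match is a hypothetical helper. page can be anything with a
    getByXPath(xpath) method that returns an indexable list, such as an
    HtmlUnit HtmlPage used from Jython.
    """
    nodes = page.getByXPath(xpath)
    if len(nodes) <= index:
        # Name the XPath and the number of hits instead of a bare IndexError.
        raise LookupError("expected at least %d match(es) for %s, found %d"
                          % (index + 1, xpath, len(nodes)))
    return nodes[index]
```

In the script, link = first_match(page, "//a[@onclick]") would then replace the direct [0] indexing.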

Now I have an HTML page. This page contains an iframe that loads a nested server-side script (ASPX). With
frame = page.getFrames().get(0)
and
page = frame.getEnclosedPage()
I successfully retrieve the HTML page of this first frame.
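Rather than chaining getFrames().get(0) by hand at every level, the whole frame tree can be walked recursively so that every nested page and its depth becomes visible at once. A minimal sketch, assuming only that HTML pages expose getFrames() (a list of frame windows with getEnclosedPage()), as HtmlUnit's HtmlPage does. walk_frames is a hypothetical helper. Pages without getFrames(), such as a JavaScriptPage, are treated as leaves.

```python
def walk_frames(page, depth=0):
    """Return a list of (depth, page) for page and every page nested in its frames.

    walk_frames is a hypothetical helper. Objects without a getFrames()
    method (e.g. a JavaScriptPage) are treated as leaves of the tree.
    """
    result = [(depth, page)]
    if hasattr(page, "getFrames"):
        for frame in page.getFrames():
            inner = frame.getEnclosedPage()
            if inner is not None:
                # Descend into the frame's page, one level deeper.
                result.extend(walk_frames(inner, depth + 1))
    return result
```

Looping over walk_frames(page) and printing each page's getUrl() with an indent of depth levels lists the complete nesting, including the frames that come back as JavaScriptPage.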

This HTML page in turn contains two nested frames, each loading another server-side script. But when I try to retrieve their HTML pages, I only get a JavaScriptPage with the content "you cannot open directly."
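Before trying to work around the message, it helps to see which URL each inner frame actually requested and what the server sent back. The sketch below uses getUrl() and getWebResponse() from HtmlUnit's Page interface and getContentType() / getContentAsString() from WebResponse. describe_page is a hypothetical helper. Because the text arrives in the response body, the refusal appears to come from the server, possibly based on request headers such as Referer or on session state. That is an assumption to verify, not a diagnosis.

```python
def describe_page(page, limit=200):
    """Return a one-line summary: URL, content type and the start of the body.

    describe_page is a hypothetical helper. page must provide getUrl() and
    getWebResponse(), as HtmlUnit Page objects do.
    """
    response = page.getWebResponse()
    body = response.getContentAsString() or ""
    # Keep the summary on one line and reasonably short.
    snippet = body[:limit].replace("\n", " ")
    return "%s [%s] %s" % (page.getUrl(), response.getContentType(), snippet)
```

Printing describe_page(page1) right after the frame is loaded shows whether the inner frame really was requested from the expected ASPX URL.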

To me, this ugly nesting of frames and server-side scripts looks like it is meant to prevent scraping.

Can anybody help me?

Regards,

Tobias 

Here is my code:

import com.gargoylesoftware.htmlunit.WebClient as WebClient

import com.gargoylesoftware.htmlunit.BrowserVersion as BrowserVersion

def main():
   webclient = WebClient(BrowserVersion.FIREFOX_17)
   url = "<URL>"

   page = webclient.getPage(url)
   print "new page loaded: "+url
   link = page.getByXPath("//a[@href='index.php?id=733']")[1]

   page = link.click()
   print "link clicked and new page loaded: "+page.getUrl().toString()
   form = page.getByXPath("//form[@action='index.php?id=intern']")[0]
   user = form.getInputByName("user")
   passw = form.getInputByName("pass")
   button = form.getInputByName("submit")
   user.setValueAttribute("cgl")
   passw.setValueAttribute("rattamahatta")

   page = button.click()
   print "form submitted and new page loaded: "+page.getUrl().toString()
   link = page.getByXPath("//a[@href='index.php?id=837']")[0]

   page = link.click()
   print "link clicked and new page loaded: "+page.getUrl().toString()
   link = page.getByXPath("//a[@onclick]")[0]

   page = link.click()
   print "button clicked and new page loaded: "+page.getUrl().toString()
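   # first (outer) iframe of the page opened by the button
   frame = page.getFrames().get(0)

   page = frame.getEnclosedPage()
   print "new iframe loaded: "+page.getUrl().toString()
   # the two frames nested inside the outer iframe
   frames = page.getFrames()

   # this is where a JavaScriptPage ("you cannot open directly.") comes back
   page1 = frames.get(0).getEnclosedPage()
   print "new iframe loaded: "+page1.getUrl().toString()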

#     break

if __name__ == '__main__':
   main()
