From: Ahmed A. <asa...@ya...> - 2013-12-18 15:14:19
Hi Tobias,

- Please use the latest version if you aren't already.
- You need to provide complete details (hopefully small) so others can reproduce your issue.
- Please read http://htmlunit.sourceforge.net/submittingJSBugs.html

Ahmed

________________________________
From: Tobias Ceglarek <ma...@ce...>
To: htm...@li...
Sent: Wednesday, December 18, 2013 4:23 PM
Subject: [Htmlunit-user] Nested Frames with ASPX

Hello,

This is my first contribution to the list. I am a hobbyist trying to scrape a site that gives me a table with some information I am interested in.

As you can see in the code below, I load a page, click some links and submit a form. This works very well. Then things get complicated: in a browser you see a button, which is created by a server-side script (ASPX). Clicking the button executes another server-side script. This is how I get the link to the second server-side script:

link = page.getByXPath("//a[@onclick]")[0]

Now I have an HTML page. This page contains an iframe with a nested server-side script (ASPX). With

frame = page.getFrames().get(0)
page = frame.getEnclosedPage()

I successfully retrieve the HTML page of this first frame. This HTML page in turn nests two more server-side scripts. But if I try to retrieve their HTML page, I just get a JavaScriptPage with the content "you cannot open directly."

To me, this ugly nesting of frames and server-side scripts seems intended to prevent scraping. Can anybody help me?
Regards,
Tobias

Here is my code:

import com.gargoylesoftware.htmlunit.WebClient as WebClient
import com.gargoylesoftware.htmlunit.BrowserVersion as BrowserVersion

def main():
    webclient = WebClient(BrowserVersion.FIREFOX_17)
    url = "<URL>"
    page = webclient.getPage(url)
    print "new page loaded: " + url

    # follow the second link to index.php?id=733 (the login page)
    link = page.getByXPath("//a[@href='index.php?id=733']")[1]
    page = link.click()
    print "link clicked and new page loaded: " + page.getUrl().toString()

    # fill in and submit the login form
    form = page.getByXPath("//form[@action='index.php?id=intern']")[0]
    user = form.getInputByName("user")
    passw = form.getInputByName("pass")
    button = form.getInputByName("submit")
    user.setValueAttribute("cgl")
    passw.setValueAttribute("rattamahatta")
    page = button.click()
    print "form submitted and new page loaded: " + page.getUrl().toString()

    link = page.getByXPath("//a[@href='index.php?id=837']")[0]
    page = link.click()
    print "link clicked and new page loaded: " + page.getUrl().toString()

    # the "button" is an <a> with an onclick handler that opens the first ASPX script
    link = page.getByXPath("//a[@onclick]")[0]
    page = link.click()
    print "button clicked and new page loaded: " + page.getUrl().toString()

    # first level: the iframe's page is an HtmlPage (this works)
    frame = page.getFrames().get(0)
    page = frame.getEnclosedPage()
    print "new iframe loaded: " + page.getUrl().toString()

    # second level: this returns a JavaScriptPage ("you cannot open directly.")
    frames = page.getFrames()
    page1 = frames.get(0).getEnclosedPage()
    print "new iframe loaded: " + page1.getUrl().toString()
    # break

if __name__ == '__main__':
    main()

_______________________________________________
Htmlunit-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlunit-user
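When preparing the "complete details" asked for above, it can help to list every nested frame together with its URL and the kind of page HtmlUnit returned for it, so the exact depth at which a JavaScriptPage replaces an HtmlPage is visible. The following is a minimal sketch, not part of HtmlUnit: the helper names `walk_frames` and `page_kind` are hypothetical, and it assumes only the calls already used in the script above (`getFrames()`, `getEnclosedPage()`, `getUrl()`) plus Java's `getClass().getSimpleName()` when running under Jython. Pages without `getFrames()` (such as a JavaScriptPage) end the recursion.

```python
def page_kind(page):
    """Return a short type name for a page (hypothetical helper).

    Under Jython a Java object exposes getClass(); plain Python objects
    fall back to their Python class name.
    """
    get_class = getattr(page, "getClass", None)
    if get_class is not None:
        return get_class().getSimpleName()
    return type(page).__name__


def walk_frames(page, depth=0, max_depth=5):
    """Collect (depth, url, kind) for a page and all its nested frames.

    Assumes HTML pages expose getFrames() returning frame windows that
    each have getEnclosedPage(), and that every page exposes getUrl().
    Non-HTML pages (e.g. a JavaScriptPage) have no getFrames(), so the
    walk stops there; max_depth guards against runaway nesting.
    """
    found = [(depth, str(page.getUrl()), page_kind(page))]
    if depth >= max_depth or not hasattr(page, "getFrames"):
        return found
    for frame in page.getFrames():
        child = frame.getEnclosedPage()
        if child is None:
            continue  # frame has not loaded any page
        found.extend(walk_frames(child, depth + 1, max_depth))
    return found
```

A possible usage at the point marked `# break`, indenting each entry by its nesting depth:

    for depth, url, kind in walk_frames(page):
        print("  " * depth + kind + " " + url)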