From: Ahmed A. <asa...@ya...> - 2015-11-13 10:03:04
Hi Teryl,

In real Chrome, no such elements are generated either. I am not sure how the website works. I guess you need to dig into the JavaScript; please read http://htmlunit.sourceforge.net/submittingJSBugs.html

Ahmed

From: Teryl Taylor <ter...@gm...>
To: htm...@li...
Sent: Friday, November 13, 2015 12:41 AM
Subject: [Htmlunit-user] Question about scraping object/embed tags from website

Hi guys,

There is a website that runs a Flash video:

http://fast.wistia.net/embed/playlists/ba40yik7fl?loop=true&autoPlay=true&controlsVisibleOnLoad=false&version=v1&videoFoam=true

And I'm trying to find the embed or object tags that launch the video, so I did the following:

    BrowserVersion browser = BrowserVersion.FIREFOX_38;

    PluginConfiguration acrobat = new PluginConfiguration("Adobe Acrobat",
            "Adobe PDF Plug-In For Firefox and Netscape 11.0.12", "11", "nppdf32.dll");
    browser.getPlugins().add(acrobat);

    PluginConfiguration flash = new PluginConfiguration("Shockwave Flash",
            "Shockwave Flash 19.0 r0", "19", "NPSWF32_19_0_0_226.dll"); //17,0,0,188
    flash.getMimeTypes().add(new PluginConfiguration.MimeType(
            "application/x-shockwave-flash", "Shockwave Flash", "swf"));
    browser.getPlugins().add(flash);

    webClient.addRequestHeader("ClientIP", clientIP);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setAppletEnabled(true);
    webClient.getOptions().setCssEnabled(true);
    webClient.getOptions().setTimeout(20000);
    webClient.setJavaScriptTimeout(5000);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());

    Page p = webClient.getPage(url);
    if (p != null && p.isHtmlPage()) {
        int scriptsWaiting = webClient.waitForBackgroundJavaScript(30000);
        System.out.println("Scripts waiting: " + Integer.toString(scriptsWaiting));
        HtmlPage page = (HtmlPage) p;

        java.util.List<HtmlEmbed> embeds = (List<HtmlEmbed>) page.getByXPath("//embed");
        for (HtmlEmbed embed : embeds) {
            System.out.println("Embed tag found..\n");
        }

        java.util.List<HtmlObject> objects = (List<HtmlObject>) page.getByXPath("//object");
        for (HtmlObject object : objects) {
            System.out.println("Object tag found..\n");
        }
    }

The website just sits there for the 30 seconds, and no tags are found. Everything is generated by JavaScript. Am I searching for the tags in the proper way? Or is there some DOM object I should be checking? There are no exceptions in HtmlUnit, and the site loads up great in regular Firefox.

Any advice you could give would be great.

Best,

Teryl

------------------------------------------------------------------------------
_______________________________________________
Htmlunit-user mailing list
Htm...@li...
https://lists.sourceforge.net/lists/listinfo/htmlunit-user
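
On the question of whether "//embed" and "//object" are the right expressions to search with: the sketch below is one way to check that in isolation. It deliberately does not use HtmlUnit; it relies only on the JDK's built-in XML parser and XPath engine, and it runs against small well-formed markup strings instead of the live page. The class name PluginTagFinder, the method findPluginTags, and the sample markup are all made up for illustration. If the expressions match here but return nothing on the real page, the likelier explanation is that the scripts never insert those elements, which would be consistent with Ahmed's observation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of HtmlUnit: runs the same XPath queries
// from the question against a well-formed markup string using only the JDK.
class PluginTagFinder {

    /** Returns the tag names of every embed or object element in the markup. */
    static List<String> findPluginTags(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // Union of the two expressions used in the question.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//embed | //object", doc, XPathConstants.NODESET);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup is invented for this sketch.
        String withPlayer =
                "<html><body><object data='player.swf'>"
                + "<embed src='player.swf'/></object></body></html>";
        String withoutPlayer = "<html><body><div id='player'/></body></html>";

        System.out.println("With player:    " + findPluginTags(withPlayer));
        System.out.println("Without player: " + findPluginTags(withoutPlayer));
    }
}
```

The JDK parser only accepts well-formed XML, so this is a check of the XPath expressions themselves and not a substitute for loading the page in HtmlUnit.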