Re: [Htmlunit-user] problems with crawling a table when html contains custom tags


Hi,

I am already using the functions that wait for background JavaScript to finish.

I tried both
waitForBackgroundJavaScript(10000);

and
waitForBackgroundJavaScriptStartingBefore(10000);

and neither helped.

Besides, I need a generic solution that works for any page, and I believe one is achievable.

For example, jsoup copes with this HTML without any problem:

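package test;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest1 {

	public static void main(String[] args) throws IOException {
		// Parse the saved page; a null charset lets jsoup detect the
		// encoding from the document, falling back to UTF-8.
		File in = new File("l.html");
		Document doc = Jsoup.parse(in, null);

		// Select every <table> element, even when custom tags surround it.
		Elements elems = doc.select("table");

		// Print the combined text content of each table.
		for (Element elem : elems) {
			System.out.println(elem.text());
		}
	}

}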

Maybe I should file a bug, but I don't think there is any reason to
execute those specific JavaScript functions.

Thanks,

David

>In the page source I could see the body tag with these attributes:

>onload="hideDiv(true);initBoxes('listview');callSubScroll('frm_tagged_documents',0,1);updateResultsNav();reloadClassification('false');scrollToHitPos('false');"
>onUnload="storeScrollToHitPos('false');

>Execute these JS functions, then try extracting the page.

>One more thing: what is your desired output?
