From: Valentin P. <v.v...@gm...> - 2014-04-28 07:37:51
Hello,

I want to get the visible text from this page:

http://zeitarbeit-online-datenbank.de/index.php?art_id=53466&load=10,1,1&search=such_taetigkeit|;such_kategorie|;sort_order|erstellt_am;sort_dir|desc;such_erst|;such_zeitraum|;such_bl|;such_ort|;such_plz|;such_term|;nmbr|4;ap|0

For this I use the simple code below:

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
    // Fetch the page, then print the text HtmlUnit considers visible
    HtmlPage page = webClient.getPage("http://zeitarbeit-online-datenbank.de/index.php?art_id=53466&load=10,1,1&search=such_taetigkeit|;such_kategorie|;sort_order|erstellt_am;sort_dir|desc;such_erst|;such_zeitraum|;such_bl|;such_ort|;such_plz|;such_term|;nmbr|4;ap|0");
    System.out.println(page.asText());

The output is not what it should be. Looking at the HTML, I can see this:

    <table cellpadding="0" cellspacing="0" align="right" border="0">
    <form name="suchformular" method="get" action="index.php"></form>
    <input name="load" value="7" type="hidden">
    <input name="sid" value="c23371c7273e7a23add375c01fe35183" type="hidden">
    <input name="abgesendet" value="yes" type="hidden">
    <tbody><tr>
    <td align="right" height="32" valign="bottom">

As you can see, the <tr> is inside the <form> instead of coming directly after <table>, but in the browser this works.
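To make the tree shape concrete, here is a minimal sketch that does not use HtmlUnit at all: it parses a simplified, hypothetical XHTML version of the fragment above with the JDK's own DOM parser (javax.xml.parsers / org.w3c.dom) and compares two ways of collecting rows. `directRows` is a hypothetical stand-in for a lookup that only checks the direct children of <table> and its thead/tbody/tfoot groups; it is not HtmlUnit's code, and I am only assuming getRows() behaves roughly like it. `descendantRows` searches at any depth.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RowLookupDemo {

    // Hypothetical, simplified markup modelled on the problem page:
    // the <form> is a direct child of <table> and the rows sit inside it.
    static final String MARKUP =
        "<table>"
      + "<form name='suchformular' method='get' action='index.php'>"
      + "<input name='load' value='7' type='hidden'/>"
      + "<tbody><tr><td>row 1</td></tr><tr><td>row 2</td></tr></tbody>"
      + "</form>"
      + "</table>";

    // Parses well-formed XML with the JDK parser (not HtmlUnit's HTML parser).
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Looks only where a well-formed table keeps its rows:
    // table > tr and table > thead|tbody|tfoot > tr.
    static List<Element> directRows(Element table) {
        List<Element> rows = new ArrayList<>();
        for (Node child = table.getFirstChild(); child != null; child = child.getNextSibling()) {
            String name = child.getNodeName();
            if (name.equals("tr")) {
                rows.add((Element) child);
            } else if (name.equals("thead") || name.equals("tbody") || name.equals("tfoot")) {
                for (Node g = child.getFirstChild(); g != null; g = g.getNextSibling()) {
                    if (g.getNodeName().equals("tr")) {
                        rows.add((Element) g);
                    }
                }
            }
        }
        return rows;
    }

    // Finds every <tr> at any depth below the table, including inside <form>
    // (it would also pick up rows of nested tables).
    static List<Element> descendantRows(Element table) {
        List<Element> rows = new ArrayList<>();
        NodeList all = table.getElementsByTagName("tr");
        for (int i = 0; i < all.getLength(); i++) {
            rows.add((Element) all.item(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        Element table = parse(MARKUP);
        System.out.println("direct rows:     " + directRows(table).size());
        System.out.println("descendant rows: " + descendantRows(table).size());
    }
}
```

Running main reports 0 direct rows and 2 descendant rows: once a <form> sits between <table> and the rows, a children-only lookup finds nothing.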
Looking at the HtmlSerializer class in HtmlUnit, I can see that tables get a special method:

    else if (node instanceof HtmlTable) {
        appendHtmlTable((HtmlTable) node);
    }

    private void appendHtmlTable(final HtmlTable htmlTable) {
        doAppendBlockSeparator();
        final String caption = htmlTable.getCaptionText();
        if (caption != null) {
            doAppend(caption);
            doAppendBlockSeparator();
        }

        boolean first = true;

        // first thead has to be displayed first and first tfoot has to be displayed last
        final HtmlTableHeader tableHeader = htmlTable.getHeader();
        if (tableHeader != null) {
            first = appendHtmlTableRows(tableHeader.getRows(), true, null, null);
        }
        final HtmlTableFooter tableFooter = htmlTable.getFooter();

        first = appendHtmlTableRows(htmlTable.getRows(), first, tableHeader, tableFooter);

        if (tableFooter != null) {
            first = appendHtmlTableRows(tableFooter.getRows(), first, null, null);
        }

        doAppendBlockSeparator();
    }

When htmlTable.getRows() is called, it does not return these rows, because they are inside the <form>.

Any idea how to fix this? Or do you know why HtmlUnit uses a special routine to print tables?

Best regards,
Valentin
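A possible workaround, sketched under the same assumptions (JDK DOM instead of HtmlUnit, simplified hypothetical markup): serialize the table the way appendHtmlTable appears to, with header rows first and footer rows last as its own comment says, but find the rows by descendant search so a wrapping <form> cannot hide them. Unlike the quoted method, this sketch takes all thead/tfoot rows rather than only the first group, and it would also pick up rows of nested tables. In HtmlUnit itself the same idea would presumably mean querying the table with getByXPath(".//tr") instead of relying on getRows(); that is untested against this page.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TableTextDemo {

    // Hypothetical markup: header, footer and body rows all wrapped in a <form>.
    static final String MARKUP =
        "<table><form name='suchformular'>"
      + "<input type='hidden' name='load' value='7'/>"
      + "<thead><tr><th>h1</th><th>h2</th></tr></thead>"
      + "<tfoot><tr><td>foot</td></tr></tfoot>"
      + "<tbody><tr><td>a1</td><td>b1</td></tr><tr><td>a2</td><td>b2</td></tr></tbody>"
      + "</form></table>";

    // Parses well-formed XML only; a stand-in for HtmlUnit's HTML parser.
    static Element parseTable(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Name of the nearest thead/tbody/tfoot above the row, or "" if none.
    static String groupOf(Element row, Element table) {
        for (Node p = row.getParentNode(); p != null && p != table; p = p.getParentNode()) {
            String name = p.getNodeName();
            if (name.equals("thead") || name.equals("tbody") || name.equals("tfoot")) {
                return name;
            }
        }
        return "";
    }

    // Tab-separated text of the row's td/th children.
    static String rowText(Element row) {
        List<String> cells = new ArrayList<>();
        for (Node c = row.getFirstChild(); c != null; c = c.getNextSibling()) {
            String name = c.getNodeName();
            if (name.equals("td") || name.equals("th")) {
                cells.add(c.getTextContent().trim());
            }
        }
        return String.join("\t", cells);
    }

    // Header rows first, footer rows last, everything else in between,
    // finding rows at any depth so a wrapping <form> does not hide them.
    static String tableText(String xml) {
        Element table = parseTable(xml);
        List<String> head = new ArrayList<>();
        List<String> body = new ArrayList<>();
        List<String> foot = new ArrayList<>();
        NodeList rows = table.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            String group = groupOf(row, table);
            if (group.equals("thead")) {
                head.add(rowText(row));
            } else if (group.equals("tfoot")) {
                foot.add(rowText(row));
            } else {
                body.add(rowText(row));
            }
        }
        List<String> lines = new ArrayList<>(head);
        lines.addAll(body);
        lines.addAll(foot);
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(tableText(MARKUP));
    }
}
```

For the markup above it prints the h1/h2 header row first, then the a1/b1 and a2/b2 rows, then "foot", tab-separated, even though the tfoot comes before the tbody in the source.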