From: chris d. <ach...@gm...> - 2014-04-29 12:07:48
If you could share the URL, it would be easy to view the content in a browser.

On Tue, Apr 29, 2014 at 5:20 PM, David Michael Gang <mic...@gm...> wrote:

> Hi all,
>
> I crawled a page and the HtmlPage.asText function did not return the
> desired result.
> I tracked it down: the table rows were being ignored.
> The reason is that in the crawled page the table trs are wrapped in a
> doc tag.
>
> <table width="100%" border="0" cellspacing="0" cellpadding="0">
>   <tbody>
>     <tr class="nopadding">
>       <td>
>         <a name="DOCNO_1">
>         </a>
>       </td>
>     </tr>
>     <doc>
>       <tr valign="baseline" height="8" class="toprow">
>         <th width="6%" align="left" valign="center" nowrap="nowrap">
>           <span>
>             <input type="checkbox" id="frm_control_box" title="Click here to select or de-select all" name="frm_control_box" value="checkbox" onclick="javascript:subSetAllSelectionStatus()"/>
>           </span>
>         </th>
>         <th width="94%" align="left" valign="center" nowrap="nowrap">
>           <span>
>             Results
>           </span>
>         </th>
>       </tr>
>       <tr class="noshaderow1st" style="padding-bottom: 8px;" height="8" valign="baseline">
>         <td width="6%" align="left" nowrap="nowrap" valign="top">
>           <input onclick="javascript:manageBox('1')" type="checkbox" value="1" name="frm_tagged_documents" title="Click here to deliver or to view tagged documents" id="frm_tagged_documents1"/>
>           <label style="{cursor: pointer; cursor: hand;}" for="frm_tagged_documents1">
>             1.
>           </label>
>         </td>
>         <td width="94%" align="left" valign="top">
>           <a href="aaa" target="_parent">
>             aaa
>           </a>
>           <br class="br"/>
>           <span class="notranslate">
>             bbb
>           </span>
>           , November 19, 2011, Pg. 7, 758 words
>         </td>
>       </tr>
>     </doc>
>     <tr class="nopadding">
>       <td>
>         <a name="DOCNO_2">
>         </a>
>       </td>
>     </tr>
>     <doc>
>       <tr class="shaderow1st" style="padding-bottom: 8px;" height="8" valign="baseline">
>         <td width="6%" align="left" nowrap="nowrap" valign="top">
>           <input onclick="javascript:manageBox('2')" type="checkbox" value="2" name="frm_tagged_documents" title="Click here to deliver or to view tagged documents" id="frm_tagged_documents2"/>
>           <label style="{cursor: pointer; cursor: hand;}" for="frm_tagged_documents2">
>             2.
>           </label>
>         </td>
>         <td width="94%" align="left" valign="top">
>           <a href="ccc" target="_parent">
>             ddd
>           </a>
>           <br class="br"/>
>           <span class="notranslate">
>             eee
>           </span>
>           , November 19, 2011, Pg. 18, 1216 words, MICHAEL HENDERSON
>         </td>
>       </tr>
>     </doc>
>   </tbody>
>
> In Firefox the page is displayed fine.
> Is it somehow possible to tell HtmlUnit to ignore the doc tag and recurse
> into it to find the tr tags?
>
> Thanks,
> David

--
*"What we sow we will reap."*
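
Until the URL is available, one possible workaround (an assumption on my part, not something confirmed in this thread) is to strip the non-standard doc wrapper from the raw markup before HtmlUnit parses it, so that the tr elements become direct children of tbody again. The sketch below uses only the Java standard library; the class name `DocTagStripper` and the method `stripDocTags` are hypothetical names chosen for illustration. HtmlUnit's `WebConnectionWrapper` is probably one place where such a rewrite could be hooked in, but that wiring is not shown or tested here.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes <doc> wrapper tags from raw HTML text.
class DocTagStripper {
    // Matches <doc>, </doc> and <doc attr="...">, case-insensitively.
    // It does not match <!DOCTYPE ...> or longer names such as <document>.
    private static final Pattern DOC_TAG =
            Pattern.compile("</?doc(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Returns the markup with the doc tags removed and their children kept.
    static String stripDocTags(String html) {
        return DOC_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<table><tbody><doc><tr><td>aaa</td></tr></doc></tbody></table>";
        // prints <table><tbody><tr><td>aaa</td></tr></tbody></table>
        System.out.println(stripDocTags(html));
    }
}
```

This is a plain text-level rewrite, so it would also touch a literal `<doc>` inside a script or comment; for the markup quoted above that should not matter.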