From: SourceForge.net <no...@so...> - 2010-01-22 09:02:17
Patches item #2933989, was opened at 2010-01-17 23:16
Message generated for change (Settings changed) made by mguillem
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=952180&aid=2933989&group_id=195122

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Ahmed Ashour (asashour)
Assigned to: Marc Guillemot (mguillem)
Summary: Infinite loop in ContentScanner.nextContent()

Initial Comment:
Hi all,

Attached is a test case that goes into an infinite loop on calling ContentScanner.nextContent().

The only drawback I see is that not every fCurrentEntity.stream supports mark(), while nextContent() calls read(), which can in turn call load(), which can pull more data from the stream into the buffer. I don't know how nextContent() can restore the initial stream state if mark() is not supported.

Hope you find the patch useful.

P.S.:
1- The original HtmlUnit bug resides in https://sourceforge.net/tracker/?func=detail&aid=2933404&group_id=47038&atid=448266
2- The current SVN build gives an error on my machine

----------------------------------------------------------------------

Comment By: Marc Guillemot (mguillem)
Date: 2010-01-22 10:01

Message:
Now fixed in SVN. Thanks for reporting.

I haven't applied your fix because I wanted to fix the problem more in depth and avoid using mark().

@rschwab: can you test if this works fine for you now?

----------------------------------------------------------------------

Comment By: reinhard (rschwab)
Date: 2010-01-21 20:11

Message:
The patched version seemed to work until I discovered that some parts of the text were missing. I have now reverted to a snapshot from December, because the missing parts of the text are too important.

----------------------------------------------------------------------

Comment By: reinhard (rschwab)
Date: 2010-01-21 20:10

Message:
Sorry, "path" is a misspelling of patch. I meant patch.

----------------------------------------------------------------------

Comment By: Ahmed Ashour (asashour)
Date: 2010-01-21 19:48

Message:
>> as long as i have used the path, i have not seen hangs/infinite loops

Reinhard, I don't understand what you mean by 'path'. Does the patched Neko work or not?

nextContent() tries to read bytes 'in advance' and should restore every offset/length to exactly the state it had before the call. This patch identifies a case where "fCurrentEntity.length" is modified by calling nextContent(), which shouldn't happen, so it is restored to its original value.

----------------------------------------------------------------------

Comment By: reinhard (rschwab)
Date: 2010-01-21 16:52

Message:
Another comment: as long as I have used the path, I have not seen hangs/infinite loops. This may be solved.

----------------------------------------------------------------------

Comment By: reinhard (rschwab)
Date: 2010-01-21 16:43

Message:
I have applied this patch to the SVN version of nekohtml and now have the problem that some text is missing in HtmlUnit when using page.asText(). If I revert to nekohtml version nekohtml-1.9.14-20091214.212542-7.jar, the missing text is there.

The URL of the document is http://www.kunstmeile-krems.at/content/vdbevent.html?id=373711

The text I miss is inside of <!-- Event Detail -->
>> <span id="datum">Th 04.02.2010<br />18:30 </span>
>> <h2 id="titel">WHATEVER WORKS<br /><span id="untertitel">OmU</span></h2>

----------------------------------------------------------------------
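
The read-ahead concern discussed in this thread (a scanner that looks at upcoming bytes must leave the stream exactly as it found it, even when the stream does not support mark()) can be illustrated with the standard java.io API. This is only a sketch under stated assumptions: it is not NekoHTML code and not the fix committed to SVN (which, per the comment above, avoids mark() altogether). The names PeekAhead, markable and peek are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of NekoHTML or HtmlUnit.
public class PeekAhead {

    // Returns the stream itself if it can mark, otherwise a buffered
    // wrapper, since BufferedInputStream always supports mark()/reset().
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to n bytes ahead, then rewinds so that the caller
    // sees the stream in the same position as before the call.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark()");
        }
        in.mark(n);
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break; // end of stream reached before n bytes
                }
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // restore the position recorded by mark()
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<span id=\"datum\">".getBytes(StandardCharsets.US_ASCII);
        InputStream in = markable(new ByteArrayInputStream(html));
        byte[] ahead = peek(in, 5);
        System.out.println(new String(ahead, StandardCharsets.US_ASCII));
        // The next regular read still starts at the first byte.
        System.out.println((char) in.read());
    }
}
```

Wrapping a non-markable stream costs an extra buffer, which is one reason a parser might prefer to track and restore its own offset/length fields instead, as the patch described above does for fCurrentEntity.length.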