#13 Infinite loop in ContentScanner.nextContent()

closed-fixed
None
5
2012-12-21
2010-01-17
Ahmed Ashour
No

Hi all,

Attached is a test case that goes into an infinite loop on calling ContentScanner.nextContent().

The only drawback I see is that not every fCurrentEntity.stream supports mark(), but nextContent() calls read(), which can potentially call load(), which can fetch more data into the buffer from the stream. I don't know how nextContent() can return to the initial stream state if mark() is not supported.
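
To illustrate the mark() concern, here is a minimal sketch that is not taken from NekoHTML. It assumes a stream whose markSupported() returns false can be wrapped in a java.io.BufferedInputStream, so that a look-ahead can always rewind. The names MarkableStreams, ensureMarkable and peek are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of NekoHTML.
final class MarkableStreams {
    private MarkableStreams() {}

    /** Returns the stream itself if it supports mark(), else a buffered wrapper that does. */
    static InputStream ensureMarkable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n bytes ahead, then rewinds so the caller sees the stream unchanged. */
    static int peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark()");
        }
        in.mark(n); // remember the current position for up to n bytes
        try {
            int count = 0;
            while (count < n && in.read() != -1) {
                count++;
            }
            return count;
        } finally {
            in.reset(); // go back to the marked position
        }
    }
}
```

The trade-off is the extra buffering layer; whether that is acceptable inside the scanner is a separate question.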

Hope you find the patch useful.

P.S.:
1- The original HtmlUnit bug is at https://sourceforge.net/tracker/?func=detail&aid=2933404&group_id=47038&atid=448266
2- The current SVN build gives an error on my machine

Discussion

  • Ahmed Ashour
    2010-01-17

    Proposed Patch

     
  • reinhard
    2010-01-21

    i have applied this patch to the svn version of nekohtml and now have the problem that some text is missing in
    htmlunit when using page.asText().
    if i revert to nekohtml version nekohtml-1.9.14-20091214.212542-7.jar, the missing text is there.
    the url of the document is

    http://www.kunstmeile-krems.at/content/vdbevent.html?id=373711

    the missing text is inside

    <!-- Event Detail -->
    >> <span id="datum">Th 04.02.2010<br />18:30 </span>
    >> <h2 id="titel">WHATEVER WORKS<br /><span id="untertitel">OmU</span></h2>

     
  • reinhard
    2010-01-21

    another comment:
    as long as i have used the path, i have not seen hangs/infinite loops. this may be solved.

     
  • Ahmed Ashour
    2010-01-21

    >> as long as i have used the path, i have not seen hangs/infinite loops

    Reinhard, I don't understand what you mean by 'path'. Does the patched Neko work or not?

    nextContent() tries to read bytes 'in advance' and should restore every offset/length to exactly the state it had before the call.

    This patch identifies a case where "fCurrentEntity.length" is modified by calling nextContent(), which shouldn't happen, so it is restored to its original value.
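
    The save-and-restore idea described above can be sketched as follows. This is a simplified stand-in, not the actual patch and not NekoHTML code: LookaheadBuffer and peekMatches are hypothetical names, and the sketch assumes the whole input is already in the buffer, so it says nothing about what happens once load() has refilled it.

```java
// Hypothetical stand-in for the scanner's entity buffer; illustration only.
final class LookaheadBuffer {
    final char[] buffer;
    int offset; // index of the next character to read
    int length; // number of valid characters in the buffer

    LookaheadBuffer(String content) {
        buffer = content.toCharArray();
        length = buffer.length;
    }

    /** Returns the next character, or -1 at the end of the buffer. */
    int read() {
        return offset < length ? buffer[offset++] : -1;
    }

    /** Checks whether the upcoming characters equal expected, leaving offset and length untouched. */
    boolean peekMatches(String expected) {
        final int savedOffset = offset;
        final int savedLength = length;
        try {
            for (int i = 0; i < expected.length(); i++) {
                if (read() != expected.charAt(i)) {
                    return false;
                }
            }
            return true;
        } finally {
            // Restore the state exactly as it was before the look-ahead.
            offset = savedOffset;
            length = savedLength;
        }
    }
}
```

    Restoring in a finally block means the state is reset on every exit path, including an early mismatch.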

     
  • reinhard
    2010-01-21

    sorry, "path" is a misspelling of patch. i meant patch.

     
  • reinhard
    2010-01-21

    the patched version seemed to work until i discovered some missing parts in the text.
    i have now reverted to a snapshot from december, because the missing parts of the text are too important.

     
  • Marc Guillemot
    2010-01-22

    Now fixed in SVN. Thanks for reporting.

    I haven't applied your fix because I wanted to fix the problem more thoroughly and avoid using mark().

    @rschwab: can you test if this works fine for you now?

     
  • Marc Guillemot
    2010-01-22

    • assigned_to: nobody --> mguillem
    • status: open --> closed
     
  • Marc Guillemot
    2010-01-22

    • status: closed --> closed-fixed
     
  • reinhard
    2010-01-22

    ok, i will test it asap. thanks.

     
  • reinhard
    2010-01-22

    it seems the api has changed, so do i have to update htmlunit too?
    i get some errors when only replacing the nekohtml jar with the latest svn version.

     
  • Ahmed Ashour
    2010-01-22

    Yes, replacing "read()" with "fCurrentEntity.read();" still gives the compiler error "The method read() from the type HTMLScanner.CurrentEntity is not visible"

     
  • Ahmed Ashour
    2010-01-22

    • status: closed-fixed --> open-fixed
     
  • Marc Guillemot
    2010-01-22

    I've restored the protected read() method, allowing HtmlUnit to compile.

     
  • Marc Guillemot
    2012-12-21

    Closing. If there is any issue with HtmlUnit, please report it in HtmlUnit's tracker.

     
  • Marc Guillemot
    2012-12-21

    • status: open-fixed --> closed-fixed