#51 characters() instantiates a huge memory buffer


I'm parsing large text elements:
<test1>xxx ... 10MB ... xxx</test1>

The callback method characters() receives a single 10MB character
buffer. In other words, the buffer's size equals the size of the whole
text element. This causes an out-of-memory exception.

I'd rather receive several characters() calls, each call
delivering a chunk of data of limited size.

Is there a parser configuration setting I can apply?

Francois Loison


  • Nobody/Anonymous

    Logged In: NO

    Sorry, this bug applies to an older version of SAX; please
    discard it.

    François Loison

  • Francois Loison

    Francois Loison - 2003-07-11
    • status: open --> wont-fix
  • Anonymous - 2003-07-13
    • status: wont-fix --> closed-rejected
  • Anonymous - 2003-07-13

    Logged In: YES

    As I've pointed out before, this is a "quality of implementation"
    issue, not an API issue.

    SAX allows any number of characters to be reported in a single
    characters() call. Most parsers report whatever's left in the
    buffer (up to something like a page at a time), then go process
    another buffer.
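To see how a particular parser chunks character data, a sketch like the following counts the characters() calls made for one large text element. It assumes the JDK's built-in JAXP parser obtained through SAXParserFactory; the class name ChunkCounter, its fields, and the text sizes are hypothetical. The number of calls depends on the parser's internal buffer, so only the total number of characters is something SAX guarantees.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts how many characters() calls a SAX parser
// makes for one large text element, and how many characters arrive in total.
public class ChunkCounter extends DefaultHandler {
    int calls = 0;     // number of characters() callbacks
    long total = 0;    // sum of all reported lengths
    int largest = 0;   // largest single chunk seen

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start .. start + length) is valid; the array is the
        // parser's own buffer and may be reused on the next call.
        calls++;
        total += length;
        largest = Math.max(largest, length);
    }

    // Builds <test1>xxx...xxx</test1> with textLength 'x' characters and parses it.
    public static ChunkCounter parse(int textLength) throws Exception {
        StringBuilder xml = new StringBuilder("<test1>");
        for (int i = 0; i < textLength; i++) {
            xml.append('x');
        }
        xml.append("</test1>");
        byte[] bytes = xml.toString().getBytes(StandardCharsets.UTF_8);

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChunkCounter counter = new ChunkCounter();
        parser.parse(new ByteArrayInputStream(bytes), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        ChunkCounter c = parse(10_000_000);
        System.out.println("calls=" + c.calls + " total=" + c.total
                + " largest chunk=" + c.largest);
    }
}
```

With a parser that buffers its input, one would expect calls to be well above 1 and the largest chunk to stay near the parser's buffer size; a single 10MB chunk is the symptom described in the original report.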

    That is SPECIFICALLY so that huge chunks of text can
    be broken up into smaller ones. The API has worked
    like that since the very earliest days (SAX 1.0 alpha).
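Because one text node may arrive across many characters() calls, application code has to treat each call as a fragment. A minimal sketch, assuming the goal is to copy the text of one named element (for example test1) to a Writer as it arrives instead of collecting it in a StringBuilder; the class StreamingTextHandler and its extract() method are hypothetical names. Whether memory really stays small still depends on the parser delivering reasonably sized chunks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: streams the text of one named element to a Writer
// chunk by chunk instead of accumulating the whole element in memory.
public class StreamingTextHandler extends DefaultHandler {
    private final String target;
    private final Writer out;
    private int depth = 0; // > 0 while inside the target element

    public StreamingTextHandler(String target, Writer out) {
        this.target = target;
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Enter the target element, or go one level deeper inside it.
        if (depth > 0 || qName.equals(target)) {
            depth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0) {
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            return; // text outside the target element is ignored
        }
        try {
            // Write this fragment immediately; nothing is buffered here.
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parses xml and returns the text found inside the target element.
    public static String extract(String xml, String target) throws Exception {
        StringWriter sink = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                       new StreamingTextHandler(target, sink));
        return sink.toString();
    }
}
```

The StringWriter is only there to make the sketch easy to check; in real use it would be something like a FileWriter, so the text goes straight to disk.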

    Now if you happen to have a brain-dead implementation
    that tries to allocate 25000 or so pages at once,
    that's clearly not the fault of SAX. And you have an
    easy solution: switch to a parser that's not so stupid.

