#51 characters() instantiates a huge memory buffer

Milestone: SAX_2.0
Status: closed-rejected
Owner: nobody
Priority: 5
Updated: 2003-07-13
Created: 2003-07-11
Creator: Francois Loison
Private: No

I'm parsing large text elements:
<test1>xxx ... 10MB ... xxx</test1>

The characters() callback receives a single 10MB character
buffer; the buffer is as large as the whole text element.
This causes an out-of-memory exception.

I'd rather receive several characters() calls, each
delivering a chunk of data of limited size.

Is there a parser configuration I can apply?

Thanks,
Francois Loison

Discussion

  • Logged In: NO

    Sorry, this bug applies to an older version of SAX; please
    discard it.

    Regards,
    François Loison

     
    • status: open --> wont-fix
     
  • David Brownell
    2003-07-13

    • status: wont-fix --> closed-rejected
     
  • David Brownell
    2003-07-13

    Logged In: YES
    user_id=44117

    As I've pointed out before, this is a "quality of implementation"
    issue, not an API issue.

    SAX allows any number of characters to be reported. Most
    parsers return whatever's left in the buffer (up to something
    like a page at a time), then go process another buffer.

    That is SPECIFICALLY so that huge chunks of text can
    be broken up into smaller ones. The API has worked
    like that since the very earliest days (SAX 1.0 alpha).

    Now if you happen to have a brain-dead implementation
    that's trying to allocate 25000 or so pages at once,
    that's clearly not the fault of SAX. And you have an
    easy solution: switch to a parser that's not so stupid.
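
The point above, that a handler must accept any number of characters() calls per text element, can be illustrated with a small sketch. It uses Python's standard xml.sax module (backed by expat) as a stand-in for a Java SAX 2.0 parser, since it follows the same callback contract. The class name TextLengthCounter and the helper count_text are hypothetical names chosen for this example. How many chunks a given parser delivers, and how large they are, is an implementation detail that the sketch does not assume.

```python
import io
import xml.sax


class TextLengthCounter(xml.sax.ContentHandler):
    """Consumes text chunk by chunk instead of collecting it in one buffer."""

    def __init__(self):
        super().__init__()
        self.calls = 0          # number of characters() callbacks received
        self.total_chars = 0    # running total of characters seen so far
        self.largest_chunk = 0  # size of the biggest single chunk delivered

    def characters(self, content):
        # Handle each chunk as it arrives; never assume one call per element.
        self.calls += 1
        self.total_chars += len(content)
        self.largest_chunk = max(self.largest_chunk, len(content))


def count_text(xml_bytes):
    """Parse xml_bytes and return the handler holding the statistics."""
    handler = TextLengthCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler


if __name__ == "__main__":
    # A 1,000,000-character text element, similar in spirit to the report.
    doc = b"<test1>" + b"x" * 1_000_000 + b"</test1>"
    result = count_text(doc)
    print(result.calls, result.total_chars, result.largest_chunk)
```

Because the handler only keeps counters, its memory use does not grow with the size of the text element, whatever chunk size the parser picks. A handler that needs the text itself can write each chunk to a file or stream in the same way.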