On 9/26/06, Evan Schoenberg <evan@...> wrote:
> A potentially related question: The comment in that function:
> /* libxml inconsistently starts parsing on creating the
> * parser, so do a ParseChunk right afterwards to force it. */
> js->context =
> xmlCreatePushParserCtxt(&jabber_parser_libxml, js, buf,
> len, NULL);
> xmlParseChunk(js->context, NULL, 0, 0);
> is fairly suspicious... why is a library we're considering reliable enough
> to depend upon doing anything 'inconsistently'?
xmlCreatePushParserCtxt isn't intended to be the "parsing" function.
It takes buf and len primarily to determine the encoding to use. When
this happens, it creates a temporary buffer that it uses to determine
the encoding and potentially convert it to whatever libxml2 uses
internally (UTF-8, I imagine). Pushing 0 bytes to the parser has the
effect of flushing this buffer and keeping the parsing from stalling.
Code within libxml2 appears to do the same thing:
    /*
     * If we are operating on converted input, try to flush
     * remaining chars to avoid them stalling in the non-converted
     * buffer.
     */
    xmlParserInputBufferPush(ctxt->input->buf, 0, "");
This probably isn't a problem for most libxml2 apps, since they
typically read from a file and will soon have more data to parse, or
they close the parser at EOF, which flushes the buffer. Jabber's
use is unusual in that we don't get any more XML data to
parse until we parse what the server sent, trigger our callbacks, and
send a response.
I haven't spoken with anyone from the libxml project; this is just what I
gathered by reading the source.
I'll look into your crash bugs later.