
#45 wxXmlInputter reads too much of the stream!

open
5
2003-09-13
2003-08-09
Andy Ames
No

If the input streams have more than a single root XML
element in them, the Build() methods will not stop
reading at the end of the first root element.

The content following the first root element will be read,
and thus removed, from the stream in part or in whole.
This is bad.

This happens because 1024 bytes are read from the
stream at a time, no matter what the content is. Then,
once an entire XML element has been read, the function
returns, and whatever part of the last chunk followed
that element is lost.

A solution may be to read one character at a time,
sending each character to the parser until an XML element
has been read in its entirety. This is a slow solution,
though.
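
A minimal sketch of the character-at-a-time idea, written against
std::istream rather than wxInputStream so that it stands alone.
ReadFirstRootElement() is a hypothetical helper, not part of wxXml, and
its tag scanner is a toy: it ignores comments, CDATA sections, processing
instructions, and '>' inside attribute values.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of wxXml). Pulls one character at a time
// and returns as soon as the first root element is closed, so nothing
// after that element is consumed. Returns "" if the input ends early.
std::string ReadFirstRootElement(std::istream& in)
{
    std::string element;  // everything belonging to the root element
    std::string tag;      // contents of the tag being scanned, without < >
    bool inTag = false;
    int depth = 0;        // number of currently open elements
    char c;
    while (in.get(c)) {
        if (!inTag) {
            if (c == '<') { inTag = true; tag.clear(); element += c; }
            else if (depth > 0) element += c;  // drop text before the root
            continue;
        }
        element += c;
        if (c != '>') { tag += c; continue; }
        inTag = false;
        if (!tag.empty() && tag[0] == '/') --depth;          // end tag
        else if (tag.empty() || tag.back() != '/') ++depth;  // start tag
        // an empty-element tag such as <b/> leaves depth unchanged
        if (depth == 0) return element;
    }
    return std::string();
}

// Usage: the second root element is still in the stream afterwards.
void ExampleReadFirstRoot()
{
    std::istringstream in("<a><b/>x</a><c/>");
    assert(ReadFirstRootElement(in) == "<a><b/>x</a>");
    std::string rest;
    std::getline(in, rest);
    assert(rest == "<c/>");
}
```

The cost is one stream call per character, which is the slowness
mentioned above.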

The other solution is to require the use of a pushback
stream. Once an XML element has been read, push the
unparsed content back into the stream. I'm not sure
how we'd implement this.
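
One way the pushback idea could look, sketched with std::istream so that
it compiles on its own. PushbackReader is a hypothetical wrapper, not an
existing wxWindows class; wxInputStream::Ungetch() may be able to play the
same role, but that is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper: reads from an underlying stream, but serves any
// pushed-back bytes first.
class PushbackReader
{
public:
    explicit PushbackReader(std::istream& in) : m_in(in) {}

    // Returns up to maxBytes: pushed-back data first, then fresh input.
    std::string Read(std::size_t maxBytes)
    {
        std::string out = m_pending.substr(0, maxBytes);
        m_pending.erase(0, out.size());
        if (out.size() < maxBytes) {
            std::string buf(maxBytes - out.size(), '\0');
            m_in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
            buf.resize(static_cast<std::size_t>(m_in.gcount()));
            out += buf;
        }
        return out;
    }

    // Gives unparsed bytes back so that the next Read() sees them again.
    void Pushback(const std::string& data) { m_pending.insert(0, data); }

private:
    std::istream& m_in;
    std::string m_pending;  // bytes waiting to be re-read
};

// Usage: a 1024-byte read swallows both elements; the part the parser
// did not consume is pushed back and shows up on the next read.
void ExamplePushback()
{
    std::istringstream in("<a/><b/>");
    PushbackReader reader(in);
    std::string chunk = reader.Read(1024);
    assert(chunk == "<a/><b/>");
    reader.Pushback(chunk.substr(4));  // assume the parser used "<a/>" only
    assert(reader.Read(1024) == "<b/>");
}
```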

Discussion

  • Andy Ames

    Andy Ames - 2003-08-09
    • labels: 555171 --> wxConvey : wxXml
     
  • Andy Ames

    Andy Ames - 2003-08-12

    Example wxXmlInputter usage with incremental document builds

     
  • Andy Ames

    Andy Ames - 2003-08-12

    Logged In: YES
    user_id=265818

    A better solution than either of the ones presented would be
    to implement the Build() methods as one Build() method that
    returns no values.

    We can then advertise and document the wxXmlInputter as
    being an incremental builder. So, when you call Build(), you
    pass it a wxInputStream. The Build() method reads as much
    of the stream as it possibly can before returning.

    Then, the client invokes one of two different methods to get
    the actual XML content created:

    wxXmlDocument* GetDocument();
    wxXmlElement* GetElement();

    If the previous invocations of Build() have not yet read an
    entire XML document from the stream, GetDocument() will
    return NULL. The same goes for GetElement(). However, if there
    were multiple XML elements in the stream, GetElement() may
    be invoked as many times as necessary to get the XML
    elements parsed. Once GetElement() returns NULL, no more
    complete XML elements are available.

    However, an XML element may be only partially parsed at that
    point. To finish parsing it, Build() will have to be invoked
    again with the same wxInputStream, assuming there is more data
    on the stream waiting to be read.

    All of this incremental Build() stuff is in order to effectively
    parse XML data arriving on a wxSocketStream.

    For uses other than streaming XML, we might still want to
    provide non-incremental Build() methods. Anyway, here is my
    proposal for the new wxXmlInputter interface:

    // non-incremental build
    wxXmlDocument* BuildDocument(wxFile& file);
    wxXmlDocument* BuildDocument(const wxString& str);
    wxXmlDocument* BuildDocument(wxInputStream& is);

    // incremental build for XML streaming
    void StartBuild();
    void Build(wxInputStream& is);
    wxXmlDocument* GetDocument();
    wxXmlDocument* GetDocumentStart();
    wxXmlElement* GetElement();
    wxXmlElement* GetElementStart();

    The incremental Get***() methods work as follows:

    1) GetDocument() - returns a valid document if at least one
    entire document has been read on input since the last
    invocation of StartBuild().

    2) GetDocumentStart() - returns a valid document if either a)
    the "xml" processing instruction was found on input, or b) a
    doctype declaration was found on input, or c) the first XML
    element start tag was found on input since the last call to
    StartBuild(), or d) any combination of (a), (b), and (c) is
    found on input in that order.

    3) GetElement() - returns a valid element if an entire XML
    element has been read on input since the last call to
    StartBuild().

    4) GetElementStart() - returns a valid element if an entire
    XML element start tag or empty tag has been read on input
    since the last call to StartBuild().

    Once a valid pointer is returned from any of the Get***()
    methods, it is never again returned. If StartBuild() is called
    while the wxXmlInputter instance has XML document pieces
    yet to be returned, the wxXmlInputter frees the memory
    associated with these XML components. The same goes for
    deleting the wxXmlInputter instance.

    Once the object pointers are returned from the Get***()
    methods, it is the client's responsibility to free the memory.

    Also, the Get***Start() and Get***() methods should always
    return different pointers. That way, if an incremental build is
    performed, and the client does not care about responding to
    the start of a document or element, the client can choose
    not to call Get***Start().

    On the other hand, the client may choose to call the
    Get***Start() method and free the returned object
    independently of the object returned by the corresponding
    Get***() method.
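
    The incremental calling pattern described above could be sketched
    roughly as follows. This is not the attached example.
    IncrementalBuilder is a hypothetical stand-in that hands out elements
    as std::string instead of wxXmlElement, treats each complete top-level
    element as one result, and returns std::unique_ptr to express that the
    client owns whatever it receives. The tag scanner is a toy that ignores
    comments, CDATA, and processing instructions.

```cpp
#include <cassert>
#include <deque>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

class IncrementalBuilder
{
public:
    // Discards partial input and any results not yet handed out.
    void StartBuild()
    {
        m_done.clear(); m_current.clear(); m_tag.clear();
        m_inTag = false; m_depth = 0;
    }

    // Consumes everything currently readable; may stop mid-element.
    void Build(std::istream& in)
    {
        char c;
        while (in.get(c)) Feed(c);
    }

    // Hands over the next complete element once, or null if none is ready.
    std::unique_ptr<std::string> GetElement()
    {
        if (m_done.empty()) return nullptr;
        std::unique_ptr<std::string> e(new std::string(m_done.front()));
        m_done.pop_front();
        return e;
    }

private:
    void Feed(char c)
    {
        if (!m_inTag) {
            if (c == '<') { m_inTag = true; m_tag.clear(); m_current += c; }
            else if (m_depth > 0) m_current += c;
            return;
        }
        m_current += c;
        if (c != '>') { m_tag += c; return; }
        m_inTag = false;
        if (!m_tag.empty() && m_tag[0] == '/') --m_depth;          // end tag
        else if (m_tag.empty() || m_tag.back() != '/') ++m_depth;  // start tag
        if (m_depth == 0) { m_done.push_back(m_current); m_current.clear(); }
    }

    std::deque<std::string> m_done;  // complete elements not yet claimed
    std::string m_current;           // element being assembled
    std::string m_tag;               // tag being scanned, without < >
    bool m_inTag = false;
    int m_depth = 0;
};

// Usage: two Build() calls stand in for two bursts of socket data.
void ExampleIncremental()
{
    IncrementalBuilder builder;
    builder.StartBuild();
    std::istringstream first("<msg>hi</msg><pre");
    builder.Build(first);
    std::unique_ptr<std::string> e = builder.GetElement();
    assert(e && *e == "<msg>hi</msg>");
    assert(!builder.GetElement());  // "<pre" is still incomplete
    std::istringstream second("sence/>");
    builder.Build(second);
    e = builder.GetElement();
    assert(e && *e == "<presence/>");
}
```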

    Attached to this Bug is an example client use of the proposed
    solution. The example illustrates how a Jabber client might
    deal with a streaming XML document.

     
  • Andy Ames

    Andy Ames - 2003-08-12

    Logged In: YES
    user_id=265818

    You know, a much less convoluted solution would be to use
    wxWindows events to notify the client that XML content had
    been read on input.

    The event handler macros can allow the client to specify the
    element depth at which events should be generated.

    Duh.

    So, like with the incremental building description, we have the
    following method:

    void Build(wxInputStream& is);

    However, we also have:

    void SetEventHandler(wxEvtHandler& handler, int id);

    and

    void SetElementDepth(int depth);

    Then, when content has been parsed, several event types may
    be sent:

    1) Start document
    2) Start element
    3) End element
    4) End document

    Start and End element events will only be fired at the depth
    that is set by invoking SetElementDepth().
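
    A rough sketch of the event-driven variant, under several
    assumptions: a std::function callback is swapped in for wxEvtHandler
    and the event macros, EventingBuilder and XmlEvent are hypothetical
    names, the root element counts as depth 0, and the document is taken
    to start and end with the root element's tags. The tag scanner is a
    toy that ignores comments, CDATA, and processing instructions.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum class XmlEvent { StartDocument, StartElement, EndElement, EndDocument };

class EventingBuilder
{
public:
    typedef std::function<void(XmlEvent, const std::string&)> Handler;

    void SetEventHandler(Handler handler) { m_handler = std::move(handler); }
    void SetElementDepth(int depth) { m_wanted = depth; }

    // Consumes everything currently readable, firing events as tags close.
    void Build(std::istream& in)
    {
        char c;
        while (in.get(c)) Feed(c);
    }

private:
    void Fire(XmlEvent e, const std::string& name)
    {
        if (m_handler) m_handler(e, name);
    }

    void Feed(char c)
    {
        if (!m_inTag) {
            if (c == '<') { m_inTag = true; m_tag.clear(); }
            return;  // character data is ignored in this sketch
        }
        if (c != '>') { m_tag += c; return; }
        m_inTag = false;
        if (m_tag.empty()) return;
        const bool end = m_tag[0] == '/';
        const bool empty = !end && m_tag.back() == '/';
        std::string name = m_tag.substr(end ? 1 : 0);
        if (empty) name.pop_back();
        name = name.substr(0, name.find_first_of(" \t\r\n"));  // drop attributes
        if (!end) {
            if (m_depth == 0) Fire(XmlEvent::StartDocument, name);
            if (m_depth == m_wanted) Fire(XmlEvent::StartElement, name);
            ++m_depth;
        }
        if (end || empty) {
            --m_depth;
            if (m_depth == m_wanted) Fire(XmlEvent::EndElement, name);
            if (m_depth == 0) Fire(XmlEvent::EndDocument, name);
        }
    }

    Handler m_handler;
    std::string m_tag;   // tag being scanned, without < >
    bool m_inTag = false;
    int m_depth = 0;     // depth of the next start tag; root is 0
    int m_wanted = 0;    // depth at which element events are fired
};

// Usage: collect the names of elements completed directly below the root.
std::vector<std::string> ExampleDepthOneEvents()
{
    std::vector<std::string> names;
    EventingBuilder builder;
    builder.SetElementDepth(1);
    builder.SetEventHandler([&names](XmlEvent e, const std::string& n) {
        if (e == XmlEvent::EndElement) names.push_back(n);
    });
    std::istringstream in(
        "<stream><message to='x'>hi</message><presence/></stream>");
    builder.Build(in);
    assert(names.size() == 2);
    return names;  // "message", "presence"
}
```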

     
  • Andy Ames

    Andy Ames - 2003-09-13
    • milestone: 325857 -->
     
