If the input streams have more than a single root XML
element in them, the Build() methods will not stop
reading at the end of the first root element.
The content following the first root element will be read,
and thus removed, from the stream in part or in whole.
This is bad.
This happens because 1024 bytes are read from the
stream at a time, no matter what the content is. Then,
once an entire XML element has been read, the function
returns.
A solution may be to read one character at a time,
sending the character to the parser until an XML element
has been read in its entirety. This is a slow solution,
though.
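
To make the over-read concrete, here is a rough sketch using standard C++ streams in place of wxInputStream. ReadUntilClosed() and Remaining() are hypothetical helpers, not part of wxXmlInputter, and matching on a literal close tag is only a stand-in for the real parser. With a chunk size of 1024 the second root element is swallowed; with a chunk size of 1 (the character-at-a-time approach) it stays in the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the current Build() read loop: pull fixed-size
// chunks from the stream until the first root element's close tag shows up.
// Returns every byte that was consumed from the stream.
std::string ReadUntilClosed(std::istream& in, const std::string& closeTag,
                            std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::string consumed;
    std::string chunk(chunkSize, '\0');
    while (consumed.find(closeTag) == std::string::npos) {
        in.read(&chunk[0], static_cast<std::streamsize>(chunkSize));
        const std::streamsize got = in.gcount();
        if (got <= 0)
            break; // stream exhausted before the element was closed
        consumed.append(chunk, 0, static_cast<std::size_t>(got));
    }
    return consumed;
}

// Returns whatever is still unread in the stream.
std::string Remaining(std::istream& in)
{
    in.clear(); // a short read sets eofbit/failbit; reset before draining
    std::ostringstream rest;
    rest << in.rdbuf();
    return rest.str();
}
```

Calling ReadUntilClosed(in, "</a>", 1024) on a stream holding two small root elements consumes both, so Remaining(in) comes back empty. The same call with a chunk size of 1 stops right after the first close tag.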
The other solution is to require the use of a pushback
stream. Once an XML element has been read, push the
unparsed content back into the stream. I'm not sure
how we'd implement this.
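
One possible shape for the pushback idea, sketched with standard C++ streams instead of wxInputStream. PushbackReader and ReadOneElement() are hypothetical names, and finding the element end by searching for a literal close tag is an assumption made to keep the example short. The real code would ask the parser how far it got.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper that lets a reader hand back bytes it did not use.
class PushbackReader
{
public:
    explicit PushbackReader(std::istream& in) : m_in(in) {}

    // Reads up to maxBytes, serving pushed-back bytes before the stream.
    std::string Read(std::size_t maxBytes)
    {
        std::string out = m_pending.substr(0, maxBytes);
        m_pending.erase(0, out.size());
        if (out.size() < maxBytes) {
            std::string buf(maxBytes - out.size(), '\0');
            m_in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
            out.append(buf, 0, static_cast<std::size_t>(m_in.gcount()));
        }
        return out;
    }

    // Puts unparsed bytes back so the next Read() sees them first.
    void Unread(const std::string& bytes) { m_pending.insert(0, bytes); }

private:
    std::istream& m_in;
    std::string m_pending;
};

// Reads one element that ends with closeTag and pushes back whatever
// followed it. Returns an empty string if the element never completes.
std::string ReadOneElement(PushbackReader& reader, const std::string& closeTag)
{
    assert(!closeTag.empty());
    std::string data;
    for (;;) {
        const std::string::size_type pos = data.find(closeTag);
        if (pos != std::string::npos) {
            const std::string::size_type end = pos + closeTag.size();
            reader.Unread(data.substr(end)); // the over-read part goes back
            data.erase(end);
            return data;
        }
        const std::string chunk = reader.Read(1024);
        if (chunk.empty()) {
            reader.Unread(data); // incomplete element: keep it for later
            return std::string();
        }
        data += chunk;
    }
}
```

With this in place, two consecutive ReadOneElement() calls on the same reader return the first and second root elements in turn, even though the first call pulled both off the underlying stream.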
Example wxXmlInputter usage with incremental document builds
Logged In: YES
user_id=265818
A better solution than either of the ones presented would be
to implement the Build() methods as one Build() method that
returns no values.
We can then advertise and document the wxXmlInputter as
being an incremental builder. So, when you call Build(), you
pass it a wxInputStream. The Build() method reads as much
of the stream as it possibly can before returning.
Then, the client invokes one of two different methods to get
the actual XML content created:
wxXmlDocument* GetDocument();
wxXmlElement* GetElement();
If there was not an entire XML document in the stream for all
previous invocations of Build(), GetDocument() will return
NULL. The same goes for GetElement(). However, if there
were multiple XML elements in the stream, GetElement() may
be invoked as many times as necessary to get the XML
elements parsed. Once GetElement() returns NULL, that
means no more XML elements were parsed.
However, an XML element may have begun being parsed. To
finish parsing it, Build() will have to be invoked again with the
same wxInputStream, assuming there is more data on the
stream waiting to be read.
All of this incremental Build() stuff is in order to effectively
parse XML data arriving on a wxSocketStream.
For uses other than streaming XML, we might want to provide
non-incremental Build() methods still. Anyway, here is my
proposal for the new wxXmlInputter interface:
// non-incremental build
wxXmlDocument* BuildDocument(wxFile& file);
wxXmlDocument* BuildDocument(const wxString& str);
wxXmlDocument* BuildDocument(wxInputStream& is);
// incremental build for XML streaming
void StartBuild();
void Build(wxInputStream& is);
wxXmlDocument* GetDocument();
wxXmlDocument* GetDocumentStart();
wxXmlElement* GetElement();
wxXmlElement* GetElementStart();
The incremental Get***() methods work as follows:
1) GetDocument() - returns a valid document if at least one
entire document has been read on input since the last
invocation of StartBuild().
2) GetDocumentStart() - returns a valid document if either a)
the "xml" processing instruction was found on input, or b) a
doctype declaration was found on input, or c) the first XML
element start tag was found on input since the last call to
StartBuild(), or d) any combination of (a), (b), and (c) is
found on input in that order.
3) GetElement() - returns a valid element if an entire XML
element has been read on input since the last call to
StartBuild().
4) GetElementStart() - returns a valid element if an entire
XML element start tag or empty tag has been read on input
since the last call to StartBuild().
Once a valid pointer is returned from any of the Get***()
methods, it is never again returned. If StartBuild() is called
while the wxXmlInputter instance has XML document pieces
yet to be returned, the wxXmlInputter frees the memory
associated with these XML components. The same goes for
deleting the wxXmlInputter instance.
Once the object pointers are returned from the Get***()
methods, it is the client's responsibility to free the memory.
Also, the Get***Start() and Get***() methods should always
return different pointers. That way, if an incremental build is
performed, and the client does not care about responding to
the start of a document or element, the client can choose
not to call Get***Start().
On the other hand, the client may choose to call the
Get***Start() method and free the returned object
independently of the object returned by the corresponding
Get***() method.
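
To illustrate the StartBuild()/Build()/GetElement() cycle, here is a toy model in standard C++. ToyIncrementalBuilder is hypothetical: it hands out each element as a std::unique_ptr<std::string> where the proposal would return a wxXmlElement* that the client frees, and its tag scanner is deliberately naive (it assumes no '>' inside attribute values or comments). It is only meant to show the calling pattern: each completed element is returned exactly once, and a half-read element waits for the next Build().

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <memory>
#include <sstream>
#include <string>

class ToyIncrementalBuilder
{
public:
    // Discards buffered input and any elements not yet handed out.
    void StartBuild()
    {
        m_buffer.clear();
        m_ready.clear();
    }

    // Reads everything currently available and queues each completed
    // top-level element. Unfinished input stays buffered for the next call.
    void Build(std::istream& is)
    {
        std::ostringstream data;
        data << is.rdbuf();
        m_buffer += data.str();

        std::size_t pos = 0, start = 0, done = 0;
        int depth = 0;
        for (;;) {
            const std::size_t lt = m_buffer.find('<', pos);
            if (lt == std::string::npos) break;
            const std::size_t gt = m_buffer.find('>', lt);
            if (gt == std::string::npos) break; // tag still incomplete
            pos = gt + 1;
            const char first = m_buffer[lt + 1];
            if (first == '?' || first == '!') continue; // PI, doctype, comment
            const bool closing = (first == '/');
            if (closing && depth == 0) continue; // stray close tag: ignored
            if (depth == 0) start = lt;          // a top-level element begins
            if (closing) --depth;
            else if (m_buffer[gt - 1] != '/') ++depth; // empty tags keep depth
            assert(depth >= 0);
            if (depth == 0) {
                m_ready.push_back(m_buffer.substr(start, pos - start));
                done = pos;
            }
        }
        m_buffer.erase(0, done);
    }

    // Returns the next completed element, or null if none is ready.
    // Each element is handed out once; ownership passes to the caller.
    std::unique_ptr<std::string> GetElement()
    {
        if (m_ready.empty()) return nullptr;
        std::unique_ptr<std::string> element(new std::string(m_ready.front()));
        m_ready.pop_front();
        return element;
    }

private:
    std::string m_buffer;            // input not yet part of a whole element
    std::deque<std::string> m_ready; // completed elements awaiting pickup
};
```

A client would call Build() whenever data arrives, then loop on GetElement() until it returns null, and repeat. An element split across two reads only shows up after the second Build().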
Attached to this Bug is an example client use of the proposed
solution. The example illustrates how a Jabber client might
deal with a streaming XML document.
Logged In: YES
user_id=265818
You know, a much less convoluted solution would be to use
wxWindows events to notify the client that XML content had
been read on input.
The event handler macros could let the client specify the
element depth at which events should be generated.
Duh.
So, like with the incremental building description, we have the
following method:
void Build(wxInputStream& is);
However, we also have:
void SetEventHandler(wxEvtHandler& handler, int id);
and
void SetElementDepth(int depth);
Then, when content has been parsed, several event types may
be sent:
1) Start document
2) Start element
3) End element
4) End document
Start and End element events will only be fired at the depth
that is set by invoking SetElementDepth().
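
A rough sketch of how the depth filter could behave, again in standard C++ so it stands alone. ToyEventParser, XmlEvent and XmlEventType are hypothetical; a std::function callback stands in for the wxEvtHandler and id pair, the start and end document events are left out, and the root element is assumed to sit at depth 0 (the description above does not say which depth the root has). For a Jabber stream, a depth of 1 would then report each stanza but not its children.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

enum class XmlEventType { StartElement, EndElement };

struct XmlEvent
{
    XmlEventType type;
    std::string name; // element name without attributes
};

class ToyEventParser
{
public:
    void SetEventHandler(std::function<void(const XmlEvent&)> handler)
    {
        m_handler = handler;
    }

    void SetElementDepth(int depth)
    {
        assert(depth >= 0);
        m_eventDepth = depth;
    }

    // Consumes what is available; a partially read tag waits for more data.
    void Build(std::istream& is)
    {
        std::ostringstream data;
        data << is.rdbuf();
        m_buffer += data.str();

        std::size_t pos = 0;
        for (;;) {
            const std::size_t lt = m_buffer.find('<', pos);
            if (lt == std::string::npos) { pos = m_buffer.size(); break; }
            const std::size_t gt = m_buffer.find('>', lt);
            if (gt == std::string::npos) { pos = lt; break; } // keep partial tag
            const std::string tag = m_buffer.substr(lt + 1, gt - lt - 1);
            pos = gt + 1;
            if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;

            const bool closing = (tag[0] == '/');
            const bool empty = (tag[tag.size() - 1] == '/');
            const std::size_t b = closing ? 1 : 0;
            const std::size_t e = tag.find_first_of(" \t\r\n/", b);
            const std::string name =
                tag.substr(b, e == std::string::npos ? e : e - b);

            if (closing) {
                if (m_depth == 0) continue; // stray close tag: ignored
                --m_depth;
                Fire(XmlEventType::EndElement, name);
            } else {
                Fire(XmlEventType::StartElement, name);
                if (empty) Fire(XmlEventType::EndElement, name);
                else ++m_depth;
            }
        }
        m_buffer.erase(0, pos); // character data is dropped in this toy
    }

private:
    // Sends an event only when the element sits at the configured depth.
    void Fire(XmlEventType type, const std::string& name)
    {
        if (m_depth == m_eventDepth && m_handler)
            m_handler(XmlEvent{type, name});
    }

    std::function<void(const XmlEvent&)> m_handler;
    std::string m_buffer;
    int m_depth = 0;
    int m_eventDepth = 0;
};
```

The client sets the depth and handler once, then feeds Build() as data arrives. The handler only hears about elements at that one depth, which is what keeps this simpler than the Get***() polling scheme.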