[libxml++] libxml++ future


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

Here are a few features or little technical points I'd like to see in
libxml++ one day. Some could be included in the 1.0 version, while others
will certainly wait for a 1.2.

I would appreciate your comments on them: do you think each feature is
worth having, and do you think it is a good technical choice?

I may have forgotten some things that are important to you; don't hesitate
to suggest them.

From your observations I will make a first roadmap.

*******************************************************************************
1 - postfix private members instead of prefixing them with an underscore

target version : 1.0

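The ISO C++ standard reserves some names with a leading underscore for the
implementation (an underscore followed by an uppercase letter anywhere, and
any leading underscore in the global namespace), so it is safer not to use
them at all. Although there is no real risk of a problem in practice, I
think a trailing underscore would be cleaner.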
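
To make the proposed convention concrete, here is a minimal sketch. The
Element class below is hypothetical and is not taken from the libxml++
sources; it only shows a private member renamed from _name to name_.

```cpp
#include <string>

// Hypothetical class, for illustration only; not actual libxml++ code.
class Element
{
public:
  explicit Element(const std::string& name) : name_(name) {}

  const std::string& get_name() const { return name_; }
  void set_name(const std::string& name) { name_ = name; }

private:
  std::string name_; // trailing underscore; previously spelled _name
};
```

The public API is unchanged by such a rename, so it should not affect
existing users.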

*******************************************************************************
2 - wrap xmlIO.

target version : 1.0

The xmlIO interface allows us to create our own input/output buffers.
Wrapping them is an elegant and efficient way to avoid some useless,
potentially large string copies.

Think about how to send a document to a stream. Currently we have to do:

std::ostream & output = std::cout; // could be any ostream of course
std::string tmp = document.write_to_string();
output << tmp;

In the above code, the entire document is written to a buffer by libxml,
then copied to a std::string by libxml++, which is finally returned by
write_to_string(). Even with a COW implementation of std::string, we need
twice as much memory as the size of the document. With a non-COW
implementation it is even worse: the document may be copied 3 or 4 times.

I wrote a small wrapper around xmlOutputBuffer and implemented a
Document::write_to_stream() function. The previous code becomes:

std::ostream & output = std::cout; // std::cout is still an example of course
document.write_to_stream(output);

The advantage is much more than just writing 1 line instead of 2. The
serialized document is never held in memory in full. libxml writes to the
buffer in small pieces, which are immediately sent to the stream by the
wrapper. A patch demonstrating this is in the patch manager if you want to
experiment with it. The wrapper lets the user define their own OutputBuffer
very easily. I modified the dom_build example to test it, and it works
pretty well.
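
To illustrate the principle (this is not the patch itself), here is a
minimal sketch of callbacks that forward each chunk to a std::ostream. They
have the same shape as libxml2's xmlOutputWriteCallback and
xmlOutputCloseCallback, so they could be handed to xmlOutputBufferCreateIO()
with the stream as context. The names ostream_write, ostream_close and
demo_chunked_write are assumptions made for this example, and a plain loop
stands in for libxml2 so that the sketch has no dependency.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Same shape as xmlOutputWriteCallback:
//   int (*)(void* context, const char* buffer, int len)
// Returns the number of bytes written, or -1 on error.
int ostream_write(void* context, const char* buffer, int len)
{
  std::ostream* out = static_cast<std::ostream*>(context);
  out->write(buffer, len);
  return out->good() ? len : -1;
}

// Same shape as xmlOutputCloseCallback: int (*)(void* context).
// The stream is only flushed; it is owned by the caller.
int ostream_close(void* context)
{
  std::ostream* out = static_cast<std::ostream*>(context);
  out->flush();
  return out->good() ? 0 : -1;
}

// Stand-in for the serializer (hypothetical): it hands over the document
// in small pieces, the way libxml2 would call the write callback.
std::string demo_chunked_write()
{
  std::ostringstream sink;
  const std::string chunks[] = { "<root>", "<child/>", "</root>" };
  for (const std::string& chunk : chunks)
    ostream_write(&sink, chunk.data(), static_cast<int>(chunk.size()));
  ostream_close(&sink);
  return sink.str();
}
```

Each piece goes straight to the stream, so no intermediate std::string
holding the whole serialized document is needed.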

Another possibility is to wrap xmlInputBuffer. Although we can (and did)
implement parse_stream without it, it would allow us to implement
xmlTextReader.getRemainder() in an elegant way (cf. 3).
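
The input side could follow the same pattern. The sketch below shows a read
callback with the same shape as libxml2's xmlInputReadCallback, pulling
bytes from a std::istream on demand. The names istream_read and
demo_chunked_read are assumptions made for this example, and the loop only
imitates a parser asking for data in small chunks.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same shape as xmlInputReadCallback:
//   int (*)(void* context, char* buffer, int len)
// Returns the number of bytes read, 0 at end of input, -1 on error.
int istream_read(void* context, char* buffer, int len)
{
  std::istream* in = static_cast<std::istream*>(context);
  in->read(buffer, len);
  if (in->bad())
    return -1;
  return static_cast<int>(in->gcount());
}

// Stand-in for the parser (hypothetical): requests chunk_size bytes at a
// time until the callback reports end of input.
std::string demo_chunked_read(const std::string& text, int chunk_size)
{
  std::istringstream source(text);
  std::vector<char> buffer(chunk_size);
  std::string result;
  int n;
  while ((n = istream_read(&source, buffer.data(), chunk_size)) > 0)
    result.append(buffer.data(), n);
  return result;
}
```

Since the parser only consumes what it asks for, whatever has not been
requested yet simply stays in the stream.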

*******************************************************************************
3 - wrap xmlTextReader

target version : 1.0 ?

First, some references if you want to know more about what I'm talking about:
* libxml2 xmlTextReader implementation :
http://xmlsoft.org/xmlreader.html
* C# xmlTextReader interface :
http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

I know this interface is not part of the XML specification, which is one
argument against implementing it.
However, I think it is worth it: it will answer some needs for which SAX or
DOM are not satisfying for many people, and I bet some new users may become
interested in libxml++ if we implement such a thing.

I think we can give it an API very close to the C# one, thanks to the xmlIO
wrappers.

*******************************************************************************
4 - wrap xmlTextWriter

target version : it's too early to know

This interface is far less advanced than xmlTextReader. I don't think it's
time to think seriously about it, but it's a logical step after
xmlTextReader. An idea to keep for the future?

*******************************************************************************
5 - use a string type which handles UTF-8

target version : 1.2

This point has been discussed in the past. I will just sum up the state of
the discussion at this time.
The main debate was: do we impose a specific class, or do we turn libxml++
into a templated library to let the user choose which class he wants?
This debate ended with a vote for/against templates, with a fairly balanced
result.

We do, however, have an alternative: explicit instantiation. This would
consist of implementing the lib with templates, but not including the
implementations in the headers.
Instead, we would explicitly instantiate the template classes in the
dynamic lib with a chosen string type (very probably Glib::ustring).
Programs using this default string type wouldn't need to be recompiled at
each minor release, which is the main argument against templates.
At the same time, users who want to use another string type (QString for
example, or even std::string or char *) could still do so, at the price of
recompiling their application at each release of libxml++, even if the API
doesn't change.
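
A minimal sketch of explicit instantiation, collapsed into a single file.
BasicNode and the file names in the comments are hypothetical, and
std::string stands in for Glib::ustring so that the example has no
dependency.

```cpp
#include <string>

// --- node.h (hypothetical): what applications include, declarations only ---
template <typename String>
class BasicNode
{
public:
  explicit BasicNode(const String& name);
  const String& get_name() const;

private:
  String name_;
};

// --- node.cc (hypothetical): compiled once into the dynamic library ---
template <typename String>
BasicNode<String>::BasicNode(const String& name) : name_(name) {}

template <typename String>
const String& BasicNode<String>::get_name() const { return name_; }

// Explicit instantiation: the code for the default string type is emitted
// here, so programs using it link against the library instead of
// instantiating the template themselves.
template class BasicNode<std::string>;

// Convenience name for the default string type.
typedef BasicNode<std::string> Node;
```

Users of another string type would have to include the definitions
themselves and instantiate the template in their own code, which is where
the recompilation cost comes from.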

- Is this solution acceptable to you?
- Are there any LGPL issues with template libraries?

*******************************************************************************
6 - Implement node iterators

target version : ?

This point was also discussed earlier. We couldn't reach a decision on a
clean API.
Since xmlNode has some internal pointers to the other nodes of the tree
(next, prev, children, parent), we could easily implement iterators that
walk the tree in different ways:

- children_iterator: visits all the children of a node.
- depth_first_traversal_iterator: visits all nodes depth first, starting
  from a node and ending when the whole subtree has been explored.
- breadth_first_traversal_iterator: the same, but breadth first.

These iterators could be bidirectional. The question is how to define the
end() element.
Each of them would have a const version.
I'll try to put together something more complete than last time on this.
Any idea is welcome.
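
As a starting point for discussion, here is a minimal forward-only sketch of
a depth-first iterator. The Node struct is a hypothetical stand-in that
keeps only the four link pointers mentioned above, and all names are
assumptions. A null current node serves as end(), which is one possible
answer to the question, not a settled one.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for xmlNode, keeping only the link pointers.
struct Node
{
  explicit Node(const std::string& n)
    : name(n), parent(nullptr), children(nullptr), next(nullptr), prev(nullptr) {}
  std::string name;
  Node* parent;
  Node* children; // first child
  Node* next;     // next sibling
  Node* prev;     // previous sibling
};

class depth_first_iterator
{
public:
  typedef std::forward_iterator_tag iterator_category;
  typedef Node value_type;
  typedef std::ptrdiff_t difference_type;
  typedef Node* pointer;
  typedef Node& reference;

  // A default-constructed iterator (null current node) plays the role of end().
  depth_first_iterator() : root_(nullptr), current_(nullptr) {}
  explicit depth_first_iterator(Node* root) : root_(root), current_(root) {}

  reference operator*() const { return *current_; }
  pointer operator->() const { return current_; }

  depth_first_iterator& operator++()
  {
    if (current_->children) { current_ = current_->children; return *this; }
    // Climb until a next sibling exists, without leaving the subtree.
    while (current_ != root_ && !current_->next)
      current_ = current_->parent;
    current_ = (current_ == root_) ? nullptr : current_->next;
    return *this;
  }

  bool operator==(const depth_first_iterator& o) const { return current_ == o.current_; }
  bool operator!=(const depth_first_iterator& o) const { return current_ != o.current_; }

private:
  Node* root_;    // top of the subtree being explored
  Node* current_; // nullptr once the subtree is exhausted
};

// Appends child as the last child of parent, maintaining all four links.
void append_child(Node& parent, Node& child)
{
  child.parent = &parent;
  if (!parent.children) { parent.children = &child; return; }
  Node* last = parent.children;
  while (last->next) last = last->next;
  last->next = &child;
  child.prev = last;
}

// Builds root -> (a -> a1, b) and returns the names in visiting order.
std::vector<std::string> demo_depth_first()
{
  Node root("root"), a("a"), a1("a1"), b("b");
  append_child(root, a);
  append_child(a, a1);
  append_child(root, b);
  std::vector<std::string> order;
  for (depth_first_iterator it(&root), end; it != end; ++it)
    order.push_back(it->name);
  return order;
}
```

The drawback of a null end() is that it cannot be decremented, so a
bidirectional version would need to keep extra state (for example the root
together with a past-the-end flag).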

*******************************************************************************
7 - improve XPath support

target version : ?

I'm not very familiar with XPath. I don't know if our current support is
enough for common uses. Any feedback on this would be appreciated.

*******************************************************************************
The end.

If you reached this point, thank you for reading :-)

I'm looking forward to your comments/ideas,

Best regards,

Christophe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQE/c29lB+sU3TyOQjARAiNkAJ4nk/xRLfksbrQ7MVxQoYHQ2nQRsQCdEwpO
ssdIu42Eu/5e0iqj2nSnWYg=
=LCLJ
-----END PGP SIGNATURE-----