From: Christophe de V. <cde...@al...> - 2003-09-25 22:42:47
Hi all,

Here are a few features and small technical points I'd like to see in
libxml++ one day. Some could be included in version 1.0, while others will
certainly wait for 1.2.

I would appreciate your comments on them: do you think each feature is worth
having, and do you think it is a good technical choice?

I may have forgotten some things that are important to you; don't hesitate
to suggest them.

From your observations I will draw up a first roadmap.

*******************************************************************************
1 - postfix private members instead of prefixing them with an underscore
target version: 1.0

The ISO C++ standard reserves certain names with a leading underscore for
the implementation: any name starting with an underscore followed by an
uppercase letter, any name containing a double underscore, and any name
starting with an underscore in the global namespace. It is simpler not to
use leading underscores at all.
Although there is little risk of a real problem here, I think it would be
cleaner.

*******************************************************************************
2 - wrap xmlIO
target version: 1.0

The xmlIO interface allows us to create our own input/output buffers.
Wrapping them is an elegant and efficient way to avoid some useless and
potentially large string copies.

Think about how to send a document to a stream. Currently we have to do:

    std::ostream & output = std::cout; // could be any ostream of course
    std::string tmp = document.write_to_string();
    output << tmp;

In the above code, the entire document is written to a buffer by libxml,
then copied into a std::string by libxml++, which is finally returned by
write_to_string(). Even with a copy-on-write (COW) implementation of
std::string, we need twice as much memory as the size of the document. With
a non-COW implementation it is even worse: it may be copied 3 or 4 times.

I wrote a small wrapper around xmlOutputBuffer and implemented a
Document::write_to_stream() function.
The previous code becomes:

    std::ostream & output = std::cout; // std::cout is still just an example
    document.write_to_stream(output);

The advantage is much more than just writing 1 line instead of 2. The
serialized document is never held in memory in its entirety: libxml writes
to the buffer in small pieces, which are immediately sent to the stream by
the wrapper. A patch demonstrating this is on the patch manager if you want
to experiment with it. The wrapper also allows the user to define their own
OutputBuffer very easily. I modified the dom_build example to test it, and
it works pretty well.

Another possibility is to wrap xmlInputBuffer. Although we can (and did)
implement parse_stream without it, it would allow us to implement
xmlTextReader.getRemainder() in an elegant way (cf. 3).

*******************************************************************************
3 - wrap xmlTextReader
target version: 1.0 ?

First, some references if you want to know more about what I'm talking
about:
* libxml2 xmlTextReader implementation:
  http://xmlsoft.org/xmlreader.html
* C# xmlTextReader interface:
  http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

I know this interface is not part of the XML specification, which is one
argument against implementing it.
However, I think it is worth it: it will answer some needs for which neither
SAX nor DOM is satisfying for many people, and I bet some new users may
become interested in libxml++ if we implement such a thing.

I think we can give it an API very close to the C# one, thanks to the xmlIO
wrappers.

*******************************************************************************
4 - wrap xmlTextWriter
target version: it's too early to know

This interface is far less advanced than xmlTextReader. I don't think it is
time to think seriously about it, but it is a logical step after
xmlTextReader. An idea to keep for the future?
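
For point 2, the chunk-forwarding idea can be sketched as follows. This is
only an illustration under stated assumptions, not the patch mentioned
above. libxml2's xmlOutputBufferCreateIO() takes a write callback, a close
callback and an opaque context pointer; if that context is a std::ostream*,
each chunk can be passed straight through to the stream. The names
ostream_write_cb, ostream_close_cb, emit_in_chunks and demo are made up for
this example, and emit_in_chunks merely stands in for the libxml2 serializer
so that the code compiles without libxml2.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Same shape as libxml2's xmlOutputWriteCallback:
// returns the number of bytes written, or -1 on error.
int ostream_write_cb(void* context, const char* buffer, int len)
{
  std::ostream* out = static_cast<std::ostream*>(context);
  out->write(buffer, len);
  return out->good() ? len : -1;
}

// Same shape as libxml2's xmlOutputCloseCallback: 0 on success, -1 on error.
// The stream is only flushed here; it still belongs to the caller.
int ostream_close_cb(void* context)
{
  std::ostream* out = static_cast<std::ostream*>(context);
  out->flush();
  return out->good() ? 0 : -1;
}

typedef int (*write_cb_t)(void*, const char*, int);

// Hypothetical stand-in for the serializer (an assumption for this sketch):
// it hands the data to the callback in pieces of at most `chunk` bytes, so
// the receiver never needs a complete copy of the output.
int emit_in_chunks(const std::string& data, int chunk, write_cb_t cb, void* ctx)
{
  assert(chunk > 0);
  int total = 0;
  for (std::string::size_type pos = 0; pos < data.size(); pos += chunk) {
    const std::string::size_type left = data.size() - pos;
    const int n = left < static_cast<std::string::size_type>(chunk)
                      ? static_cast<int>(left)
                      : chunk;
    if (cb(ctx, data.data() + pos, n) < 0)
      return -1;  // propagate the stream error to the caller
    total += n;
  }
  return total;
}

// Usage example: route the chunks into an in-memory stream.
std::string demo()
{
  std::ostringstream out;
  emit_in_chunks("<root><child/></root>", 4, ostream_write_cb, &out);
  ostream_close_cb(&out);
  return out.str();
}
```

With real libxml2, the two callbacks and the stream pointer would be given
to xmlOutputBufferCreateIO() instead of to emit_in_chunks. Since the
callbacks only see a std::ostream*, the same pair works for files, string
streams or any user-defined stream buffer.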
*******************************************************************************
5 - use a string type which handles UTF-8
target version: 1.2

This point has been discussed in the past. I will just sum up the state of
the discussion so far.

The main debate was: do we impose one specific class, or do we turn libxml++
into a templated library and let the user choose which class to use?
This debate ended with a vote for/against templates, with a fairly balanced
result.

We do, however, have an alternative: explicit instantiation. This would
consist of implementing the library with templates, but without putting the
implementations in the headers.
Instead, we would explicitly instantiate the template classes in the dynamic
library with a chosen string type (very probably Glib::ustring). Programs
using this default string type wouldn't need to be recompiled at each minor
release, which is the main argument against templates.
At the same time, users who want to use another string type (QString for
example, or even std::string or char *) could still do so, at the price of
recompiling their application at each release of libxml++, even if the API
doesn't change.

- Is this solution acceptable to you?
- Are there any LGPL issues with template libraries?

*******************************************************************************
6 - implement node iterators
target version: ?

This point was also discussed earlier. We couldn't reach a decision on a
clean API.

Since xmlNode has internal pointers to the other nodes of the tree (next,
prev, children, parent), we could easily implement iterators that walk the
tree in different ways:
- children_iterator: explores all the children of a node.
- depth_first_traversal_iterator: explores all nodes with a depth-first
  algorithm, starting from a given node and ending when its whole subtree
  has been explored.
- breadth_first_traversal_iterator: the same, but breadth first.

These iterators could be bidirectional. The question is how to define the
end() element.
Each of them would have a const version.

I'll try to put together something more complete than last time about this.
Any idea is welcome.

*******************************************************************************
7 - provide better XPath support
target version: ?

I'm not very familiar with XPath. I don't know whether our current support
is enough for common uses. Any feedback on this would be appreciated.

*******************************************************************************

The end. If you reached this point, thank you for reading :-)

I look forward to your comments and ideas.

Best regards,

Christophe