From: Christophe de V. <cde...@al...> - 2003-09-25 22:42:47
|
Hi all,

Here are a few features and small technical points I'd like to see in
libxml++ one day. Some could be included in version 1.0, while others will
certainly wait for 1.2.

I would appreciate your comments on them: do you think each feature is worth
having, and do you think it is a good technical choice?

I may have forgotten some things that are important to you; don't hesitate to
suggest them.

From your observations I will make a first roadmap.

*******************************************************************************
1 - postfix private members instead of prefixing them with an underscore

target version : 1.0

The ISO C++ standard reserves some names with a leading underscore for the
implementation: a name starting with an underscore followed by an uppercase
letter (or containing a double underscore) is reserved everywhere, and any
other name starting with an underscore is reserved in the global namespace.
One shouldn't rely on such names.
Although there is no real risk of a problem with our current members, I think
a postfix would be cleaner.

*******************************************************************************
2 - wrap xmlIO.

target version : 1.0

The xmlIO interface allows the creation of our own input/output buffers.
Wrapping them is an elegant and efficient way to avoid some useless,
potentially big string copies.

Think about how to send a document to a stream. Currently we have to do:

    std::ostream & output = std::cout; // could be any ostream of course
    std::string tmp = document.write_to_string();
    output << tmp;

In the above code, the entire document is written to a buffer by libxml, then
copied to a std::string by libxml++, which is finally returned by
write_to_string(). Even with a COW (copy-on-write) implementation of
std::string, we need twice as much memory as the size of the document. With a
non-COW implementation it is even worse: the document may be copied 3 or 4
times.

I wrote a small wrapper for xmlOutputBuffer and implemented a
Document::write_to_stream() function.
The previous code becomes:

    std::ostream & output = std::cout; // std::cout is still just an example
    document.write_to_stream(output);

The advantage is much more than just writing 1 line instead of 2. The entire
document is never in memory: libxml writes to the buffer in small pieces,
which are immediately sent to the stream by the wrapper. A patch demonstrating
this is on the patch manager if you want to experiment with it. The wrapper
allows the user to define their own OutputBuffer very easily. I modified the
dom_build example to test it, and it works pretty well.

Another possibility is to wrap xmlInputBuffer. Although we can (and did)
implement parse_stream without it, it would let us implement
xmlTextReader.getRemainder() in an elegant way (cf. 3).

*******************************************************************************
3 - wrap xmlTextReader

target version : 1.0 ?

First, some references if you want to know more about what I'm talking about:
* libxml2 xmlTextReader implementation:
  http://xmlsoft.org/xmlreader.html
* C# xmlTextReader interface:
  http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html

I know this interface is not part of the XML specification, which is one
argument against implementing it.
However, I think it is worth it: it will answer some needs that SAX and DOM do
not satisfy for many people, and I bet some new users may get interested in
libxml++ if we implement such a thing.
I think we can give it an API very close to the C# one, thanks to the xmlIO
wrappers.

*******************************************************************************
4 - wrap xmlTextWriter

target version : it's too early to know

This interface is far less advanced than xmlTextReader. I don't think it's
time to think seriously about it, but it's a logical step after xmlTextReader.
An idea to keep for the future?
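To make the streaming idea from point 2 more concrete, here is a minimal sketch. It does not use libxml2 or the actual patch: `write_items` is a hypothetical stand-in for the serializer, and only the callback type is modelled on the shape of libxml2's `xmlOutputWriteCallback` (context pointer, buffer, length). The point it illustrates is that each small piece goes straight to the stream, so the whole document is never held in one string.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Same shape as libxml2's write callback: receives one chunk and returns the
// number of bytes consumed, or -1 on error.
typedef int (*WriteCallback)(void* context, const char* buffer, int len);

// Callback that forwards each chunk to the std::ostream passed as context.
int write_to_ostream(void* context, const char* buffer, int len)
{
  std::ostream* out = static_cast<std::ostream*>(context);
  out->write(buffer, len);
  return out->good() ? len : -1;
}

// Sends one piece through the callback and adds its size to `total`.
static bool emit(const std::string& piece, WriteCallback write, void* context,
                 int& total)
{
  const int written =
      write(context, piece.data(), static_cast<int>(piece.size()));
  if (written < 0)
    return false;
  total += written;
  return true;
}

// Hypothetical stand-in for the serializer (assumption, not libxml2 code):
// emits a tiny document piece by piece. No escaping is done here.
// Returns the total number of bytes written, or -1 on error.
int write_items(const std::vector<std::string>& items, WriteCallback write,
                void* context)
{
  int total = 0;
  if (!emit("<root>", write, context, total))
    return -1;
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (!emit("<item>" + items[i] + "</item>", write, context, total))
      return -1;
  }
  if (!emit("</root>", write, context, total))
    return -1;
  return total;
}
```

Calling `write_items(items, write_to_ostream, &std::cout)` would stream the pieces to standard output. A user-defined output buffer is then just another callback plus a context object.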
*******************************************************************************
5 - use a string type which handles UTF-8

target version : 1.2

This point has been discussed in the past. I will just sum up the state of the
discussions at this time.
The main debate was: do we impose a precise class, or do we turn libxml++ into
a templated library to let the user choose which class to use?
This debate ended with a vote for/against templates, with a fairly balanced
result.

We do have an alternative, however: explicit instantiation. This would consist
of implementing the lib with templates, but not including the implementations
in the headers.
Instead, we would explicitly instantiate the template classes in the dynamic
lib with a chosen string type (very probably Glib::ustring). Programs using
this default string type wouldn't need to be recompiled at each minor release,
which is the main argument against templates.
At the same time, users who want to use another string type (QString for
example, or even std::string or char *) could still do it, at the price of
recompiling their application at each release of libxml++, even if the API
doesn't change.

- Is this solution acceptable to you?
- Is there any LGPL issue with template libraries?

*******************************************************************************
6 - Implement node iterators

target version : ?

This point was also discussed earlier. We couldn't reach a decision on a clean
API.
Since xmlNode has internal pointers to the other nodes of the tree (next,
prev, children, parent), we could easily implement iterators that walk the
tree in different ways:
- children_iterator: explores all the children of a node.
- depth_first_traversal_iterator: explores all nodes with a depth-first
  algorithm, starting from a node and ending when the whole subtree has been
  explored.
- breadth_first_traversal_iterator: the same, but breadth first.

These iterators could be bidirectional. The question is how to define the
end() element.
Each of them would have a const version.

I'll try to come up with something more complete than last time on this. Any
idea is welcome.

*******************************************************************************
7 - better XPath support

target version : ?

I'm not very familiar with XPath. I don't know if the support we currently
have is enough for common uses.
Any feedback on this would be appreciated.

*******************************************************************************

The end. If you reached this point, thank you for reading :-)

I'm looking forward to your comments/ideas.

Best regards,

Christophe
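As an illustration of the depth-first iterator from point 6, here is a minimal sketch. It assumes a hypothetical `Node` struct that only mimics the next/prev/children/parent links of xmlNode; it is not the libxml2 type and not a proposed libxml++ API. In this sketch end() is simply a null position, which is the easy answer but makes the iterator forward-only.

```cpp
#include <cassert>
#include <string>

// Hypothetical node with the same kind of links as xmlNode (assumption).
struct Node {
  explicit Node(const std::string& n)
    : name(n), parent(nullptr), children(nullptr), next(nullptr),
      prev(nullptr) {}
  std::string name;
  Node* parent;
  Node* children; // first child
  Node* next;     // next sibling
  Node* prev;     // previous sibling
};

// Links `child` as the last child of `parent`.
void append_child(Node& parent, Node& child)
{
  child.parent = &parent;
  if (!parent.children) {
    parent.children = &child;
    return;
  }
  Node* last = parent.children;
  while (last->next)
    last = last->next;
  last->next = &child;
  child.prev = last;
}

// Pre-order walk over the subtree rooted at `root`; never leaves the subtree.
class depth_first_iterator {
public:
  depth_first_iterator() : root_(nullptr), current_(nullptr) {} // end()
  explicit depth_first_iterator(Node* root) : root_(root), current_(root) {}

  Node& operator*() const { return *current_; }
  Node* operator->() const { return current_; }

  depth_first_iterator& operator++()
  {
    if (current_->children) {
      current_ = current_->children; // go down first
      return *this;
    }
    // Climb until a node with a next sibling is found, stopping at the root.
    while (current_ != root_ && !current_->next)
      current_ = current_->parent;
    current_ = (current_ == root_) ? nullptr : current_->next;
    return *this;
  }

  bool operator==(const depth_first_iterator& other) const
  { return current_ == other.current_; }
  bool operator!=(const depth_first_iterator& other) const
  { return current_ != other.current_; }

private:
  Node* root_;
  Node* current_;
};

// Concatenates the names visited from `root`, to show the traversal order.
std::string walk(Node* root)
{
  std::string order;
  for (depth_first_iterator it(root); it != depth_first_iterator(); ++it)
    order += it->name;
  return order;
}
```

A bidirectional version would need end() to remember the subtree root, so that decrementing from end() can find the last node; that is the design question raised above.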
From: Matt E. <ma...@au...> - 2003-09-26 00:10:56
|
Looks great!

A couple of ideas I would be willing to submit patches for:

a) Support for std::wstring in addition to std::string

b) Overloaded element functions which do conversions from primitives to
   strings for the caller, for example:

    int n;
    elem->add_content(n);

If people don't like b), maybe an overloaded class which supports the function
overloads.

BTW, I am able to use libxml++ successfully on Linux, Mac OS X, and win32.
Nice work!

On Thursday, September 25, 2003, at 03:42 PM, Christophe de Vienne wrote:

> Instead, we would explicitly instantiate the template classes in the
> dynamic lib with a chosen string type (very probably Glib::ustring).
> Programs using this default string type wouldn't need to be recompiled at
> each minor release, which is the main argument against templates.
> At the same time, users who want to use another string type (QString for
> example, or even std::string or char *) could still do it, at the price of
> recompiling their application at each release of libxml++, even if the API
> doesn't change.
>
> - Is this solution acceptable to you?
> - Is there any LGPL issue with template libraries?
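The explicit instantiation approach quoted above can be sketched as follows. This is a single-file illustration under stated assumptions: the `Element` class template is hypothetical, std::string stands in for Glib::ustring so that the example needs nothing beyond the standard library, and the comments mark which part would live in the public header and which in the library's source file.

```cpp
#include <cassert>
#include <string>

// --- would live in the public header: declarations only, no bodies ---
template <typename String>
class Element {
public:
  explicit Element(const String& name);
  const String& get_name() const;
  void set_name(const String& name);
private:
  String name_;
};

// --- would live in a source file compiled into the dynamic lib ---
template <typename String>
Element<String>::Element(const String& name) : name_(name) {}

template <typename String>
const String& Element<String>::get_name() const { return name_; }

template <typename String>
void Element<String>::set_name(const String& name) { name_ = name; }

// Explicit instantiation: the code for the default string type is generated
// here, once, inside the library, instead of in every program using it.
template class Element<std::string>;
```

Programs using `Element<std::string>` only need the declarations and link against the instantiated code. Using another string type would require the member definitions to be visible to the user's compiler as well, which is why those users would have to recompile at each release.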
From: Dan D. <da...@de...> - 2003-09-26 15:11:29
|
On Thu, 2003-09-25 at 18:42, Christophe de Vienne wrote:
> Hi all,
>
> Here are a few features and small technical points I'd like to see in
> libxml++ one day. Some could be included in version 1.0, while others will
> certainly wait for 1.2.
>
> I would appreciate your comments on them: do you think each feature is
> worth having, and do you think it is a good technical choice?
>
> I may have forgotten some things that are important to you; don't hesitate
> to suggest them.
>
> From your observations I will make a first roadmap.
>
> *****************************************************************************
> 1 - postfix private members instead of prefixing them with an underscore
>
> target version : 1.0
>
> The ISO C++ standard reserves some names with a leading underscore for the
> implementation: a name starting with an underscore followed by an uppercase
> letter (or containing a double underscore) is reserved everywhere, and any
> other name starting with an underscore is reserved in the global namespace.
> One shouldn't rely on such names.
> Although there is no real risk of a problem with our current members, I
> think a postfix would be cleaner.
>
> *****************************************************************************
> 2 - wrap xmlIO.
>
> target version : 1.0
>
> The xmlIO interface allows the creation of our own input/output buffers.
> Wrapping them is an elegant and efficient way to avoid some useless,
> potentially big string copies.
>
> Think about how to send a document to a stream. Currently we have to do:
>
>     std::ostream & output = std::cout; // could be any ostream of course
>     std::string tmp = document.write_to_string();
>     output << tmp;
>
> In the above code, the entire document is written to a buffer by libxml,
> then copied to a std::string by libxml++, which is finally returned by
> write_to_string(). Even with a COW (copy-on-write) implementation of
> std::string, we need twice as much memory as the size of the document.
> With a non-COW implementation it is even worse: the document may be copied
> 3 or 4 times.
>
> I wrote a small wrapper for xmlOutputBuffer and implemented a
> Document::write_to_stream() function.
> The previous code becomes:
>
>     std::ostream & output = std::cout; // std::cout is still just an example
>     document.write_to_stream(output);
>
> The advantage is much more than just writing 1 line instead of 2. The
> entire document is never in memory: libxml writes to the buffer in small
> pieces, which are immediately sent to the stream by the wrapper. A patch
> demonstrating this is on the patch manager if you want to experiment with
> it. The wrapper allows the user to define their own OutputBuffer very
> easily. I modified the dom_build example to test it, and it works pretty
> well.
>
> Another possibility is to wrap xmlInputBuffer. Although we can (and did)
> implement parse_stream without it, it would let us implement
> xmlTextReader.getRemainder() in an elegant way (cf. 3).
>

I think this is good, but I am also very interested in SAX serialization,
perhaps based upon one of the node iterators below. I think you can count on
that contribution coming from me.

-----

I am also personally interested in entity support in the near term.
From: Christophe de V. <cde...@al...> - 2003-09-26 15:31:45
|
On Friday 26 September 2003 17:11, Dan Dennedy wrote:
> On Thu, 2003-09-25 at 18:42, Christophe de Vienne wrote:
> >
> > Another possibility is to wrap xmlInputBuffer. Although we can (and
> > did) implement parse_stream without it, it would let us implement
> > xmlTextReader.getRemainder() in an elegant way (cf. 3).
>
> I think this is good, but I am also very interested in SAX
> serialization, perhaps based upon one of the node iterators below. I think
> you can count on that contribution coming from me.
>

I'm not familiar with this. Do you have any reference on it?
Thanks.

>
> I am also personally interested in entity support in the near term.
>

Well, this is not such an easy thing to handle (cf. DV's messages on the xml
mailing list about SAX and entities).
What exactly do you need?
And do you have suggestions on how to do it?

Cheers,

Christophe
From: Dan D. <da...@de...> - 2003-09-29 19:41:26
|
On Fri, 2003-09-26 at 11:31, Christophe de VIENNE wrote:
> On Friday 26 September 2003 17:11, Dan Dennedy wrote:
> > On Thu, 2003-09-25 at 18:42, Christophe de Vienne wrote:
> > >
> > > Another possibility is to wrap xmlInputBuffer. Although we can (and
> > > did) implement parse_stream without it, it would let us implement
> > > xmlTextReader.getRemainder() in an elegant way (cf. 3).
> >
> > I think this is good, but I am also very interested in SAX
> > serialization, perhaps based upon one of the node iterators below. I think
> > you can count on that contribution coming from me.
> >
>
> I'm not familiar with this. Do you have any reference on it?
> Thanks.

See the Component Pipelines section of
http://www.xml.com/lpt/a/2002/02/13/cocoon2.html

I think this is similar to what a discussion and contributed patch on the
xmlsoft ML call "inverted SAX." I don't know if it has been accepted into
libxml2. It is a simple idea though: traverse the tree while synthesizing SAX
events.

> >
> > I am also personally interested in entity support in the near term.
> >
>
> Well, this is not such an easy thing to handle (cf. DV's messages on the xml
> mailing list about SAX and entities).

Hmm, lots of reading to do here.

> What exactly do you need?
> And do you have suggestions on how to do it?

See my response to murrayc.
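The "traverse the tree while synthesizing SAX events" idea can be sketched in a few lines. Everything here is an assumption made for illustration: `Node`, `SaxHandler`, and `emit_sax_events` are hypothetical names, the handler interface is a simplified imitation of SAX-style callbacks (no attributes, namespaces, or entities), and this is neither the libxml2 patch mentioned above nor libxml++ code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified DOM node: a name, optional leading text, children.
struct Node {
  std::string name;
  std::string text;
  std::vector<Node> children;
};

// Hypothetical SAX-style handler interface (attributes omitted).
struct SaxHandler {
  virtual ~SaxHandler() {}
  virtual void on_start_element(const std::string& name) = 0;
  virtual void on_characters(const std::string& text) = 0;
  virtual void on_end_element(const std::string& name) = 0;
};

// Depth-first walk that fires the events a parser would have produced.
void emit_sax_events(const Node& node, SaxHandler& handler)
{
  handler.on_start_element(node.name);
  if (!node.text.empty())
    handler.on_characters(node.text);
  for (std::size_t i = 0; i < node.children.size(); ++i)
    emit_sax_events(node.children[i], handler);
  handler.on_end_element(node.name);
}

// Example handler that records the events it receives, in order.
struct RecordingHandler : SaxHandler {
  std::vector<std::string> events;
  void on_start_element(const std::string& name) override
  { events.push_back("start:" + name); }
  void on_characters(const std::string& text) override
  { events.push_back("chars:" + text); }
  void on_end_element(const std::string& name) override
  { events.push_back("end:" + name); }
};
```

Any consumer written against the handler interface can then be fed from an in-memory tree as well as from a parser, which is what makes pipeline-style processing possible.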