Thread: [Libxmlplusplus-general] parametrizing libxml++ for the character /string type

[Libxmlplusplus-general] parametrizing libxml++ for the character /string type

From: Stefan S. <se...@sy...> - 2003-01-27 19:55:09

hi there,

libxml2 uses some internal utf8 types to represent characters.
libxml++ currently uses std::string, which only works for characters
in the ASCII subset of utf8. It was suggested to use glibmm::ustring
instead, but I'd like to propose a different solution:

What if the xmlpp::Node class (as an example) is split into two
parts: one that is character type agnostic, i.e. uses libxml2's
internal type, and one that does the conversion to C++ types (for
example glibmm::ustring) ?

Here is how this could look:

class _TextNode
{
public:
   void set_content(const xmlChar *content)
   {
      xmlNodeSetContent(_impl, content);
   }

   /* ... */
private:
   xmlNode *_impl;
};

template <typename string_type, typename string_traits>
class TextNode : private _TextNode
{
public:
   void set_content(const string_type &content)
   {
      _TextNode::set_content(string_traits::to_utf8(content));
   }
   /* ... */
};

So, the real libxml2 wrapper class for a text node is _TextNode.
It's this class that does all the real work. TextNode then provides
a thin Adapter to that (i.e. it uses _TextNode as implementation)
providing a type-safe interface, and by means of the 'string_traits'
providing a mapping to arbitrary user-provided unicode implementations.

You may just do a

typedef TextNode<glibmm::ustring, your_ustring_adaptor> YourTextNode;

to hide the templating, while others can use a different unicode
library.
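
As a rough illustration of what such a 'string_traits' adaptor might look
like, here is a minimal sketch for a UTF-32 string type. The names u32_traits
and CoreTextNode are hypothetical and not part of libxml++. xmlChar is declared
locally as a stand-in for the libxml2 typedef, and the core class only records
the bytes instead of calling xmlNodeSetContent, so the sketch compiles without
libxml2. Unlike the outline above, to_utf8 returns a std::string by value here,
an assumption made so that the converted buffer has a clear owner.

```cpp
#include <cassert>
#include <string>

// Stand-in for libxml2's xmlChar (an unsigned char typedef there).
typedef unsigned char xmlChar;

// Character-type-agnostic core. A real version would hold an xmlNode*
// and call xmlNodeSetContent(); this stand-in only records the bytes.
class CoreTextNode
{
public:
   void set_content(const xmlChar *content)
   {
      _bytes = reinterpret_cast<const char *>(content);
   }
   const std::string &bytes() const { return _bytes; }
private:
   std::string _bytes;
};

// Hypothetical adaptor: converts a UTF-32 string to UTF-8 bytes.
// It does not reject surrogate code points; that is left out for brevity.
struct u32_traits
{
   static std::string to_utf8(const std::u32string &s)
   {
      std::string out;
      for (char32_t c : s)
      {
         assert(c <= 0x10FFFF);   // outside the Unicode range otherwise
         if (c < 0x80)
            out += static_cast<char>(c);
         else if (c < 0x800)
         {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
         }
         else if (c < 0x10000)
         {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
         }
         else
         {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
         }
      }
      return out;
   }
};

// Thin typed adapter over the core, as in the proposal.
template <typename string_type, typename string_traits>
class TextNode : private CoreTextNode
{
public:
   void set_content(const string_type &content)
   {
      const std::string utf8 = string_traits::to_utf8(content);
      CoreTextNode::set_content(
         reinterpret_cast<const xmlChar *>(utf8.c_str()));
   }
   using CoreTextNode::bytes;   // exposed only so the sketch can be inspected
};

typedef TextNode<std::u32string, u32_traits> U32TextNode;
```

User code would then only see U32TextNode and never touch xmlChar directly.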

Hope this makes some sense to you.

Stefan

Re: [Libxmlplusplus-general] parametrizing libxml++ for the character /string type

From: Christophe de V. <cde...@al...> - 2003-01-27 20:37:37

On Monday 27 January 2003 20:58, Stefan Seefeld wrote:
> hi there,
>
> libxml2 uses some internal utf8 types to represent characters.
> libxml++ currently uses std::string, which only works for characters
> in the ASCII subset of utf8. It was suggested to use glibmm::ustring
> instead, but I'd like to propose a different solution:
>
<big snip>
>
> Hope this makes some sense to you.
>

A lot. But the change you suggest in the way _TextNode would be implemented
is big: at this time, the libxml2 types (xmlNode in this case) are used only
at read/write time, not to store the data while manipulating nodes.

But I like the idea very much, so I see two options:
- do it exactly the way you did, but this means completely rethinking the way
Node is implemented
- keep the idea of a templated class, but with no parent class, and
string_type as the content type.

template <typename string_type, typename string_traits>
class TextNode
{
public:
   void set_content(const string_type &content)
   {
      // no parent class in this variant: just store the user's string
      _content = content;
   }

   void write(xmlDocPtr doc, xmlNodePtr parent) const;
private:
   string_type _content;
};

Then the write() member function would use the string adaptor to produce the
libxml2 node.

template <typename string_type, typename string_traits>
void TextNode<string_type, string_traits>::write(xmlDocPtr doc, xmlNodePtr parent) const
{
  xmlNodePtr node = xmlNewText( string_traits::to_utf8(_content) );
/* ... */
}

The read method would do the exact opposite.
This way we wouldn't have to modify the current implementation too much,
unless we decide that it's better to rely on libxml2 types to store the data.
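
A minimal sketch of this second option, assuming the node stores the user's
string type and converts only at the libxml2 boundary. The latin1_traits
adaptor and the to_utf8/from_utf8 pair are hypothetical names, and write() and
read() exchange plain UTF-8 byte strings here rather than building an xmlNode,
so the example compiles without libxml2.

```cpp
#include <cassert>
#include <string>

// Hypothetical adaptor for std::string holding ISO-8859-1 (Latin-1) text.
struct latin1_traits
{
   static std::string to_utf8(const std::string &s)
   {
      std::string out;
      for (unsigned char c : s)
      {
         if (c < 0x80)
            out += static_cast<char>(c);
         else
         {
            // U+0080..U+00FF always encode as two UTF-8 bytes
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
         }
      }
      return out;
   }

   // Inverse conversion; assumes well-formed UTF-8 limited to U+0000..U+00FF.
   static std::string from_utf8(const std::string &utf8)
   {
      std::string out;
      for (std::size_t i = 0; i < utf8.size(); ++i)
      {
         unsigned char c = static_cast<unsigned char>(utf8[i]);
         if (c < 0x80)
            out += static_cast<char>(c);
         else
         {
            assert((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size());
            unsigned char next = static_cast<unsigned char>(utf8[++i]);
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
         }
      }
      return out;
   }
};

template <typename string_type, typename string_traits>
class TextNode
{
public:
   void set_content(const string_type &content) { _content = content; }
   const string_type &get_content() const { return _content; }

   // Stand-in for write(xmlDocPtr, xmlNodePtr): returns the UTF-8 bytes
   // that would be handed to xmlNewText().
   std::string write() const { return string_traits::to_utf8(_content); }

   // Stand-in for read: takes UTF-8 bytes as libxml2 would deliver them.
   void read(const std::string &utf8)
   {
      _content = string_traits::from_utf8(utf8);
   }
private:
   string_type _content;
};
```

The conversion cost is paid once per read or write, while all in-memory
manipulation stays in the user's own string type.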


Thanks for your suggestion,

Christophe

Re: [Libxmlplusplus-general] parametrizing libxml++ for the character /string type

From: Stefan S. <se...@sy...> - 2003-01-27 20:44:03

Christophe de Vienne wrote:

>>Hope this makes some sense to you.
>>
> 
> 
> A lot. But the change you suggest in the way _TextNode would be implemented
> is big: at this time, the libxml2 types (xmlNode in this case) are used only
> at read/write time, not to store the data while manipulating nodes.

exactly, which is (part of) why I suggest in the other thread to use
xmlNode as the implementation ubiquitously.

> But I like the idea very much, so I see two options:
> - do it exactly the way you did, but this means completely rethinking the way
> Node is implemented
> - keep the idea of a templated class, but with no parent class, and
> string_type as the content type.

yes, both would work for the actual issue. But I'd suggest using
xmlNode for other reasons, too (notably to really delegate whatever we
can down to libxml2, such as xpath lookups).

[snip]

> This way we wouldn't have to modify the current implementation too much,
> unless we decide that it's better to rely on libxml2 types to store the data.

yes, understood. Well, I'll play a bit with an implementation as
suggested, and then send in more suggestions. Based
on that we can then discuss whether and how to do the migration.

Sounds good ?

Stefan

Re: [Libxmlplusplus-general] parametrizing libxml++ for the character /string type

From: Christophe de V. <cde...@al...> - 2003-01-28 09:57:02

On Monday 27 January 2003 21:47, Stefan Seefeld wrote:
> Christophe de Vienne wrote:
> >>Hope this makes some sense to you.
> >
> > A lot. But the change you suggest in the way _TextNode would be
> > implemented is big: at this time, the libxml2 types (xmlNode in this
> > case) are used only at read/write time, not to store the data while
> > manipulating nodes.
>
> exactly, which is (part of) why I suggest in the other thread to use
> xmlNode as the implementation ubiquitously.

I hadn't read that thread at the time. Now I see better what you meant.

> [...]
> yes, understood. Well, I'll play a bit with an implementation as
> suggested, and then send in more suggestions. Based
> on that we can then discuss whether and how to do the migration.
>
> Sounds good ?
>

Yes. In fact, after reading the other thread, I agree that using xmlNode
(libxml) the way you suggest is far better than what's currently done.
One other positive aspect will be that if a user wants to use some libxml2
function we did not wrap, it will be easily doable.

I don't currently have the time to do this, but your patches are very
welcome. However, I won't put them in the current branch but in the unstable
one (which does not exist yet, but will as soon as it is needed), as
Murray said.

Best regards,

Christophe


Re: [Libxmlplusplus-general] parametrizing libxml++ for the character /string type

From: Stefan S. <se...@sy...> - 2003-01-29 03:54:32

Attachments: changes

Christophe de VIENNE wrote:

> I don't currently have the time to do this, but your patches are very
> welcome. However, I won't put them in the current branch but in the unstable
> one (which does not exist yet, but will as soon as it is needed), as
> Murray said.

ok, here is the first patch. The goal was to change the implementation to
delegate whatever we can down to libxml2, while respecting the existing API.

So, everything compiles, and the examples run unchanged.

There are, however, a couple of issues which need to be addressed. I hope we can
sort them out together. First, a short account of what this change does:

All libxml2 structures use '_private' members for application data. I use that
to point to the corresponding libxml++ wrapper class, so we can do a reverse
lookup. For example, to access the first child node of an xmlpp::Node object,
you'd do something like:

reinterpret_cast<Node *>(this->_impl->children->_private)

Easy enough, isn't it ?
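
To make the reverse lookup concrete, here is a minimal sketch under some
assumptions. FakeXmlNode is a hypothetical stand-in that mirrors only the few
xmlNode fields involved (_private, children, next, parent), so the example
compiles without libxml2, and first_child()/next_sibling() are illustrative
names rather than the patch's actual API. Since _private is a void pointer, a
static_cast is enough to get the wrapper back.

```cpp
#include <cassert>

// Stand-in mirroring the few xmlNode fields used here; the real struct
// lives in <libxml/tree.h> and has many more members.
struct FakeXmlNode
{
   void *_private = nullptr;
   FakeXmlNode *children = nullptr;
   FakeXmlNode *next = nullptr;
   FakeXmlNode *parent = nullptr;
};

class Node
{
public:
   // Register the wrapper in the underlying node so it can be found again.
   explicit Node(FakeXmlNode *impl) : _impl(impl)
   {
      _impl->_private = this;
   }

   // Unregister on destruction so no dangling wrapper pointer is left.
   ~Node() { _impl->_private = nullptr; }

   Node(const Node &) = delete;
   Node &operator=(const Node &) = delete;

   // Returns the wrapper of the first child, or a null pointer if there is
   // no child or the child has no wrapper attached.
   Node *first_child() const { return wrapper_of(_impl->children); }

   Node *next_sibling() const { return wrapper_of(_impl->next); }

private:
   static Node *wrapper_of(FakeXmlNode *n)
   {
      return n ? static_cast<Node *>(n->_private) : nullptr;
   }

   FakeXmlNode *_impl;
};
```

The wrapper itself stores nothing but the pointer to the underlying node, so
tree structure is never duplicated on the C++ side.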

The tricky point is, as said earlier, ownership management. libxml2's nodes
are owned by their parent nodes (and ultimately by the enclosing document),
not by the libxml++ wrapper object. We need to work out how transfer of ownership
should happen when a node is unlinked from its document / parent node.

Another tricky point is that libxml2 will automagically merge nodes occasionally,
for example if you insert a new text node right after an existing text node.
Thus,

Node *Node::add_child(const std::string &)

may or may not return a new object. It is, however, (and luckily,) owned by the
parent node, so the caller doesn't have to care. A similar argument is to be made
for setting attributes.

All this said, I believe there are now a couple of ways to enhance the API itself:

* I'd like to add iterators to make child node and attribute traversal more efficient
   (right now they are copied into a temporary container that is returned)

* I'd like to suggest adding a 'Visitor' interface for simpler traversal of a document,
   notably to externalize it (the 'write' method would look a *lot* simpler)

* the domparser should be refactored into a 'Document' and possibly a single
   'Document *parse_document(std::istream &)' factory function.

* add new node types such as 'processing instruction', 'cdata section', etc.

* new functionality can be added (notably the xpath lookup stuff I have been suggesting)

* do the split into character type agnostic/specific parts, and hook up external
   unicode libraries
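
For the first item in the list above, child iteration without a temporary
container could be sketched roughly as follows. This is only an illustration
under assumptions: FakeXmlNode is a hypothetical stand-in carrying just the
'children' and 'next' links of xmlNode, and ChildIterator and Children are
made-up names, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Stand-in with only the links needed for sibling traversal.
struct FakeXmlNode
{
   const char *name;
   FakeXmlNode *children;
   FakeXmlNode *next;
};

// Forward iterator that follows the 'next' pointers directly,
// so no copy of the child list is ever made.
class ChildIterator
{
public:
   typedef std::forward_iterator_tag iterator_category;
   typedef FakeXmlNode value_type;
   typedef std::ptrdiff_t difference_type;
   typedef FakeXmlNode *pointer;
   typedef FakeXmlNode &reference;

   explicit ChildIterator(FakeXmlNode *cur = nullptr) : _cur(cur) {}

   reference operator*() const { return *_cur; }
   pointer operator->() const { return _cur; }
   ChildIterator &operator++() { _cur = _cur->next; return *this; }
   ChildIterator operator++(int)
   {
      ChildIterator tmp(*this);
      ++*this;
      return tmp;
   }
   bool operator==(const ChildIterator &o) const { return _cur == o._cur; }
   bool operator!=(const ChildIterator &o) const { return _cur != o._cur; }

private:
   FakeXmlNode *_cur;
};

// Range over the children of one node, usable in a range-based for loop.
class Children
{
public:
   explicit Children(FakeXmlNode *parent) : _parent(parent) {}
   ChildIterator begin() const { return ChildIterator(_parent->children); }
   ChildIterator end() const { return ChildIterator(); }

private:
   FakeXmlNode *_parent;
};
```

A real version would presumably dereference to the libxml++ wrapper found via
'_private' rather than to the raw node.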

Anyways, I guess that's enough for tonight :-)
Let me know what you think of this plan...

Enjoy,
		Stefan