Re: [libxml++] UTF8 support

Ole Laursen wrote:

> But I really don't see the big problem here. std::string is really
> just a fancy way of saying 'char *', right?

No.

> And any decent
> Unicode-aware string library will have a convenient conversion from
> std::string, right?

No.

> So if you need to process the individual characters (in my experience
> with gtkmm/glibmm, this is seldomly needed) you can simply treat the
> input/output from the library as raw data which you feed to your
> string library. Why is this a problem?

I don't fully get your point. Are you advocating that libxml++
continue to use std::string? That's really a bad idea IMO:

'char *' is, besides being used for strings in C, a data type used for
generic memory, i.e. there are no semantics associated with it (such
as 'null terminated string').

std::string represents *text*, and as such, it has a lot more meaning.
You can iterate over the elements and expect to get individual
characters, to name just one example.

While it may be true that you can (technically) use std::string to
contain UTF-8 data, the std::string *interface* would be completely
inappropriate (besides the 'data()' and 'length()' methods :-).

Please don't abuse std::string in such a horrible way.
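
To make the iteration point concrete, here is a minimal sketch. The
helper names utf8_code_points() and sample_utf8() are made up for
illustration, and the counting loop assumes well-formed UTF-8. Walking
a std::string that holds UTF-8 visits bytes, not characters:

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 buffer by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// 'n' followed by U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
std::string sample_utf8() {
    return std::string("n\xC3\xA9");
}
```

For this sample, size() and plain iteration see three elements while
the text has two characters, so operator[], find() on a single char,
and substr() can all land in the middle of a character.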

But to go along the line you seem to suggest: libxml++ may use a
'data container' that is agnostic of the encoding or any related
interpretation of the content. That may actually not even be such
a bad idea, since it could just be a smart pointer taking over
the memory from libxml2, freeing the data in its destructor using
xmlFree().
That would make it possible to abstract the Unicode library away,
as in my suggestion, and would replace my suggested compile-time
polymorphism with runtime polymorphism (assuming appropriate
conversion functions doing the 'convert from/to libxml2' work).
It wouldn't incur much of a performance penalty, as there is no
additional copying involved.
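
A rough sketch of what I mean by such a container follows. The names
raw_text, release_fn and make_buffer are hypothetical, not anything
libxml++ has today. To keep the sketch buildable without libxml2, the
release function is passed in and std::free stands in for xmlFree();
in libxml++ one would presumably pass xmlFree instead:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Signature of the routine that releases the buffer. Assumption: in
// libxml++ this would be xmlFree; std::free is only a stand-in here.
using release_fn = void (*)(void*);

// Hypothetical encoding-agnostic container. It takes ownership of an
// existing buffer (no copy) and exposes only raw bytes and a length.
class raw_text {
public:
    raw_text(char* data, std::size_t length, release_fn release)
        : data_(data, release), length_(length) {
        assert(release != nullptr);
    }

    const char* data() const { return data_.get(); }
    std::size_t length() const { return length_; }

private:
    // Calls the release function on destruction; move-only, so the
    // buffer is never duplicated behind the user's back.
    std::unique_ptr<char, release_fn> data_;
    std::size_t length_;
};

// Stand-in for a libxml2 call that returns memory the caller must free.
char* make_buffer(const char* src, std::size_t n) {
    char* p = static_cast<char*>(std::malloc(n));
    if (p != nullptr) {
        std::memcpy(p, src, n);
    }
    return p;
}
```

Conversion functions for a particular Unicode string class would then
only need data() and length(), which is all the container promises.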

Regards,
		Stefan