Re: [Cppunit-devel] Unicode Support (was: 1.6.0 is released!)


----- Original Message -----
From: "Michael Arnoldus" <ch...@mu...>
To: "Cpp Unit Develpment Mailing List" <cpp...@li...>
Sent: Monday, September 24, 2001 3:00 PM
Subject: Re: [Cppunit-devel] Unicode Support (was: 1.6.0 is released!)

>Let me see if I understand you correctly. When not changing anything in
CppUnit, you are degrading the strings in CppUnit to be simple containers
which contain "something the TestRunner knows what to do with" - and
currently it just happens to be ASCII strings.

Actually, I was suggesting MBCS only as a workaround for having only
std::string (which should be able to contain an MBCS string, right?). My
point was rather that unless you are writing an application whose essence is
to manipulate Unicode objects (dictionary, translator, ...), you can probably
get away with restricting your Unicode strings to Latin-1 for ASSERT_EQUAL
and ASSERT_MESSAGE.

>This might work, but I think it should be reflected in the design then -
f.ex. by letting the TestRunner pass the strings as a pointer to a subclass
of a specific class (or maybe a template parameter class).

Yes, but let's not consider it for now. That solution strikes me as wrong
(who knows what they would put in a string ;-) ).

>I personally do not like a design where a string is something we use to
stuff something else into and then decode it again - this way doom lies!

I don't know much about MBCS, just that it is a way to encode a character
set depending on your code page (meaning you need to know your code page to
decode it).

>You are right, __FILE__ is not Unicode.

>Can you use unicode strings in the assert macros the way you have done it?

No. I convert the Unicode string to Latin-1 in my implementation of
toString(), which is not really a loss since these are just object dumps and
I control their content.

>Yes, unicode use a different runtime library.

Managing all those configurations will become a source of headaches:
2 DLL configs
2 static configs (not yet, but requested)
*2 for Unicode
*2: cppunit and testrunner
=> if cppunit remains the same for Unicode and ANSI strings, there are half
as many...

>A while ago you asked me about Unicode programming on unix. I have found an
article about unicode on linux:
http://www-106.ibm.com/developerworks/linux/library/l-linuni.html

I'll try to give it a look. I'm using Qt (http://www.trolltech.com) for now,
which implements its own Unicode support (a lot better than MFC's, if I
might say).

>I know the first Unicode port for Windows I did was not a very good job.
I'll be willing to do it right, but I need somebody who understands CppUnit
to talk to about the design.

    Nor was it a bad job. The solution could be acceptable (the definition
of the generic string/stream would need to be centralized, but that's it).

    What bothers me is that it has a global impact on CppUnit, and having a
string type that changes that way is a source of headaches. What should the
user of CppUnit do? Use CppUnit::String, or CString, or Tools::String
(a typedef to std::string or std::wstring, like in CppUnit)...

    Note that from this point on, I'm just throwing ideas around, hoping to
discover something interesting...

    I think it should be possible to design something a lot cleaner. Let's
put the facts down:

    1) User-defined strings that may be Unicode are:
    - the result of assertion_traits::toString(), used by ASSERT_EQUAL
    - the message parameter of ASSERT_MESSAGE

    2) Those strings are just a way to convey additional information with
the test failure.

    Let's ignore the current implementation of ASSERTxxx for now. Let's
imagine that we have:

ASSERT_UNICODE_MESSAGE( unicodeMessage, condition );

    How do we get the Unicode string to the TestRunner? The obvious answer
is by using Exception, the class used to report the failure details to the
TestRunner. Once we have that part down, the remaining elements can be
tackled.

    So my take would be that the first step toward adding Unicode support to
CppUnit would be to add Unicode support to Exception. I believe Exception
should still provide a std::string interface to the failure message (not all
TestRunners are written to support Unicode), but should also provide a
std::wstring interface.

    We would have:

    std::string what() const;
    std::wstring unicodeWhat() const;

    The constructor would need to be changed to accept either std::string or
std::wstring for the message. That would also impact Exception subclasses:
NotEqualException takes two strings at construction, so two constructors
should also be provided there.
(PS: can we do std::wstring( L"\u306b\u307b" )? How do you go about
inputting a hardcoded Unicode string?)
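
On the PS: standard C++ does allow universal character names (\uXXXX) inside
wide string literals, so the expression above is valid as far as the
language goes. Whether a given compiler of the day accepts it is a separate
question I can't vouch for. A minimal sketch (the function name
sampleHiragana is made up for illustration):

```cpp
#include <string>

// Universal character names keep the source file plain ASCII while still
// naming non-ASCII characters in a wide literal.
// sampleHiragana is a hypothetical helper, not part of CppUnit.
inline std::wstring sampleHiragana()
{
    // U+306B (HIRAGANA LETTER NI) followed by U+307B (HIRAGANA LETTER HO)
    return std::wstring( L"\u306b\u307b" );
}
```

Both characters are in the Basic Multilingual Plane, so each one fits in a
single wchar_t whether wchar_t is 16 or 32 bits wide.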

    My guess would be that Exception would store everything as std::wstring
(the wider format). We should also have utility functions to convert from
and to Unicode (Unicode to ANSI could be a dummy conversion: if the
character code is not in the range 0-255, replace it with '?').
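
To make the idea concrete, here is a rough sketch of an exception class that
stores the message as std::wstring and exposes both interfaces. It is not
the actual CppUnit Exception class: the names UnicodeAwareException, widen
and narrow are invented for this example, and the conversions assume the
narrow strings are Latin-1 (one byte per character).

```cpp
#include <string>

// Hypothetical helper: widen a Latin-1 string one byte at a time.
inline std::wstring widen( const std::string &text )
{
    std::wstring result;
    result.reserve( text.size() );
    for ( std::string::size_type i = 0; i < text.size(); ++i )
        result += static_cast<wchar_t>( static_cast<unsigned char>( text[i] ) );
    return result;
}

// Hypothetical helper: the "dummy" conversion described above. Any
// character code outside 0-255 is replaced with '?'.
inline std::string narrow( const std::wstring &text )
{
    std::string result;
    result.reserve( text.size() );
    for ( std::wstring::size_type i = 0; i < text.size(); ++i )
    {
        const unsigned long code = static_cast<unsigned long>( text[i] );
        if ( code <= 255 )
            result += static_cast<char>( static_cast<unsigned char>( code ) );
        else
            result += '?';
    }
    return result;
}

// Stand-in for Exception: stores the wider format, serves both.
class UnicodeAwareException
{
public:
    explicit UnicodeAwareException( const std::string &message )
        : m_message( widen( message ) ) {}

    explicit UnicodeAwareException( const std::wstring &message )
        : m_message( message ) {}

    std::string what() const { return narrow( m_message ); }
    std::wstring unicodeWhat() const { return m_message; }

private:
    std::wstring m_message;
};
```

A TestRunner that knows nothing about Unicode keeps calling what() and sees
'?' in place of anything outside Latin-1, while a Unicode-aware one calls
unicodeWhat() and gets the message untouched.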

    For the user, everything is transparent. TestRunners could use either
the ANSI or the Unicode version to retrieve the information.

    And while I'm at it, I can think of a way to deal with the
std::string / const char * conversion:

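template<typename StringType>
struct convert_to_wstring
{
  // Default case: StringType converts implicitly to std::wstring.
  static std::wstring toWString( StringType str )
  {
    return str;
  }
};

template<typename StringType>
std::wstring convertToWString( StringType str )
{
    // Dispatch to the (possibly specialized) converter for StringType.
    return convert_to_wstring<StringType>::toWString( str );
}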

And you specialize it for const char*, std::string, and possibly other
user-defined string types... That template function could be used anywhere
to convert strings uniformly to wstring (for example, for an implementation
of assertion_traits that returns a std::string instead of a std::wstring).
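
For what it's worth, the specializations could look something like the
sketch below. The primary template is repeated so the example stands alone.
The widening assumes the narrow string is Latin-1, which is my assumption
and not something CppUnit defines.

```cpp
#include <string>

// Primary template: works for anything implicitly convertible to
// std::wstring (std::wstring itself, const wchar_t *).
template<typename StringType>
struct convert_to_wstring
{
    static std::wstring toWString( StringType str )
    {
        return str;
    }
};

// Specialization for std::string: widen each byte, assuming Latin-1.
template<>
struct convert_to_wstring<std::string>
{
    static std::wstring toWString( const std::string &str )
    {
        std::wstring result;
        result.reserve( str.size() );
        for ( std::string::size_type i = 0; i < str.size(); ++i )
            result += static_cast<wchar_t>( static_cast<unsigned char>( str[i] ) );
        return result;
    }
};

// Specialization for const char *: reuse the std::string conversion.
template<>
struct convert_to_wstring<const char *>
{
    static std::wstring toWString( const char *str )
    {
        return convert_to_wstring<std::string>::toWString( std::string( str ) );
    }
};

// The parameter is taken by value so that a string literal decays to
// const char * and selects the matching specialization.
template<typename StringType>
std::wstring convertToWString( StringType str )
{
    return convert_to_wstring<StringType>::toWString( str );
}
```

With this in place, convertToWString( "abc" ),
convertToWString( std::string( "abc" ) ) and
convertToWString( std::wstring( L"abc" ) ) all yield the same std::wstring.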

The only sticking point would be:

      static std::string toString( const T& x )
      {
          OStringStream ost;
          ost << x;
          return ost.str();
      }

    That uses an ANSI stream. What would be the impact of changing that to
a wide stream?
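
    As a starting point for that question, a wide-stream variant might look
like the sketch below. The name toWideString is invented here and
std::wostringstream stands in for whatever stream typedef CppUnit would use.

```cpp
#include <sstream>
#include <string>

// Hypothetical wide counterpart of toString(): formats x through a wide
// output stream and returns the result as std::wstring.
template<typename T>
std::wstring toWideString( const T &x )
{
    std::wostringstream ost;
    ost << x;               // requires an operator<< usable with std::wostream
    return ost.str();
}
```

    Built-in types and const char * stream fine into a wide stream (narrow
characters are widened). The visible impact is on everything else:
std::string has no operator<< for std::wostream, and user types that only
define operator<<( std::ostream &, ... ) would stop compiling, so users
would have to provide wide overloads or go through a conversion first.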

    What do you think ?

    Baptiste.
---
Baptiste Lepilleur <gai...@fr...>   http://gaiacrtn.free.fr/index.html
Author of The Text Reformatter, a tool for fanfiction readers and writers.
Language: English, French (Well, I'm French).