While OWL only supported narrow text encoded according to Windows code pages (also known as "ANSI" code pages), OWLNext also supports wide text encoded as Unicode UTF-16, as implemented by Windows 2000/XP and later. This page describes how you can enable your application to support Unicode, either using the narrow UTF-8 code page in ANSI build mode, or the traditional wide UTF-16 character set in UNICODE build mode.
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems. Windows 2000/XP and later use Unicode, encoded as UTF-16, throughout the operating system.
For text-related functionality, Windows provides two variants of the API, using the suffix 'A' for functions and types dealing with traditional narrow (8-bit) Windows code page text (somewhat incorrectly referred to as "ANSI" text within Windows, hence the use of the letter 'A'), and the suffix 'W' for functions and types dealing with wide (16-bit) Unicode UTF-16 text ('W' for "wide"). An agnostic variant of the API, without these suffixes, is provided using macros. Each macro expands to the correct function or type according to the build mode, controlled by the UNICODE preprocessor symbol.
For example, the agnostic SetWindowText is implemented as a macro that expands to SetWindowTextW if UNICODE is defined, otherwise it expands to SetWindowTextA. Likewise, LPCTSTR expands to LPCWSTR if UNICODE is defined, otherwise to LPCSTR.
Note that functions dealing with traditional narrow Windows code page text will translate the text to Unicode UTF-16 behind the scenes and then forward the call to the Unicode counterpart of the function. Using the narrow ANSI variant of the API is hence not more efficient; because of the extra translation step, it is in fact less so.
Wide string and character literals, i.e. literals prefixed by 'L', are encoded as UTF-16 (after translation from the encoding used by the source file) by C/C++ compilers targeting the Windows platform. The TEXT macro, and its shorter synonym _T, add the 'L' prefix to the literal if UNICODE is defined, otherwise not.
For example:
void foo(HWND w)
{
SetWindowTextA(w, "Færøyene"); // ANSI version
SetWindowTextW(w, L"Færøyene"); // UTF-16 version
SetWindowText(w, TEXT("Færøyene")); // Agnostic version
SetWindowText(w, _T("Færøyene")); // Agnostic version; less verbose alternative
}
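The mechanism behind the agnostic literal macros can be sketched in portable C++. This is only an illustration: DEMO_UNICODE, DEMO_TEXT and demo_tchar are hypothetical stand-ins for UNICODE, TEXT and TCHAR, and the real definitions in the Windows headers differ in detail.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the UNICODE build switch.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
  using demo_tchar = wchar_t;            // Wide build mode
  #define DEMO_TEXT_IMPL(s) L##s         // Paste the 'L' prefix onto the literal.
#else
  using demo_tchar = char;               // Narrow build mode
  #define DEMO_TEXT_IMPL(s) s            // Leave the literal as it is.
#endif
#define DEMO_TEXT(s) DEMO_TEXT_IMPL(s)

// The character type of the literal follows the build mode,
// so this function compiles unchanged in both modes.
inline const demo_tchar* greeting()
{
  return DEMO_TEXT("Hello");
}

// With DEMO_UNICODE defined above, the wide variant is selected.
static_assert(std::is_same<demo_tchar, wchar_t>::value, "wide build mode expected");
```

If you remove the definition of DEMO_UNICODE (and the static_assert), the same greeting function returns a narrow string instead, which is the effect the TEXT and _T macros have on your own code.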
Since Windows 10 version 1903 (May 2019 Update), the operating system has also supported an ANSI code page for UTF-8, allowing you to use narrow Unicode strings encoded in UTF-8.
void foo(HWND w)
{
SetWindowTextA(w, u8"Færøyene"); // ANSI UTF-8 code page
SetWindowTextA(w, "Færøyene"); // The "u8" prefix can be dropped, if the execution character set is set to UTF-8 in the compiler.
}
For more information about Unicode in Windows, see Working with Strings and Unicode in the Windows API in the Windows Development Reference. The MSDN article Globalization Step-by-Step is also a useful introduction to the topic.
Setting the active ANSI code page to UTF-8 allows you to use the good old ANSI build mode and narrow strings for your OWLNext applications and still be able to use Unicode. However, if you rely on this code page, you can then only deploy your applications on Windows 10 version 1903 or later. Also, note that OWLNext will not work out of the box with UTF-8. You will have to make further changes to your OWLNext application to make it work with UTF-8. In particular, you have to explicitly set fonts with UTF-8 support enabled, and you may have to modify or replace any component that does string processing, so that it works properly with the UTF-8 encoding.
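To see why string processing code may need attention, consider that a UTF-8 string can contain more char elements than characters. The following is a minimal sketch in standard C++; count_utf8_code_points and faroe_utf8 are hypothetical helpers (not part of OWLNext), the counter assumes well-formed UTF-8, and the literal is spelled with explicit byte escapes so that it does not depend on the source or execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
inline std::size_t count_utf8_code_points(const std::string& s)
{
  std::size_t n = 0;
  for (unsigned char c : s)
    if ((c & 0xC0) != 0x80)
      ++n;
  return n;
}

// "Færøyene" encoded as UTF-8; 'æ' and 'ø' take two bytes each.
inline std::string faroe_utf8()
{
  return "F\xC3\xA6r\xC3\xB8yene";
}
```

Here faroe_utf8().size() reports 10 elements, while count_utf8_code_points reports 8 characters. Code that truncates, indexes or measures narrow strings under the assumption of one char per character can therefore split a character in the middle.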
For information on how to activate the UTF-8 code page, see "Use the UTF-8 code page" in the Windows documentation.
A big issue when using the UTF-8 code page is that GDI text functions will not use it by default. The code page used by GDI text functions is determined by the currently selected font. In particular, the LOGFONT::lfCharSet member of the font must be set to the undocumented value 254 for strings to be interpreted as UTF-8 (see Ted's Blog). Additionally, the selected font must of course contain glyphs for all the Unicode characters you need (some fonts only cover a limited range of characters).
enum { UNICODE_CHARSET = 254 }; // Undocumented
// Create a new font with support for UTF-8 enabled.
//
auto f = TFont{"Segoe UI", -10, 0, 0, 0, 400, 0, 0, 0, 0, UNICODE_CHARSET};
// Enable support for UTF-8 for the current window font.
//
auto lf = TFont{GetWindowFont()}.GetObject();
lf.lfCharSet = UNICODE_CHARSET;
SetWindowFont(TFont{lf});
Currently, OWLNext does not know anything about this, so all GDI text drawing done by OWLNext will not support UTF-8 by default. You will have to override the fonts wherever possible, and where not possible, you will have to find alternative solutions (e.g. use owner-drawing, or use a fully custom component, or modify OWLNext).
OWLNext applications can be made to support both UTF-16 and ANSI text. When an OWLNext application is compiled with UNICODE defined, OWLNext will use UTF-16 types and functions; otherwise, it will use the ANSI variants instead. The application will automatically be linked with the respective variant of the OWLNext library. You can either code explicitly for wide UTF-16 text, or you can use the macro API to make your application agnostic to the UNICODE build mode, i.e. so that it uses UTF-16 text in UNICODE build mode, and ANSI text otherwise. For example:
void foo(TWindow& w)
{
w.SetWindowText(L"Færøyene"); // Unicode version
w.SetWindowText(_T("Færøyene")); // Agnostic version
}
The following sections provide some guidelines for moving your OWLNext application into the Unicode world. For complete code, demonstrating the use of the agnostic macro API to support both UNICODE and ANSI build modes, see Examples.
Add UNICODE to the preprocessor definitions for your application. This is necessary even if you do not intend to use the Windows macro API for Unicode. OWLNext uses the macro API in its own API and implementation, and automatically decides which variant of the OWLNext library to link, depending on the definition of the UNICODE symbol.
If you define UNICODE, then OWLNext will automatically define _UNICODE also (note the leading underscore). The _UNICODE symbol is used to enable a similar generic-text macro API for the string functions in the runtime library. For example, the macro _tcscpy will expand to wcscpy if _UNICODE is defined, otherwise it will expand to _mbscpy (if _MBCS is defined) or strcpy (if neither is defined). For more information, see Generic-Text Mappings in the Microsoft documentation.
The signature of the OwlMain entry function must be updated for the UNICODE build mode, using one of the following alternatives:
int OwlMain(int argc, LPWSTR argv[]); // Unicode version
int OwlMain(int argc, wchar_t* argv[]); // Equivalent alternative
int OwlMain(int argc, LPTSTR argv[]); // Agnostic version
int OwlMain(int argc, TCHAR* argv[]); // Equivalent alternative
int OwlMain(int argc, owl::tchar* argv[]); // Another agnostic alternative since OWLNext 6.32
Note that all of these alternatives boil down to the same signature in the UNICODE build mode. Use one of the agnostic alternatives if your application needs to support ANSI build mode as well.
For all classes inherited from OWLNext classes, you must update the signature of any virtual override that has a string parameter or return value. For example, the correct UNICODE compliant signature of TView::SetDocTitle is now:
virtual bool SetDocTitle(LPCTSTR docname, int index); // UNICODE compliant signature
Tip: If you use a C++11 compliant compiler, then use the keyword override on all your overriding virtual functions. When override is used, the compiler will tell you if the signature does not match the base virtual function. Without it, the mismatch goes undetected, and your program will not work as intended.
bool SetDocTitle(LPCTSTR docname, int index) override; // UNICODE compliant signature
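The effect of a stale signature can be illustrated with a small sketch in standard C++. BaseView, StaleView and PortedView are hypothetical stand-ins, not the real OWLNext classes, and plain wchar_t and char pointers are used in place of the Windows string types.

```cpp
#include <cassert>

// Hypothetical stand-in for a library base class in wide build mode.
struct BaseView
{
  virtual ~BaseView() = default;
  virtual bool SetDocTitle(const wchar_t* /*docname*/, int /*index*/) { return false; }
};

// Not yet ported: the narrow signature no longer matches the base function,
// so this declares a new function that hides it instead of overriding it.
// Marking it `override` would turn the mismatch into a compile error.
struct StaleView : BaseView
{
  virtual bool SetDocTitle(const char* /*docname*/, int /*index*/) { return true; }
};

// Ported: the signature matches, and `override` lets the compiler verify it.
struct PortedView : BaseView
{
  bool SetDocTitle(const wchar_t* /*docname*/, int /*index*/) override { return true; }
};

// Calls the function the way a framework would: through the base class.
inline bool call_through_base(BaseView& v)
{
  return v.SetDocTitle(L"Document", 0);
}
```

Calling call_through_base with a StaleView silently runs the base implementation, whereas a PortedView gets its own function called. Some compilers emit a warning about the hidden virtual function, but only override makes the mismatch an error.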
The recommended way to handle strings in OWLNext is to use the string classes in the standard C++ library. On the Windows platform, wchar_t is defined as 16-bit, making std::wstring a perfect fit for UTF-16.
OWLNext 6.32 introduced widespread support for the standard string classes throughout the library. Version 6.32 also introduced the owl::tstring type definition as an agnostic type, mapping to std::wstring in UNICODE build mode, and to std::string otherwise. Note that the owl::tstring type definition was named owl_string in versions prior to OWLNext 6.32. See Strings in OWLNext for more information.
If you need to refer to the string character type in a generic way, you can use owl::tstring::value_type. Outside the string classes, you can use the type definition owl::tchar. It will map to wchar_t in UNICODE mode, and to char otherwise. Note that owl::tchar was introduced in OWLNext 6.32. In versions prior to 6.32, you can use the Windows macro TCHAR.
To simplify and increase the robustness of your application, you should avoid using plain C-style string functions, but if you have to, then use the wide or agnostic variants, e.g. use wcscpy or _tcscpy to replace strcpy and _mbscpy. See Generic-Text Mappings in the Microsoft documentation.
To support UTF-16, you must replace any use of narrow streams by their wide counterparts. OWLNext 6.32 introduced agnostic type definitions, such as owl::tostringstream, that map to wide streams (e.g. std::wostringstream) in UNICODE mode, and to narrow streams (e.g. std::ostringstream) otherwise. See "include/owl/private/strmdefs.h" for all the available type definitions.
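The idea behind such agnostic type definitions can be sketched with the standard library templates. Note that this is an assumption-laden illustration, not the actual OWLNext definitions: the demo namespace, the DEMO_UNICODE switch and the format_count function are all hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the UNICODE build switch.
#define DEMO_UNICODE

namespace demo // Hypothetical; the real types live in namespace owl.
{
#ifdef DEMO_UNICODE
  using tchar = wchar_t;
#else
  using tchar = char;
#endif

  // The string and stream types follow the selected character type.
  using tstring = std::basic_string<tchar>;
  using tostringstream = std::basic_ostringstream<tchar>;

  // Formats a number using whichever character type the build mode selects.
  inline tstring format_count(int n)
  {
    tostringstream os;
    os << n;
    return os.str();
  }
}
```

Code written against the agnostic names, such as format_count above, compiles in both build modes without change; only the literals you pass in need the TEXT or _T macro.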
All string and character literals prefixed by 'L' will be encoded as UTF-16 by compilers targeting the Windows platform. To create a UNICODE agnostic application, wrap all string and character literals in the TEXT macro (or the shorter _T synonym). For example:
void foo(TStatic& c)
{
c.SetText(L"Færøyene"); // UTF-16 string literal
c.SetText(_T("Færøyene")); // Agnostic alternative
}
See "Strings in OWLNext" for guidance on conversion between narrow ANSI strings and wide UTF-16 strings.
When dealing with Unicode string lengths it is important to understand the difference between character count (Unicode code points) and encoded element count (Unicode code units). These counts are not synonymous for variable-width encodings such as UTF-8 and UTF-16. A common misconception is that UTF-16 is a fixed-width encoding (i.e. it is confused with the older fixed-width UCS-2 encoding), but it is not; some code points in the Unicode character set require two UTF-16 (wchar_t) code units to represent.
Note that std::wstring::size and wcslen (and hence owl::tstring::size and _tcslen) return the code unit count.
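The difference between the two counts can be demonstrated with a short sketch. It uses std::u16string rather than std::wstring so that it behaves the same on platforms where wchar_t is not 16-bit; count_utf16_code_points is a hypothetical helper that assumes well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string. A code point outside the
// Basic Multilingual Plane is stored as a surrogate pair, i.e. a lead unit
// (0xD800..0xDBFF) followed by a trail unit (0xDC00..0xDFFF). Counting every
// unit that is not a trail surrogate therefore counts each code point once.
// Assumes the input is well-formed UTF-16.
inline std::size_t count_utf16_code_points(const std::u16string& s)
{
  std::size_t n = 0;
  for (char16_t u : s)
    if (u < 0xDC00 || u > 0xDFFF)
      ++n;
  return n;
}

// The letter 'a' followed by U+1D11E (MUSICAL SYMBOL G CLEF),
// which requires two UTF-16 code units.
inline std::u16string clef_sample()
{
  return u"a\U0001D11E";
}
```

For clef_sample, size returns 3 code units, while count_utf16_code_points returns 2 code points. Buffer sizes and API length parameters are expressed in code units, so size is usually the count you need there.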
Beware that sizeof, commonly used to calculate the element count of a char array, will not return the element count of a wchar_t array; it returns the size of the array in bytes. If your compiler supports C++17, you can use std::size to calculate the element count of an array. For older compilers, you can use COUNTOF, a macro provided by OWLNext.
For example, a common usage of TInputDialog in a traditional OWL application is:
char buf[32] = "Færøyene";
TInputDialog dlg(this, "Title", "Prompt:", buf, sizeof(buf));
...
To support the UNICODE mode, the code above could be ported as follows:
tchar buf[32] = _T("Færøyene");
TInputDialog dlg(this, _T("Title"), _T("Prompt:"), buf, std::size(buf));
...
Note that std::size and COUNTOF can only be used on arrays. They will not work for string pointers, such as LPTSTR. When you have a pointer to a string, then you need to use _tcslen or wcslen to calculate its length (i.e. in UTF-16 code units). In general, it is recommended that you rewrite your code to use standard string classes where possible to avoid the use of error-prone buffers.
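The pitfall with sizeof can be verified with a short sketch. To keep it compilable with pre-C++17 compilers, it uses a hypothetical count_of template as a stand-in for std::size and COUNTOF; the other function names are illustrative as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for std::size and COUNTOF: deduces the element count
// from the array type. Passing a pointer does not compile, which is the
// intended safeguard.
template <class T, std::size_t N>
constexpr std::size_t count_of(T (&)[N])
{
  return N;
}

// Size of a 32-element wide buffer in bytes.
inline std::size_t bytes_in_buffer()
{
  wchar_t buf[32] = L"Title";
  return sizeof(buf);
}

// Element count of the same buffer; this is what a length parameter expects.
inline std::size_t elements_in_buffer()
{
  wchar_t buf[32] = L"Title";
  return count_of(buf);
}

// With only a pointer, the length must be computed by scanning the string.
// The result is in code units and excludes the terminating null.
inline std::size_t length_via_pointer(const wchar_t* s)
{
  return std::wcslen(s);
}
```

Since wchar_t is wider than one byte, bytes_in_buffer returns a multiple of the element count (64 on Windows, where wchar_t is 16-bit). Passing that value as a buffer length would tell the callee that the buffer is larger than it really is.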
News: 2021/02/utf-8-support-in-owlnext
Wiki: Examples
Wiki: Knowledge_Base
Wiki: OWLMaker
Wiki: Replacing_the_Borland_C++_Class_Libraries
Wiki: Strings_in_OWLNext
Wiki: Upgrading_from_OWL