#41 TCharSet and TFilterValidator are broken for Unicode

Milestone: unspecified

Status: closed

Owner: nobody

Labels: Run-time Error (general)

Priority: 5

Updated: 2012-09-27

Created: 2009-07-26

Creator: Vidar Hasfjord

Private: No

The TCharSet class in "owl/bitset.h" cannot represent a set of wide characters; it can only represent narrow (8-bit) characters. If the set of wide characters passed to the constructor contains characters with codes above 255, those characters are truncated to codes below 256 before being added to the set representation. This limits the usable characters to the lower 256 character codes.
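
To make the truncation concrete, here is a minimal sketch of the effect. It is not the OWLNext source: NarrowCharSet, AddTruncated, Contains and the helper functions are hypothetical names, and char16_t stands in for a 16-bit TCHAR.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a character set backed by 256 bits.
// This is NOT the OWLNext TCharSet; it only mimics the lossy conversion.
class NarrowCharSet {
public:
    // Mimics the lossy path: the 16-bit code is cut down to its low 8 bits.
    void AddTruncated(char16_t c) { bits_.set(static_cast<std::uint8_t>(c)); }

    // Assumption: codes above 255 are never reported as members.
    bool Contains(char16_t c) const { return c < 256 && bits_.test(c); }

private:
    std::bitset<256> bits_;
};

// Returns true if adding `added` also makes the unrelated `other` a member.
inline bool CollidesAfterTruncation(char16_t added, char16_t other) {
    NarrowCharSet set;
    set.AddTruncated(added);
    return set.Contains(other);
}

// Small self-check of the collision.
inline void DemoCollision() {
    NarrowCharSet set;
    set.AddTruncated(u'\u0141');          // LATIN CAPITAL LETTER L WITH STROKE
    assert(set.Contains(u'A'));           // 0x0141 truncates to 0x41, i.e. 'A'
    assert(!set.Contains(u'\u0141'));     // the character actually requested is lost
}
```

In this sketch a filter meant to allow U+0141 would instead allow 'A', which is the kind of silent misbehaviour the C4244 warning hints at.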

TCharSet is used by TFilterValidator, so TFilterValidator is also broken. No other classes in OWLNext currently use TCharSet.

This bug is the cause of the following compiler warning when building the Unicode variant of OWLNext 6.21.9:

filtval.cpp(57) : warning C4244: 'argument' : conversion from 'TCHAR' to 'uint8', possible loss of data

I recommend that TCharSet be fixed so that it can represent a set of wide characters correctly. Alternatively, an exception should be thrown for unsupported character codes (> 255).
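
The exception-throwing alternative could look roughly like the following sketch. CheckedNarrowCharSet and AddThrows are hypothetical names, std::out_of_range is just one plausible exception type, and char16_t again stands in for a 16-bit TCHAR; none of this is taken from OWLNext.

```cpp
#include <bitset>
#include <cassert>
#include <stdexcept>

// Hypothetical checked variant of a 256-bit character set (not OWLNext API).
class CheckedNarrowCharSet {
public:
    // Rejects codes that the 256-bit representation cannot hold,
    // instead of silently truncating them.
    void Add(char16_t c) {
        if (c > 255)
            throw std::out_of_range("character code above 255 is not supported");
        bits_.set(c);
    }

    bool Contains(char16_t c) const { return c <= 255 && bits_.test(c); }

private:
    std::bitset<256> bits_;
};

// Returns true if Add(c) throws on a fresh set.
inline bool AddThrows(char16_t c) {
    CheckedNarrowCharSet set;
    try {
        set.Add(c);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Small self-check: narrow codes are accepted, wide codes are rejected.
inline void DemoCheckedAdd() {
    assert(!AddThrows(u'A'));
    assert(AddThrows(u'\u0141'));
}
```

This does not add wide-character support; it only turns the silent corruption into a visible failure.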

Vidar Hasfjord - 2009-07-27

Unified diff applicable to OWLNext 6.21.9

TCharSet-Unicode-fix.patch

Vidar Hasfjord - 2009-07-27

I've attached a patch with an untested fix for this issue.

Fixed: "TCharSet and TFilterValidator are broken for Unicode" (Tracker ID 2827517). Added support for wide characters in TBitSet (the base for TCharSet).

The Unicode variant of 6.21.9 now builds cleanly without warnings with VC9.

Note that the fix changes TBitSet substantially, from an ordinary class to a class template. Client code will have to be updated. I have only compiled it with VC9, and it is uncertain how it will work with other compilers. I have also not tested the functionality of the fix, i.e. whether it actually works in practice.

So please test this patch.

Also note that the implementation is brute-force. I just extended the bit array to cope with a wide character set (16-bit). This means that TCharSet does not support UTF-16 with multi-code-unit Unicode characters (surrogate pairs). It is limited to the UCS-2 fixed-length subset.
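
The surrogate-pair limitation can be seen in a small standard C++ sketch. IsSurrogate and DemoSurrogates are hypothetical helper names, not part of the patch.

```cpp
#include <cassert>
#include <string>

// True if the 16-bit code unit is one half of a UTF-16 surrogate pair.
constexpr bool IsSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDFFF; }

// A character outside the Basic Multilingual Plane occupies two code units,
// so a set with one bit per 16-bit code unit can only see the two halves.
inline void DemoSurrogates() {
    const std::u16string grin = u"\U0001F600";  // one character, U+1F600
    assert(grin.size() == 2);                   // but two UTF-16 code units
    assert(IsSurrogate(grin[0]) && IsSurrogate(grin[1]));
    assert(!IsSurrogate(u'A'));                 // BMP characters are single units
}
```

A set indexed by single code units therefore cannot hold U+1F600 as one member; at best it could hold the two surrogate halves independently.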

For wide characters the implementation requires an 8 KiB buffer for each instance of TCharSet, which may be considered a large memory cost.
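
The 8 KiB figure follows from one bit per 16-bit code: 65536 bits / 8 = 8192 bytes. A rough sketch of the brute-force idea is given below. CharSetSketch is a hypothetical template, not the TBitSet from the patch, and char16_t is used as an assumed 16-bit wide character type.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <type_traits>

// Hypothetical sketch of a bit set sized by the character type.
// The actual TBitSet template in the patch may be organised differently.
template <class CharT>
class CharSetSketch {
public:
    // One bit per representable code unit: 256 for char, 65536 for char16_t.
    static constexpr std::size_t BitCount =
        std::size_t{1} << (CHAR_BIT * sizeof(CharT));

    void Add(CharT c) { bits_.set(Index(c)); }
    bool Contains(CharT c) const { return bits_.test(Index(c)); }

private:
    // Maps the character to a non-negative index, even if CharT is signed.
    static std::size_t Index(CharT c) {
        using Unsigned = typename std::make_unsigned<CharT>::type;
        return static_cast<std::size_t>(static_cast<Unsigned>(c));
    }

    std::bitset<BitCount> bits_;
};

// Small self-check: no truncation, at the price of 8 KiB of bits per set.
inline void DemoWideSet() {
    CharSetSketch<char16_t> set;
    set.Add(u'\u0141');
    assert(set.Contains(u'\u0141'));
    assert(!set.Contains(u'A'));  // no collision with the low byte 0x41
    assert(CharSetSketch<char16_t>::BitCount / CHAR_BIT == 8192);
}
```

The narrow instantiation, CharSetSketch<char>, needs only 256 bits (32 bytes), which shows where the extra cost of the wide variant comes from.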

Better solutions are welcomed.

Ognyan Chernokozhev - 2009-08-12

Fixed in 6.21.10
