Bugs item #789161, was opened at 2003-08-15 07:43
Message generated for change (Comment added) made by asfernandes
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=109028&aid=789161&group_id=9028
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Charsets/Collations
>Group: Fixed v2.0
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Peter Jacobi (peter_jacobi)
Assigned to: Adriano dos Santos Fernandes (asfernandes)
Summary: CHAR length not enforced for charset UNICODE_FSS
Initial Comment:
CHAR columns of character set UNICODE_FSS will
hold more characters than declared. Seemingly,
data is accepted if its UTF-8 encoding uses up to
3*charlength bytes.
Tested with FB 1.5rc4 on Win32 using ISQL.
See the session transcript:
(Note that ASCII chars need 1 byte in
UTF-8, umlauts (ÄÖÜäöü) need two bytes,
and € (the Euro sign; U+20AC) needs three
bytes.)
SQL> set names WIN1252;
SQL> create database '/utf8bug1.fdb' user 'sysdba' password 'masterkey';
SQL> create table t1 (c1 char(2) character set UNICODE_FSS);
SQL> insert into t1 (c1) values ('A');
SQL> insert into t1 (c1) values ('AB');
SQL> insert into t1 (c1) values ('ABC');
SQL> insert into t1 (c1) values ('ABCD');
SQL> insert into t1 (c1) values ('ABCDE');
SQL> insert into t1 (c1) values ('ABCDEF');
SQL> insert into t1 (c1) values ('ABCDEFG');
SQL> insert into t1 (c1) values ('Ä');
SQL> insert into t1 (c1) values ('ÄÖ');
SQL> insert into t1 (c1) values ('ÄÖÜ');
SQL> insert into t1 (c1) values ('ÄÖÜÄ');
SQL> insert into t1 (c1) values ('€');
SQL> insert into t1 (c1) values ('€€');
SQL> insert into t1 (c1) values ('€€€');
SQL> select * from t1;
C1
======
A
AB
ABC
ABCD
ABCDE
ABCDEF
Ä
ÄÖ
ÄÖÜ
€
€€
(end of session transcript)
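The rows that survive in the SELECT output are consistent with a byte-based limit of 3 bytes per declared character, so CHAR(2) behaves like a 6-byte field. The following Python sketch is only a model of that observed behavior, not Firebird's actual code; the function name `fits_byte_limit` and the constant `MAX_BYTES_PER_CHAR = 3` are assumptions for illustration. It replays the inserted values and keeps those whose UTF-8 encoding fits in 2 × 3 = 6 bytes.

```python
# Hypothetical model of the observed UNICODE_FSS behavior (not Firebird source):
# the length check appears to compare UTF-8 bytes against 3 * declared length.
MAX_BYTES_PER_CHAR = 3  # assumed worst case per character for UNICODE_FSS


def fits_byte_limit(value: str, declared_chars: int) -> bool:
    """Return True if the UTF-8 encoding fits in declared_chars * 3 bytes."""
    return len(value.encode("utf-8")) <= declared_chars * MAX_BYTES_PER_CHAR


# Values from the ISQL session above, all inserted into CHAR(2).
inserted = ["A", "AB", "ABC", "ABCD", "ABCDE", "ABCDEF", "ABCDEFG",
            "Ä", "ÄÖ", "ÄÖÜ", "ÄÖÜÄ", "€", "€€", "€€€"]

accepted = [v for v in inserted if fits_byte_limit(v, 2)]

for v in inserted:
    status = "accepted" if v in accepted else "rejected"
    print(f"{v!r:12} {len(v)} chars, {len(v.encode('utf-8'))} bytes, {status}")
```

Under this model the accepted list is exactly the eleven rows returned by the SELECT, while 'ABCDEFG' (7 bytes), 'ÄÖÜÄ' (8 bytes) and '€€€' (9 bytes) are rejected, even though 'ABCDEF' and 'ÄÖÜ' are three to six characters long.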
----------------------------------------------------------------------
>Comment By: Adriano dos Santos Fernandes (asfernandes)
Date: 2005-05-27 21:41
Message:
Logged In: YES
user_id=940451
For backward compatibility reasons, UNICODE_FSS retains
this behavior.
If you do not like it, use the new UTF8 character set.
----------------------------------------------------------------------
Comment By: Nickolay Samofatov (skidder)
Date: 2004-11-18 18:55
Message:
Logged In: YES
user_id=495356
Adriano is working on this in the INTL branch.
----------------------------------------------------------------------
Comment By: Peter Jacobi (peter_jacobi)
Date: 2003-08-21 07:07
Message:
Logged In: YES
user_id=845149
Responding to akini's comment (and asking
to follow up on the 'always UNICODE' part in
firebird-support)
akini wrote:
> It's a shame if we must wait for FB2 until unicode support is
> properly implemented in Firebird.
You don't have to wait; you can help evolve FB's
UNICODE support, as this is an open source project.
> [...] (always stating that create new db
> with unicode_fss as default charset). It would make
> international-aware database life easier.
[...]
> And maybe it should use
> unicode format as implicit default charset for "create
> database" command.
Always using UNICODE has performance and data
integrity downsides. Requirements vary, so I can't
agree on that one.
I'll move this discussion to the firebird-support
mailing list.
> And maybe support for real UTF16BE/LE format where chars
> are 2 bytes long.
You are invited to test my fbintl2.dll, which implements some
experimental character set support, including UTF16BE.
Peter Jacobi
----------------------------------------------------------------------
Comment By: Aki Nieminen (akini)
Date: 2003-08-21 06:47
Message:
Logged In: YES
user_id=367429
[this post is more related to JDBC and dotNET driver
connection strings]
I also think it is poorly documented how to use UNICODE_FSS
properly on the client side and the server side. All examples use
simple connection strings without the lc_ctype parameter, and then
clients just don't work properly against Unicode databases.
http://www.inetsoftware.de has an excellent MSSQL JDBC driver,
in which they have implemented two different subprotocol
variations. I've always found it easier and more bulletproof to use
Unicode in MSSQL Server. (Note: MSSQL uses a 2-byte Unicode
format internally, where all characters are stored as 2 bytes.)
Use unicode format (tables have nchar, nvarchar columns):
url=jdbc:inetdae7:111.222.333.444:1433
Use old ascii format (tables have char, varchar columns):
url=jdbc:inetdae7a:111.222.333.444:1433
A similar approach in Jaybird and the dotNETProviders might be
worth thinking about: with a new subprotocol, the client
would use the Unicode charset implicitly.
Or else just document the charset parameter better and have the
examples use it (always stating that new databases should be created
with unicode_fss as the default charset). It would make
international-aware database life easier.
It's a shame if we must wait for FB2 until Unicode support is
properly implemented in Firebird. Maybe it should also use
Unicode as the implicit default charset for the "create
database" command, and rename that charset to the standard
name UTF-8.
And maybe support for real UTF16BE/LE formats, where characters
are 2 bytes long. This would minimize client-server
conversions in Java and .NET environments, where
characters are always 2-byte Unicode units.
----------------------------------------------------------------------
Comment By: Aki Nieminen (akini)
Date: 2003-08-21 06:24
Message:
Logged In: YES
user_id=367429
Here is my page about the problem with a VARCHAR(1) column.
It's related to the same confusion between bytes and
characters. If we say VARCHAR(1), then from the SQL schema
point of view we mean _characters_.
We don't care whether Firebird internally handles it as 3 bytes
inside the engine. The engine should logically handle it, and
report it back to clients, as a VARCHAR(1) column.
http://koti.mbnet.fi/akini/ib/
http://koti.mbnet.fi/akini/ib/unicode.html
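The distinction drawn here (declared length in characters, byte layout left to the engine) can be sketched as follows. This is a hypothetical Python model, not a Firebird API: the `Column` class and its members are illustrative names, and the 3-bytes-per-character figure is an assumed worst case for UNICODE_FSS. The engine reserves enough bytes for the widest characters, but validates values by character count.

```python
from dataclasses import dataclass

MAX_BYTES_PER_CHAR = 3  # assumed worst case per character in UNICODE_FSS


@dataclass
class Column:
    """Hypothetical column model: the length is declared in characters."""
    name: str
    declared_chars: int

    @property
    def storage_bytes(self) -> int:
        # Internal buffer size: room for the widest possible characters.
        return self.declared_chars * MAX_BYTES_PER_CHAR

    def accepts(self, value: str) -> bool:
        # Enforce the schema length in characters (Python counts code points),
        # regardless of how many bytes the encoded value occupies.
        return len(value) <= self.declared_chars


c = Column("C1", 1)  # behaves like VARCHAR(1)
print(c.storage_bytes)  # 3
print(c.accepts("€"))   # True: one character, three bytes
print(c.accepts("ABC")) # False: three characters, also three bytes
```

Under this model '€' and 'ABC' occupy the same 3 bytes, yet only '€' fits a one-character column, which is the behavior the comment asks for.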
----------------------------------------------------------------------