Ticket #369 (new defect)
Enhanced syntax for CHR function
|Reported by:||mihaipopescu||Owned by:||mihaipopescu|
|Cc:||mihaipopescu, gaoprea, clazzart, dwkang, hyangtack, lynix, leekiyeul, rrazvan|
Description (last modified by mihaipopescu) (diff)
Enhanced syntax for CHR:
CHR (N [USING charset_name]);
- N : any numeric value coercible to BIGINT data type.
- charset_name : charset identifier The supported charsets are UTF-8 and ISO-8859-1.
The possible values for the second argument are the following strings: utf8, iso88591 (can be surrounded by commas).
If the second argument is not used, the charset of the default locale will be used.
When the first argument cannot be coerced to BIGINT, an error is thrown ("Cannot coerce *** to type bigint.").
If the value of the charset is not in the set of possible values stated above, an "Invalid character set" error will be thrown (grammar error).
A codeset incompatibility error can be thrown for the output value when using the UTF8 charset with particular CUBRID settings. This will be described below.
Implementation and error handling
- CUBRID handles the value argument as a BIGINT
- up to 4 least significant bytes of input value are taken into account discarding leading zero bytes
- the remaining bytes are stored in the output buffer string starting with the most significant byte (of the remaining bytes)
- the output string is just an interpretation of the buffer
- for ISO-8859-1 charset : one byte corresponds to one character
- for UTF-8 charset : several bytes can compose a character, so a checking is performed on the buffer: if it forms a valid UTF-8 sequence, the string is returned, otherwise an error is thrown (Character at offset <position_in_buffer> is invalid with current charset) or NULL (if return_null_on_function_errors is set) is returned.
- The size of the output is maximum 4 bytes, while the length depends on the charset.
Special cases and examples
- When the first argument is NULL, the result is NULL
- The first argument will be coerced to BIGINT and only a maximum of 4 of the least significant bytes are taken into consideration.
select chr(1094795585 using iso88591); select chr(284562637121 using iso88591);
1094795585 = 65 * 256 3 + 65 * 256 2 + 65 * 256 + 65 -> AAAA
284562637121 = 66 * 256 4 + 65 * 256 3 + 65 * 256 2 + 65 * 256 + 65 -> BAAAA , but CHR only takes 4 bytes
will yield the same result:
=== <Result of SELECT Command in Line 1> === chr(1094795585 using iso88591) ====================== 'AAAA' === <Result of SELECT Command in Line 2> === chr(284562637121 using iso88591) ====================== 'AAAA'
- CHR also accepts negative arguments, and as before, uses maximum 4 of the least significant bytes.
(4294967094 = 0xFFFFFFFF - 202 + 1)
=== <Result of SELECT Command in Line 1> === chr(-202 using iso88591) ====================== 'ÿÿÿ6' === <Result of SELECT Command in Line 2> === chr(4294967094 using iso88591) ====================== 'ÿÿÿ6'
Here is an example for using different charsets when converting 50108 to string using CHR.
50108 is C3 BC in hex; Interpreted as a UTF8 string, represents the latin small letter u with diaeresis: "ü".
As an ISO-8859-1 string is "Ã¼".
=== <Result of SELECT Command in Line 1> === char_length( chr(50108 using iso88591)) ===================================== 2 === <Result of SELECT Command in Line 2> === char_length( chr(50108 using utf8)) ====================================== 1
Differences from MySQL
- MySQL supports variable number of arguments; in CUBRID, the number of arguments to convert is one
- When using a single argument, MySQL accepts larger values, but it still only uses the least 4 significant bytes at the most.