From: Alexandre B. S. <ib...@th...> - 2004-06-28 22:24:18
Hi James !

James K. Lowden wrote:
> This is at least partly a function of the application.
> It's standard practice to use UPPER() to defeat case-sensitive searches.

As you know, if you use UPPER, you cannot use the index anymore. Maybe when indices based on functions are available the problem goes away, but I still have some doubts... see below.

> Perhaps a similar function "NOACCENT()" is needed for accents? I'm not
> familiar with languages whose accented characters are sometimes not
> considered distinct. (In Swedish, for example, 'a' and 'å' are two very
> different letters.) How is this solved by other implementations?

The NOACCENT function could solve the problem, but all queries would have to be changed to:

Select * from Customer where UPPER(NOACCENT(Name)) = UPPER(NOACCENT(:aValue));

If I have an index such as:

create index SK_Name on Customer (UPPER(NOACCENT(Name)));

this query will use the index, but if I just mistype the order of the functions [NOACCENT(UPPER(Name))] the query will be "natural" (unindexed) again. Not a big problem if I just adopt a consistent way of writing it... In my main application there is no problem (I just change my data classes), but in third-party tools (Crystal Reports, for example) this becomes a problem... And in every ad-hoc query, e.g. when you are doing some checking in ISQL, you have to put the UPPER(NOACCENT(something)) everywhere to get the desired result.

I think a more elegant approach would be to define it in the table definition, so nobody has to remember to put the double function everywhere. As you pointed out, this is not the default for every language, so a collation with this characteristic should be chosen by the developer when it applies to his needs.

Other implementations: I will describe MSSQL (version 6.5 is the last I have used, a really old one...). I don't think the MSSQL approach is the better one, but...
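As an aside, NOACCENT() is hypothetical; but a minimal Python sketch of what UPPER(NOACCENT(x)) would do, assuming accent stripping is implemented via Unicode NFD decomposition (my assumption, not anything from MSSQL or Firebird), could look like this:

```python
import unicodedata

def noaccent(text: str) -> str:
    """Strip accents: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def insensitive_eq(a: str, b: str) -> bool:
    """Case- and accent-insensitive equality, like UPPER(NOACCENT(x))."""
    return noaccent(a).upper() == noaccent(b).upper()

print(noaccent("São André"))             # -> Sao Andre
print(insensitive_eq("Ávila", "avila"))  # -> True
```

Note that, just as with the SQL version, a comparison done this way defeats a plain index on the column, which is exactly the problem being discussed.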
When you install the server, you choose which dictionary (I think that was the term used) you want to work with (very bad to choose a global one; in my opinion this should be at least per database, and ideally per column!). MSSQL has a similarity table, something like this:

A = a = Á = á = Ã = ã = À = à = Â = â
B = b
C = c = Ç = ç
D = d
E = e = É = é = Ê = ê

This table is used for comparisons. MSSQL states that when one uses a non-binary dictionary, queries can be about 20% slower.

The approach specified in the Unicode docs Peter pointed out is much more powerful: generating the sort key as 3 or more 2-byte values and then comparing the sort key values seems good and powerful to me (more than one char could be mapped to one char, etc.). I don't know about the performance penalty, but since this will be the developer's choice and not the default for every DB around the world, I think I can expect (or be warned about) slower performance when I define a column as case/accent insensitive.

> It seems to me the problem with using collations that don't distinguish,
> say, between accented and non-accented characters is that the applications
> are then prevented from distinguishing, every time. Language constructs
> that allow per-use overrides are more flexible.

I think the collation should be defined on the column (as it is); I could have different columns that should behave differently. But as mentioned in the paper Peter pointed out, the behaviour should be consistent with the cultural expectations of the user: even if Swedish people distinguish "a" and "å", if I (a Brazilian guy) were searching Swedish names I would expect those chars to sort close to each other, and if I just type "a" I would expect the records with "å" to be found anyway. As I understood it, this should behave consistently across any table/column, so it should be based on the connection. But I don't have experience with this kind of problem...

> --jkl

see you !
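P.S. The similarity-table idea above can be sketched in a few lines of Python. This is only an illustration of the principle (one primary weight per group of equivalent characters, compared via sort keys); the GROUPS table below is just the fragment from this mail, and the fallback for unmapped characters is my own assumption:

```python
# Hypothetical similarity table, as in the MSSQL fragment above: every
# character in a group shares one primary weight, so they compare equal.
GROUPS = [
    "AaÁáÃãÀàÂâ",
    "Bb",
    "CcÇç",
    "Dd",
    "EeÉéÊê",
]

# Map each character to the index of its group (its primary weight).
PRIMARY = {ch: weight for weight, group in enumerate(GROUPS) for ch in group}

def sort_key(text):
    """One primary weight per character; unmapped characters fall back to
    their codepoint, offset past the table so they sort after mapped ones."""
    return tuple(PRIMARY.get(ch, len(GROUPS) + ord(ch)) for ch in text)

print(sort_key("Àbç") == sort_key("abc"))  # -> True: same primary weights
```

A real implementation (the Unicode Collation Algorithm) emits several weight levels per character, so "a" and "å" are equal at the primary level but still ordered consistently at a lower level, which is exactly the behaviour described above.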
--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda.
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br