Re: [Firebird-devel] Case-insensitive collations

Hi Alexandre,

> But then, they have a problem, if they search for
> like "Alexandre%"
> 
> it will not match the legacy records that were imported as "ALEXANDRE"

It is (in my opinion) a defect in the Firebird code that
like "Alexandre%" (and equivalently STARTING WITH "Alexandre")
doesn't work for you. For every multi-level collation it should match
ALEXANDRE and alexandre and Alexandré etc.
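
To illustrate the behaviour I mean, here is a minimal Python sketch. It is
not Firebird code: primary_key() and starts_with() are hypothetical helpers,
and stripping combining marks plus case folding is only a rough stand-in for
comparing at the first level of a multi-level collation.

```python
import unicodedata

def primary_key(text):
    """Rough first-level key: drop combining accents, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def starts_with(value, prefix):
    """Prefix test carried out on first-level keys only (assumed semantics)."""
    return primary_key(value).startswith(primary_key(prefix))

names = ["ALEXANDRE", "alexandre", "Alexandré", "Alexandra"]
matches = [n for n in names if starts_with(n, "Alexandre")]
print(matches)
```

With this definition the three case and accent variants are found, while
"Alexandra" is not, because it differs at the first level.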

I don't judge it wise to add a large number of collations to make
up for a code defect which can easily be changed, if only we agree that
it is a defect.

> in commercial applications (that is what I develop) the rules are:
> The case/accent does not matter in searching and ordering

Then why don't you implement one, or let one commercial programmer spend
four commercially paid hours to make your commercial application work
commercially?

I'm doing this as a hobby of mine and I am more interested in 
linguistically correct sorting. 

> The letters 'a', 'á', 'à', 'ã', 'A', 'Á', 'À', 'Ã' should be considered the
> same in comparisons and sorting.

The most general rule I found is that 'foreign' characters should be
mapped to their nearest ASCII equivalent, but that some or all of the
non-ASCII characters of your own language are considered distinct.
So a Polish dictionary or phone book has separate entries for
U+0141 LATIN CAPITAL LETTER L WITH STROKE
but not for
U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS
And in Denmark it's just the other way around.
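
A small Python sketch of such a language-dependent folding, under stated
assumptions: the DISTINCT sets below contain only the two letters from my
example and are in no way a complete Polish or Danish tailoring, and fold()
is a hypothetical helper, not part of any Firebird collation driver.

```python
import unicodedata

# Illustrative only: letters each language keeps as separate entries.
DISTINCT = {"pl": {"Ł", "ł"}, "da": {"Ö", "ö"}}
# L WITH STROKE has no canonical decomposition, so it is mapped by hand.
EXPLICIT = {"Ł": "L", "ł": "l"}

def fold(text, lang):
    """Map 'foreign' letters to ASCII, keep the language's own letters."""
    keep = DISTINCT.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)                      # own letter stays distinct
        elif ch in EXPLICIT:
            out.append(EXPLICIT[ch])            # hand-written mapping
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)

print(fold("Łódź Ö", "pl"))
print(fold("Łódź Ö", "da"))
```

The same input yields different keys per language, which is why a single
aggressive no-accent collation cannot be linguistically correct everywhere.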

If you expect users who will only want to enter ASCII characters for
searching, are these the same users doing the data entry? If so, can you
trust them to enter the non-ASCII characters correctly, or should the
database better store only ASCII characters?

So the remaining use case for a very aggressive no-accent collation seems
to be an application where data entry is done very carefully by users who
are aware of character details, and searching is done by users who only
know ASCII or are forced to use a system where it is hard to enter
non-ASCII characters.

> AFAIK, the multi-level collation will not work for "like" and "starts
> with". The majority of my searches for names use "starts with". I read
> about your suggestion of using between "something" and "somethingzz", but
> like is much more powerful... :-(
> 
> As I said above, in general it does not matter whether I insert "Red" or
> "red" or "RED". If I put a unique constraint or PK on this column, I must
> be sure that the case variations are not considered distinct if I define
> this field as case insensitive, and the user will not have problems, since
> if he searches for "red" or "RED" or "Red" the record will be found anyway.

> I have contact with this guy; in the last days he has adjusted his patch
> for FB 1.5. He is a member of CFLP (Portuguese Spoken Firebird Community).
> 
> His patch can do a case insensitive/accent insensitive search, columns
> with up to 250 chars can be indexed, and "like", "containing" and
> "starts with" work.

Fine. So you see, the Firebird INTL architecture allows easy additions
specific to your needs.

The above can also be achieved using the LOADABLE collation of my pjcolkit,
but as it resides in fbintl2 and not in fbintl, it is somewhat more awkward
to use. (http://www.jodelpeter.de/i18n/fbarch/loadable.txt)

> I will be glad if I can help you to better understand this situation.

There is a non-technical point to consider:

Some aspects of collations are just tedious, stupid work. So you can expect
a lack of volunteers in OSS projects. It's like the situation with fonts:
There is a large number of free fonts, but almost none of them look good
at small point sizes (some even look ugly at all point sizes!), because
this would require a large amount of "hinting", which is very, very
tedious and stupid work.

Regards,
Peter Jacobi
