Re: [Firebird-devel] Case-insensitive collations

Why case-insensitive search is needed:
I have a database that collects data from several independent
sources. I have no control over their rules, whether everything
is in upper/lower/proper/whatever case. All I know is that
in most cases I have to store the data in its original form,
and I have no right to normalize it.

Why accent-insensitive search is needed:
* sometimes accents/diacritical marks are missing in the data because
  - the user forgot to write them (spelling error)
  - the user does not know what the correct mark is, e.g. in foreign names
  - the text was written in (or passed through) a system that does not
    support such a character, e.g. a West European journalist could hardly
    write my name correctly, because "r with caron" (the correct second
    letter of my last name) does not exist in iso8859_1/win1252.
* the data are stored correctly, but the person entering the query
  - may be standing at a counter with only one hand free,
    so using just the basic 26 letters reduces the required typing
  - may be ... not very good at grammar, to put it mildly
  - may be afraid of computers, may be perplexed by a laptop's
    keyboard layout, may be used to other search engines, etc.

In many cases even Dave's case/accent-insensitive collations
do not help, because they are
    - accent insensitive only for secondary differences, while
      they still treat several letters with caron as a primary difference
      (and such a collation is of little use for *search*, at least for Czech.
      Either all marks should be considered, or all should be ignored).
    - accent sensitive with the CONTAINING operator
      (perhaps just fn_to_lower() should be rewritten. Is it used only by CONTAINING?)
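
To illustrate the "all marks ignored" option, here is a minimal sketch in
Python. It is not Firebird code and not how any existing collation works.
The names fold_marks() and search_key() are hypothetical, and the approach
(Unicode NFD decomposition, then dropping combining marks) is an assumption
chosen for illustration. Letters that have no decomposition, such as
"l with stroke", pass through unchanged.

```python
import unicodedata

def fold_marks(text: str) -> str:
    """Drop every combining mark (hypothetical helper, illustration only)."""
    # NFD splits "r with caron" into a plain "r" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_key(text: str) -> str:
    """Key that ignores both case and diacritical marks."""
    return fold_marks(text).casefold()

# Every spelling of the same name maps to one key, with no mark
# treated as a primary difference.
print(search_key("Dvořák") == search_key("DVORAK"))
```

With such a key, either all marks count (compare the original strings)
or none do (compare the keys); there is no halfway case.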

I always prefer to have a choice.
Although I want *-insensitive behaviour in many cases,
sometimes *-sensitive behaviour is also desirable. E.g. CONTAINING
is *always* case-insensitive (or semi-case-insensitive for binary collations),
but a case-sensitive variant is sometimes needed too (e.g. when searching
for short abbreviations like AP, which can be part of many words),
as is a diacritic-insensitive variant (for the many reasons already mentioned).
So I see good reasons to have all 3 variants of CONTAINING.
(But CONTAINING is the least problematic operation,
because it does not use an index and can easily be replaced
by a UDF. Perhaps even more efficiently, because the right operand
of CONTAINING is uppercased unnecessarily again and again.)
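
The three variants, and the point about normalizing the right operand only
once, can be sketched as follows. This is Python for illustration only, not
a Firebird UDF. make_contains() and strip_marks() are hypothetical names,
and the folding rules are assumptions rather than what CONTAINING does
internally.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove combining marks after NFD decomposition (illustrative helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def make_contains(needle: str, case_sensitive: bool = False,
                  accent_sensitive: bool = True):
    """Build a substring test; the needle is normalized once, up front."""
    def key(s: str) -> str:
        if not accent_sensitive:
            s = strip_marks(s)
        if not case_sensitive:
            s = s.casefold()
        return s

    folded_needle = key(needle)  # computed once, not once per row
    return lambda haystack: folded_needle in key(haystack)

# Variant 1: case-insensitive, accent-sensitive.
default_ap = make_contains("AP")
# Variant 2: case-sensitive, for short abbreviations like AP.
exact_ap = make_contains("AP", case_sensitive=True)
# Variant 3: case- and accent-insensitive.
loose = make_contains("dvorak", accent_sensitive=False)

print(default_ap("Capital"), exact_ap("Capital"), loose("Antonín Dvořák"))
```

The closure keeps the folded needle, so scanning many rows repeats only
the work on the left operand.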

Now I see that I chose the Subject badly.
People have some requirements, and these can be met
in many ways - collations are only one of them. However,
most solutions are probably long-term, while adding a few
new collations could be done in a relatively short time.

Ivan
