From: Ivan P. <Iva...@se...> - 2004-06-24 15:55:03
Why the need for case-insensitive search:

I have a database that collects data from several independent sources. I have no control over their rules, whether everything is in upper/lower/proper/whatever case. All I know is that in most cases I have to store the data in their original form, and I have no right to normalize them.

Why the need for accent-insensitive search:

* sometimes accents/diacritical marks are missing in the data because
  - the user forgot to write them (spelling error)
  - the user does not know what the correct mark is, e.g. in foreign names
  - the text was written in (or passed through) a system that does not support such characters, e.g. a West European journalist could hardly write my name correctly, because "r with caron" (the correct second letter of my last name) does not exist in iso8859_1/win1252.
* the data are stored correctly, but the person entering the query
  - can be standing at a counter with only one hand free, so using just the basic 26 letters reduces the required typing
  - can be ... not very good at grammar, to put it mildly
  - can be afraid of computers, can be perplexed by a laptop's keyboard layout, can be used to other search engines, etc.

In many cases even Dave's case/accent-insensitive collations do not help, because they are
- accent insensitive only for secondary differences, but they still treat several letters with caron as a primary difference (and such a collation is of little use for *search*, at least for Czech. Either all marks should be considered, or all should be ignored).
- accent sensitive with the Containing operator (perhaps just fn_to_lower() should be rewritten. Is it used only by Containing?)

I always prefer to have a choice. Although I want *-insensitive behaviour in many cases, sometimes *-sensitive behaviour is also desirable. E.g. CONTAINING is *always* case-insensitive (or semi-case-insensitive for binary collations), but its case-sensitive variant is sometimes needed too (e.g. when searching for short abbreviations like AP, which can be part of many words), and so is its diacritic-insensitive variant (for the many reasons already mentioned). So I see good reasons to have all 3 variants of CONTAINING.

(But Containing is the least problematic operation, because it does not use an index and can easily be replaced by a UDF. Perhaps even more efficiently, because the right operand of Containing is uppercased unnecessarily again and again.)

Now I see that I chose the Subject wrongly. People have some requirements, and these can be met in many ways; collations are only one of them. However, most solutions are probably long-term, while adding a few new collations could be done in a relatively short time.

Ivan
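
To make the three CONTAINING variants discussed above concrete, here is a minimal sketch in Python. It is not Firebird code and not a proposal for the engine: the names strip_marks, fold and Containing are hypothetical, and dropping Unicode combining marks after NFD decomposition is only an assumed stand-in for a real accent-insensitive collation (letters without a canonical decomposition are left unchanged). The search term is folded once, when the matcher is built, rather than again for every row.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str, ignore_case: bool, ignore_accents: bool) -> str:
    """Reduce text to the form that is actually compared."""
    if ignore_accents:
        text = strip_marks(text)
    if ignore_case:
        text = text.casefold()
    return text


class Containing:
    """Hypothetical substring matcher; folds the search term only once."""

    def __init__(self, needle: str, ignore_case: bool = True,
                 ignore_accents: bool = False) -> None:
        self.ignore_case = ignore_case
        self.ignore_accents = ignore_accents
        # Folded here, not on every call.
        self.needle = fold(needle, ignore_case, ignore_accents)

    def __call__(self, haystack: str) -> bool:
        folded = fold(haystack, self.ignore_case, self.ignore_accents)
        return self.needle in folded


# Variant 1: case-insensitive (the behaviour described for CONTAINING).
loose = Containing("AP")
# Variant 2: case-sensitive, for short abbreviations such as AP.
exact = Containing("AP", ignore_case=False)
# Variant 3: case- and diacritic-insensitive.
plain = Containing("dvorak", ignore_accents=True)
```

With these matchers, loose("apple") is true while exact("apple") is false, and plain("Antonín Dvořák") matches although the query was typed with only the basic 26 letters.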