From: Peter J. <pj...@wa...> - 2004-06-24 08:37:57
Hi Alexandre,

> But then, they have a problem, if they search for
> like "Alexandre%"
>
> he will not match the legacy records that was imported as "ALEXANDRE"

It is (in my opinion) a defect in the Firebird code that LIKE "Alexandre%"
(and equivalently STARTING WITH "Alexandre") doesn't work for you. For every
multi-level collation it should match ALEXANDRE and alexandre and
Alexandré etc.

I don't judge it to be wise to add a large number of collations to make up
for a code defect which can easily be changed, if only we agree that it is
a defect.

> in commercial applications (that is what I develop) the rule are:
> The case/accent does not matter on searching and ordering

Then why don't you implement one, or let one commercial programmer spend
four commercially paid hours to make your commercial application work
commercially? I'm doing this as a hobby of mine and I am more interested
in linguistically correct sorting.

> The letters 'a', 'á', 'à', 'ã', 'A', 'Á', 'À', 'Ã' should be considered the
> same in comparisons and sorting.

The most general rule I have found is that 'foreign' characters should be
mapped to their nearest ASCII equivalent, but that some or all of the
non-ASCII characters of your own language are considered distinct.

So a Polish dictionary or phone book has separate entries for
U+0141 LATIN CAPITAL LETTER L WITH STROKE
but not for
U+00D6 LATIN CAPITAL LETTER O WITH DIAERESIS

And in Denmark it's just the other way around.

If you expect users who only want to enter ASCII characters when
searching, are the same users doing the data entry? If so, can you trust
them to enter the non-ASCII characters correctly, or would it be better
for the database to store only ASCII characters?

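
To make the matching behaviour argued for above concrete (a search for
"Alexandre%" finding ALEXANDRE, alexandre and Alexandré), here is a minimal
Python sketch. It is not Firebird code and not how any Firebird collation
is implemented; the function names primary_key and starts_with_primary are
made up for illustration, and stripping combining marks after NFD
decomposition is only a rough stand-in for the primary level of a real
multi-level collation.

```python
import unicodedata


def primary_key(text: str) -> str:
    """Rough 'primary level' key: accents and case are ignored.

    Assumption: dropping combining marks after NFD normalization and
    case-folding approximates primary-level equality. A real collation
    uses language-specific weight tables instead.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def starts_with_primary(value: str, prefix: str) -> bool:
    """Prefix match that compares only the primary-level keys."""
    return primary_key(value).startswith(primary_key(prefix))


if __name__ == "__main__":
    for name in ("ALEXANDRE", "alexandre", "Alexandré", "Alexander"):
        print(name, starts_with_primary(name, "Alexandre"))
```

The first three names match the prefix and "Alexander" does not. Letters
without a Unicode decomposition, such as U+0141, pass through as distinct
letters, so this sketch cannot express per-language rules like the Polish
and Danish ones mentioned above.
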
So the remaining use case for a very aggressive no-accent collation seems
to be an application where data entry is done very carefully by users who
are aware of character details, and searching is done by users who only
know ASCII or are forced to use a system where it is hard to enter
non-ASCII characters.

> AFAIK, the multi-level collation will not work for "like" and "starts
> with" in the majority of the search for names I use starts with, I read
> about your suggestion of using between "something" and "somethingzz", but
> like are much more powerful... :-(
>
> As I said above in general does not matter if will insert "Red" or "red" or
> "RED", if will put an unique constraint or PK on this column, I must be sure
> that the case variations should not be considered if I define this field as
> case insensitive, and for the user, he will not have problems since if he
> searches for "red" or "RED" or "Red" the record will be found anyway.
>
> I have contact with this guy, he has on the last days adjusted his patch for
> FB 1.5. he is a member of CFLP (Portuguese Spoken Firebird Community).
>
> His patch can do a case insensitive/accent insensitive search, columns with
> up to 250 chars can be indexed, the "like", "containing" and "starts with"
> works.

Fine. So you see, the Firebird INTL architecture allows easy additions
specific to your needs.

The above can also be achieved using the LOADABLE collation of my
pjcolkit, but as it resides in fbintl2 and not in fbintl, it is somewhat
more awkward to use.
(http://www.jodelpeter.de/i18n/fbarch/loadable.txt)

> I will be glad if I can help you to better understand this situation.

There is a non-technical point to consider:
Some aspects of collations are just tedious, stupid work. So you can
expect a lack of volunteering in OSS projects.
It's like the situation with fonts: there are a large number of free
fonts, but almost none of them look good at small point sizes (some even
look ugly at all point sizes!), because this would require a large amount
of "hinting", which is very, very tedious and stupid work.

Regards,
Peter Jacobi