From: Peter J. <pj...@wa...> - 2003-10-07 14:01:03
Hi Nickolay, All,

> Things are not so simple. German letter ß (written like greek beta)
> collates the same way as "ss" sequence.

As a German I can tell you that this is about 25% true on a scale of 0% to 100% true-ity. And if this is the only case holding back the implementation, we can argue to further decrease the true-ity level.

> There are many other artefacts
> like this. Correct solution is to preprocess both patterns and source
> string the way similar to transformation used for indexing. But this
> requires some changes to INTL interface.

Can you elaborate? I would like to see this work in some way, but for the multi-level collations the sortkey returned consists of 2-4 parts, so it won't be of any direct use for string searching. E.g.: Caféteria will return CAFETERIA333433333211111111 and Café will return CAFE33342111, and for obvious reasons the latter isn't a substring of the former.

There is already an unused (?) but designed interface in INTL to return only the primary differences ('partial'). With it, Caféteria will return CAFETERIA and Café will return CAFE, so that would fit the bill for nocase/noaccent substring searching.

Regards,
Peter Jacobi
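
To illustrate the point about sortkeys, here is a minimal sketch in Python. It does not use the INTL interface at all: `primary_key` and `full_key` are hypothetical helpers, and the accent and case digits are a toy weighting invented for this example, not the weights a real collation driver emits. It only shows why a multi-level key breaks substring matching while a primary-only ('partial') key keeps it.

```python
import unicodedata


def primary_key(text):
    """Toy primary-level key: base letters only, folded to upper case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = (ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(base).upper()


def full_key(text):
    """Toy multi-level key: primary part, then accent digits, then case digits.

    The digit values are made up for illustration (assumption, not real weights).
    """
    accents = []
    cases = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # A combining mark flags the preceding base letter as accented.
            if accents:
                accents[-1] = "2"
        else:
            accents.append("1")
            cases.append("2" if ch.isupper() else "1")
    return primary_key(text) + "".join(accents) + "".join(cases)


cafe = "Caf\u00e9"            # Café
cafeteria = "Caf\u00e9teria"  # Caféteria

# Multi-level keys: the trailing weights of the short key get in the way.
print(full_key(cafe) in full_key(cafeteria))        # False

# Primary-only keys: plain substring search works again.
print(primary_key(cafe) in primary_key(cafeteria))  # True
```

Whether the real 'partial' keys can be compared this directly depends on the collation driver; the sketch assumes one key unit per base letter.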