Re[2]: [Firebird-devel] Additional Collate for Firebird (Case/Accent Insensitive)


Hello, Peter,

>> Current LIKE implementation doesn't use collations. Another problem is
>> that if collations are used and INTL interface is unchanged we can
>> forget about intelligent KMP algorithm. Best thing we can use is
>> "brute force" :(((

> For short strings, that wouldn't really be that bad.

> For long string comparisons, you can use general regexp matching and
> preprocess every character in the pattern to a group of all equivalent
> characters in the collation:
> LIKE '%CAFE%' =>
> /.*[cC][aAäÄáÁàÀ][fF][eEéÉèÈ].*/

General regexp matching is very slow.
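
For reference, the pattern preprocessing quoted above could look roughly like this in Python. The EQUIV table and the function names are made up for illustration; a real collation would have to supply its equivalence classes through the INTL interface.

```python
import re

# Hypothetical equivalence classes, for illustration only.
EQUIV = {
    "a": "aAäÄáÁàÀ",
    "c": "cC",
    "e": "eEéÉèÈ",
    "f": "fF",
}

def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex built from equivalence classes."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            # Characters without a known class match only themselves.
            members = EQUIV.get(ch.lower(), ch)
            parts.append("[" + "".join(re.escape(m) for m in members) + "]")
    return re.compile("".join(parts), re.DOTALL)

def like_match(pattern, text):
    """Return True if the whole text matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(text) is not None
```

With this table, like_match("%CAFE%", "Un café noir") is true. Each class covers exactly one character, so the sketch says nothing about sequences that collate as a unit.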

> So, the INTL interface may need an addition for a painless retrieval of the
> equivalence classes, but in O(n) speak, it's a minor detail as dumb or
> smart retrieval of equivalence classes are both O(1) ;-)

Things are not so simple. The German letter ß (written like Greek beta)
collates the same way as the sequence "ss". There are many other artefacts
like this. The correct solution is to preprocess both the pattern and the
source string in a way similar to the transformation used for indexing.
But this requires some changes to the INTL interface.
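
A minimal Python sketch of that idea, assuming a made-up EXPANSIONS table and Unicode decomposition as a stand-in for a real collation's transformation: both sides are normalized first, then compared as plain strings.

```python
import unicodedata

# Hypothetical expansion table; real collations define their own.
EXPANSIONS = {"ß": "ss"}

def normalize(text):
    """Expand multi-character equivalents, strip accents, fold case."""
    out = []
    for ch in text:
        ch = EXPANSIONS.get(ch, ch)
        # Decompose so that accents become separate combining marks,
        # then drop the marks and fold the case of what remains.
        for d in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(d):
                out.append(d.casefold())
    return "".join(out)

def contains(source, pattern):
    """CONTAINING-style test on normalized forms of both arguments."""
    return normalize(pattern) in normalize(source)
```

Here normalize("Straße") and normalize("STRASSE") both give "strasse", so the one-to-many case is handled before any matching starts.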

BTW, my implementation of correct LIKE matching is still in the
experimental stage. So somebody else may address the problems, and I can
share my ideas and the results of my experiments.

The problems with string_boolean implementation are:

1. LIKE pattern matching is extremely slow
2. collations are not used for string functions
3. BLOBS are processed incorrectly

I think these problems should be addressed together, as they are
very tightly bound. I have the following solution in mind:

1. Implement single-pass pattern matching algorithm for LIKE
(Knuth-Morris-Pratt algorithm with some extensions seems to fit the
task perfectly)

2. Use callbacks in EVL_xx_like and EVL_xx_contains functions

3. Add a collation filter function to the INTL ABI (and drop all like,
merge and sleuth functions) that uses one callback to fetch data and
another callback to give out results. This function should normalize
string data so that it can be used for pattern matching.
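
The three steps could fit together roughly as in the Python sketch below. All names are hypothetical, str.casefold() stands in for the collation's normalization, and only the '%pattern%' case is covered; the extensions needed for '_' and interior '%' are left out.

```python
def failure_table(pattern):
    """Knuth-Morris-Pratt failure function for the pattern."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

class KmpMatcher:
    """Single-pass substring matcher; feed() never re-reads earlier input."""
    def __init__(self, pattern):
        self.pattern = pattern
        self.fail = failure_table(pattern)
        self.k = 0                   # length of the current partial match
        self.found = not pattern     # the empty pattern matches anything

    def feed(self, text):
        for ch in text:
            if self.found:
                return
            while self.k and ch != self.pattern[self.k]:
                self.k = self.fail[self.k - 1]
            if ch == self.pattern[self.k]:
                self.k += 1
            if self.k == len(self.pattern):
                self.found = True

def collation_filter(fetch, emit):
    """Pull raw chunks via fetch() until it returns None and push
    normalized text via emit(). casefold() is only a placeholder."""
    while True:
        chunk = fetch()
        if chunk is None:
            break
        emit(chunk.casefold())

def contains(segments, pattern):
    """Match across segment boundaries, e.g. BLOB segments."""
    it = iter(segments)
    matcher = KmpMatcher(pattern.casefold())
    collation_filter(lambda: next(it, None), matcher.feed)
    return matcher.found
```

Because the matcher keeps its state between feed() calls, contains(["Haupt", "stra", "ße 5"], "STRASSE") finds a match that spans two segments without buffering the whole value.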

This would solve all problems. I do not think that partial solutions
are acceptable because they'll have to be dropped when other problems
are addressed.

Maybe you, Peter or Blas can pick up these issues?
I can address them myself, but this will somewhat defer implementation
of other points in my TODO.

> Peter Jacobi

-- 
Nickolay Samofatov                         mailto:sk...@bs...
