[groonga-talk:335] Re: Spanish tokens


Hi,

In <201...@we...>
  "[groonga-talk:333] Spanish tokens" on Wed, 07 Sep 2016 09:47:45 -0300,
  Gustavo Courault <gco...@un...> wrote:

> So, I just installed pgroonga and ran some searches.
> I noticed that Groonga doesn't have Spanish tokens. For example, "cómico"
> returns a different result than "comico", and the expected behavior is
> to return the union of both: "cómico" and "comico".
> That is, the search should ignore the tilde on the letter "o".
> Perhaps I don't understand how to install or write the appropriate tokens.

Groonga normalizes the text to be indexed. The default
normalization is based on Unicode 5.1. We plan to update
the Unicode version it is based on. Once we do, your case
will be solved.

Until that update, there is a workaround.
The groonga-normalizer-mysql package provides the
NormalizerMySQLUnicode520CI normalizer:
  https://github.com/groonga/groonga-normalizer-mysql

The normalizer ignores the accent mark (tilde) on "ó". This
means that "cómico" is treated as equal to "comico".
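
As a rough illustration of what accent-insensitive normalization
means, here is a minimal Python sketch. It is not how
NormalizerMySQLUnicode520CI is implemented (that normalizer follows
MySQL's collation rules). It swaps in a simpler technique, Unicode
NFD decomposition followed by dropping combining marks, and the
function name fold_accents is hypothetical:

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks removed and case folded."""
    # NFD splits "ó" into "o" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold so that "Cómico" and "comico" also compare equal.
    return stripped.casefold()


# Both spellings map to the same key, so a search for one finds the other.
print(fold_accents("cómico") == fold_accents("comico"))
```

Note that this simple approach also folds "ñ" to "n", which may or
may not be what you want for Spanish text.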

You can install groonga-normalizer-mysql by following these
instructions:
  https://github.com/groonga/groonga-normalizer-mysql#install

You can use the normalizer with the following SQL:

  CREATE TABLE memos (
    id integer,
    content text
  );

  CREATE INDEX pgroonga_content_index
            ON memos
         USING pgroonga (content)
          WITH (normalizer='NormalizerMySQLUnicode520CI');

The important line is the following one, which sets the
normalizer option:
  WITH (normalizer='NormalizerMySQLUnicode520CI')

See also:
  http://pgroonga.github.io/reference/create-index-using-pgroonga.html#custom-tokenizer

Thanks,
--
kou
