From: Kouhei S. <ko...@cl...> - 2016-09-08 07:40:22
Hi,

In <201...@we...>
  "[groonga-talk:333] Spanish tokens" on Wed, 07 Sep 2016 09:47:45 -0300,
  Gustavo Courault <gco...@un...> wrote:

> So, I just install pgroonga and make some searchs.
> I note that groonga dont have spanish tokens. For example, "cómico"
> obtain a different result than "comico" and the expected behavior is
> obtain the sum of them: "cómico" and "comico".
> That is, the search should ignore the letter "o" with a tilde.
> Perhaps I dont understand how install or write the apropiate tokens.

Groonga normalizes the text to be indexed. The default
normalization is based on Unicode 5.1. We plan to update the
Unicode version it is based on. Once we do, your case will be
solved.

Until then, there is a workaround. The groonga-normalizer-mysql
package provides the NormalizerMySQLUnicode520CI normalizer:

  https://github.com/groonga/groonga-normalizer-mysql

The normalizer ignores the tilde (accent mark) in "ó", so
"cómico" is treated as equal to "comico".

You can install groonga-normalizer-mysql by following these
instructions:

  https://github.com/groonga/groonga-normalizer-mysql#install

You can use the normalizer with the following SQL:

  CREATE TABLE memos (
    id integer,
    content text
  );

  CREATE INDEX pgroonga_content_index
            ON memos
         USING pgroonga (content)
          WITH (normalizer='NormalizerMySQLUnicode520CI');

This is the important line:

  WITH (normalizer='NormalizerMySQLUnicode520CI')

See also:

  http://pgroonga.github.io/reference/create-index-using-pgroonga.html#custom-tokenizer

Thanks,
--
kou
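
To illustrate what "ignoring the tilde" means for matching, here is a minimal sketch in Python, run outside the database. It is not Groonga's or groonga-normalizer-mysql's implementation: it assumes that stripping Unicode combining marks after NFD decomposition, plus case folding, is a reasonable approximation of an accent- and case-insensitive normalizer. The function names fold_accents and matches are hypothetical and exist only for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate an accent- and case-insensitive normalization.

    Assumption: this only mimics the effect described in the mail;
    the real normalizer follows MySQL's collation tables instead.
    """
    # NFD splits "ó" into "o" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Case-fold so that "CÓMICO" and "comico" also compare equal.
    return stripped.casefold()


def matches(query: str, document: str) -> bool:
    """Return True if the folded query occurs in the folded document."""
    return fold_accents(query) in fold_accents(document)


if __name__ == "__main__":
    print(fold_accents("cómico") == fold_accents("comico"))  # True
    print(matches("comico", "Un actor cómico"))              # True
```

Because both the indexed text and the query pass through the same folding step, a search for either spelling finds both. Note that this sketch also folds "ñ" to "n", which may or may not be what a Spanish-language search wants.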