From: Kentaro H. <ha...@cl...> - 2016-09-08 04:18:52
Hi,

On Wed, 07 Sep 2016 09:47:45 -0300
Gustavo Courault <gco...@un...> wrote:

> Hello:
>
> I use postgres full text seach for an OPAC in ours libraries.
> It works fine but it is slowly and is complicated search for only one
> word and so on.
> So, I just install pgroonga and make some searchs.
> I note that groonga dont have spanish tokens. For example, "cómico"
> obtain a different result than "comico" and the expected behavior is
> obtain the sum of them: "cómico" and "comico".
> That is, the search should ignore the letter "o" with a tilde.
> Perhaps I dont understand how install or write the apropiate tokens.
> Thank you for your help.

It seems that a proper Spanish normalizer is required to recognize
that "cómico" and "comico" are the same. But as far as I know, there
is no such normalizer for Groonga.

If you cannot afford to implement a Spanish normalizer yourself,
there is a workaround. It uses Groonga functionality from PGroonga.

Groonga has a feature that expands synonyms read from a TSV file:

http://groonga.org/docs/reference/query_expanders/tsv.html

The workaround uses this feature.

First, create a synonyms entry. The fields must be separated by TAB
characters.

# cat /etc/groonga/synonyms.tsv
# -*- coding: utf-8 -*-
#
# key[TAB]synonym1[TAB]synonym2[TAB]...
#
#groonga	groonga	rroonga	mroonga
comico	comico	cómico

Then restart PostgreSQL.

Try the example from https://pgroonga.github.io/tutorial/.

CREATE TABLE memos (
  id integer,
  content text
);
CREATE INDEX pgroonga_content_index ON memos USING pgroonga (content);
INSERT INTO memos VALUES (1, 'cómico is ...');
INSERT INTO memos VALUES (2, 'comico is ...');

We know that 'OR' works, but it is not what you want.

pgroonga_test=# select * from memos where content @@ 'cómico OR comico';
 id |    content
----+---------------
  1 | cómico is ...
  2 | comico is ...
(2 rows)

Let's use the Groonga feature via PGroonga. It uses pgroonga.command.
https://pgroonga.github.io/reference/functions/pgroonga-command.html

# SELECT * FROM json_array_elements(
    pgroonga.command('select ' ||
                     pgroonga.table_name('pgroonga_content_index') ||
                     ' --query comico --match_columns content --query_expander QueryExpanderTSV')::json->1->0);
                            value
-------------------------------------------------------------
 [2]
 [["_id","UInt32"],["content","LongText"],["ctid","UInt64"]]
 [2,"comico is ...",2]
 [1,"cómico is ...",1]
(4 rows)

It looks a bit tricky, but it works.

--
Kentaro Hayashi <ha...@cl...>
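
Writing synonyms.tsv by hand gets tedious once more than a few
accented words are involved. The following is only a minimal sketch
in Python of how such rows could be generated from a word list. It is
not part of Groonga or PGroonga; the function names strip_accents and
synonym_lines are hypothetical, and it assumes that dropping Unicode
combining marks is an acceptable way to get the unaccented key. Note
that this also turns "ñ" into "n", which may not be what you want for
Spanish.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    """Return the word with combining marks (accents) removed."""
    # NFD splits "ó" into "o" plus a combining acute accent (category Mn).
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def synonym_lines(words):
    """Build rows in the key[TAB]synonym1[TAB]synonym2... layout.

    Hypothetical helper: the key is the unaccented form, followed by
    the key itself and every accented variant seen in the input.
    Words that have no accented variant produce no row.
    """
    groups = defaultdict(set)
    for word in words:
        key = strip_accents(word)
        groups[key].add(unicodedata.normalize("NFC", word))

    lines = []
    for key in sorted(groups):
        variants = sorted(groups[key] - {key})
        if variants:
            lines.append("\t".join([key, key] + variants))
    return lines


if __name__ == "__main__":
    # Example word list; replace it with words taken from your own data.
    for line in synonym_lines(["cómico", "música", "libro"]):
        print(line)
```

The printed rows can be appended to /etc/groonga/synonyms.tsv. The
sketch only writes rows keyed on the unaccented form, as in the entry
above; whether a query typed with the accent also needs its own row
is something to check against the QueryExpanderTSV documentation.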