Ticket #108 (assigned defect)

Opened 4 years ago

Last modified 6 months ago

Search doesn't work for multi-byte characters in title/description

Reported by: yeahy Owned by: andy_st
Priority: major Milestone: 3.2
Version: 3.0 Alpha 2 Keywords: i18n
Cc: andy_st, jankoprowski

Description

When searching for multi-byte characters in a title or description, the result is always "No results found for xxx".

Existing photo titles:
äüö.jpg,
大.jpg

Search string 1: äüö
Search result: No results found for äüö

Search string 2: 大
Search result: No results found for 大

Change History

Changed 4 years ago by yeahy

  • milestone set to 3.0 Alpha 3

Changed 4 years ago by bharat

  • milestone changed from 3.0 Alpha 3 to 3.0 Beta 1

Changed 4 years ago by bharat

  • milestone changed from 3.0 Beta 1 to 3.0 Beta 2

Changed 4 years ago by tnalmdal

  • milestone changed from 3.0 Beta 2 to 3.0 Beta 3

Changed 4 years ago by bharat

  • owner set to bharat
  • status changed from new to assigned

Changed 4 years ago by andy_st

  • keywords i18n added
  • milestone changed from 3.0 Beta 3 to 3.0 RC 1

Hi jankoprowski,

I think MySQL fulltext search should work with multi-byte Unicode characters just fine, at least with MySQL 5 and later versions.
It should certainly work with Cyrillic, but there might (still) be some problems with Han characters (Chinese, Japanese, Korean, ...), where MySQL, at least in earlier versions, had problems because it didn't tokenize words correctly.

Please have a look at some of the discussion comments at:
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

It'd be great if you could experiment a bit with your G3 / MySQL installation to find out what parameters are needed to make it work. We can then try to configure most of it at installation / runtime, and document the rest.

Generally, I'd look in 3 places:

  • MySQL server settings, i.e. there are fulltext (ft) settings which can be configured in your MySQL settings file. A restart of the MySQL server is necessary when changing these. And you might have to rebuild your fulltext index as well after changing these settings.
  • MySQL connection settings. When Gallery 3 connects to the MySQL server, the connection has some default values for character encoding and the like. Once the connection has been created, it can be configured. The character encoding should be utf8.
  • MySQL table column collation. We need utf8_unicode_ci columns for item title, description, etc.
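
As a rough sketch, the three places above could be checked from a MySQL client along these lines. The table name `items` is a placeholder (an assumption, not something confirmed in this ticket); substitute whichever tables hold the searchable title and description columns.

```sql
-- 1. Server settings: list the fulltext (ft_*) variables currently in effect.
SHOW VARIABLES LIKE 'ft_%';

-- 2. Connection settings: switch this connection to utf8, then verify.
SET NAMES utf8;
SHOW VARIABLES LIKE 'character_set_%';

-- 3. Column collation: inspect the Collation column of the output.
--    `items` is a hypothetical table name used only for illustration.
SHOW FULL COLUMNS FROM items;

-- If needed, convert the table's text columns to utf8 / utf8_unicode_ci.
-- Back up the table first; this rewrites the data.
ALTER TABLE items CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
```

The ft_* variables can only be changed in the server configuration file, so step 1 is read-only here.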

Please investigate and experiment with the issue. I'm looking forward to hearing back from you so that we can ensure that search in Gallery 3 works just fine with multi-byte characters.

Changed 4 years ago by andy_st

  • cc andy_st added

Changed 4 years ago by andy_st

  • owner changed from bharat to andy_st

Changed 4 years ago by jankoprowski

OK :) I'll try... I have really been trying (a lot) and I still get nothing. I'll try on two other Unix systems (until now I have only checked this on my XAMPP installation).

Greetings from Poland !

Changed 4 years ago by jankoprowski

Fulltext searching for the given characters starts working after setting ft_min_word_len to 1.

SHOW VARIABLES LIKE 'ft_min_word_len';
ft_min_word_len 1

We set this variable by adding it to the [mysqld] section of my.cnf

[mysqld]
ft_min_word_len=1

and restarting the server. The rest of my configuration related to this problem:

my.cnf:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

SHOW VARIABLES LIKE "character_set%";
character_set_client	utf8
character_set_connection	utf8
character_set_database	utf8
character_set_filesystem	binary
character_set_results	utf8
character_set_server	utf8
character_set_system	utf8

But I think the most significant setting in this case is the ft_min_word_len value.
One more thing: after changing this value, we must reindex the table, for example with:

REPAIR TABLE table_name QUICK;
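
To confirm that the rebuilt index picks up short multi-byte words, a query like the following could be run afterwards. This is only a sketch: `table_name`, `title`, and `description` are placeholder names, and the column list in MATCH() has to match an existing FULLTEXT index exactly.

```sql
-- Placeholder names: table_name, title, description.
-- Send the search terms as utf8 on this connection.
SET NAMES utf8;

-- Boolean mode is used so that the natural-language 50% threshold
-- (words present in half or more of the rows are ignored) does not
-- hide matches on a small test table.
SELECT title
FROM table_name
WHERE MATCH (title, description) AGAINST ('大' IN BOOLEAN MODE);

SELECT title
FROM table_name
WHERE MATCH (title, description) AGAINST ('äüö' IN BOOLEAN MODE);
```

If both queries return the expected rows, the remaining problem is more likely in the application's connection settings than in the index.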

Changed 4 years ago by jankoprowski

  • cc jankoprowski added

Changed 3 years ago by bharat

  • milestone changed from 3.0 RC 1 to 3.0 RC 2

Changed 3 years ago by andy_st

I'd imagine ft_min_word_len=1 is quite expensive (space and time).

It looks like ft_min_word_len is measured in UTF-8 code points (~ characters), not bytes.

Is there a way to tell MySQL that ft_min_word_len should be counting in bytes, not characters? That way, we could set it to 3 and avoid indexing Latin script words of length 1-2.
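
The difference between the two ways of counting can be seen directly in MySQL. The sketch below assumes the client sends the literals as UTF-8 with precomposed characters. With MySQL's default ft_min_word_len of 4, both search strings from the description are too short to be indexed if length is counted in characters.

```sql
-- CHAR_LENGTH() counts characters, LENGTH() counts bytes.
SET NAMES utf8;

SELECT CHAR_LENGTH('大')   AS chars, LENGTH('大')   AS bytes;  -- 1 character, 3 bytes
SELECT CHAR_LENGTH('äüö') AS chars, LENGTH('äüö') AS bytes;  -- 3 characters, 6 bytes

-- Compare against the minimum word length the server is using.
SHOW VARIABLES LIKE 'ft_min_word_len';
```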

Either way, I guess we should add your recommendations to Gallery 3's documentation. It's a MySQL configuration setting which we should recommend for users with CJK content.

Changed 3 years ago by jankoprowski

I can't find any additional information about another solution. A recommendation in the documentation, or even some kind of warning/notice in the database section of the installer, sounds good. I found one proposed workaround:

http://bytes.com/topic/mysql/answers/77599-problem-ft_min_word_len

but I don't know whether it is worth implementing, especially at this stage of the project.

Changed 3 years ago by tnalmdal

  • milestone changed from 3.0 RC 2 to 3.1
