Ticket #108 (closed defect: fixed)

Opened 5 years ago

Last modified 3 months ago

Search didn't work for multi-byte character title/description

Reported by: yeahy
Owned by: andy_st
Priority: major
Milestone: 3.2
Version: 3.0 Alpha 2
Keywords: i18n
Cc: andy_st, jankoprowski

Description

When searching for a title or description that contains multi-byte characters, the result is always "No results found for xxx".

Existing photos title:
äüö.jpg,
大.jpg

Search string 1: äüö
Search result: No results found for äüö

Search string 2: 大
Search result: No results found for 大

Change History

Changed 5 years ago by yeahy

  • milestone set to 3.0 Alpha 3

Changed 5 years ago by bharat

  • milestone changed from 3.0 Alpha 3 to 3.0 Beta 1

Changed 5 years ago by bharat

  • milestone changed from 3.0 Beta 1 to 3.0 Beta 2

Changed 5 years ago by tnalmdal

  • milestone changed from 3.0 Beta 2 to 3.0 Beta 3

Changed 5 years ago by bharat

  • owner set to bharat
  • status changed from new to assigned

Changed 5 years ago by andy_st

  • keywords i18n added
  • milestone changed from 3.0 Beta 3 to 3.0 RC 1

Hi jankoprowski,

I think MySQL fulltext search should work with multi-byte Unicode characters just fine, at least with MySQL 5 and later versions.
It should certainly work with Cyrillic, but there might (still) be some problems with Han characters (Chinese, Japanese, Korean, ...), where MySQL, at least in prior versions, had problems because it wouldn't tokenize words correctly.

Please have a look at some of the discussion comments at:
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html

It'd be great if you could experiment a bit with your G3 / MySQL installation to find out what parameters are needed to make it work and we can then try to configure most of it at installation / runtime, and document the rest.

Generally, I'd look in 3 places:

  • MySQL server settings, i.e. there are fulltext (ft) settings which can be configured in your MySQL settings file. A restart of the MySQL server is necessary when changing these. And you might have to rebuild your fulltext index as well after changing these settings.
  • MySQL connection settings. When Gallery 3 connects to the MySQL server, the connection has some default values for character encoding and the like. These can be changed after the connection is created. The character encoding should be utf8.
  • MySQL table column collation. We need utf8_unicode_ci columns for item title, description, etc.

Please investigate and experiment with the issue. I'm looking forward to hearing back from you so that we can ensure that search in Gallery 3 works with multi-byte characters.

Changed 5 years ago by andy_st

  • cc andy_st added

Changed 5 years ago by andy_st

  • owner changed from bharat to andy_st

Changed 5 years ago by jankoprowski

OK :) I'll try... I have already tried a lot and I still get nothing. I will try on two other Unix systems (until now I have only checked this on my XAMPP installation).

Greetings from Poland !

Changed 5 years ago by jankoprowski

Fulltext searching for the given characters started working after setting ft_min_word_len to 1.

SHOW VARIABLES LIKE 'ft_min_word_len';
ft_min_word_len 1

We set this variable by adding it to the [mysqld] section of my.cnf:

[mysqld]
ft_min_word_len=1

and restarting the server. The rest of my configuration related to this problem:

my.cnf:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

SHOW VARIABLES LIKE "character_set%";
character_set_client	utf8
character_set_connection	utf8
character_set_database	utf8
character_set_filesystem	binary
character_set_results	utf8
character_set_server	utf8
character_set_system	utf8

But I think the most significant setting in this case is the ft_min_word_len value.
One more thing: after changing this value we must reindex the table, for example with:

REPAIR TABLE table_name QUICK;
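
To illustrate why the minimum word length matters here, the following is a toy Python model, not MySQL's actual parser. It assumes that words are runs of letters, digits and underscores, and that any word shorter than the minimum length is left out of the index. The function name indexable_words is made up for this sketch. MySQL's default ft_min_word_len is 4.

```python
import re

def indexable_words(text, min_word_len):
    """Toy model: split text into words and keep those long enough to index.

    Length is counted in characters (code points), not bytes.
    """
    words = re.findall(r"\w+", text)
    return [w for w in words if len(w) >= min_word_len]

# Photo titles taken from the ticket description.
titles = ["äüö.jpg", "大.jpg"]

for min_len in (4, 1):
    for title in titles:
        print(min_len, title, indexable_words(title, min_len))
```

In this model, a minimum of 4 leaves neither title with any indexable word, while a minimum of 1 keeps every word from both titles.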

Changed 5 years ago by jankoprowski

  • cc jankoprowski added

Changed 4 years ago by bharat

  • milestone changed from 3.0 RC 1 to 3.0 RC 2

Changed 4 years ago by andy_st

I'd imagine ft_min_word_len=1 is quite expensive (space and time).

It looks like ft_min_word_len is counted in Unicode code points (~ characters), not in bytes.

Is there a way to tell MySQL that ft_min_word_len should be counting in bytes, not characters? That way, we could set it to 3 and avoid indexing Latin script words of length 1-2.
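
To make the arithmetic behind that question concrete, here is a small Python sketch comparing character counts with UTF-8 byte counts for the search strings from this ticket and a few short Latin words. The byte-based threshold is purely hypothetical: the sketch does not claim that MySQL offers such an option, and both function names are made up for illustration.

```python
def char_and_byte_len(word):
    """Return (number of code points, number of UTF-8 bytes) for a word."""
    return len(word), len(word.encode("utf-8"))

def passes_byte_threshold(word, min_bytes=3):
    """Hypothetical rule: index a word only if its UTF-8 form has at least min_bytes bytes."""
    return len(word.encode("utf-8")) >= min_bytes

for word in ["大", "äüö", "a", "of", "cat"]:
    chars, nbytes = char_and_byte_len(word)
    print(word, chars, nbytes, passes_byte_threshold(word))
```

Under this hypothetical rule a single Han character (3 bytes in UTF-8) would be indexed, while one- and two-letter ASCII words would not. Two accented Latin letters such as "äü" take 4 bytes and would pass as well, so a byte threshold does not separate scripts cleanly.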

Either way, I guess we should add your recommendations to Gallery 3's documentation. It's a MySQL configuration setting which we should recommend for users with CJK content.

Changed 4 years ago by jankoprowski

I can't find any additional information about another solution. A recommendation in the documentation, or even some kind of warning or notice in the database section of the installer, sounds good. I found one proposed workaround:

http://bytes.com/topic/mysql/answers/77599-problem-ft_min_word_len

but I don't know whether it is worth implementing, especially at this stage of the project.

Changed 4 years ago by tnalmdal

  • milestone changed from 3.0 RC 2 to 3.1

Changed 9 months ago by shadlaws

  • status changed from assigned to closed
  • resolution set to fixed

Seems to have been resolved long ago - closing.
