From: Bernadette H. <ber...@de...> - 2009-09-01 01:30:18
Hi Kai, thanks for your email.

Still having issues, though. I suspect we have an underlying problem somewhere which is preventing the ISOLatin1AccentFilterFactory from working.

Tables in SQLyog (e.g. frsk_author) are displaying diacritics correctly, e.g. Coté, J. But in Solr admin they are displaying such as Coté, J. In the Fez editing form, display is correct (Coté, J). In record view, all is well. In list view, all is not. If I switch Solr off, all is OK everywhere.

I've set JAVA_OPTS="-Dfile.encoding=UTF-8" in my environment variables, but it had no impact.

Can you make any further suggestions?

Regards
bern

From: Jauslin Kai [mailto:kai...@li...]
Sent: Monday, 31 August 2009 6:54 PM
To: Bernadette Houghton
Cc: fez...@li...
Subject: AW: Diacritics

Hi Bern,

We use a special filter for text fields in Solr: ISOLatin1AccentFilter (see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-4ebf7aea23b3d6d34a1f8314f9de17334a3e2fac). You need to add this to the "index" and "query" sections of the fieldtype "text" in the Solr schema.xml. The "all" field should have this type.

The second thing you need to make sure of is that you have a perfect UTF-8 workflow. I have encountered several issues (unfortunately I did not have time to report them all back). Possibly, they are already corrected in the newest trunk version. Let's see:

- MySQL connection (as you mentioned), plus: if you are upgrading the database, check that the fez_record_search_key* tables are really on utf8_unicode_ci (if not, you can change it manually with the MySQL query browser for each table).
- Fulltext indexing: in file class.fulltext_tools, check that it has the "-enc UTF-8" flag in the line
  exec(APP_PDFTOTEXT_EXEC." -enc UTF-8 -nopgbrk $filename $textfilename");
- Class.fulltext_index: check that updateFulltextCache contains the line
  $fulltext = utf8_encode($fulltext);

You can do checks at several levels:

1. MySQL table fez_fulltext_cache: should contain correct UTF-8, i.e. when viewing in MySQL query browser or in SQLyog. This is the source for Solr; if it's wrong here (e.g. double characters for diacritics), it will be wrong in Solr.
2. Solr Admin Backend: search for pid, e.g. "eth:12345", and check the XML. This should be correct when viewing in the web browser.
3. Fez Editing Form (if correct here, it should also be correct when viewing).

Make sure that your Smarty templates all have UTF-8 file encoding and UTF-8 character set.

Cheers,
Kai

--
ETH Zürich, Kai Jauslin, ETH-Bibliothek, Prozesse und IT, Integration und Entwicklung, Rämistrasse 101, CH-8092 Zürich, Tel +41-44-6324972, Büro HG H29.5, kai...@li..., www.ethbib.ethz.ch

From: Bernadette Houghton [mailto:ber...@de...]
Sent: Monday, 31 August 2009 03:19
To: Jauslin Kai
Subject: Diacritics

Hi Kai,

I note that you seem to have diacritics set up nicely at ETH - you can search with and without the diacritic character, e.g. either "hafliger" or "Häfliger" will retrieve this author. This isn't happening for us, though - we can only retrieve by searching with the diacritic.

I've added the following to my.ini, as per a previous message from you on fez-users:

  default-character-set=utf8
  collation_server=utf8_unicode_ci
  character_set_server=utf8
  skip-character-set-client-handshake

(We also have a bit of an issue with diacritics displaying with strange characters in List view, with Solr turned on, but this seems to be another story.)

Any suggestions you can offer will be much appreciated.

Regards
bern

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: ber...@ho...
Email: ber...@de...
Website: http://www.deakin.edu.au
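
Note on the two symptoms discussed in this thread: the sketch below is not Fez or Solr code, just a small Python illustration written under stated assumptions. It assumes the "strange characters" in list view come from UTF-8 bytes being read as ISO-8859-1 at one hop (a common cause of doubled characters for diacritics, though the thread does not confirm it is the cause here). It also approximates accent folding with Unicode decomposition, which is only a rough stand-in for what ISOLatin1AccentFilter does inside Solr. All function names are hypothetical.

```python
import unicodedata


def misread_utf8_as_latin1(text: str) -> str:
    """Simulate one hop that treats UTF-8 bytes as ISO-8859-1 (assumed cause)."""
    return text.encode("utf-8").decode("latin-1")


def repair_misread(text: str) -> str:
    """Undo a single UTF-8-as-Latin-1 misreading of the kind simulated above."""
    return text.encode("latin-1").decode("utf-8")


def fold_accents(text: str) -> str:
    """Approximate accent folding: decompose characters, drop combining marks.

    This is NOT the Solr filter itself, only an illustration of the idea
    that indexed and queried terms are both reduced to unaccented forms.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    name = "Cot\u00e9, J"            # "Coté, J" written with an escape
    garbled = misread_utf8_as_latin1(name)
    print(garbled)                    # two characters now stand where e-acute was
    print(repair_misread(garbled) == name)

    # With folding applied at both index and query time, a query without
    # the diacritic matches the stored name.
    indexed = fold_accents("H\u00e4fliger").lower()
    query = fold_accents("hafliger").lower()
    print(indexed == query)
```

If the value stored in fez_fulltext_cache or returned in the Solr XML looks like the garbled form printed first, a misreading of this kind at some hop is a plausible suspect. If the stored value is correct and only the search fails without the diacritic, the analyzer configuration in schema.xml is the more likely place to look.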