OpenFTS

UTF-8 support in tsearch2

  1. 2005-11-20 10:09:24 UTC
    Hello,

    I just found out about tsearch2, and it is very nice; thanks to Oleg and Teodor. This question is strictly about tsearch2 and PostgreSQL rather than OpenFTS, so please pardon me if this is not the right forum, and point me to the right one if so.

    I am confused about the status of UTF-8 support in tsearch2. I read http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2_german_utf8.html by Markus Wollny (and some other articles, in Czech for instance), but it left me with more questions than answers:

    - The document says that the ispell ".dict" file should be encoded in UTF-8, but the linked German dictionary appears to be in ISO Latin 1.

    - The affix file (.aff) cannot be encoded in UTF-8: the regexes are not parsed correctly. Should it be encoded in ISO Latin but include UTF-8 character descriptions? If so, how?

    - What about the Snowball stemmers? I have only found ISO Latin code, and the German howto by Markus doesn't say much about this.
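
For the first point, a quick way to check which encoding a dictionary file is actually in, and to convert it if needed, might look like the Python sketch below. The function names and file paths are my own placeholders, not part of tsearch2, and whether tsearch2 then handles the converted file correctly is exactly the open question here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8.

    This is only a heuristic: a pure-ASCII file passes, and a few
    ISO Latin 1 byte sequences happen to be valid UTF-8 as well.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a text file from ISO-8859-1 (ISO Latin 1) to UTF-8.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, calling `is_valid_utf8(open("german.dict", "rb").read())` (hypothetical file name) would return False for a typical ISO Latin 1 dictionary containing umlauts, since a lone high byte followed by an ASCII letter is not a valid UTF-8 sequence.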

    I'm using PostgreSQL 8.1 and trying to index a UTF-8 database of French data. In my tests, lexize and ts_debug always strip accented characters.

    Am I missing something? Would I be better off re-encoding my database in ISO Latin?

    Thanks in advance for any hint.

    marco