Ideas on case-insensitive full text search?

  • Christa Runge

    Christa Runge - 2012-08-12

    Currently, the full text search in karatasi is case-sensitive.
    This simplification was chosen intentionally, because a case-insensitive search is hard and I have no idea how to do it.

    The key problem is to find an algorithm that converts every string into a case-insensitive form (e.g. all letters converted to upper case, or all letters converted to lower case).
    This algorithm must work for all Unicode letters, not only for the ASCII letters.
    It must be implementable in Objective-C and in Java.

    Does anybody have an idea on this?


  • Christa Runge

    Christa Runge - 2012-08-15

    I have discussed this issue with Christian Hujer:

    - the currently planned solution, extending the database with fields that hold the same contents converted to all-lower-case characters (see tracker #3555187), is not a good idea, because it introduces redundant data with the inherent risk of an inconsistent database.
    - a better solution would be to use a "folded" string in the search statement.
    TODO: find out if this is supported in SQLite.
    - the concept of "upper-case letters" versus "lower-case letters" is common in European alphabets. Many Asian scripts do not have this concept. However, Unicode defines case mappings for the characters that have them, so a Unicode-encoded string can be converted to lower case (or to upper case) independent of the language.
    - most APIs offer functions like toUpper() / toLower() for Unicode strings.
    TODO: find out if this is true for Java, and how these functions can be used.
    TODO: find out if this is true for Objective-C, and how these functions can be used.
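
    For the Java side, the idea of searching on a "folded" string could look roughly like the sketch below. It relies only on String.toUpperCase(Locale) and String.toLowerCase(Locale) from the standard library. The class CaseFoldSearch and the helpers fold() and containsIgnoreCase() are made-up names for illustration, not karatasi code, and upper-casing followed by lower-casing is only an approximation of real Unicode case folding.

```java
import java.util.Locale;

public class CaseFoldSearch {

    // Hypothetical helper (not part of karatasi): approximates Unicode case
    // folding by upper-casing and then lower-casing with a fixed locale,
    // so the result does not depend on the device's language settings.
    static String fold(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    // Case-insensitive substring test built on the folded forms.
    static boolean containsIgnoreCase(String text, String query) {
        return fold(text).contains(fold(query));
    }

    public static void main(String[] args) {
        // U+00C4 / U+00E4 are A-umlaut / a-umlaut, U+00DF is the German sharp s.
        System.out.println(containsIgnoreCase("\u00C4pfel und Birnen", "\u00E4PFEL"));
        System.out.println(containsIgnoreCase("Die STRASSE ist lang", "stra\u00DFe"));
        System.out.println(containsIgnoreCase("karatasi", "Card"));
    }
}
```

    Using Locale.ROOT instead of the default locale avoids language-specific surprises such as the Turkish dotted and dotless i. Whether the same folding can be applied inside the SQLite query itself is still the open TODO above.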

  • Sidney

    Sidney - 2012-08-18

    For what it might be worth, I got this suggestion from an associate:

    Regarding the Karatasi question, you can search for "unicode case insensitive compare X" or "unicode case folding X" (where X is "objective-c" or "java") and find articles, blog entries and forum posts on how to do it.
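
    As a starting point for the Java half of that search, here is a small sketch of two standard-library ways to compare strings without regard to case: String.equalsIgnoreCase and java.text.Collator with its strength set to SECONDARY. The class and method names are invented for the example, and which of the two fits karatasi better is an open question.

```java
import java.text.Collator;
import java.util.Locale;

public class CaseInsensitiveCompare {

    // Simple per-character comparison provided by java.lang.String.
    static boolean equalSimple(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    // Collator-based comparison. At SECONDARY strength the collator
    // treats strings that differ only in letter case as equal.
    static boolean equalByCollator(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.SECONDARY);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // U+00C4 / U+00E4 are A-umlaut / a-umlaut.
        System.out.println(equalSimple("\u00C4pfel", "\u00E4PFEL"));
        System.out.println(equalByCollator("Karatasi", "KARATASI"));
        System.out.println(equalByCollator("karatasi", "karatasu"));
    }
}
```

    Both calls compare whole strings, so for a full text search they would still have to be combined with some kind of folding or substring matching.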

