Ideas on case-insensitive full text search?

2012-08-12
2013-04-14
  • Christa Runge
    2012-08-12

    Currently, the full text search in karatasi is case-sensitive.
    This simplification was chosen intentionally, because case-insensitive search is hard and I have no idea how to do it.

    The key problem is to find an algorithm that converts any string into a case-insensitive form (e.g. all letters converted to upper case, or all letters converted to lower case).
    This algorithm must work for all Unicode letters, not only for ASCII letters.
    It must be implementable in Objective-C and in Java.

    Does anybody have an idea on this?

    Christa

     
  • Christa Runge
    2012-08-15

    I have discussed this issue with Christian Hujer:

    - the currently planned solution, extending the database with fields that hold the same contents converted to all lower-case characters (see tracker #3555187), is not a good idea, because it introduces redundant data with the inherent risk of an inconsistent database.
    - a better solution would be to use a "folded" string in the search statement.
    TODO: find out if this is supported in SQLite.
    - the concept of "upper-case letters" versus "lower-case letters" is common in European alphabets. Many Asian scripts do not have this concept. However, Unicode includes this concept, so a Unicode-encoded string can be converted to lower case (or to upper case) independently of the language.
    - most APIs offer functions like toUpper() / toLower() for Unicode strings.
    TODO: find out if this is true for Java, and how these functions can be used.
    TODO: find out if this is true for Objective-C, and how these functions can be used.

     
  • Sidney
    2012-08-18

    For what it might be worth, I got this suggestion from an associate:

    Regarding the Karatasi question, you can search for "unicode case insensitive compare <language>" or "unicode case folding <language>" (where <language> is "objective-c" or "java") and find articles, blog entries and forum posts on how to do it.
    Sid
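
As an illustration of the "case insensitive compare" route for Java, the sketch below uses the standard regex flags CASE_INSENSITIVE and UNICODE_CASE together with LITERAL, so the query is matched as plain text rather than as a regular expression. The class name CaseInsensitiveFind is hypothetical, and this is only one possible approach, not necessarily what those articles recommend. It matches letters one-to-one, so it is assumed that expansions such as sharp s versus "SS" do not need to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of karatasi.
final class CaseInsensitiveFind {

    // Returns true if query occurs in text, ignoring case for all Unicode
    // letters (UNICODE_CASE extends CASE_INSENSITIVE beyond ASCII).
    static boolean find(String text, String query) {
        Pattern p = Pattern.compile(query,
                Pattern.LITERAL | Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // "\u00FC" is u-umlaut (lower case), "\u00DC" is U-umlaut (upper case).
        System.out.println(find("Im B\u00FCro", "B\u00DCRO")); // true
        System.out.println(find("Im B\u00FCro", "xyz"));       // false
    }
}
```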