> character (it definitely makes tokenizing easier).
Can it be configured for UTF-8 as a whole, and not a specific sub-range?
[Khoisan languages use glyphs from a couple of different sub-ranges.]
> since the great majority of "real" texts use "'n" I'm guessing.
If the individual has not configured their keyboard to be Afrikaans,
they probably don't use the single character.
> These could all be fixed up right when the text is loaded.
Interesting idea: Rule sets to clean up text for a specific language,
before doing any grammar checking.
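
A rough sketch of what such a per-language rule set might look like, in
Python. Everything here is hypothetical (the CLEANUP_RULES table, the
clean_text function, the "af" key); it is not how any existing checker
does it. It assumes the tokenizer wants the single character U+0149 and
that the typed form is an apostrophe followed by "n" as a standalone word.

```python
import re

# Hypothetical rule table: language code -> list of (pattern, replacement).
# Each rule is applied in order when the text is loaded.
CLEANUP_RULES = {
    "af": [
        # A standalone apostrophe + "n" (ASCII apostrophe, U+2019 or U+02BC)
        # becomes the single character U+0149. The lookarounds keep the rule
        # from firing inside longer words.
        (re.compile(r"(?<!\w)['\u2019\u02BC]n(?!\w)"), "\u0149"),
    ],
}


def clean_text(text, lang):
    """Apply the cleanup rules for `lang`; unknown languages pass through."""
    for pattern, replacement in CLEANUP_RULES.get(lang, []):
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(clean_text("Dit is 'n boek.", "af"))
```

Keeping the rules in a table keyed by language means a text in another
language is left untouched, and the mapping could just as easily run the
other way (single character to "'n") if that suits the checker better.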