[Indic-computing-devel] Re: [LIG] Regexp and Indian languages ?
From: Sayamindu D. <say...@cl...> - 2004-11-26 08:40:02
On Thu, 2004-11-25 at 16:06 -0800, Arun Sharma wrote:
> So I was thinking about how one would go about using regular expressions
> with an Indian language while I was brushing my teeth this morning.
>
> The current syntax seems to be "character" oriented. For eg, f.o matches foo.
> However, if I want to write a regexp such as:
>
> su . la .
>
> that matches
>
> su bbu la xmi
>
> we need to introduce a new concept of a syllable into the regexp
> syntax. For eg: "_" might mean one syllable as opposed to "." which
> means one character.
>
> In other words "su_la_" would match subbulaxmi. This simple minded
> proposal would mean that the zillions of existing regexps which use
> "_" without suspecting it to be a special character would be broken.

This link may be of interest
http://www.unicode.org/reports/tr18/

-thanks-
Sayamindu
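
To make the quoted proposal concrete, here is a rough sketch of how "_" could be expanded into a one-syllable sub-pattern before the regexp is compiled. Everything in it is an assumption made for illustration: the helper name expand_syllables, the choice of Devanagari, and above all the hand-written AKSHARA pattern, which is a simplified approximation of a syllable and not something defined by the report linked above or by any regexp library.

```python
import re

# Simplified Devanagari character classes (assumed for illustration;
# real orthographic syllables have more cases than these).
CONS = "[\u0915-\u0939]"      # consonants KA..HA
NUKTA = "\u093C"              # nukta sign
VIRAMA = "\u094D"             # virama (halant), joins consonants into a cluster
MATRA = "[\u093E-\u094C]"     # dependent vowel signs
MODIFIER = "[\u0901-\u0903]"  # candrabindu, anusvara, visarga
VOWEL = "[\u0905-\u0914]"     # independent vowels

# One approximate syllable: a consonant cluster with an optional vowel sign
# (or trailing virama), or an independent vowel, plus an optional modifier.
AKSHARA = (
    "(?:"
    f"{CONS}{NUKTA}?(?:{VIRAMA}{CONS}{NUKTA}?)*(?:{MATRA}|{VIRAMA})?"
    f"|{VOWEL}"
    f"){MODIFIER}?"
)


def expand_syllables(pattern: str) -> str:
    """Hypothetical helper: rewrite every "_" as a one-syllable sub-pattern.

    It gives no way to keep a literal underscore, which is exactly the
    compatibility problem raised in the quoted message.
    """
    return pattern.replace("_", AKSHARA)


WORD = "सुब्बुलक्ष्मी"  # "subbulaxmi" written in Devanagari

# "." consumes one code point, so the character-oriented pattern fails.
dot_match = re.fullmatch("सु.ल.", WORD)

# "_" consumes one whole syllable, so the syllable-oriented pattern matches.
syllable_match = re.fullmatch(expand_syllables("सु_ल_"), WORD)

# Splitting the word with the same sub-pattern shows the four syllables.
syllables = re.findall(AKSHARA, WORD)
```

With these definitions the word splits into the four units the quoted message writes as "su bbu la xmi". Doing the expansion in a separate step, instead of changing the regexp syntax itself, leaves existing patterns that use "_" literally untouched unless they are passed through the helper.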