[Indic-computing-devel] Re: [BSD-INDIA] Regexp and Indian languages ?
From: B G. <bg...@gm...> - 2004-11-26 18:18:13
Arun,

The linguist I'd mentioned in the previous mail is Indrani Roy, and I've copied her... in case you need more info, she'd be the best person to ask...

cheers
BGa

On Thu, 25 Nov 2004 16:06:09 -0800, Arun Sharma <ar...@sh...> wrote:
> So I was thinking about how one would go about using regular expressions
> with an Indian language while I was brushing my teeth this morning.
>
> The current syntax seems to be "character" oriented. For eg, f.o matches foo.
> However, if I want to write a regexp such as:
>
> su . la .
>
> that matches
>
> su bbu la xmi
>
> we need to introduce a new concept of a syllable into the regexp
> syntax. For eg: "_" might mean one syllable as opposed to "." which
> means one character.
>
> In other words "su_la_" would match subbulaxmi. This simple minded
> proposal would mean that the zillions of existing regexps which use
> "_" without suspecting it to be a special character would be broken.
>
> This might be a good undergrad project for the linguistically inclined
> (and hence the crosspost to Linux and BSD mailing lists which often get
> such queries).
>
> If there is existing literature on this topic, I'd love to find out more.
>
> -Arun
> _______________________________________________
> bsd-india mailing list
> bsd...@bs...
> http://www.bsd-india.org/mailman/listinfo/bsd-india
>

--
We will find a way, or we will make one - Hannibal
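
For readers who want to experiment with the "_" idea from the quoted message, here is a minimal Python sketch. It is not an existing regexp feature or anyone's actual implementation. It assumes romanized (Latin-script) text and a deliberately crude definition of a syllable: zero or more consonants followed by one or more vowels. The names `SYLLABLE` and `compile_syllable_pattern` are hypothetical, made up for this illustration.

```python
import re

# Assumption: in romanized text, one "syllable" is an optional consonant
# cluster followed by a vowel nucleus, e.g. "su", "bbu", "la", "xmi".
SYLLABLE = r"(?:[b-df-hj-np-tv-z]*[aeiou]+)"


def compile_syllable_pattern(pattern):
    """Translate a pattern in which "_" means "one syllable" into a normal
    regular expression and compile it.

    Backslash escapes are copied through unchanged, so "\\_" still matches a
    literal underscore. Limitation of this sketch: a "_" inside a character
    class such as [_a-z] is also rewritten, which is not what you would want.
    """
    def expand(match):
        token = match.group()
        # A bare "_" becomes the syllable sub-pattern; escapes stay as they are.
        return SYLLABLE if token == "_" else token

    translated = re.sub(r"\\.|_", expand, pattern)
    return re.compile(translated)


if __name__ == "__main__":
    pat = compile_syllable_pattern("su_la_")
    print(bool(pat.fullmatch("subbulaxmi")))
    print(bool(pat.fullmatch("sula")))
```

Because only "_" is rewritten, "." keeps its usual one-character meaning, so `f.o` still matches `foo`. The sketch also shows the compatibility problem raised in the message: any existing pattern that uses a bare "_" would have to be rewritten as "\_". Text in a native Indic script would need a different definition of a syllable, possibly one built on Unicode grapheme clusters, which this sketch does not attempt.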