Re: [Indic-computing-devel] Regexp and Indian languages ?
From: <jit...@nc...> - 2004-11-26 14:49:40
Dear Krishnamurthy Nagarajan,

We at janabhaaratii feel indebted to the pioneering start that your efforts (the Indic-computing developers team in general, and those of you named in the email addresses here in particular) have given Indic computing. Under the C-DAC project janabhaaratii, funded by TDIL, we wish to take this forward in a collaborative and fully sharing mode. Your suggestions and ideas will be most appreciated.

Kindly give us your current coordinates (address, phone numbers, affiliations, etc.) so that we can contact you, and even invite you, whenever we wish. Please also keep us informed about your current project.

On our side, we intend to work exclusively on GPL/LGPL software and will put up our contributions and compilations on our project website for 'free' access. Since we started the project only last month, our project website is still under construction, but our mission statement is on our corporate website: www.cdacindia.com

regards
jitendra

Quoting Krishnamurthy Nagarajan <kn...@ya...>:

> ----- Original message from Krishnamurthy Nagarajan <kn...@ya...> -----
> Date: Fri, 26 Nov 2004 02:08:57 -0800 (PST)
> From: Krishnamurthy Nagarajan <kn...@ya...>
> Reply-To: Krishnamurthy Nagarajan <kn...@ya...>
> Subject: Re: [Indic-computing-devel] Regexp and Indian languages ?
> To: Arun Sharma <ar...@sh...>,
>     ind...@li...
>
> Hi Arun,
>
> Perhaps you could take a look at the generic transliteration library for Indian languages that I developed quite some time back. It's on sourceforge at
> http://indic-computing.sourceforge.net/projects/miscellaneous.html
> (under 'Other infrastructural projects', as 'translib')
>
> I had come up with some kind of regular expression syntax to express the syllables in Indian words. I developed sample transliteration rules for four languages (Hindi, Telugu, Kannada and Tamil).
>
> A snippet from the ruleset for Hindi, just to raise your curiosity :
>
> ^%vowel             glyph(%vowel)
> _%vowel             glyph(%vowel)
> r%cons%vowel        translit(%2,%vowel) HALF_R_POST
> (%cons)a            translit(%1,a)
> (%cons)(A|aa)       translit(%1,a) VOWEL_SIGN_AA
> %cons%vowel         translit(%1,a) dep_vowel_sign(%vowel)
> %cons%cons%vowel    dep_cons_sign(%1) translit(%2,%3)
> .....
>
> (^ is used by me to denote beginning of word, $ for end of word, _ for forced ZWNJ etc)
>
> Here, the LHS corresponds to a subset of a word (a syllable, usually) and the RHS denotes the action, to output the glyphs or other actions (including recursive call to the main transliteration function translit()). One or more such sub-expressions would constitute an input word.
>
> btw, I didn't use the regular Unix regexp syntax. With the framework and syntax I developed, it's quite feasible to write a regexp parser for Indian languages (transliterated using US-English or even direct UTF-8 or other forms) using such rules.
>
> I hope my answer is relevant to your question.
>
> cheers,
> Nagarajan
> Indic-computing project
>
> --- Arun Sharma <ar...@sh...> wrote:
>
> > So I was thinking about how one would go about using regular expressions with an Indian language while I was brushing my teeth this morning.
> >
> > The current syntax seems to be "character" oriented. For eg, f.o matches foo. However, if I want to write a regexp such as:
> >
> >     su . la .
> >
> > that matches
> >
> >     su bbu la xmi
> >
> > we need to introduce a new concept of a syllable into the regexp syntax. For eg: "_" might mean one syllable as opposed to "." which means one character.
> ...
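Arun's proposal (a wildcard that stands for one syllable rather than one character) can be tried out with a small sketch. This is only an illustration under stated assumptions: it works on lowercase romanized text, it treats a syllable as a run of consonants followed by a run of vowels, and the names SYLLABLE, compile_syllable_pattern and matches are hypothetical. It uses "_" in Arun's sense (one syllable), not in the translib sense (forced ZWNJ).

```python
import re

# Assumption: in lowercase romanized text, one syllable is zero or more
# consonants followed by one or more vowels (e.g. "su", "bbu", "xmi").
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
VOWELS = "aeiou"
SYLLABLE = f"(?:[{CONSONANTS}]*[{VOWELS}]+)"


def compile_syllable_pattern(pattern):
    """Translate a pattern with '_' (one syllable) and '.' (one character)
    into an ordinary Python regular expression."""
    parts = []
    for ch in pattern:
        if ch.isspace():
            continue                  # spaces only separate tokens visually
        if ch == "_":
            parts.append(SYLLABLE)    # syllable wildcard
        elif ch == ".":
            parts.append(".")         # usual single-character wildcard
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, word):
    """Return True if the whole word (spaces ignored) fits the pattern."""
    compiled = compile_syllable_pattern(pattern)
    return compiled.fullmatch(word.replace(" ", "")) is not None


print(matches("su _ la _", "su bbu la xmi"))  # syllable wildcard
print(matches("su . la .", "su bbu la xmi"))  # character wildcard
```

With these assumptions the syllable pattern accepts "su bbu la xmi", while the character-oriented pattern does not, because "bbu" and "xmi" are longer than one character. A real implementation would need a proper syllable definition for each script or transliteration scheme.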
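The rule format Nagarajan describes (an LHS that matches part of a word, usually a syllable, and an RHS that names an action) can be sketched as an ordered, first-match rule loop. This is not translib's implementation: the macro definitions, the three rules and the symbolic output tokens (VOWEL, CONS, SIGN, HALANT) are all assumptions made for illustration, and the actions return token names instead of real glyphs.

```python
import re

# Hypothetical stand-ins for the %cons and %vowel classes; the actual
# translib definitions are not shown in the message.
MACROS = {
    "%cons": "([bcdfghjklmnpqrstvwxyz])",
    "%vowel": "(aa|a|i|u|e|o)",
}


def compile_lhs(lhs):
    """Expand %macros in a rule's left-hand side into a compiled regex."""
    for name, regex in MACROS.items():
        lhs = lhs.replace(name, regex)
    return re.compile(lhs)


# (LHS, action) pairs, tried in order; the first LHS that matches at the
# current position wins. Each action returns a list of symbolic tokens.
RULES = [
    # '^' only matches at the real start of the word, even when
    # Pattern.match() is called with a later start position.
    (compile_lhs("^%vowel"), lambda m: [f"VOWEL({m.group(1)})"]),
    (compile_lhs("%cons%vowel"),
     lambda m: [f"CONS({m.group(1)})", f"SIGN({m.group(2)})"]),
    (compile_lhs("%cons"), lambda m: [f"CONS({m.group(1)})", "HALANT"]),
]


def transliterate(word):
    """Consume the word left to right, applying the first matching rule."""
    out, pos = [], 0
    while pos < len(word):
        for pattern, action in RULES:
            m = pattern.match(word, pos)
            if m:
                out.extend(action(m))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos} in {word!r}")
    return out


print(transliterate("amar"))
```

Rule order matters here in the same way it appears to in the quoted snippet: more specific left-hand sides have to come before more general ones, otherwise the general rule consumes the input first.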