[Indic-computing-devel] Re: [LIG] Regexp and Indian languages ?
From: Sayamindu D. <say...@cl...> - 2004-11-27 04:56:22
On Fri, 2004-11-26 at 09:47 -0800, Arun Sharma wrote:
> On Fri, Nov 26, 2004 at 02:09:19PM +0530, Sayamindu Dasgupta wrote:
> > This link may be of interest
> > http://www.unicode.org/reports/tr18/
>
> Thank you! This was exactly what I was looking for. Grapheme
> clusters (sec 2.2 and 3.2) seem to be meant for just this.
>
> > For example, an implementation could interpret "\X" as matching any
> > default grapheme cluster, while interpreting "." as matching any single
> > code point. It could interpret "\h" as a zero-width match against any
> > grapheme cluster boundary, and "\H" as the negation of that.
>
> Now, are there any open source implementations of these specs for C/C++
> and Java? What about std::string and java.lang.String? They need to
> have iterators to iterate over grapheme clusters as well.

IBM ICU probably implements at least a subset of these specs:
http://oss.software.ibm.com/icu/userguide/regexp.html
It is available for Java as well as C/C++ (ICU4J and ICU4C).

-thanks-
Sayamindu
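A minimal sketch of what Arun is asking for on the Java side, assuming a modern JDK rather than the 2004-era tooling discussed in the thread: java.text.BreakIterator.getCharacterInstance() iterates over user-perceived characters, and java.util.regex (Java 9 and later) accepts "\X" for a grapheme cluster while "." still matches a single code point. The class name, the helper methods and the sample word कुल (KA, VOWEL SIGN U, LA) are hypothetical illustrations, not from the thread. The sample deliberately uses only nonspacing marks, because how spacing vowel signs and conjuncts are clustered may differ between JDK releases.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: grapheme-cluster iteration vs. code-point matching.
class GraphemeDemo {
    // Split s into user-perceived characters (grapheme clusters)
    // using the JDK's character BreakIterator.
    static List<String> graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(s);
        List<String> clusters = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            clusters.add(s.substring(start, end));
        }
        return clusters;
    }

    // Count non-overlapping matches of regex in s.
    static int countMatches(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // KA + VOWEL SIGN U + LA: three code points, two grapheme clusters.
        String word = "\u0915\u0941\u0932";
        System.out.println(graphemes(word).size());    // 2
        System.out.println(countMatches(".", word));   // 3: "." matches single code points
        System.out.println(countMatches("\\X", word)); // 2: "\X" matches grapheme clusters (Java 9+)
    }
}
```

For C/C++, the corresponding route is ICU itself, as suggested above; std::string has no notion of grapheme clusters.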