[Indic-computing-devel] Re: [LIG] Regexp and Indian languages ?
From: Arun S. <ar...@sh...> - 2004-11-26 17:47:52
On Fri, Nov 26, 2004 at 02:09:19PM +0530, Sayamindu Dasgupta wrote:
> This link may be of interest
> http://www.unicode.org/reports/tr18/

Thank you! This was exactly what I was looking for. Grapheme clusters
(sec 2.2 and 3.2) seem to be meant for just this.

> For example, an implementation could interpret "\X" as matching any
> default grapheme cluster, while interpreting "." as matching any single
> code point. It could interpret "\h" as a zero-width match against any
> grapheme cluster boundary, and "\H" as the negation of that.

Now, are there any open source implementations of these specs for C/C++
and Java? What about std::string and java.lang.String? They need to have
iterators to iterate over grapheme clusters as well.

-Arun
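
On the Java side of the iterator question, here is a minimal sketch of what iterating over grapheme clusters could look like. It uses java.text.BreakIterator.getCharacterInstance(), which reports boundaries between user-perceived characters. The class name GraphemeClusters and the method split are made up for illustration. The exact boundary rules depend on the JDK version and locale data, so they may not match TR18's default grapheme clusters for every Indic conjunct. ICU (ICU4C and ICU4J) provides a BreakIterator with a similar shape and may be a better fit where tighter Unicode conformance matters.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of java.lang.String or any library.
final class GraphemeClusters {

    // Splits text into user-perceived characters, as reported by the
    // JDK's character BreakIterator (an approximation of grapheme clusters).
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);

        List<String> clusters = new ArrayList<>();
        int start = it.first();
        // next() returns BreakIterator.DONE once the end of the text is passed.
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            clusters.add(text.substring(start, end));
            start = end;
        }
        return clusters;
    }
}
```

Calling GraphemeClusters.split on a base letter followed by a combining mark (for example KA followed by NUKTA, U+0915 U+093C) should return one element for the pair, whereas iterating by char or by code point would return two.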