From: Mark D. ☕ <ma...@ma...> - 2009-10-28 14:19:27
What might work is to put an ICU shim on top of it.

Mark

On Wed, Oct 28, 2009 at 00:14, Jungshik SHIN (신정식) <jsh...@gm...> wrote:

> Hi,
>
> Google Chrome includes CLD (compact language detector). At the moment, we
> just did a quick and dirty job of 'transplanting' CLD from Google's internal
> source repository to Chrome's source tree. Eventually, we (Google) want to
> make it a part of ICU or make it a stand-alone library, but don't have any
> concrete plan, yet.
>
> The CLD 'APIs' can be found in the Chrome source tree at
>
> http://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/bar/toolbar/cld/i18n/encodings/compact_lang_det/compact_lang_det.h?view=log
>
> Jungshik
>
> 2009/10/23 Phillips, Addison <ad...@am...>
>
>> Hi,
>>
>> Not sure that this is the right posting place, but thought I'd try this
>> list first.
>>
>> I'm working with ICU4J and currently have a need to do language
>> recognition/detection on large blocks of text. The text is already in a
>> well-known Unicode character encoding, so I don't need to do encoding
>> detection. However, CharsetDetector is currently the only API that does
>> language detection.
>>
>> What I'd like to propose is building a separate language detection class.
>> Language detection would be useful for ICU users in general, since often the
>> language of some text to process is more important than, say, the runtime
>> locale.
>>
>> I also intend to expand the list of languages, which means acquiring more
>> data and possibly a tool for compiling the data (CharsetRecog's use of
>> static data in the class files isn't very modular).
>>
>> So my questions for this list are:
>>
>> 1. Does the ICU community care about this functionality?
>> 2. Are other pieces (tools? Additional language data?) available already?
>> 3. Is this where I should propose it and, if so, what form should a
>>    serious proposal take?
>>
>> Thanks in advance,
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect -- Lab126
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>> _______________________________________________
>> icu-design mailing list
>> icu...@li...
>> To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-design