From: Mark D. ☕ <ma...@ma...> - 2011-01-27 16:26:04
There is open-source code in the Chrome release that you can use for Chinese segmentation. Once the text is segmented, ICU provides basic pinyin transliteration, which you could augment with open-source data such as that from CDICT.

Mark

*— Il meglio è l’inimico del bene —*

On Thu, Jan 27, 2011 at 07:56, Tom Bishop, Wenlin Institute <we...@we...> wrote:

> On Jan 25, 2011, at 6:15 AM, ashish yadav wrote:
>
> > Hi,
> >
> > Thanks, Tom, for the info.
> >
> > Can you please tell me how to convert Chinese characters to pinyin (the program logic), or point me to any open-source code or library that provides this conversion?
>
> I don't know of any open-source solution; the methods we use are proprietary.
>
> I can describe a basic method. Use a comprehensive dictionary of Chinese words (not only characters/monosyllables but also polysyllabic words). Segment the text from beginning to end, always selecting the longest matching word from the dictionary. Then segment the text from end to beginning (i.e., backwards), again selecting the longest matching word from the dictionary. Where the two segmentations disagree with each other, disambiguate (e.g., by human intervention). This method will still result in some errors; in general it is necessary for a human to make corrections, since the correct pinyin sometimes depends on the meaning.
>
> Best wishes,
>
> Tom
>
> > Thanks & Regards
> > Ashish
> > "Be a Part of Solution"
> >
> > On Thu, Jan 20, 2011 at 12:42 AM, Tom Bishop, Wenlin Institute <we...@we...> wrote:
> >
> > On Jan 18, 2011, at 10:23 PM, ashish yadav wrote:
> >
> > > Hi,
> > >
> > > I want to sort:
> > > a. Chinese characters by pinyin, not by GB or Unicode. The main principle is “Symbol -> Number -> Alphabet -> Chinese Character”, based on ASCII character code.
> >
> > This will be done badly, since in general a Chinese character has multiple pronunciations.
> > The correct pronunciation depends on the context and requires recognition of polysyllabic words. ICU does not have this ability.
> >
> > I hope this information is helpful.
> >
> > Tom
> >
> > 文林 Wenlin Institute, Inc. Software for Learning Chinese
> > E-mail: we...@we... Web: http://www.wenlin.com
> > Telephone: 1-877-4-WENLIN (1-877-493-6546)
> > ☯
> >
> > _______________________________________________
> > icu-support mailing list - icu...@li...
> > To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support
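
For illustration, here is a minimal sketch of the two-pass longest-match method Tom describes in the quoted message. It is not Wenlin's proprietary method, nor the Chrome or ICU code; the tiny `LEXICON` and the function names are made up for this example, and a real system would need a comprehensive word dictionary.

```python
# Toy word -> pinyin dictionary (illustrative entries only, not a real lexicon).
LEXICON = {
    "研究": "yánjiū",
    "研究生": "yánjiūshēng",
    "生命": "shēngmìng",
    "命": "mìng",
    "起源": "qǐyuán",
    "银行": "yínháng",
    "行": "xíng",
}


def forward_max_match(text, lexicon):
    """Segment left to right, always taking the longest dictionary word."""
    max_len = max(map(len, lexicon))
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # An unknown single character is emitted as a one-character word.
            if candidate in lexicon or size == 1:
                words.append(candidate)
                pos += size
                break
    return words


def backward_max_match(text, lexicon):
    """Segment right to left, always taking the longest dictionary word."""
    max_len = max(map(len, lexicon))
    words, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if candidate in lexicon or size == 1:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def to_pinyin(words, lexicon):
    """Look up each word; '?' marks words missing from the dictionary."""
    return " ".join(lexicon.get(word, "?") for word in words)


if __name__ == "__main__":
    text = "研究生命起源"
    fwd = forward_max_match(text, LEXICON)
    bwd = backward_max_match(text, LEXICON)
    print(fwd, to_pinyin(fwd, LEXICON))
    print(bwd, to_pinyin(bwd, LEXICON))
    if fwd != bwd:
        print("Segmentations disagree; flag this span for human review.")
```

With this toy dictionary the forward pass yields 研究生 / 命 / 起源 and the backward pass yields 研究 / 生命 / 起源, so the sentence would be flagged for disambiguation. Looking up whole words rather than single characters is also what lets 银行 come out as "yínháng" while 行 on its own comes out as "xíng".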