From: Yingying Z. <yin...@sa...> - 2006-07-13 07:53:36
I have read the ICU4J source code for the Unicode detection part and found that it is based entirely on a BOM check. So I have some questions:

Do the charset detection test results shown in the PPT include Unicode? If so, was the test data made up of complete data files? If the test data is a section extracted from a Unicode file, it will lose the BOM identifier, and the ICU algorithm might fail to detect it. In that case the detection accuracy can't be that high.

Is there any document describing the algorithm behind the ICU charset detection function? I think that would be very helpful for porting ICU or reusing the algorithm.

Any relevant information will be much appreciated.

Regards,
Zhao,Yingying

I18N Development Engineer
SAS Research and Development Beijing
Tel: +86 10 63103355-681
E-mail: yin...@sa... <mailto:yin...@sa...>
Web: http://www.sas.com <http://www.sas.com>

Message: 4
Date: Thu, 22 Jun 2006 17:57:56 -1000
From: Eric Mader <em...@ic...>
Subject: Re: [icu-support] Any experience on character detection?
To: icu...@li...
Message-ID: <449...@ic...>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Andy Heninger and I presented a paper about ICU's character detection at the last Unicode Conference. You can download the PowerPoint presentation from here:

http://icu.sourceforge.net/docs/papers/Automatic_Charset_Recognition_IUC29.ppt

Regards,
Eric Mader
IBM GCoC

Yingying Zhao wrote:
> I am considering porting the ICU4J character detection algorithm. Does anybody already have experience with this? How good is its detection accuracy?
>
> I am working on a task focused on character encoding detection now. But the algorithm currently in use has very low detection accuracy, so I want to find out that of ICU and make a comparison.
>
> Any information on character detection will be much appreciated.
>
> Regards,
> Zhao,Yingying
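
To make the BOM concern raised in the question concrete, here is a minimal Python sketch of a purely BOM-based check. It is an illustration only and is not taken from ICU4J; the helper name detect_by_bom and the sample data are made up for this example. It shows that such a check recognizes a complete UTF-16 file but returns nothing for an excerpt cut from the middle of the same file.

```python
import codecs

# Hypothetical helper for illustration; this is NOT ICU4J's implementation.
# Order matters: the UTF-32 LE BOM begins with the same two bytes as the
# UTF-16 LE BOM, so the longer UTF-32 marks are tested first.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_by_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# A complete "file": BOM followed by UTF-16 LE text.
whole_file = codecs.BOM_UTF16_LE + "charset detection".encode("utf-16-le")

# A section taken from the middle of the file; the BOM is no longer present.
excerpt = whole_file[10:]

print(detect_by_bom(whole_file))  # the BOM is found
print(detect_by_bom(excerpt))     # nothing to match against
```

A detector that has to handle excerpts like this would need some fallback that inspects the byte content itself, which is the point of asking how the ICU test data was prepared.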