Yup, that looks like a bug. Can you file it?
----- Original Message -----
From: "Tony Scerri" <tony.scerri@...>
Sent: Friday, June 17, 2005 06:16
Subject: [icu-support] Any-Latin Transliteration Problem
Apologies if this isn't the best place to ask, but I didn't
want to raise a bug report in case it is something I have set up
incorrectly. I am trying to do a basic Unicode-to-ASCII conversion of
data using the Java version of ICU 3.2. While testing it I found the
following conversion, which looks obviously wrong to me, but I'm not
sure whether it is a bug or expected behaviour for this character
sequence. I ran the conversion over a 12 MB XML file and compared the
input with the output; it looks good for the most part, except for
this one case, which appears a few times throughout the file.
I have narrowed it down to the following Unicode character sequence:
\u003b \u0034 \u0020 \u03bc
which converts to
\u003f \u0034 \u0020 \u006d
I have tried this with the C++ version using uconv, but I have to pass
it the whole original file; it won't convert any characters at all if I
give it only the short Unicode sequence above. On the whole file it
produces output identical to that of the Java version I have written,
so the two are at least consistent. When debugging the Java version it
looked as though one of the rule-based transliterators was mistakenly
replacing the ';' (\u003b) with a '?' (\u003f).
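
For anyone wanting to repeat the comparison, here is a minimal sketch (not part of the original test code, and using only the standard Java library rather than ICU) that prints a string as code point escapes and lists the positions where the converted text differs from the input. The class and method names are hypothetical, it assumes Java 8 or later, and the two strings are simply the sequences quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a before/after conversion pair.
final class CodePointDiff {

    // Formats every code point as a lowercase hex escape, space separated.
    // Code points above the BMP come out with more than four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("\\u%04x", cp));
        });
        return sb.toString();
    }

    // Returns the code point indexes at which the two strings differ.
    // If the lengths differ, every index past the shorter one is reported.
    static List<Integer> changedPositions(String before, String after) {
        int[] a = before.codePoints().toArray();
        int[] b = after.codePoints().toArray();
        List<Integer> changed = new ArrayList<>();
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                changed.add(i);
            }
        }
        for (int i = common; i < Math.max(a.length, b.length); i++) {
            changed.add(i);
        }
        return changed;
    }

    public static void main(String[] args) {
        // The input and output sequences quoted in this message.
        String before = "\u003b\u0034\u0020\u03bc";
        String after = "\u003f\u0034\u0020\u006d";
        System.out.println("input:   " + escape(before));
        System.out.println("output:  " + escape(after));
        System.out.println("changed: " + changedPositions(before, after));
    }
}
```

Run on the two sequences above, this reports positions 0 and 3 as changed. Position 3 is the Greek mu becoming 'm', which is what I would expect; position 0 is the ';' becoming '?', which is the case in question. In a real run, `after` would be whatever the transliterator returned for `before`.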
Any help on whether this is expected behaviour, or whether people
think it is a bug, would be appreciated.
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support