perky 03/06/20 02:04:52
Modified: . CHANGES NOTES.big5
Added: . NOTES.cp932
Log:
- Tweaked some cp932 and cp950 mappings to be more consistent
with MS Windows.
- CP932: Added single-byte "UNDEFINED" characters 0x80, 0xa0, 0xfd,
0xfe, 0xff (documented in NOTES.cp932)
- CP950: Changed the encode mappings for duplicated Unicode code
points to the more popular ones: 5341 -> A451, 5345 -> A4CA
- Added a unit test for the big5 mapping.
- Fixed a bug where the cp932 codec couldn't decode half-width katakana.
Revision Changes Path
1.3 +11 -0 cjkcodecs/CHANGES
Index: CHANGES
===================================================================
RCS file: /cvsroot/koco/cjkcodecs/CHANGES,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- CHANGES 19 Jun 2003 19:12:58 -0000 1.2
+++ CHANGES 20 Jun 2003 09:04:52 -0000 1.3
@@ -5,3 +5,14 @@
*) Fixed a bug that JIS X 0201 routine doesn't encode and decode 0x7f.
+ *) Tweaked some mapping for cp932 and cp950 to make more consistency
+ with MS Windows.
+ - CP932: Added single byte "UNDEFINED" characters 0x80, 0xa0, 0xfd,
+ 0xfe, 0xff (documented on NOTES.cp932)
+ - CP950: Changed encode mappings to another more popular for
+ duplicated unicode points: 5341 -> A451, 5345 -> A4CA
+
+ *) A unittest for big5 mapping is added.
+
+ *) Fixed a bug that cp932 codec couldn't decode half-width katakana.
+
1.3 +11 -10 cjkcodecs/NOTES.big5
Index: NOTES.big5
===================================================================
RCS file: /cvsroot/koco/cjkcodecs/NOTES.big5,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- NOTES.big5 19 Jun 2003 18:02:11 -0000 1.2
+++ NOTES.big5 20 Jun 2003 09:04:52 -0000 1.3
@@ -1,15 +1,16 @@
big5 codec maps the following characters as cp950 does rather than
conforming Unicode.org's that maps to 0xFFFD.
-BIG5 Unicode Description
+ BIG5 Unicode Description
-0xA15A 0x2574 SPACING UNDERSCORE
-0xA1C3 0xFFE3 SPACING HEAVY OVERSCORE
-0xA1C5 0x02CD SPACING HEAVY UNDERSCORE
-0xA1FE 0xFF0F LT DIAG UP RIGHT TO LOW LEFT
-0xA240 0xFF3C LT DIAG UP LEFT TO LOW RIGHT
-0xA2CC 0x5341 HANGZHOU NUMERAL TEN
-0xA2CE 0x5345 HANGZHOU NUMERAL THIRTY
+ 0xA15A 0x2574 SPACING UNDERSCORE
+ 0xA1C3 0xFFE3 SPACING HEAVY OVERSCORE
+ 0xA1C5 0x02CD SPACING HEAVY UNDERSCORE
+ 0xA1FE 0xFF0F LT DIAG UP RIGHT TO LOW LEFT
+ 0xA240 0xFF3C LT DIAG UP LEFT TO LOW RIGHT
+ 0xA2CC 0x5341 HANGZHOU NUMERAL TEN
+ 0xA2CE 0x5345 HANGZHOU NUMERAL THIRTY
-Because unicode 0x5341, 0x5345 is mapped to another big5 codes already,
-a roundtrip compatibility is not guaranteed for them.
+Because unicode 0x5341, 0x5345, 0xFF0F, 0xFF3C is mapped to another
+big5 codes already, a roundtrip compatibility is not guaranteed for
+them.
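
The round-trip caveat in the note above can be illustrated with a small table-driven sketch. It is not the codec's actual implementation. The names DECODE, ENCODE and roundtrips are hypothetical, the encode targets A451 and A4CA are taken from the CP950 entry in CHANGES, and the assumption that A451 and A4CA also decode to U+5341 and U+5345 is inferred from the phrase "mapped to another big5 codes already". The other codes for 0xFF0F and 0xFF3C are not stated in the notes, so they are left out.

```python
# Illustrative tables only; values come from the notes, not from the codec source.
DECODE = {
    0xA2CC: 0x5341,  # HANGZHOU NUMERAL TEN (from the NOTES.big5 table)
    0xA2CE: 0x5345,  # HANGZHOU NUMERAL THIRTY (from the NOTES.big5 table)
    0xA451: 0x5341,  # assumed: the "other" big5 code for U+5341
    0xA4CA: 0x5345,  # assumed: the "other" big5 code for U+5345
}

# Each Unicode code point can encode to only one big5 code (per CHANGES).
ENCODE = {
    0x5341: 0xA451,
    0x5345: 0xA4CA,
}


def roundtrips(big5_code):
    """Return True if decoding then encoding gives back the same big5 code."""
    return ENCODE[DECODE[big5_code]] == big5_code


if __name__ == "__main__":
    for code in sorted(DECODE):
        print(hex(code), "round-trips" if roundtrips(code) else "does not round-trip")
```

Under these assumptions, 0xA2CC and 0xA2CE decode fine but re-encode to 0xA451 and 0xA4CA, which is why round-trip compatibility cannot be guaranteed for them.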
1.1 cjkcodecs/NOTES.cp932
Index: NOTES.cp932
===================================================================
To conform to Windows's real mapping, cp932 codec maps the following
codepoints in addition of the official cp932 mapping.
CP932 Unicode Description
0x80 0x80 UNDEFINED
0xA0 0xF8F0 UNDEFINED
0xFD 0xF8F1 UNDEFINED
0xFE 0xF8F2 UNDEFINED
0xFF 0xF8F3 UNDEFINED
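
As a minimal sketch, the extra single-byte mappings in the table above can be written as a lookup table. This is not the codec's own code. The names CP932_EXTRA_DECODE, CP932_EXTRA_ENCODE, decode_extra and encode_extra are hypothetical, and the sketch assumes the encode direction is simply the inverse of the table, which the notes do not state explicitly.

```python
# Extra single-byte mappings, copied from the NOTES.cp932 table.
CP932_EXTRA_DECODE = {
    0x80: 0x0080,
    0xA0: 0xF8F0,
    0xFD: 0xF8F1,
    0xFE: 0xF8F2,
    0xFF: 0xF8F3,
}

# Assumption: encoding is the inverse of the decode table.
CP932_EXTRA_ENCODE = {uni: byte for byte, uni in CP932_EXTRA_DECODE.items()}


def decode_extra(byte):
    """Return the character for an extra cp932 byte, or None if not in the table."""
    code = CP932_EXTRA_DECODE.get(byte)
    return None if code is None else chr(code)


def encode_extra(char):
    """Return the single cp932 byte for an extra character, or None if not in the table."""
    byte = CP932_EXTRA_ENCODE.get(ord(char))
    return None if byte is None else bytes([byte])


if __name__ == "__main__":
    for byte in sorted(CP932_EXTRA_DECODE):
        char = decode_extra(byte)
        print(hex(byte), "->", hex(ord(char)), "->", encode_extra(char))
```

Four of the five targets fall in the Unicode Private Use Area (U+F8F0 to U+F8F3), so they carry no standard meaning outside this Windows-compatible mapping.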