koco-cvs Mailing List for Python Korean Codecs (Page 13)
Brought to you by: perky
From: Hye-Shik C. <pe...@us...> - 2003-04-20 18:37:39
perky 03/04/20 11:37:38

Modified: . setup.py
Log:
Install aliases for 3rd party framework.

Revision Changes Path
1.2 +19 -5 iconvcodec/setup.py

Index: setup.py
===================================================================
RCS file: /cvsroot/koco/iconvcodec/setup.py,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- setup.py	20 Apr 2003 18:17:57 -0000	1.1
+++ setup.py	20 Apr 2003 18:37:38 -0000	1.2
@@ -25,11 +25,12 @@
 # OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 # SUCH DAMAGE.
 #
-# $Id: setup.py,v 1.1 2003/04/20 18:17:57 perky Exp $
+# $Id: setup.py,v 1.2 2003/04/20 18:37:38 perky Exp $
 #
 
 import sys
 from distutils.core import setup, Extension
+from distutils.command.install import install
 
 include_dirs = []
 library_dirs = []
@@ -59,16 +60,29 @@
             sys.argv.remove(arg)
 
 if use_libiconv and not libraries:
-    include_dirs = ['/usr/local/include',
-                    '/sw/include', '/usr/pkg/include']
-    library_dirs = ['/usr/local/lib', '/sw/lib',
-                    '/usr/pkg/lib']
+    include_dirs = ['/usr/local/include', '/sw/include', '/usr/pkg/include']
+    library_dirs = ['/usr/local/lib', '/sw/lib', '/usr/pkg/lib']
     libraries = ['iconv']
 
+class Install(install):
+    def initialize_options(self):
+        install.initialize_options(self)
+        if sys.hexversion >= 0x2010000:
+            self.extra_path = ("iconv_codec", "import iconv_codec")
+        else:
+            self.extra_path = "iconv_codec"
+    def finalize_options(self):
+        org_install_lib = self.install_lib
+        install.finalize_options(self)
+        self.install_libbase = self.install_lib = \
+                org_install_lib or self.install_purelib
+
 setup (name = "iconvcodec",
        version = "1.0",
        author = "Hye-Shik Chang",
        author_email = "pe...@Fr...",
+       cmdclass = {'install': Install},
+       py_modules = ['iconv_codec'],
        ext_modules = [
             Extension("_iconv_codec", ["_iconv_codec.c"],
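The Install subclass in this commit relies on how distutils handles extra_path. As far as we can tell from the distutils install command, a two-element extra_path makes it write a "<first>.pth" file whose only line is the second element, here "import iconv_codec". The overridden finalize_options() keeps the modules directly in site-packages instead of an extra_path subdirectory. site.py executes .pth lines that begin with "import", so iconv_codec would then be imported, and could register its codecs, every time the interpreter starts. The 0x2010000 check (Python 2.1.0) presumably guards against older site.py versions that did not run such lines.

The Python 3 sketch below imitates only that start-up effect, by calling site.addsitedir() on a temporary directory. The module name demo_codec_autoload and the codec name "demo_alias" are hypothetical stand-ins and are not part of iconvcodec.

```python
import codecs
import os
import site
import sys
import tempfile

# Hypothetical stand-in for iconv_codec: importing it registers a codec
# search function that maps the made-up name "demo_alias" onto UTF-8.
MODULE_SOURCE = '''
import codecs

def _search(name):
    if name == "demo_alias":
        return codecs.lookup("utf-8")
    return None

codecs.register(_search)
'''


def install_with_autoload_pth(module_name, source):
    """Write <site_dir>/<module_name>.py and a <module_name>.pth whose only
    line is "import <module_name>", then let the site module process it."""
    site_dir = tempfile.mkdtemp()
    with open(os.path.join(site_dir, module_name + ".py"), "w") as f:
        f.write(source)
    with open(os.path.join(site_dir, module_name + ".pth"), "w") as f:
        # site.py exec()s .pth lines that start with "import ".
        f.write("import %s\n" % module_name)
    # addsitedir() appends site_dir to sys.path, then processes its .pth files.
    site.addsitedir(site_dir)
    return site_dir


site_dir = install_with_autoload_pth("demo_codec_autoload", MODULE_SOURCE)
print(codecs.lookup("demo_alias").name)  # the alias now resolves to utf-8
```

In a real installation the .pth file is read when the interpreter starts, not through an explicit addsitedir() call. The visible effect is the same: the module runs, and its registration happens, before any user code.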
From: Hye-Shik C. <pe...@us...> - 2003-04-20 18:17:58
perky 03/04/20 11:17:57

Log:
Import iconv_codec from python iconv codec patch rev.3

Status:

Vendor Tag: ICONVCODEC_REV3
Release Tags: PYTHON

N iconvcodec/test_iconv_codec.py
N iconvcodec/_iconv_codec.c
N iconvcodec/COPYRIGHT
N iconvcodec/setup.py
N iconvcodec/AUTHORS
N iconvcodec/iconv_codec.py

No conflicts created by this import
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:38:06
perky 03/04/20 10:38:05

Added: . AUTHORS COPYRIGHT
Log:
Add conventional files.

Revision Changes Path
1.1 cjkcodecs/AUTHORS

Index: AUTHORS
===================================================================
Hye-Shik Chang <pe...@Fr...>

1.1 cjkcodecs/COPYRIGHT

Index: COPYRIGHT
===================================================================
$Id: COPYRIGHT,v 1.1 2003/04/20 17:38:05 perky Exp $

Copyright (C) 2003 Hye-Shik Chang. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:35:34
perky 03/04/20 10:35:33

Added: tools genmap_ja_codecs.py genmap_ko_codecs.py genmap_support.py genmap_zh_CN_codecs.py genmap_zh_TW_codecs.py
Log:
Import codec implementations from Multibyte Codecs patch.

Revision Changes Path
1.1 cjkcodecs/tools/genmap_ja_codecs.py

Index: genmap_ja_codecs.py
===================================================================
#
# genmap_ja_codecs.py: Japanese Codecs Map Generator
#
# Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# $Id: genmap_ja_codecs.py,v 1.1 2003/04/20 17:35:32 perky Exp $
#

from genmap_support import *

JISX0208_C1 = (0x21, 0x74)
JISX0208_C2 = (0x21, 0x7e)
JISX0212_C1 = (0x22, 0x6d)
JISX0212_C2 = (0x21, 0x7e)
CP932P0_C1  = (0x81, 0x81) # patches between shift-jis and cp932
CP932P0_C2  = (0x5f, 0xca)
CP932P1_C1  = (0x87, 0x87) # CP932 P1
CP932P1_C2  = (0x40, 0x9c)
CP932P2_C1  = (0xed, 0xfc) # CP932 P2
CP932P2_C2  = (0x40, 0xfc)

try:
    jisx0208file = open('JIS0208.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT"
    raise SystemExit
try:
    jisx0212file = open('JIS0212.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0212.TXT"
    raise SystemExit
try:
    cp932file = open('CP932.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT"
    raise SystemExit

omap = open('_ja_codecs.h', 'w')

print "Loading Mapping File..."
jisx0208datever, sjisdecmap = loadmap(jisx0208file, natcol=0, unicol=2)
jisx0208datever, jisx0208decmap = loadmap(jisx0208file, natcol=1, unicol=2)
jisx0212datever, jisx0212decmap = loadmap(jisx0212file)
cp932datever, cp932decmap = loadmap(cp932file)

sjisencmap, cp932encmap = {}, {}
cp932diff = {}
for c1, m in cp932decmap.items():
    for c2, code in m.items():
        cp932encmap[code] = (c1, c2)
        if sjisdecmap.has_key(c1) and sjisdecmap[c1].has_key(c2):
            sjisencmap[sjisdecmap[c1][c2]] = (c1, c2)
            if sjisdecmap[c1][c2] != code:
                cp932diff[(c1, c2)] = (sjisdecmap[c1][c2], code)
            else:
                del cp932decmap[c1][c2]
    if not cp932decmap[c1]:
        del cp932decmap[c1]

difmap = []
for uni, (c1, c2) in cp932encmap.iteritems():
    if sjisencmap.has_key(uni):
        s1, s2 = sjisencmap[uni]
        if (s1, s2) != (c1, c2):
            difmap.append(uni)

print "Printing Copyright..."
printcopyright(omap, filename='_ja_codecs.h',
    encodingnames='Japanese Encodings',
    sourcename='JISX0208.TXT/JISX0212.TXT',
    sourceversion='%s/%s' % (jisx0208datever, jisx0212datever))

print "Generating JIS X 0208 decode map..."
genmap_decode(omap, "jisx0208_decode", JISX0208_C1, JISX0208_C2,
              jisx0208decmap)
print "Generating JIS X 0208 decode map index..."
print_decmapindex(omap, "jisx0208_decode", jisx0208decmap, rng=(0, 128))

print "Generating JIS X 0212 decode map..."
genmap_decode(omap, "jisx0212_decode", JISX0212_C1, JISX0212_C2,
              jisx0212decmap)
print "Generating JIS X 0212 decode map index..."
print_decmapindex(omap, "jisx0212_decode", jisx0212decmap, rng=(0, 128))

print "Generating CP932 decode map..."
genmap_decode(omap, "cp932_decode", CP932P0_C1, CP932P0_C2, cp932decmap)
genmap_decode(omap, "cp932_decode", CP932P1_C1, CP932P1_C2, cp932decmap)
genmap_decode(omap, "cp932_decode", CP932P2_C1, CP932P2_C2, cp932decmap)
print "Generating CP932 decode map index..."
print_decmapindex(omap, "cp932_decode", cp932decmap)

print "Generating Constants..."
for mnam in ('JISX0208', 'JISX0212', 'CP932P0', 'CP932P1', 'CP932P2'):
    for c in ('C1', 'C2'):
        mappfx = mnam + '_' + c
        maprange = eval(mappfx)
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_BOTTOM', maprange[0])
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_TOP', maprange[1])

print "Generating CP932 Tweaks..."
if difmap:
    print >> omap, "#define CP932_TWEAKUNIMAP(umap)",
    for uni in difmap:
        print >> omap, "\\"
        print >> omap, "\t(umap)[0x%02x][0x%02x] = NOCHAR;" % (
            uni >> 8, uni & 0xFF),
    print >> omap

print "\nDone!"

# ex: ts=8 sts=4 et

1.1 cjkcodecs/tools/genmap_ko_codecs.py

Index: genmap_ko_codecs.py
===================================================================
#
# genmap_ko_codecs.py: Korean Codecs Map Generator
#
# Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# $Id: genmap_ko_codecs.py,v 1.1 2003/04/20 17:35:32 perky Exp $
#

from genmap_support import *

KSX1001_C1  = (0xa1, 0xfe)
KSX1001_C2  = (0xa1, 0xfe)
UHCL1_C1    = (0x81, 0xa0)
UHCL1_C2    = (0x41, 0xfe)
UHCL2_C1    = (0xa1, 0xfe)
UHCL2_C2    = (0x41, 0xa0)

try:
    mapfile = open('CP949.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP949.TXT"
    raise SystemExit

omap = open('_ko_codecs.h', 'w')

print "Loading Mapping File..."
datever, decmap = loadmap(mapfile)

print "Printing Copyright..."
printcopyright(omap, filename='_ko_codecs.h',
    encodingnames='Korean Encodings', sourcename='CP949.TXT',
    sourceversion=datever)

print "Generating KS X 1001 decode map..."
genmap_decode(omap, "ksx1001_decode", KSX1001_C1, KSX1001_C2, decmap)
print "Generating KS X 1001 decode map index..."
print_decmapindex(omap, "ksx1001_decode", decmap)

uhcdecmap = {}
for c1, c2map in decmap.iteritems():
    for c2 in c2map.iterkeys():
        if not (c1 >= 0xa1 and c2 >= 0xa1): # uhc
            uhcdecmap.setdefault(c1, {})
            uhcdecmap[c1][c2] = c2map[c2]

print "Generating UHC Level 1 decode map..."
genmap_decode(omap, "uhc_decode", UHCL1_C1, UHCL1_C2, uhcdecmap)
print "Generating UHC Level 2 decode map..."
genmap_decode(omap, "uhc_decode", UHCL2_C1, UHCL2_C2, uhcdecmap)
print "Generating UHC decode map index..."
print_decmapindex(omap, "uhc_decode", uhcdecmap)

print "Generating Constants..."
for mnam in ('KSX1001', 'UHCL1', 'UHCL2'):
    for c in ('C1', 'C2'):
        mappfx = mnam + '_' + c
        maprange = eval(mappfx)
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_BOTTOM', maprange[0])
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_TOP', maprange[1])

print "\nDone!"

# ex: ts=8 sts=4 et

1.1 cjkcodecs/tools/genmap_support.py

Index: genmap_support.py
===================================================================
#
# genmap_support.py: Multibyte Codec Map Generator
#
# Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# $Id: genmap_support.py,v 1.1 2003/04/20 17:35:32 perky Exp $
#

import re

COPYRIGHT_HEADER = """\
/*
 * %(filename)s
 * Mapping Tables for %(encodingnames)s
 *
 * Generated from %(sourcename)s as of %(sourceversion)s
 * $Id: genmap_support.py,v 1.1 2003/04/20 17:35:32 perky Exp $
 */
"""

re_UNIMAPDATE = re.compile('Date:\s*([ a-zA-Z0-9/]*)')
re_UNIMAPVERSION= re.compile('Table version:\s*([0-9.]+)')

def printcopyright(fo, **data):
    print >> fo, COPYRIGHT_HEADER % data

def genmap_decode(fo, prefix, c1range, c2range, dmap, onlymask=()):
    c2width  = c2range[1] - c2range[0] + 1
    c2values = range(c2range[0], c2range[1] + 1)

    for c1 in range(c1range[0], c1range[1] + 1):
        if not dmap.has_key(c1) or (onlymask and c1 not in onlymask):
            continue
        c2map = dmap[c1]
        for c2 in c2values:
            if c2map.has_key(c2):
                break
        else:
            continue

        print >> fo, ("static const Py_UNICODE %s_%02X[%d] = {"
                      " /* %02X::%02X-%02X */" % (prefix, c1, c2width,
                      c1, c2range[0], c2range[1]))
        c2map[prefix] = True

        c2s = c2values[:]
        while c2s:
            dp = c2s[:8]
            del c2s[:8]
            print >> fo, ' ', ' '.join([
                c2map.has_key(i) and ("0x%04x," % c2map[i]) or "UNIINV,"
                for i in dp ])

        print >> fo, "};"
        print >> fo

def print_decmapindex(fo, prefix, fmap, f2map={}, f2mapprefix='',
                      rng=(0x80, 0x100)):
    print >> fo, "static const Py_UNICODE *%s_map[128] = {" % (prefix)
    for i in range(*rng):
        if fmap.has_key(i) and fmap[i].has_key(prefix):
            print >> fo, " %s_%02X, /* 0x%02X */" % (prefix, i, i)
        elif f2map.has_key(i) and f2map[i].has_key(f2mapprefix):
            print >> fo, " %s_%02X, /* 0x%02X */" % (f2mapprefix, i, i)
        else:
            print >> fo, " 0, /* 0x%02X */" % i
    print >> fo, "};"
    print >> fo

def loadmap(fo, sethighbit=0, natcol=0, unicol=1):
    fo.seek(0, 0)
    head = fo.read(1024)
    mapdatever = '%s-%s' % (
        re_UNIMAPVERSION.findall(head)[0], re_UNIMAPDATE.findall(head)[0]
    )
    if sethighbit:
        sethighbit = 0x80

    fo.seek(0, 0)
    decmap = {}
    for line in fo:
        line = line.split('#', 1)[0].strip()
        if not line or len(line.split()) < 2:
            continue

        row = map(eval, line.split())
        loc, uni = row[natcol], row[unicol]
        if loc >= 0x100:
            decmap.setdefault((loc >> 8) | sethighbit, {})
            decmap[(loc >> 8)|sethighbit][(loc & 0xff)|sethighbit] = uni

    return mapdatever, decmap

1.1 cjkcodecs/tools/genmap_zh_CN_codecs.py

Index: genmap_zh_CN_codecs.py
===================================================================
#
# genmap_zh_CN_codecs.py: Simplified Chinese Codecs Map Generator
#
# Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# $Id: genmap_zh_CN_codecs.py,v 1.1 2003/04/20 17:35:32 perky Exp $
#

from genmap_support import *

GB2312_C1       = (0xa1, 0xfe)
GB2312_C2       = (0xa1, 0xfe)
GBKL1_C1        = (0x81, 0xa8)
GBKL1_C2        = (0x40, 0xfe)
GBKL2_C1        = (0xa9, 0xfe)
GBKL2_C2        = (0x40, 0xa0)
GB18030EXTP1_C1 = (0xa1, 0xa9)
GB18030EXTP1_C2 = (0x40, 0xfe)
GB18030EXTP2_C1 = (0xaa, 0xaf)
GB18030EXTP2_C2 = (0xa1, 0xfe)
GB18030EXTP3_C1 = (0xd7, 0xd7)
GB18030EXTP3_C2 = (0xfa, 0xfe)
GB18030EXTP4_C1 = (0xf8, 0xfd)
GB18030EXTP4_C2 = (0xa1, 0xfe)
GB18030EXTP5_C1 = (0xfe, 0xfe)
GB18030EXTP5_C2 = (0x50, 0xfe)

try:
    gb2312map = open('GB2312.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT"
    raise SystemExit
try:
    cp936map = open('CP936.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT"
    raise SystemExit
try:
    gb18030map = open('gb-18030-2000.xml')
except IOError:
    print "=>> Please download mapping table from http://oss.software" \
          ".ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml"
    raise SystemExit

re_gb18030ass = re.compile('<a u="([A-F0-9]{4})" b="([0-9A-F ]+)"/>')
def parse_gb18030map(fo):
    m, gbuni = {}, {}
    for i in range(65536):
        if i < 0xd800 or i > 0xdfff: # exclude unicode surrogate area
            gbuni[i] = None
    for uni, native in re_gb18030ass.findall(fo.read()):
        uni = eval('0x'+uni)
        native = [eval('0x'+u) for u in native.split()]
        if len(native) <= 2:
            del gbuni[uni]
        if len(native) == 2: # we can decode algorithmically for 1 or 4 bytes
            m.setdefault(native[0], {})
            m[native[0]][native[1]] = uni
    gbuni = gbuni.keys()
    gbuni.sort()
    return m, gbuni

omap = open('_zh_CN_codecs.h', 'w')

print "Loading Mapping File..."
gb18030decmap, gb18030unilinear = parse_gb18030map(gb18030map)
datever, gbkdecmap = loadmap(cp936map)
gb2312_datever, gb2312decmap = loadmap(gb2312map, 1)
difmap = {}
for c1, m in gbkdecmap.items():
    for c2, code in m.items():
        del gb18030decmap[c1][c2]
    if not gb18030decmap[c1]:
        del gb18030decmap[c1]
for c1, m in gb2312decmap.items():
    for c2, code in m.items():
        if gbkdecmap[c1][c2] != code:
            difmap[(c1,c2)] = (code, gbkdecmap[c1][c2])
        del gbkdecmap[c1][c2]
    if not gbkdecmap[c1]:
        del gbkdecmap[c1]

print "Printing Copyright..."
printcopyright(omap, filename='_zh_CN_codecs.h',
    encodingnames='Simplified Chinese Encodings', sourcename='CP936.TXT',
    sourceversion=datever)

print "Generating GB2312 decode map..."
genmap_decode(omap, "gb2312_decode", GB2312_C1, GB2312_C2, gb2312decmap)
print "Generating GB2312 decode map index..."
print_decmapindex(omap, "gb2312_decode", gb2312decmap)

print "Generating GBK Level 1 decode map..."
genmap_decode(omap, "gbk_decode", GBKL1_C1, GBKL1_C2, gbkdecmap)
print "Generating GBK Level 2 decode map..."
genmap_decode(omap, "gbk_decode", GBKL2_C1, GBKL2_C2, gbkdecmap)
print "Generating GBK decode map index..."
print_decmapindex(omap, "gbk_decode", gbkdecmap)

print "Generating GB18030 extension plane 1 decode map..."
genmap_decode(omap, "gb18030_decode", GB18030EXTP1_C1, GB18030EXTP1_C2,
              gb18030decmap)
print "Generating GB18030 extension plane 2 decode map..."
genmap_decode(omap, "gb18030_decode", GB18030EXTP2_C1, GB18030EXTP2_C2,
              gb18030decmap)
print "Generating GB18030 extension plane 3 decode map..."
genmap_decode(omap, "gb18030_decode", GB18030EXTP3_C1, GB18030EXTP3_C2,
              gb18030decmap)
print "Generating GB18030 extension plane 4 decode map..."
genmap_decode(omap, "gb18030_decode", GB18030EXTP4_C1, GB18030EXTP4_C2,
              gb18030decmap)
print "Generating GB18030 extension plane 5 decode map..."
genmap_decode(omap, "gb18030_decode", GB18030EXTP5_C1, GB18030EXTP5_C2,
              gb18030decmap)
print "Generating GB18030 extension decode map index..."
print_decmapindex(omap, "gb18030_decode", gb18030decmap)

print "Generating Constants..."
for mnam in ('GB2312', 'GBKL1', 'GBKL2', 'GB18030EXTP1', 'GB18030EXTP2',
             'GB18030EXTP3', 'GB18030EXTP4', 'GB18030EXTP5'):
    for c in ('C1', 'C2'):
        mappfx = mnam + '_' + c
        maprange = eval(mappfx)
        print >> omap, "#define %-23s 0x%02x" % (
            mappfx+'_BOTTOM', maprange[0])
        print >> omap, "#define %-23s 0x%02x" % (
            mappfx+'_TOP', maprange[1])

print "Generating GBK Special Map Macroes..."
if difmap:
    print >> omap, "#define GBK_PREDECODE(dc1, dc2, assi)",
    elsereq = 0
    for (c1, c2), (gb2312code, gbkcode) in difmap.items():
        if elsereq:
            print >> omap, "\\\n\telse if",
        else:
            print >> omap, "\\\n\tif",
            elsereq = 1
        print >> omap, "((dc1) == 0x%02x && (dc2) == 0x%02x) " \
                       "(assi) = 0x%04x;" % (c1, c2, gbkcode),
    print >> omap

    print >> omap, "#define GBK_TWEAKUNIMAP(umap)",
    for (c1, c2), (gb2312code, gbkcode) in difmap.items():
        print >> omap, "\\"
        print >> omap, "\t(umap)[0x%02x][0x%02x] = 0x%02x%02x; \\" % (
            gbkcode >> 8, gbkcode & 0xFF, c1, c2)
        if c1 != 0x20 and c2 != 0x15:
            print >> omap, "\t(umap)[0x%02x][0x%02x] = NOCHAR;" % (
                gb2312code >> 8, gb2312code & 0xFF),
    print >> omap

print "Generating GB18030 Unicode BMP Mapping Ranges..."
ranges = [[-1, -1, -1]]
gblinnum = 0
print >> omap, """
static const struct _gb18030_to_unibmp_ranges {
    Py_UNICODE first, last;
    DBCHAR base;
} gb18030_to_unibmp_ranges[] = {"""

for uni in gb18030unilinear:
    if uni == ranges[-1][1] + 1:
        ranges[-1][1] = uni
    else:
        ranges.append([uni, uni, gblinnum])
    gblinnum += 1

for first, last, base in ranges[1:]:
    print >> omap, " { 0x%04x, 0x%04x, 0x%04x }," % (first, last, base)

print >> omap, """\
 { 0x0000, 0x0000, 0x%04x },
};""" % (ranges[-1][2] + ranges[-1][1] - ranges[-1][0] + 1)

print "\nDone!"

# ex: ts=8 sts=4 et

1.1 cjkcodecs/tools/genmap_zh_TW_codecs.py

Index: genmap_zh_TW_codecs.py
===================================================================
#
# genmap_zh_TW_codecs.py: Traditional Chinese Codecs Map Generator
#
# Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
# INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
# STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
# IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#
# $Id: genmap_zh_TW_codecs.py,v 1.1 2003/04/20 17:35:32 perky Exp $
#

from genmap_support import *

BIG5_C1     = (0xa1, 0xfe)
BIG5_C2     = (0x40, 0xfe)
# big5 map doesn't have 0xA3E1 (EURO SIGN), but we ignore
# that for forward compatiblilty. "Hey! we have the euro-big5!" :)
CP950_C1    = BIG5_C1
CP950_C2    = BIG5_C2

try:
    big5map = open('BIG5.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT"
    raise SystemExit
try:
    cp950map = open('CP950.TXT')
except IOError:
    print "=>> Please download mapping table from http://www.unicode." \
          "org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT"
    raise SystemExit

omap = open('_zh_TW_codecs.h', 'w')

print "Loading Mapping File..."
datever, cp950decmap = loadmap(cp950map)
big5_datever, big5decmap = loadmap(big5map)

difpages = {}
for c1, m in cp950decmap.items():
    for c2, code in m.items():
        if (not big5decmap.has_key(c1) or not big5decmap[c1].has_key(c2) or
                big5decmap[c1][c2] != code):
            difpages[c1] = True
for c1, m in big5decmap.items():
    for c2, code in m.items():
        if not cp950decmap.has_key(c1) or not cp950decmap[c1].has_key(c2):
            difpages[c1] = True
difpages = difpages.keys()

print "Printing Copyright..."
printcopyright(omap, filename='_zh_TW_codecs.h',
    encodingnames='Traditional Chinese Encodings', sourcename='CP950.TXT',
    sourceversion=datever)

print "Generating BIG5 decode map..."
genmap_decode(omap, "big5_decode", BIG5_C1, BIG5_C2, big5decmap)
print "Generating BIG5 decode map index..."
print_decmapindex(omap, "big5_decode", big5decmap)

print "Generating CP950 decode map..."
genmap_decode(omap, "cp950_decode", BIG5_C1, BIG5_C2, cp950decmap, difpages)
print "Generating CP950 decode map index..."
print_decmapindex(omap, "cp950_decode", cp950decmap, big5decmap,
                  "big5_decode")

print "Generating Constants..."
for mnam in ('BIG5', 'CP950'):
    for c in ('C1', 'C2'):
        mappfx = mnam + '_' + c
        maprange = eval(mappfx)
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_BOTTOM', maprange[0])
        print >> omap, "#define %-19s 0x%02x" % (
            mappfx+'_TOP', maprange[1])

print "\nDone!"

# ex: ts=8 sts=4 et
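All of the generator scripts above follow one pattern. loadmap() in genmap_support.py turns each "0xNATIVE 0xUNICODE #name" row of a Unicode mapping table into a two-level dict, keyed first by lead byte and then by trail byte. genmap_decode() then writes one C array per lead byte that covers the whole trail-byte range and fills the holes with UNIINV.

Below is a minimal Python 3 sketch of that pattern. It makes these simplifying assumptions:

- it parses hex with int(x, 16) instead of eval();
- it leaves out the sethighbit and table-version handling;
- it renders a single given row instead of scanning a lead-byte range.

The demo_decode prefix and the three-row sample are hypothetical.

```python
def load_decode_map(lines, natcol=0, unicol=1):
    """Parse mapping-table rows into {lead_byte: {trail_byte: codepoint}}."""
    decmap = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue
        row = [int(field, 16) for field in fields]  # "0x8140" -> 0x8140
        loc, uni = row[natcol], row[unicol]
        if loc >= 0x100:                       # keep double-byte codes only
            decmap.setdefault(loc >> 8, {})[loc & 0xFF] = uni
    return decmap


def render_decode_row(prefix, c1, c2range, c2map):
    """Emit one C array for lead byte c1: 8 cells per line, UNIINV for gaps."""
    lo, hi = c2range
    cells = [("0x%04x," % c2map[c2]) if c2 in c2map else "UNIINV,"
             for c2 in range(lo, hi + 1)]
    out = ["static const Py_UNICODE %s_%02X[%d] = { /* %02X::%02X-%02X */"
           % (prefix, c1, hi - lo + 1, c1, lo, hi)]
    for i in range(0, len(cells), 8):
        out.append("    " + " ".join(cells[i:i + 8]))
    out.append("};")
    return "\n".join(out)


# Hypothetical excerpt laid out like the CP932.TXT columns.
SAMPLE = [
    "# comment lines and single-byte rows are skipped",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0x8140\t0x3000\t#IDEOGRAPHIC SPACE",
    "0x8141\t0x3001\t#IDEOGRAPHIC COMMA",
]

table = load_decode_map(SAMPLE)
print(render_decode_row("demo_decode", 0x81, (0x40, 0x43), table[0x81]))
# static const Py_UNICODE demo_decode_81[4] = { /* 81::40-43 */
#     0x3000, 0x3001, UNIINV, UNIINV,
# };
```

The real genmap_decode() also writes c2map[prefix] = True for every row it emits. print_decmapindex() later uses that marker to decide which lead bytes get a pointer in the *_map[128] index and which get 0.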
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:35:33
perky 03/04/20 10:35:32

Added: src _ja_codecs.c _ko_codecs.c _zh_CN_codecs.c _zh_TW_codecs.c multibytecodec.c multibytecodec.h
Log:
Import codec implementations from Multibyte Codecs patch.

Revision Changes Path
1.1 cjkcodecs/src/_ja_codecs.c

Index: _ja_codecs.c
===================================================================
/*
 * _ja_codecs.c: Japanese Codecs Implementation
 *
 * Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * $Id: _ja_codecs.c,v 1.1 2003/04/20 17:35:31 perky Exp $
 */

#include "Python.h"
#include "multibytecodec.h"

typedef unsigned short DBCHAR;

#define UNIINV  Py_UNICODE_REPLACEMENT_CHARACTER
#define NOCHAR  0xFFFF

#include "_ja_codecs.h"

static DBCHAR *jisx0208_encode_map[256];
static DBCHAR *jisx0212_encode_map[256];
static DBCHAR *cp932_encode_map[256];

#define JISX0201_DECODE(c, assi)                \
    if ((c) < 0x5c) (assi) = (c);               \
    else if ((c) == 0x5c) (assi) = 0x00a5;      \
    else if ((c) < 0x7e) (assi) = (c);          \
    else if ((c) == 0x7e) (assi) = 0x203e;      \
    else if ((c) >= 0xa1 && (c) <= 0xdf)        \
        (assi) = 0xfec0 + (c);
#define JISX0201_ENCODE(c, assi)                \
    if ((c) < 0x5c) (assi) = (c);               \
    else if ((c) > 0x5c && (c) < 0x7e)          \
        (assi) = (c);                           \
    else if ((c) == 0x00a5) (assi) = 0x5c;      \
    else if ((c) == 0x203e) (assi) = 0x7e;      \
    else if ((c) >= 0xff61 && (c) <= 0xff9f)    \
        (assi) = (c) - 0xfec0;

#define IN_RANGE(val, pfx)  (pfx##_BOTTOM <= (val) && (val) <= pfx##_TOP)
#define IN_RANGE2(c1, c2, pfx)                  \
    (IN_RANGE(c1, pfx##_C1) && IN_RANGE(c2, pfx##_C2))

struct euc_jp_decode_state {
    unsigned char pending[2];
    size_t pendingsize;
};

/*
 * SHIFTJIS
 */

static int
shiftjis_encode(PyMultibyteEncoder_Handle *hdl,
                PyMultibyteEncoder_Context *ctx,
                PyMultibyteEncoder_Buffer *buf,
                PyMultibyteEncoder_Error *err)
{
    DBCHAR *map, code;

    while (buf->inbuf < buf->inbuf_end) {
        Py_UNICODE nc = *buf->inbuf;
        unsigned char c1, c2;

        JISX0201_ENCODE(nc, code)
#if Py_UNICODE_SIZE == 4
        else if (nc >= 0x10000) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
        }
#endif
        else code = NOCHAR;

        if (code < 0x80 || (code >= 0xa1 && code <= 0xdf)) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;

            *buf->outbuf++ = (unsigned char)code;
            buf->inbuf++;
            continue;
        }

        if (HAS_NOT_ENOUGH_SPACE(buf, 2))
            return MBERR_TOOSMALL;

        if (code == NOCHAR) {
            map = jisx0208_encode_map[nc >> 8];
            if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
                if (nc >= 0xe000 && nc < 0xe758) {
                    /* user-defined area */
                    c1 = (Py_UNICODE)(nc - 0xe000) / 188;
                    c2 = (Py_UNICODE)(nc - 0xe000) % 188;
                    *buf->outbuf++ = c1 + 0xf0;
                    *buf->outbuf++ = (c2 < 0x3f ? c2 + 0x40 : c2 + 0x41);
                    buf->inbuf++;
                    continue;
                }
                else {
                    SETERR_INBUF(err, buf);
                    err->start = INBUFPOS(buf);
                    err->end = err->start + 1;
                    return MBERR_UNDEFINED;
                }
            }
        }

        c1 = code >> 8;
        c2 = code & 0xff;

        if (IN_RANGE2(c1, c2, JISX0208)) {
            c2 = (((c1 - 0x21) & 1) ? 0x5e : 0) + (c2 - 0x21);
            c1 = (c1 - 0x21) >> 1;
            *buf->outbuf++ = c1 < 0x1f ? c1 + 0x81 : c1 + 0xc1;
            *buf->outbuf++ = c2 < 0x3f ? c2 + 0x40 : c2 + 0x41;
            buf->inbuf++;
            continue;
        }
        else {
            PyErr_SetString(PyExc_RuntimeError, "internal logic error");
            return MBERR_INTERNAL;
        }
    }

    return 0;
}

static int
shiftjis_decode_open(PyMultibyteDecoder_Handle *hdl,
                     PyMultibyteDecoder_Context *ctx)
{
    *ctx = NULL;
    return 0;
}

static int
shiftjis_decode(PyMultibyteDecoder_Handle *hdl,
                PyMultibyteDecoder_Context *ctx,
                PyMultibyteDecoder_Buffer *buf,
                PyMultibyteDecoder_Error *err)
{
    Py_UNICODE code;
    unsigned char pending;

    pending = (unsigned char)(long)*ctx;
    *ctx = NULL;

    while (buf->inbuf < buf->inbuf_end) {
        unsigned char nc = *buf->inbuf;

        if (!pending) {
            JISX0201_DECODE(nc, code)
            else {
                pending = nc;
                buf->inbuf++;
                continue;
            }

            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;

            *buf->outbuf++ = code;
            buf->inbuf++;
        }
        else {
            if ((pending >= 0x81 && pending <= 0x9f) ||
                (pending >= 0xe0 && pending <= 0xea)) {
                unsigned char c1, c2;

                if (nc < 0x40 || (nc > 0x7e && nc < 0x80) || nc > 0xfc)
                    goto illegalseq;

                c1 = (pending < 0xe0 ? pending - 0x81 : pending - 0xc1);
                c2 = (nc < 0x80 ? nc - 0x40 : nc - 0x41);
                c1 = (2 * c1 + (c2 < 0x5e ? 0 : 1) + 0x21);
                c2 = (c2 < 0x5e ? c2 : c2 - 0x5e) + 0x21;

                if (c1 < JISX0208_C1_BOTTOM || c1 > JISX0208_C1_TOP ||
                    c2 < JISX0208_C2_BOTTOM || c2 > JISX0208_C2_TOP ||
                    jisx0208_decode_map[c1] == NULL ||
                    (code = jisx0208_decode_map[c1][
                            c2 - JISX0208_C2_BOTTOM]) == UNIINV)
                    goto illegalseq;
            }
            else if (pending >= 0xf0 && pending <= 0xf9) {
                if ((nc >= 0x40 && nc <= 0x7e) ||
                    (nc >= 0x80 && nc <= 0xfc))
                    code = 0xe000 + 188 * (pending - 0xf0) +
                           (nc < 0x80 ? nc - 0x40 : nc - 0x41);
                else
                    goto illegalseq;
            }
            else
                goto illegalseq;

            if (HAS_NOT_ENOUGH_SPACE(buf, 1)) {
                *ctx = (PyMultibyteDecoder_Context)(long)pending;
                return MBERR_TOOSMALL;
            }

            *buf->outbuf++ = code;
            buf->inbuf++;
            pending = 0;
            continue;

illegalseq:
            if (INBUFPOS(buf) < 1) {
                /* the pending character is from previous buffer */
                err->object = PyMem_Malloc(2);
                err->objlength = 2;
                err->object[0] = pending;
                err->object[1] = nc;
                err->start = 0;
                err->end = 1;
            }
            else {
                SETERR_INBUF(err, buf);
                err->end = INBUFPOS(buf) + 1;
                err->start = err->end - 2;
            }
            *ctx = (PyMultibyteDecoder_Context)(long)pending;
            return MBERR_ILLSEQ;
        }
    }

    if (pending)
        *ctx = (PyMultibyteDecoder_Context)(long)pending;

    return 0;
}

static int
shiftjis_decode_flush(PyMultibyteDecoder_Handle *hdl,
                      PyMultibyteDecoder_Context *ctx,
                      PyMultibyteDecoder_Buffer *buf,
                      PyMultibyteDecoder_Error *err)
{
    if (*ctx != NULL) {
        if (INBUFPOS(buf) < 1) {
            /* the pending character is from the previous buffer */
            err->object = PyMem_Malloc(1);
            err->objlength = 1;
            *err->object = (unsigned char)(long)*ctx;
            err->start = err->end = 0; /* no character on current buffer */
        }
        else {
            /* we can sure that the last character on inbuf is the pending
             * one, here. all error situation clears dstate and it's the
             * only way to move the buffer cursor discontinuously. */
            SETERR_INBUF(err, buf);
            err->end = INBUFPOS(buf);
            err->start = err->end - 1;
        }
        *ctx = NULL;
        return MBERR_TOOFEW;
    }

    return 0;
}

static int
shiftjis_decode_reset(PyMultibyteDecoder_Handle *hdl,
                      PyMultibyteDecoder_Context *ctx)
{
    *ctx = NULL;
    return 0;
}

static PyMultibyteEncoder_Codec shiftjis_codec_encoder = {
    "shiftjis",             /* name */
    0,                      /* init */
    0,                      /* shutdown */
    0,                      /* open */
    0,                      /* close */
    shiftjis_encode,        /* encode */
    0,                      /* flush */
    0,                      /* reset */
    0,                      /* putrepl */
};

static PyMultibyteDecoder_Codec shiftjis_codec_decoder = {
    "shiftjis",             /* name */
    0,                      /* init */
    0,                      /* shutdown */
    shiftjis_decode_open,   /* open */
    0,                      /* close */
    shiftjis_decode,        /* decode */
    shiftjis_decode_flush,  /* flush */
    shiftjis_decode_reset,  /* reset */
};

/*
 * CP932: Microsoft extension of Shift-JIS
 */

static int
cp932_encode(PyMultibyteEncoder_Handle *hdl,
             PyMultibyteEncoder_Context *ctx,
             PyMultibyteEncoder_Buffer *buf,
             PyMultibyteEncoder_Error *err)
{
    DBCHAR *map, code;

    while (buf->inbuf < buf->inbuf_end) {
        Py_UNICODE nc = *buf->inbuf;
        unsigned char c1, c2;

        if (nc < 0x80 || (nc >= 0xff61 && nc <= 0xff9f)) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;

            *buf->outbuf++ = (unsigned char)(nc < 0x80 ? nc : nc - 0xfec0);
            buf->inbuf++;
            continue;
        }
#if Py_UNICODE_SIZE == 4
        else if (nc >= 0x10000) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
        }
#endif
        else code = NOCHAR;

        if (HAS_NOT_ENOUGH_SPACE(buf, 2))
            return MBERR_TOOSMALL;

        map = cp932_encode_map[nc >> 8];
        if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
            map = jisx0208_encode_map[nc >> 8];
            if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
                if (nc >= 0xe000 && nc < 0xe758) {
                    /* user-defined area */
                    c1 = (Py_UNICODE)(nc - 0xe000) / 188;
                    c2 = (Py_UNICODE)(nc - 0xe000) % 188;
                    *buf->outbuf++ = c1 + 0xf0;
                    *buf->outbuf++ = (c2 < 0x3f ?
c2 + 0x40 : c2 + 0x41); buf->inbuf++; continue; } else { SETERR_INBUF(err, buf); err->start = INBUFPOS(buf); err->end = err->start + 1; return MBERR_UNDEFINED; } } c1 = code >> 8; c2 = code & 0xff; if (IN_RANGE2(c1, c2, JISX0208)) { c2 = (((c1 - 0x21) & 1) ? 0x5e : 0) + (c2 - 0x21); c1 = (c1 - 0x21) >> 1; *buf->outbuf++ = c1 < 0x1f ? c1 + 0x81 : c1 + 0xc1; *buf->outbuf++ = c2 < 0x3f ? c2 + 0x40 : c2 + 0x41; buf->inbuf++; } else { PyErr_SetString(PyExc_RuntimeError, "internal logic error"); return MBERR_INTERNAL; } } else { *buf->outbuf++ = code >> 8; *buf->outbuf++ = code & 0xff; buf->inbuf++; } } return 0; } static int cp932_decode(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx, PyMultibyteDecoder_Buffer *buf, PyMultibyteDecoder_Error *err) { Py_UNICODE code; unsigned char pending; pending = (unsigned char)(long)*ctx; *ctx = NULL; while (buf->inbuf < buf->inbuf_end) { unsigned char nc = *buf->inbuf; if (!pending) { if (HAS_NOT_ENOUGH_SPACE(buf, 1)) return MBERR_TOOSMALL; if (nc < 0x80) { *buf->outbuf++ = nc; buf->inbuf++; continue; } else if (nc >= 0xa1 && nc <= 0xdf) { *buf->outbuf++ = 0xfec0 + nc; buf->inbuf++; continue; } else { pending = nc; buf->inbuf++; continue; } } if (IN_RANGE2(pending, nc, CP932P0) && (code = cp932_decode_map[pending & 0x7f][ nc - CP932P0_C2_BOTTOM]) != UNIINV) /* yeah */; else if (IN_RANGE2(pending, nc, CP932P1) && (code = cp932_decode_map[pending & 0x7f][ nc - CP932P1_C2_BOTTOM]) != UNIINV) /* go! */; else if (IN_RANGE2(pending, nc, CP932P2) && (code = cp932_decode_map[pending & 0x7f][ nc - CP932P2_C2_BOTTOM]) != UNIINV) /* okay */; else if ((pending >= 0x81 && pending <= 0x9f) || (pending >= 0xe0 && pending <= 0xea)) { unsigned char c1, c2; if (nc < 0x40 || (nc > 0x7e && nc < 0x80) || nc > 0xfc) goto illegalseq; c1 = (pending < 0xe0 ? pending - 0x81 : pending - 0xc1); c2 = (nc < 0x80 ? nc - 0x40 : nc - 0x41); c1 = (2 * c1 + (c2 < 0x5e ? 0 : 1) + 0x21); c2 = (c2 < 0x5e ? 
c2 : c2 - 0x5e) + 0x21; if ((!IN_RANGE2(c1, c2, JISX0208)) || jisx0208_decode_map[c1] == NULL || (code = jisx0208_decode_map[c1][ c2 - JISX0208_C2_BOTTOM]) == UNIINV) goto illegalseq; } else if (pending >= 0xf0 && pending <= 0xf9) { if ((nc >= 0x40 && nc <= 0x7e) || (nc >= 0x80 && nc <= 0xfc)) code = 0xe000 + 188 * (pending - 0xf0) + (nc < 0x80 ? nc - 0x40 : nc - 0x41); else goto illegalseq; } else goto illegalseq; if (HAS_NOT_ENOUGH_SPACE(buf, 1)) { *ctx = (PyMultibyteDecoder_Context)(long)pending; return MBERR_TOOSMALL; } *buf->outbuf++ = code; buf->inbuf++; pending = 0; continue; illegalseq: if (INBUFPOS(buf) < 1) { /* the pending character is from previous buffer */ err->object = PyMem_Malloc(2); err->objlength = 2; err->object[0] = pending; err->object[1] = nc; err->start = 0; err->end = 1; } else { SETERR_INBUF(err, buf); err->end = INBUFPOS(buf) + 1; err->start = err->end - 2; } *ctx = (PyMultibyteDecoder_Context)(long)pending; return MBERR_ILLSEQ; } if (pending) *ctx = (PyMultibyteDecoder_Context)(long)pending; return 0; } static PyMultibyteEncoder_Codec cp932_codec_encoder = { "cp932", /* name */ 0, /* init */ 0, /* shutdown */ 0, /* open */ 0, /* close */ cp932_encode, /* encode */ 0, /* flush */ 0, /* reset */ 0, /* putrepl */ }; static PyMultibyteDecoder_Codec cp932_codec_decoder = { "cp932", /* name */ 0, /* init */ 0, /* shutdown */ shiftjis_decode_open, /* open */ 0, /* close */ cp932_decode, /* decode */ shiftjis_decode_flush, /* flush */ shiftjis_decode_reset, /* reset */ }; /* * EUC-JP */ static int euc_jp_encode(PyMultibyteEncoder_Handle *hdl, PyMultibyteEncoder_Context *ctx, PyMultibyteEncoder_Buffer *buf, PyMultibyteEncoder_Error *err) { DBCHAR *map, code; while (buf->inbuf < buf->inbuf_end) { Py_UNICODE nc = *buf->inbuf; if (nc < 0x80) { if (HAS_NOT_ENOUGH_SPACE(buf, 1)) return MBERR_TOOSMALL; *buf->outbuf++ = nc; buf->inbuf++; continue; } #if Py_UNICODE_SIZE == 4 else if (nc >= 0x10000) { SETERR_INBUF(err, buf); err->start = INBUFPOS(buf); 
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
        }
#endif

        /* JIS X 0208 */
        map = jisx0208_encode_map[nc >> 8];
        if (map != NULL && (code = map[nc & 0xff]) != NOCHAR) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 2))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = (code >> 8) + 0x80;
            *buf->outbuf++ = (code & 0xff) + 0x80;
            buf->inbuf++;
            continue;
        }

        /* JIS X 0201 half-width katakana */
        if (nc >= 0xff61 && nc <= 0xff9f) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 2))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = 0x8e;
            *buf->outbuf++ = (unsigned char)(nc - 0xfec0);
            buf->inbuf++;
            continue;
        }

        /* JIS X 0212 */
        map = jisx0212_encode_map[nc >> 8];
        if (map != NULL && (code = map[nc & 0xff]) != NOCHAR) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 3))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = 0x8f;
            *buf->outbuf++ = (code >> 8) + 0x80;
            *buf->outbuf++ = (code & 0xff) + 0x80;
            buf->inbuf++;
            continue;
        }

        /* user-defined area */
        if (nc >= 0xe000 && nc < 0xe758) {
            if (nc < 0xe3ac) {
                if (HAS_NOT_ENOUGH_SPACE(buf, 2))
                    return MBERR_TOOSMALL;
                *buf->outbuf++ = (Py_UNICODE)(nc - 0xe000) / 94 + 0xf5;
                *buf->outbuf++ = (Py_UNICODE)(nc - 0xe000) % 94 + 0xa1;
            } else {
                if (HAS_NOT_ENOUGH_SPACE(buf, 3))
                    return MBERR_TOOSMALL;
                *buf->outbuf++ = 0x8f;
                *buf->outbuf++ = (Py_UNICODE)(nc - 0xe3ac) / 94 + 0xf5;
                *buf->outbuf++ = (Py_UNICODE)(nc - 0xe3ac) % 94 + 0xa1;
            }
            buf->inbuf++;
            continue;
        }

        SETERR_INBUF(err, buf);
        err->start = INBUFPOS(buf);
        err->end = err->start + 1;
        return MBERR_UNDEFINED;
    }

    return 0;
}

static int
euc_jp_decode_open(PyMultibyteDecoder_Handle *hdl,
                   PyMultibyteDecoder_Context *ctx)
{
    struct euc_jp_decode_state *state;

    state = PyMem_New(struct euc_jp_decode_state, 1);
    if (state == NULL)
        return -1;

    state->pendingsize = 0;
    *ctx = state;

    return 0;
}

static void
euc_jp_decode_close(PyMultibyteDecoder_Handle *hdl,
                    PyMultibyteDecoder_Context *ctx)
{
    PyMem_Del(*ctx);
}

static int
euc_jp_decode(PyMultibyteDecoder_Handle *hdl,
              PyMultibyteDecoder_Context *ctx,
              PyMultibyteDecoder_Buffer *buf,
              PyMultibyteDecoder_Error *err)
{
    struct euc_jp_decode_state *state = *ctx;
    Py_UNICODE code;

    while (buf->inbuf < buf->inbuf_end) {
        unsigned char nc = *buf->inbuf;

        if (HAS_NOT_ENOUGH_SPACE(buf, 1))
            return MBERR_TOOSMALL;

        switch (state->pendingsize) {
        case 0:
            if (nc < 0x80) {
                *buf->outbuf++ = nc;
                buf->inbuf++;
            } else {
                state->pending[0] = nc;
                state->pendingsize = 1;
                buf->inbuf++;
            }
            break;
        case 1:
            if (0xa1 <= state->pending[0] && state->pending[0] < 0xff) {
                if (state->pending[0] < 0xf5) { /* JIS X 0208 */
                    unsigned char c1, c2;

                    c1 = state->pending[0] - 0x80;
                    c2 = nc - 0x80;
                    if (IN_RANGE2(c1, c2, JISX0208) &&
                        jisx0208_decode_map[c1] != NULL &&
                        (code = jisx0208_decode_map[c1][
                                c2 - JISX0208_C2_BOTTOM]) != UNIINV) {
                        *buf->outbuf++ = code;
                        buf->inbuf++;
                        state->pendingsize = 0;
                    } else
                        goto illegalseq;
                } else { /* 2bytes user-defined area */
                    if (0xa1 <= nc && nc < 0xff) {
                        *buf->outbuf++ = 0xe000 + 94 * (
                                state->pending[0] - 0xf5) + (nc - 0xa1);
                        buf->inbuf++;
                        state->pendingsize = 0;
                    } else
                        goto illegalseq;
                }
            } else if (state->pending[0] == 0x8e) { /* half-width katakana */
                if (nc >= 0xa1 && nc <= 0xdf) {
                    *buf->outbuf++ = 0xfec0 + nc;
                    buf->inbuf++;
                    state->pendingsize = 0;
                } else
                    goto illegalseq;
            } else if (state->pending[0] == 0x8f) { /* 3-bytes seq */
                buf->inbuf++;
                state->pending[1] = nc;
                state->pendingsize = 2;
            } else
                goto illegalseq;
            break;
        case 2:
            assert(state->pending[0] == 0x8f);
            if (0xa1 <= state->pending[1] && state->pending[1] < 0xff) {
                if (state->pending[1] < 0xf5) { /* JIS X 0212 */
                    unsigned char c1, c2;

                    c1 = state->pending[1] - 0x80;
                    c2 = nc - 0x80;
                    if (IN_RANGE2(c1, c2, JISX0212) &&
                        jisx0212_decode_map[c1] != NULL &&
                        (code = jisx0212_decode_map[c1][
                                c2 - JISX0212_C2_BOTTOM]) != UNIINV) {
                        *buf->outbuf++ = code;
                        buf->inbuf++;
                        state->pendingsize = 0;
                    } else
                        goto illegalseq;
                } else { /* 3bytes user-defined area */
                    if (0xa1 <= nc && nc < 0xff) {
                        *buf->outbuf++ = 0xe3ac + 94 * (
                                state->pending[1] - 0xf5) + (nc - 0xa1);
                        buf->inbuf++;
                        state->pendingsize = 0;
                    } else
                        goto illegalseq;
                }
            } else
                goto illegalseq;
            break;
        default:
            PyErr_SetString(PyExc_RuntimeError, "internal logic error");
            return MBERR_INTERNAL;
        }
        continue;

illegalseq:
        if (INBUFPOS(buf) < state->pendingsize) {
            err->objlength = state->pendingsize + 1;
            err->object = PyMem_Malloc(err->objlength);
            if (err->object == NULL)
                return MBERR_INTERNAL;
            memcpy(err->object, state->pending, state->pendingsize);
            err->object[state->pendingsize] = nc;
            err->start = 0;
            err->end = INBUFPOS(buf) + 1;
        } else {
            SETERR_INBUF(err, buf);
            err->end = INBUFPOS(buf) + 1;
            err->start = err->end - 1 - state->pendingsize;
        }
        return MBERR_ILLSEQ;
    }

    return 0;
}

static int
euc_jp_decode_flush(PyMultibyteDecoder_Handle *hdl,
                    PyMultibyteDecoder_Context *ctx,
                    PyMultibyteDecoder_Buffer *buf,
                    PyMultibyteDecoder_Error *err)
{
    struct euc_jp_decode_state *state = *ctx;

    if (state->pendingsize > 0) {
        if (INBUFPOS(buf) < state->pendingsize) {
            err->objlength = state->pendingsize;
            err->object = PyMem_Malloc(err->objlength);
            if (err->object == NULL)
                return MBERR_INTERNAL;
            memcpy(err->object, state->pending, state->pendingsize);
            err->start = 0;
            err->end = INBUFPOS(buf);
        } else {
            SETERR_INBUF(err, buf);
            err->end = INBUFPOS(buf);
            err->start = err->end - state->pendingsize;
        }
        return MBERR_TOOFEW;
    }

    return 0;
}

static int
euc_jp_decode_reset(PyMultibyteDecoder_Handle *hdl,
                    PyMultibyteDecoder_Context *ctx)
{
    struct euc_jp_decode_state *state = *ctx;

    state->pendingsize = 0;
    return 0;
}

static PyMultibyteEncoder_Codec euc_jp_codec_encoder = {
    "euc_jp",               /* name */
    0,                      /* init */
    0,                      /* shutdown */
    0,                      /* open */
    0,                      /* close */
    euc_jp_encode,          /* encode */
    0,                      /* flush */
    0,                      /* reset */
    0,                      /* putrepl */
};

static PyMultibyteDecoder_Codec euc_jp_codec_decoder = {
    "euc_jp",               /* name */
    0,                      /* init */
    0,                      /* shutdown */
    euc_jp_decode_open,     /* open */
    euc_jp_decode_close,    /* close */
    euc_jp_decode,          /* decode */
    euc_jp_decode_flush,    /* flush */
    euc_jp_decode_reset,    /* reset */
};

static int
build_encode_map(DBCHAR **encmap, const Py_UNICODE **decmap,
                 unsigned char c1bottom, unsigned char c1top,
                 unsigned char c2bottom, unsigned char c2top)
{
    unsigned char c1, c2;
    const Py_UNICODE *umap;

    for (c1 = c1bottom; c1 <= c1top; c1++) {
        umap = decmap[c1 & 0x7f];
        if (umap == NULL)
            continue;

        for (c2 = c2bottom; c2 <= c2top; c2++) {
            Py_UNICODE uni;
            int upage, i;

            uni = umap[c2 - c2bottom];
            if (uni == UNIINV)
                continue;

            upage = uni >> 8;
            if (encmap[upage] == NULL) {
                encmap[upage] = PyMem_New(DBCHAR, 256);
                if (encmap[upage] == NULL)
                    return -1;
                for (i = 0; i <= 255; i++)
                    encmap[upage][i] = NOCHAR;
            }
            if (encmap[upage][uni & 0xff] == NOCHAR)
                encmap[upage][uni & 0xff] = c1 << 8 | c2;
        }
    }

    return 0;
}

static int
init_maps(void)
{
    int i;

    for (i = 0; i < 256; i++)
        jisx0208_encode_map[i] = jisx0212_encode_map[i] =
            cp932_encode_map[i] = NULL;

#define BUILD_MAP(emap, dmap, pfx) \
    build_encode_map(emap##_encode_map, dmap##_decode_map, \
                     pfx##_C1_BOTTOM, pfx##_C1_TOP, \
                     pfx##_C2_BOTTOM, pfx##_C2_TOP)
    if (BUILD_MAP(jisx0208, jisx0208, JISX0208) ||
        BUILD_MAP(jisx0212, jisx0212, JISX0212) ||
        BUILD_MAP(cp932, cp932, CP932P0) ||
        BUILD_MAP(cp932, cp932, CP932P1) ||
        BUILD_MAP(cp932, cp932, CP932P2)) {
        for (i = 0; i < 256; i++) {
            if (jisx0208_encode_map[i] != NULL)
                PyMem_Del(jisx0208_encode_map[i]);
            if (jisx0212_encode_map[i] != NULL)
                PyMem_Del(jisx0212_encode_map[i]);
            if (cp932_encode_map[i] != NULL)
                PyMem_Del(cp932_encode_map[i]);
        }
        return -1;
    }
#undef BUILD_MAP

    /* resolve duplicated mappings between jisx0208 and cp932 */
    CP932_TWEAKUNIMAP(cp932_encode_map)

    return 0;
}

static struct PyMethodDef _ja_codecs_methods[] = {
    {NULL, NULL},
};

void
init_ja_codecs(void)
{
    PyObject *m;

    m = Py_InitModule("_ja_codecs", _ja_codecs_methods);
    PyModule_AddObject(m, "shiftjis_encode",
            _PyMultibyteEncoder_Create(&shiftjis_codec_encoder, "shiftjis"));
    PyModule_AddObject(m, "shiftjis_decode",
            _PyMultibyteDecoder_Create(&shiftjis_codec_decoder, "shiftjis"));
    PyModule_AddObject(m, "cp932_encode",
            _PyMultibyteEncoder_Create(&cp932_codec_encoder, "cp932"));
    PyModule_AddObject(m, "cp932_decode",
            _PyMultibyteDecoder_Create(&cp932_codec_decoder, "cp932"));
    PyModule_AddObject(m, "euc_jp_encode",
            _PyMultibyteEncoder_Create(&euc_jp_codec_encoder, "euc_jp"));
    PyModule_AddObject(m, "euc_jp_decode",
            _PyMultibyteDecoder_Create(&euc_jp_codec_decoder, "euc_jp"));

    if (PyErr_Occurred() || init_maps())
        Py_FatalError("can't initialize the _ja_codecs module");
}

/*
 * ex: ts=8 sts=4 et
 */


1.1 cjkcodecs/src/_ko_codecs.c

Index: _ko_codecs.c
===================================================================
/*
 * _ko_codecs.c: Korean Codecs Implementation
 *
 * Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * $Id: _ko_codecs.c,v 1.1 2003/04/20 17:35:31 perky Exp $
 */

#include "Python.h"
#include "multibytecodec.h"

typedef unsigned short DBCHAR;
#define UNIINV Py_UNICODE_REPLACEMENT_CHARACTER
#define NOCHAR 0x0000
#include "_ko_codecs.h"

static DBCHAR *ksx1001_encode_map[256];
static DBCHAR *uhc_encode_map[256];

/*
 * EUC-KR: KS X 1001:1998
 */

static int
euc_kr_encode(PyMultibyteEncoder_Handle *hdl,
              PyMultibyteEncoder_Context *ctx,
              PyMultibyteEncoder_Buffer *buf,
              PyMultibyteEncoder_Error *err)
{
    DBCHAR *map, code;

    while (buf->inbuf < buf->inbuf_end) {
        Py_UNICODE nc = *buf->inbuf;

        if (nc < 0x80) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = (unsigned char)nc;
            buf->inbuf++;
            continue;
        }
#if Py_UNICODE_SIZE == 4
        else if (nc >= 0x10000) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
            /* all characters of ks x 1001 are included in BMP. */
        }
#endif

        map = ksx1001_encode_map[nc >> 8];
        if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
        }

        if (HAS_NOT_ENOUGH_SPACE(buf, 2))
            return MBERR_TOOSMALL;
        *buf->outbuf++ = code >> 8;
        *buf->outbuf++ = code & 0xFF;
        buf->inbuf++;
    }

    return 0;
}

static int
euc_kr_decode_open(PyMultibyteDecoder_Handle *hdl,
                   PyMultibyteDecoder_Context *ctx)
{
    *ctx = NULL;
    return 0;
}

static int
euc_kr_decode(PyMultibyteDecoder_Handle *hdl,
              PyMultibyteDecoder_Context *ctx,
              PyMultibyteDecoder_Buffer *buf,
              PyMultibyteDecoder_Error *err)
{
    Py_UNICODE code;
    unsigned char pending;

    pending = (unsigned char)(long)*ctx;
    *ctx = NULL;

    while (buf->inbuf < buf->inbuf_end) {
        unsigned char nc = *buf->inbuf;

        if (!pending) {
            if (nc & 0x80) {
                pending = nc;
                buf->inbuf++;
            } else {
                if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                    return MBERR_TOOSMALL;
                *buf->outbuf++ = nc;
                buf->inbuf++;
            }
        } else {
            if (nc < KSX1001_C2_BOTTOM || nc > KSX1001_C2_TOP ||
                ksx1001_decode_map[pending & 0x7F] == NULL ||
                (code = ksx1001_decode_map[pending & 0x7F][
                        nc - KSX1001_C2_BOTTOM]) == UNIINV) {
                if (INBUFPOS(buf) < 1) {
                    /* the pending character is from previous buffer */
                    err->object = PyMem_Malloc(2);
                    err->objlength = 2;
                    err->object[0] = pending;
                    err->object[1] = nc;
                    /* huh? characters from current buffer only! */
                    err->start = 0;
                    err->end = 1;
                } else {
                    SETERR_INBUF(err, buf);
                    err->end = INBUFPOS(buf) + 1;
                    err->start = err->end - 2;
                }
                *ctx = (PyMultibyteDecoder_Context)(long)pending;
                if (pending < KSX1001_C1_BOTTOM || nc < KSX1001_C2_BOTTOM)
                    return MBERR_ILLSEQ;
                else
                    return MBERR_UNDEFINED;
            }

            if (HAS_NOT_ENOUGH_SPACE(buf, 1)) {
                *ctx = (PyMultibyteDecoder_Context)(long)pending;
                return MBERR_TOOSMALL;
            }
            *buf->outbuf++ = code;
            buf->inbuf++;
            pending = 0;
        }
    }

    if (pending)
        *ctx = (PyMultibyteDecoder_Context)(long)pending;

    return 0;
}

static int
euc_kr_decode_flush(PyMultibyteDecoder_Handle *hdl,
                    PyMultibyteDecoder_Context *ctx,
                    PyMultibyteDecoder_Buffer *buf,
                    PyMultibyteDecoder_Error *err)
{
    if (*ctx != NULL) {
        if (INBUFPOS(buf) < 1) {
            /* the pending character is from the previous buffer */
            err->object = PyMem_Malloc(1);
            err->objlength = 1;
            *err->object = (unsigned char)(long)*ctx;
            err->start = err->end = 0; /* no character on current buffer */
        } else {
            /* we can sure that the last character on inbuf is the pending
             * one, here. all error situation clears dstate and it's the
             * only way to move the buffer cursor discontinuously.
             */
            SETERR_INBUF(err, buf);
            err->end = INBUFPOS(buf);
            err->start = err->end - 1;
        }
        *ctx = NULL;
        return MBERR_TOOFEW;
    }

    return 0;
}

static int
euc_kr_decode_reset(PyMultibyteDecoder_Handle *hdl,
                    PyMultibyteDecoder_Context *ctx)
{
    *ctx = NULL;
    return 0;
}

static PyMultibyteEncoder_Codec euc_kr_codec_encoder = {
    "euc_kr",               /* name */
    0,                      /* init */
    0,                      /* shutdown */
    0,                      /* open */
    0,                      /* close */
    euc_kr_encode,          /* encode */
    0,                      /* flush */
    0,                      /* reset */
    0,                      /* putrepl */
};

static PyMultibyteDecoder_Codec euc_kr_codec_decoder = {
    "euc_kr",               /* name */
    0,                      /* init */
    0,                      /* shutdown */
    euc_kr_decode_open,     /* open */
    0,                      /* close */
    euc_kr_decode,          /* decode */
    euc_kr_decode_flush,    /* flush */
    euc_kr_decode_reset,    /* reset */
};

/*
 * CP949: Microsoft CodePage 949, a.k.a. Unified Hangul Code
 */

static int
cp949_encode(PyMultibyteEncoder_Handle *hdl,
             PyMultibyteEncoder_Context *ctx,
             PyMultibyteEncoder_Buffer *buf,
             PyMultibyteEncoder_Error *err)
{
    DBCHAR *map, code;

    while (buf->inbuf < buf->inbuf_end) {
        Py_UNICODE nc = *buf->inbuf;

        if (nc < 0x80) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = (unsigned char)nc;
            buf->inbuf++;
            continue;
        }
#if Py_UNICODE_SIZE == 4
        else if (nc >= 0x10000) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
            /* all characters of ks x 1001 are included in BMP. */
        }
#endif

        map = ksx1001_encode_map[nc >> 8];
        if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
            map = uhc_encode_map[nc >> 8];
            if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) {
                SETERR_INBUF(err, buf);
                err->start = INBUFPOS(buf);
                err->end = err->start + 1;
                return MBERR_UNDEFINED;
            }
        }

        if (HAS_NOT_ENOUGH_SPACE(buf, 2))
            return MBERR_TOOSMALL;
        *buf->outbuf++ = code >> 8;
        *buf->outbuf++ = code & 0xFF;
        buf->inbuf++;
    }

    return 0;
}

static int
cp949_decode(PyMultibyteDecoder_Handle *hdl,
             PyMultibyteDecoder_Context *ctx,
             PyMultibyteDecoder_Buffer *buf,
             PyMultibyteDecoder_Error *err)
{
    Py_UNICODE code;
    unsigned char pending;

    pending = (unsigned char)(long)*ctx;
    *ctx = NULL;

    while (buf->inbuf < buf->inbuf_end) {
        unsigned char nc = *buf->inbuf;

        if (!pending) {
            if (nc & 0x80) {
                pending = nc;
                buf->inbuf++;
            } else {
                if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                    return MBERR_TOOSMALL;
                *buf->outbuf++ = nc;
                buf->inbuf++;
            }
        } else {
            code = UNIINV;

            if (pending >= KSX1001_C1_BOTTOM && pending <= KSX1001_C1_TOP &&
                ksx1001_decode_map[pending & 0x7F] != NULL &&
                nc >= KSX1001_C2_BOTTOM && nc <= KSX1001_C2_TOP)
                /* ks x 1001 */
                code = ksx1001_decode_map[pending & 0x7F][
                        nc - KSX1001_C2_BOTTOM];
            else if (pending >= UHCL1_C1_BOTTOM && pending <= UHCL1_C1_TOP &&
                     uhc_decode_map[pending & 0x7F] != NULL &&
                     nc >= UHCL1_C2_BOTTOM && nc <= UHCL1_C2_TOP)
                /* uhc level 1 */
                code = uhc_decode_map[pending & 0x7F][
                        nc - UHCL1_C2_BOTTOM];
            else if (pending >= UHCL2_C1_BOTTOM && pending <= UHCL2_C1_TOP &&
                     uhc_decode_map[pending & 0x7F] != NULL &&
                     nc >= UHCL2_C2_BOTTOM && nc <= UHCL2_C2_TOP)
                /* uhc level 2 */
                code = uhc_decode_map[pending & 0x7F][
                        nc - UHCL2_C2_BOTTOM];

            if (code == UNIINV) {
                if (INBUFPOS(buf) < 1) {
                    /* the pending character is from previous buffer */
                    err->object = PyMem_Malloc(2);
                    err->objlength = 2;
                    err->object[0] = pending;
                    err->object[1] = nc;
                    /* huh? characters from current buffer only! */
                    err->start = 0;
                    err->end = 1;
                } else {
                    SETERR_INBUF(err, buf);
                    err->end = INBUFPOS(buf) + 1;
                    err->start = err->end - 2;
                }
                *ctx = (PyMultibyteDecoder_Context)(long)pending;
                /* unlike euc-kr,
                 * cp949 has complete map region when high bit is set */
                return MBERR_UNDEFINED;
            }

            if (HAS_NOT_ENOUGH_SPACE(buf, 1)) {
                *ctx = (PyMultibyteDecoder_Context)(long)pending;
                return MBERR_TOOSMALL;
            }
            *buf->outbuf++ = code;
            buf->inbuf++;
            pending = 0;
        }
    }

    if (pending)
        *ctx = (PyMultibyteDecoder_Context)(long)pending;

    return 0;
}

static PyMultibyteEncoder_Codec cp949_codec_encoder = {
    "cp949",                /* name */
    0,                      /* init */
    0,                      /* shutdown */
    0,                      /* open */
    0,                      /* close */
    cp949_encode,           /* encode */
    0,                      /* flush */
    0,                      /* reset */
    0,                      /* putrepl */
};

static PyMultibyteDecoder_Codec cp949_codec_decoder = {
    "cp949",                /* name */
    0,                      /* init */
    0,                      /* shutdown */
    euc_kr_decode_open,     /* open */
    0,                      /* close */
    cp949_decode,           /* decode */
    euc_kr_decode_flush,    /* flush */
    euc_kr_decode_reset,    /* reset */
};

static int
build_encode_map(DBCHAR **encmap, const Py_UNICODE **decmap,
                 unsigned char c1bottom, unsigned char c1top,
                 unsigned char c2bottom, unsigned char c2top)
{
    unsigned char c1, c2;
    const Py_UNICODE *umap;

    for (c1 = c1bottom; c1 <= c1top; c1++) {
        umap = decmap[c1 & 0x7f];
        if (umap == NULL)
            continue;

        for (c2 = c2bottom; c2 <= c2top; c2++) {
            Py_UNICODE uni;
            int upage, i;

            uni = umap[c2 - c2bottom];
            if (uni == UNIINV)
                continue;

            upage = uni >> 8;
            if (encmap[upage] == NULL) {
                encmap[upage] = PyMem_New(DBCHAR, 256);
                if (encmap[upage] == NULL)
                    return -1;
                for (i = 0; i <= 255; i++)
                    encmap[upage][i] = NOCHAR;
            }
            encmap[upage][uni & 0xff] = c1 << 8 | c2;
        }
    }

    return 0;
}

static int
init_maps(void)
{
    int i;

    for (i = 0; i < 256; i++)
        ksx1001_encode_map[i] = uhc_encode_map[i] = NULL;

    if (build_encode_map(ksx1001_encode_map, ksx1001_decode_map,
                KSX1001_C1_BOTTOM, KSX1001_C1_TOP,
                KSX1001_C2_BOTTOM, KSX1001_C2_TOP) ||
        build_encode_map(uhc_encode_map, uhc_decode_map,
                UHCL1_C1_BOTTOM, UHCL1_C1_TOP,
                UHCL1_C2_BOTTOM,
                UHCL1_C2_TOP) ||
        build_encode_map(uhc_encode_map, uhc_decode_map,
                UHCL2_C1_BOTTOM, UHCL2_C1_TOP,
                UHCL2_C2_BOTTOM, UHCL2_C2_TOP)) {
        /* memory error */
        for (i = 0; i < 256; i++) {
            if (ksx1001_encode_map[i] != NULL)
                PyMem_Del(ksx1001_encode_map[i]);
            if (uhc_encode_map[i] != NULL)
                PyMem_Del(uhc_encode_map[i]);
        }
        return -1;
    }

    return 0;
}

static struct PyMethodDef _ko_codecs_methods[] = {
    {NULL, NULL},
};

void
init_ko_codecs(void)
{
    PyObject *m;

    m = Py_InitModule("_ko_codecs", _ko_codecs_methods);
    PyModule_AddObject(m, "euc_kr_encode",
            _PyMultibyteEncoder_Create(&euc_kr_codec_encoder, "euc_kr"));
    PyModule_AddObject(m, "euc_kr_decode",
            _PyMultibyteDecoder_Create(&euc_kr_codec_decoder, "euc_kr"));
    PyModule_AddObject(m, "cp949_encode",
            _PyMultibyteEncoder_Create(&cp949_codec_encoder, "cp949"));
    PyModule_AddObject(m, "cp949_decode",
            _PyMultibyteDecoder_Create(&cp949_codec_decoder, "cp949"));

    if (PyErr_Occurred() || init_maps())
        Py_FatalError("can't initialize the _ko_codecs module");
}

/*
 * ex: ts=8 sts=4 et
 */


1.1 cjkcodecs/src/_zh_CN_codecs.c

Index: _zh_CN_codecs.c
===================================================================
/*
 * _zh_CN_codecs.c: Simplified Chinese Codecs Implementation
 *
 * Copyright (C) 2003 Hye-Shik Chang <pe...@Fr...>.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
 * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * $Id: _zh_CN_codecs.c,v 1.1 2003/04/20 17:35:31 perky Exp $
 */

#include "Python.h"
#include "multibytecodec.h"

typedef unsigned short DBCHAR;
#define UNIINV Py_UNICODE_REPLACEMENT_CHARACTER
#define NOCHAR 0x0000
#include "_zh_CN_codecs.h"

static DBCHAR *gb2312_encode_map[256];
static DBCHAR *gbk_encode_map[256];
static DBCHAR *gb18030_encode_map[256];

struct gb18030dec_state {
    unsigned char pending[4];
    size_t pendingsize;
};

#define DBCS_DECODE(c1, c2, pfx, map, ass) \
    if ((c1) >= pfx##_C1_BOTTOM && \
        (c1) <= pfx##_C1_TOP && \
        (map)[(c1) & 0x7f] != NULL && \
        (c2) >= pfx##_C2_BOTTOM && \
        (c2) <= pfx##_C2_TOP && \
        ((ass) = (map)[(c1) & 0x7f][(c2) - pfx##_C2_BOTTOM]) \
            != UNIINV) ;

/*
 * EUC (the most popular) instance of GB2312
 */

static int
gb2312_encode(PyMultibyteEncoder_Handle *hdl,
              PyMultibyteEncoder_Context *ctx,
              PyMultibyteEncoder_Buffer *buf,
              PyMultibyteEncoder_Error *err)
{
    DBCHAR *map, code;

    while (buf->inbuf < buf->inbuf_end) {
        Py_UNICODE nc = *buf->inbuf;

        if (nc < 0x80) {
            if (HAS_NOT_ENOUGH_SPACE(buf, 1))
                return MBERR_TOOSMALL;
            *buf->outbuf++ = (unsigned char)nc;
            buf->inbuf++;
            continue;
        }
#if Py_UNICODE_SIZE == 4
        else if (nc >= 0x10000) {
            SETERR_INBUF(err, buf);
            err->start = INBUFPOS(buf);
            err->end = err->start + 1;
            return MBERR_UNDEFINED;
/* all characters of gb2312 are included in BMP. */ } #endif map = gb2312_encode_map[nc >> 8]; if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) { SETERR_INBUF(err, buf); err->start = INBUFPOS(buf); err->end = err->start + 1; return MBERR_UNDEFINED; } if (HAS_NOT_ENOUGH_SPACE(buf, 2)) return MBERR_TOOSMALL; *buf->outbuf++ = code >> 8; *buf->outbuf++ = code & 0xFF; buf->inbuf++; } return 0; } static int gb2312_decode_open(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx) { *ctx = NULL; return 0; } static int gb2312_decode(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx, PyMultibyteDecoder_Buffer *buf, PyMultibyteDecoder_Error *err) { Py_UNICODE code; unsigned char pending; pending = (unsigned char)(long)*ctx; *ctx = NULL; while (buf->inbuf < buf->inbuf_end) { unsigned char nc = *buf->inbuf; if (!pending) { if (nc & 0x80) { pending = nc; buf->inbuf++; } else { if (HAS_NOT_ENOUGH_SPACE(buf, 1)) return MBERR_TOOSMALL; *buf->outbuf++ = nc; buf->inbuf++; } } else { if (nc < GB2312_C2_BOTTOM || nc > GB2312_C2_TOP || gb2312_decode_map[pending & 0x7F] == NULL || (code = gb2312_decode_map[pending & 0x7F][ nc - GB2312_C2_BOTTOM]) == UNIINV) { if (INBUFPOS(buf) < 1) { /* the pending character is from previous buffer */ err->object = PyMem_Malloc(2); err->objlength = 2; err->object[0] = pending; err->object[1] = nc; /* huh? characters from current buffer only! 
*/ err->start = 0; err->end = 1; } else { SETERR_INBUF(err, buf); err->end = INBUFPOS(buf) + 1; err->start = err->end - 2; } *ctx = (PyMultibyteDecoder_Context)(long)pending; if (pending < GB2312_C1_BOTTOM || nc < GB2312_C2_BOTTOM) return MBERR_ILLSEQ; else return MBERR_UNDEFINED; } if (HAS_NOT_ENOUGH_SPACE(buf, 1)) { *ctx = (PyMultibyteDecoder_Context)(long)pending; return MBERR_TOOSMALL; } *buf->outbuf++ = code; buf->inbuf++; pending = 0; } } if (pending) *ctx = (PyMultibyteDecoder_Context)(long)pending; return 0; } static int gb2312_decode_flush(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx, PyMultibyteDecoder_Buffer *buf, PyMultibyteDecoder_Error *err) { if (*ctx != NULL) { if (INBUFPOS(buf) < 1) { /* the pending character is from the previous buffer */ err->object = PyMem_Malloc(1); err->objlength = 1; *err->object = (unsigned char)(long)*ctx; err->start = err->end = 0; /* no character on current buffer */ } else { /* we can sure that the last character on inbuf is the pending * one, here. all error situation clears dstate and it's the * only way to move the buffer cursor discontinuously. */ SETERR_INBUF(err, buf); err->end = INBUFPOS(buf); err->start = err->end - 1; } *ctx = NULL; return MBERR_TOOFEW; } return 0; } static int gb2312_decode_reset(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx) { *ctx = NULL; return 0; } static PyMultibyteEncoder_Codec gb2312_codec_encoder = { "gb2312", /* name */ 0, /* init */ 0, /* shutdown */ 0, /* open */ 0, /* close */ gb2312_encode, /* encode */ 0, /* flush */ 0, /* reset */ 0, /* putrepl */ }; static PyMultibyteDecoder_Codec gb2312_codec_decoder = { "gb2312", /* name */ 0, /* init */ 0, /* shutdown */ gb2312_decode_open, /* open */ 0, /* close */ gb2312_decode, /* decode */ gb2312_decode_flush, /* flush */ gb2312_decode_reset, /* reset */ }; /* * CP936: Microsoft CodePage 936, a.k.a. GBK * * - GBK is backward compatible to gb2312 and incorporated Big5, * GB12345 and GB13000 characters. 
*/ static int cp936_encode(PyMultibyteEncoder_Handle *hdl, PyMultibyteEncoder_Context *ctx, PyMultibyteEncoder_Buffer *buf, PyMultibyteEncoder_Error *err) { DBCHAR *map, code; while (buf->inbuf < buf->inbuf_end) { Py_UNICODE nc = *buf->inbuf; if (nc < 0x80) { if (HAS_NOT_ENOUGH_SPACE(buf, 1)) return MBERR_TOOSMALL; *buf->outbuf++ = (unsigned char)nc; buf->inbuf++; continue; } #if Py_UNICODE_SIZE == 4 else if (nc >= 0x10000) { SETERR_INBUF(err, buf); err->start = INBUFPOS(buf); err->end = err->start + 1; return MBERR_UNDEFINED; /* all characters of cp936 are included in BMP. */ } #endif map = gbk_encode_map[nc >> 8]; if (map == NULL || (code = map[nc & 0xff]) == NOCHAR) { SETERR_INBUF(err, buf); err->start = INBUFPOS(buf); err->end = err->start + 1; return MBERR_UNDEFINED; } if (HAS_NOT_ENOUGH_SPACE(buf, 2)) return MBERR_TOOSMALL; *buf->outbuf++ = code >> 8; *buf->outbuf++ = code & 0xFF; buf->inbuf++; } return 0; } static int cp936_decode(PyMultibyteDecoder_Handle *hdl, PyMultibyteDecoder_Context *ctx, PyMultibyteDecoder_Buffer *buf, PyMultibyteDecoder_Error *err) { Py_UNICODE code; unsigned char pending; pending = (unsigned char)(long)*ctx; *ctx = NULL; while (buf->inbuf < buf->inbuf_end) { unsigned char nc = *buf->inbuf; if (!pending) { if (nc & 0x80) { pending = nc; buf->inbuf++; } else { if (HAS_NOT_ENOUGH_SPACE(buf, 1)) return MBERR_TOOSMALL; *buf->outbuf++ = nc; buf->inbuf++; } } else { code = UNIINV; GBK_PREDECODE(pending, nc, code) else DBCS_DECODE(pending, nc, GB2312, gb2312_decode_map, code) else DBCS_DECODE(pending, nc, GBKL1, gbk_decode_map, code) else DBCS_DECODE(pending, nc, GBKL2, gbk_decode_map, code) if (code == UNIINV) { if (INBUFPOS(buf) < 1) { /* the pending character is from previous buffer */ err->object = PyMem_Malloc(2); err->objlength = 2; err->object[0] = pending; err->object[1] = nc; /* huh? characters from current buffer only! 
*/ err->start = 0; err->end = 1; } else { SETERR_INBUF(err, buf); err->end = INBUFPOS(buf) + 1; err->start = err->end - 2; } *ctx = (PyMultibyteDecoder_Context)(long)pending; /* unlike gb2312, * cp936 has c... [truncated message content] |
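The gb2312 and cp936 routines above share one double-byte pattern:

- **Encoding** looks a code point up in a two-level table: first the page `gbk_encode_map[nc >> 8]`, then the cell `nc & 0xff`. `NOCHAR` marks unmapped cells.
- **Decoding** holds a byte with the high bit set as `pending` state until its trail byte arrives. If input ends while a byte is still pending, the flush routine reports `MBERR_TOOFEW`.

The Python sketch below models only that pattern, and it makes several assumptions:

- The tables are borrowed from Python's own `gb2312` codec for a handful of characters.
- Lead and trail byte ranges are not validated.
- `build_tables`, `encode`, and `DBCSDecoder` are hypothetical names, not part of the cjkcodecs API.

```python
# Hypothetical toy model of the double-byte pattern in the C codecs above;
# none of these names come from the cjkcodecs API.
NOCHAR = None  # stand-in for the C NOCHAR sentinel (unmapped cell)


def build_tables(chars, encoding="gb2312"):
    """Build a paged encode map and a flat decode map for `chars`.

    Assumption: mappings are borrowed from Python's own codec so the sketch
    stays self-contained; the C code uses generated static tables instead.
    """
    encode_map = {}  # page (code point >> 8) -> list of 256 cells
    decode_map = {}  # two-byte code -> character
    for ch in chars:
        raw = ch.encode(encoding)
        if len(raw) != 2:
            continue  # keep only double-byte characters
        code = (raw[0] << 8) | raw[1]
        page = encode_map.setdefault(ord(ch) >> 8, [NOCHAR] * 256)
        page[ord(ch) & 0xFF] = code
        decode_map[code] = ch
    return encode_map, decode_map


def encode(text, encode_map):
    out = bytearray()
    for pos, ch in enumerate(text):
        nc = ord(ch)
        if nc < 0x80:  # ASCII is copied through as one byte
            out.append(nc)
            continue
        page = encode_map.get(nc >> 8)  # first level: page may be missing
        code = NOCHAR if page is None else page[nc & 0xFF]
        if code is NOCHAR:
            raise UnicodeEncodeError("toy-dbcs", text, pos, pos + 1,
                                     "undefined character")
        out += bytes((code >> 8, code & 0xFF))  # lead byte, then trail byte
    return bytes(out)


class DBCSDecoder:
    """Incremental decoder that carries a pending lead byte across chunks."""

    def __init__(self, decode_map):
        self.decode_map = decode_map
        self.pending = None  # lead byte still waiting for its trail byte

    def decode(self, data, final=False):
        out = []
        for b in data:
            if self.pending is None:
                if b & 0x80:
                    self.pending = b  # lead byte: wait for the next byte
                else:
                    out.append(chr(b))
                continue
            lead, self.pending = self.pending, None
            ch = self.decode_map.get((lead << 8) | b)
            if ch is None:
                raise UnicodeDecodeError("toy-dbcs", bytes((lead, b)), 0, 2,
                                         "illegal multibyte sequence")
            out.append(ch)
        if final and self.pending is not None:
            # Analogue of the flush routine's MBERR_TOOFEW case.
            lead, self.pending = self.pending, None
            raise UnicodeDecodeError("toy-dbcs", bytes((lead,)), 0, 1,
                                     "incomplete multibyte sequence")
        return "".join(out)
```

Feeding `encode("a中文", ...)` to the decoder in two chunks shows the pending state at work. After the first chunk (`'a'` plus the lead byte of `中`), the decoder returns `'a'` and keeps that lead byte pending. The second chunk then completes both characters.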
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:34:34
perky 03/04/20 10:34:33 cjkcodecs/tools - New directory |
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:34:34
perky 03/04/20 10:34:33 cjkcodecs/src - New directory |
From: Hye-Shik C. <pe...@us...> - 2003-04-20 17:23:48
perky 03/04/20 10:23:46
Log: Import CJK codecs from python multibytecodecs patch.
Status:
Vendor Tag: PYMULTIBYTECODEC
Release Tags: PYTHON
N cjkcodecs/setup.py
N cjkcodecs/cjkcodecs/big5.py
N cjkcodecs/cjkcodecs/cp932.py
N cjkcodecs/cjkcodecs/cp936.py
N cjkcodecs/cjkcodecs/cp949.py
N cjkcodecs/cjkcodecs/cp950.py
N cjkcodecs/cjkcodecs/euc_jp.py
N cjkcodecs/cjkcodecs/euc_kr.py
N cjkcodecs/cjkcodecs/gb18030.py
N cjkcodecs/cjkcodecs/gb2312.py
N cjkcodecs/cjkcodecs/shiftjis.py
N cjkcodecs/cjkcodecs/__init__.py
No conflicts created by this import
From: Hye-Shik C. <pe...@us...> - 2003-02-22 08:31:19
perky 03/02/22 00:31:18 Modified: korean qwerty2bul.py Log: Fix a bug that duplicates after non-hangul character following non-completed character. Spotted by: Kwon Soon-Kook <ne...@ne...> Revision Changes Path 1.8 +11 -7 KoreanCodecs/korean/qwerty2bul.py Index: qwerty2bul.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/qwerty2bul.py,v retrieving revision 1.7 retrieving revision 1.8 diff -u -r1.7 -r1.8 --- qwerty2bul.py 13 Jan 2003 09:09:56 -0000 1.7 +++ qwerty2bul.py 22 Feb 2003 08:31:18 -0000 1.8 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: qwerty2bul.py,v 1.7 2003/01/13 09:09:56 perky Exp $ +# $Id: qwerty2bul.py,v 1.8 2003/02/22 08:31:18 perky Exp $ # import codecs @@ -60,7 +60,7 @@ self.clear() def pushcomp(self): - if self.chosung and not self.jungsung: + if not (self.chosung and self.jungsung): self.word_valid = 0 self.word_comp.append(join([ self.chosung, self.jungsung, self.jongsung @@ -92,12 +92,16 @@ self.pushcomp() if self.word_raw or self.word_comp: if self.word_valid: - self.buff.append(u''.join(self.word_comp)) + rjoi = u''.join(self.word_comp) else: self.word_valid = 1 - self.buff.append(u''.join(self.word_raw)) - + rjoi = u''.join(self.word_raw) + self.word_raw, self.word_comp = [], [] + if rjoi: + self.buff.append(rjoi) + return 1 + return 0 def feed(self, c): self.word_raw.append(c) @@ -154,8 +158,8 @@ self.chosung = njong self.jungsung = code else: # non key code - self.finalize() - self.buff.append(c) + if not self.finalize(): + self.buff.append(c) class Codec(codecs.Codec): |
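The `pushcomp` change in qwerty2bul.py revision 1.8 widens the test that marks a word invalid. Below is a truth table for the old and new conditions. It treats the two attributes as plain booleans, which is an assumption: in the real class they hold jamo, which are truthy when present. The function names are hypothetical.

```python
from itertools import product


def old_invalid(chosung, jungsung):
    # revision 1.7: invalid only for an initial consonant with no vowel
    return bool(chosung and not jungsung)


def new_invalid(chosung, jungsung):
    # revision 1.8: invalid unless both the initial consonant and vowel exist
    return not (chosung and jungsung)


def truth_table():
    """Rows of (has_chosung, has_jungsung, old_invalid, new_invalid)."""
    return [(c, j, old_invalid(c, j), new_invalid(c, j))
            for c, j in product((False, True), repeat=2)]


for row in truth_table():
    print(row)
```

Only the first two rows differ. These are the cases the old condition missed: a vowel with no initial consonant, and a word with neither.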
From: Hye-Shik C. <pe...@us...> - 2003-02-20 11:09:28
perky 03/02/20 03:09:27 Modified: . log_accum.pl Log: Run! Revision Changes Path 1.3 +1 -3 CVSROOT/log_accum.pl Index: log_accum.pl =================================================================== RCS file: /cvsroot/koco/CVSROOT/log_accum.pl,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- log_accum.pl 20 Feb 2003 11:09:08 -0000 1.2 +++ log_accum.pl 20 Feb 2003 11:09:27 -0000 1.3 @@ -12,7 +12,7 @@ # Roy Fielding removed useless code and added log/mail of new files # Ken Coar added special processing (i.e., no diffs) for binary files # -# $Id: log_accum.pl,v 1.2 2003/02/20 11:09:08 perky Exp $ +# $Id: log_accum.pl,v 1.3 2003/02/20 11:09:27 perky Exp $ ############################################################ # @@ -293,8 +293,6 @@ sub mail_notification { local(@text) = @_; - - system("rm -rf $CVSROOT/py-unicodec"); print "Mailing the commit message...\n"; |
From: Hye-Shik C. <pe...@us...> - 2003-02-20 11:09:09
perky 03/02/20 03:09:08 Modified: . log_accum.pl Log: Remove py-unicodec Revision Changes Path 1.2 +3 -1 CVSROOT/log_accum.pl Index: log_accum.pl =================================================================== RCS file: /cvsroot/koco/CVSROOT/log_accum.pl,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- log_accum.pl 1 Apr 2002 08:47:36 -0000 1.1 +++ log_accum.pl 20 Feb 2003 11:09:08 -0000 1.2 @@ -12,7 +12,7 @@ # Roy Fielding removed useless code and added log/mail of new files # Ken Coar added special processing (i.e., no diffs) for binary files # -# $Id: log_accum.pl,v 1.1 2002/04/01 08:47:36 perky Exp $ +# $Id: log_accum.pl,v 1.2 2003/02/20 11:09:08 perky Exp $ ############################################################ # @@ -293,6 +293,8 @@ sub mail_notification { local(@text) = @_; + + system("rm -rf $CVSROOT/py-unicodec"); print "Mailing the commit message...\n"; |
From: Hye-Shik C. <pe...@us...> - 2003-01-14 15:20:43
perky 03/01/14 07:20:40 Modified: korean error_callback.py Log: Remove debugging routine. Revision Changes Path 1.6 +1 -2 KoreanCodecs/korean/error_callback.py Index: error_callback.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/error_callback.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- error_callback.py 14 Jan 2003 15:13:44 -0000 1.5 +++ error_callback.py 14 Jan 2003 15:20:35 -0000 1.6 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: error_callback.py,v 1.5 2003/01/14 15:13:44 perky Exp $ +# $Id: error_callback.py,v 1.6 2003/01/14 15:20:35 perky Exp $ # try: @@ -38,7 +38,6 @@ class UnicodeDecodeError(UnicodeError): def __init__(self, encoding, object, start, end, reason): - print repr((encoding, object, start, end, reason)) UnicodeError.__init__(self, ("encoding '%s' can't decode characters " "in positions %d-%d: %s") % (encoding, |
From: Hye-Shik C. <pe...@us...> - 2003-01-14 15:13:49
perky 03/01/14 07:13:48 Modified: korean error_callback.py Log: Fix syntax error on wrong parenthesis Revision Changes Path 1.5 +6 -5 KoreanCodecs/korean/error_callback.py Index: error_callback.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/error_callback.py,v retrieving revision 1.4 retrieving revision 1.5 diff -u -r1.4 -r1.5 --- error_callback.py 13 Jan 2003 09:09:56 -0000 1.4 +++ error_callback.py 14 Jan 2003 15:13:44 -0000 1.5 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: error_callback.py,v 1.4 2003/01/13 09:09:56 perky Exp $ +# $Id: error_callback.py,v 1.5 2003/01/14 15:13:44 perky Exp $ # try: @@ -27,8 +27,8 @@ class UnicodeEncodeError(UnicodeError): def __init__(self, encoding, object, start, end, reason): UnicodeError.__init__(self, - "encoding '%s' can't encode characters " + - "in positions %d-%d: %s" % (encoding, + ("encoding '%s' can't encode characters " + "in positions %d-%d: %s") % (encoding, start, end-1, reason)) self.encoding = encoding self.object = object @@ -38,9 +38,10 @@ class UnicodeDecodeError(UnicodeError): def __init__(self, encoding, object, start, end, reason): + print repr((encoding, object, start, end, reason)) UnicodeError.__init__(self, - "encoding '%s' can't decode characters " + - "in positions %d-%d: %s" % (encoding, + ("encoding '%s' can't decode characters " + "in positions %d-%d: %s") % (encoding, start, end-1, reason)) self.encoding = encoding self.object = object |
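The parenthesis fix in error_callback.py is about operator precedence. The old expression parses fine, but `%` binds more tightly than `+`, so formatting is applied only to the second string literal. That literal receives four arguments for three placeholders, which fails at run time with a `TypeError`. A minimal reproduction in Python 3 (the helper names are hypothetical; the templates are copied from the diff):

```python
# Hypothetical helpers mirroring the two versions of the message template.
def message_before(encoding, start, end, reason):
    # `%` binds tighter than `+`: only the second literal is formatted, and
    # its first %d receives the encoding name, so this raises TypeError.
    return ("encoding '%s' can't encode characters " +
            "in positions %d-%d: %s" % (encoding, start, end - 1, reason))


def message_after(encoding, start, end, reason):
    # Adjacent literals are joined at compile time, and the parentheses make
    # the whole template the left operand of `%`.
    return ("encoding '%s' can't encode characters "
            "in positions %d-%d: %s") % (encoding, start, end - 1, reason)


print(message_after("euc-kr", 3, 5, "illegal multibyte sequence"))
# -> encoding 'euc-kr' can't encode characters in positions 3-4: illegal multibyte sequence
```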
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:10:01
perky 03/01/13 01:10:00 Modified: tools generate_codec_mapping.py generate_mackorean_mapping.py Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.2 +6 -6 KoreanCodecs/tools/generate_codec_mapping.py Index: generate_codec_mapping.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/tools/generate_codec_mapping.py,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- generate_codec_mapping.py 12 Jan 2003 23:12:48 -0000 1.1 +++ generate_codec_mapping.py 13 Jan 2003 09:10:00 -0000 1.2 @@ -1,17 +1,17 @@ # -# generate_codec_mapping.py - $Revision: 1.1 $ +# generate_codec_mapping.py - $Revision: 1.2 $ # # Code Table Generator # # Author: Hye-Shik Chang <pe...@Fr...> -# Date : $Date: 2003/01/12 23:12:48 $ +# Date : $Date: 2003/01/13 09:10:00 $ # # # This file is part of KoreanCodecs. # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -34,7 +34,7 @@ * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -47,7 +47,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Generated by generate_codec_mapping.py on %s - * $Id: generate_codec_mapping.py,v 1.1 2003/01/12 23:12:48 perky Exp $ + * $Id: generate_codec_mapping.py,v 1.2 2003/01/13 09:10:00 perky Exp $ */ """ % time.asctime(time.gmtime()) @@ -218,6 +218,6 @@ encodemapgen(encmapfile, "wansung_encode", ksc5601_encoding, 512) # -# $Id: generate_codec_mapping.py,v 1.1 2003/01/12 23:12:48 perky Exp $ +# $Id: generate_codec_mapping.py,v 1.2 2003/01/13 09:10:00 perky Exp $ # # -*- End-Of-File -*- 1.2 +2 -2 KoreanCodecs/tools/generate_mackorean_mapping.py Index: generate_mackorean_mapping.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/tools/generate_mackorean_mapping.py,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- generate_mackorean_mapping.py 10 Jan 2003 01:56:48 -0000 1.1 +++ generate_mackorean_mapping.py 13 Jan 2003 09:10:00 -0000 1.2 @@ -6,7 +6,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -18,7 +18,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: generate_mackorean_mapping.py,v 1.1 2003/01/10 01:56:48 perky Exp $ +# $Id: generate_mackorean_mapping.py,v 1.2 2003/01/13 09:10:00 perky Exp $ # decmap = {} |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:10:00
perky 03/01/13 01:10:00 Modified: test CodecTestBase.py test_all.py test_cp949.py test_euc_kr.py test_hangul.py test_iso_2022_kr.py test_johab.py test_mackorean.py test_qwerty2bul.py test_unijohab.py Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.13 +2 -2 KoreanCodecs/test/CodecTestBase.py Index: CodecTestBase.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/CodecTestBase.py,v retrieving revision 1.12 retrieving revision 1.13 diff -u -r1.12 -r1.13 --- CodecTestBase.py 13 Jan 2003 08:43:47 -0000 1.12 +++ CodecTestBase.py 13 Jan 2003 09:09:59 -0000 1.13 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: CodecTestBase.py,v 1.12 2003/01/13 08:43:47 perky Exp $ +# $Id: CodecTestBase.py,v 1.13 2003/01/13 09:09:59 perky Exp $ # import StringIO 1.9 +2 -2 KoreanCodecs/test/test_all.py Index: test_all.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_all.py,v retrieving revision 1.8 retrieving revision 1.9 diff -u -r1.8 -r1.9 --- test_all.py 12 Jan 2003 22:54:13 -0000 1.8 +++ test_all.py 13 Jan 2003 09:09:59 -0000 1.9 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the 
License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_all.py,v 1.8 2003/01/12 22:54:13 perky Exp $ +# $Id: test_all.py,v 1.9 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.14 +2 -2 KoreanCodecs/test/test_cp949.py Index: test_cp949.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_cp949.py,v retrieving revision 1.13 retrieving revision 1.14 diff -u -r1.13 -r1.14 --- test_cp949.py 13 Jan 2003 08:43:48 -0000 1.13 +++ test_cp949.py 13 Jan 2003 09:09:59 -0000 1.14 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_cp949.py,v 1.13 2003/01/13 08:43:48 perky Exp $ +# $Id: test_cp949.py,v 1.14 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.11 +2 -2 KoreanCodecs/test/test_euc_kr.py Index: test_euc_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_euc_kr.py,v retrieving revision 1.10 retrieving revision 1.11 diff -u -r1.10 -r1.11 --- test_euc_kr.py 13 Jan 2003 08:43:48 -0000 1.10 +++ test_euc_kr.py 13 Jan 2003 09:09:59 -0000 1.11 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_euc_kr.py,v 1.10 2003/01/13 08:43:48 perky Exp $ +# $Id: test_euc_kr.py,v 1.11 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.12 +2 -2 KoreanCodecs/test/test_hangul.py Index: test_hangul.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_hangul.py,v retrieving revision 1.11 retrieving revision 1.12 diff -u -r1.11 -r1.12 --- test_hangul.py 12 Jan 2003 22:54:13 -0000 1.11 +++ test_hangul.py 13 Jan 2003 09:09:59 -0000 1.12 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_hangul.py,v 1.11 2003/01/12 22:54:13 perky Exp $ +# $Id: test_hangul.py,v 1.12 2003/01/13 09:09:59 perky Exp $ # import unittest 1.9 +2 -2 KoreanCodecs/test/test_iso_2022_kr.py Index: test_iso_2022_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_iso_2022_kr.py,v retrieving revision 1.8 retrieving revision 1.9 diff -u -r1.8 -r1.9 --- test_iso_2022_kr.py 13 Jan 2003 08:43:48 -0000 1.8 +++ test_iso_2022_kr.py 13 Jan 2003 09:09:59 -0000 1.9 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_iso_2022_kr.py,v 1.8 2003/01/13 08:43:48 perky Exp $ +# $Id: test_iso_2022_kr.py,v 1.9 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.6 +2 -2 KoreanCodecs/test/test_johab.py Index: test_johab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_johab.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- test_johab.py 13 Jan 2003 08:38:36 -0000 1.5 +++ test_johab.py 13 Jan 2003 09:09:59 -0000 1.6 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_johab.py,v 1.5 2003/01/13 08:38:36 perky Exp $ +# $Id: test_johab.py,v 1.6 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.8 +2 -2 KoreanCodecs/test/test_mackorean.py Index: test_mackorean.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_mackorean.py,v retrieving revision 1.7 retrieving revision 1.8 diff -u -r1.7 -r1.8 --- test_mackorean.py 13 Jan 2003 08:43:48 -0000 1.7 +++ test_mackorean.py 13 Jan 2003 09:09:59 -0000 1.8 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_mackorean.py,v 1.7 2003/01/13 08:43:48 perky Exp $ +# $Id: test_mackorean.py,v 1.8 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.7 +2 -2 KoreanCodecs/test/test_qwerty2bul.py Index: test_qwerty2bul.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_qwerty2bul.py,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- test_qwerty2bul.py 13 Jan 2003 08:43:48 -0000 1.6 +++ test_qwerty2bul.py 13 Jan 2003 09:09:59 -0000 1.7 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_qwerty2bul.py,v 1.6 2003/01/13 08:43:48 perky Exp $ +# $Id: test_qwerty2bul.py,v 1.7 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase 1.5 +2 -2 KoreanCodecs/test/test_unijohab.py Index: test_unijohab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_unijohab.py,v retrieving revision 1.4 retrieving revision 1.5 diff -u -r1.4 -r1.5 --- test_unijohab.py 13 Jan 2003 08:43:48 -0000 1.4 +++ test_unijohab.py 13 Jan 2003 09:09:59 -0000 1.5 @@ -4,7 +4,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_unijohab.py,v 1.4 2003/01/13 08:43:48 perky Exp $ +# $Id: test_unijohab.py,v 1.5 2003/01/13 09:09:59 perky Exp $ # import CodecTestBase |
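The "LGPL starts with version 2.1 not 2" commits apply one mechanical substitution across the tools, tests, C sources, and mapping modules. Below is a sketch of such a bulk edit in Python. It assumes the clause always appears on one line, exactly as in these diffs. The names `bump_lgpl_version` and `bump_tree` are hypothetical, not part of KoreanCodecs.

```python
from pathlib import Path

# Assumption: the clause sits on one line exactly as in the diffs above.
OLD = b"either version 2 of the License, or"
NEW = b"either version 2.1 of the License, or"


def bump_lgpl_version(data):
    """Return (updated bytes, number of clauses replaced)."""
    return data.replace(OLD, NEW), data.count(OLD)


def bump_tree(root, patterns=("*.py", "*.c", "*.h")):
    """Rewrite matching files under `root`; return the paths that changed."""
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            # Work on bytes so non-UTF-8 sources and CRLF endings survive.
            new, count = bump_lgpl_version(path.read_bytes())
            if count:
                path.write_bytes(new)
                changed.append(path)
    return changed
```

The substitution is idempotent, because the replaced text no longer contains the old clause. A second run therefore reports no changed files.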
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:09:59
perky 03/01/13 01:09:59 Modified: src _koco.c _koco_ksc5601.h _koco_uhc.h _koco_wansungenc.h cp949_codec.h euckr_codec.h hangul.c koco_stream.h Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.27 +5 -5 KoreanCodecs/src/_koco.c Index: _koco.c =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco.c,v retrieving revision 1.26 retrieving revision 1.27 diff -u -r1.26 -r1.27 --- _koco.c 2 Jan 2003 10:02:56 -0000 1.26 +++ _koco.c 13 Jan 2003 09:09:57 -0000 1.27 @@ -1,17 +1,17 @@ /* - * _koco.c - $Revision: 1.26 $ + * _koco.c - $Revision: 1.27 $ * * KoreanCodecs C Implementations * * Author : Hye-Shik Chang <pe...@Fr...> - * Date : $Date: 2003/01/02 10:02:56 $ + * Date : $Date: 2003/01/13 09:09:57 $ * Created : 15 March 2002 * * This file is part of KoreanCodecs. * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -24,7 +24,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -static char *version = "$Revision: 1.26 $"; +static char *version = "$Revision: 1.27 $"; #include "Python.h" @@ -177,6 +177,6 @@ } /* - * $Id: _koco.c,v 1.26 2003/01/02 10:02:56 perky Exp $ + * $Id: _koco.c,v 1.27 2003/01/13 09:09:57 perky Exp $ * ex: ts=8 sts=4 et */ 1.11 +2 -2 KoreanCodecs/src/_koco_ksc5601.h Index: _koco_ksc5601.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco_ksc5601.h,v retrieving revision 1.10 retrieving revision 1.11 diff -u -r1.10 -r1.11 --- _koco_ksc5601.h 2 Jan 2003 10:02:56 -0000 1.10 +++ _koco_ksc5601.h 13 Jan 2003 09:09:57 -0000 1.11 @@ -3,7 +3,7 @@ * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Generated by tablegen.py on Thu Jan 2 08:47:50 2003 - * $Id: _koco_ksc5601.h,v 1.10 2003/01/02 10:02:56 perky Exp $ + * $Id: _koco_ksc5601.h,v 1.11 2003/01/13 09:09:57 perky Exp $ */ #define ksc5601_decode_bottom 161 1.10 +2 -2 KoreanCodecs/src/_koco_uhc.h Index: _koco_uhc.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco_uhc.h,v retrieving revision 1.9 retrieving revision 1.10 diff -u -r1.9 -r1.10 --- _koco_uhc.h 2 Jan 2003 10:02:56 -0000 1.9 +++ _koco_uhc.h 13 Jan 2003 09:09:58 -0000 1.10 @@ -3,7 +3,7 @@ * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Generated by tablegen.py on Thu Jan 2 08:47:50 2003 - * $Id: _koco_uhc.h,v 1.9 2003/01/02 10:02:56 perky Exp $ + * $Id: _koco_uhc.h,v 1.10 2003/01/13 09:09:58 perky Exp $ */ #define uhc_page0_bottom 0x41 1.2 +2 -2 KoreanCodecs/src/_koco_wansungenc.h Index: _koco_wansungenc.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco_wansungenc.h,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- _koco_wansungenc.h 2 Jan 2003 10:02:57 -0000 1.1 +++ _koco_wansungenc.h 13 Jan 2003 09:09:58 -0000 1.2 @@ -3,7 +3,7 @@ * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -16,7 +16,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Generated by tablegen.py on Thu Jan 2 08:47:50 2003 - * $Id: _koco_wansungenc.h,v 1.1 2003/01/02 10:02:57 perky Exp $ + * $Id: _koco_wansungenc.h,v 1.2 2003/01/13 09:09:58 perky Exp $ */ static const DBYTECHAR wansung_encode_page0[945] = { /* 0x00a1 - 0x0451 */ 1.11 +4 -4 KoreanCodecs/src/cp949_codec.h Index: cp949_codec.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/cp949_codec.h,v retrieving revision 1.10 retrieving revision 1.11 diff -u -r1.10 -r1.11 --- cp949_codec.h 2 Jan 2003 10:02:57 -0000 1.10 +++ cp949_codec.h 13 Jan 2003 09:09:58 -0000 1.11 @@ -1,17 +1,17 @@ /* - * cp949_codec.h - $Revision: 1.10 $ + * cp949_codec.h - $Revision: 1.11 $ * * KoreanCodecs CP949 Codec C Implementation * * Author : Hye-Shik Chang <pe...@Fr...> - * Date : $Date: 2003/01/02 10:02:57 $ + * Date : $Date: 2003/01/13 09:09:58 $ * Created : 15 March 2002 * * This file is part of KoreanCodecs. * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -184,6 +184,6 @@ } /* - * $Id: cp949_codec.h,v 1.10 2003/01/02 10:02:57 perky Exp $ + * $Id: cp949_codec.h,v 1.11 2003/01/13 09:09:58 perky Exp $ * ex: ts=8 sts=4 et */ 1.11 +4 -4 KoreanCodecs/src/euckr_codec.h Index: euckr_codec.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/euckr_codec.h,v retrieving revision 1.10 retrieving revision 1.11 diff -u -r1.10 -r1.11 --- euckr_codec.h 2 Jan 2003 10:02:57 -0000 1.10 +++ euckr_codec.h 13 Jan 2003 09:09:58 -0000 1.11 @@ -1,17 +1,17 @@ /* - * euckr_codec.h - $Revision: 1.10 $ + * euckr_codec.h - $Revision: 1.11 $ * * KoreanCodecs EUC-KR Codec C Implementation * * Author : Hye-Shik Chang <pe...@Fr...> - * Date : $Date: 2003/01/02 10:02:57 $ + * Date : $Date: 2003/01/13 09:09:58 $ * Created : 15 March 2002 * * This file is part of KoreanCodecs. * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -159,6 +159,6 @@ } /* - * $Id: euckr_codec.h,v 1.10 2003/01/02 10:02:57 perky Exp $ + * $Id: euckr_codec.h,v 1.11 2003/01/13 09:09:58 perky Exp $ * ex: ts=8 sts=4 et */ 1.17 +5 -5 KoreanCodecs/src/hangul.c Index: hangul.c =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/hangul.c,v retrieving revision 1.16 retrieving revision 1.17 diff -u -r1.16 -r1.17 --- hangul.c 2 Jan 2003 03:44:41 -0000 1.16 +++ hangul.c 13 Jan 2003 09:09:58 -0000 1.17 @@ -1,17 +1,17 @@ /* - * hangul.c - $Revision: 1.16 $ + * hangul.c - $Revision: 1.17 $ * * KoreanCodecs Hangul Module C Implementation * * Author : Hye-Shik Chang <pe...@Fr...> - * Date : $Date: 2003/01/02 03:44:41 $ + * Date : $Date: 2003/01/13 09:09:58 $ * Created : 25 April 2002 * * This file is part of KoreanCodecs. * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. 
* * KoreanCodecs is distributed in the hope that it will be useful, @@ -24,7 +24,7 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ -static char *version = "$Revision: 1.16 $"; +static char *version = "$Revision: 1.17 $"; #include "Python.h" @@ -829,6 +829,6 @@ } /* - * $Id: hangul.c,v 1.16 2003/01/02 03:44:41 perky Exp $ + * $Id: hangul.c,v 1.17 2003/01/13 09:09:58 perky Exp $ * ex: ts=8 sts=4 et */ 1.13 +4 -4 KoreanCodecs/src/koco_stream.h Index: koco_stream.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/koco_stream.h,v retrieving revision 1.12 retrieving revision 1.13 diff -u -r1.12 -r1.13 --- koco_stream.h 2 Jan 2003 07:44:40 -0000 1.12 +++ koco_stream.h 13 Jan 2003 09:09:58 -0000 1.13 @@ -1,17 +1,17 @@ /* - * euckr_stream.c - $Revision: 1.12 $ + * euckr_stream.c - $Revision: 1.13 $ * * KoreanCodecs EUC-KR StreamReader C Implementation * * Author : Hye-Shik Chang <pe...@Fr...> - * Date : $Date: 2003/01/02 07:44:40 $ + * Date : $Date: 2003/01/13 09:09:58 $ * Created : 28 April 2002 * * This file is part of KoreanCodecs. * * KoreanCodecs is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published - * by the Free Software Foundation; either version 2 of the License, or + * by the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. * * KoreanCodecs is distributed in the hope that it will be useful, @@ -608,6 +608,6 @@ }; /* - * $Id: koco_stream.h,v 1.12 2003/01/02 07:44:40 perky Exp $ + * $Id: koco_stream.h,v 1.13 2003/01/13 09:09:58 perky Exp $ * ex: ts=8 sts=4 et */ |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:09:58
|
perky 03/01/13 01:09:57 Modified: korean/mappings __init__.py appleextension.py johab_ideograph.py Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.5 +2 -2 KoreanCodecs/korean/mappings/__init__.py Index: __init__.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/mappings/__init__.py,v retrieving revision 1.4 retrieving revision 1.5 diff -u -r1.4 -r1.5 --- __init__.py 9 Jan 2003 21:35:49 -0000 1.4 +++ __init__.py 13 Jan 2003 09:09:57 -0000 1.5 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,5 +17,5 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: __init__.py,v 1.4 2003/01/09 21:35:49 perky Exp $ +# $Id: __init__.py,v 1.5 2003/01/13 09:09:57 perky Exp $ # 1.3 +2 -2 KoreanCodecs/korean/mappings/appleextension.py Index: appleextension.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/mappings/appleextension.py,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- appleextension.py 10 Jan 2003 03:14:22 -0000 1.2 +++ appleextension.py 13 Jan 2003 09:09:57 -0000 1.3 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: appleextension.py,v 1.2 2003/01/10 03:14:22 perky Exp $ +# $Id: appleextension.py,v 1.3 2003/01/13 09:09:57 perky Exp $ # decoding_map = { 1.5 +2 -2 KoreanCodecs/korean/mappings/johab_ideograph.py Index: johab_ideograph.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/mappings/johab_ideograph.py,v retrieving revision 1.4 retrieving revision 1.5 diff -u -r1.4 -r1.5 --- johab_ideograph.py 9 Jan 2003 21:35:49 -0000 1.4 +++ johab_ideograph.py 13 Jan 2003 09:09:57 -0000 1.5 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: johab_ideograph.py,v 1.4 2003/01/09 21:35:49 perky Exp $ +# $Id: johab_ideograph.py,v 1.5 2003/01/13 09:09:57 perky Exp $ # decoding_map = { |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:09:57
|
perky 03/01/13 01:09:56 Modified: korean __init__.py aliases.py cp949.py error_callback.py euc_kr.py iso_2022_kr.py mac_korean.py qwerty2bul.py unijohab.py Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.8 +2 -2 KoreanCodecs/korean/__init__.py Index: __init__.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/__init__.py,v retrieving revision 1.7 retrieving revision 1.8 diff -u -r1.7 -r1.8 --- __init__.py 9 Jan 2003 21:35:48 -0000 1.7 +++ __init__.py 13 Jan 2003 09:09:55 -0000 1.8 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,5 +17,5 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: __init__.py,v 1.7 2003/01/09 21:35:48 perky Exp $ +# $Id: __init__.py,v 1.8 2003/01/13 09:09:55 perky Exp $ # 1.12 +2 -2 KoreanCodecs/korean/aliases.py Index: aliases.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/aliases.py,v retrieving revision 1.11 retrieving revision 1.12 diff -u -r1.11 -r1.12 --- aliases.py 12 Jan 2003 23:04:56 -0000 1.11 +++ aliases.py 13 Jan 2003 09:09:56 -0000 1.12 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: aliases.py,v 1.11 2003/01/12 23:04:56 perky Exp $ +# $Id: aliases.py,v 1.12 2003/01/13 09:09:56 perky Exp $ # import encodings.aliases 1.6 +2 -2 KoreanCodecs/korean/cp949.py Index: cp949.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/cp949.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- cp949.py 12 Jan 2003 22:54:12 -0000 1.5 +++ cp949.py 13 Jan 2003 09:09:56 -0000 1.6 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: cp949.py,v 1.5 2003/01/12 22:54:12 perky Exp $ +# $Id: cp949.py,v 1.6 2003/01/13 09:09:56 perky Exp $ # import codecs 1.4 +2 -2 KoreanCodecs/korean/error_callback.py Index: error_callback.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/error_callback.py,v retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- error_callback.py 13 Jan 2003 07:52:47 -0000 1.3 +++ error_callback.py 13 Jan 2003 09:09:56 -0000 1.4 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: error_callback.py,v 1.3 2003/01/13 07:52:47 perky Exp $ +# $Id: error_callback.py,v 1.4 2003/01/13 09:09:56 perky Exp $ # try: 1.6 +2 -2 KoreanCodecs/korean/euc_kr.py Index: euc_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/euc_kr.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- euc_kr.py 12 Jan 2003 22:54:12 -0000 1.5 +++ euc_kr.py 13 Jan 2003 09:09:56 -0000 1.6 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: euc_kr.py,v 1.5 2003/01/12 22:54:12 perky Exp $ +# $Id: euc_kr.py,v 1.6 2003/01/13 09:09:56 perky Exp $ # import codecs 1.7 +2 -2 KoreanCodecs/korean/iso_2022_kr.py Index: iso_2022_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/iso_2022_kr.py,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- iso_2022_kr.py 12 Jan 2003 23:01:34 -0000 1.6 +++ iso_2022_kr.py 13 Jan 2003 09:09:56 -0000 1.7 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: iso_2022_kr.py,v 1.6 2003/01/12 23:01:34 perky Exp $ +# $Id: iso_2022_kr.py,v 1.7 2003/01/13 09:09:56 perky Exp $ # import codecs 1.3 +2 -2 KoreanCodecs/korean/mac_korean.py Index: mac_korean.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/mac_korean.py,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- mac_korean.py 12 Jan 2003 23:22:32 -0000 1.2 +++ mac_korean.py 13 Jan 2003 09:09:56 -0000 1.3 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: mac_korean.py,v 1.2 2003/01/12 23:22:32 perky Exp $ +# $Id: mac_korean.py,v 1.3 2003/01/13 09:09:56 perky Exp $ # import codecs 1.7 +2 -2 KoreanCodecs/korean/qwerty2bul.py Index: qwerty2bul.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/qwerty2bul.py,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- qwerty2bul.py 12 Jan 2003 22:57:19 -0000 1.6 +++ qwerty2bul.py 13 Jan 2003 09:09:56 -0000 1.7 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. 
# # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: qwerty2bul.py,v 1.6 2003/01/12 22:57:19 perky Exp $ +# $Id: qwerty2bul.py,v 1.7 2003/01/13 09:09:56 perky Exp $ # import codecs 1.6 +2 -2 KoreanCodecs/korean/unijohab.py Index: unijohab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/unijohab.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- unijohab.py 12 Jan 2003 22:54:12 -0000 1.5 +++ unijohab.py 13 Jan 2003 09:09:56 -0000 1.6 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: unijohab.py,v 1.5 2003/01/12 22:54:12 perky Exp $ +# $Id: unijohab.py,v 1.6 2003/01/13 09:09:56 perky Exp $ # import codecs |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 09:09:56
|
perky 03/01/13 01:09:55 Modified: . setup.py Log: LGPL starts with version 2.1 not 2 Revision Changes Path 1.34 +2 -2 KoreanCodecs/setup.py Index: setup.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v retrieving revision 1.33 retrieving revision 1.34 diff -u -r1.33 -r1.34 --- setup.py 12 Jan 2003 23:46:36 -0000 1.33 +++ setup.py 13 Jan 2003 09:09:54 -0000 1.34 @@ -6,7 +6,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -18,7 +18,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: setup.py,v 1.33 2003/01/12 23:46:36 perky Exp $ +# $Id: setup.py,v 1.34 2003/01/13 09:09:54 perky Exp $ # import sys |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 08:43:49
|
perky 03/01/13 00:43:48 Modified: test CodecTestBase.py test_cp949.py test_euc_kr.py test_iso_2022_kr.py test_mackorean.py test_qwerty2bul.py test_unijohab.py Log: Add callback tests to all codecs! Revision Changes Path 1.12 +3 -3 KoreanCodecs/test/CodecTestBase.py Index: CodecTestBase.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/CodecTestBase.py,v retrieving revision 1.11 retrieving revision 1.12 diff -u -r1.11 -r1.12 --- CodecTestBase.py 13 Jan 2003 08:38:36 -0000 1.11 +++ CodecTestBase.py 13 Jan 2003 08:43:47 -0000 1.12 @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: CodecTestBase.py,v 1.11 2003/01/13 08:38:36 perky Exp $ +# $Id: CodecTestBase.py,v 1.12 2003/01/13 08:43:47 perky Exp $ # import StringIO @@ -83,14 +83,14 @@ class TestCodecErrorCallback: if sys.hexversion >= 0x2030000: - def test_xmlcharrefreplace(self): + def test_StandardReplaceCallback(self): s = u"\u30b9\u30d1\u30e2 \xe4nd eggs" self.assertEqual( s.encode(self.encoding, "xmlcharrefreplace"), "スパモ änd eggs" ) - def test_xmlcharnamereplace(self): + def test_CustomReplaceCallback(self): import htmlentitydefs names = {} 1.13 +4 -2 KoreanCodecs/test/test_cp949.py Index: test_cp949.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_cp949.py,v retrieving revision 1.12 retrieving revision 1.13 diff -u -r1.12 -r1.13 --- test_cp949.py 12 Jan 2003 22:54:13 -0000 1.12 +++ test_cp949.py 13 Jan 2003 08:43:48 -0000 1.13 @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_cp949.py,v 1.12 2003/01/12 22:54:13 perky Exp $ +# $Id: test_cp949.py,v 1.13 2003/01/13 08:43:48 perky Exp $ # import CodecTestBase @@ -24,7 +24,9 @@ def unichrs(s): return 
u''.join(map(unichr, map(eval, s.split('+')))) -class TestCP949(CodecTestBase.TestStreamReader, CodecTestBase.CodecTestBase): +class TestCP949(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.TestStreamReader, + CodecTestBase.CodecTestBase): encoding = 'korean.cp949' textfile_chunk = ('texts/cp949', 'texts/cp949.utf-8') errortests = ( 1.10 +4 -2 KoreanCodecs/test/test_euc_kr.py Index: test_euc_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_euc_kr.py,v retrieving revision 1.9 retrieving revision 1.10 diff -u -r1.9 -r1.10 --- test_euc_kr.py 12 Jan 2003 22:54:13 -0000 1.9 +++ test_euc_kr.py 13 Jan 2003 08:43:48 -0000 1.10 @@ -16,12 +16,14 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_euc_kr.py,v 1.9 2003/01/12 22:54:13 perky Exp $ +# $Id: test_euc_kr.py,v 1.10 2003/01/13 08:43:48 perky Exp $ # import CodecTestBase -class TestEUCKR(CodecTestBase.TestStreamReader, CodecTestBase.CodecTestBase): +class TestEUCKR(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.TestStreamReader, + CodecTestBase.CodecTestBase): encoding = 'korean.euc-kr' textfile_chunk = ('texts/euc-kr', 'texts/euc-kr.utf-8') errortests = ( 1.8 +3 -2 KoreanCodecs/test/test_iso_2022_kr.py Index: test_iso_2022_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_iso_2022_kr.py,v retrieving revision 1.7 retrieving revision 1.8 diff -u -r1.7 -r1.8 --- test_iso_2022_kr.py 12 Jan 2003 22:54:13 -0000 1.7 +++ test_iso_2022_kr.py 13 Jan 2003 08:43:48 -0000 1.8 @@ -16,12 +16,13 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_iso_2022_kr.py,v 1.7 2003/01/12 22:54:13 perky Exp $ +# $Id: test_iso_2022_kr.py,v 1.8 2003/01/13 08:43:48 perky Exp $ # import 
CodecTestBase -class TestISO_2022_KR(CodecTestBase.CodecTestBase): +class TestISO_2022_KR(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.CodecTestBase): encoding = 'korean.iso-2022-kr' textfile_chunk = ('texts/iso-2022-kr.roundrobin', 'texts/iso-2022-kr.utf-8') 1.7 +4 -2 KoreanCodecs/test/test_mackorean.py Index: test_mackorean.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_mackorean.py,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- test_mackorean.py 12 Jan 2003 23:04:56 -0000 1.6 +++ test_mackorean.py 13 Jan 2003 08:43:48 -0000 1.7 @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_mackorean.py,v 1.6 2003/01/12 23:04:56 perky Exp $ +# $Id: test_mackorean.py,v 1.7 2003/01/13 08:43:48 perky Exp $ # import CodecTestBase @@ -24,7 +24,9 @@ def unichrs(s): return u''.join(map(unichr, map(eval, s.split('+')))) -class TestMacKorean(CodecTestBase.TestStreamReader, CodecTestBase.CodecTestBase): +class TestMacKorean(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.TestStreamReader, + CodecTestBase.CodecTestBase): encoding = 'korean.mac_korean' textfile_chunk = ('texts/mackorean', 'texts/mackorean.utf-8') errortests = ( 1.6 +3 -2 KoreanCodecs/test/test_qwerty2bul.py Index: test_qwerty2bul.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_qwerty2bul.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- test_qwerty2bul.py 12 Jan 2003 22:54:13 -0000 1.5 +++ test_qwerty2bul.py 13 Jan 2003 08:43:48 -0000 1.6 @@ -16,12 +16,13 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_qwerty2bul.py,v 1.5 2003/01/12 22:54:13 perky Exp $ +# $Id: test_qwerty2bul.py,v 1.6 
2003/01/13 08:43:48 perky Exp $ # import CodecTestBase -class TestQWERTY2BUL(CodecTestBase.CodecTestBase): +class TestQWERTY2BUL(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.CodecTestBase): encoding = 'korean.qwerty2bul' errortests = ( # invalid bytes 1.4 +3 -2 KoreanCodecs/test/test_unijohab.py Index: test_unijohab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_unijohab.py,v retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- test_unijohab.py 12 Jan 2003 22:54:13 -0000 1.3 +++ test_unijohab.py 13 Jan 2003 08:43:48 -0000 1.4 @@ -16,12 +16,13 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_unijohab.py,v 1.3 2003/01/12 22:54:13 perky Exp $ +# $Id: test_unijohab.py,v 1.4 2003/01/13 08:43:48 perky Exp $ # import CodecTestBase -class TestUNIJOHAB(CodecTestBase.CodecTestBase): +class TestUNIJOHAB(CodecTestBase.TestCodecErrorCallback, + CodecTestBase.CodecTestBase): encoding = 'korean.unijohab' errortests = () # error handling is relying UTF-8 codec. |
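The commit above turns the callback tests into a mixin. Each concrete class lists `CodecTestBase.TestCodecErrorCallback` ahead of its other bases and supplies only the `encoding` attribute, so one set of test methods runs against every codec. Below is a minimal sketch of the same pattern with Python 3 `unittest`. It assumes the standard library's `euc_kr` and `cp949` codecs as stand-ins for the `korean.*` codecs, and the class names are hypothetical:

```python
import sys
import unittest


class CodecErrorCallbackMixin:
    """Shared tests; a concrete subclass must set `encoding`.

    Not a TestCase itself, so unittest never runs it on its own.
    """

    encoding = None  # set by each concrete test class

    if sys.hexversion >= 0x2030000:  # PEP 293 handlers exist since 2.3
        def test_xmlcharrefreplace(self):
            # U+1234 is in neither EUC-KR nor CP949, so the handler turns it
            # into a numeric character reference; ASCII passes through as-is.
            s = "a\u1234b"
            self.assertEqual(s.encode(self.encoding, "xmlcharrefreplace"),
                             b"a&#4660;b")

    def test_hangul_roundtrip(self):
        s = "\ud55c\uae00"  # the word "Hangul" written in Hangul
        self.assertEqual(s.encode(self.encoding).decode(self.encoding), s)


class TestEUCKR(CodecErrorCallbackMixin, unittest.TestCase):
    encoding = "euc_kr"


class TestCP949(CodecErrorCallbackMixin, unittest.TestCase):
    encoding = "cp949"
```

The mixin does not inherit from `unittest.TestCase`, so the loader collects only the concrete classes. Run them with `python -m unittest` on the module. The `sys.hexversion` guard mirrors the original's check for Python 2.3 (`0x2030000`), the release that introduced PEP 293 handlers. On Python 3 the guard is always true.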
From: Hye-Shik C. <pe...@us...> - 2003-01-13 08:40:21
|
perky 03/01/13 00:40:19 Modified: korean johab.py Log: Prepare for `c` isn't two-letters. Revision Changes Path 1.7 +2 -3 KoreanCodecs/korean/johab.py Index: johab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/johab.py,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- johab.py 13 Jan 2003 08:10:35 -0000 1.6 +++ johab.py 13 Jan 2003 08:40:17 -0000 1.7 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: johab.py,v 1.6 2003/01/13 08:10:35 perky Exp $ +# $Id: johab.py,v 1.7 2003/01/13 08:40:17 perky Exp $ # import codecs @@ -146,8 +146,7 @@ continue exc = UnicodeDecodeError(ENCODING, data, p-2, p, - "unexpected byte 0x%02x%02x found" % ( - ord(c[0]), ord(c[1]))) + "unexpected byte %s found" % repr(c)) repl, p = errcb(exc) buffer.append(repl) |
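Here `c` is taken as `data[p:p+2]`. If `c` can ever hold a single byte (for example, a lead byte at the very end of the input), the old `ord(c[1])` would raise `IndexError` while the error message is being built. `repr(c)` formats a chunk of any length. Below is a minimal Python 3 sketch of a PEP 293 decode loop built around that idea. It assumes a hypothetical one-entry table `DECMAP` in place of the real JOHAB map and a made-up codec name `toy-johab`:

```python
import codecs

# Hypothetical one-entry table standing in for the real JOHAB decode map.
DECMAP = {b"\x88\x61": "\uac00"}


def toy_decode(data, errors="strict"):
    """Decode bytes, delegating undecodable input to a PEP 293 handler."""
    handler = codecs.lookup_error(errors)
    out = []
    p = 0
    while p < len(data):
        if data[p] < 0x80:              # ASCII byte maps to itself
            out.append(chr(data[p]))
            p += 1
            continue
        c = data[p:p + 2]               # only 1 byte if the input ends here
        if c in DECMAP:
            out.append(DECMAP[c])
            p += 2
            continue
        # repr() works for a 1- or 2-byte chunk; c[1] would not.
        exc = UnicodeDecodeError("toy-johab", data, p, p + len(c),
                                 "unexpected byte %r found" % c)
        repl, p = handler(exc)          # the handler picks where to resume
        out.append(repl)
    return "".join(out)
```

With `"replace"`, `toy_decode(b"a\x88\x61\x88", "replace")` yields `"a\uac00\ufffd"`, because the trailing lone `\x88` becomes U+FFFD. The sketch does not handle negative resume positions, which PEP 293 allows handlers to return.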
From: Hye-Shik C. <pe...@us...> - 2003-01-13 08:38:38
|
perky 03/01/13 00:38:37 Modified: test CodecTestBase.py test_johab.py Log: Add PEP293 unit test framework Revision Changes Path 1.11 +44 -1 KoreanCodecs/test/CodecTestBase.py Index: CodecTestBase.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/CodecTestBase.py,v retrieving revision 1.10 retrieving revision 1.11 diff -u -r1.10 -r1.11 --- CodecTestBase.py 12 Jan 2003 22:54:13 -0000 1.10 +++ CodecTestBase.py 13 Jan 2003 08:38:36 -0000 1.11 @@ -16,7 +16,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: CodecTestBase.py,v 1.10 2003/01/12 22:54:13 perky Exp $ +# $Id: CodecTestBase.py,v 1.11 2003/01/13 08:38:36 perky Exp $ # import StringIO @@ -43,6 +43,7 @@ errortests = None # must set. error test tuple roundtriptest = 1 # set if roundtrip is possible with unicode + has_iso10646 = 0 # set if this encoding contains whole iso10646 map def setUp(self): if not self.textfile_chunk: @@ -77,6 +78,48 @@ except UnicodeError: continue self.fail('UnicodeError expected') + + +class TestCodecErrorCallback: + + if sys.hexversion >= 0x2030000: + def test_xmlcharrefreplace(self): + s = u"\u30b9\u30d1\u30e2 \xe4nd eggs" + self.assertEqual( + s.encode(self.encoding, "xmlcharrefreplace"), + "スパモ änd eggs" + ) + + def test_xmlcharnamereplace(self): + import htmlentitydefs + + names = {} + for (key, value) in htmlentitydefs.entitydefs.items(): + if len(value)==1: + names[unicode(value, 'latin-1')] = \ + unicode(key, self.encoding) + else: + names[unichr(int(value[2:-1]))] = \ + unicode(key, self.encoding) + + def xmlcharnamereplace(exc): + if not isinstance(exc, UnicodeEncodeError): + raise TypeError("don't know how to handle %r" % exc) + l = [] + for c in exc.object[exc.start:exc.end]: + try: + l.append(u"&%s;" % names[c]) + except KeyError: + l.append(u"&#%d;" % ord(c)) + return (u"".join(l), exc.end) + + 
codecs.register_error( + "test.xmlcharnamereplace", xmlcharnamereplace) + + sin = u"\xab\u211c\xbb = \u2329\u1234\u20ac\u232a" + sout = "«ℜ» = ⟨ሴ€⟩" + self.assertEqual(sin.encode(self.encoding, + "test.xmlcharnamereplace"), sout) class TestStreamReader: 1.5 +5 -2 KoreanCodecs/test/test_johab.py Index: test_johab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/test/test_johab.py,v retrieving revision 1.4 retrieving revision 1.5 diff -u -r1.4 -r1.5 --- test_johab.py 12 Jan 2003 22:54:13 -0000 1.4 +++ test_johab.py 13 Jan 2003 08:38:36 -0000 1.5 @@ -16,12 +16,15 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: test_johab.py,v 1.4 2003/01/12 22:54:13 perky Exp $ +# $Id: test_johab.py,v 1.5 2003/01/13 08:38:36 perky Exp $ # import CodecTestBase -class TestJOHAB(CodecTestBase.TestStreamReader, CodecTestBase.CodecTestBase): +class TestJOHAB(CodecTestBase.TestCodecErrorCallback, + #CodecTestBase.TestStreamWriter, + CodecTestBase.TestStreamReader, + CodecTestBase.CodecTestBase): encoding = 'korean.johab' errortests = ( # invalid bytes |
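The `xmlcharnamereplace` test above registers its own PEP 293 handler. The handler receives a `UnicodeEncodeError`, inspects `exc.object[exc.start:exc.end]`, and returns a `(replacement, resume_position)` pair. Below is a Python 3 sketch of the same handler. It uses `html.entities.codepoint2name` in place of the table the test builds by hand from Python 2's `htmlentitydefs`. It also encodes to ASCII, so that every non-ASCII character reaches the handler:

```python
import codecs
from html.entities import codepoint2name


def xmlcharnamereplace(exc):
    """Replace unencodable characters with HTML entity names.

    Falls back to a decimal character reference when HTML 4 defines
    no name for the code point.
    """
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        parts.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Resume encoding right after the span the codec rejected.
    return "".join(parts), exc.end


codecs.register_error("test.xmlcharnamereplace", xmlcharnamereplace)

sin = "\xab\u211c\xbb = \u2329\u1234\u20ac\u232a"
sout = sin.encode("ascii", "test.xmlcharnamereplace")
```

Here `sout` should be `b"&laquo;&real;&raquo; = &lang;&#4660;&euro;&rang;"`. U+1234 has no HTML 4 name, so it falls back to `&#4660;`. With a Korean codec instead of ASCII, characters that the codec can encode never reach the handler, so the output would differ.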
From: Hye-Shik C. <pe...@us...> - 2003-01-13 08:10:37
|
perky 03/01/13 00:10:35 Modified: korean johab.py Log: Add PEP293 support to johab codec. Revision Changes Path 1.6 +30 -22 KoreanCodecs/korean/johab.py Index: johab.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/johab.py,v retrieving revision 1.5 retrieving revision 1.6 diff -u -r1.5 -r1.6 --- johab.py 12 Jan 2003 22:54:12 -0000 1.5 +++ johab.py 13 Jan 2003 08:10:35 -0000 1.6 @@ -5,7 +5,7 @@ # # KoreanCodecs is free software; you can redistribute it and/or modify # it under the terms of the GNU Lesser General Public License as published -# by the Free Software Foundation; either version 2 of the License, or +# by the Free Software Foundation; either version 2.1 of the License, or # (at your option) any later version. # # KoreanCodecs is distributed in the hope that it will be useful, @@ -17,14 +17,16 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: johab.py,v 1.5 2003/01/12 22:54:12 perky Exp $ +# $Id: johab.py,v 1.6 2003/01/13 08:10:35 perky Exp $ # import codecs - +from korean.error_callback import * from korean.hangul import Jaeum, Moeum, ishangul, split, join -encmap, decmap = {}, {} +ENCODING = 'korean.johab' + +encmap, decmap = {}, {} johab2uni_chosung = { 1: u'', 2: Jaeum.G, 3: Jaeum.GG, 4: Jaeum.N, 5: Jaeum.D, 6: Jaeum.DD, 7: Jaeum.L, 8: Jaeum.M, @@ -66,13 +68,16 @@ def encode(self, data, errors='strict'): global encmap - if errors not in ('strict', 'ignore', 'replace'): - raise ValueError, "unknown error handling" - buffer = [] + errcb = lookup_error(errors) + buffer = [] + pos = 0 + size = len(data) + + while pos < size: + c = data[pos] - for c in data: if c < u'\u0080': - buffer.append(c.encode("ascii", errors)) + buffer.append(chr(ord(c))) elif ishangul(c): cho, jung, jong = split(c) # all hangul can success cho, jung, jong = ( @@ -89,10 +94,14 @@ if encmap.has_key(c): 
buffer.append(encmap[c]) - elif errors == 'replace': - buffer.append('\x84\x41') - elif errors == 'strict': - raise UnicodeError, "cannot map \\u%04x to JOHAB" % ord(c) + else: + exc = UnicodeEncodeError(ENCODING, data, pos, pos+1, + "cannot map \\u%04x to JOHAB" % ord(c)) + repl, pos = errcb(exc) + buffer.append(repl.encode(ENCODING)) # must be 'strict'. + continue + + pos += 1 return (''.join(buffer), len(data)) @@ -100,8 +109,7 @@ def decode(self, data, errors='strict'): global decmap - if errors not in ('strict', 'ignore', 'replace'): - raise ValueError, "unknown error handling" + errcb = lookup_error(errors) buffer = [] data = str(data) # character buffer compatible object @@ -109,7 +117,7 @@ p = 0 while p < size: if data[p] < '\x80': - buffer.append(unicode(data[p], "ascii", errors)) + buffer.append(unichr(ord(data[p]))) p += 1 else: c = data[p:p+2] @@ -137,10 +145,11 @@ buffer.append(decmap[c]) continue - if errors == 'replace': - buffer.append(u'\uFFFD') # REPLACEMENT CHARACTER - elif errors == 'strict': - raise UnicodeError, "unexpected byte 0x%02x%02x found" % tuple(map(ord, c)) + exc = UnicodeDecodeError(ENCODING, data, p-2, p, + "unexpected byte 0x%02x%02x found" % ( + ord(c[0]), ord(c[1]))) + repl, p = errcb(exc) + buffer.append(repl) return (u''.join(buffer), size) @@ -197,8 +206,7 @@ def reset(self): self.data = '' -### encodings module API - def getregentry(): return (Codec().encode,Codec().decode,StreamReader,StreamWriter) +# ex: ts=8 sts=4 et |
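With PEP 293 the encoder no longer special-cases `'strict'`, `'ignore'` and `'replace'`. Instead it walks the input by index. For each unmappable character it builds a `UnicodeEncodeError`, passes it to the handler from `lookup_error(errors)`, and resumes at the position the handler returns. The replacement text is then encoded again with the codec itself. Below is a minimal Python 3 sketch of that loop. It assumes a hypothetical one-entry table `ENCMAP` in place of the real JOHAB tables and a made-up codec name `toy-johab`:

```python
import codecs

# Hypothetical one-entry table standing in for the real JOHAB encode map.
ENCMAP = {"\uac00": b"\x88\x61"}


def toy_encode(data, errors="strict"):
    """Encode str `data`; unmappable characters go to a PEP 293 handler."""
    handler = codecs.lookup_error(errors)
    out = []
    pos = 0
    while pos < len(data):
        c = data[pos]
        if c < "\x80":                       # ASCII passes through
            out.append(c.encode("ascii"))
        elif c in ENCMAP:
            out.append(ENCMAP[c])
        else:
            exc = UnicodeEncodeError("toy-johab", data, pos, pos + 1,
                                     "cannot map \\u%04x" % ord(c))
            repl, pos = handler(exc)         # the handler picks where to resume
            if isinstance(repl, bytes):      # Python 3 handlers may return bytes
                out.append(repl)
            else:                            # re-encode the replacement strictly
                out.append(toy_encode(repl, "strict")[0])
            continue
        pos += 1
    return b"".join(out), len(data)
```

For example, `toy_encode("a\uac00\u1234", "xmlcharrefreplace")` gives `(b"a\x88a&#4660;", 3)`. Encoding the replacement strictly means a handler that returns unencodable text raises an error instead of looping.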
From: Hye-Shik C. <pe...@us...> - 2003-01-13 07:52:48
|
perky 03/01/12 23:52:47 Modified: korean error_callback.py Log: Fix one missing _ Revision Changes Path 1.3 +2 -2 KoreanCodecs/korean/error_callback.py Index: error_callback.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/error_callback.py,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- error_callback.py 13 Jan 2003 07:37:57 -0000 1.2 +++ error_callback.py 13 Jan 2003 07:52:47 -0000 1.3 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: error_callback.py,v 1.2 2003/01/13 07:37:57 perky Exp $ +# $Id: error_callback.py,v 1.3 2003/01/13 07:52:47 perky Exp $ # try: @@ -69,7 +69,7 @@ } def lookup_error(name): - cb = error_callbacks.get(name) + cb = _error_callbacks.get(name) if cb: return cb else: |
From: Hye-Shik C. <pe...@us...> - 2003-01-13 07:37:58
|
perky 03/01/12 23:37:57 Modified: korean error_callback.py Log: Hide internal symbols from 'from import *' Revision Changes Path 1.2 +8 -8 KoreanCodecs/korean/error_callback.py Index: error_callback.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/error_callback.py,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- error_callback.py 13 Jan 2003 07:25:26 -0000 1.1 +++ error_callback.py 13 Jan 2003 07:37:57 -0000 1.2 @@ -17,7 +17,7 @@ # along with KoreanCodecs; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # -# $Id: error_callback.py,v 1.1 2003/01/13 07:25:26 perky Exp $ +# $Id: error_callback.py,v 1.2 2003/01/13 07:37:57 perky Exp $ # try: @@ -48,24 +48,24 @@ self.end = end self.reason = reason - def errcb_strict(exc): + def _errcb_strict(exc): raise exc - def errcb_ignore(exc): + def _errcb_ignore(exc): if isinstance(exc, UnicodeError): return (u"", exc.end) else: raise TypeError("can't handle %s" % exc.__name__) - def errcb_replace(exc): + def _errcb_replace(exc): if isinstance(exc, UnicodeEncodeError): return ((exc.end-exc.start)*u"?", exc.end) elif isinstance(exc, UnicodeDecodeError): return (u"\ufffd", exc.end) else: raise TypeError("can't handle %s" % exc.__name__) - error_callbacks = { - 'strict': errcb_strict, - 'ignore': errcb_ignore, - 'replace': errcb_replace, + _error_callbacks = { + 'strict': _errcb_strict, + 'ignore': _errcb_ignore, + 'replace': _errcb_replace, } def lookup_error(name): |