My version of ChineseCodecs

Status: Alpha

Brought to you by: andy_robinson, lemburg, tree

#1 My version of ChineseCodecs

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2001-12-28

Created: 2001-12-28

Creator: Adoal Xu

Private: No

I am working on a version of ChineseCodecs based on
Tamito KAJIYAMA's JapaneseCodecs. It is quite
different from Chen Chien-Hsun's version. Since I am
from Mainland China, I am focusing on the de facto
standard GBK ahead of the official standard GB2312.

I hope some part of my work can be merged into this
python-codecs project. But there is a problem
with my network connection: the route to the shell host
(and CVS host) of SourceForge is blocked by China
Telecom, so I can only access
http://sourceforge.net/xxx.

Is there anyone who can help me move my work into
this project? Please visit
http://sourceforge.net/projects/pythonzh/

Thanks a lot!

Discussion

Bjorn Stabell - 2003-03-20

Logged In: YES
user_id=55032

Has anything happened with this?

Does this mean the current Chinese support is only GB2312,
not GBK?


Adoal Xu - 2003-03-20

Logged In: YES
user_id=166186

From what I can see browsing the CVS tree of the python-codecs project, the
ChineseCodecs subproject still doesn't support GBK, only
GB2312. Also, its implementation mixes Python and
C. My version follows the same approach as Tamito KAJIYAMA's
JapaneseCodecs. My version (and Tamito's) is easy to build
and install, but Chen's is more efficient.
In the CVS tree, each of the Chinese, Japanese, and Korean codecs
uses its own method. But since the three languages have a lot in
common, a unified approach would be a good idea. I wonder when
that will appear.


Bjorn Stabell - 2003-03-20

Logged In: YES
user_id=55032

So your version is slightly less efficient, but shares code/
structure with the Japanese version, and also supports GBK?

Personally, this is how I would prioritize work on the Chinese
codec:

1. Inclusion into standard distribution
2. Support for GBK
3. Performance
4. Beauty and code sharing

The sooner we can get wider distribution of the codec, the
sooner we'll get more feedback and help to continue
improving it. That is fine as long as it's correct (if it only
claims to support GB2312 and that's all it does, no problem) and
reasonably fast. I think few people find their way to python-codecs, get
the codecs, and install them. I had problems installing
them under 2.2, but managed with some hackish copying.

If you already have support for GBK, even better! :) GBK is,
as you say, the standard in the mainland. Zhu Rongji can't
write his name with GB2312, right? :)
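
The point about the name can be checked with a minimal sketch. It assumes a current Python 3 whose standard library ships `gb2312` and `gbk` codecs, which is not the 2.2 setup described in this thread; the helper name `can_encode` is made up for illustration.

```python
# Illustration only: uses the codecs bundled with current Python releases,
# not the ChineseCodecs package discussed in this thread.
NAME = "\u6731\u9555\u57fa"  # the three characters of the name Zhu Rongji


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Report, per encoding, whether the whole name is representable.
for enc in ("gb2312", "gbk"):
    print(enc, can_encode(NAME, enc))
```

The middle character (U+9555) is the one missing from GB2312; the other two encode fine in either character set.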

Performance and beauty are secondary to actually having
something that works. Of course, if we can do it now, even
better. I just hope we're not waiting for the "perfect"
implementation before pushing this into the standard distro.

The dream is to see Python, out-of-the-box, have excellent
support for Chinese, Japanese, and Korean. I think it'll help
immensely with the adoption of Python.


Tom Emerson - 2003-10-04

Logged In: YES
user_id=37068

Support for GBK (Microsoft's CP936) is pretty important:
supporting GB2312 encodings goes part of the way, but not all the
way.

What we *really* want, though, is support for GB 18030, which
covers the repertoires of GBK and GB 2312 in the same character
positions, as well as the rest of Unicode and beyond.
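
As a rough sketch of that superset relationship, assuming a current Python 3 with built-in `gb2312`, `gbk`, and `gb18030` codecs (the helper `encodings_of` and the sample characters are chosen here for illustration only):

```python
ENCODINGS = ("gb2312", "gbk", "gb18030")


def encodings_of(ch):
    """Map each encoding name to the bytes for ch, or None if unencodable."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            result[enc] = None
    return result


basic = encodings_of("\u4e2d")        # a common character present in GB 2312
gbk_only = encodings_of("\u9555")     # present in GBK but not in GB 2312
beyond = encodings_of("\U00020000")   # outside the BMP, GB 18030 only

for label, table in (("basic", basic), ("gbk_only", gbk_only), ("beyond", beyond)):
    print(label, table)
```

A character that exists in the smaller set keeps the same bytes in the larger ones, while characters outside GBK fall back to GB 18030's four-byte form.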

For a brief description of GB 18030, see

http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html


Bjorn Stabell - 2003-10-05

Logged In: YES
user_id=55032

Hi Adoal.

I am also in mainland China (Beijing), and I've seen the same
problem: sometimes (not lately, though) I cannot access,
e.g., python-codecs.sf.net, but I can access
sf.net/projects/python-codecs.

I'd love to help you put these changes into the python-codecs
CVS, but I don't have write access to the repository.
[Can someone give Adoal and me access?]

( PS! If you cannot access python-codecs, how come you
can access your pythonzh project? Or can't you? )


Bjorn Stabell - 2003-10-05

Logged In: YES
user_id=55032

Hm, I was tricked by the formatting of the email from
SourceForge, so I replied to the initial request... Please ignore.


Bjorn Stabell - 2003-10-05

Logged In: YES
user_id=55032

How about this project?
http://developer.berlios.de/projects/cjkpython/

It looks like it supports GB18030 already, and it is
implemented in C so it should be fast and compact.
