I am working on a version of ChineseCodecs based on
Tamito KAJIYAMA's JapaneseCodecs. It is quite
different from Chen Chien-Hsun's version. Since I am
from Mainland China, I am focusing on the de facto
standard GBK ahead of the official standard GB2312.
I hope some part of my work can be merged into this
python-codecs project. But there is a problem
with my network connection: the route to the shell host
(and CVS host) of SourceForge is blocked by China
Telecom, so I can only access
http://sourceforge.net/xxx.
Is there anyone who can help me to move my work into
this project? Please visit
http://sourceforge.net/projects/pythonzh/
Thanks a lot!
Logged In: YES
user_id=55032
Has anything happened with this?
Does this mean the current Chinese support is only GB2312,
not GBK?
Logged In: YES
user_id=166186
As far as I can tell from browsing the CVS tree of the python-codecs
project, the ChineseCodecs subproject still doesn't support GBK, only
GB2312. Also, its implementation is a mix of Python and
C. My version follows the same approach as Tamito KAJIYAMA's
JapaneseCodecs. My version (and Tamito's) is easy to build
and install, but Chen's is more efficient.
In the CVS tree, each of the Chinese, Japanese, and Korean codecs
uses its own method. But since the three languages have a lot in
common, it would be a good idea to take a
unified approach. I wonder when this might appear.
Logged In: YES
user_id=55032
So your version is slightly less efficient, but shares code/
structure with the Japanese version, and also supports GBK?
Personally, this is how I would prioritize work on the Chinese
codec:
1. Inclusion into standard distribution
2. Support for GBK
3. Performance
4. Beauty and code sharing
The sooner we can get wider distribution of the codec, the
sooner we'll get more feedback and help to continue
improving it. It is enough that it's correct (if it only claims to do
GB2312 and that's all it does, no problem) and reasonably
fast. I think few people find their way to python-codecs, get
the codecs, and install them. I had problems installing
them under Python 2.2, but managed with some hackish copying.
If you already have support for GBK, even better! :) GBK is,
as you say, the standard in the mainland. Zhu Rongji can't
write his name with GB2312, right? :)
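To make the Zhu Rongji point concrete, here is a minimal sketch. It assumes a later Python 3 whose standard library already ships the `gb2312` and `gbk` codecs, which was not the case when this thread was written; the helper name `can_encode` is only illustrative.

```python
def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


name = "朱镕基"  # Zhu Rongji; the middle character is outside GB2312

for codec_name in ("gb2312", "gbk"):
    # Report, per codec, whether the whole name is representable.
    print(codec_name, can_encode(name, codec_name))
```

Only the middle character is the problem: the other two characters of the name encode fine under GB2312.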
Performance and beauty are secondary to actually having
something that works. Of course, if we can do it now, even
better. I just hope we're not waiting for the "perfect"
implementation before pushing this into the standard distro.
The dream is to see Python, out-of-the-box, have excellent
support for Chinese, Japanese, and Korean. I think it'll help
immensely with the adoption of Python.
Logged In: YES
user_id=37068
Support for GBK (Microsoft's CP936) is pretty important:
supporting GB2312 encodings goes part of the way, but not all the
way.
What we *really* want though is support for GB 18030, which
covers the repertoires of GBK and GB2312 in the same character
positions, as well as the rest of Unicode and beyond.
For a brief description of GB 18030, see
http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
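The "same character positions" property can be checked with a short sketch. It assumes a later Python 3 that bundles the `gb2312`, `gbk`, and `gb18030` codecs (none of which existed in the standard library at the time of this comment); `encodings_agree` is a hypothetical helper, not part of any codec package.

```python
def encodings_agree(text, codec_names=("gb2312", "gbk", "gb18030")):
    """Return True if text encodes to identical bytes under every named codec."""
    encoded = {text.encode(name) for name in codec_names}
    return len(encoded) == 1


# Characters in GB2312 keep the same bytes in GBK and GB 18030.
print(encodings_agree("中文"))

# A GBK-only character keeps its GBK bytes in GB 18030.
print(encodings_agree("镕", codec_names=("gbk", "gb18030")))

# A character outside GBK still encodes in GB 18030, as a four-byte sequence.
emoji = "\U0001F600"
four_byte = emoji.encode("gb18030")
print(len(four_byte), four_byte.decode("gb18030") == emoji)
```

The last case is what "the rest of Unicode" means in practice: GBK rejects the character, while GB 18030 round-trips it.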
Logged In: YES
user_id=55032
Hi Adoal.
I am also in mainland China (Beijing), and I've seen the same
problem: sometimes (not lately, though) I cannot access,
e.g., python-codecs.sf.net, but I can access
sf.net/projects/python-codecs.
I'd love to help you put these changes into the
python-codecs CVS, but I don't have write access to the repository.
[Can someone give Adoal and me access?]
(PS: If you cannot access python-codecs, how come you
can access your pythonzh project? Or can't you?)
Logged In: YES
user_id=55032
Hm, I was tricked by the formatting of the email from
sourceforge, so I replied to the initial request... Please ignore.
Logged In: YES
user_id=55032
How about this project?
http://developer.berlios.de/projects/cjkpython/
It looks like it supports GB18030 already, and it is
implemented in C so it should be fast and compact.