
#1 My version of ChineseCodecs

open
nobody
None
5
2001-12-28
2001-12-28
Adoal Xu
No

I am working on a version of ChineseCodecs based on
Tamito KAJIYAMA's JapaneseCodecs. It is quite
different from Chen Chien-Hsun's version. Since I am
from Mainland China, I am focusing on the de facto
standard, GBK, ahead of the official standard, GB2312.

I hope some part of my work can be merged into this
python-codecs project. But there is something wrong
with my network connection. The route to the shell host
(and CVS host) of SourceForge is blocked by China
Telecom, so I can only access
http://sourceforge.net/xxx.

Is there anyone who can help me to move my work into
this project? Please visit
http://sourceforge.net/projects/pythonzh/

Thanks a lot!

Discussion

  • Bjorn Stabell

    Bjorn Stabell - 2003-03-20

    Logged In: YES
    user_id=55032

    Has anything happened with this?

    Does this mean the current Chinese support is only GB2312,
    not GBK?

     
  • Adoal Xu

    Adoal Xu - 2003-03-20

    Logged In: YES
    user_id=166186

    As far as I can tell from browsing the CVS tree of the python-codecs
    project, the ChineseCodecs subproject still doesn't support GBK, only
    GB2312. Also, its implementation is a mix of Python and
    C. My version follows the same approach as Tamito KAJIYAMA's
    JapaneseCodecs. My version (and Tamito's) is easy to build
    and install, but Chen's is more efficient.
    In the CVS tree, each of the Chinese, Japanese, and Korean codecs
    uses its own method. But since the three languages have something
    in common, it would be a good idea to take a
    unified approach. I wonder when this will appear.

     
  • Bjorn Stabell

    Bjorn Stabell - 2003-03-20

    Logged In: YES
    user_id=55032

    So your version is slightly less efficient, but shares code/
    structure with the Japanese version, and also supports GBK?

    Personally, this is how I would prioritize work on the Chinese
    codec:

    1. Inclusion into standard distribution
    2. Support for GBK
    3. Performance
    4. Beauty and code sharing

    The sooner we can get wider distribution of the codec, the
    sooner we'll get more feedback and help to continue
    improving it. That is fine as long as it's correct (if it only
    says it does GB2312 and that's all it does, no problem) and
    reasonably fast. I think few people find their way to
    python-codecs, get the codecs, and install them. I had problems
    installing them under 2.2, but managed with some hackish copying.

    If you already have support for GBK, even better! :) GBK is,
    as you say, the standard in the mainland. Zhu Rongji can't
    write his name with GB2312, right? :)
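To make the GB2312 versus GBK gap concrete, here is a minimal sketch using the `gb2312` and `gbk` codecs that ship with present-day Python 3 (not the python-codecs packages discussed in this thread). The helper name `can_encode` is made up for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# 朱镕基 (Zhu Rongji); the middle character 镕 is the problematic one.
name = "朱镕基"

for enc in ("gb2312", "gbk"):
    print(enc, can_encode(name, enc))
```

Under these assumptions the name should fail to encode as GB2312 and succeed as GBK, which is the point made above about names used in the mainland.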

    Performance and beauty are secondary to actually having
    something that works. Of course, if we can do it now, even
    better. I just hope we're not waiting for the "perfect"
    implementation before pushing this into the standard distro.

    The dream is to see Python, out-of-the-box, have excellent
    support for Chinese, Japanese, and Korean. I think it'll help
    immensely with the adoption of Python.

     
  • Tom Emerson

    Tom Emerson - 2003-10-04

    Logged In: YES
    user_id=37068

    Support for GBK (Microsoft's CP936) is pretty important:
    supporting GB2312 encodings goes part of the way, but not all the
    way.

    What we *really* want though is support for GB 18030, which
    covers the repertoires of GBK and GB2312 in the same character
    positions, as well as the rest of Unicode and beyond.

    For a brief description of GB 18030, see

    http://lisa.org/archive_domain/newsletters/2002/2.3/emerson.html
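The compatibility claim above can be checked with a small sketch, again assuming the `gb2312`, `gbk`, and `gb18030` codecs bundled with present-day Python 3 rather than any codec from this project. The helper name `encodings_agree` is hypothetical.

```python
def encodings_agree(ch, encodings=("gb2312", "gbk", "gb18030")):
    """Return True if ch encodes to identical bytes in every listed encoding."""
    return len({ch.encode(enc) for enc in encodings}) == 1


# A character in the GB2312 repertoire keeps the same byte sequence
# in GBK and GB 18030.
print(encodings_agree("中"))

# A character outside the Basic Multilingual Plane cannot be written in GBK,
# but GB 18030 represents it with a four-byte sequence.
outside = "\U0001F600"
encoded = outside.encode("gb18030")
print(len(encoded), encoded.decode("gb18030") == outside)
```

This only spot-checks two characters; it is an illustration of the layering, not a conformance test of the standard.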

     
  • Bjorn Stabell

    Bjorn Stabell - 2003-10-05

    Logged In: YES
    user_id=55032

    Hi Adoal.

    I am also in mainland China (Beijing), and I've seen the same
    problem that sometimes (not lately, though) I cannot access,
    e.g., python-codecs.sf.net, but I can access
    sf.net/projects/python-codecs.

    I'd love to help you put these changes into the python-codecs
    CVS, but I don't have write access to the repository.
    [Can someone give Adoal and me access?]

    ( PS! If you cannot access python-codecs, how come you
    can access your pythonzh project? Or can't you? )

     
  • Bjorn Stabell

    Bjorn Stabell - 2003-10-05

    Logged In: YES
    user_id=55032

    Hm, I was tricked by the formatting of the email from
    sourceforge, so I replied to the initial request... Please ignore.

     
  • Bjorn Stabell

    Bjorn Stabell - 2003-10-05

    Logged In: YES
    user_id=55032

    How about this project?
    http://developer.berlios.de/projects/cjkpython/

    It looks like it supports GB18030 already, and it is
    implemented in C so it should be fast and compact.

     
