From: Mehendran T <TMe...@no...> - 2007-11-22 11:02:01
|
Hi,

I have started implementing the unicodedata module and have done some investigation on it.

In CPython, the module is written in C. It has a unicodename_db.h, which contains the entire Unicode info taken from UnicodeData.txt (ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt). unicodename_db.h is a file generated by Tools/makeunicodedata.py.

In Java, I have seen two classes, Character and its static nested class Character.UnicodeBlock, from which we can get Unicode values. But I think those may not be enough to write the module properly.

Please tell me how to proceed with implementing the unicodedata module.

My question is: should I follow the same design as CPython, or does Java itself support mapping Unicode data values? If so, how can I make use of that?

Thanks,
Mehendran |
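For readers unfamiliar with the data file mentioned above: each line of UnicodeData.txt is one record of 15 semicolon-separated fields, and a generator script turns those records into lookup tables. The following is only a minimal sketch of that first parsing step, not the code in makeunicodedata.py. The names `FIELDS`, `UnicodeRecord`, `parse_line` and `load_records` are made up for illustration, and the two sample lines stand in for the real file.

```python
from collections import namedtuple

# Field order of one UnicodeData.txt record (15 semicolon-separated fields).
# The attribute names are illustrative, not taken from CPython.
FIELDS = (
    "code", "name", "category", "combining", "bidirectional",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
)
UnicodeRecord = namedtuple("UnicodeRecord", FIELDS)


def parse_line(line):
    """Split one UnicodeData.txt line into (code point, record)."""
    parts = line.rstrip("\n").split(";")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = UnicodeRecord(*parts)
    return int(record.code, 16), record


def load_records(lines):
    """Build a {code point: record} table from an iterable of lines."""
    table = {}
    for line in lines:
        if line.strip():  # skip blank lines
            code_point, record = parse_line(line)
            table[code_point] = record
    return table


# Two sample records standing in for the real file.
SAMPLE = [
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]
table = load_records(SAMPLE)
print(table[0x41].name, table[0x41].category)  # prints: LATIN CAPITAL LETTER A Lu
```

This sketch keeps every field as a string and does not expand the ranges that the real file encodes as paired "First" and "Last" entries. A real generator would need to handle both.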
From: Philip J. <pj...@gr...> - 2007-11-22 19:28:02
|
On Nov 22, 2007, at 2:57 AM, Mehendran T wrote:

> Hi,
>
> I started to implement unicodedata module. And I have just done some
> investigation on that.
>
> In Python, it is written in C. It has a unicodename_db.h which contains
> entire Unicode info which had taken from the UnicodeData.txt
> from ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt.
> and unicodename_db.h is a generated file from the Tools/makeunicodedata.py.
>
> In Java, I have seen two classes Character and a static class
> Character.UnicodeBlock where we can get the Unicode values. But I think those
> may not be useful to write the module properly.
>
> Please tell me how to proceed to implement the Unicodedata module.
>
> My question is :
> Shall I follow the same design as CPython does or
> Does java itself support to map unicode data values, if so, how can I make use
> of that?

James Abley has actually already started working on a unicodedata module -- I know this because he's blogged about his progress a few times, though I haven't seen a blog entry about it in a little while. You should shoot him an email, and maybe you two can collaborate on finishing it up.

The jython section of his blog is here:

http://eternusuk.blogspot.com/search/label/jython

--
Philip Jenvey |
From: James A. <jam...@gm...> - 2007-11-23 09:38:53
|
Yeah, sorry guys, I've been a bit slack with that. I should have some free time this weekend.

I've got an implementation that has most of the work done and supports Unicode 5.0.

Done:

* category
* combining
* decimal
* decomposition
* digit
* mirrored
* name
* numeric
* unidata_version

Outstanding:

* east_asian_width
* lookup
* normalize
* ucd_4_1_0 ?
* ucd_3_2_0 ?

I can make the code available: I've got some space and should be able to expose it via hg or bzr (I think), or just send you a .tgz if you're interested. But first I need to pull in the updates from main trunk. I've not got the latest build changes in, so I'll have a small merge to do first.

There are some open questions, such as whether we want to provide implementations that support Unicode 4.1, as per CPython 2.5, and Unicode 3.2, as per CPython 2.3. I think the worth of such implementations is questionable.

Performance can probably be improved. For my first draft I went for a simple brute-force approach, but a smaller table lookup would be possible, a la Xerces XMLChar or XOM Verifier.

Cheers,

James |
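To illustrate what a "smaller table lookup" could look like, here is a rough sketch of a two-stage table, one common way to compress per-code-point properties. It is not necessarily the scheme James has in mind, nor the one used by Xerces or XOM. It assumes a Python 3 interpreter and uses CPython's own unicodedata.category as the data source. `BLOCK_SIZE`, `build_two_stage_table` and `get_property` are hypothetical names, and only the Basic Multilingual Plane is covered to keep it quick.

```python
import unicodedata

BLOCK_SIZE = 256  # code points per block; an arbitrary choice for this sketch


def build_two_stage_table(prop, limit=0x10000):
    """Compress prop(chr(cp)) for cp < limit into an index plus shared blocks.

    `limit` is assumed to be a multiple of BLOCK_SIZE.
    """
    index = []   # stage 1: block number -> position in `blocks`
    blocks = []  # stage 2: unique blocks of property values
    seen = {}    # block contents -> position in `blocks`
    for start in range(0, limit, BLOCK_SIZE):
        block = tuple(prop(chr(cp)) for cp in range(start, start + BLOCK_SIZE))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def get_property(index, blocks, cp):
    """Look up the stored property with two array accesses."""
    return blocks[index[cp // BLOCK_SIZE]][cp % BLOCK_SIZE]


index, blocks = build_two_stage_table(unicodedata.category)
print(len(index), "index entries share", len(blocks), "unique blocks")
print(get_property(index, blocks, ord("A")))  # prints: Lu
```

Large runs of code points with the same category (CJK ideographs, private use, surrogates) collapse into a handful of shared blocks, which is where the space saving comes from. The same layout would carry over to arrays in Java.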