koco-cvs Mailing List for Python Korean Codecs (Page 18)
Brought to you by:
perky
From: Chang <pe...@us...> - 2002-04-30 01:06:40

perky       02/04/29 18:06:39

  Modified:    .        ChangeLog
  Log:
  - Update to 2.0.4

  Revision  Changes    Path
  1.10      +23 -0     KoreanCodecs/ChangeLog

Index: ChangeLog
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/ChangeLog,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -r1.9 -r1.10
--- ChangeLog  29 Apr 2002 02:13:28 -0000  1.9
+++ ChangeLog  30 Apr 2002 01:06:39 -0000  1.10
@@ -1,4 +1,27 @@
 -----------------------------------------------------------------------------
+Version 2.0.4 (2002-04-30)                         Tag: RELENG_2_0_4_RELEASE
+
+2002-04-30 10:05  Hye-Shik Chang <pe...@fa...>
+
+    * LICENSE (1.6), README.en (1.17), README.ko (1.16), setup.py
+    (1.24):
+
+    - Update to 2.0.4 and change copyright to LGPL
+
+2002-04-29 23:24  Hye-Shik Chang <pe...@fa...>
+
+    * korean/python/hangul.py (1.9), src/hangul.c (1.10):
+
+    - Add 'L', 'R', 'Z' as pseudo final alphabets
+
+2002-04-29 11:13  Hye-Shik Chang <pe...@fa...>
+
+    * ChangeLog (1.9):
+
+    - Update to 2.0.3 Release
+
+
+-----------------------------------------------------------------------------
 Version 2.0.3 (2002-04-29)                         Tag: RELENG_2_0_3_RELEASE
 
 2002-04-29 11:10  Hye-Shik Chang <pe...@fa...>

From: Chang <pe...@us...> - 2002-04-30 01:05:18

perky       02/04/29 18:05:15

  Modified:    .        LICENSE README.en README.ko setup.py
  Log:
  - Update to 2.0.4 and change copyright to LGPL

  Revision  Changes    Path
  1.6       +454 -48   KoreanCodecs/LICENSE

Index: LICENSE
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/LICENSE,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -r1.5 -r1.6
--- LICENSE  14 Mar 2002 20:30:39 -0000  1.5
+++ LICENSE  30 Apr 2002 01:05:15 -0000  1.6
@@ -1,52 +1,458 @@
-# $Id: LICENSE,v 1.5 2002/03/14 20:30:39 perky Exp $
+                  GNU LESSER GENERAL PUBLIC LICENSE
+                       Version 2.1, February 1999
 
-                 KoreanCodecs License Agreement
-                 ==============================
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+     59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
 
-1. This LICENSE AGREEMENT is between the Hye-Shik Chang("H.-S. Chang"),
-   and the Individual or Organization ("Licensee") accessing and otherwise
-   using KoreanCodecs software in source or binary form and its associated
-   documentation.
-
-2. Subject to the terms and conditions of this License Agreement, H.-S. Chang
-   hereby grants Licensee a nonexclusive, royalty-free, world-wide license to
-   reproduce, analyze, test, perform and/or display publicly, prepare
-   derivative works, distribute, and otherwise use KoreanCodecs alone or in
-   any derivative version, provided, however, that H.-S. Chang's License
-   Agreement and his notice of copyright, i.e.,
-
-       Copyright (c) 2002 Hye-Shik Chang; All Rights Reserved
-
-   are retained in KoreanCodecs alone or in any derivative version prepared by
-   Licensee.
-
-3. In the event Licensee prepares a derivative work that is based on or
-   incorporates H.-S. Chang or any part thereof, and wants to make the
-   derivative work available to others as provided herein, then Licensee
-   hereby agrees to include in any such work a brief summary of the changes
-   made to KoreanCodecs.
-
-4. H.-S. Chang is making KoreanCodecs available to Licensee on an "AS IS" basis.
-   HYE-SHIK CHANG MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED.
-   BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY
-   REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
-   PURPOSE OR THAT THE USE OF KOREANCODECS WILL NOT INFRINGE ANY THIRD PARTY
-   RIGHTS.
-
-5. HYE-SHIK CHANG SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF
-   KOREANCODECS FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
-   AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING KOREANCODECS,
-   OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
-
-6. This License Agreement will automatically terminate upon a material breach
-   of its terms and conditions.
-
-7. Nothing in this License Agreement shall be deemed to create any relationship
-   of agency, partnership, or joint venture between H.-S. Chang and Licensee.
-   This License Agreement does not grant permission to use H.-S. Chang
-   trademarks or trade name in a trademark sense to endorse or promote products
-   or services of Licensee, or any third party.
+[This is the first released version of the Lesser GPL. It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
 
-8. By copying, installing or otherwise using KoreanCodecs, Licensee agrees to
-   be bound by the terms and conditions of this License Agreement.
+                            Preamble
 
+  The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it. You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price. Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights. These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you. You must make sure that they, too, receive or can get the source
+code. If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it. And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library. Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program. We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder. Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License. This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License. We use
+this license for certain libraries in order to permit linking those
+libraries into non-free programs.
+
+  When a program is linked with a library, whether statically or using
+a shared library, the combination of the two is legally speaking a
+combined work, a derivative of the original library. The ordinary
+General Public License therefore permits such linking only if the
+entire combination fits its criteria of freedom. The Lesser General
+Public License permits more lax criteria for linking other code with
+the library.
+
+  We call this license the "Lesser" General Public License because it
+does Less to protect the user's freedom than the ordinary General
+Public License. It also provides other free software developers Less
+of an advantage over competing non-free programs. These disadvantages
+are the reason we use the ordinary General Public License for many
+libraries. However, the Lesser license provides advantages in certain
+special circumstances.
+
+  For example, on rare occasions, there may be a special need to
+encourage the widest possible use of a certain library, so that it becomes
+a de-facto standard. To achieve this, non-free programs must be
+allowed to use the library. A more frequent case is that a free
+library does the same job as widely used non-free libraries. In this
+case, there is little to gain by limiting the free library to free
+software only, so we use the Lesser General Public License.
+
+  In other cases, permission to use a particular library in non-free
+programs enables a greater number of people to use a large body of
+free software. For example, permission to use the GNU C Library in
+non-free programs enables many more people to use the whole GNU
+operating system, as well as its variant, the GNU/Linux operating
+system.
+
+  Although the Lesser General Public License is Less protective of the
+users' freedom, it does ensure that the user of a program that is
+linked with the Library has the freedom and the wherewithal to run
+that program using a modified version of the Library.
+
+  The precise terms and conditions for copying, distribution and
+modification follow. Pay close attention to the difference between a
+"work based on the library" and a "work that uses the library". The
+former contains code derived from the library, whereas the latter must
+be combined with the library in order to run.
+
+                  GNU LESSER GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License Agreement applies to any software library or other
+program which contains a notice placed by the copyright holder or
+other authorized party saying it may be distributed under the terms of
+this Lesser General Public License (also called "this License").
+Each licensee is addressed as "you".
+
+  A "library" means a collection of software functions and/or data
+prepared so as to be conveniently linked with application programs
+(which use some of those functions and data) to form executables.
+
+  The "Library", below, refers to any such software library or work
+which has been distributed under these terms. A "work based on the
+Library" means either the Library or any derivative work under
+copyright law: that is to say, a work containing the Library or a
+portion of it, either verbatim or with modifications and/or translated
+straightforwardly into another language. (Hereinafter, translation is
+included without limitation in the term "modification".)
+
+  "Source code" for a work means the preferred form of the work for
+making modifications to it. For a library, complete source code means
+all the source code for all modules it contains, plus any associated
+interface definition files, plus the scripts used to control compilation
+and installation of the library.
+
+  Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running a program using the Library is not restricted, and output from
+such a program is covered only if its contents constitute a work based
+on the Library (independent of the use of the Library in a tool for
+writing it). Whether that is true depends on what the Library does
+and what the program that uses the Library does.
+
+  1. You may copy and distribute verbatim copies of the Library's
+complete source code as you receive it, in any medium, provided that
+you conspicuously and appropriately publish on each copy an
+appropriate copyright notice and disclaimer of warranty; keep intact
+all the notices that refer to this License and to the absence of any
+warranty; and distribute a copy of this License along with the
+Library.
+
+  You may charge a fee for the physical act of transferring a copy,
+and you may at your option offer warranty protection in exchange for a
+fee.
+
+  2. You may modify your copy or copies of the Library or any portion
+of it, thus forming a work based on the Library, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+    a) The modified work must itself be a software library.
+
+    b) You must cause the files modified to carry prominent notices
+    stating that you changed the files and the date of any change.
+
+    c) You must cause the whole of the work to be licensed at no
+    charge to all third parties under the terms of this License.
+
+    d) If a facility in the modified Library refers to a function or a
+    table of data to be supplied by an application program that uses
+    the facility, other than as an argument passed when the facility
+    is invoked, then you must make a good faith effort to ensure that,
+    in the event an application does not supply such function or
+    table, the facility still operates, and performs whatever part of
+    its purpose remains meaningful.
+
+    (For example, a function in a library to compute square roots has
+    a purpose that is entirely well-defined independent of the
+    application. Therefore, Subsection 2d requires that any
+    application-supplied function or table used by this function must
+    be optional: if the application does not supply it, the square
+    root function must still compute square roots.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Library,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Library, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote
+it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Library.
+
+In addition, mere aggregation of another work not based on the Library
+with the Library (or with a work based on the Library) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may opt to apply the terms of the ordinary GNU General Public
+License instead of this License to a given copy of the Library. To do
+this, you must alter all the notices that refer to this License, so
+that they refer to the ordinary GNU General Public License, version 2,
+instead of to this License. (If a newer version than version 2 of the
+ordinary GNU General Public License has appeared, then you can specify
+that version instead if you wish.) Do not make any other change in
+these notices.
+
+  Once this change is made in a given copy, it is irreversible for
+that copy, so the ordinary GNU General Public License applies to all
+subsequent copies and derivative works made from that copy.
+
+  This option is useful when you wish to copy part of the code of
+the Library into a program that is not a library.
+
+  4. You may copy and distribute the Library (or a portion or
+derivative of it, under Section 2) in object code or executable form
+under the terms of Sections 1 and 2 above provided that you accompany
+it with the complete corresponding machine-readable source code, which
+must be distributed under the terms of Sections 1 and 2 above on a
+medium customarily used for software interchange.
+
+  If distribution of object code is made by offering access to copy
+from a designated place, then offering equivalent access to copy the
+source code from the same place satisfies the requirement to
+distribute the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  5. A program that contains no derivative of any portion of the
+Library, but is designed to work with the Library by being compiled or
+linked with it, is called a "work that uses the Library". Such a
+work, in isolation, is not a derivative work of the Library, and
+therefore falls outside the scope of this License.
+
+  However, linking a "work that uses the Library" with the Library
+creates an executable that is a derivative of the Library (because it
+contains portions of the Library), rather than a "work that uses the
+library". The executable is therefore covered by this License.
+Section 6 states terms for distribution of such executables.
+
+  When a "work that uses the Library" uses material from a header file
+that is part of the Library, the object code for the work may be a
+derivative work of the Library even though the source code is not.
+Whether this is true is especially significant if the work can be
+linked without the Library, or if the work is itself a library. The
+threshold for this to be true is not precisely defined by law.
+
+  If such an object file uses only numerical parameters, data
+structure layouts and accessors, and small macros and small inline
+functions (ten lines or less in length), then the use of the object
+file is unrestricted, regardless of whether it is legally a derivative
+work. (Executables containing this object code plus portions of the
+Library will still fall under Section 6.)
+
+  Otherwise, if the work is a derivative of the Library, you may
+distribute the object code for the work under the terms of Section 6.
+Any executables containing that work also fall under Section 6,
+whether or not they are linked directly with the Library itself.
+
+  6. As an exception to the Sections above, you may also combine or
+link a "work that uses the Library" with the Library to produce a
+work containing portions of the Library, and distribute that work
+under terms of your choice, provided that the terms permit
+modification of the work for the customer's own use and reverse
+engineering for debugging such modifications.
+
+  You must give prominent notice with each copy of the work that the
+Library is used in it and that the Library and its use are covered by
+this License. You must supply a copy of this License. If the work
+during execution displays copyright notices, you must include the
+copyright notice for the Library among them, as well as a reference
+directing the user to the copy of this License. Also, you must do one
+of these things:
+
+    a) Accompany the work with the complete corresponding
+    machine-readable source code for the Library including whatever
+    changes were used in the work (which must be distributed under
+    Sections 1 and 2 above); and, if the work is an executable linked
+    with the Library, with the complete machine-readable "work that
+    uses the Library", as object code and/or source code, so that the
+    user can modify the Library and then relink to produce a modified
+    executable containing the modified Library. (It is understood
+    that the user who changes the contents of definitions files in the
+    Library will not necessarily be able to recompile the application
+    to use the modified definitions.)
+
+    b) Use a suitable shared library mechanism for linking with the
+    Library. A suitable mechanism is one that (1) uses at run time a
+    copy of the library already present on the user's computer system,
+    rather than copying library functions into the executable, and (2)
+    will operate properly with a modified version of the library, if
+    the user installs one, as long as the modified version is
+    interface-compatible with the version that the work was made with.
+
+    c) Accompany the work with a written offer, valid for at
+    least three years, to give the same user the materials
+    specified in Subsection 6a, above, for a charge no more
+    than the cost of performing this distribution.
+
+    d) If distribution of the work is made by offering access to copy
+    from a designated place, offer equivalent access to copy the above
+    specified materials from the same place.
+
+    e) Verify that the user has already received a copy of these
+    materials or that you have already sent this user a copy.
+
+  For an executable, the required form of the "work that uses the
+Library" must include any data and utility programs needed for
+reproducing the executable from it. However, as a special exception,
+the materials to be distributed need not include anything that is
+normally distributed (in either source or binary form) with the major
+components (compiler, kernel, and so on) of the operating system on
+which the executable runs, unless that component itself accompanies
+the executable.
+
+  It may happen that this requirement contradicts the license
+restrictions of other proprietary libraries that do not normally
+accompany the operating system. Such a contradiction means you cannot
+use both them and the Library together in an executable that you
+distribute.
+
+  7. You may place library facilities that are a work based on the
+Library side-by-side in a single library together with other library
+facilities not covered by this License, and distribute such a combined
+library, provided that the separate distribution of the work based on
+the Library and of the other library facilities is otherwise
+permitted, and provided that you do these two things:
+
+    a) Accompany the combined library with a copy of the same work
+    based on the Library, uncombined with any other library
+    facilities. This must be distributed under the terms of the
+    Sections above.
+
+    b) Give prominent notice with the combined library of the fact
+    that part of it is a work based on the Library, and explaining
+    where to find the accompanying uncombined form of the same work.
+
+  8. You may not copy, modify, sublicense, link with, or distribute
+the Library except as expressly provided under this License. Any
+attempt otherwise to copy, modify, sublicense, link with, or
+distribute the Library is void, and will automatically terminate your
+rights under this License. However, parties who have received copies,
+or rights, from you under this License will not have their licenses
+terminated so long as such parties remain in full compliance.
+
+  9. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Library or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Library (or any work based on the
+Library), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Library or works based on it.
+
+  10. Each time you redistribute the Library (or any work based on the
+Library), the recipient automatically receives a license from the
+original licensor to copy, distribute, link with or modify the Library
+subject to these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties with
+this License.
+
+  11. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Library at all. For example, if a patent
+license would not permit royalty-free redistribution of the Library by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Library.
+
+If any portion of this section is held invalid or unenforceable under any
+particular circumstance, the balance of the section is intended to apply,
+and the section as a whole is intended to apply in other circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  12. If the distribution and/or use of the Library is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Library under this License may add
+an explicit geographical distribution limitation excluding those countries,
+so that distribution is permitted only in or among countries not thus
+excluded. In such case, this License incorporates the limitation as if
+written in the body of this License.
+
+  13. The Free Software Foundation may publish revised and/or new
+versions of the Lesser General Public License from time to time.
+Such new versions will be similar in spirit to the present version,
+but may differ in detail to address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Library
+specifies a version number of this License which applies to it and
+"any later version", you have the option of following the terms and
+conditions either of that version or of any later version published by
+the Free Software Foundation. If the Library does not specify a
+license version number, you may choose any version ever published by
+the Free Software Foundation.
+
+  14. If you wish to incorporate parts of the Library into other free
+programs whose distribution conditions are incompatible with these,
+write to the author to ask for permission. For software which is
+copyrighted by the Free Software Foundation, write to the Free
+Software Foundation; we sometimes make exceptions for this. Our
+decision will be guided by the two goals of preserving the free status
+of all derivatives of our free software and of promoting the sharing
+and reuse of software generally.
+
+                            NO WARRANTY
+
+  15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
+WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
+EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
+OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
+KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
+LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
+THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
+AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
+FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
+CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
+LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
+RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
+FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
+SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES.
+
+                     END OF TERMS AND CONDITIONS

  1.17      +5 -7      KoreanCodecs/README.en

Index: README.en
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/README.en,v
retrieving revision 1.16
retrieving revision 1.17
diff -u -r1.16 -r1.17
--- README.en  29 Apr 2002 02:10:20 -0000  1.16
+++ README.en  30 Apr 2002 01:05:15 -0000  1.17
@@ -1,8 +1,8 @@
-KoreanCodecs version 2.0.3
+KoreanCodecs version 2.0.4
 ==========================
 
 Copyright(C) Hye-Shik Chang, 2002.
-$Id: README.en,v 1.16 2002/04/29 02:10:20 perky Exp $
+$Id: README.en,v 1.17 2002/04/30 01:05:15 perky Exp $
 
 
 
@@ -93,16 +93,11 @@
 o ISO-2022-KR (RFC1557)
   - korean.iso-2022-kr
 
-o ISO-2022-KR-1
-  - korean.iso-2022-kr-1 (proposed on 2.1)
-
 o Unicode Johab
   - korean.unijohab
 
 o Qwerty Key Stroke Mapping
   - korean.qwerty2bul
-  - korean.qwerty3bul (proposed on 2.1)
-  - korean.qwerty3bul-390 (proposed on 2.1)
 
 You can omit 'korean.' after importing 'korean.aliases' module.
 
@@ -118,6 +113,9 @@
 
 History
 -------
+
+o Version 2.0.4 - 30 April 2002
+  - Copyright has changed to LGPL
 
 o Version 2.0.3 - 29 April 2002
   - added hangul module C implementation

  1.16      +5 -11     KoreanCodecs/README.ko

Index: README.ko
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/README.ko,v
retrieving revision 1.15
retrieving revision 1.16
diff -u -r1.15 -r1.16
--- README.ko  29 Apr 2002 02:10:21 -0000  1.15
+++ README.ko  30 Apr 2002 01:05:15 -0000  1.16
@@ -1,8 +1,8 @@
-한글코덱 버젼 2.0.3
+한글코덱 버젼 2.0.4
 ===================
 
 Copyright(C) Hye-Shik Chang, 2002.
-$Id: README.ko,v 1.15 2002/04/29 02:10:21 perky Exp $
+$Id: README.ko,v 1.16 2002/04/30 01:05:15 perky Exp $
 
 *캠페인* 인터넷에서 한글 맞춤법을 지킵시다. ^-^/~
 
@@ -96,15 +96,6 @@
 
 o Qwerty 자판 매핑
   - korean.qwerty2bul : 2벌식 - 쿼티자판 매핑
-
-다음 코덱들은 2.1 버젼에서 제공하려고 준비중입니다.
-
-o ISO-2022-KR-1
-  - korean.iso-2022-kr-1
-o Qwerty 자판 매핑
-  - korean.qwerty3bul : 3벌식 - 쿼티자판 매핑
-  - korean.qwerty3bul-390 : 3벌식 390 - 쿼티자판 매핑
-
 
 코덱이름에서 korean. 부분은 korean.aliases모듈을 임포트하면 생략할 수 있습니다.
 
@@ -121,6 +112,9 @@
 
 역사
 ----
+
+o 버젼 2.0.4 2002년 4월 30일
+  - LGPL로 라이센스 변경
 
 o 버젼 2.0.3 2002년 4월 29일
   - hangul 모듈 C 구현 추가

  1.24      +3 -3      KoreanCodecs/setup.py

Index: setup.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -u -r1.23 -r1.24
--- setup.py  29 Apr 2002 02:10:46 -0000  1.23
+++ setup.py  30 Apr 2002 01:05:15 -0000  1.24
@@ -1,5 +1,5 @@
 #!/usr/bin/env python
-# $Id: setup.py,v 1.23 2002/04/29 02:10:46 perky Exp $
+# $Id: setup.py,v 1.24 2002/04/30 01:05:15 perky Exp $
 
 import sys
 from distutils.core import setup, Extension
@@ -32,7 +32,7 @@
             org_install_lib or self.install_purelib
 
 setup (name = "KoreanCodecs",
-       version = "2.0.3",
+       version = "2.0.4",
        description = "Korean Codecs for Python Unicode Support",
        long_description = "This package provides Unicode codecs that "
             "make Python aware of Korean character encodings such as "
@@ -41,7 +41,7 @@
             "instead of a byte sequence.",
        author = "Hye-Shik Chang",
        author_email = "pe...@fa...",
-       license = "Python License",
+       license = "LGPL",
        url = "http://sourceforge.net/projects/koco",
        cmdclass = {'install': Install},
        packages = ['korean',

From: Chang <pe...@us...> - 2002-04-29 14:24:27
|
perky       02/04/29 07:24:25

Modified:   korean/python hangul.py
Log:
- Add 'L', 'R', 'Z' as pseudo final alphabets

Revision  Changes    Path
1.9       +2 -2      KoreanCodecs/korean/python/hangul.py

Index: hangul.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/korean/python/hangul.py,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- hangul.py 26 Apr 2002 07:46:59 -0000 1.8
+++ hangul.py 29 Apr 2002 14:24:25 -0000 1.9
@@ -15,7 +15,7 @@
 # Conjoining Jamo Behavior:
 # http://www.unicode.org/unicode/uni2book/ch03.pdf (section 3.11)
 #
-# $Id: hangul.py,v 1.8 2002/04/26 07:46:59 perky Exp $
+# $Id: hangul.py,v 1.9 2002/04/29 14:24:25 perky Exp $
 #
 
 class UnicodeHangulError(Exception):
@@ -208,7 +208,7 @@
     if u'\uac00' <= c <= u'\ud7a3': # hangul
         return 1, (ord(c) - 0xac00) % 28 > 0
     else:
-        return 0, c in u'013678.bklmnptMN'
+        return 0, c in u'013678.bklmnptLMNRZ'
 
 # Iterator Emulator for ancient versions before 2.1
 try:
|
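The check patched above can be sketched in modern Python as follows. This is a minimal illustration, not the module's code: the function name `final_info` and the constant name `PSEUDO_FINALS` are hypothetical (the enclosing function name is not visible in the diff), and the syntax is Python 3 rather than the Python 2 of the original.

```python
# Sketch only: mirrors the patched branch of hangul.py in Python 3 syntax.
# `final_info` and `PSEUDO_FINALS` are hypothetical names for illustration.
PSEUDO_FINALS = "013678.bklmnptLMNRZ"  # string taken from the '+' line of the diff


def final_info(c):
    """Return (is_hangul, has_final) for a single character."""
    if "\uac00" <= c <= "\ud7a3":
        # Precomposed syllables are laid out in blocks of 28 finals;
        # index 0 within a block means "no final consonant".
        return True, (ord(c) - 0xAC00) % 28 > 0
    # Non-Hangul characters count as having a final only if listed.
    return False, c in PSEUDO_FINALS


print(final_info("\ud55c"))  # a syllable with a final consonant
print(final_info("\uac00"))  # a syllable without one
print(final_info("L"))       # newly added pseudo final
```

With this revision, 'L', 'R' and 'Z' are treated as ending in a final consonant, alongside the previously listed digits and letters.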
From: Chang <pe...@us...> - 2002-04-29 14:24:27
|
perky       02/04/29 07:24:25

Modified:   src hangul.c
Log:
- Add 'L', 'R', 'Z' as pseudo final alphabets

Revision  Changes    Path
1.10      +5 -5      KoreanCodecs/src/hangul.c

Index: hangul.c
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/src/hangul.c,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -r1.9 -r1.10
--- hangul.c 27 Apr 2002 21:01:19 -0000 1.9
+++ hangul.c 29 Apr 2002 14:24:25 -0000 1.10
@@ -4,14 +4,14 @@
  * KoreanCodecs Hangul Module C Implementation
  *
  * Author : Hye-Shik Chang <pe...@fa...>
- * Date : $Date: 2002/04/27 21:01:19 $
+ * Date : $Date: 2002/04/29 14:24:25 $
  * Created : 25 April 2002
  *
- * $Revision: 1.9 $
+ * $Revision: 1.10 $
  */
 
 static char *version =
-"$Id: hangul.c,v 1.9 2002/04/27 21:01:19 perky Exp $";
+"$Id: hangul.c,v 1.10 2002/04/29 14:24:25 perky Exp $";
 
 #include "Python.h"
 
@@ -425,8 +425,8 @@
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 1 */
 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, /* 2 */
 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, /* 3 */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, /* 4 */
- 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 5 */
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, /* 4 */
+ 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, /* 5 */
 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, /* 6 */
 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 7 */
 };
|
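The rows changed in this table appear to be a 128-entry ASCII lookup, 16 entries per row, that encodes the same character set as the string in hangul.py. As a rough cross-check, assuming that reading is right, the table can be regenerated from the Python string; the names `PSEUDO_FINALS` and `table` below are illustrative only.

```python
# Sketch: rebuild the 128-entry flag table from the hangul.py string and
# print it in the row layout used by the C source (16 entries per row).
PSEUDO_FINALS = "013678.bklmnptLMNRZ"

table = [1 if chr(i) in PSEUDO_FINALS else 0 for i in range(128)]

for row in range(8):
    cells = ", ".join(str(v) for v in table[row * 16:(row + 1) * 16])
    print("%s, /* %d */" % (cells, row))
```

Rows 4 and 5 come out with flags at 0x4C, 0x4D, 0x4E ('L', 'M', 'N') and at 0x52, 0x5A ('R', 'Z'), matching the '+' lines of the diff.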
From: Chang <pe...@us...> - 2002-04-29 02:13:29
|
perky 02/04/28 19:13:28 Modified: . ChangeLog Log: - Update to 2.0.3 Release Revision Changes Path 1.9 +28 -0 KoreanCodecs/ChangeLog Index: ChangeLog =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/ChangeLog,v retrieving revision 1.8 retrieving revision 1.9 diff -u -r1.8 -r1.9 --- ChangeLog 28 Apr 2002 09:39:23 -0000 1.8 +++ ChangeLog 29 Apr 2002 02:13:28 -0000 1.9 @@ -1,4 +1,32 @@ ----------------------------------------------------------------------------- +Version 2.0.3 (2002-04-29) Tag: RELENG_2_0_3_RELEASE + +2002-04-29 11:10 Hye-Shik Chang <pe...@fa...> + + * README.en (1.16), README.ko (1.15), setup.py (1.23): + + - Update to 2.0.3 Release + +2002-04-28 20:35 Hye-Shik Chang <pe...@fa...> + + * doc/benchmarks.txt (1.2): + + - Add hangul tests + +2002-04-28 19:27 Hye-Shik Chang <pe...@fa...> + + * doc/roadmap.txt (1.1): + + - Add roadmap ;) + +2002-04-28 18:39 Hye-Shik Chang <pe...@fa...> + + * ChangeLog (1.8): + + - Update to 2.0.3b2 + + +----------------------------------------------------------------------------- Version 2.0.3b2 (2002-04-28) Tag: RELENG_2_0_3_BETA1 (slided from b1) 2002-04-28 18:37 Hye-Shik Chang <pe...@fa...> |
From: Chang <pe...@us...> - 2002-04-29 02:10:46
|
perky       02/04/28 19:10:46

Modified:   . setup.py
Log:
- Update to 2.0.3 Release

Revision  Changes    Path
1.23      +2 -2      KoreanCodecs/setup.py

Index: setup.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v
retrieving revision 1.22
retrieving revision 1.23
diff -u -r1.22 -r1.23
--- setup.py 28 Apr 2002 09:37:19 -0000 1.22
+++ setup.py 29 Apr 2002 02:10:46 -0000 1.23
@@ -1,5 +1,5 @@
 #!/usr/bin/env python
-# $Id: setup.py,v 1.22 2002/04/28 09:37:19 perky Exp $
+# $Id: setup.py,v 1.23 2002/04/29 02:10:46 perky Exp $
 
 import sys
 from distutils.core import setup, Extension
@@ -32,7 +32,7 @@
             org_install_lib or self.install_purelib
 
 setup (name = "KoreanCodecs",
-       version = "2.0.3b2",
+       version = "2.0.3",
        description = "Korean Codecs for Python Unicode Support",
        long_description = "This package provides Unicode codecs that "
            "make Python aware of Korean character encodings such as "
|
From: Chang <pe...@us...> - 2002-04-29 02:10:24
|
perky 02/04/28 19:10:21 Modified: . README.en README.ko Log: - Update to 2.0.3 Release Revision Changes Path 1.16 +4 -4 KoreanCodecs/README.en Index: README.en =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.en,v retrieving revision 1.15 retrieving revision 1.16 diff -u -r1.15 -r1.16 --- README.en 28 Apr 2002 09:37:19 -0000 1.15 +++ README.en 29 Apr 2002 02:10:20 -0000 1.16 @@ -1,8 +1,8 @@ -KoreanCodecs version 2.0.3b2 -============================ +KoreanCodecs version 2.0.3 +========================== Copyright(C) Hye-Shik Chang, 2002. -$Id: README.en,v 1.15 2002/04/28 09:37:19 perky Exp $ +$Id: README.en,v 1.16 2002/04/29 02:10:20 perky Exp $ @@ -119,7 +119,7 @@ History ------- -o Version 2.0.3 - April 2002 +o Version 2.0.3 - 29 April 2002 - added hangul module C implementation (which means, johab, unijohab and qwerty2bul have gotten faster) - added StreamReader C implementation for EUC-KR and CP949 1.15 +4 -4 KoreanCodecs/README.ko Index: README.ko =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.ko,v retrieving revision 1.14 retrieving revision 1.15 diff -u -r1.14 -r1.15 --- README.ko 28 Apr 2002 09:37:19 -0000 1.14 +++ README.ko 29 Apr 2002 02:10:21 -0000 1.15 @@ -1,8 +1,8 @@ -한글코덱 버젼 2.0.3b2 -===================== +한글코덱 버젼 2.0.3 +=================== Copyright(C) Hye-Shik Chang, 2002. -$Id: README.ko,v 1.14 2002/04/28 09:37:19 perky Exp $ +$Id: README.ko,v 1.15 2002/04/29 02:10:21 perky Exp $ *캠페인* 인터넷에서 한글 맞춤법을 지킵시다. ^-^/~ @@ -122,7 +122,7 @@ 역사 ---- -o 버젼 2.0.3 2002년 4월 +o 버젼 2.0.3 2002년 4월 29일 - hangul 모듈 C 구현 추가 (이 확장으로 johab, unijohab, qwerty2bul 코덱이 빨라집니다.) - EUC-KR, CP949 코덱을 위한 StreamReader C 구현 추가 |
From: Chang <pe...@us...> - 2002-04-28 22:15:41
|
perky 02/04/27 23:54:11 Modified: . setup.py Log: - Fix unlimited access on boundary problem on readline_finalize - Let python.c.euc_kr uses _koco.StreamReader as stream reader Revision Changes Path 1.19 +2 -3 KoreanCodecs/setup.py Index: setup.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v retrieving revision 1.18 retrieving revision 1.19 diff -u -r1.18 -r1.19 --- setup.py 27 Apr 2002 04:48:36 -0000 1.18 +++ setup.py 28 Apr 2002 06:54:11 -0000 1.19 @@ -1,5 +1,5 @@ #!/usr/bin/env python -# $Id: setup.py,v 1.18 2002/04/27 04:48:36 perky Exp $ +# $Id: setup.py,v 1.19 2002/04/28 06:54:11 perky Exp $ import sys from distutils.core import setup, Extension @@ -20,7 +20,7 @@ class Install(install): def initialize_options (self): install.initialize_options(self) - if with_aliases: + if flavors['aliases']: if sys.hexversion >= '0x2010000': self.extra_path = ("korean", "import korean.aliases") else: @@ -51,5 +51,4 @@ ext_modules = flavors['extension'] and [ Extension("korean.c._koco", ["src/_koco.c"]), Extension("korean.c.hangul", ["src/hangul.c"]), - Extension("korean.c.twobytestream", ["src/twobytestream.c"]), ] or []) |
From: Chang <pe...@us...> - 2002-04-28 22:13:59
|
perky       02/04/27 18:03:50

Modified:   korean/python euc_kr.py
Log:
- Fix error handling on trailing uncompleted character

Revision  Changes    Path
1.4       +3 -2      KoreanCodecs/korean/python/euc_kr.py

Index: euc_kr.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/korean/python/euc_kr.py,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- euc_kr.py 16 Mar 2002 02:35:20 -0000 1.3
+++ euc_kr.py 28 Apr 2002 01:03:49 -0000 1.4
@@ -1,7 +1,7 @@
 # Hye-Shik Chang <16 Feb 2002>
 # originally written by Tamito KAJIYAMA
 #
-# $Id: euc_kr.py,v 1.3 2002/03/16 02:35:20 perky Exp $
+# $Id: euc_kr.py,v 1.4 2002/04/28 01:03:49 perky Exp $
 
 import codecs
 
@@ -89,7 +89,8 @@
             if errors == 'replace':
                 buffer.append(u'\uFFFD') # REPLACEMENT CHARACTER
             elif errors == 'strict':
-                raise UnicodeError, "unexpected byte 0x%02x%02x found" % tuple(map(ord, c))
+                raise UnicodeError, "unexpected byte 0x%s found" % (
+                    ''.join(["%02x"%ord(x) for x in c]) )
 
         return (u''.join(buffer), size)
 
|
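The point of this change, as far as the diff shows, is that the old format string always expected two bytes, so an input ending in a lone lead byte broke the error message itself. A small Python 3 sketch of the difference follows; `describe_bad_bytes` is a hypothetical helper name, not something from the package.

```python
# Sketch: format an error message for an invalid sequence of any length.
# `describe_bad_bytes` is a hypothetical name used only for this example.
def describe_bad_bytes(c):
    """Build the message for the offending bytes `c` (a bytes object)."""
    return "unexpected byte 0x%s found" % "".join("%02x" % b for b in c)


print(describe_bad_bytes(b"\x80\x80"))  # two-byte case, handled before and after
print(describe_bad_bytes(b"\xc8"))      # trailing incomplete character

# The old fixed-width format fails on a single byte:
try:
    "unexpected byte 0x%02x%02x found" % tuple(b"\xc8")
except TypeError as exc:
    print("old format fails:", exc)
```

Joining one `%02x` per byte keeps the message valid whether the decoder stopped on a full two-byte sequence or on a truncated one.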
From: Chang <pe...@us...> - 2002-04-28 21:40:35
|
perky 02/04/27 23:54:11 Modified: korean/c euc_kr.py Log: - Fix unlimited access on boundary problem on readline_finalize - Let python.c.euc_kr uses _koco.StreamReader as stream reader Revision Changes Path 1.2 +3 -48 KoreanCodecs/korean/c/euc_kr.py Index: euc_kr.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/c/euc_kr.py,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- euc_kr.py 16 Mar 2002 01:59:18 -0000 1.1 +++ euc_kr.py 28 Apr 2002 06:54:11 -0000 1.2 @@ -1,6 +1,6 @@ # Hye-Shik Chang <15 Mar 2002> # -# $Id: euc_kr.py,v 1.1 2002/03/16 01:59:18 perky Exp $ +# $Id: euc_kr.py,v 1.2 2002/04/28 06:54:11 perky Exp $ import codecs import _koco @@ -12,53 +12,8 @@ class StreamWriter(Codec, codecs.StreamWriter): pass -class StreamReader(Codec, codecs.StreamReader): - - def __init__(self, stream, errors='strict'): - codecs.StreamReader.__init__(self, stream, errors) - self.data = '' - - def _read(self, func, size): - if size == 0: - return u'' - if size is None or size < 0: - data = self.data + func() - self.data = '' - else: - data = self.data + func(max(size, 2) - len(self.data)) - size = len(data) - p = 0 - while p < size: - if data[p] < "\x80": - p = p + 1 - elif p + 2 <= size: - p = p + 2 - else: - break - data, self.data = data[:p], data[p:] - return self.decode(data)[0] - - def read(self, size=-1): - return self._read(self.stream.read, size) - - def readline(self, size=-1): - return self._read(self.stream.readline, size) - - def readlines(self, size=-1): - data = self._read(self.stream.read, size) - buffer = [] - end = 0 - while 1: - pos = data.find(u'\n', end) - if pos < 0: - if end < len(data): - buffer.append(data[end:]) - break - buffer.append(data[end:pos+1]) - end = pos+1 - return buffer - def reset(self): - self.data = '' +class StreamReader(Codec, _koco.StreamReader, codecs.StreamReader): + encoding = 'euc-kr' ### encodings module API |
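The Python `_read` method removed here held back a trailing lead byte until the next read so that a two-byte character was never split across chunks. A minimal Python 3 sketch of that buffering idea is shown below. It is not the package's code: `split_complete` is a hypothetical helper, and decoding uses the `euc_kr` codec from the Python standard library rather than KoreanCodecs.

```python
# Sketch of the carry-over logic from the removed StreamReader._read.
# `split_complete` is a hypothetical name; decoding relies on the
# standard library's "euc_kr" codec, not on KoreanCodecs itself.
def split_complete(data):
    """Split bytes into (complete_characters, pending_tail)."""
    p = 0
    size = len(data)
    while p < size:
        if data[p] < 0x80:      # single-byte (ASCII) character
            p += 1
        elif p + 2 <= size:     # full two-byte character available
            p += 2
        else:                   # lone lead byte at the end: keep it back
            break
    return data[:p], data[p:]


chunks = [b"abc\xc7", b"\xd1\xb1\xdb"]  # a character split across two reads
pending = b""
pieces = []
for chunk in chunks:
    complete, pending = split_complete(pending + chunk)
    pieces.append(complete.decode("euc_kr"))
text = "".join(pieces)
print(text)
```

The commit replaces this per-class Python logic with the shared `_koco.StreamReader` C implementation, which keeps the pending byte in its own state field.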
From: Chang <pe...@us...> - 2002-04-28 21:33:17
|
perky 02/04/27 23:54:12 Modified: src _koco.c koco_stream.h Log: - Fix unlimited access on boundary problem on readline_finalize - Let python.c.euc_kr uses _koco.StreamReader as stream reader Revision Changes Path 1.18 +7 -7 KoreanCodecs/src/_koco.c Index: _koco.c =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco.c,v retrieving revision 1.17 retrieving revision 1.18 diff -u -r1.17 -r1.18 --- _koco.c 28 Apr 2002 06:16:07 -0000 1.17 +++ _koco.c 28 Apr 2002 06:54:11 -0000 1.18 @@ -4,14 +4,14 @@ * KoreanCodecs C Implementations * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/28 06:16:07 $ + * Date : $Date: 2002/04/28 06:54:11 $ * Created : 15 March 2002 * - * $Revision: 1.17 $ + * $Revision: 1.18 $ */ static char *version = -"$Id: _koco.c,v 1.17 2002/04/28 06:16:07 perky Exp $"; +"$Id: _koco.c,v 1.18 2002/04/28 06:54:11 perky Exp $"; #define UNIFIL 0xfffd @@ -23,10 +23,10 @@ PyObject* (*decoder)(state_t*, char*, int slen, int errtype, PyObject* (*finalizer)(const Py_UNICODE *, int)); } streaminfo; #define STATE_EXIST 0x100 -#define HAS_STATE(c) ((*(c))&STATE_EXIST) -#define GET_STATE(c) (unsigned char)((*(c))&0xFF) -#define RESET_STATE(c) ((*(c))&=0xFE00) -#define SET_STATE(c, v) (*(c)=STATE_EXIST|(v)) +#define HAS_STATE(c) ((c)&STATE_EXIST) +#define GET_STATE(c) (unsigned char)((c)&0xFF) +#define RESET_STATE(c) ((c)&=0xFE00) +#define SET_STATE(c, v) ((c)=STATE_EXIST|(v)) #ifndef max #define max(a, b) ((a)<(b) ? 
(b) : (a)) 1.2 +54 -20 KoreanCodecs/src/koco_stream.h Index: koco_stream.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/koco_stream.h,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- koco_stream.h 28 Apr 2002 06:16:07 -0000 1.1 +++ koco_stream.h 28 Apr 2002 06:54:12 -0000 1.2 @@ -4,10 +4,10 @@ * KoreanCodecs EUC-KR StreamReader C Implementation * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/28 06:16:07 $ + * Date : $Date: 2002/04/28 06:54:12 $ * Created : 28 April 2002 * - * $Revision: 1.1 $ + * $Revision: 1.2 $ */ static PyObject * @@ -23,8 +23,8 @@ srccur = s; srcend = s + slen; - if (HAS_STATE(state)) { - unsigned char c = GET_STATE(state); + if (HAS_STATE(*state)) { + unsigned char c = GET_STATE(*state); if (c & 0x80) { if (slen > 0) { @@ -60,13 +60,13 @@ } else *(destcur++) = c; - RESET_STATE(state); + RESET_STATE(*state); } for (; srccur < srcend; srccur++) { if (*srccur & 0x80) { if (srccur+1 >= srcend) /* state out */ - SET_STATE(state, *srccur); + SET_STATE(*state, *srccur); else { codemap = ksc5601_decode_map[*srccur & 0x7F]; if (!codemap) @@ -111,7 +111,7 @@ if ((list = PyList_New(0)) == NULL) return NULL; - for (;datalen--; data++) { + for (;(datalen--) > 0; data++) { if (*data == '\n') { append: if ((uobj = PyUnicode_FromUnicode(linestart, data-linestart+1)) == NULL) { Py_DECREF(list); @@ -125,8 +125,10 @@ linestart = data+1; } } - if (linestart < data) + if (linestart < data) { + data--; goto append; /* datalen < 0 here */ + } return list; } @@ -159,7 +161,7 @@ return NULL; stnfo = PyMem_New(streaminfo, 1); - RESET_STATE(&(stnfo->state)); + RESET_STATE(stnfo->state); if (!strcmp(encoding, "euc-kr")) stnfo->decoder = __euc_kr_decode; @@ -194,13 +196,23 @@ static PyObject* StreamReader_read(PyObject *typeself, PyObject *args) { - PyObject *self, *tmp, *r = NULL; + PyObject *self, *tmp = NULL, *r = NULL; PyObject *stream, *stnfoobj; streaminfo 
*stnfo; - int size = -1, errtype; + long size = -1; + int errtype; + + if (!PyArg_ParseTuple(args, "O|O:read", &self, &tmp)) + return NULL; - if (!PyArg_ParseTuple(args, "O|i:read", &self, &size)) + if (tmp == Py_None || tmp == NULL) + size = -1; + else if (PyInt_Check(tmp)) + size = PyInt_AsLong(tmp); + else { + PyErr_SetString(PyExc_TypeError, "an integer is required"); return NULL; + } if (size == 0) return PyUnicode_FromUnicode(NULL, 0); @@ -227,7 +239,8 @@ if (size < 0) tmp = PyObject_CallMethod(stream, "read", NULL); /* without tuple */ else - tmp = PyObject_CallMethod(stream, "read", "i", size); + tmp = PyObject_CallMethod(stream, "read", "i", + HAS_STATE(stnfo->state) ? size : max(2, size) ); if (tmp == NULL) goto out; @@ -249,14 +262,24 @@ static PyObject* StreamReader_readline(PyObject *typeself, PyObject *args) { - PyObject *self, *tmp, *r = NULL; + PyObject *self, *tmp = NULL, *r = NULL; PyObject *stream, *stnfoobj; streaminfo *stnfo; - int size = -1, errtype; + long size = -1; + int errtype; - if (!PyArg_ParseTuple(args, "O|i:readline", &self, &size)) + if (!PyArg_ParseTuple(args, "O|O:readline", &self, &tmp)) return NULL; + if (tmp == Py_None || tmp == NULL) + size = -1; + else if (PyInt_Check(tmp)) + size = PyInt_AsLong(tmp); + else { + PyErr_SetString(PyExc_TypeError, "an integer is required"); + return NULL; + } + if (size == 0) return PyUnicode_FromUnicode(NULL, 0); @@ -282,7 +305,8 @@ if (size < 0) tmp = PyObject_CallMethod(stream, "readline", NULL); /* without tuple */ else - tmp = PyObject_CallMethod(stream, "readline", "i", size); + tmp = PyObject_CallMethod(stream, "readline", "i", + HAS_STATE(stnfo->state) ? 
size : max(2, size) ); if (tmp == NULL) goto out; @@ -304,14 +328,23 @@ static PyObject* StreamReader_readlines(PyObject *typeself, PyObject *args) { - PyObject *self, *tmp, *r = NULL; + PyObject *self, *r = NULL, *tmp = NULL; PyObject *stream, *stnfoobj; streaminfo *stnfo; int size = -1, errtype; - if (!PyArg_ParseTuple(args, "O|i:readlines", &self, &size)) + if (!PyArg_ParseTuple(args, "O|O:readlines", &self, &tmp)) return NULL; + if (tmp == Py_None || tmp == NULL) + size = -1; + else if (PyInt_Check(tmp)) + size = PyInt_AsLong(tmp); + else { + PyErr_SetString(PyExc_TypeError, "an integer is required"); + return NULL; + } + if (size == 0) return PyUnicode_FromUnicode(NULL, 0); @@ -337,7 +370,8 @@ if (size < 0) tmp = PyObject_CallMethod(stream, "read", NULL); /* without tuple */ else - tmp = PyObject_CallMethod(stream, "read", "i", size); + tmp = PyObject_CallMethod(stream, "read", "i", + HAS_STATE(stnfo->state) ? size : max(2, size) ); if (tmp == NULL) goto out; @@ -369,7 +403,7 @@ return NULL; if ((stnfo = (streaminfo*)PyCObject_AsVoidPtr(stnfoobj)) != NULL) - RESET_STATE(&(stnfo->state)); + RESET_STATE(stnfo->state); Py_DECREF(stnfoobj); Py_INCREF(Py_None); |
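The `readline_finalizer` fix in this commit concerns the last line of a buffer: text after the final newline must still be emitted, without reading past the end of the data. The intended behaviour, as I read the diff, can be sketched in Python; `split_lines` is a hypothetical name and the code illustrates the behaviour only, not the C implementation.

```python
# Sketch of the line-splitting behaviour of readline_finalizer:
# each line keeps its trailing "\n"; a final unterminated line is kept too.
# `split_lines` is a hypothetical name used only for this example.
def split_lines(data):
    lines = []
    start = 0
    for i, ch in enumerate(data):
        if ch == "\n":
            lines.append(data[start:i + 1])
            start = i + 1
    if start < len(data):       # leftover text without a newline
        lines.append(data[start:])
    return lines


print(split_lines("a\nb\nc"))
print(split_lines("a\n"))
```

Slicing with `data[start:]` cannot run past the buffer, which is the boundary condition the C code had to handle explicitly with the adjusted loop test and the `data--` before the final append.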
From: Chang <pe...@us...> - 2002-04-28 21:30:47
|
perky       02/04/27 17:55:28

Modified:   test test_cp949.py test_euc_kr.py
Log:
- Add error handling tests for uncompleted characters

Revision  Changes    Path
1.4       +2 -0      KoreanCodecs/test/test_cp949.py

Index: test_cp949.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/test/test_cp949.py,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- test_cp949.py 16 Mar 2002 02:17:01 -0000 1.3
+++ test_cp949.py 28 Apr 2002 00:55:28 -0000 1.4
@@ -9,7 +9,9 @@
 errortests = (
 # invalid bytes
     ("abc\x80\x80\xc1\xc4", "strict", None),
+    ("abc\xc8", "strict", None),
     ("abc\x80\x80\xc1\xc4", "replace", u"abc\ufffd\uc894"),
+    ("abc\x80\x80\xc1\xc4\xc8", "replace", u"abc\ufffd\uc894\ufffd"),
     ("abc\x80\x80\xc1\xc4", "ignore", u"abc\uc894"),
 )
 

1.4       +2 -0      KoreanCodecs/test/test_euc_kr.py

Index: test_euc_kr.py
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/test/test_euc_kr.py,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- test_euc_kr.py 16 Mar 2002 02:17:01 -0000 1.3
+++ test_euc_kr.py 28 Apr 2002 00:55:28 -0000 1.4
@@ -9,7 +9,9 @@
 errortests = (
 # invalid bytes
     ("abc\x80\x80\xc1\xc4", "strict", None),
+    ("abc\xc8", "strict", None),
     ("abc\x80\x80\xc1\xc4", "replace", u"abc\ufffd\uc894"),
+    ("abc\x80\x80\xc1\xc4\xc8", "replace", u"abc\ufffd\uc894\ufffd"),
     ("abc\x80\x80\xc1\xc4", "ignore", u"abc\uc894"),
 )
 
|
From: Chang <pe...@us...> - 2002-04-28 21:30:47
|
perky 02/04/28 02:39:23 Modified: . ChangeLog Log: - Update to 2.0.3b2 Revision Changes Path 1.8 +30 -0 KoreanCodecs/ChangeLog Index: ChangeLog =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/ChangeLog,v retrieving revision 1.7 retrieving revision 1.8 diff -u -r1.7 -r1.8 --- ChangeLog 28 Apr 2002 08:09:42 -0000 1.7 +++ ChangeLog 28 Apr 2002 09:39:23 -0000 1.8 @@ -1,4 +1,34 @@ ----------------------------------------------------------------------------- +Version 2.0.3b2 (2002-04-28) Tag: RELENG_2_0_3_BETA1 (slided from b1) + +2002-04-28 18:37 Hye-Shik Chang <pe...@fa...> + + * README.en (1.15), README.ko (1.14), setup.py (1.22): + + - Bump beta level to 2.0.3b2 for fixing version detect + (SourceForge doesn't allow update distfile!!) + +2002-04-28 18:31 Hye-Shik Chang <pe...@fa...> + + * setup.py (1.21): + + - Fix version detect error (with slide BETA1 tag) + +2002-04-28 18:10 Hye-Shik Chang <pe...@fa...> + + * doc/quick_start.txt (1.2): + + - Mention about --without-extension option + - Add usage for StreamReader and Writer + +2002-04-28 17:09 Hye-Shik Chang <pe...@fa...> + + * ChangeLog (1.7): + + - Update generated cvslog for 2.0.3b1 + + +----------------------------------------------------------------------------- Version 2.0.3b1 (2002-04-28) Tag: RELENG_2_0_3_BETA1 2002-04-28 17:08 Hye-Shik Chang <pe...@fa...> |
From: Chang <pe...@us...> - 2002-04-28 21:30:46
|
perky 02/04/28 02:37:20 Modified: . README.en README.ko setup.py Log: - Bump beta level to 2.0.3b2 for fixing version detect (SourceForge doesn't allow update distfile!!) Revision Changes Path 1.15 +2 -2 KoreanCodecs/README.en Index: README.en =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.en,v retrieving revision 1.14 retrieving revision 1.15 diff -u -r1.14 -r1.15 --- README.en 28 Apr 2002 08:08:04 -0000 1.14 +++ README.en 28 Apr 2002 09:37:19 -0000 1.15 @@ -1,8 +1,8 @@ -KoreanCodecs version 2.0.3b1 +KoreanCodecs version 2.0.3b2 ============================ Copyright(C) Hye-Shik Chang, 2002. -$Id: README.en,v 1.14 2002/04/28 08:08:04 perky Exp $ +$Id: README.en,v 1.15 2002/04/28 09:37:19 perky Exp $ 1.14 +2 -2 KoreanCodecs/README.ko Index: README.ko =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.ko,v retrieving revision 1.13 retrieving revision 1.14 diff -u -r1.13 -r1.14 --- README.ko 28 Apr 2002 08:08:04 -0000 1.13 +++ README.ko 28 Apr 2002 09:37:19 -0000 1.14 @@ -1,8 +1,8 @@ -한글코덱 버젼 2.0.3b1 +한글코덱 버젼 2.0.3b2 ===================== Copyright(C) Hye-Shik Chang, 2002. -$Id: README.ko,v 1.13 2002/04/28 08:08:04 perky Exp $ +$Id: README.ko,v 1.14 2002/04/28 09:37:19 perky Exp $ *캠페인* 인터넷에서 한글 맞춤법을 지킵시다. 
^-^/~ 1.22 +2 -2 KoreanCodecs/setup.py Index: setup.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v retrieving revision 1.21 retrieving revision 1.22 diff -u -r1.21 -r1.22 --- setup.py 28 Apr 2002 09:31:56 -0000 1.21 +++ setup.py 28 Apr 2002 09:37:19 -0000 1.22 @@ -1,5 +1,5 @@ #!/usr/bin/env python -# $Id: setup.py,v 1.21 2002/04/28 09:31:56 perky Exp $ +# $Id: setup.py,v 1.22 2002/04/28 09:37:19 perky Exp $ import sys from distutils.core import setup, Extension @@ -32,7 +32,7 @@ org_install_lib or self.install_purelib setup (name = "KoreanCodecs", - version = "2.0.3b1", + version = "2.0.3b2", description = "Korean Codecs for Python Unicode Support", long_description = "This package provides Unicode codecs that " "make Python aware of Korean character encodings such as " |
From: Chang <pe...@us...> - 2002-04-28 21:10:08
|
perky 02/04/27 23:16:07 Modified: src _koco.c Added: src koco_stream.h Removed: src euckr_stream.h Log: - Rename euckr_stream.h to koco_stream.h - make euc_kr_StreamReader more generalized. - Add full support for StreamReader (read, readline, readlines, reset) - Fix some garbage leaking Revision Changes Path 1.17 +12 -8 KoreanCodecs/src/_koco.c Index: _koco.c =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco.c,v retrieving revision 1.16 retrieving revision 1.17 diff -u -r1.16 -r1.17 --- _koco.c 28 Apr 2002 04:46:52 -0000 1.16 +++ _koco.c 28 Apr 2002 06:16:07 -0000 1.17 @@ -4,24 +4,28 @@ * KoreanCodecs C Implementations * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/28 04:46:52 $ + * Date : $Date: 2002/04/28 06:16:07 $ * Created : 15 March 2002 * - * $Revision: 1.16 $ + * $Revision: 1.17 $ */ static char *version = -"$Id: _koco.c,v 1.16 2002/04/28 04:46:52 perky Exp $"; +"$Id: _koco.c,v 1.17 2002/04/28 06:16:07 perky Exp $"; #define UNIFIL 0xfffd #include "Python.h" -typedef int *state_t; +typedef int state_t; +typedef struct _streaminfo { + int state; + PyObject* (*decoder)(state_t*, char*, int slen, int errtype, PyObject* (*finalizer)(const Py_UNICODE *, int)); +} streaminfo; #define STATE_EXIST 0x100 #define HAS_STATE(c) ((*(c))&STATE_EXIST) #define GET_STATE(c) (unsigned char)((*(c))&0xFF) -#define REMOVE_STATE(c) ((*(c))&=0xFE00) +#define RESET_STATE(c) ((*(c))&=0xFE00) #define SET_STATE(c, v) (*(c)=STATE_EXIST|(v)) #ifndef max @@ -102,7 +106,7 @@ #include "euckr_codec.h" #include "cp949_codec.h" -#include "euckr_stream.h" +#include "koco_stream.h" /* List of methods defined in the module */ @@ -129,8 +133,8 @@ /* Add some symbolic constants to the module */ d = PyModule_GetDict(m); - t = PyClass_New_WithMethods("euc_kr_StreamReader", euc_kr_StreamReader_methods); - PyDict_SetItemString(d, "euc_kr_StreamReader", t); + t = PyClass_New_WithMethods("StreamReader", 
StreamReader_methods); + PyDict_SetItemString(d, "StreamReader", t); Py_DECREF(t); t = PyString_FromString(version); 1.1 KoreanCodecs/src/koco_stream.h Index: koco_stream.h =================================================================== /* * euckr_stream.c * * KoreanCodecs EUC-KR StreamReader C Implementation * * Author : Hye-Shik Chang <pe...@fa...> * Date : $Date: 2002/04/28 06:16:07 $ * Created : 28 April 2002 * * $Revision: 1.1 $ */ static PyObject * __euc_kr_decode( state_t *state, char *s, int slen, int errtype, PyObject* (*finalizer)(const Py_UNICODE *, int) ) { unsigned char *srccur, *srcend; Py_UNICODE *destptr, *destcur, *codemap, code; PyObject *r; destcur = destptr = PyMem_New(Py_UNICODE, slen+1); srccur = s; srcend = s + slen; if (HAS_STATE(state)) { unsigned char c = GET_STATE(state); if (c & 0x80) { if (slen > 0) { codemap = ksc5601_decode_map[c & 0x7F]; if (!codemap) goto invalid_state; if (ksc5601_decode_bottom <= *srccur && *srccur <= ksc5601_decode_top) { code = codemap[*srccur - ksc5601_decode_bottom]; if (code == UNIFIL) goto invalid_state; *(destcur++) = code; srccur++; } else { invalid_state: switch (errtype) { case error_strict: PyErr_Format(PyExc_UnicodeError, "EUC-KR decoding error: invalid character \\x%02x%02x", c, srccur[0]); r = NULL; goto out; case error_replace: *(destcur++) = UNIFIL; break; case error_ignore: break; } srccur++; } } else { /* keep state */ r = PyUnicode_FromUnicode(NULL, 0); goto out; } } else *(destcur++) = c; RESET_STATE(state); } for (; srccur < srcend; srccur++) { if (*srccur & 0x80) { if (srccur+1 >= srcend) /* state out */ SET_STATE(state, *srccur); else { codemap = ksc5601_decode_map[*srccur & 0x7F]; if (!codemap) goto invalid; if (ksc5601_decode_bottom <= srccur[1] && srccur[1] <= ksc5601_decode_top) { code = codemap[srccur[1] - ksc5601_decode_bottom]; if (code == UNIFIL) goto invalid; *(destcur++) = code; srccur++; } else { invalid: switch (errtype) { case error_strict: PyErr_Format(PyExc_UnicodeError, 
"EUC-KR decoding error: invalid character \\x%02x%02x", srccur[0], srccur[1]); r = NULL; goto out; case error_replace: *(destcur++) = UNIFIL; break; case error_ignore: break; } srccur++; } } } else *(destcur++) = *srccur; } r = finalizer(destptr, destcur-destptr); out: PyMem_Del(destptr); return r; } PyObject* readline_finalizer(const Py_UNICODE *data, int datalen) { PyObject *list, *uobj; const Py_UNICODE *linestart = data; if ((list = PyList_New(0)) == NULL) return NULL; for (;datalen--; data++) { if (*data == '\n') { append: if ((uobj = PyUnicode_FromUnicode(linestart, data-linestart+1)) == NULL) { Py_DECREF(list); return NULL; } if (PyList_Append(list, uobj) == -1) { Py_DECREF(list); return NULL; } Py_DECREF(uobj); linestart = data+1; } } if (linestart < data) goto append; /* datalen < 0 here */ return list; } static void streaminfo_destroy(void *obj) { PyMem_Del(obj); } static char StreamReader___init____doc__[] = "StreamReader.__init__()"; static PyObject* StreamReader___init__(PyObject *typeself, PyObject *args, PyObject *kwargs) { PyObject *self, *stnfoobj, *encodingobj; PyObject *stream, *errors = NULL; streaminfo *stnfo; char *encoding; static char *kwlist[] = {"self", "stream", "errors", NULL}; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|O:__init__", kwlist, &self, &stream, &errors)) return NULL; if ((encodingobj = PyObject_GetAttrString(self, "encoding")) == NULL) return NULL; if ((encoding = PyString_AsString(encodingobj)) == NULL) return NULL; stnfo = PyMem_New(streaminfo, 1); RESET_STATE(&(stnfo->state)); if (!strcmp(encoding, "euc-kr")) stnfo->decoder = __euc_kr_decode; else if (!strcmp(encoding, "cp949")) stnfo->decoder = __euc_kr_decode; else { PyMem_Del(stnfo); PyErr_Format(PyExc_UnicodeError, "can't initialize StreamReader: not supported encoding '%s'", encoding); return NULL; } stnfoobj = PyCObject_FromVoidPtr((void*)stnfo, streaminfo_destroy); PyObject_SetAttrString(self, "_streaminfo", stnfoobj); Py_DECREF(stnfoobj); 
PyObject_SetAttrString(self, "stream", stream); if (errors) PyObject_SetAttrString(self, "errors", errors); else { errors = PyString_FromString("strict"); PyObject_SetAttrString(self, "errors", errors); Py_DECREF(errors); } Py_INCREF(Py_None); return Py_None; } static char StreamReader_read__doc__[] = "StreamReader.read()"; static PyObject* StreamReader_read(PyObject *typeself, PyObject *args) { PyObject *self, *tmp, *r = NULL; PyObject *stream, *stnfoobj; streaminfo *stnfo; int size = -1, errtype; if (!PyArg_ParseTuple(args, "O|i:read", &self, &size)) return NULL; if (size == 0) return PyUnicode_FromUnicode(NULL, 0); if ((stream = PyObject_GetAttrString(self, "stream")) == NULL) return NULL; if ((tmp = PyObject_GetAttrString(self, "errors")) == NULL) { Py_DECREF(stream); return NULL; } errtype = error_type(PyString_AsString(tmp)); Py_DECREF(tmp); if (errtype == error_undef) return NULL; if ((stnfoobj = PyObject_GetAttrString(self, "_streaminfo")) == NULL) { Py_DECREF(stream); return NULL; } if ((stnfo = (streaminfo*)PyCObject_AsVoidPtr(stnfoobj)) == NULL) goto out; if (size < 0) tmp = PyObject_CallMethod(stream, "read", NULL); /* without tuple */ else tmp = PyObject_CallMethod(stream, "read", "i", size); if (tmp == NULL) goto out; r = stnfo->decoder( &(stnfo->state), PyString_AS_STRING(tmp), PyString_GET_SIZE(tmp), errtype, PyUnicode_FromUnicode ); Py_DECREF(tmp); out: Py_DECREF(stream); Py_DECREF(stnfoobj); return r; } static char StreamReader_readline__doc__[] = "StreamReader.readline()"; static PyObject* StreamReader_readline(PyObject *typeself, PyObject *args) { PyObject *self, *tmp, *r = NULL; PyObject *stream, *stnfoobj; streaminfo *stnfo; int size = -1, errtype; if (!PyArg_ParseTuple(args, "O|i:readline", &self, &size)) return NULL; if (size == 0) return PyUnicode_FromUnicode(NULL, 0); if ((stream = PyObject_GetAttrString(self, "stream")) == NULL) return NULL; if ((tmp = PyObject_GetAttrString(self, "errors")) == NULL) { Py_DECREF(stream); return NULL; } 
errtype = error_type(PyString_AsString(tmp)); Py_DECREF(tmp); if (errtype == error_undef) return NULL; if ((stnfoobj = PyObject_GetAttrString(self, "_streaminfo")) == NULL) { Py_DECREF(stream); return NULL; } if ((stnfo = (streaminfo*)PyCObject_AsVoidPtr(stnfoobj)) == NULL) goto out; if (size < 0) tmp = PyObject_CallMethod(stream, "readline", NULL); /* without tuple */ else tmp = PyObject_CallMethod(stream, "readline", "i", size); if (tmp == NULL) goto out; r = stnfo->decoder( &(stnfo->state), PyString_AS_STRING(tmp), PyString_GET_SIZE(tmp), errtype, PyUnicode_FromUnicode ); Py_DECREF(tmp); out: Py_DECREF(stream); Py_DECREF(stnfoobj); return r; } static char StreamReader_readlines__doc__[] = "StreamReader.readlines()"; static PyObject* StreamReader_readlines(PyObject *typeself, PyObject *args) { PyObject *self, *tmp, *r = NULL; PyObject *stream, *stnfoobj; streaminfo *stnfo; int size = -1, errtype; if (!PyArg_ParseTuple(args, "O|i:readlines", &self, &size)) return NULL; if (size == 0) return PyUnicode_FromUnicode(NULL, 0); if ((stream = PyObject_GetAttrString(self, "stream")) == NULL) return NULL; if ((tmp = PyObject_GetAttrString(self, "errors")) == NULL) { Py_DECREF(stream); return NULL; } errtype = error_type(PyString_AsString(tmp)); Py_DECREF(tmp); if (errtype == error_undef) return NULL; if ((stnfoobj = PyObject_GetAttrString(self, "_streaminfo")) == NULL) { Py_DECREF(stream); return NULL; } if ((stnfo = (streaminfo*)PyCObject_AsVoidPtr(stnfoobj)) == NULL) goto out; if (size < 0) tmp = PyObject_CallMethod(stream, "read", NULL); /* without tuple */ else tmp = PyObject_CallMethod(stream, "read", "i", size); if (tmp == NULL) goto out; r = stnfo->decoder( &(stnfo->state), PyString_AS_STRING(tmp), PyString_GET_SIZE(tmp), errtype, readline_finalizer ); Py_DECREF(tmp); out: Py_DECREF(stream); Py_DECREF(stnfoobj); return r; } static char StreamReader_reset__doc__[] = "StreamReader.reset()"; static PyObject* StreamReader_reset(PyObject *typeself, PyObject *args) { 
PyObject *self, *stnfoobj; streaminfo *stnfo; if (!PyArg_ParseTuple(args, "O|:reset", &self)) return NULL; if ((stnfoobj = PyObject_GetAttrString(self, "_streaminfo")) == NULL) return NULL; if ((stnfo = (streaminfo*)PyCObject_AsVoidPtr(stnfoobj)) != NULL) RESET_STATE(&(stnfo->state)); Py_DECREF(stnfoobj); Py_INCREF(Py_None); return Py_None; } struct PyMethodDef StreamReader_methods[] = { {"__init__", (PyCFunction) StreamReader___init__, METH_VARARGS | METH_KEYWORDS, StreamReader___init____doc__}, {"read", (PyCFunction) StreamReader_read, METH_VARARGS, StreamReader_read__doc__}, {"readline", (PyCFunction) StreamReader_readline, METH_VARARGS, StreamReader_readline__doc__}, {"readlines",(PyCFunction) StreamReader_readlines, METH_VARARGS, StreamReader_readlines__doc__}, {"reset", (PyCFunction) StreamReader_reset, METH_VARARGS, StreamReader_reset__doc__}, {NULL,}, }; |
From: Chang <pe...@us...> - 2002-04-28 20:05:08
|
perky 02/04/28 01:09:42 Modified: . ChangeLog Log: - Update generated cvslog for 2.0.3b1 Revision Changes Path 1.7 +118 -1 KoreanCodecs/ChangeLog Index: ChangeLog =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/ChangeLog,v retrieving revision 1.6 retrieving revision 1.7 diff -u -r1.6 -r1.7 --- ChangeLog 26 Apr 2002 12:28:50 -0000 1.6 +++ ChangeLog 28 Apr 2002 08:09:42 -0000 1.7 @@ -1,6 +1,123 @@ ----------------------------------------------------------------------------- -Version 2.0.3a2 (2002-04-26) Tag: RELENG_2_0_3_ALPHA2 +Version 2.0.3b1 (2002-04-28) Tag: RELENG_2_0_3_BETA1 + +2002-04-28 17:08 Hye-Shik Chang <pe...@fa...> + + * README (1.2), README.en (1.14), README.ko (1.13), setup.py + (1.20): + + - Add descriptions for 2.0.3b1 + +2002-04-28 17:02 Hye-Shik Chang <pe...@fa...> + + * korean/c/cp949.py (1.2), src/koco_stream.h (1.3): + + - Add StreamReader for CP949 encoding + +2002-04-28 16:44 Hye-Shik Chang <pe...@fa...> + + * doc/benchmarks.txt (1.1): + + - Add simple rough benchmark + +2002-04-28 15:54 Hye-Shik Chang <pe...@fa...> + + * setup.py (1.19), korean/c/euc_kr.py (1.2), src/_koco.c (1.18), + src/koco_stream.h (1.2): + + - Fix unlimited access on boundary problem on readline_finalize + - Let python.c.euc_kr uses _koco.StreamReader as stream reader + +2002-04-28 15:16 Hye-Shik Chang <pe...@fa...> + + * src/: _koco.c (1.17), euckr_stream.h (1.2), koco_stream.h (1.1): + + - Rename euckr_stream.h to koco_stream.h + - make euc_kr_StreamReader more generalized. 
+ - Add full support for StreamReader (read, readline, readlines, reset) + - Fix some garbage leaking + +2002-04-28 13:46 Hye-Shik Chang <pe...@fa...> + + * src/: _koco.c (1.16), euckr_stream.h (1.1): + + - Add StreamReader C implementation for EUC-KR codec + (lacks readlines() now) + +2002-04-28 10:03 Hye-Shik Chang <pe...@fa...> + + * korean/python/euc_kr.py (1.4): + + - Fix error handling on trailing uncompleted character + +2002-04-28 09:55 Hye-Shik Chang <pe...@fa...> + + * test/: test_cp949.py (1.4), test_euc_kr.py (1.4): + + - Add error handling tests for uncompleted characters + +2002-04-28 09:51 Hye-Shik Chang <pe...@fa...> + + * src/: Setup.in (1.4), cp949_codec.h (1.2), euckr_codec.h (1.2), + twobytestream.c (1.2): + Fix several bugs on previous cp949, euc-kr codecs. + - Handle error on trailing uncompleted character. + - Raise on error='strict' with right data. + +2002-04-28 06:01 Hye-Shik Chang <pe...@fa...> + + * src/hangul.c (1.9): + + - Just a style fix + +2002-04-28 05:59 Hye-Shik Chang <pe...@fa...> + + * src/hangul.c (1.8): + + - Fix garbage collection errors + +2002-04-27 13:48 Hye-Shik Chang <pe...@fa...> + + * setup.py (1.18), src/Setup.in (1.3), src/twobytestream.c (1.1): + + - Add twobytestream which will be used by CP949, EUC-KR and Johab + as StreamReader, StreamWriter, StreamReaderWriter assistant + +2002-04-27 12:37 Hye-Shik Chang <pe...@fa...> + + * test/test_hangul.py (1.8): + + - make it simple(tm) + +2002-04-27 12:27 Hye-Shik Chang <pe...@fa...> + + * setup.py (1.17): + + - Add trigger '--without-extension' not to install C extensions + +2002-04-27 06:11 Hye-Shik Chang <pe...@fa...> + + * src/: _koco.c (1.15), cp949_codec.c (1.2), cp949_codec.h (1.1), + euckr_codec.c (1.2), euckr_codec.h (1.1): + + - Rename *codec.c to *codec.h + +2002-04-27 06:06 Hye-Shik Chang <pe...@fa...> + + * src/: _koco.c (1.14), cp949_codec.c (1.1), euckr_codec.c (1.1): + + - Split euc-kr and cp949 codec from _koco.c + +2002-04-26 21:28 Hye-Shik Chang 
<pe...@fa...> + + * ChangeLog (1.6): + + - Update cvslogs up to 2.0.3a2 + +----------------------------------------------------------------------------- +Version 2.0.3a2 (2002-04-26) Tag: RELENG_2_0_3_ALPHA2 + 2002-04-26 21:26 Hye-Shik Chang <pe...@fa...> * README.en (1.13), README.ko (1.12): |
From: Chang <pe...@us...> - 2002-04-28 19:45:43
|
perky 02/04/28 04:35:35 Modified: doc benchmarks.txt Log: - Add hangul tests Revision Changes Path 1.2 +27 -3 KoreanCodecs/doc/benchmarks.txt Index: benchmarks.txt =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/doc/benchmarks.txt,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- benchmarks.txt 28 Apr 2002 07:44:23 -0000 1.1 +++ benchmarks.txt 28 Apr 2002 11:35:34 -0000 1.2 @@ -1,7 +1,7 @@ ROUGH Benchmark Tests (just for fun ;) -------------------------------------- -$Id: benchmarks.txt,v 1.1 2002/04/28 07:44:23 perky Exp $ +$Id: benchmarks.txt,v 1.2 2002/04/28 11:35:34 perky Exp $ CPU: Intel Pentium III 800 OS: FreeBSD 4.5 @@ -46,10 +46,10 @@ korean.c korean.python ratio ----------- ----------------- ---------- EUC-KR read(100): 0.03s 1.17s 3900 % - (3.3MB/s) (85K/s) + (3.26MB/s) (85K/s) EUC-KR read(): 0.07s 27.00s 9310 % - (41.4MB/s) (107K/s) + (38.9MB/s) (107K/s) EUC-KR readline()*5: 0.13s 1.60s 1230 % (38461 lines/s) (3125 lines/s) @@ -57,5 +57,29 @@ EUC-KR readlines(): 0.29s 27.51s 9486 % (10.0MB/s) (105K/s) +CP949 read(100): 0.03s 1.12s 3844 % + (3.26MB/s) (0.08MB/s) +CP949 read(): 0.06s 30.00s 5000 % + (40.7MB/s) (0.09MB/s) + +CP949 readline()*5: 0.07s 2.02s 2612 % + (71428 lines/s) (2475 lines/s) + +CP949 readlines(): 0.19s 45.59s 23309 % + (14.56MB/s) (0.06MB/s) + + +Hangul +====== + +10000 times + + korean.c korean.python ratio + ----------- ----------------- ---------- +hangul.join: 0.19s 3.06s 15396 % + (52631 op/s) (3267 op/s) + +hangul.split: 0.26s 6.50s 24753 % + (38461 op/s) (1537 op/s) |
From: Chang <pe...@us...> - 2002-04-28 19:45:33
|
perky 02/04/27 21:46:53 Modified: src _koco.c Added: src euckr_stream.h Log: - Add StreamReader C implementation for EUC-KR codec (lacks readlines() now) Revision Changes Path 1.16 +60 -17 KoreanCodecs/src/_koco.c Index: _koco.c =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/_koco.c,v retrieving revision 1.15 retrieving revision 1.16 diff -u -r1.15 -r1.16 --- _koco.c 26 Apr 2002 21:11:13 -0000 1.15 +++ _koco.c 28 Apr 2002 04:46:52 -0000 1.16 @@ -4,18 +4,30 @@ * KoreanCodecs C Implementations * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/26 21:11:13 $ + * Date : $Date: 2002/04/28 04:46:52 $ * Created : 15 March 2002 * - * $Revision: 1.15 $ + * $Revision: 1.16 $ */ static char *version = -"$Id: _koco.c,v 1.15 2002/04/26 21:11:13 perky Exp $"; +"$Id: _koco.c,v 1.16 2002/04/28 04:46:52 perky Exp $"; #define UNIFIL 0xfffd #include "Python.h" + +typedef int *state_t; +#define STATE_EXIST 0x100 +#define HAS_STATE(c) ((*(c))&STATE_EXIST) +#define GET_STATE(c) (unsigned char)((*(c))&0xFF) +#define REMOVE_STATE(c) ((*(c))&=0xFE00) +#define SET_STATE(c, v) (*(c)=STATE_EXIST|(v)) + +#ifndef max +#define max(a, b) ((a)<(b) ? 
(b) : (a)) +#endif + #include "_koco_ksc5601.h" #include "_koco_uhc.h" @@ -24,8 +36,8 @@ enum { error_strict, error_ignore, error_replace, error_undef }; -static -PyObject *codec_tuple(PyObject *unicode, int len) +static PyObject * +codec_tuple(PyObject *unicode, int len) { PyObject *v, *w; @@ -46,7 +58,8 @@ return v; } -int error_type(const char *errors) +static int +error_type(const char *errors) { if (errors == NULL || strcmp(errors, "strict") == 0) { return error_strict; @@ -65,8 +78,31 @@ } } +static PyObject * +PyClass_New_WithMethods(const char *name, PyMethodDef *methods) +{ + PyMethodDef *def; + + PyObject *classDict = PyDict_New(); + PyObject *className = PyString_FromString(name); + PyObject *newClass = PyClass_New(NULL, classDict, className); + Py_DECREF(classDict); + Py_DECREF(className); + + for (def = methods; def->ml_name != NULL; def++) { + PyObject *func = PyCFunction_New(def, NULL); + PyObject *method = PyMethod_New(func, NULL, newClass); + PyDict_SetItemString(classDict, def->ml_name, method); + Py_DECREF(method); + Py_DECREF(func); + } + + return newClass; +} + #include "euckr_codec.h" #include "cp949_codec.h" +#include "euckr_stream.h" /* List of methods defined in the module */ @@ -85,20 +121,27 @@ void init_koco(void) { - PyObject *m, *d; + PyObject *m, *d, *t; + + /* Create the module and add the functions */ + m = Py_InitModule("_koco", _koco_methods); - /* Create the module and add the functions */ - m = Py_InitModule("_koco", _koco_methods); + /* Add some symbolic constants to the module */ + d = PyModule_GetDict(m); - /* Add some symbolic constants to the module */ - d = PyModule_GetDict(m); + t = PyClass_New_WithMethods("euc_kr_StreamReader", euc_kr_StreamReader_methods); + PyDict_SetItemString(d, "euc_kr_StreamReader", t); + Py_DECREF(t); - PyDict_SetItemString(d, "version", PyString_FromString(version)); + t = PyString_FromString(version); + PyDict_SetItemString(d, "version", t); + Py_DECREF(t); - ErrorObject = 
PyErr_NewException("_koco.error", NULL, NULL); - PyDict_SetItemString(d, "error", ErrorObject); + ErrorObject = PyErr_NewException("_koco.error", NULL, NULL); + PyDict_SetItemString(d, "error", ErrorObject); + Py_DECREF(ErrorObject); - /* Check for errors */ - if (PyErr_Occurred()) - Py_FatalError("can't initialize the _koco module"); + /* Check for errors */ + if (PyErr_Occurred()) + Py_FatalError("can't initialize the _koco module"); } 1.1 KoreanCodecs/src/euckr_stream.h Index: euckr_stream.h =================================================================== /* * euckr_stream.c * * KoreanCodecs EUC-KR StreamReader C Implementation * * Author : Hye-Shik Chang <pe...@fa...> * Date : $Date: 2002/04/28 04:46:52 $ * Created : 28 April 2002 * * $Revision: 1.1 $ */ static PyObject * __euc_kr_decode(state_t state, char *s, int slen, int errtype) { unsigned char *srccur, *srcend; Py_UNICODE *destptr, *destcur, *codemap, code; PyObject *r; destcur = destptr = PyMem_New(Py_UNICODE, slen+1); srccur = s; srcend = s + slen; if (HAS_STATE(state)) { unsigned char c = GET_STATE(state); if (c & 0x80) { if (slen > 0) { codemap = ksc5601_decode_map[c & 0x7F]; if (!codemap) goto invalid_state; if (ksc5601_decode_bottom <= *srccur && *srccur <= ksc5601_decode_top) { code = codemap[*srccur - ksc5601_decode_bottom]; if (code == UNIFIL) goto invalid_state; *(destcur++) = code; srccur++; } else { invalid_state: switch (errtype) { case error_strict: PyErr_Format(PyExc_UnicodeError, "EUC-KR decoding error: invalid character \\x%02x%02x", c, srccur[0]); r = NULL; goto out; case error_replace: *(destcur++) = UNIFIL; break; case error_ignore: break; } srccur++; } } else { /* keep state */ r = PyUnicode_FromUnicode(NULL, 0); goto out; } } else *(destcur++) = c; REMOVE_STATE(state); } for (; srccur < srcend; srccur++) { if (*srccur & 0x80) { if (srccur+1 >= srcend) /* state out */ SET_STATE(state, *srccur); else { codemap = ksc5601_decode_map[*srccur & 0x7F]; if (!codemap) goto invalid; if 
(ksc5601_decode_bottom <= srccur[1] && srccur[1] <= ksc5601_decode_top) { code = codemap[srccur[1] - ksc5601_decode_bottom]; if (code == UNIFIL) goto invalid; *(destcur++) = code; srccur++; } else { invalid: switch (errtype) { case error_strict: PyErr_Format(PyExc_UnicodeError, "EUC-KR decoding error: invalid character \\x%02x%02x", srccur[0], srccur[1]); r = NULL; goto out; case error_replace: *(destcur++) = UNIFIL; break; case error_ignore: break; } srccur++; } } } else *(destcur++) = *srccur; } r = PyUnicode_FromUnicode(destptr, destcur-destptr); out: PyMem_Del(destptr); return r; } static void state_t_destroy(void *obj) { PyMem_Del(obj); } static char euc_kr_StreamReader___init____doc__[] = "euc_kr_StreamReader.__init__()"; static PyObject* euc_kr_StreamReader___init__(PyObject *typeself, PyObject *args, PyObject *kwargs) { PyObject *self, *stateobj; PyObject *stream, *errors = NULL; state_t state; static char *kwlist[] = {"self", "stream", "errors", NULL}; if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|O:__init__", kwlist, &self, &stream, &errors)) return NULL; PyObject_SetAttrString(self, "stream", stream); if (errors) PyObject_SetAttrString(self, "errors", errors); else { errors = PyString_FromString("strict"); PyObject_SetAttrString(self, "errors", errors); Py_DECREF(errors); } state = PyMem_New(/*state_t*/int, 1); REMOVE_STATE(state); stateobj = PyCObject_FromVoidPtr((void*)state, state_t_destroy); PyObject_SetAttrString(self, "_state", stateobj); Py_DECREF(stateobj); Py_INCREF(Py_None); return Py_None; } static char euc_kr_StreamReader_read__doc__[] = "euc_kr_StreamReader.read()"; static PyObject* euc_kr_StreamReader_read(PyObject *typeself, PyObject *args) { PyObject *self, *tmp, *r = NULL; PyObject *stream, *stateobj; state_t state; int size = -1, errtype; if (!PyArg_ParseTuple(args, "O|i:read", &self, &size)) return NULL; if (size == 0) return PyUnicode_FromUnicode(NULL, 0); if ((stream = PyObject_GetAttrString(self, "stream")) == NULL) return 
NULL; if ((tmp = PyObject_GetAttrString(self, "errors")) == NULL) { Py_DECREF(stream); return NULL; } errtype = error_type(PyString_AsString(tmp)); Py_DECREF(tmp); if (errtype == error_undef) return NULL; if ((stateobj = PyObject_GetAttrString(self, "_state")) == NULL) { Py_DECREF(stream); return NULL; } if ((state = (state_t)PyCObject_AsVoidPtr(stateobj)) == NULL) goto out; if (size < 0) tmp = PyObject_CallMethod(stream, "read", NULL); /* without tuple */ else tmp = PyObject_CallMethod(stream, "read", "i", size); if (tmp == NULL) goto out; r = __euc_kr_decode( state, PyString_AS_STRING(tmp), PyString_GET_SIZE(tmp), errtype ); out: Py_DECREF(stream); Py_DECREF(stateobj); return r; } static char euc_kr_StreamReader_readline__doc__[] = "euc_kr_StreamReader.readline()"; static PyObject* euc_kr_StreamReader_readline(PyObject *typeself, PyObject *args) { PyObject *self, *tmp, *r = NULL; PyObject *stream, *stateobj; state_t state; int size = -1, errtype; if (!PyArg_ParseTuple(args, "O|i:read", &self, &size)) return NULL; if (size == 0) return PyUnicode_FromUnicode(NULL, 0); if ((stream = PyObject_GetAttrString(self, "stream")) == NULL) return NULL; if ((tmp = PyObject_GetAttrString(self, "errors")) == NULL) { Py_DECREF(stream); return NULL; } errtype = error_type(PyString_AsString(tmp)); Py_DECREF(tmp); if (errtype == error_undef) return NULL; if ((stateobj = PyObject_GetAttrString(self, "_state")) == NULL) { Py_DECREF(stream); return NULL; } if ((state = (state_t)PyCObject_AsVoidPtr(stateobj)) == NULL) goto out; if (size < 0) tmp = PyObject_CallMethod(stream, "readline", NULL); /* without tuple */ else tmp = PyObject_CallMethod(stream, "readline", "i", size); if (tmp == NULL) goto out; r = __euc_kr_decode( state, PyString_AS_STRING(tmp), PyString_GET_SIZE(tmp), errtype ); out: Py_DECREF(stream); Py_DECREF(stateobj); return r; } struct PyMethodDef euc_kr_StreamReader_methods[] = { {"__init__", (PyCFunction) euc_kr_StreamReader___init__, METH_VARARGS | METH_KEYWORDS, 
euc_kr_StreamReader___init____doc__}, {"read", (PyCFunction) euc_kr_StreamReader_read, METH_VARARGS, euc_kr_StreamReader_read__doc__}, {"readline", (PyCFunction) euc_kr_StreamReader_readline, METH_VARARGS, euc_kr_StreamReader_readline__doc__}, {NULL,}, }; |
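The `__euc_kr_decode` routine in this commit keeps a lone trailing lead byte in `state` and consumes it at the start of the next call. A minimal Python 3 sketch of that hold-back idea follows. It assumes Python's built-in `euc_kr` codec in place of the KoreanCodecs tables, and `ChunkDecoder`, `feed` and `pending` are hypothetical names that do not appear in the commit.

```python
class ChunkDecoder:
    """Decode EUC-KR bytes that arrive in arbitrary chunks (illustrative only)."""

    def __init__(self, errors="strict"):
        self.errors = errors
        self.pending = b""  # plays the role of the C `state` slot

    def feed(self, chunk):
        data = self.pending + chunk
        # Walk the buffer as the C loop does: ASCII bytes are one byte wide,
        # bytes with the high bit set start a two-byte pair.
        pos = 0
        while pos < len(data):
            step = 2 if data[pos] & 0x80 else 1
            if pos + step > len(data):
                break  # lone lead byte at the end: keep it for the next call
            pos += step
        self.pending = data[pos:]
        return data[:pos].decode("euc_kr", self.errors)

    def reset(self):
        # Same intent as StreamReader.reset(): forget any held-back byte.
        self.pending = b""
```

Feeding the same bytes in chunks of any size should then give the same text as decoding them in one call, which is the property the `state` argument exists to provide.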
From: Chang <pe...@us...> - 2002-04-28 19:45:16
|
perky 02/04/28 01:08:04 Modified: . README README.en README.ko setup.py Log: - Add descriptions for 2.0.3b1 Revision Changes Path 1.2 +1 -0 KoreanCodecs/README Index: README =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- README 17 Feb 2002 13:58:44 -0000 1.1 +++ README 28 Apr 2002 08:08:04 -0000 1.2 @@ -3,3 +3,4 @@ README.en : English (in ISO8859-1) README.ko : Korean (in EUC-KR) +and, Quick Start Guide is also available on doc/quick_guide.txt 1.14 +5 -3 KoreanCodecs/README.en Index: README.en =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.en,v retrieving revision 1.13 retrieving revision 1.14 diff -u -r1.13 -r1.14 --- README.en 26 Apr 2002 12:26:24 -0000 1.13 +++ README.en 28 Apr 2002 08:08:04 -0000 1.14 @@ -1,8 +1,8 @@ -KoreanCodecs version 2.0.3a2 +KoreanCodecs version 2.0.3b1 ============================ Copyright(C) Hye-Shik Chang, 2002. 
-$Id: README.en,v 1.13 2002/04/26 12:26:24 perky Exp $ +$Id: README.en,v 1.14 2002/04/28 08:08:04 perky Exp $ @@ -120,13 +120,15 @@ ------- o Version 2.0.3 - April 2002 - - change jamo short names to confirm to Unicode 3.2 on hangul module - added hangul module C implementation (which means, johab, unijohab and qwerty2bul have gotten faster) + - added StreamReader C implementation for EUC-KR and CP949 + - change jamo short names to confirm to Unicode 3.2 on hangul module - added conjoin, disjoint, format in hangul module (format function is a unicode formatter that fixes korean suffixes after each arguments) - improvemented in platform and version compatibilities + - fixed some refcount leaks on C extensions o Version 2.0.2 - 16 March 2002 - added euc-kr and cp949 codec C implementations 1.13 +5 -3 KoreanCodecs/README.ko Index: README.ko =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/README.ko,v retrieving revision 1.12 retrieving revision 1.13 diff -u -r1.12 -r1.13 --- README.ko 26 Apr 2002 12:26:24 -0000 1.12 +++ README.ko 28 Apr 2002 08:08:04 -0000 1.13 @@ -1,8 +1,8 @@ -한글코덱 버젼 2.0.3a2 +한글코덱 버젼 2.0.3b1 ===================== Copyright(C) Hye-Shik Chang, 2002. -$Id: README.ko,v 1.12 2002/04/26 12:26:24 perky Exp $ +$Id: README.ko,v 1.13 2002/04/28 08:08:04 perky Exp $ *캠페인* 인터넷에서 한글 맞춤법을 지킵시다. ^-^/~ @@ -123,13 +123,15 @@ ---- o 버젼 2.0.3 2002년 4월 - - hangul 모듈 유니코드 3.2 표준으로 자모 약어 변경 - hangul 모듈 C 구현 추가 (이 확장으로 johab, unijohab, qwerty2bul 코덱이 빨라집니다.) + - EUC-KR, CP949 코덱을 위한 StreamReader C 구현 추가 + - hangul 모듈 유니코드 3.2 표준으로 자모 약어 변경 - hangul 모듈에 conjoin, disjoint, format 함수 추가 (format은 포맷된 단어의 종성여부에 따라 뒤의 조사를 수정해주는 한글용의 유니코드 포매팅 함수입니다.) - 플랫폼과 버젼별 호환성이 개선되었습니다. 
+ - C 구현 코덱들의 참조회수 리킹 버그를 다수 수정 o 버젼 2.0.2 2002년 3월 16일 - EUC-KR, CP949 코덱 C 구현 추가 1.20 +2 -2 KoreanCodecs/setup.py Index: setup.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/setup.py,v retrieving revision 1.19 retrieving revision 1.20 diff -u -r1.19 -r1.20 --- setup.py 28 Apr 2002 06:54:11 -0000 1.19 +++ setup.py 28 Apr 2002 08:08:04 -0000 1.20 @@ -1,5 +1,5 @@ #!/usr/bin/env python -# $Id: setup.py,v 1.19 2002/04/28 06:54:11 perky Exp $ +# $Id: setup.py,v 1.20 2002/04/28 08:08:04 perky Exp $ import sys from distutils.core import setup, Extension @@ -32,7 +32,7 @@ org_install_lib or self.install_purelib setup (name = "KoreanCodecs", - version = "2.0.3a3", + version = "2.0.3b1", description = "Korean Codecs for Python Unicode Support", long_description = "This package provides Unicode codecs that " "make Python aware of Korean character encodings such as " |
From: Chang <pe...@us...> - 2002-04-28 19:44:47
|
perky 02/04/27 17:51:51 Modified: src Setup.in cp949_codec.h euckr_codec.h Removed: src twobytestream.c Log: Fix several bugs on previous cp949, euc-kr codecs. - Handle error on trailing uncompleted character. - Raise on error='strict' with right data. Revision Changes Path 1.4 +0 -1 KoreanCodecs/src/Setup.in Index: Setup.in =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/Setup.in,v retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- Setup.in 27 Apr 2002 04:48:37 -0000 1.3 +++ Setup.in 28 Apr 2002 00:51:51 -0000 1.4 @@ -1,4 +1,3 @@ *shared* _koco _koco.c hangul hangul.c -twobytestream twobytestream.c 1.2 +19 -8 KoreanCodecs/src/cp949_codec.h Index: cp949_codec.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/cp949_codec.h,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- cp949_codec.h 26 Apr 2002 21:11:13 -0000 1.1 +++ cp949_codec.h 28 Apr 2002 00:51:51 -0000 1.2 @@ -4,10 +4,10 @@ * KoreanCodecs CP949 Codec C Implementation * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/26 21:11:13 $ + * Date : $Date: 2002/04/28 00:51:51 $ * Created : 15 March 2002 * - * $Revision: 1.1 $ + * $Revision: 1.2 $ */ static char cp949_decode__doc__[] = "CP949 decoder"; @@ -30,7 +30,20 @@ destcur = destptr = PyMem_New(Py_UNICODE, arglen+1); for (srccur = argstr, srcend = argstr + arglen; srccur < srcend; srccur++) { - if ((*srccur & 0x80) && (srccur+1 < srcend)) { + if (*srccur & 0x80) { + if (srccur+1 >= srcend) { + switch (errtype) { + case error_strict: + PyMem_Del(destptr); + PyErr_Format(PyExc_UnicodeError, + "CP949 decoding error: invalid character \\x%02x", *srccur); + return NULL; + case error_replace: + *(destcur++) = UNIFIL; + break; + case error_ignore: break; + } + } else { if (uhc_decode_hint[*srccur]) { /* UHC page0 region */ codemap = uhc_decode_map[*srccur & 0x7F]; /* codemap DOES have 
all maps on 0x81-0xA0, alphabet area can't on this */ @@ -63,25 +76,23 @@ goto invalid; *(destcur++) = code; srccur++; - continue; } else { -invalid: srccur++; /* skip 2byte */ - switch (errtype) { +invalid: switch (errtype) { case error_strict: PyMem_Del(destptr); PyErr_Format(PyExc_UnicodeError, "CP949 decoding error: invalid character \\x%02x%02x", srccur[0], srccur[1]); return NULL; - break; case error_replace: *(destcur++) = UNIFIL; break; - /* case error_ignore: break; */ + case error_ignore: break; } - continue; + srccur++; } } + } } else *(destcur++) = *srccur; } 1.2 +22 -8 KoreanCodecs/src/euckr_codec.h Index: euckr_codec.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/euckr_codec.h,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- euckr_codec.h 26 Apr 2002 21:11:13 -0000 1.1 +++ euckr_codec.h 28 Apr 2002 00:51:51 -0000 1.2 @@ -4,10 +4,10 @@ * KoreanCodecs EUC-KR Codec C Implementation * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/26 21:11:13 $ + * Date : $Date: 2002/04/28 00:51:51 $ * Created : 15 March 2002 * - * $Revision: 1.1 $ + * $Revision: 1.2 $ */ static char euc_kr_decode__doc__[] = "EUC-KR decoder"; @@ -30,7 +30,21 @@ destcur = destptr = PyMem_New(Py_UNICODE, arglen+1); for (srccur = argstr, srcend = argstr + arglen; srccur < srcend; srccur++) { - if ((*srccur & 0x80) && (srccur+1 < srcend)) { + if (*srccur & 0x80) { + if (srccur+1 >= srcend) { + switch (errtype) { + case error_strict: + PyMem_Del(destptr); + PyErr_Format(PyExc_UnicodeError, + "EUC-KR decoding error: invalid character \\x%02x", *srccur); + return NULL; + case error_replace: + *(destcur++) = UNIFIL; + break; + case error_ignore: + break; + } + } else { codemap = ksc5601_decode_map[*srccur & 0x7F]; if (!codemap) goto invalid; @@ -40,24 +54,23 @@ goto invalid; *(destcur++) = code; srccur++; - continue; } else { -invalid: srccur++; /* skip 2byte */ - switch (errtype) { 
+invalid: switch (errtype) { case error_strict: PyMem_Del(destptr); PyErr_Format(PyExc_UnicodeError, "EUC-KR decoding error: invalid character \\x%02x%02x", srccur[0], srccur[1]); return NULL; - break; case error_replace: *(destcur++) = UNIFIL; break; - /* case error_ignore: break; */ + case error_ignore: + break; } - continue; + srccur++; } + } } else *(destcur++) = *srccur; } @@ -66,6 +79,7 @@ PyMem_Del(destptr); return r; } + static char euc_kr_encode__doc__[] = "EUC-KR encoder"; |
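The fix in this commit makes a trailing incomplete character follow the same `strict` / `replace` / `ignore` policy as any other invalid sequence. The three policies can be tried out with the short Python 3 sketch below. It uses Python's built-in `euc_kr` codec rather than the `_koco` extension, so it only illustrates the policy, and `try_decode` is a hypothetical helper name.

```python
# Drop the trail byte of the last character to simulate truncated input.
truncated = "한글".encode("euc_kr")[:-1]


def try_decode(data, errors):
    """Decode EUC-KR bytes, reporting where strict decoding gives up."""
    try:
        return data.decode("euc_kr", errors)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded.
        return "error at byte %d" % exc.start
```

With `errors="replace"` the dangling lead byte becomes U+FFFD (the `UNIFIL` value in the C source), with `errors="ignore"` it is dropped, and with `errors="strict"` decoding stops with an error.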
From: Chang <pe...@us...> - 2002-04-28 19:43:53
|
perky 02/04/28 00:44:23
Added: doc benchmarks.txt
Log: - Add simple rough benchmark
Revision Changes Path
1.1 KoreanCodecs/doc/benchmarks.txt
Index: benchmarks.txt
===================================================================
ROUGH Benchmark Tests (just for fun ;)
--------------------------------------
$Id: benchmarks.txt,v 1.1 2002/04/28 07:44:23 perky Exp $

CPU: Intel Pentium III 800
OS: FreeBSD 4.5
Python: Python 2.2.1
C Compiler: Intel C Compiler 6.0

Decoder
=======
1000 times with 2901 bytes string

                      korean.c          korean.python      ratio
                      -----------       -----------------  ----------
EUC-KR:               0.04s             28.24s             74900 %
                      (69MB/s)          (0.09MB/s)
CP949:                0.05s             40.27s             83896 %
                      (57MB/s)          (0.06MB/s)

Encoder
=======
1000 times with 2660 unicode characters

                      korean.c          korean.python      ratio
                      -----------       -----------------  ----------
EUC-KR:               0.05s             32.14s             64200 %
                      (53 MUchar/s)     (82.7 KUchar/s)
CP949:                0.07s             32.60s             46500 %
                      (38 MUchar/s)     (81.5 KUchar/s)

StreamReader
============
1000 times with file that have 99 lines / 2901 bytes

                      korean.c          korean.python      ratio
                      -----------       -----------------  ----------
EUC-KR read(100):     0.03s             1.17s              3900 %
                      (3.3MB/s)         (85K/s)
EUC-KR read():        0.07s             27.00s             9310 %
                      (41.4MB/s)        (107K/s)
EUC-KR readline()*5:  0.13s             1.60s              1230 %
                      (38461 lines/s)   (3125 lines/s)
EUC-KR readlines():   0.29s             27.51s             9486 %
                      (10.0MB/s)        (105K/s) |
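The throughput figures in the decoder benchmark can be roughly recomputed from the run count, payload size and elapsed time. The Python sketch below assumes that "MB" in the table means 2**20 bytes, which the commit does not state, and `throughput_mib_per_s` is a hypothetical helper name. Because the printed times are rounded to two decimals, recomputed values only approximate the published ones.

```python
def throughput_mib_per_s(iterations, payload_bytes, seconds):
    """Total bytes processed per second, expressed in units of 2**20 bytes."""
    return iterations * payload_bytes / seconds / (1024 * 1024)


# Decoder rows for EUC-KR: 1000 runs over a 2901-byte string.
c_rate = throughput_mib_per_s(1000, 2901, 0.04)
py_rate = throughput_mib_per_s(1000, 2901, 28.24)
print("korean.c: %.1f MB/s, korean.python: %.3f MB/s" % (c_rate, py_rate))
```

For the EUC-KR decoder row this lands close to the listed 69MB/s and 0.09MB/s. Rows with very short elapsed times are more sensitive to the rounding of the time column.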
From: Chang <pe...@us...> - 2002-04-28 19:43:38
|
perky 02/04/28 01:02:32 Modified: src koco_stream.h Log: - Add StreamReader for CP949 encoding Revision Changes Path 1.3 +145 -3 KoreanCodecs/src/koco_stream.h Index: koco_stream.h =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/src/koco_stream.h,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- koco_stream.h 28 Apr 2002 06:54:12 -0000 1.2 +++ koco_stream.h 28 Apr 2002 08:02:32 -0000 1.3 @@ -4,10 +4,15 @@ * KoreanCodecs EUC-KR StreamReader C Implementation * * Author : Hye-Shik Chang <pe...@fa...> - * Date : $Date: 2002/04/28 06:54:12 $ + * Date : $Date: 2002/04/28 08:02:32 $ * Created : 28 April 2002 * - * $Revision: 1.2 $ + * $Revision: 1.3 $ + */ + +/* + * TODO: + * __euc_kr_decode and __cp949_decode has so many big duplicated codes, now. */ static PyObject * @@ -103,6 +108,143 @@ return r; } +static PyObject * +__cp949_decode( + state_t *state, char *s, int slen, int errtype, + PyObject* (*finalizer)(const Py_UNICODE *, int) +) { + unsigned char *srccur, *srcend; + Py_UNICODE *destptr, *destcur, *codemap, code; + PyObject *r; + + destcur = destptr = PyMem_New(Py_UNICODE, slen+1); + srccur = s; + srcend = s + slen; + + if (HAS_STATE(*state)) { + unsigned char c = GET_STATE(*state); + + if (c & 0x80) { + if (slen > 0) { + if (uhc_decode_hint[c]) { /* UHC page0 region */ + codemap = uhc_decode_map[c & 0x7F]; + + if (uhc_page0_bottom <= *srccur && *srccur <= uhc_page0_top) { + code = codemap[*srccur - uhc_page0_bottom]; + if (code == UNIFIL) + goto invalid; + *(destcur++) = code; + srccur++; + } else + goto invalid_state; + } else if (uhc_decode_hint[*srccur]) { /* UHC page1 region */ + codemap = uhc_decode_map[c & 0x7F]; + if (!codemap) + goto invalid; + + code = codemap[*srccur - uhc_page1_bottom]; + if (code == UNIFIL) + goto invalid; + *(destcur++) = code; + srccur++; + } else { /* KSC5601 */ + codemap = ksc5601_decode_map[c & 0x7F]; + + if (!codemap) + goto invalid_state; + if 
(ksc5601_decode_bottom <= *srccur && *srccur <= ksc5601_decode_top) { + code = codemap[*srccur - ksc5601_decode_bottom]; + if (code == UNIFIL) + goto invalid_state; + *(destcur++) = code; + srccur++; + } else { +invalid_state: switch (errtype) { + case error_strict: + PyErr_Format(PyExc_UnicodeError, + "CP949 decoding error: invalid character \\x%02x%02x", + c, *srccur); + r = NULL; + goto out; + case error_replace: + *(destcur++) = UNIFIL; + break; + case error_ignore: break; + } + srccur++; + } + } + } else { /* keep state */ + r = PyUnicode_FromUnicode(NULL, 0); + goto out; + } + } else + *(destcur++) = c; + + RESET_STATE(*state); + } + + for (; srccur < srcend; srccur++) { + if (*srccur & 0x80) { + if (srccur+1 >= srcend) /* state out */ + SET_STATE(*state, *srccur); + else { + if (uhc_decode_hint[*srccur]) { /* UHC page0 region */ + codemap = uhc_decode_map[*srccur & 0x7F]; + if (uhc_page0_bottom <= srccur[1] && srccur[1] <= uhc_page0_top) { + code = codemap[srccur[1] - uhc_page0_bottom]; + if (code == UNIFIL) + goto invalid; + *(destcur++) = code; + srccur++; + } else + goto invalid; + } else if (uhc_decode_hint[srccur[1]]) { /* UHC page1 region */ + codemap = uhc_decode_map[*srccur & 0x7F]; + if (!codemap) + goto invalid; + code = codemap[srccur[1] - uhc_page1_bottom]; + if (code == UNIFIL) + goto invalid; + *(destcur++) = code; + srccur++; + } else { + codemap = ksc5601_decode_map[*srccur & 0x7F]; + if (!codemap) + goto invalid; + if (ksc5601_decode_bottom <= srccur[1] && srccur[1] <= ksc5601_decode_top) { + code = codemap[srccur[1] - ksc5601_decode_bottom]; + if (code == UNIFIL) + goto invalid; + *(destcur++) = code; + srccur++; + } else { +invalid: switch (errtype) { + case error_strict: + PyErr_Format(PyExc_UnicodeError, + "CP949 decoding error: invalid character \\x%02x%02x", + srccur[0], srccur[1]); + r = NULL; + goto out; + case error_replace: + *(destcur++) = UNIFIL; + break; + case error_ignore: break; + } + srccur++; + } + } + } + } else + 
*(destcur++) = *srccur; + } + + r = finalizer(destptr, destcur-destptr); +out: + PyMem_Del(destptr); + return r; +} + PyObject* readline_finalizer(const Py_UNICODE *data, int datalen) { @@ -166,7 +308,7 @@ if (!strcmp(encoding, "euc-kr")) stnfo->decoder = __euc_kr_decode; else if (!strcmp(encoding, "cp949")) - stnfo->decoder = __euc_kr_decode; + stnfo->decoder = __cp949_decode; else { PyMem_Del(stnfo); PyErr_Format(PyExc_UnicodeError, |
From: Chang <pe...@us...> - 2002-04-28 19:43:37
|
perky 02/04/28 01:02:32 Modified: korean/c cp949.py Log: - Add StreamReader for CP949 encoding Revision Changes Path 1.2 +3 -48 KoreanCodecs/korean/c/cp949.py Index: cp949.py =================================================================== RCS file: /cvsroot/koco/KoreanCodecs/korean/c/cp949.py,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- cp949.py 16 Mar 2002 01:59:18 -0000 1.1 +++ cp949.py 28 Apr 2002 08:02:31 -0000 1.2 @@ -1,6 +1,6 @@ # Hye-Shik Chang <16 Mar 2002> # -# $Id: cp949.py,v 1.1 2002/03/16 01:59:18 perky Exp $ +# $Id: cp949.py,v 1.2 2002/04/28 08:02:31 perky Exp $ import codecs import _koco @@ -12,53 +12,8 @@ class StreamWriter(Codec, codecs.StreamWriter): pass -class StreamReader(Codec, codecs.StreamReader): - - def __init__(self, stream, errors='strict'): - codecs.StreamReader.__init__(self, stream, errors) - self.data = '' - - def _read(self, func, size): - if size == 0: - return u'' - if size is None or size < 0: - data = self.data + func() - self.data = '' - else: - data = self.data + func(max(size, 2) - len(self.data)) - size = len(data) - p = 0 - while p < size: - if data[p] < "\x80": - p = p + 1 - elif p + 2 <= size: - p = p + 2 - else: - break - data, self.data = data[:p], data[p:] - return self.decode(data)[0] - - def read(self, size=-1): - return self._read(self.stream.read, size) - - def readline(self, size=-1): - return self._read(self.stream.readline, size) - - def readlines(self, size=-1): - data = self._read(self.stream.read, size) - buffer = [] - end = 0 - while 1: - pos = data.find(u'\n', end) - if pos < 0: - if end < len(data): - buffer.append(data[end:]) - break - buffer.append(data[end:pos+1]) - end = pos+1 - return buffer - def reset(self): - self.data = '' +class StreamReader(Codec, _koco.StreamReader, codecs.StreamReader): + encoding = 'cp949' ### encodings module API |
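The pure-Python `readlines()` removed here split the decoded text at each newline while keeping the terminator on every line. The sketch below restates that loop as a standalone Python 3 function for readers who want to see the behaviour in isolation. `split_keepends` is a hypothetical name, and the code is not part of the commit.

```python
def split_keepends(text):
    """Split text at newline characters, keeping each terminator on its line."""
    lines = []
    end = 0
    while True:
        pos = text.find("\n", end)
        if pos < 0:
            # No more newlines: keep a final unterminated line, if any.
            if end < len(text):
                lines.append(text[end:])
            break
        lines.append(text[end:pos + 1])
        end = pos + 1
    return lines
```

For text that uses only "\n" as a line terminator, this should match `str.splitlines(True)`.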
From: Chang <pe...@us...> - 2002-04-28 19:40:05
perky       02/04/27 14:01:19

  Modified:    src hangul.c
  Log:
  - Just a style fix

  Revision  Changes    Path
  1.9       +4 -5      KoreanCodecs/src/hangul.c

Index: hangul.c
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/src/hangul.c,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- hangul.c 27 Apr 2002 20:59:21 -0000 1.8
+++ hangul.c 27 Apr 2002 21:01:19 -0000 1.9
@@ -4,14 +4,14 @@
  * KoreanCodecs Hangul Module C Implementation
  *
  * Author : Hye-Shik Chang <pe...@fa...>
- * Date : $Date: 2002/04/27 20:59:21 $
+ * Date : $Date: 2002/04/27 21:01:19 $
  * Created : 25 April 2002
  *
- * $Revision: 1.8 $
+ * $Revision: 1.9 $
  */
 
 static char *version =
-"$Id: hangul.c,v 1.8 2002/04/27 20:59:21 perky Exp $";
+"$Id: hangul.c,v 1.9 2002/04/27 21:01:19 perky Exp $";
 
 #include "Python.h"
 
@@ -614,8 +614,6 @@
 
     /* Add some symbolic constants to the module */
     d = PyModule_GetDict(m);
-    PyDict_SetItemString(d, "Space", UniSpace);
-    /*Py_DECREF(UniSpace); never die */
     SET_INTCONSTANT(d, NCHOSUNG);
     SET_INTCONSTANT(d, NJUNGSUNG);
     SET_INTCONSTANT(d, NJONGSUNG);
@@ -748,6 +746,7 @@
     tuni[0] = JUNGSUNG_FILLER;
     PyDict_SetItemString(d, "JUNGSUNG_FILLER", PyUnicode_FromUnicode(tuni, 1));
     PyDict_SetItemString(d, "Null", UniNull);
+    PyDict_SetItemString(d, "Space", UniSpace);
     PyDict_SetItemString(d, "version", PyString_FromString(version));
From: Chang <pe...@us...> - 2002-04-28 19:39:33
perky       02/04/27 13:59:21

  Modified:    src hangul.c
  Log:
  - Fix garbage collection errors

  Revision  Changes    Path
  1.8       +16 -7     KoreanCodecs/src/hangul.c

Index: hangul.c
===================================================================
RCS file: /cvsroot/koco/KoreanCodecs/src/hangul.c,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -r1.7 -r1.8
--- hangul.c 26 Apr 2002 08:21:54 -0000 1.7
+++ hangul.c 27 Apr 2002 20:59:21 -0000 1.8
@@ -4,14 +4,14 @@
  * KoreanCodecs Hangul Module C Implementation
  *
  * Author : Hye-Shik Chang <pe...@fa...>
- * Date : $Date: 2002/04/26 08:21:54 $
+ * Date : $Date: 2002/04/27 20:59:21 $
  * Created : 25 April 2002
  *
- * $Revision: 1.7 $
+ * $Revision: 1.8 $
  */
 
 static char *version =
-"$Id: hangul.c,v 1.7 2002/04/26 08:21:54 perky Exp $";
+"$Id: hangul.c,v 1.8 2002/04/27 20:59:21 perky Exp $";
 
 #include "Python.h"
 
@@ -611,10 +611,11 @@
     UniNull = PyUnicode_FromUnicode(NULL, 0);
     tuni[0] = 0x3000; /* Unicode Double-wide Space */
     UniSpace = PyUnicode_FromUnicode(tuni, 1);
-    Py_INCREF(UniSpace);
 
     /* Add some symbolic constants to the module */
     d = PyModule_GetDict(m);
+    PyDict_SetItemString(d, "Space", UniSpace);
+    /*Py_DECREF(UniSpace); never die */
     SET_INTCONSTANT(d, NCHOSUNG);
     SET_INTCONSTANT(d, NJUNGSUNG);
     SET_INTCONSTANT(d, NJONGSUNG);
@@ -677,30 +678,34 @@
         tuni[0] = jamo->code;
         unijamo = PyUnicode_FromUnicode(tuni, 1);
         PyDict_SetItemString(d, jamo->name, unijamo);
-        Py_INCREF(unijamo); /* PuTyple_SET_ITEM steals reference */
         if (isJaeum(jamo->code)) {
             PyTuple_SET_ITEM(JaeumCodes, cur_jaeum++, unijamo);
+            Py_INCREF(unijamo);
             if (isChosung(jamo->code)) {
                 jamo->orders[0] = cur_cho;
                 jamo_chosung[cur_cho] = jamo;
                 PyList_SET_ITEM(Chosung, cur_cho++, unijamo);
+                Py_INCREF(unijamo);
                 PyDict_SetItemString(JaeumDict, jamo->name, unijamo);
             }
             if (isJongsung(jamo->code)) {
                 jamo->orders[2] = cur_jong;
                 jamo_jongsung[cur_jong] = jamo;
                 PyList_SET_ITEM(Jongsung, cur_jong++, unijamo);
+                Py_INCREF(unijamo);
                 PyDict_SetItemString(JaeumDict, jamo->name, unijamo);
             }
             multicls = JaeumMulti;
         } else { /* Moeum */
             PyTuple_SET_ITEM(MoeumCodes, cur_moeum++, unijamo);
+            Py_INCREF(unijamo);
             if (isJungsung(jamo->code)) {
                 jamo->orders[1] = cur_jung;
                 jamo_jungsung[cur_jung] = jamo;
                 PyList_SET_ITEM(Jungsung, cur_jung++, unijamo);
+                Py_INCREF(unijamo);
                 PyDict_SetItemString(MoeumDict, jamo->name, unijamo);
             }
             multicls = MoeumMulti;
@@ -715,10 +720,13 @@
                 PyDict_SetItem(multicls, unijamo, tmp);
                 Py_DECREF(tmp);
             }
+            Py_DECREF(unijamo);
         }
 
-        Py_DECREF(JaeumDict);
-        Py_DECREF(MoeumDict);
+        Py_DECREF(Chosung); Py_DECREF(Jungsung); Py_DECREF(Jongsung);
+        Py_DECREF(JaeumDict); Py_DECREF(MoeumDict);
+        Py_DECREF(JaeumCodes); Py_DECREF(MoeumCodes);
+        Py_DECREF(JaeumMulti); Py_DECREF(MoeumMulti);
     }
 
     tmp = PyTuple_New(2);
@@ -745,6 +753,7 @@
 
     ErrorObject = PyErr_NewException("hangul.UnicodeHangulError", NULL, NULL);
     PyDict_SetItemString(d, "UnicodeHangulError", ErrorObject);
+    Py_DECREF(ErrorObject);
 
     /* Check for errors */
     if (PyErr_Occurred())