From: Ivan B. <bol...@cg...> - 2005-06-25 14:27:20
Attachments:
koi8-r.lisp
|
Here is a patch.
From: Rudi S. <ru...@co...> - 2005-06-27 09:06:41
Attachments:
PGP.sig
|
On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:

> Here is a patch.
>
> Now I'm going to implement other Cyrillic charsets: windows-1251,
> iso-8859-5, cp866.

Very cool. I'll commit this first thing after the current freeze.
Looking forward to the other formats as well.

Many thanks,

Rudi
From: Ivan B. <bol...@cg...> - 2005-06-27 14:29:41
Attachments:
octets.lisp.patch
|
On 9153 day of my life Rudi Schlatte wrote:
> On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:
>
>> Here is a patch.
>>
>> Now I'm going to implement other Cyrillic charsets: windows-1251,
>> iso-8859-5, cp866.
>
> Very cool. I'll commit this first thing after the current freeze.
> Looking forward to the other formats as well.

1. iso-8859-5 doesn't cover all codes #x00-#xff, there is "window" of
   undefined characters at #x80-#x9F. How should I signal about wrong
   codes at IN-EXPR clause of DEFINE-EXTERNAL-FORMAT?

2. Can I use in fd-stream.lisp functions that are defined in
   octets.lisp? Otherwise I'll have to duplicate conversion code.

3. Is the following patch fine? With it, I can declare iso-8859-5 in
   this way:

(define-unibyte-mapper iso-8859-5->code-mapper code->iso-8859-5-mapper
  (#x80 nil)
  (#x81 nil)
  (#x82 nil)
  ...
  (#x9f nil)
  (#xa0 #x00A0) ; NO-BREAK SPACE
  (#xa1 #x0401) ; CYRILLIC CAPITAL LETTER IO
  (#xa2 #x0402) ; CYRILLIC CAPITAL LETTER DJE
  ...
  (#xff #x045F)) ; CYRILLIC SMALL LETTER DZHE
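The table-driven declaration in the message above can be pictured with a small portable sketch. This is not the DEFINE-UNIBYTE-MAPPER from octets.lisp.patch, whose expansion is not shown in the thread. The macro name DEFINE-TABLE-MAPPER and the DEMO- function names are hypothetical, and the sketch assumes that unlisted bytes map to themselves and that a NIL entry marks an undefined byte.

```lisp
;;; Hypothetical sketch only; not the macro from octets.lisp.patch.
(defmacro define-table-mapper (byte->code code->byte &rest exceptions)
  "Define BYTE->CODE and CODE->BYTE as table lookups.
Bytes not listed in EXCEPTIONS map to themselves (an assumption);
an entry (BYTE NIL) marks BYTE as undefined."
  (let ((table (make-array 256)))
    ;; Start from the identity mapping, then apply the listed exceptions.
    (dotimes (i 256)
      (setf (svref table i) i))
    (loop for (byte code) in exceptions
          do (setf (svref table byte) code))
    `(progn
       (defun ,byte->code (byte)
         ;; Returns a code point, or NIL for an undefined byte.
         (svref ,table byte))
       (defun ,code->byte (code)
         ;; Returns a byte, or NIL if CODE cannot be represented.
         (and code (position code ,table))))))

;; Usage with two entries taken from the iso-8859-5 listing above.
(define-table-mapper demo-byte->code demo-code->byte
  (#x80 nil)      ; undefined in this sketch
  (#xa1 #x0401))  ; CYRILLIC CAPITAL LETTER IO
```

A NIL result from either function is the point at which a caller would decide how to report the bad byte or character; the linear POSITION search is kept for brevity, and a reverse table would be the obvious refinement.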
From: Christophe R. <cs...@ca...> - 2005-06-27 14:58:51
|
Ivan Boldyrev <bol...@cg...> writes:

> On 9153 day of my life Rudi Schlatte wrote:
>> On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:
>>
>>> Here is a patch.
>>>
>>> Now I'm going to implement other Cyrillic charsets: windows-1251,
>>> iso-8859-5, cp866.
>>
>> Very cool. I'll commit this first thing after the current freeze.
>> Looking forward to the other formats as well.
>
> 1. iso-8859-5 doesn't cover all codes #x00-#xff, there is "window" of
>    undefined characters at #x80-#x9F. How should I signal about wrong
>    codes at IN-EXPR clause of DEFINE-EXTERNAL-FORMAT?

I'm not sure about this, but I think that common use of the ISO-8859-x
code tables implies also ISO/IEC 6429 / ECMA-048, which specifies the
layout of stuff in the control ranges (#x00--#x1f and #x80--#x9f).
Basically, it's likely to be exactly the same as latin-1.

Cheers,

Christophe
From: Rudi S. <ru...@co...> - 2005-06-27 15:11:07
|
On 27. Jun 2005, at 16:23, Ivan Boldyrev wrote:

> 1. iso-8859-5 doesn't cover all codes #x00-#xff, there is "window" of
>    undefined characters at #x80-#x9F. How should I signal about wrong
>    codes at IN-EXPR clause of DEFINE-EXTERNAL-FORMAT?

Signal a stream-decoding-error (fd-stream.lisp line 117 or
thereabouts). If an attempt is made to write a Chinese character to
your iso-8859-5 stream, signal a stream-encoding-error. Appropriate
restarts are provided for these conditions in stream functions.

> 2. Can I use in fd-stream.lisp functions that are defined in
>    octets.lisp? Otherwise I'll have to duplicate conversion code.

Yes; if it builds, it's fine. (Don't let anyone stop you from
unifying the stream / alien string-to-octets-and-back functionality,
either :) )

> 3. Is the following patch fine? With it, I can declare iso-8859-5 in
>    this way:

Looks fine, if it doesn't break any other encodings. Be sure to
resend it with the other patches so it doesn't get lost.

ObNikodemus: it would be nice to have some encoding tests (similar to
what NIIMI Satoshi did in tests/eucjp-impure.lisp), to make sure your
work isn't broken during the eventual reorganization of
external-format code. Maintainers get a stern look and a slap on the
wrist when checking in code with failing tests, but a regression in
one of the lesser-used encodings might slip in unnoticed.

Cheers,

Rudi
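The advice above (signal an error for an undefined octet and let a restart decide what happens next) can be sketched in portable Common Lisp. This does not use SBCL's internal stream-decoding-error or its actual restarts. The condition DEMO-DECODING-ERROR, the function DEMO-DECODE-OCTET, and the restart USE-REPLACEMENT-CHARACTER are hypothetical names chosen for illustration.

```lisp
;;; Hypothetical sketch; SBCL's own conditions and restarts differ.
(define-condition demo-decoding-error (error)
  ((octet :initarg :octet :reader demo-decoding-error-octet))
  (:report (lambda (condition stream)
             (format stream "Octet #x~2,'0X has no character in this encoding."
                     (demo-decoding-error-octet condition)))))

(defun demo-decode-octet (octet table)
  "Return the character for OCTET according to TABLE, a 256-element
simple vector holding a code point or NIL for each octet."
  (let ((code (svref table octet)))
    (if code
        (code-char code)
        ;; Undefined octet: signal an error, but offer a restart so the
        ;; caller (or the user in the debugger) can substitute a character.
        (restart-case (error 'demo-decoding-error :octet octet)
          (use-replacement-character ()
            :report "Substitute a question mark."
            #\?)))))

(defun demo-decode-leniently (octets table)
  "Decode the list OCTETS, replacing every undefined octet with #\\?."
  (handler-bind ((demo-decoding-error
                   (lambda (condition)
                     (declare (ignore condition))
                     (invoke-restart 'use-replacement-character))))
    (map 'string (lambda (octet) (demo-decode-octet octet table)) octets)))
```

Separating the signalling code from the recovery policy is what lets the same decoder be strict in one caller and lenient in another, and a regression test for an encoding can assert both behaviours.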
From: Rudi S. <ru...@co...> - 2005-06-28 09:13:08
Attachments:
PGP.sig
|
On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:

> Here is a patch.

Thanks, committed as version 0.9.2.1, opening what promises to be an
exciting month for sbcl.

Rudi
From: Ivan B. <bol...@cg...> - 2005-06-28 14:28:28
Attachments:
cyrillic.lisp
octets.lisp.patch
|
On 9154 day of my life Rudi Schlatte wrote:
> On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:
>> Here is a patch.
>
> Thanks, committed as version 0.9.2.1, opening what promises to be an
> exciting month for sbcl.

Oops :) I have gathered Cyrillic charsets in one file: cyrillic.lisp.
From: Rudi S. <ru...@co...> - 2005-06-28 15:00:11
Attachments:
PGP.sig
|
On 28. Jun 2005, at 16:21, Ivan Boldyrev wrote:

> Oops :) I have gathered Cyrillic charsets in one file: cyrillic.lisp.

Fantastic, thanks!

> So koi8-r.lisp should be removed.

Will do.

> I also didn't prepend code with something like #!+sb-unicode, I don't
> know if it should be.

No need, that's taken care of in build-order.lisp-expr.

I have some busy days ahead of me, so committing could be delayed
until early next week (depending on caffeine level etc). In any case,
sbcl 0.9.3 will come with Cyrillic support, which makes me happy.

Cheers,

Rudi
From: Ivan B. <bol...@cg...> - 2005-06-30 15:23:59
|
On 9154 day of my life Rudi Schlatte wrote:
>> So koi8-r.lisp should be removed.
>
> Will do.

Now I have another idea: collect ISO-8859 encodings in one file,
collect Microsoft (windows-xxxx) and IBM (cpxxx) in other files. So,
don't remove koi8-r.lisp for a while, I will prepare some other
encodings (not just Cyrillic).

-- 
Ivan Boldyrev

Sorry my terrible English, my native language is Lisp!
From: Nikodemus S. <tsi...@cc...> - 2005-06-30 15:53:40
|
On Thu, 30 Jun 2005, Ivan Boldyrev wrote:

> Now I have another idea: collect ISO-8859 encodings in one file,
> collect Microsoft (windows-xxxx) and IBM (cpxxx) in other files. So,
> don't remove koi8-r.lisp for a while, I will prepare some other
> encodings (not just Cyrillic).

The number of encodings SBCL may eventually support is sufficiently
large that maybe we should consider src/code/external-formats/ or
similar.

Cheers,

 -- Nikodemus

Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."
From: Rudi S. <ru...@co...> - 2005-07-01 10:40:53
Attachments:
PGP.sig
|
On 30. Jun 2005, at 17:42, Nikodemus Siivola wrote:

> The number of encodings SBCL may eventually support is sufficiently
> large that maybe we should consider src/code/external-formats/ or
> similar.

Agreed. Also some of the contents of fd-stream.lisp (utf-8 and
friends and EBCDIC) could be moved there.

Rudi
From: Ivan B. <bol...@cg...> - 2005-06-28 14:27:44
|
On 9153 day of my life Rudi Schlatte wrote:
> On 25. Jun 2005, at 15:35, Ivan Boldyrev wrote:
>
>> Here is a patch.
>>
>> Now I'm going to implement other Cyrillic charsets: windows-1251,
>> iso-8859-5, cp866.
>
> Very cool. I'll commit this first thing after the current freeze.

So, it will be available in SBCL 0.9.3?

-- 
Ivan Boldyrev

Outlook has performed an illegal operation and will be shut down.
If the problem persists, contact the program vendor.