From: Brian M. <bma...@cs...> - 2004-11-30 13:52:32
Hello all,

It's time to make the binaries for SBCL 0.8.17, and because of the new sb-unicode feature, I need to decide how to make the binaries. Right now, sb-md5 fails its tests on an sb-unicode binary. The way I see it, there are five options for how to make the binaries:

1. Don't make any.
2. Make the binaries without sb-unicode.

Or, if we make the binaries with unicode, there are two options:

3. Touch test-passed in the sb-md5 directory and ship it, knowing that users who use sb-md5 will see their MD5 computed values change.
4. Don't touch test-passed and distribute without sb-md5.
5. Some combination of #3 or #4 and #2.

My personal feeling is that option #3 is the right one: I'd like to ship with unicode to get users testing, and it doesn't make sense to withhold sb-md5 just because the tests haven't been fixed. However, I could see an argument for #2 -- waiting until there are encode and decode to string functions before shipping binaries with unicode. #5 would be a lot of work.

Advice?

-- 
Brian Mastenbrook
http://www.iscblog.info/
http://www.cs.indiana.edu/~bmastenb/
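The "MD5 values change" concern in option #3 comes from MD5 being defined over octets, not characters: once strings can hold characters outside the old 8-bit range, the string-to-octets step is no longer the identity map, so the same string can hash differently. A quick sketch in Python (an illustrative stand-in, not sb-md5 or SBCL code):

```python
import hashlib

text = "caf\u00e9"  # a string containing one non-ASCII character

# MD5 is defined over octets, so the digest depends entirely on how
# the characters are converted to bytes before hashing.
d_latin1 = hashlib.md5(text.encode("iso-8859-1")).hexdigest()
d_utf8 = hashlib.md5(text.encode("utf-8")).hexdigest()
print(d_latin1 == d_utf8)  # False: same characters, different octets

# Pure-ASCII input is unaffected, since its bytes are identical
# under both encodings.
print(hashlib.md5(b"a").hexdigest() ==
      hashlib.md5("a".encode("utf-8")).hexdigest())  # True
```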
From: Christophe R. <cs...@ca...> - 2004-11-30 14:06:31
Brian Mastenbrook <bma...@cs...> writes:

> 1. Don't make any.
> 2. Make the binaries without sb-unicode.
> Or, if we make the binaries with unicode, there are two options:
> 3. Touch test-passed in the sb-md5 directory and ship it, knowing that
> users who use sb-md5 will see their MD5 computed values change.
> 4. Don't touch test-passed and distribute without sb-md5.
> 5. Some combination of #3 or #4 and #2.
>
> My personal feeling is that option #3 is the right option: I'd like to
> ship with unicode to get users testing, and it doesn't make sense to
> not ship sb-md5 just because the tests haven't been fixed. However, I
> could see an argument for #2 - waiting until there are encode and
> decode to string functions before shipping binaries with unicode. #5
> would be a lot of work.

Again, speaking personally, I'd rather #4; that way, if anyone feels that they need sb-md5 in their binaries, they have an incentive to get down and make it happen. I see #3 as dishonest -- what is the point of having tests for contrib modules, which we have always said have a slightly less supported status than the main body of code, if at the least provocation we circumvent those tests? It's not just a question of fixing the tests: the interface itself is broken, and despite two calls for contributions remains so.

I believe Kevin has chosen option #2 for his Debian uploads -- in some sense, I think that covers that base; it seems to me that #4 is the option that best reflects the state of SBCL development at release time.

Cheers,

Christophe
From: William H. N. <wil...@ai...> - 2004-11-30 15:09:18
On Tue, Nov 30, 2004 at 02:06:18PM +0000, Christophe Rhodes wrote:
> Brian Mastenbrook <bma...@cs...> writes:
>
> > 1. Don't make any.
> > 2. Make the binaries without sb-unicode.
> > Or, if we make the binaries with unicode, there are two options:
> > 3. Touch test-passed in the sb-md5 directory and ship it, knowing that
> > users who use sb-md5 will see their MD5 computed values change.
> > 4. Don't touch test-passed and distribute without sb-md5.
> > 5. Some combination of #3 or #4 and #2.
> >
> > My personal feeling is that option #3 is the right option: I'd like to
> > ship with unicode to get users testing, and it doesn't make sense to
> > not ship sb-md5 just because the tests haven't been fixed. However, I
> > could see an argument for #2 - waiting until there are encode and
> > decode to string functions before shipping binaries with unicode. #5
> > would be a lot of work.
>
> Again, speaking personally, I'd rather #4; that way, if anyone feels
> that they need sb-md5 in their binaries, they have an incentive to get
> down and make it happen. I see #3 as dishonest -- what is the point
> of having tests for contrib modules which we have always said have a
> slightly less supported status than the main body of code, if at the
> least provocation we circumvent those tests? It's not just a question
> of fixing the tests: the interface itself is broken, and despite two
> calls for contributions remains so.
>
> I believe Kevin has chosen option #2 for his Debian uploads -- in some
> sense, I think that covers that base; it seems to me that #4 is the
> option that best reflects the state of SBCL development at release
> time.

I think I agree with you (CSR) here. Choice #3 seems like promising what SBCL doesn't (quite) deliver; I would prefer either #2 or #4. By and large, SBCL tends to err in the direction of not promising what it doesn't quite deliver.
Not only is that direction my personal preference, but I also have a metapreference: whichever direction we prefer, we should try to prefer it consistently. Thus, if here we really wanted to have defaults and summary information which cheerfully promote things that sort of work, I'd be wondering whether we should start converting other things to a similar approach.

Between #2 and #4 I don't have as strong a preference, but again I have a metapreference for a reasonably consistent policy. #4 looks like an instance of a policy that contrib/ maintainers are fundamentally the ones responsible for chasing the main code (so when the main code changes -- Unicode in this case -- and breaks something in contrib/, the default policy is to release the main code and shut down sb-md5 until it catches up). #2 looks like an instance of a policy that the main system shouldn't do things which break the contribs, starting to promote them to an equal part of the main system.

I think to the extent that we have an explicit policy, it currently looks like the first, corresponding to #4. And I think if it's time to make an exception to that policy, then maybe it's time to think about changing the policy instead. (I still prefer the #4-style policy, but I can see there are arguments both ways, and quite some time has passed, and many changes have occurred, since we argued about it last.)

-- 
William Harold Newman <wil...@ai...>
"But I'll forgive you a good deal for calling it 'Interesting but slightly mad.'" -- <http://groups.google.com/groups?q=ddfr+Interesting+but+slightly+mad>
From: Christophe R. <cs...@ca...> - 2004-12-01 14:00:06
Adam Warner <li...@co...> writes:

>> I believe Kevin has chosen option #2 for his Debian uploads -- in some
>> sense, I think that covers that base; it seems to me that #4 is the
>> option that best reflects the state of SBCL development at release
>> time.
>
> Indeed, I've just installed SBCL 0.8.17-2 from Debian Unstable and
> unfortunately it doesn't have large character support compiled in.
>
> I find 8-bit strings that can store arbitrary octets useful, and I'm
> wondering if you will consider continuing support for them in preference
> to the ASCII support you've mooted for the FFI. The fantastic property
> of ISO-8859-1 and Unicode is that the first 256 characters of Unicode
> map exactly onto ISO-8859-1. So if you define BASE-CHAR to be octets
> (of ostensibly ISO-8859-1 encoded characters) then EXTENDED-CHAR
> naturally maps from character code 256 onwards. In practice the only
> two useful encodings will be BASE-CHAR and CHARACTER.

For the FFI, my current working model is as follows:

  char *  <=>  (* sb-alien:char)   <=>  (array ([un]signed-byte 8) (*))
  char[]  <=>  c-string            <=>  [base-]string
          <=>  iso-8859-1-string   <=>  [base-]string
          <=>  utf8-string         <=>  [base-]string
  ...

where the consequences "are undefined" if the alien code modifies the contents of its arguments in the case of the foo-string types. I could be wrong about this, but I think that there are different use cases for arrays of octets and arrays of things-which-are-going-to-be-treated-as-characters, and I don't think that they overlap much.

> I have one situation where I'm encoding binary within strings over a
> socket connection that can only transfer text. The lowest common
> denominator is ASCII, so I have 7 bits (effectively 6 bits to avoid
> control characters). If the implementation supports strings of octets
> then I'm only using (8-6)/6 = 33% more memory. If an implementation
> only supports 32-bit characters then I'm using (32-6)/6 = 433% more
> memory.
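Adam's memory-overhead arithmetic generalizes: storing a 6-bit payload in cells of a given width wastes (width - 6)/6 of the payload size. A small illustrative check in Python (just the arithmetic from the paragraph above, not anything SBCL-specific):

```python
def overhead(container_bits, payload_bits=6):
    """Fractional memory overhead when each payload_bits-sized unit
    occupies a container_bits-sized storage cell."""
    return (container_bits - payload_bits) / payload_bits

# 8-bit strings of octets: (8-6)/6 = 33% more memory
print(f"{overhead(8):.0%}")   # 33%
# 32-bit-only characters: (32-6)/6 = 433% more memory
print(f"{overhead(32):.0%}")  # 433%
```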
I'm not sure I understand this as an argument against an ASCII BASE-CHAR; if the lowest common denominator is ASCII, then a BASE-CHAR=ASCII representation gives you what you need, does it not? The base-string type isn't going away; it's mandated by ANSI, and has (in the Unicode-enabled SBCL) a 7-bit range including all of ASCII; it also has the potential to use only 16% more memory. :-)

> It also sounds like ongoing support for strings of octets would help the
> porting of sb-md5 (add :element-type 'base-char when making strings).

No, I think this is completely orthogonal; in particular, I'm of the firm opinion that there should be no guarantee, implicit or explicit, that lisp objects are laid out in a particular way. Having the return values of sb-md5 depend on the representation of the string is (I hope you'll agree, but maybe you won't) completely misguided: (sb-md5:md5sum-string "a") should be the same as (sb-md5:md5sum-string (coerce "a" 'base-string)). The problem with sb-md5 as it stands is that we don't even have the interface this requires: a string-to-octets and octets-to-string pair (with an external-format keyword argument).

> People could also store UTF-8 encoded text in the strings of octets.
> There's no decoding overhead. You just store the octets as read instead
> of the implementation having to decode the stream of octets and store
> them as code points within the CHARACTER data structure.
>
> It's also possible to encode extra information within the string, such
> as an escape character. Some values in UTF-8 are undefined, such as 255.
> Such strings with undefined UTF-8 sequences can be portably transferred
> as ISO-8859-1 while additionally interpreting and unescaping the UTF-8
> sequence at the other end.
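Christophe's interface point can be sketched in Python (a hypothetical analogue for illustration only; the function names here are invented, not the actual sb-md5 or SBCL API): routing the digest through an explicit string-to-octets step with an external-format parameter makes the result a function of the characters and the chosen encoding alone, never of how the string happens to be stored internally.

```python
import hashlib

def string_to_octets(s, external_format="utf-8"):
    # Hypothetical analogue of the string-to-octets function discussed
    # above: an explicit, named conversion from characters to bytes.
    return s.encode(external_format)

def md5sum_string(s, external_format="utf-8"):
    # The digest is defined purely in terms of the octets produced by
    # the explicit conversion, so equal strings always hash equally.
    return hashlib.md5(string_to_octets(s, external_format)).hexdigest()

# Equal strings give equal digests, however they were constructed
# (the analogue of md5sum-string being coercion-invariant):
assert md5sum_string("a") == md5sum_string(chr(97))

# The external format is an explicit parameter, not an accident of
# internal representation:
print(md5sum_string("caf\u00e9", "iso-8859-1") !=
      md5sum_string("caf\u00e9", "utf-8"))  # True
```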
All of this sounds like a good argument for having an underlying data type with an 8-bit field size, but that exists already as (array (unsigned-byte 8) (*)) -- I can't see anything in the above argument which relies on the array element type being a subtype of CHARACTER...

> There are many reasons why strings of octets would continue to be
> useful in an implementation that has Unicode support. This advice may
> be helpful but feel free to ignore it without explanation.

Your thoughts are welcome. Thank you.

Cheers,

Christophe
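The two positions above can be seen side by side in Python terms (used purely as an analogy for the Lisp types under discussion): ISO-8859-1 does round-trip arbitrary octets through a character string, as Adam notes, but a plain byte array carries the same data without involving characters at all, which is Christophe's point.

```python
raw = bytes(range(256))  # every possible octet value

# Adam's observation: ISO-8859-1 maps code points 0-255 one-to-one onto
# octets, so any byte sequence survives a round trip through a "string".
as_chars = raw.decode("iso-8859-1")
assert as_chars.encode("iso-8859-1") == raw

# Christophe's counterpoint: a raw octet array (the analogue of
# (array (unsigned-byte 8) (*))) holds the same data with no character
# interpretation needed at all.
as_octets = bytearray(raw)
assert bytes(as_octets) == raw
```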