On Fri, May 13, 2005 at 03:52:21PM +0100, Christophe Rhodes wrote:
> "R. Mattes" <rm@...> writes:
>
> > Hmm, so now we have a small but important semantic difference between
> > sb-md5:md5sum-sequence and md5:md5sum-sequence.
>
> Who is the maintainer of md5:md5sum-sequence, and do they know the
> implications of multiple string representations?
Kevin M. Rosenberg is the maintainer of the Debian package, and since the
ASDF link on CLiki points to his site I _assume_ it's him. The asdf file
mentions him as an author.
> > And, since (typep "string" 'sequence) is true it makes conditional
> > code rather more elaborate than necessary. What's the design
> > rationale behind this? After all strings _are_ sequences.
>
> Strings are sequences, but the md5 algorithm is defined over octets,
> not characters. As such, the name md5sum-sequence is a little
> misleading, but is a relic of the days when sbcl only knew about 256
> characters, and did not have any idea that there might be more than
> one possible encoding.
Yes, that name itself seems to come from Pierre Mai and his code originally
written for CMUCL.
> The sb-md5:md5sum-string entry point
> acknowledges the existence of multiple encodings for character data,
> and the sb-md5:md5sum-sequence entry point will shortly no longer
> accept general strings to prevent the user from shooting themselves in
> the foot.
Thanks, I think this is what I would have expected. I didn't shoot intentionally :-)
>
> Just to put this on a concrete footing, let me throw the question back
> to you: what, if md5sum-sequence works on strings, should
> (md5sum-sequence "€") return?
> [ where the character in the string is a euro sign ]
I'd expect one of two possible results:
- first: throw an error/condition (maybe only iff the code points of the string's characters
don't fit into 8 bits, taking advantage of the code point overlap of ASCII, Latin-1
and Unicode).
- second: create the hash of the internal representation of the string. After all,
the md5 algorithm is _always_ sensitive to the binary representation. Will there be
a possible case in SBCL where the binary representations of two strings equal under
string= will differ? If not, then I'd vote for this solution.
One drawback of this solution: the md5 sum of a string would not necessarily match
that of a file containing the same string.
- third (just to make a mathematician nervous): have md5sum-sequence accept a keyword
:encoding. This would actually be backward compatible and (with :default as the default
encoding) would work as solution 2.
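
To make the first and third options concrete, here is a minimal sketch, assuming a
Unicode-enabled SBCL with the sb-md5 contrib available. The names md5sum-text and
md5sum-octet-string are hypothetical, made up for illustration; they are not part of
sb-md5 or md5. Only sb-md5:md5sum-sequence and sb-ext:string-to-octets are existing
SBCL entry points.

```lisp
;; Sketch only: assumes a Unicode-enabled SBCL with the sb-md5 contrib.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-md5))

;; Hypothetical wrapper for the third option: the caller names the encoding,
;; the string is converted to octets, and only the octets are hashed.
(defun md5sum-text (string &key (encoding :default))
  "Return the MD5 digest of STRING's octets under ENCODING."
  (sb-md5:md5sum-sequence
   (sb-ext:string-to-octets string :external-format encoding)))

;; Hypothetical wrapper for the first option: refuse any string containing
;; a character whose code point does not fit into 8 bits.
(defun md5sum-octet-string (string)
  "Return the MD5 digest of STRING, or signal an error if a char code exceeds 255."
  (unless (every (lambda (c) (< (char-code c) 256)) string)
    (error "String contains characters outside the 8-bit range: ~S" string))
  (md5sum-text string :encoding :latin-1))
```

With these definitions, a one-character euro-sign string, (string (code-char #x20AC)),
hashes differently under :utf-8 and :latin-9 because the octets differ, while
md5sum-octet-string signals an error for it. A pure ASCII string such as "abc" gives
the same digest under :utf-8 and :latin-1.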
>
> > The "mysterious" thing is that md5sum-sequence does provide a result
> > (and doesn't throw a condition) - just a strange one.
>
> Probably it has been compiled with a low safety setting relative to
> speed, so that the usual checks are not performed.
Hmmm, beats me. This is on a "standard" Debian box. How would I check this?
>
> > Given the use of md5 in security relevant code i'm a bit worried.
>
> md5 in security-relevant code should probably be avoided, as it has
> "only" 128 bits of hash, which wouldn't be a problem were it not for
> the fact that it has been broken fairly comprehensively.
Still, it's used in several security-relevant spots. Luckily, in my case it "only"
totally messed up a knowledge base (where object IDs are generated with a hash of
important properties, all 7-bit URIs).
Thanks for your input.
ralfd
> Cheers,
>
> Christophe