From: Mark O'D. <mar...@gm...> - 2008-11-24 20:31:37
> Slight aside.
> What effect does this have on indexing?
> I'm thinking in particular about ordering lists of names that are
> encrypted - especially once you have large number of records.

Ah, Lester, no one will believe I didn't pay you to ask that question :-) It has an interesting answer.

If your encryption worked, then the encrypted data looks totally random, and so there is (usually) little point trying to index it.

Now, Alexandre suggested testing equality, and that is true. In the old, old days you could change your Unix password to "password", then have a look at the /etc/passwd file and see if anyone else had exactly the same entry as you did. If they did, then you knew their password had to be "password" as well. It worked fairly well on guest accounts and the like (old Fb/Ib, before Alex's changes, acted like that as well).

So as a budding young hacker, one thing you could do is build a database of SHA1/MD5 hashes of all the words in the dictionary and all the most common personal names, adding the usual extensions of '1', '2' onto the ends of the words, and all the 'E' to '3', 'o' to '0' conversions. Then you could check /etc/passwd (maybe you had to steal it) to see whether any of its hashes matched those in your database. If they did, you would know that user's password.

Something similar was used recently by customs to prove that someone had child porn photos on their laptop, even though the customs people didn't actually see the photos on the laptop: they had just MD5'd all the files on that person's disk, and showed they were the same hashes as known child porn photos.

And as it is with hashing, so it is with encryption: every encryption of the same object with the same keys gives you the same binary result, so you could tell object A was identical to object B, even though you didn't know what it was.
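To make the dictionary idea above concrete, here is a minimal sketch in Python. It is only an illustration, not anything from the thread: the word list, the user names and the "stolen" table are all made up, and it uses plain unsalted SHA-1 simply to show that identical inputs give identical, recognisable hashes.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Unsalted SHA-1 of a string, as a hex digest."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def variants(word: str):
    """Yield the word, its 'e'->'3' / 'o'->'0' form, and each with '1' or '2' appended."""
    leet = word.replace("e", "3").replace("E", "3").replace("o", "0")
    for base in {word, leet}:
        yield base
        yield base + "1"
        yield base + "2"


# Hypothetical, tiny "dictionary"; a real one would hold many thousands of words.
WORDS = ["password", "secret", "lester"]

# Precomputed lookup table: hash -> the plaintext that produced it.
lookup = {sha1_hex(v): v for w in WORDS for v in variants(w)}

# Hypothetical stolen table of user -> unsalted hash.
stolen = {
    "guest": sha1_hex("password"),
    "mark": sha1_hex("s3cr3t1"),
    "alex": sha1_hex("Xq9!vT"),  # not derived from the word list, so not found
}

# Any hash present in the lookup table reveals that user's password.
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)
```

Because the hashes are unsalted, the same comparison also shows when two users share a password, even if that password is not in the table.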
But then along came salt. With salting you add some random bytes of data at the front or end of an object, say 20 bytes. You want to know how many bytes, so you can remove them when you decrypt the object. But magically, since the 20 bytes will always be different, even if you encrypt or hash the same object many times, it will always give a completely different value, so now you have even hidden equality. This is used effectively in SSHA, invented at Netscape, which is the method used for storing LDAP passwords.

Now, you ask, what if we index first and then just encrypt the data record? Well, it is not really a good idea, since most of the details of the object will be identifiable by analyzing the index, so you've given away what you were trying to hide. However, encrypting the whole database would work - slow, but it would work.

And one final note :-) If you encrypt something, then try to compress it, that doesn't work very well either. Encryption tries to make the data look random, and compression, to be effective, tries to find some order in the now random-looking data. So if your encryption is any good, then it won't compress very much. So in practice, you should compress before you encrypt (but see note).

Anyway, thanks for the chance to regurgitate some odd little techie facts, I hope it was interesting :-)

Cheers - Mark

"the note": Theoretically speaking, if you compress and then encrypt, the resulting encrypted data is less random and is easier to break by brute force than if you had encrypted just the raw data without compression - but that's the theory, not what is done in practice.
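For anyone who wants to see the salting idea from earlier in this message in code, here is a rough Python sketch in the style of SSHA. It assumes the common layout of "{SSHA}" followed by base64 of the SHA-1 digest with the salt appended; the function names and the 20-byte salt length are my own choices for illustration, not taken from any particular LDAP server.

```python
import base64
import hashlib
import hmac
import os

SALT_LEN = 20    # "say 20 bytes", as in the text; an illustrative choice
DIGEST_LEN = 20  # SHA-1 digests are always 20 bytes
PREFIX = "{SSHA}"


def ssha_hash(password: str) -> str:
    """Hash a password with a fresh random salt, SSHA style."""
    salt = os.urandom(SALT_LEN)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    # The salt is stored next to the digest so it can be recovered later.
    return PREFIX + base64.b64encode(digest + salt).decode("ascii")


def ssha_verify(password: str, stored: str) -> bool:
    """Recompute the digest using the salt recovered from the stored value."""
    raw = base64.b64decode(stored[len(PREFIX):])
    digest, salt = raw[:DIGEST_LEN], raw[DIGEST_LEN:]
    expected = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(digest, expected)


first = ssha_hash("password")
second = ssha_hash("password")
print(first != second)                  # same password, different stored values
print(ssha_verify("password", first))   # still verifiable
print(ssha_verify("password", second))
```

The two stored values differ even though the password is the same, so comparing entries no longer reveals equality, yet each one can still be checked because its salt travels with it.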