Re: Rsync friendly encryption
From: Shachar S. <rsy...@sh...> - 2005-11-08 15:19:56
Saul Hazledine wrote:

> Hello Shachar,
> Competition is good and hopefully we can encourage each other to
> improve. I'm doing a course at University at the moment and am finding
> that I don't have much time to spend on murk. However, I did have a
> look at rsyncrypto when you first released it and have noticed the
> following:
>
> 1) You're using C++ and that will make development quicker for you.
> I'm currently stuck on a problem (building a pipeline which would
> allow different types of compression and encryption) which would be
> solved in C++ but is hard going in C. Your program also works on
> Windows which is a bonus as murk is Unix only.

While the infrastructure supports that, I have no plans to support any ciphers besides AES unless something bad happens to it.

> 2) I don't know if you've considered it (I had to be told by various
> people) but each time the encryption resets there are a few security
> issues with what we are both doing:

That and more. I'll send you my own cryptanalysis of rsyncrypto as soon as I find it...

> i) The compression leaves a known header which should be removed
> somehow.

Well, that's only because you compress each block individually. Rsyncrypto compresses the whole file as one unit prior to encrypting, which means that the compression block resets are not in sync with the encryption block resets.

> ii) If the same key and IV are used when encryption resets, an
> attacker can use this to get information about the plain text file
> (see the comments section of murk on Freshmeat). I now change the IV
> each time but am not sure the method I use is secure or not.

I've been looking at this problem ever since I first started working on rsyncrypto. I have done everything I can to reduce the problem, but I'm fairly confident now that I do not intend to solve it completely.

There are two approaches I can see to solving this problem. The first is to choose an IV that is based on the unencrypted data in some way.
This approach fails once you realize that it only means more blocks have to repeat before the ciphertext becomes the same; the core problem is not solved.

The second approach is to integrate the IV sequence number into the IV selection function. This approach provides all the security in the world, but fails the simple practicality test: if the change made to the file between encryptions caused its length to change enough that the number of block resets before the end of the change is different, the whole rest of the file will now encrypt to a different byte stream and the rsync efficiency is out of the window.

There was also a third approach, involving selecting truly random IVs for each block. It had serious practical problems for rsyncrypto's uses, as it would both require you to save a lot of state about the file to allow re-encryption, and have problems matching the IVs to the same places they were used before.

What I did with rsyncrypto:

1. Compress the entire file with rsync friendly compression. This means that repetitions in the "plain text" (but compressed) file are less likely, as the compression would have removed most of them. This has the added benefit of greatly increasing the compression ratio.
2. Do not have per-block encryption or compression headers. The parameters for the block reset decision function are coded in the file's header, and the decryption simply uses the same decision function to know when to reset the IV. Another added bonus is lower overhead.

I am considering other changes as well, but more on that later.

> iii) Even if the encryption is perfect, some statistical analysis can
> be run on the encrypted files to guess the type of data in the file
> based on how often the encryption is reset and how many encrypted
> blocks are the same.

In a typical file encrypted with rsyncrypto:

1. It is impossible to know where the reset points are without having the key.
2. There should be no (or very few) repetitions.
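
To make the idea of a content-driven reset decision concrete, here is a minimal Python sketch. It is not rsyncrypto's actual decision function: the rolling-sum rule, the window size, the modulus, and the names `reset_points` and `tail_points` are all assumptions chosen for illustration. It only computes where resets would fire (no encryption is performed) and shows that, after a few bytes are inserted near the start of a file, the reset points past the change reappear at the same places relative to the data.

```python
import random

WINDOW = 16    # hypothetical: bytes covered by the rolling sum
MODULUS = 64   # hypothetical: on random data, roughly one reset per 64 bytes


def reset_points(data: bytes, window: int = WINDOW, modulus: int = MODULUS) -> list:
    """Return the offsets at which the IV would be reset.

    A reset fires after byte i when the sum of the last `window` bytes
    is divisible by `modulus`, so the decision depends only on nearby data.
    """
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # drop the byte that left the window
        if i + 1 >= window and rolling % modulus == 0:
            points.append(i + 1)
    return points


def tail_points(data: bytes, change_end: int, shift: int = 0) -> set:
    """Reset points whose whole window lies after `change_end`, minus `shift`."""
    return {p - shift for p in reset_points(data) if p - WINDOW >= change_end}


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.randrange(256) for _ in range(4096))
    inserted = b"XYZ"
    modified = original[:100] + inserted + original[100:]

    before = tail_points(original, 100)
    after = tail_points(modified, 100 + len(inserted), shift=len(inserted))
    print(before == after)   # True: resets past the change line up again
```

Because the decision looks only at a fixed window of preceding bytes, an edit disturbs the reset points near it and nothing further along, which is what keeps the rest of the encrypted file byte-identical for rsync. Deriving the IV from a running block counter would break this property as soon as an edit added or removed a reset point.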
This is not to say that it is impossible to know where a block starts. Either the beginning of the encryption block or its end will leak if you can capture an encrypted file both before and after a change (not both, though, which is curious but fortunate, due to the fact that gzip also has blocks, and also resets them). Still, the most important thing you can say about the point of the block reset is that the decision function fired at that point. I'm *fairly* certain that the decision function leaks very little information about the actual data (chosen plaintext notwithstanding), and the fact that the information leaked is about the compressed plaintext, rather than the plain plaintext, adds to the difficulty of performing statistical analysis.

> 3) I don't know if you have changed this but when I first saw
> rsyncrypto you needed to have a patched version of gzip installed. I
> believe the licence allows you to carry your own version of libgz
> (this is something an interesting freshmeat project called zync does,
> zsync is also worth looking at). Users would find this much more
> convenient and you may get a performance increase.

I know, and it's on my "todo" list. It is not as high priority as some of the other stuff I still have to do, so it will probably take a while. I provide the patched gzip for Windows along with the zip file, Debian already carries the proper gzip, and I'm not that interested in what other platforms will do :-). I do supply the gzip patch in rsyncrypto's "contrib" folder, to make other people's lives SOMEWHAT easier.

> Apologies if any of my comments are out of date -- there have been
> lots of releases but I haven't had a chance to try them.

Do try the latest one. It encrypts the file names, which I think is a major plus.

>
> Also, have you thought of integrating with GMail or GDrive somehow so
> that you could use all that useful space for backups? This seemed it
> could be a cool thing to do -- encrypt and backup to the email system.
> I think the principles are the same but a different transport (ie not
> rsync) would have to be written.

I think that if that's what you want to do, you would be better off with an encryption method that does not suffer from rsyncrypto's (and murk's) deficiencies. It really only makes sense to live with them if you are going to be using rsync on the resulting files.

On another note, if you check out my sig you will notice that I sell online backup services. It makes no sense to help people use the competition.... :-)

> Best of luck, and please stay in touch.
> Saul

One general note. It seems that rsyncrypto is more tuned towards bulk operations. You can tell it to encrypt an entire directory, or even a list of directories and files. It will not use the same symmetric key OR IV for two files in an encryption. I generally get the feeling that you would want to invest more time in finding out what the use scenario for murk is, and how to make it best fit that scenario.

Also, opening up a mailing list helps, I think. It's not that the patches for rsyncrypto were flowing in (I got none), but people using the product, and the occasional comment, are pretty priceless.

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html