Re: Rsync friendly encryption

Saul Hazledine wrote:

> Hello Shachar,
>  Competition is good and hopefully we can encourage each other to
> improve. I'm doing a course at University at the moment and am finding
> that I don't have much time to spend on murk. However, I did have a
> look at rsyncrypto when you first released it and have noticed the
> following:
>
> 1) You're using C++ and that will make development quicker for you.
> I'm currently stuck on a problem (building a pipeline which would
> allow different types of compression and encryption) which would be 
> solved in C++ but is hard going in C. Your program also works on
> Windows which is a bonus as murk is Unix only.

While the infrastructure supports that, I have no plans to support any
ciphers besides AES unless something bad happens to it.

> 2) I don't know if you've considered it (I had to be told by various
> people) but each time the encryption resets there are a few security
> issues with what we are both doing:

That and more. I'll send you my own cryptanalysis of rsyncrypto as soon
as I find it...

>   i) The compression leaves a known header which should be removed
> somehow.

Well, that's only because you compress each block individually.
Rsyncrypto compresses the whole file as one unit prior to encrypting,
which means that the compression block resets are not in sync with the
encryption block resets.

>   ii) If the same key and IV are used when encryption resets , an
> attacker can use this to  get information about the plain text file
> (see the comments section of murk on Freshmeat).  I now change the IV
> each time but am not sure the method I use is secure or not.

I've been looking at this problem ever since I first started working on
rsyncrypto. I have done everything I can to reduce the problem, but I'm
fairly confident now that I do not intend to solve it completely.

There are two approaches I can see to solving this problem. The first is
to choose an IV that is based on the plaintext in some way. This
approach fails: all it achieves is that more blocks have to repeat
before the ciphertext becomes the same, and the core problem is not
solved.
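
To make the first approach concrete, here is a minimal sketch. It is
not rsyncrypto's code. The names prf, cbc_encrypt and iv_from_context
are hypothetical, and a truncated HMAC stands in for the AES block
encryption only so that the example runs with the standard library (it
cannot decrypt). It shows that a fixed IV makes a repeated segment
encrypt identically, and that an IV derived from the preceding
plaintext merely requires that preceding plaintext to repeat as well.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes, as in AES


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for a block cipher's encrypt direction (assumption: NOT real AES)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC chaining: each block is XORed with the previous ciphertext block."""
    assert len(plaintext) % BLOCK == 0
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev))
        prev = prf(key, mixed)
        out.append(prev)
    return b"".join(out)


def iv_from_context(key: bytes, context: bytes) -> bytes:
    """Hypothetical IV choice: derived from the plaintext just before the reset."""
    return prf(key, hashlib.sha256(context).digest())


key = b"example key"
segment = b"A" * 32          # the same plaintext segment appears twice
fixed_iv = bytes(BLOCK)

# Same key and same IV at every reset: the repetition is visible.
same_with_fixed_iv = (
    cbc_encrypt(key, fixed_iv, segment) == cbc_encrypt(key, fixed_iv, segment)
)

# IV based on preceding plaintext: different context hides the repetition...
c1 = cbc_encrypt(key, iv_from_context(key, b"context one"), segment)
c2 = cbc_encrypt(key, iv_from_context(key, b"context two"), segment)
# ...but if the context repeats too, the ciphertext repeats again.
c3 = cbc_encrypt(key, iv_from_context(key, b"context one"), segment)

print(same_with_fixed_iv, c1 == c2, c1 == c3)
```

The attacker simply needs a longer repetition, which is why the core
problem remains.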

The second approach is to integrate the IV sequence number into the IV
selection function. This approach provides all the security in the
world, but fails the simple practicality test - if the change done to
the file between encryptions caused its length to change enough that
the number of block resets before the end of the change is different,
the whole end of the file will now encrypt to a different byte stream
and the rsync efficiency is out the window.
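
A small sketch of why this happens, under stated assumptions: the
function iv_for_reset below is hypothetical (a SHA-256 hash of the key
and the reset's sequence number, truncated to 16 bytes), and segments
are represented by placeholder labels rather than real file data.

```python
import hashlib


def iv_for_reset(key: bytes, seq: int) -> bytes:
    """Hypothetical IV selection that depends on the reset's sequence number."""
    return hashlib.sha256(key + seq.to_bytes(8, "big")).digest()[:16]


def ivs_by_segment(key: bytes, segments: list) -> dict:
    """Map each segment to the IV it would be encrypted with."""
    return {seg: iv_for_reset(key, i) for i, seg in enumerate(segments)}


key = b"example key"
old = [b"seg-A", b"seg-B", b"seg-C", b"seg-D"]
# An edit near the start adds one reset point: seg-A becomes two segments,
# while seg-B, seg-C and seg-D are byte-for-byte unchanged.
new = [b"seg-A1", b"seg-A2", b"seg-B", b"seg-C", b"seg-D"]

old_ivs = ivs_by_segment(key, old)
new_ivs = ivs_by_segment(key, new)

unchanged = [s for s in old if s in new_ivs]
reusable = [s for s in unchanged if old_ivs[s] == new_ivs[s]]

print(len(unchanged), len(reusable))  # 3 unchanged segments, 0 keep their IV
```

Every unchanged segment after the edit gets a new sequence number and
therefore a new IV, so none of its ciphertext matches what rsync already
has on the other side.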

There was also a third approach, involving selecting really random IVs
for each block. It had serious practical problems for rsyncrypto's uses,
as it both requires you to save a lot of state about the file to allow
re-encryption, and has problems in matching the IVs to the same places
they were used before.

What I did with rsyncrypto:
1. Compress the entire file with rsync friendly compression. This means
that repetitions in the "plain text" (but compressed) file are less
likely, as the compression would have removed most of them. This has the
added benefit of greatly increasing the compression ratio.
2. Do not have per block encryption or compression headers. The
parameters for the block reset decision function are coded in the file's
header, and the decryption simply uses the same decision function to
know when to reset the IV. Another added bonus is lower overhead.
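
The idea of a decision function in point 2 can be sketched as follows.
This mail does not spell out rsyncrypto's actual function, so the
rolling-sum trigger below, along with the names reset_points, WINDOW and
MODULUS and their values, is an assumption made for illustration only.
The point it shows is that the reset decision depends only on the last
few bytes, so after a local edit the reset points fall back into step.

```python
import random

WINDOW = 16    # hypothetical: number of trailing bytes the decision looks at
MODULUS = 64   # hypothetical: a reset fires roughly once per MODULUS bytes


def reset_points(data: bytes, window: int = WINDOW, modulus: int = MODULUS) -> list:
    """Return the offsets after which the IV would be reset."""
    points = []
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        if i + 1 >= window and rolling % modulus == 0:
            points.append(i + 1)
    return points


rng = random.Random(1)
old = bytes(rng.randrange(256) for _ in range(4000))
insert_at, inserted = 100, b"xyz"
new = old[:insert_at] + inserted + old[insert_at:]

old_pts = reset_points(old)
new_pts = reset_points(new)

# Once the window has moved past the edit, every old reset point reappears,
# shifted by the length of the inserted bytes.
shift = len(inserted)
tail_old = [q + shift for q in old_pts if q >= insert_at + WINDOW]
tail_new = [q for q in new_pts if q >= insert_at + shift + WINDOW]
print(tail_old == tail_new)  # True
```

Because both encryption and decryption can recompute these points from
the data and the parameters in the file header, no per block header is
needed.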

I am considering other changes as well, but more on that later.

>  iii) Even if the encryption is perfect, some statistical analysis can
> be run on the encrypted files to guess the type of data in the file
> based on how often the encryption is reset and how many encrypted
> blocks are the same.

In a typical file encrypted with rsyncrypto:
1. It is impossible to know where the reset points are without having
the key.
2. There should be no (or very few) repetitions.

This is not to say that it is impossible to know where a block starts.
Either the beginning of the encryption block or its end will leak if you
can capture an encrypted file both before and after a change (not both,
though, which is curious but fortunate, and is due to the fact that gzip
also has blocks, and also resets them). Still, the most important thing
you can say about the point of the block reset is that the decision
function fired at that point. I'm *fairly* certain that the decision
function leaks very little information about the actual data (chosen
plaintext notwithstanding), and the fact that the leaked information is
about the compressed plaintext, rather than the plain plaintext, adds to
the difficulty of performing statistical analysis.

> 3) I don't know if you have changed this but when I first saw
> rsyncrypto you needed to have a patched version of gzip installed. I
> believe the licence allows you to carry your own version of libgz
> (this is something an interesting freshmeat project called zync does,
> zsync is also worth looking at). Users would find this much more
> convenient and you may get a performance increase.

I know, and it's on my "todo" list. It is not as high priority as some
of the other stuff I still have to do, so it will probably take a while.
I provide the patched gzip for Windows along with the zip file, Debian
already carries the proper gzip, and I'm not that interested in what
other platforms will do :-). I do supply the gzip patch in rsyncrypto's
"contrib" folder, to make other people's lives SOMEWHAT easier.

> Apologies if any of my comments are out of date -- there have been
> lots of releases but I haven't had a chance to try them.

Do try the latest one. It encrypts the file names, which I think is a
major plus.

>
> Also, have you thought of integrating with GMail or GDrive somehow so
> that you could use all that useful space for backups? This seemed it
> could be a cool thing to do -- encrypt and backup to the email system.
> I think the principles are the same but a different transport (ie not
> rsync) would have to be written.

I think that if that's what you want to do, you should use an encryption
method that does not suffer from rsyncrypto's (and murk's) deficiencies.
It really only makes sense to live with them if you are going to be
using rsync on the resulting files.

In another aspect, if you check out my sig you will notice that I sell
online backup services. It makes no sense to help people use the
competition.... :-)

> Best of luck, and please stay in touch.
> Saul

One general note. It seems that rsyncrypto is more tuned towards bulk
operations. You can tell it to encrypt an entire directory, or even a
list of directories and files. It will not use the same symmetric key OR
IV for two files in an encryption. I generally get the feeling that you
would want to invest more time in finding out what the use scenario for
murk is, and how to make it best fit that scenario. Also, opening up a
mailing list helps, I think. It's not that the patches for rsyncrypto
were flowing in (I got none), but people using the product, and the
occasional comment, are pretty priceless.

-- 
Shachar Shemesh
Lingnu Open Source Consulting ltd.
Have you backed up today's work? http://www.lingnu.com/backup.html