Thread: command line syntax
From: Tom M. <tme...@vl...> - 2005-08-23 05:14:15
The command line is specified as:

  rsyncrypto [options] srcfile dstfile keyfile key

which seems a little bit awkward, as the 3rd argument is sort of optional, but the 4th is required. It also interleaves input and output files. I'd suggest:

  rsyncrypto [options] pubkeyfile srcfile dstfile

with encryption examples looking like:

  rsyncrypto pubkeyfile srcfile dstfile
  rsyncrypto --key=symkeyfile pubkeyfile srcfile dstfile
  rsyncrypto -r pubkeyfile srcdir dstdir
  rsyncrypto -r --key=symkeydir pubkeyfile srcdir dstdir
  rsyncrypto --filelist pubkeyfile listfile dstdir

and decryption examples looking like:

  rsyncrypto -d prvkeyfile|symkeyfile srcfile dstfile
  rsyncrypto -d -r prvkeyfile|symkeydir srcdir dstdir

Because only one of the two key files is needed, things get a bit messy. In the second case, it's easy enough to distinguish a file from a directory, but can you distinguish an rsyncrypto generated key from a public key? (Should be doable. One is binary.)

Another approach might be to use named parameters for everything:

  rsyncrypto --pk=pubkeyfile --if=srcfile --of=dstfile

which is more cumbersome to type, but makes parameters position independent and makes it cleaner to handle things like decryption and the --filelist parameter:

  rsyncrypto -d --pk=pubkeyfile --filelist=listfile --of=dstdir

-Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: https://www.linkedin.com/e/fps/3452158/
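The named-parameter proposal in the message above can be sketched in a few lines of Python. This is only an illustration of the idea, not rsyncrypto's actual interface: the option names (--pk, --if, --of, --filelist, -d) are taken from the proposal, while the program name, the `parse` helper and the rule that exactly one of --if and --filelist must be given are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for the *proposed* syntax. rsyncrypto itself
    # takes four positional arguments (src, dst, keyfile, key).
    parser = argparse.ArgumentParser(prog="rsyncrypto-named-sketch")
    parser.add_argument("-d", dest="decrypt", action="store_true",
                        help="decrypt instead of encrypt")
    parser.add_argument("--pk", required=True, help="RSA key file")
    # "if" is a Python keyword, so store the value under another name.
    parser.add_argument("--if", dest="src", help="source file")
    parser.add_argument("--filelist", help="file listing the sources")
    parser.add_argument("--of", dest="dst", required=True,
                        help="destination file or directory")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: a single source and a file list are mutually exclusive.
    if (args.src is None) == (args.filelist is None):
        parser.error("give exactly one of --if or --filelist")
    return args


if __name__ == "__main__":
    print(parse(["-d", "--pk=pubkeyfile", "--filelist=listfile", "--of=dstdir"]))
```

Because every value is named, the order on the command line no longer matters, which is the property the proposal is after.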
From: Shachar S. <rsy...@sh...> - 2005-08-23 07:21:02
Ok, Tom. You seem to have spent some time trying to understand rsyncrypto, but spread your conclusions among three emails :-). I'll try to answer in a condensed manner.

Tom Metro wrote:
> The command line is specified as:
>
>   rsyncrypto [options] srcfile dstfile keyfile key
>
> which seems a little bit awkward, as the 3rd argument is sort of
> optional, but the 4th is required.

All arguments are required. Src and dst are required for obvious reasons. I'm afraid you slightly misunderstood the remaining two arguments, which is why you think only one of them is required at every given point.

Let's begin with the fourth argument - the RSA key (also known as the asymmetric key). At some point you called one of the keys a "symmetric RSA key". There is no such thing. RSA is always asymmetric, AES is always symmetric. That's just how the algorithms are built.

Strike that. Let's first talk about encryption, and then decryption.

The fourth argument has one function when encrypting: it is used to embed an RSA encrypted version of the symmetric key into the encrypted file. Since RSA encryption requires just the public key, but the public key can be built from the private key, when encrypting either the private or the public RSA key can be given as the fourth argument.

The third argument, which is where the symmetric AES key is stored, needs to be a file name when encrypting. If this file exists, then the key, encryption parameters and IV are taken from it. This is done so that repeated encryption of the same file will be possible with minimal changes to the file. It will do our rsync friendliness no good to re-encrypt a file with a different key.

If the third argument is a name of a non-existing file, then rsyncrypto assumes that this is the first time such a file is encrypted, and will create the symmetric key file. This is so that future encryption will be able to use the same encryption parameters, and thus create a file that is similar.
Now let's talk about decryption. There are two possible decryption modes. One is "warm restore", where you still have the symmetric key file for the file you are trying to decrypt. If that's the case, you still need to provide the (RSA asymmetric) public key. This is mostly for the technical reason that rsyncrypto needs to know how big the encrypted symmetric key is in the file, and this depends on the size of the public key. We could save it in the unencrypted key, but as you should never ever lose the private key, much less the public key, this did not seem like a big enough trouble to justify the extra work. Also, this gives rsyncrypto a consistent command line across all invocations.

So, for warm restore, the public or private RSA key must be given as the fourth argument, and an existing symmetric key file must be given as the third argument.

The second mode is "cold restore". In the case of a total loss of all files, it is assumed you will still have the private key. In the cold restore scenario, the third argument will point to a non-existing file. This file will be created during the restore process, so that future encryption and warm decryption can take place. The fourth argument in such a case MUST be the private key, as we will need to open the RSA encryption of the symmetric key inside the encrypted file.

I hope this sheds more light on the command line uses.

> It also interleaves input and output files. I'd suggest:
>
>   rsyncrypto [options] pubkeyfile srcfile dstfile

Sorry, interleaving input and output files is pretty much unavoidable. Sometimes the symmetric key file is input, sometimes it's output. Putting the symmetric key between the source and the destination makes no sense. Also, as rsyncrypto never accepts a file list (unless via file or stdin), I allowed myself not to put the file as the last argument, deviating from the standard unix argument order.

In any case, notice how the above example is wrong, as you need all four arguments for all invocation modes.
> with encryption examples looking like:
>   rsyncrypto pubkeyfile srcfile dstfile
>   rsyncrypto --key=symkeyfile pubkeyfile srcfile dstfile
>   rsyncrypto -r pubkeyfile srcdir dstdir
>   rsyncrypto -r --key=symkeydir pubkeyfile srcdir dstdir
>   rsyncrypto --filelist pubkeyfile listfile dstdir

I don't think it's right to have an optional symmetric key. Not storing a symmetric key would mean that you have to provide both a private key (rather than a public key today) and a previously encrypted file (the whole size of the file, rather than just 60 or so bytes) during encryption. Right now, you have very little state to store between encryptions, and you don't need the private key anywhere near the encrypting machine unless a total disaster happens.

> Because only one of the two key files is needed, things get a bit
> messy. In the second case, it's easy enough to distinguish a file from
> a directory, but can you distinguish an rsyncrypto generated key from
> a public key? (Should be doable. One is binary.)

Like I said, both keys are needed always.

Moving on to your other email:

> After reading the "Quick use guide" here:
> http://sourceforge.net/docman/display_doc.php?docid=26727&group_id=129038
>
> I grabbed a certificate (cert) and private key (key) created with
> OpenSSL on my Debian box and attempted to run rsyncrypto 0.14 on a
> Windows NT system:
>
> % rsyncrypto.exe src/file cryp/file key cert
> Invalid version magic:

That's because you used an asymmetric key where a symmetric key was needed.

> Shachar Shemesh writes:
> > AES keys (a.k.a. symmetric keys) are generated automatically by
> > rsyncrypto per encrypted files. This is what is stored in the "key"
> > file name you specify as the third parameter.
>
> Ah, so rsyncrypto creates the key file, rather than reading it. The
> guide, when it says stuff like:
>
> If [the key file] already exists, the keys in it will be used for
> this encryption.
>
> or
> It can either exist, in which case the next parameter can be an x509
> certificate with a public key, or not exist, in which case the next
> parameter needs to be the private key corresponding to the public
> key used during encryption. In the latter case, the key file will be
> created after decryption.
>
> makes it sound like supplying a key is required for encryption.

Which key? Supplying an RSA key is required for encryption. Supplying a symmetric, AES key is not required. One will be created if you don't supply it. Maybe the guide needs to be more elaborate on the difference between the two keys, but it is otherwise correct.

> After this was clarified, it made a lot more sense why the key
> parameter is a directory when you specify the -r switch.
>
>
> > The rsyncrypto manual points you to the req(1) and x509(1) manual
> > pages of openssl.
>
> Manual? Ah, I see it now in the source package. Apparently it didn't
> make it into the Win32 package.

It's not rsyncrypto's job to provide the man pages for openssl. Both req(1) and x509(1) are openssl commands.

> The man page says:
>
> rsyncrypto will encrypt files using a symmetric block
> cipher (AES). Each file is encrypted using a unique key. The
> file key is stored in two locations. One is the "key" file, and
> the second is inside the encrypted file itself. The second copy
> is encrypted using a RSA public key, which can be shared for
> all encrypted files.
>
> which is a pretty clear description. However, saying "each file is
> encrypted using a unique key," is still vague.

Unique AES key, as mentioned in the previous sentence.

> Where does the key come from? Apparently it is generated by rsyncrypto.
> That should be specified. Something like..."rsyncrypto generates a
> 128-bit (see -b switch) symmetric RSA

An RSA key cannot be symmetric, and in any case, rsyncrypto cannot generate the RSA key for you. It's your job to create it. Rsyncrypto only generates the AES keys.
> key and encrypts the file using symmetric block cipher (AES). It saves
> a copy of this key both in the encrypted file and optionally at the
> location specified by 'symkeyfile'. The former is encrypted using the
> 'pubkeyfile' (an X509 certificate), which is shared among all files,
> if processing multiple files."

Aside from the above comment, and the fact that the symmetric key is not optional, feel free to send a patch against the documentation :-)

> Taking a step back for a moment, what's the benefit of having the
> individual keys stored in files? I'm sure there is some performance
> benefit when decrypting, and when encrypting
> but for the typical user it seems like it would introduce an
> unnecessary complication and more files that need to be managed.

See the beginning of this email for an explanation. In short - it allows us to not store the RSA private key, which is sensitive information, on the machine that does the encryption.

> Have you considered having generation of the key files being optional
> and only enabled if a switch is specified?

I have now, and rejected the notion :-). Feel free to try and convince me based on the newly gained knowledge.

> > Off the top of my head, the command line to generate would
> > probably be something like:
> >
> >   openssl req -new -nodes -x509 -out backup.crt -keyout backup.key
>
> I think that worked. It got rid of the 'Invalid version magic' error.
> Be good to add this to the man page. Actually, there should be a
> mini-HOWTO in the manual that explains how to generate the
> public/private keys, and how they relate to rsyncrypto. Much of the
> raw info for that is in the email I'm quoting.

It would be a great service to the project if you wrote one, as I'm out of time for such things right now.

> > The *.crt file is the certificate (public key) file. rsyncrypto
> > ignores just about all fields of the resulting certificate except
> > the actual key.
>
> Then I'm not sure why rsyncrypto didn't like my original certificate.
> It looked essentially the same, but had a bunch of headers before the
> 'BEGIN CERTIFICATE' block. (The original certificate was generated
> without the -nodes and -x509 options.)

There is a bug in rsyncrypto, which means it cannot handle encrypted certificates. The "nodes" option means that the private key is not encrypted. This may be your problem (only applies to the key, not the certificate). Then again, the problem is more likely to do with you passing a private RSA key where an AES key was expected.

> So after putting the new certificate (cert) and private key in place,
> I eventually got it to work in the recursive mode, after creating
> empty destination directories for the encrypted files (cbin) and the
> symmetric keys (keys):
>
> % rsyncrypto.exe -r bin\ cbin\ keys\ cert
>
> and decrypted the files using:
>
> % rsyncrypto.exe -d -r cbin\ ubin\ keys\ cert
>
>
> Along the way I erroneously used -c instead of -r and again got some
> less than useful error messages:
>
> % rsyncrypto.exe -c bin cbin keys cert
> file open failed: Input/output error
>
> % rsyncrypto.exe -c bin\ cbin\ keys\ cert
> file open failed: No such process

I'm not sure why that is.

> Also, after encrypting an individual file like so:
>
> % rsyncrypto.exe bin/rsync.exe cbin/rsync.exe rsync.exe.key cert
>
> when attempting to decrypt it using:
>
> % rsyncrypto.exe -d cbin/rsync.exe test.exe rsync.exe.key
> or
> % rsyncrypto.exe -d cbin/rsync.exe test.exe cert
> or
> % rsyncrypto.exe -d cbin/rsync.exe test.exe private
>
> rsyncrypto crashes.

The reason for the crash is the same reason I gave in reply to your first email. If you don't pass all mandatory arguments (the four file names) libargtable segfaults.

> The second variation should have failed to decrypt, but the other two
> should have worked - no?

No. See above.

> If not, how do you decrypt a file when you've lost the symkey?
> Apparently like this:
>
> % rsyncrypto.exe -d cbin/rsync.exe test.exe foo private
>
> where 'foo' is a non-existent file, which gets created. Technically
> the "Quick use guide" accurately describes how this works, but it's
> far from intuitive.

How else do you expect to recover the lost symmetric key?

> Regardless, none of them should have crashed.

See previous email. I'm not yet sure whether it's a bug in argtable or rsyncrypto's use of argtable, but as it's non-crucial, I gave it a lower priority.

> But:
>
> % rsyncrypto.exe -d cbin/rsync.exe test.exe rsync.exe.key private
> or
> % rsyncrypto.exe -d cbin/rsync.exe test.exe rsync.exe.key cert
>
> works, even though the second one doesn't make much sense.

If you have the symmetric key, you don't need the private key in order to recover the data (warm restore). Try deleting rsync.exe.key, and then try the second form again....

> I don't have a convenient build environment on Windows, but if I get
> the build problems worked out on my Debian box I'll see about adding
> verbosity to the error messages and look into the above crashes.

It seems you already have the build env for Debian. As for the crash - it is somewhere in the argtable invocation when not enough mandatory arguments are supplied.

Also, if you are going to be hacking the code, please try to work off CVS. The "C" in "CVS" does, after all, stand for "concurrent", and I'm doing rsyncrypto hacking too these days.

> -Tom

Like I said - amendments and additions to the documentation will probably be even more welcome than code hacks. If you do send patches, please use "cvs diff -u" format.

Shachar
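The four cases described in the message above (first encryption, repeated encryption, warm restore, cold restore) follow from two facts: whether the symmetric key file named by the third argument already exists, and whether the RSA key given as the fourth argument is the private one. The Python sketch below only restates that decision table; it is not taken from the rsyncrypto source, and the `Mode` and `classify` names are hypothetical. A caller would typically obtain `keyfile_exists` with `os.path.exists`.

```python
from enum import Enum


class Mode(Enum):
    FIRST_ENCRYPT = "encrypt and create a new symmetric key file"
    REPEAT_ENCRYPT = "encrypt reusing the existing symmetric key file"
    WARM_RESTORE = "decrypt using the existing symmetric key file"
    COLD_RESTORE = "decrypt via the RSA private key, recreating the key file"


def classify(decrypt: bool, keyfile_exists: bool, rsa_is_private: bool) -> Mode:
    """Map an invocation to the mode described in the thread (sketch only)."""
    if not decrypt:
        # Encryption needs only the public half of the RSA key, so
        # rsa_is_private does not matter here.
        return Mode.REPEAT_ENCRYPT if keyfile_exists else Mode.FIRST_ENCRYPT
    if keyfile_exists:
        # Warm restore: either RSA key will do, since it is only needed
        # to know the size of the embedded encrypted symmetric key.
        return Mode.WARM_RESTORE
    if not rsa_is_private:
        raise ValueError("cold restore needs the RSA private key")
    return Mode.COLD_RESTORE


if __name__ == "__main__":
    print(classify(decrypt=True, keyfile_exists=False, rsa_is_private=True))
```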
From: Tom M. <tme...@vl...> - 2005-08-23 20:14:46
Shachar Shemesh wrote:
> Not storing a symmetric key would mean that you have to provide both a
> private key (rather than a public key today) and a previously encrypted
> file (the whole size of the file, rather than just 60 or so bytes)
> during encryption. Right now, you have very little state to store
> between encryptions...

Ah, this is a critical bit of rsyncrypto "philosophy" that I was missing. I ran across mention of this concept - needing only a small amount of state information - (though I don't see it in the guide, the man page, or the email I was quoting from) but it didn't add up at the time.

So this explains why key files exist as separate files. The idea is that you encrypt a batch of files, rsync the encrypted files to some remote location, then you have the option of blowing away the encrypted files - treating them essentially as temporary files. Subsequent encryption runs then make use of your key files to produce new encrypted files that are minimally changed from the previous encryption cycle.

The reason why this logic wasn't apparent to me initially is that in my mind I was planning on keeping the encrypted files locally, so the keys are simply redundant.

If we presume that change is infrequent among the file set (reasonably safe, otherwise rsync wouldn't be an advantage) and that storage is probably a less limited resource than CPU in many cases, I would expect keeping around a copy of the encrypted files to be a win, and thus a popular approach among rsyncrypto users.

You must have been thinking along these lines as well, as you have a -c switch that skips encrypting files that haven't changed. (Have you considered optionally using MD5/SHA hashes instead of relying on timestamps?)

(In an ideal scenario, you'd have encryption hardware, and you could simply patch rsync to perform its differencing against an encrypted version of the source file in memory.
That plus a bit of meta data stored in the file system to make it easier to determine which files have changed, ought to do it. But back to reality...)

> The third argument...the symmetric AES key...[is used] so that repeated
> encryption of the same file will be possible with minimal changes to the
> file. It will do our rsync friendliness no good to re-encrypt a file
> with a different key.

Right, another important point. As I was playing around with rsyncrypto repeatedly encrypting the same file I noticed a different key was generated each time. That made me wonder how changes could be minimized in the crypt text, but before I followed through on that thought, I went on to other things. Of course the answer, as you point out, is that if the key already exists, it's used for all subsequent encryptions of that file, thus keeping the crypt text deterministic.

This might also be thought of as a justification for having separate key files, but the key files can be extracted from the encrypted files in cases where the user is keeping them on hand. Although, that then means...

> In the case of a total loss of all files, it is assumed you will still
> have the private key. ... The fourth argument in such a case MUST be the
> private key, as we will need to open the RSA encryption of the symmetric
> key inside the encrypted file.

...you need to have the private key on hand, even when encrypting, just so you can extract the symmetric key. This leads to your point:

> ...it allows us to not store the RSA private key, which is sensitive
> information, on the machine that does the encryption.

Hmmm...is keeping the private key "private" really an advantage, given the way rsyncrypto works?

Conceptually, this sounds appealing. I could see, for example, not wanting to have my private key be sitting on a shared web server that I might be rsyncing encrypted files from.
Let's consider some scenarios:

Let's assume on the machine doing the encryption, I only have the public key, and that machine gets compromised. As an attacker, what are my options? I can

1. use the symmetric keys, which rsyncrypto currently encourages me to keep on hand, and decrypt the files, or
2. look at the plain text of the original files, because they exist on the machine doing the encryption.

If some intermediary machine is compromised, say the machine where the encrypted files are archived, having the private key on the encryption machine hasn't altered the situation.

Sure, you could come up with a scenario where the encryption machine is like an appliance and only momentarily stores the plain text and key files before discarding them, but that doesn't seem to describe typical usage.

And no doubt that the private key is more valuable, in that it acts like a master key to unlock all your files. But practically speaking I don't think keeping the private key any more secure than the original plain text is buying you added security, and the "master key" nature can be mitigated by using different private keys for each collection of files.

A typical PKI scenario - where one party might be less trusted and can only encrypt using the public key - doesn't really fit rsyncrypto, as the party doing the encryption only needs the symmetric keys to be able to extract the plain text, and rsyncrypto gives them the keys (which they are then free to distribute to third parties). Of course the encrypting party also has access to the plain text. PKI is more about authentication than data protection.

> ...the public key can be built from the private key, when encrypting
> either the private or the public RSA key can be given as a fourth argument.

Even better.
So that means a user who is looking for a practical (rather than maximum) security environment may choose to ignore the symmetric keys (if there was an option to turn them off) and use a single private key for all operations. I'd start with this as the basic model for rsyncrypto, while continuing to support the more complex approaches for those who desire the greater security.

> I'm afraid you slightly misunderstood the remaining two
> arguments, which is why you think only one of them is required at every
> given point.

I'm not sure it's a misunderstanding so much as a different perspective. If what I've discussed above is correct, then it does seem possible to eliminate the key files.

>>Have you considered having generation of the key files being optional
>>and only enabled if a switch is specified?
>
> I have now, and rejected the notion :-). Feel free to try and convince
> me based on the newly gained knowledge.

I gave it a shot. Did it work?

> I hope this sheds more light on the command line uses.

Yes, indeed. Thanks for the detailed reply.

-Tom
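The question raised in the message above, whether unchanged files could be detected by content hash rather than by timestamp, can be illustrated with a short Python sketch. It is not how rsyncrypto's -c switch works (the thread only says that it relies on timestamps). The helper names, the choice of SHA-256 and the idea of keeping digests in a dictionary keyed by path are all assumptions made for the example.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reencrypt(path: str, recorded_digests: dict) -> bool:
    """True if the file is new or its content differs from the recorded digest.

    recorded_digests is a hypothetical store mapping path -> digest from
    the previous run; where such state would live is left open here.
    """
    return recorded_digests.get(path) != file_digest(path)
```

The trade-off is the one the thread keeps returning to: hashing reads every byte of every file on each run, so it costs more CPU and I/O than a timestamp comparison, in exchange for not being fooled by a touched but unchanged file.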
From: Shachar S. <rsy...@sh...> - 2005-08-24 10:55:08
Tom Metro wrote:
> If we presume that change is infrequent among the file set

granted

> and that storage is probably a less limited resource than CPU in many
> cases,

ok

> I would expect keeping around a copy of the encrypted files to be a
> win, and thus a popular approach among rsyncrypto users.

Huh? If storage is cheap compared to CPU, wouldn't keeping around an extra 60-byte file be better than decrypting said file from within the big file?

> (In an ideal scenario, you'd have encryption hardware, and you could
> simply patch rsync to perform its differencing against an encrypted
> version of the source file in memory.

That would be extremely difficult, due to the way rsync works.

>> ...it allows us to not store the RSA private key, which is sensitive
>> information, on the machine that does the encryption.
>
>
> Hmmm...is keeping the private key "private" really an advantage, given
> the way rsyncrypto works?

Not mandating it is an advantage. Whether it really does help in each and every scenario - I don't know.

Bear in mind that rsyncrypto was designed to be a tool in the Lingnu online backup service (http://www.lingnu.com/backup.html). It is therefore that scenario that takes utmost precedence in allocating my personal time. If you can come up with a usage scenario that does not break the interface consistency, I'll be glad to put it in.

>> ...the public key can be built from the private key, when encrypting
>> either the private or the public RSA key can be given as a fourth
>> argument.
>
>
> Even better. So that means a user who is looking for a practical
> (rather than maximum) security environment may choose to ignore the
> symmetric keys (if there was an option to turn them off) and use a
> single private key for all operations. I'd start with this as the
> basic model for rsyncrypto, while continuing to support the more
> complex approaches for those who desire the greater security.

Well, it's too late to "start with this as the basic model".
Rsyncrypto is approaching its 16th release. If you can come up with a way that will not change the meaning of all existing invocations, I'll gladly consider it.

>> I have now, and rejected the notion :-). Feel free to try and convince
>> me based on the newly gained knowledge.
>
>
> I gave it a shot. Did it work?

Frankly, I don't think that this use scenario is a priority to me. Like I said, if you come up with a command line syntax that is compatible with the current one and does what you want, and are then willing to implement it, I believe I'll put it in.

Shachar
From: Tom M. <tme...@vl...> - 2005-08-24 17:40:38
Shachar Shemesh wrote:
> Tom Metro wrote:
>>I would expect keeping around a copy of the encrypted files to be a
>>win, and thus a popular approach among rsyncrypto users.
>
> Huh? If storage is cheap compared to CPU, wouldn't keeping around an
> extra 60-byte file be better than decrypting said file from within the
> big file?

True.

Although if you compared two scenarios:

1. keeping the keys, but throwing away the encrypted files after each run;

and

2. keeping the encrypted files, but opting not to store the keys in external files;

I think #2 would be a big win. In most cases, the prior version of the encrypted file will be left untouched. Occasionally, when a file has changed, you'll need to decrypt the meta data in order to produce a new encrypted file, but decrypting a few kilobytes of meta data from a known location should be reasonably quick, as CPU time for decompression/compression is proportional to the data quantity.

>>...a user who is looking for a practical
>>(rather than maximum) security environment may choose to ignore the
>>symmetric keys (if there was an option to turn them off) and use a
>>single private key for all operations. I'd start with this as the
>>basic model for rsyncrypto, while continuing to support the more
>>complex approaches for those who desire the greater security.
>
> Well, it's too late to "start with this as the basic model". Rsyncrypto
> is approaching its 16th release. If you can come up with a way that
> will not change the meaning of all existing invocations, I'll gladly
> consider it.

I don't think I can, as what I'm proposing is to make an existing required parameter, which is interleaved among other parameters, optional.

If you feel the "API" is fixed, the best I can hope for is to get your approval for an option that would make storing symmetric keys optional. Which I could then combine with a wrapper script to present users with the most basic interface.
I think presenting the simplest UI is important, so the solution I'd rather see is making the built-in argument syntax follow the simplest model, and if necessary, use wrapper scripts to maintain backwards compatibility with existing lingnu.com backup users. But obviously it's your call.

> Bear in mind that rsyncrypto was designed to be a tool in the Lingnu
> online backup service (http://www.lingnu.com/backup.html). It is
> therefore that scenario that takes utmost precedence in allocating my
> personal time.

Understood.

-Tom
From: Shachar S. <rsy...@sh...> - 2005-08-26 08:25:41
Tom Metro wrote:
> Although if you compared two scenarios:
>
> 1. keeping the keys, but throwing away the encrypted files after each
> run;
>
> and
>
> 2. keeping the encrypted files, but opting not to store the keys in
> external files;
>
> I think #2 would be a big win. In most cases, the prior version of the
> encrypted file will be left untouched. Occasionally, when a file has
> changed, you'll need to decrypt the meta data in order to produce a
> new encrypted file, but decrypting a few kilobytes of meta data from a
> known location should be reasonably quick, as CPU time for
> decompression/compression is proportional to the data quantity.

The keys directory will typically be a fraction of the size of the encrypted files. How can you possibly compare the two?

> I don't think I can, as what I'm proposing is to make an existing
> required parameter, which is interleaved among other parameters,
> optional.

That was my thought too.

> If you feel the "API" is fixed, the best I can hope for is to get your
> approval for an option that would make storing symmetric keys
> optional. Which I could then combine with a wrapper script to present
> users with the most basic interface.

Sure. I see no problem with that. Send a patch and I'm sure we'll find a way to put it in.

Shachar
From: Tom M. <tme...@vl...> - 2005-08-26 15:45:56
Shachar Shemesh wrote:
> Tom Metro wrote:
>>Although if you compared two scenarios:
>>
>>1. keeping the keys, but throwing away the encrypted files after each
>>run;
>>
>>and
>>
>>2. keeping the encrypted files, but opting not to store the keys in
>>external files;
>>
>>I think #2 would be a big win. In most cases, the prior version of the
>>encrypted file will be left untouched. Occasionally, when a file has
>>changed, you'll need to decrypt the meta data in order to produce a
>>new encrypted file, but decrypting a few kilobytes of meta data from a
>>known location should be reasonably quick, as CPU time for
>>decompression/compression is proportional to the data quantity.
>
> The keys directory will typically be a fraction of the size of the
> encrypted files. How can you possibly compare the two?

This was a continuation of a thread in which one of the opening assumptions was that CPU was a more limited resource than disk storage. The "big win" I refer to above is with respect to CPU usage.

-Tom
From: Tom M. <tme...@vl...> - 2005-08-23 20:52:50
Shachar Shemesh wrote:
> If the third argument is a name of a non-existing file, then
> rsyncrypto assumes that this is the first time such a file is
> encrypted, and will create the symmetric key file. This is so that
> future encryption will be able to use the same encryption parameters,
> and thus create a file that is similar.
> [...]
> Now let's talk about decryption. One is "warm restore", where you
> still have the symmetric key file for the file you are trying to
> decrypt. If that's the case, you still need to provide the (RSA
> asymmetric) public key. This is mostly for the technical reason that
> rsyncrypto needs to know how big the encrypted symmetric key is in the
> file, and this depends on the size of the public key. We could save it
> in the unencrypted key, but as you should never ever lose the private
> key, much less the public key, this did not seem like a big enough
> trouble to justify the extra work. Also, this gives rsyncrypto a
> consistent command line across all invocations.

Your comments above, as well as this:

  --fk, --fr
      If command line, or a version with different defaults, dictate different values for the --roll-* options or the -b option, these will only affect files for which keyfile does not yet exist. Specifying the --fk or --fr will recreate keyfile if it has values different than those in the previous key file.

from the man page, plus your comments about storing encrypted file names, suggests that there is a need for richer meta data storage in the destination file.

It also suggests that the symmetric key files really want to be more than just keys, but instead a collection of meta data needed to reproduce an identical encrypted file (if the source data doesn't change).

Have you looked at any existing schemes for storing file meta data, such as zip or gzip file headers? There may be value in co-opting one of those.
Another issue to consider is how much, if any, of the meta data should be encrypted (when part of the destination file)? Even though requiring the private key in order to access it may be inconvenient, it probably makes sense to encrypt everything.

-Tom
From: Shachar S. <rsy...@sh...> - 2005-08-24 11:00:38
|
Tom Metro wrote: > Your comments above, as well as this: > > --fk, --fr ... > from the man page, plus your comments about storing encrypted file > names, suggests that there is a need for richer meta data storage in > the destination file. Yes and no. There is a need for richer metadata storage, and I'm working on such a thing right now. I don't see how that stems from the sources you point out, though. > It also suggests that the symmetric key files really want to be more > than just keys, but instead a collection of meta data needed to > reproduce an identical encrypted file (if the source data doesn't > change). The symmetric key is all the information about the encryption procedure (keys, parameters, etc.). The file itself is the data. I'm adding a third location where file metadata is stored (file names, modification dates, permissions etc.). I don't think it's necessary to break backwards compatibility in order to store it inside the actual file. > Have you looked at any existing schemes for storing file meta data, > such as zip or gzip file headers? There may be value in co-opting one > of those. You obviously subscribed to the list after I put up the link to http://cvs.sourceforge.net/viewcvs.py/rsyncrypto/rsyncrypto/docs/filelist.txt?view=markup. That's the format in which I'll be storing the metadata. > Another issue to consider is how much, if any, of the meta data should > be encrypted (when part of the destination file)? Even though > requiring the private key in order to access it may be inconvenient, > probably makes sense to encrypt everything. That's why I need to store the filenames in a separate file. In 0.16 the encrypted directory will be a series of files with names of equal length, each composed of a series of random characters (well, a base64 encoding of a few random bytes, if you want the specifics). The translation will be stored in a file called "filelist", which will itself be encrypted using the usual rsyncrypto mechanisms. 
The specifics of said file are detailed in the link above. > -Tom Shachar |
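A minimal sketch of the naming scheme described above, for illustration only; it is not rsyncrypto's code. The message says only "a base64 encoding of a few random bytes", so the byte count (9), the URL-safe alphabet, and the names `random_encoded_name` and `build_translation` are all assumptions.

```python
import base64
import secrets


def random_encoded_name(num_bytes: int = 9) -> str:
    """Return a fixed-length name made from base64-encoded random bytes.

    Assumption: 9 random bytes, which encode to exactly 12 characters with
    no '=' padding. The URL-safe alphabet is used because standard base64
    can emit '/', which cannot appear inside a file name.
    """
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).decode("ascii")


def build_translation(paths):
    """Map each original path to a distinct random name (hypothetical helper).

    The resulting table is the kind of information the thread says will be
    kept in "filelist".
    """
    translation = {}
    used = set()
    for path in paths:
        name = random_encoded_name()
        while name in used:  # retry on the (unlikely) collision
            name = random_encoded_name()
        used.add(name)
        translation[path] = name
    return translation
```

Because every name has the same length and carries no trace of the original path, the translation table is the only way back to the real names, which is the reason given above for keeping it in its own encrypted file.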
From: Tom M. <tme...@vl...> - 2005-08-24 18:51:25
|
Shachar Shemesh wrote: > The symmetric key is all the information about the encryption procedure > (keys, parameters, etc.). Ah, so it already is a block of meta data. Would you mind if we came up with a new name for this file in the documentation? > I'm adding a > third location where file metadata is stored (file names, modification > dates, permissions etc.). I don't think it's necessary to break > backwards compatibility in order to store it inside the actual file. > > http://cvs.sourceforge.net/viewcvs.py/rsyncrypto/rsyncrypto/docs/filelist.txt?view=markup. If the -m (--meta-encrypt) option is given, the file names, as well as other meta-data about the files, is garbled. ... The real information about all of tha above is stored inside a special file, called "filelist". Aren't you concerned that loss or corruption of 'filelist' could render an entire collection of files as near useless? Why choose a single file model for this data, when you choose multi file model for the symmetric keys? You said above that the symmetric key files really contain more than the actual key, so why not extend it to include this additional meta data? I would think it would be worth breaking backwards compatibility for the vast benefits of having the block of meta data stored inside the file be identical to the block stored externally (with the exception that one is encrypted, of course). Consider that you can then use the same chunk of code to process the meta data, regardless of where it was stored. And that you can ditch all the special case code you'll have to add for dealing with 'filelist'. And 'filelist', being a "sequence of 'chunks'," is essentially a database, which is bound to require even more code to manage, as well as introduce potential memory issues when dealing with huge file sets. >>Another issue to consider is how much, if any, of the meta data should >>be encrypted (when part of the destination file)? 
Even though >>requiring the private key in order to access it may be inconvenient, >>probably makes sense to encrypt everything. > > That's why I need to store the filenames in a seperate file. I don't follow why that requires either an external file or a separate file. Yes, an external file is necessary to avoid needing the private key on decryption, but you've already got an external meta data file. (And if the user doesn't have the external meta data file on hand, then they need the private key anyway.) quoting more from: http://cvs.sourceforge.net/viewcvs.py/rsyncrypto/rsyncrypto/docs/filelist.txt?view=markup Also, in order to keep parsing of filelist simple, it is in binary format. It would add to the prerequisites, but might have been less work to link in an XML parser. (One of the ideas behind XML is to write a decent parser once, and not have to reinvent one for every project.) Otherwise the data structure seems decent. A magic number, which would permit locating the file or meta data chunk in the event of corruption. Variable number of blocks, and variable size blocks. And the concept that unknown block types should be ignored, helping to maintain backwards compatibility. A writer must always issue all mandatory blocks for the file version generated by it (as determined by the magic number at the start of the file). You might want to make the magic number fixed and have the version be a separate attribute. Other programs/tools might want to be able to recognize the magic number, but only your program needs to be able to interpret the contents. All strings are NULL terminated. Seems redundant if you're storing sizes, unless you plan to pack multiple strings into a single block. All blocks start on a file offset that is 4 bytes aligned. If a natural block size is not a multiple of 4, writers must pad the block with zero (null) bytes. The block length must include the padding, and must divide by 4. What's the benefit of this? 
A bit of a performance boost once the structure is put into word-aligned memory? What about a block and/or chunk checksum? == Block FFFF - End of Chunk == Writers must place this block at the end of each chunk. Readers should assume that any data after this chunk is the begining of the next chunk. I'm not sure that serves a purpose. If the file is not corrupted, then the chunk header tells you when you are done, and if the file is corrupted, FFFF probably isn't adequately unique to assist in reconstruction. If you stick with the idea of a single 'filelist' file, you might also want to use a magic number to mark the start of each chunk. == Block 0000 - Platform == == Block 0001 - Original File Name == == Block 0002 - Encoded File Name == == Block 0003 - Posix File Permission == What about an MD5 or SHA digest of the file, or is that stored elsewhere? What about the original file size, which could be utilized by -c? As I've implied above, I think this information, the original file size, time stamps, a digest of the file, and the AES key, should just be elements of a larger meta data chunk, which is stored both in the encrypted file (a separate chunk encrypted with the RSA key) and optionally also stored in an external file. In your document you might also want to address that you aren't scrambling the files' time stamps, which theoretically is a leak of information, but a necessity in order for rsync to operate. >> Have you looked at any existing schemes for storing file meta data, >> such as zip or gzip file headers? There may be value in co-opting one >> of those. > > You obviously subscribed to the list after I put up the link to > [document above]... Correct, I hadn't seen it before. But that doesn't answer the question (unless you are mimicking one of those - it's been a long time since I looked at Zip headers). 
Originally I was thinking one of those projects would be a good reference just for the mechanics of storing meta data headers (rather than reinventing the wheel), but they could also be a valuable source of multi-platform code for getting/setting file attributes, and a reference for what is considered important to preserve. -Tom -- Tom Metro Venture Logic, Newton, MA, USA "Enterprise solutions through open source." Professional Profile: https://www.linkedin.com/e/fps/3452158/ |
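To make the block rules quoted above concrete (NUL-terminated strings, zero padding, block length a multiple of 4), here is a rough sketch of writing and reading one string block. It assumes a header of a 2-byte length followed by a 2-byte type, as the format document is quoted later in this thread. Big-endian byte order, a length field that counts the header, and the function names are assumptions, not taken from filelist.txt.

```python
import struct

# Assumed header layout: 2-byte block length, 2-byte block type, big-endian.
HEADER = struct.Struct(">HH")


def encode_string_block(block_type: int, text: str) -> bytes:
    """Build one block holding a single NUL-terminated ASCII string."""
    payload = text.encode("ascii") + b"\x00"  # string plus terminator
    length = HEADER.size + len(payload)
    padding = -length % 4                      # bytes needed to reach a multiple of 4
    length += padding                          # the stored length includes the padding
    return HEADER.pack(length, block_type) + payload + b"\x00" * padding


def decode_string_block(data: bytes):
    """Return (block_type, text, block_length) for the block at the start of data."""
    length, block_type = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:length]
    # Padding bytes are zero as well, so the string ends at the first NUL.
    text = payload.split(b"\x00", 1)[0].decode("ascii")
    return block_type, text, length
```

One thing the sketch brings out: once zero padding is counted in the block length, the length alone no longer gives the exact string length, so a reader has to stop at the first NUL either way.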
From: Shachar S. <rsy...@sh...> - 2005-08-26 08:46:31
|
Tom Metro wrote: > Shachar Shemesh wrote: > >> The symmetric key is all the information about the encryption procedure >> (keys, parameters, etc.). > > > Ah, so it already is a block of meta data. > > Would you mind if we came up with a new name for this file in the > documentation? Tom, with all due respect, I think it is time someone put their money where their mouth is. If you want to discuss documentation changes, do so with a patch against CVS. > Aren't you concerned that loss or corruption of 'filelist' could > render an entire collection of files as near useless? No, not really. There is an unencrypted version, as well as an encrypted version. The encrypted version is accessible through the same private key that unlocks the rest of the encrypted files. > Why choose a single file model for this data, when you choose multi > file model for the symmetric keys? Because, unlike the symmetric keys, this file has to be at a known location, and cannot be done without. Putting the data currently in the symmetric keys into the single file is an option to consider, but I'm not sure I have managed to wrap my mind around all implications of doing so yet. Another reason just came to mind. You need the information in filelist in order to find which file you are referring to. This means that if you wanted to store this information in separate files, you would need to read each and every one of them anyway. > You said above that the symmetric key files really contain more than > the actual key, so why not extend it to include this additional meta > data? Because the key file contains data about the encryption, while filelist contains data about the unencrypted file. It's just not the same thing. > I would think it would be worth breaking backwards compatibility for > the vast benefits of having the block of meta data stored inside the > file be identical to the block stored externally (with the exception > that one is encrypted, of course). 
See, that's the whole point, though, isn't it? Information crucial for finding which file is which is of a different level of importance than information about a specific file. > Consider that you can then use the same chunk of code to process the > meta data, regardless of where it was stored. And that you can ditch > all the special case code you'll have to add for dealing with > 'filelist'. And 'filelist', being a "sequence of 'chunks'," is > essentially a database, which is bound to require even more code to > manage, as well as introduce potential memory issues when dealing with > huge file sets. I guess I'll give you the same answer I gave you in my previous email. I think you are suggesting unimplementable solutions here, but feel free to prove me wrong by sending in patches against CVS. Grabbing the latest CVS according to the instructions on the site will get you the most up to date version I have, almost always. > I don't follow why that requires either an external file or a separate > file. Yes, an external file is necessary to avoid needing the private > key on decryption, but you've already got an external meta data file. > (And if the user doesn't have the external meta data file on hand, > then they need the private key anyway.) If a file called "/etc/passwd" is stored in the encrypted archive as "As9sm23irmsk", and the only way to correlate the latter name to the former is through correlation data, how on earth do you propose to store this correlation data inside "As9sm23irmsk", and encrypted at that? It means that if you ask to decrypt "/etc/passwd", rsyncrypto has to go over all the files in the archive, decrypting the private key header of each, and trying to locate the right one. This is even before I start talking about limited RSA block sizes and other technical problems with encrypting arbitrary length data using an asymmetric cipher. > It would add to the prerequisites, but might have been less work to > link in an XML parser. 
(One of the ideas behind XML is to write a > decent parser once, and not have to reinvent one for every project.) Could be. Too late for 1.0.0.16, but maybe in the future. See, the parser is already written :-). > Otherwise the data structure seems decent. A magic number, which would > permit locating the file or meta data chunk in the event of > corruption. Variable number of blocks, and variable size blocks. And > the concept that unknown block types should be ignored, helping to > maintain backwards compatibility. Actually, the magic number only serves to identify the file in case we need to change the basic structure in the future (say, moving the file over to XML format), while maintaining backwards compatibility. In any case, thanks for giving me marks. > A writer must always issue all mandatory blocks for the file version > generated by it (as determined by the magic number at the start of the > file). > > You might want to make the magic number fixed and have the version be > a separate attribute. Other programs/tools might want to be able to > recognize the magic number, but only your program needs to be able to > interpret the contents. It's easier to just switch magics if something fundamental needs to be changed. This also saves the trouble of trying to figure out how to handle version 5 with magic 2 etc. > All strings are NULL terminated. > > Seems redundant if you're storing sizes, I wasn't aware that I was storing sizes. Not of strings, in any case. > unless you plan to pack multiple strings into a single block. Could be necessary in the future, yes. > All blocks start on a file offset that is 4 bytes aligned. If a > natural block size is not a multiple of 4, writers must pad the block > with zero (null) bytes. The block length must include the padding, and > must divide by 4. > > What's the benefit of this? A bit of a performance boost once the > structure is put into word-aligned memory? Exactly. > What about a block and/or chunk checksum? 
What about them? What good is a checksum if there is nothing you can do in case it's wrong? > == Block FFFF - End of Chunk == > > Writers must place this block at the end of each chunk. Readers should > assume that any data after this chunk is the begining of the next > chunk. > > I'm not sure that serves a purpose. If the file is not corrupted, then > the chunk header tells you when you are done, and if the file is > corrupted, FFFF probably isn't adequately unique to assist in > reconstruction. I would be delighted to hear in what way the chunk header provides this information. > If you stick with the idea of a single 'filelist' file, you might also > want to use a magic number to mark the start of each chunk. Why? > == Block 0000 - Platform == > == Block 0001 - Original File Name == > == Block 0002 - Encoded File Name == > == Block 0003 - Posix File Permission == > > What about an MD5 or SHA digest of the file, or is that stored > elsewhere? What about the original file size, which could be utilized > by -c? Good ideas. I'll be expecting the patch by end of next week, which is when 0.16 must, come rain or high water, be released. > In your document you might also want to address that you aren't > scrambling the files' time stamps, which theoretically is a leak of > information, but a necessity in order for rsync to operate. The document documents the filelist file format. Shachar |
From: Tom M. <tme...@vl...> - 2005-08-26 15:55:02
|
Shachar Shemesh wrote: > Tom Metro wrote: >>Would you mind if we came up with a new name for this file in the >>documentation? > > Tom, with all due respect, I think it is time someone put their money > where their mouth is. If you want to discuss documentation changes, do > so with a patch against CVS. I find it more respectful to discuss before patching, rather than patching first and asking for forgiveness later. :-) It also gives other people an opportunity to voice an opinion. And from a selfish perspective, it doesn't make sense to spend a lot of time developing a new document if it is going to be rejected. The purpose of these discussions is both to improve my understanding of the project enough that I can contribute code or documentation patches, and to get a feel for what is acceptable, so I'm not pointlessly spinning my wheels. What I'd like to do as a next step is post some proposed changes to the man page. If they sound reasonable, I'll convert them into a proper patch against the CVS trunk. -Tom -- Tom Metro Venture Logic, Newton, MA, USA "Enterprise solutions through open source." Professional Profile: https://www.linkedin.com/e/fps/3452158/ |
From: Tom M. <tme...@vl...> - 2005-08-26 19:22:03
|
Shachar Shemesh wrote: >Tom Metro wrote: >>Aren't you concerned that loss or corruption of 'filelist' could >>render an entire collection of files as near useless? > > No, not really. There is an unencrypted version, as well as an encrypted > version. There are two copies of 'filelist'? I guess I missed that in your write up. Though it doesn't change the situation much. So one copy goes into the root of the destination directory, and gets encrypted, and the other copy goes into the keys directory and is left as plain text? >>Why choose a single file model for this data, when you choose multi >>file model for the symmetric keys? > > Because, unlike the symmetric keys, this file has to be at a known > location, and cannot be done without. [...] > You need the information in filelist > in order to find which is the file you refer to. This means that if you > wanted to store this information in seperate files, you will need to > read each and every one of them anyways. [...] > If a file called "/etc/passwd" is stored in the encrypted archive as > "As9sm23irmsk", and the only way to correlate the later name to the > former is through a correlation data... Sounds like a pretty good point, though consider the usage scenarios. I started with the assumption that the "meta files" would retain the original file name. So to extract "/etc/passwd" the program would simply read in "keys/etc/passwd", and get the translation to "dest/a97a66d03c4a/As9sm23irmsk." (I presume you're scrambling directory names rather than mapping to a flat hierarchy.) Are there any scenarios in which the program would be given the encrypted file name, and then need to locate the meta file? If you're doing a batch operation (decrypting all files), you could avoid the issue by iterating over the meta files instead of the encrypted files. Or if such an operation is rare, you simply accept the overhead and extract the meta data from the encrypted file's header (if it gets stored there). 
Another trick that would make navigation to meta data stored in individual files faster would be to create a parallel hierarchy using hard links (which are supported on both UNIX and NTFS). Then "keys/a97a66d03c4a/As9sm23irmsk" resolves to the same file as "keys/etc/passwd." Though I'm not convinced that this is at all necessary. As for the idea of storing the encrypted file name translation in the encrypted file's header... > ...how on earth do you propose to > store this correlation data inside "As9sm23irmsk", and encrypted at > that. It means that if you ask to decrypt "/etc/passwd", rsyncrypto has > to go over all the files in the archive, decrypting the private key > header of each, and trying to locate the right one. Yes, if you don't have the external "meta files" on hand. I consider the meta files to be like a cache. They're nice to have to speed things up, but if you don't have them, and it necessitates decrypting all files (or their headers) in a set to find a specific file, that seems like a reasonable price. Again, I'd consider usage scenarios. If the intended purpose of rsyncrypto is the storage of backup files, extraction will be a rare operation, and it is acceptable for it to be slow. Any user who recovers their files after a loss of the originals is going to be a happy user, and isn't going to mind that the process might be an order of magnitude slower than a simple copy of unencrypted files. >>Consider that you can then use the same chunk of code to process the >>meta data, regardless of where it was stored. > > I think you are suggesting unimplementable solutions here... Does the above clarification resolve that concern? > Putting the data currently in the symmetric keys into the single file is > an option to consider... Interesting thought, but not an approach I'm voting for. >>You said above that the symmetric key files really contain more than >>the actual key, so why not extend it to include this additional meta >>data? 
> > Because the key file contains data about the encryption, while filelist > contains data about the unencrypted file. It's just not the same thing. Yet both are needed in order to recover the original (with the exception that if the 'filelist' files are lost, you're hosed). Consider it from an operational standpoint: on initial encryption, you're writing stuff to both the symmetric key file and filelist describing how the file was packaged, and on decryption you are consulting both of those files to determine how to recover the original. Practically speaking, there is little differentiating the two sources of information, except that a lost symmetric key is recoverable. > This is even before I start talking about limited RSA block sizes and > other technical problems with encrypting arbitrary length data using an > assymetric cypher. That's a good point. But given the choice between storing the complete meta data only in external files, or taking on the extra overhead of storing the meta data in an additional AES encrypted chunk as part of the encrypted file's structure, I'd take the latter. >> All strings are NULL terminated. >> >>Seems redundant if you're storing sizes, > > I wasn't aware that I was storing sizes. Not of strings, in any case. If a block is defined like: == Block 0002 - Encoded File Name == 2 bytes : block length 2 bytes : block type, always 0002 string : The name of the file (ASCII) unless you add additional variable length elements to that block, you've effectively defined the length of the string. >>unless you plan to pack multiple strings into a single block. > > Could be necessary in the future, yes. Right, so best to leave it as you have it. >>What about a block and/or chunk checksum? > > What good is a checksum if there is nothing you can do > in case it's wrong? You don't blindly create scrambled output when trying to restore a file. You can notify the user that there is a problem. 
You can write a recovery tool that cleans a 'filelist' by throwing out corrupted chunks, allowing at least partial recovery of the file set. >>I'm not sure [an end block] serves a purpose. If the file is not corrupted, >>then the chunk header tells you when you are done... > > I would be delighted to hear in what way the chunk header provides this > information. You define the chunk as: == Chunk Format == Each chunk is composed of a series of specific data. The first two bytes are the number of data blocks in this chunk. If I know there are N blocks in a chunk, I know I'm done processing a chunk after I've seen N blocks. No need for an end block marker. Same deal as knowing the size of a string vs. null terminated. >> ...and if the file is corrupted, FFFF probably isn't adequately unique >> to assist in reconstruction. > >>If you stick with the idea of a single 'filelist' file, you might also >>want to use a magic number to mark the start of each chunk. > > Why? If the start of a chunk has a unique identifier, you can write a recovery tool that can re-synch after scanning past one or more corrupt chunks that don't have the expected block count or block size. -Tom -- Tom Metro Venture Logic, Newton, MA, USA "Enterprise solutions through open source." Professional Profile: https://www.linkedin.com/e/fps/3452158/ |
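A rough sketch of the two ideas in this message: stop after the block count read from the chunk header, and re-synchronise on a per-chunk magic number after a corrupt chunk. The per-chunk marker `CHUNK_MAGIC` is hypothetical (it is the proposal under discussion; the format document defines none), and the big-endian byte order, a block length that counts the 4-byte header, and the function names are assumptions. Alignment padding and checksums are left out for brevity.

```python
import struct

CHUNK_MAGIC = b"RCCH"  # hypothetical per-chunk marker, not part of filelist.txt


def parse_chunk(data: bytes, offset: int):
    """Parse one chunk starting at offset; return (blocks, next_offset).

    Assumed layout: magic, 2-byte block count, then `count` blocks of
    2-byte length (header included) + 2-byte type + payload, big-endian.
    Raises ValueError as soon as anything is inconsistent.
    """
    if data[offset:offset + 4] != CHUNK_MAGIC:
        raise ValueError("missing chunk magic")
    pos = offset + 4
    if pos + 2 > len(data):
        raise ValueError("truncated chunk header")
    (count,) = struct.unpack_from(">H", data, pos)
    pos += 2
    blocks = []
    for _ in range(count):  # the count says when the chunk ends; no end marker needed
        if pos + 4 > len(data):
            raise ValueError("truncated block header")
        length, block_type = struct.unpack_from(">HH", data, pos)
        if length < 4 or pos + length > len(data):
            raise ValueError("bad block length")
        blocks.append((block_type, data[pos + 4:pos + length]))
        pos += length
    return blocks, pos


def recover_chunks(data: bytes):
    """Return every chunk that parses cleanly, skipping corrupt regions."""
    chunks = []
    pos = data.find(CHUNK_MAGIC)
    while pos != -1:
        try:
            blocks, end = parse_chunk(data, pos)
            chunks.append(blocks)
            pos = data.find(CHUNK_MAGIC, end)      # continue after the good chunk
        except ValueError:
            pos = data.find(CHUNK_MAGIC, pos + 1)  # re-synch on the next marker
    return chunks
```

Without a checksum, a damaged chunk whose lengths still happen to add up would be accepted as valid, so structural checks like these catch only some corruption.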
From: Tom M. <tme...@vl...> - 2005-08-23 18:19:01
|
Shachar Shemesh wrote: >>>The rsyncrypto manual points you to the req(1) and x509(1) manual >>>pages of openssl. >> >>Manual? Ah, I see it now in the source package. Apparently it didn't >>make it into the Win32 package. > > It's not rsyncrypto's job to provide the man pages for openssl. Both > req(1) and x509(1) are openssl commands. I was referring to the rsyncrypto man page, which isn't in the Windows distribution. -Tom -- Tom Metro Venture Logic, Newton, MA, USA "Enterprise solutions through open source." Professional Profile: https://www.linkedin.com/e/fps/3452158/ |
From: Shachar S. <rsy...@sh...> - 2005-08-24 11:01:15
|
Tom Metro wrote: > I was referring to the rsyncrypto man page, which isn't in the Windows > distribution. > > -Tom I don't know of a tool that will easily let me package it in a form easily readable on Windows. Suggestions welcome. Shachar |
From: Tom M. <tme...@vl...> - 2005-08-24 17:04:21
|
> I don't know of a tool that will easily let me package it in a form > easily readable on Windows. Suggestions welcome. Simply: % nroff -man rsyncrypto.man > rsyncrypto.txt plus appropriate line ending conversion, will do it. Or wrap minimal HTML headers and PRE tags around it and give it an html extension. -Tom -- Tom Metro Venture Logic, Newton, MA, USA "Enterprise solutions through open source." Professional Profile: https://www.linkedin.com/e/fps/3452158/ |
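The post-processing suggested above (line ending conversion, or a minimal HTML wrapper with PRE tags) could be sketched as follows. This is an illustration, not a tested packaging step for rsyncrypto. The function names and page title are made up, and the overstrike clean-up is an assumption: depending on the nroff implementation, the output may mark bold and underline with backspace sequences that would show up as garbage in a Windows text editor.

```python
import html
import re


def strip_overstrikes(text: str) -> str:
    """Drop 'character + backspace' pairs that some nroff output uses for bold/underline."""
    return re.sub(r".\x08", "", text)


def to_windows_text(text: str) -> str:
    """Return the text with overstrikes removed and CRLF line endings."""
    return "\r\n".join(strip_overstrikes(text).splitlines()) + "\r\n"


def to_html_page(text: str, title: str = "rsyncrypto manual") -> str:
    """Wrap the formatted text in minimal HTML with a PRE block."""
    body = html.escape(strip_overstrikes(text))  # keep '<', '>' and '&' literal
    return (
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body><pre>\n{body}</pre></body></html>\n"
    )
```

Either function would be applied to the contents of the rsyncrypto.txt file produced by the nroff command, with the result written out as a .txt or .html file.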
From: Shachar S. <rsy...@sh...> - 2005-08-27 14:00:56
|
Tom Metro wrote: >> I don't know of a tool that will easily let me package it in a form >> easily readable on Windows. Suggestions welcome. > > > Simply: > > % nroff -man rsyncrypto.man > rsyncrypto.txt > > plus appropriate line ending conversion, will do it. No, it won't, but thanks anyway. I think I found a good way of producing the right files. Shachar |