Re: metadata
From: Tom M. <tme...@vl...> - 2005-08-24 18:51:25
Shachar Shemesh wrote:
> The symmetric key is all the information about the encryption procedure
> (keys, parameters, etc.).

Ah, so it already is a block of meta data. Would you mind if we came up
with a new name for this file in the documentation?

> I'm adding a
> third location where file metadata is stored (file names, modification
> dates, permissions etc.). I don't think it's necessary to break
> backwards compatibility in order to store it inside the actual file.
>
> http://cvs.sourceforge.net/viewcvs.py/rsyncrypto/rsyncrypto/docs/filelist.txt?view=markup.

    If the -m (--meta-encrypt) option is given, the file names, as well
    as other meta-data about the files, is garbled.
    ...
    The real information about all of the above is stored inside a
    special file, called "filelist".

Aren't you concerned that loss or corruption of 'filelist' could render
an entire collection of files nearly useless?

Why choose a single-file model for this data, when you chose a
multi-file model for the symmetric keys? You said above that the
symmetric key files really contain more than the actual key, so why not
extend them to include this additional meta data?

I would think it would be worth breaking backwards compatibility for the
vast benefits of having the block of meta data stored inside the file be
identical to the block stored externally (with the exception that one is
encrypted, of course). Consider that you can then use the same chunk of
code to process the meta data, regardless of where it was stored. And
that you can ditch all the special-case code you'll have to add for
dealing with 'filelist'. And 'filelist', being a "sequence of 'chunks',"
is essentially a database, which is bound to require even more code to
manage, as well as introduce potential memory issues when dealing with
huge file sets.

>> Another issue to consider is how much, if any, of the meta data should
>> be encrypted (when part of the destination file)?
>> Even though requiring the private key in order to access it may be
>> inconvenient, probably makes sense to encrypt everything.
>
> That's why I need to store the filenames in a separate file.

I don't follow why that requires either an external file or a separate
file. Yes, an external file is necessary to avoid needing the private
key on decryption, but you've already got an external meta data file.
(And if the user doesn't have the external meta data file on hand, then
they need the private key anyway.)

Quoting more from:
http://cvs.sourceforge.net/viewcvs.py/rsyncrypto/rsyncrypto/docs/filelist.txt?view=markup

    Also, in order to keep parsing of filelist simple, it is in binary
    format.

It would add to the prerequisites, but it might have been less work to
link in an XML parser. (One of the ideas behind XML is to write a decent
parser once, and not have to reinvent one for every project.)

Otherwise the data structure seems decent. A magic number, which would
permit locating the file or meta data chunk in the event of corruption.
A variable number of blocks, and variable-size blocks. And the concept
that unknown block types should be ignored, helping to maintain
backwards compatibility.

    A writer must always issue all mandatory blocks for the file version
    generated by it (as determined by the magic number at the start of
    the file).

You might want to make the magic number fixed and have the version be a
separate attribute. Other programs/tools might want to be able to
recognize the magic number, but only your program needs to be able to
interpret the contents.

    All strings are NULL terminated.

Seems redundant if you're storing sizes, unless you plan to pack
multiple strings into a single block.

    All blocks start on a file offset that is 4 bytes aligned. If a
    natural block size is not a multiple of 4, writers must pad the
    block with zero (null) bytes. The block length must include the
    padding, and must divide by 4.

What's the benefit of this?
A bit of a performance boost once the structure is put into word-aligned
memory?

What about a block and/or chunk checksum?

    == Block FFFF - End of Chunk ==
    Writers must place this block at the end of each chunk. Readers
    should assume that any data after this chunk is the beginning of the
    next chunk.

I'm not sure that serves a purpose. If the file is not corrupted, then
the chunk header tells you when you are done, and if the file is
corrupted, FFFF probably isn't adequately unique to assist in
reconstruction. If you stick with the idea of a single 'filelist' file,
you might also want to use a magic number to mark the start of each
chunk.

    == Block 0000 - Platform ==
    == Block 0001 - Original File Name ==
    == Block 0002 - Encoded File Name ==
    == Block 0003 - Posix File Permission ==

What about an MD5 or SHA digest of the file, or is that stored
elsewhere? What about the original file size, which could be utilized
by -c?

As I've implied above, I think this information, the original file size,
time stamps, a digest of the file, and the AES key, should just be
elements of a larger meta data chunk, which is stored both in the
encrypted file (a separate chunk encrypted with the RSA key) and
optionally also stored in an external file.

In your document you might also want to address that you aren't
scrambling the files' time stamps, which theoretically is a leak of
information, but a necessity in order for rsync to operate.

>> Have you looked at any existing schemes for storing file meta data,
>> such as zip or gzip file headers? There may be value in co-opting one
>> of those.
>
> You obviously subscribed to the list after I put up the link to
> [document above]...

Correct, I hadn't seen it before. But that doesn't answer the question
(unless you are mimicking one of those - it's been a long time since I
looked at Zip headers).
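
For comparison with those formats, here is a minimal Python sketch of
the block layout quoted from filelist.txt above: typed, variable-size
blocks padded to a multiple of 4 bytes, unknown block types ignored, and
a chunk closed by block FFFF. The field widths and byte order (a 2-byte
type followed by a 2-byte length, little-endian, with the length
covering the padded payload but not the header) and the names
pack_block, read_chunk and c_string are assumptions made purely for
illustration. The quoted text does not specify them, and this is not the
rsyncrypto code.

```python
import struct

# ASSUMED layout, not taken from filelist.txt: a 4-byte block header of
# type (uint16) and length (uint16), little-endian, followed by `length`
# payload bytes. `length` already includes the zero padding.
HEADER = struct.Struct("<HH")

# Block types named in the quoted document.
BLOCK_PLATFORM = 0x0000
BLOCK_ORIG_NAME = 0x0001
BLOCK_ENCODED_NAME = 0x0002
BLOCK_POSIX_PERM = 0x0003
BLOCK_END_OF_CHUNK = 0xFFFF
KNOWN_TYPES = {BLOCK_PLATFORM, BLOCK_ORIG_NAME,
               BLOCK_ENCODED_NAME, BLOCK_POSIX_PERM}


def pack_block(block_type, payload):
    """Pad payload with zero bytes to a multiple of 4, prepend the header."""
    padded = payload + b"\x00" * (-len(payload) % 4)
    return HEADER.pack(block_type, len(padded)) + padded


def read_chunk(data, offset=0):
    """Read blocks until End of Chunk; return ({type: payload}, next_offset).

    Blocks whose type is not in KNOWN_TYPES are skipped, not rejected.
    """
    blocks = {}
    while True:
        if offset + HEADER.size > len(data):
            raise ValueError("truncated block header")
        block_type, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if length % 4 != 0 or offset + length > len(data):
            raise ValueError("bad block length")
        payload = data[offset:offset + length]
        offset += length
        if block_type == BLOCK_END_OF_CHUNK:
            return blocks, offset
        if block_type in KNOWN_TYPES:
            blocks[block_type] = payload


def c_string(payload):
    """Decode a NUL-terminated string, dropping terminator and padding."""
    return payload.split(b"\x00", 1)[0].decode("utf-8")
```

With this layout, a chunk built from pack_block(0x0001, b"notes.txt\x00"),
a block of some unrecognized type, and pack_block(0xFFFF, b"") reads back
as a single Original File Name block, and c_string() recovers
"notes.txt". Under this reading the stored length cannot distinguish the
string from its padding, so the terminator is what marks where the name
actually ends.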

Originally I was thinking one of those projects would be a good
reference just for the mechanics of storing meta data headers (rather
than reinventing the wheel), but they could also be a valuable source of
multi-platform code for getting/setting file attributes, and a reference
for what is considered important to preserve.

 -Tom

--
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: https://www.linkedin.com/e/fps/3452158/