Re: [Dar-libdar_api] dar encryption - RSA

Hi Denis,

maybe I was not precise enough about what I want to hash and how the
dictionary is organized:
* I don't want to hash the content of the file
* the dictionary is not organized hierarchically as your catalogue is
* when I'm talking about filename I mean path + file name
* the dictionary does not replace the catalogue, it is just an extra option
It should look like this:
{ 
    H("/home/tobias/Documents/test.txt" + inodeID + mtime + UUID + salt) : 
        [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,

    H("/home/tobias/Pictures/foo.jpg" + inodeID + mtime + UUID + salt) : 
        [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
    ...
}
(H() is a cryptographic hash function like sha256)

or, with the hash values written out:
{
    b2144d23ebc9a7f2af44e215b00dce5025bdc227346c6459b989ef8d203f3402 :
        [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,

    0df9ba289c76d5bb1761a2764593bfe97d64f4c944ecfa08d6f7a16721b5f317 :
        [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
}
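
A minimal sketch of how such a key could be computed, in Python for
brevity. The function name entry_key, the length-prefixed encoding of
the fields and the example values are assumptions made for
illustration; they are not part of dar or libdar. Length-prefixing is
used instead of plain concatenation so that two different field
combinations cannot produce the same input to the hash.

```python
import hashlib
import os
import struct


def entry_key(path, inode, mtime, fs_uuid, salt):
    """Return the hex SHA-256 digest used as dictionary key for one file.

    Hypothetical helper: every field is prefixed with its length so that
    ("/a", 12) and ("/a1", 2) do not hash the same byte string.
    """
    fields = (
        os.fsencode(path),            # full path + file name
        str(inode).encode("utf-8"),   # inode number
        str(mtime).encode("utf-8"),   # last modification time
        fs_uuid.encode("utf-8"),      # system/partition identifier
        salt,                         # random per-archive salt (bytes)
    )
    h = hashlib.sha256()
    for field in fields:
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    return h.hexdigest()


# Illustrative values only; the salt would be stored in clear with the archive.
salt = os.urandom(16)
key = entry_key("/home/tobias/Documents/test.txt", 1234567, 1412505480,
                "hypothetical-uuid", salt)
dictionary = {
    key: ["userID", "groupID", "perm", "file_size", "is_dir", "type",
          "flags", "ctime"],
}
```

The dictionary is flat: one entry per path, with no directory hierarchy
to reveal.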

In this scenario the only possibility for a collision to occur is inside
the hash function, which is very unlikely to happen:
http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice/4014407#4014407
=> In my opinion the possibility of a hash collision can be ignored.
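
To put a rough number on this, here is a small sketch using the
standard birthday (union) bound, which says that for n entries and a
b-bit hash the collision probability is at most n*(n-1)/2 / 2^b. The
figure of one billion files is an assumption chosen for illustration,
and the bound only holds if the hash behaves like a random function.

```python
def collision_bound(n_files, hash_bits):
    """Upper bound on the probability that any two of n_files keys collide.

    Union (birthday) bound: number of pairs divided by the number of
    possible hash values, capped at 1.0.
    """
    pairs = n_files * (n_files - 1) // 2
    return min(1.0, pairs / 2 ** hash_bits)


# One billion files with a 256-bit hash (illustrative figure).
bound_sha256 = collision_bound(10 ** 9, 256)

# For comparison: a 32-bit hash with only 100,000 files gives a useless bound.
bound_32bit = collision_bound(10 ** 5, 32)
```

With a 256-bit hash the bound stays far below any realistic hardware
error rate, while a short hash would not be acceptable.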

> In fact, adding system/hardware ID in the hash forbids the possibility
> to restore the whole data (most probably on a new filesystem, due to a
> crash for example), and keep using the latest backup of reference as
> reference for the next incremental backup.
Yes, that's right. But this is not only because of the UUID; it is also
because I want to use the inode number, which will be different after
the restore as well. In this case the user either has to enter the
encryption password to use the encrypted catalogue as reference, or a
full backup will be created. I think this restriction is acceptable.
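
The lookup I have in mind could be sketched like this (Python, for
illustration only). The names entry_key and decide, the returned action
strings and the sample values are hypothetical and do not correspond to
anything in dar's code; the real decision in dar involves more checks.

```python
import hashlib
import os
import struct


def entry_key(path, inode, mtime, fs_uuid, salt):
    """Hypothetical key: SHA-256 over length-prefixed fields."""
    h = hashlib.sha256()
    for field in (os.fsencode(path), str(inode).encode("utf-8"),
                  str(mtime).encode("utf-8"), fs_uuid.encode("utf-8"), salt):
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    return h.hexdigest()


def decide(reference, path, inode, mtime, fs_uuid, salt, metadata):
    """Decide what to do with one file found on the filesystem.

    reference maps hashed keys to the metadata list stored in the
    unencrypted dictionary of the archive of reference.
    """
    ref_meta = reference.get(entry_key(path, inode, mtime, fs_uuid, salt))
    if ref_meta is None:
        # New file, changed mtime/inode, or a restored/different system.
        return "save_whole_file"
    if ref_meta != metadata:
        # Same data, but ownership, permissions, etc. differ.
        return "save_metadata_only"
    return "unchanged"


salt = b"per-archive-salt"
meta = [1000, 1000, 0o644, 4096, False, "file", 0, 1412505480]
reference = {entry_key("/home/tobias/foo", 42, 1412505480, "uuid-A", salt): meta}

same = decide(reference, "/home/tobias/foo", 42, 1412505480, "uuid-A", salt, meta)
# After a restore the inode number differs, so the key is not found.
restored = decide(reference, "/home/tobias/foo", 99, 1412505480, "uuid-A", salt, meta)
```

After a restore every key misses, so without the password the next
backup degrades to a full backup, which is the restriction described
above.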

Of course I can use the same password for all backups of one system and
ask the user to enter it only once (this can be done without
modifying dar, just by using libdar), but that's not the point.

I admit the dictionary is not that easy to implement and it will require
changes to the archive format as well, but I think it can be quite handy
for a lot of users who want to encrypt their backups.

Regards,
Tobias

On Sunday, 05.10.2014, at 12:38 +0200, Denis Corbin wrote:
> On 01/10/2014 18:04, Tobias Specht wrote:
> > Hi Denis,
> 
> Hi Tobias,
> 
> > 
> > I like dar with its philosophy and I want to create a program
> > using libdar to implement some kind of intelligence. Of course
> > there will be some options but it should be enough to just define a
> > "Backup Drive" and backups will be created automatically every day
> > the user powers on the computer. And encryption should be at least
> > a strongly recommended option.
> > 
> > This leads me to the catalogue problem. I agree with you that dar
> > does not need that kind of intelligence I'm planning for my backup
> > tool, but as the catalogue and the process of creating a
> > referential backup is an elementary feature of dar, I think this
> > problem could be better solved within dar.
> > 
> > In the catalogue there is stored:
> > * inodeID
> > * filename (with its path)
> > * file permissions
> > * userID
> > * groupID
> > * file size
> > * last modification date (mtime)
> > * last change date (ctime)
> > * if the file is a directory (is_dir)
> > * if the file has children or is an empty dir
> > * file type
> > * flag about saved data / saved EA / compression used
> > (correct me if I'm wrong)
> 
> more or less yes, but that's a matter of details,
> 
> > I think the most private information that has to be protected is
> > the filename.
> 
> I agree with that.
> 
> > My idea was to create a second "hashed" catalogue which contains
> > only the necessary information to create a referential backup out
> > of it. It is structured like a dictionary with a hash representing
> >  the filename on the one side and with some information about the
> > file on the other side. This dictionary can be stored outside the
> > encrypted area of the archive because it doesn't contain any
> > private information. Yes, it is more data to be stored and in
> > general it contains only redundancy information, but with this we
> > can create a referential backup from an encrypted archive without
> > entering the encryption password which will lead to more usability.
> >  (And do you know any backup tool providing such a feature?)
> > 
> > The first idea was just to hash the filename:
> > { H(filename) :
> >     [inodeID, userID, groupID, perm, file_size, ctime, mtime, is_dir,
> >      type, flags] ,
> >   ... }
> > This is quite simple to implement, but it's not very resistant
> > against brute-force attacks.
> 
> For this first idea, there is already a point to consider apart from
> the brute-force attack. If two different files in the same directory
> with different filenames produce the same hash, there is a conflict.
> This should not occur very often, but it is not impossible.
> 
> Same problem during the differential backup process: dar checks
> whether each file found on the filesystem already exists in
> the reference catalogue. Here, in order to compare, dar has to create
> a hash for each filename read from the filesystem and compare it
> with the list of hashes available in the reference catalogue. But a
> false match may occur if a new file has the same hash as an old one.
> Most of the time, dar will save that new file as expected, because
> mtime or another attribute differs from the wrong reference, but in
> some rare cases it may fail to save that new file, assuming it has not
> changed when comparing it to the wrong reference.
> 
> So, even if the chances are small that this situation occurs, it is
> not impossible. How to cope with that? Do we have to inform the user
> that there is a risk that the backup is not perfect, but not to worry
> because this occurs only in very rare situations? Would you find that
> acceptable as a user? :)
> 
> > So I thought about slipping into the calculation of the hash value
> > some information that is available both when the referential backup
> > is created and when the new backup is created, but that is not
> > known to an attacker who only has access to the archive:
> > { H(filename + inodeID + mtime) :
> >     [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
> >   ... }
> > I had a look at your source code (filtre.cpp/filtre_sauvegarde) and
> > as far as I have understood, you first try to find the file based on
> > its path in the ref backup. When there is a match you perform some
> > optional security checks and afterwards you decide what information
> > to store in the backup:
> > * remove_ea
> > * saving_inode
> > * saving_ea
> > * saving_fsa
> > If there is no match you have to store the whole file.
> > 
> > When using the hash value the first part leads to a slightly
> > different result as it doesn't consist only of the filename but
> > also of the inodeID and mtime. But this shouldn't be a problem as
> > the inodeID changes only when mtime changes too.
> 
> right, that's better, only comparing the hash will let dar know
> whether a file has to be saved again or not (put aside the hash
> conflict mentioned above).
> 
> > And in this case the whole file would be saved anyway.
> 
> Right.
> 
> > As the security check is implemented now it should also be fine
> > with the hash, because it relies on having the same mtime in both
> > archives. The evaluation of what action to perform when mtime
> > hasn't changed should be applicable with the information stored in
> > the dictionary.
> 
> OK, this lets dar see if only EA/FSA have changed and resave this part
> only if necessary.
> 
> > 
> > In addition we should add some sort of UUID which is connected to
> > the system in such a way that it doesn't change on normal system
> > operation. I thought about the partition UUID but this is not
> > always that simple
> 
> In fact, adding system/hardware ID in the hash forbids the possibility
> to restore the whole data (most probably on a new filesystem, due to a
> crash for example), and keep using the latest backup of reference as
> reference for the next incremental backup.
> 
> > when we think about LVM and btrfs, but maybe there is something
> > else we can use. To break rainbow-table attacks we should also add
> > a random salt per archive:
> > { H(filename + inodeID + mtime + UUID + salt) :
> >     [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
> >   ... }
> > Originally I also wanted to slip the file_size into the hash value,
> > but this conflicts with the security check and the
> > sparse_file_detection.
> > 
> > The dictionary could be saved for example in a Berkeley DB which
> > could be stored somewhere in the archive. As hash function I would
> > suggest Keccak with 512 bit and 100 rounds.
> > 
> > What do you think about the idea of having a second
> > (hashed) catalogue?
> 
> That's an interesting approach; however, it is not that simple to
> implement, and there is the point about hash collision to address.
> 
> I thought about another way to do encrypted differential backups that
> has the same footprint as doing a full backup from the user's point of
> view: using the same key (symmetrical or asymmetrical) for the archive
> of reference and the new differential backup, without having dar
> ask twice for the password as it does for full backup.
> 
> Given the encryption key, dar tries to open the encrypted isolated
> catalogue; if it succeeds, it assumes the user gave the key without
> a typo and uses that key to encrypt the new differential backup.
> 
> I guess that when using a symmetrical key, very few people use
> different keys for each new differential archive, right? I also guess
> that, when using asymmetrical encryption, it is always the same
> public/private key pair that is used, thus the same passphrase is
> requested to open the private key (enciphering and signature).
> 
> Would this address your need? This is much easier to implement from
> my point of view.
> 
> > 
> > Regards, Tobias
> > 
> 
> Regards,
> Denis.
