From: Radosław K. <rad...@ko...> - 2011-01-05 10:37:48
2011/1/5 Kern Sibbald <ke...@si...>

> > I hope that by keeping hashes on the Director you mean actually
> > keeping them on both?
>
> We are considering every possibility -- each solution has its own
> advantages and disadvantages, so it is very hard to say that one way of
> doing this is the correct or right way.
>
> For example, it is faster to deduplicate if the hashes are stored on the
> client machine than if they are stored on a server such as the Director,
> but not every client machine has enough disk space to store them. Most
> estimates indicate that about 30% more disk space is required to keep
> the hash codes. In addition, your deduplication ratio will drop
> significantly (be very poor) if you are only deduping a single client
> machine and do not use a deduplication "pool" of hashes from multiple
> machines.
>
> Unless you run tests, which may vary from machine to machine, it is very
> difficult to know which algorithm is best. One major factor is that the
> machine might be connected to a server by a very slow 100Mb Internet
> connection or a fast 10Gb LAN.
>
> We will probably start with something very simple and add to it over time.

A question: how will we store the data-block hashes? An SQL database seems
very easy to implement, but it has a lot of disadvantages. Another option
is to use an open-source key/value database such as Redis
(http://redis.io/), which has very good performance. The last option is to
implement our own solution, which requires a lot of work and testing (all
a matter of time).

What do you think about it?

--
Radosław Korzeniewski
rad...@ko...
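To make the "pool" of hashes discussed above concrete, here is a minimal
sketch of the lookup-before-store step. A plain Python dict stands in for
whatever key/value store is chosen (SQL, Redis, or a custom index); the
function and variable names are illustrative only, not part of any Bacula
design.

```python
import hashlib

# Shared deduplication pool: block hash -> location of the stored copy.
# In a real deployment this would live in the chosen key/value store
# (e.g. Redis) rather than in process memory.
hash_index = {}

def store_block(client, offset, data):
    """Store a block only if its hash is not already in the pool.

    Returns True if the block was actually written, False if it was
    deduplicated against an existing copy.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest in hash_index:
        return False          # duplicate: record a reference, skip the write
    hash_index[digest] = (client, offset)
    return True

# Two clients backing up the same block: only the first write is kept,
# which is why a pool shared across machines dedupes better than a
# per-client index.
print(store_block("client-a", 0, b"same payload"))   # True  (new block)
print(store_block("client-b", 0, b"same payload"))   # False (deduplicated)
```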