dar-libdar_api Mailing List for DAR - Disk ARchive
For full, incremental, compressed and encrypted backups or archives
Brought to you by: edrusb
Messages by month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2003 |     |     |     |     |     |     | 1   | 1   | 1   |     |     |     |
| 2004 |     |     |     | 2   |     | 22  | 14  |     | 3   | 3   | 22  | 3   |
| 2005 | 3   |     | 9   |     | 1   |     | 2   | 5   | 2   | 1   |     | 1   |
| 2006 |     |     | 2   |     |     |     | 2   |     |     | 2   |     |     |
| 2007 |     | 5   | 3   | 10  | 12  | 4   |     |     |     |     |     | 3   |
| 2008 |     |     | 8   | 9   |     |     |     |     |     |     |     |     |
| 2009 | 6   |     |     |     |     |     |     |     | 4   |     |     |     |
| 2010 | 2   |     |     |     |     |     |     |     |     | 1   |     |     |
| 2013 |     |     |     |     |     |     |     |     |     | 13  | 1   |     |
| 2014 |     | 1   | 3   |     |     |     |     |     | 6   | 5   |     |     |
| 2016 | 3   |     |     |     |     |     |     |     |     |     |     |     |
| 2017 |     |     |     |     |     |     |     |     | 1   |     |     |     |
| 2018 |     |     |     |     |     |     |     | 10  |     |     |     |     |
From: Denis C. <dar...@fr...> - 2018-08-18 13:52:43
Hi,

just to let you know that the pre-release phase has started:
http://dar.linux.free.fr/pre-release/

Cheers,
Denis
From: Tobias S. <spe...@gm...> - 2018-08-10 14:07:07
Hi Denis,

this sounds like a big step for the API! I will check it out and see where it affects Gdar. Thank you for your offer; I will come back to you if something is unclear :) A Python 3 API would be awesome.

Best regards,
Tobias

On 10 August 2018 13:49:08 CEST, Denis Corbin <dar...@fr...> wrote:
> [...]
From: Tobias S. <spe...@gm...> - 2018-08-10 13:50:59
Hi Dennis,

learning by example is always best. Learning and making life easier was also my motivation for writing Gdar. I haven't looked at dar_manager in detail yet, but decent versioning of archived files is great. It would be nice to have a GUI there :)

I hope Gdar worked for you; I haven't had much time in between to work on it. Yes, my backup solution would also be based on libdar.

The inhibit-shutdown topic is unfortunately not a simple one. A solution covering all desktop environments and distributions would of course be preferable, but I also want a convenient user experience. I would be happy with a nicely fitting solution for KDE and GNOME, with a fallback for all other environments. Overriding the close event, the way editors do it, works well, but it requires a program window; as far as I know it does not work for a background process without keeping a permanent window around. A solution based on systemd would be nice, but the inhibit only works if the shutdown is triggered directly via systemd: I could not get it working with the shutdown command, or when the shutdown is initiated via the desktop. Do you see my point? But this discussion is nothing specific to the libdar API :)

Best regards,
Tobias

On 10 August 2018 12:33:21 CEST, Dennis Katsonis <de...@ne...> wrote:
> [...]
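To make the systemd-inhibit idea concrete, here is a minimal sketch; the dar options and archive path are illustrative, and, as noted above, the lock is only honored when the shutdown request actually goes through systemd-logind:

```sh
# Take a block-mode inhibitor lock for the duration of the backup;
# shutdown/sleep requests routed through systemd-logind are refused
# until dar exits. Archive path and dar options are examples only.
systemd-inhibit \
    --what=shutdown:sleep \
    --why="dar backup in progress" \
    --mode=block \
    dar -c /backups/root_full -R / -z
```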
From: Denis C. <dar...@fr...> - 2018-08-10 12:09:19
On 10/08/2018 12:00, Dennis Katsonis wrote:
[...]
> I am still in two minds about it. [...] I'm not sure whether this
> should be the separate project I've started, or a contribution to
> KDar which includes dar_database style management functionality.

I can't tell you what the best direction is here, but I do remember that after Johnathan K. Burchill developed kdar (started in 2003) there were some issues with a new KDE version a few years later (I guess it was KDE 4)... issues that have not been solved AFAIK (correct me if I'm wrong).

> I should give my thanks to you for creating dar [...]

Thanks!

[...]

> Thank you for the clarification. I didn't do the math over archives
> of different sizes, and went by my initial impression.

What I described is the theory. If the system starts swapping, performance degrades even faster as the archive's memory requirement increases, of course...

> In the FAQ about a 'slight' penalty.
>
> http://dar.linux.free.fr/doc/FAQ.html
>
> Under "What slice size can I use with dar?" it says "thanks to its
> internal own integer type named "infinint" dar is able to handle
> arbitrarily large integers. This has a slightly memory and CPU
> penalty in regard to using native computer 32 or 64 bits integers,
> but has the advantage to provide a long term implementation in dar."

I will fix that, this is not correct, thanks for the feedback.

[...]

> My first impression was a bug [...] I was a little concerned it
> might turn people off using dar if they find the program seems to
> hang for 10 minutes or so while restoring a single file.

I will probably follow your suggestion for release 2.6.0...

> I'm not sure under what circumstances anyone would reach the stated
> limits though, unless I'm reading the website wrong and the file
> size limits are not 18EB but smaller.

You are reading it correctly: the limit will be reached on any system once the archive size (even split into many smaller slices) reaches 18 EB. I got positive feedback about archives of several hundred TB, and I think some other people reached the petabyte range for their needs some years ago, so we are still far from the 18 exabytes... :-)

[...]

Cheers,
Denis
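For context, the 18 EB figure discussed in this exchange is simply the range of an unsigned 64-bit integer:

$$2^{64} \text{ bytes} = 18\,446\,744\,073\,709\,551\,616 \text{ bytes} \approx 18.4\ \text{EB} = 16\ \text{EiB}$$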
From: Denis C. <dar...@fr...> - 2018-08-10 11:49:17
On 10/08/2018 09:13, Tobias Specht wrote:
[...]
> Hi Denis :)

Hi Tobias! :)

> nice to hear you are improving the libdar API. Will have a look at
> it.

Sure. So far all the new API information is available in GIT master; the doc/API_tutorial.html is up to date, for example. I have removed all the pure C calls and replaced arguments with standard types as much as possible. In parallel, I also plan to make a C binding for those that do not want exceptions and classes... and I am also planning a Python 3 binding, a bit like what Wesley Leggette did some years ago with Python 2, but due to the long-awaited 2.6.0 release these bindings will come right after this new major release.

If you need any help getting gdar working with the upcoming 2.6.0 release, tell me; I can provide support, of course!

> Best regards, Tobias

Best regards,
Denis
From: Dennis K. <de...@ne...> - 2018-08-10 10:42:13
On 08/10/2018 05:13 PM, Tobias Specht wrote:
> nice to hear you are planning to write a backup application with
> libdar. I'm using the libdar API with my small tool Gdar myself:
> https://github.com/peckto/gdar
> http://www.peckto.de/gdar/gdar (website currently having some
> linking problems...)
> It's GTK though. Gdar can only extract dar archives at the moment.
> Feel free to work with my code, or use it as a libdar example :)
>
> I'm planning a larger tool with automated backup too.
> But I'm still in the planning phase...
> Trying to solve some general problems with backups in desktop
> environments, like inhibit/delay shutdown:
> https://forum.kde.org/viewtopic.php?f=305&t=141575
> If you have an idea on it, let me know.

Thank you. Even though I've started, I'm still thinking it might be better to work with an existing program than create yet another one. I'm a self-taught hobby programmer, and part of the motive is simply having something perhaps worthwhile to work on. It's actually dar_manager and the database which interest me more, and allowing the user to easily see which versions of which files are in their backup, and restore them. Restoration from backups is more often about recovering a small number of accidentally deleted or overwritten files, or seeing a file as it was some time ago, than about full system restores.

I played around with GDar a couple of years ago or so.

Is this tool you are working on going to be libdar based?

As for the shutdown, I don't know how to do it under KDE, and I don't think you can (with good reason). The answer provided in that thread, where I presume the main window overrides the closeEvent slot to at least give a warning, is probably the best one.

The problem is, even if it were possible, it would only work under KDE. Run the software under FVWM, or Fluxbox, or something else, and it won't inhibit closure of the windowing system. This would lead to users having incomplete backups, possibly without knowing it. It also doesn't prevent shutdown by another logged-in user, or by root.

I would use systemd-inhibit. See the second answer here:
https://unix.stackexchange.com/questions/34489/how-to-disable-shutdown-so-that-an-important-process-cannot-be-interrupted#264745

> About your initial problem, I'm using Fedora and dar as an rpm
> myself. Did you write to the package maintainer about the compiler
> flag? Maybe he can add it to the build instructions.

Yes, I did make the suggestion.

[...]
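As a concrete illustration of that dar_manager-centric workflow, here is a hedged sketch; the archive and database names are invented for the example, while the options themselves are the documented dar/dar_manager ones:

```sh
# Full then differential backup of /home (example paths)
dar -c /backups/home_full -R /home -z
dar -c /backups/home_diff1 -R /home -z -A /backups/home_full

# Track both archives in a dar_manager database
dar_manager -C /backups/home.dmd
dar_manager -B /backups/home.dmd -A /backups/home_full
dar_manager -B /backups/home.dmd -A /backups/home_diff1

# See which archives hold versions of a given file, then restore
# the most recent version from whichever archive contains it
dar_manager -B /backups/home.dmd -f user/notes.txt
dar_manager -B /backups/home.dmd -r user/notes.txt
```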
From: Dennis K. <de...@ne...> - 2018-08-10 10:09:27
On 08/10/2018 06:29 AM, Denis Corbin wrote:
> nice! :)

I am still in two minds about it. It is intended to be backup-focused, for those who like manual, simple backups, but with more emphasis on making restoration and viewing and managing the state of the backups easy and visible. I'm not sure whether this should be the separate project I've started, or a contribution to KDar which includes dar_database style management functionality.

I should give my thanks to you for creating dar, as I was looking for a replacement for dump/restore and it met all my needs: simple, easy differential backups, ad-hoc backups, backups in manageable file archives, encryption, and the ability to reliably save ALL the file attributes easily. It is a software package where you can tell the author has given a lot of thought to how it might be used and what people might want to do, and has accommodated and documented that well.

> the memory requirement is not exponential but proportional to the
> number of files saved. The CPU requirement is roughly proportional
> to the volume of data to treat [...]

Thank you for the clarification. I didn't do the math over archives of different sizes, and went by my initial impression.

> Where have you read that? This should be an error to be fixed.

In the FAQ, about a 'slight' penalty:

http://dar.linux.free.fr/doc/FAQ.html

Under "What slice size can I use with dar?" it says "thanks to its internal own integer type named "infinint" dar is able to handle arbitrarily large integers. This has a slightly memory and CPU penalty in regard to using native computer 32 or 64 bits integers, but has the advantage to provide a long term implementation in dar."

> Well, that's correct...

My first impression was a bug, and I was looking to migrate away from using dar until I did some web searching and thought that maybe the integer type was more significant than it might appear. I was a little concerned it might turn people off using dar if they find the program seems to hang for 10 minutes or so while restoring a single file.

I'm not sure under what circumstances anyone would reach the stated limits though, unless I'm reading the website wrong and the file size limits are not 18EB but smaller.

[...]
From: Tobias S. <spe...@gm...> - 2018-08-10 07:13:44
Hi Dennis,

nice to hear you are planning to write a backup application with libdar. I'm using the libdar API with my small tool Gdar myself:

https://github.com/peckto/gdar
http://www.peckto.de/gdar/gdar (website currently having some linking problems...)

It's GTK though, and Gdar can only extract dar archives at the moment. Feel free to work with my code, or use it as a libdar example :)

I'm planning a larger tool with automated backup too, but I'm still in the planning phase... trying to solve some general problems with backups in desktop environments, like inhibiting/delaying shutdown:

https://forum.kde.org/viewtopic.php?f=305&t=141575

If you have an idea on it, let me know.

About your initial problem: I'm using Fedora and dar as an rpm myself. Did you write to the package maintainer about the compiler flag? Maybe he can add it to the build instructions.

Hi Denis :)

nice to hear you are improving the libdar API. Will have a look at it.

Best regards,
Tobias

On Thursday, 9 August 2018, 22:29:11 CEST, Denis Corbin wrote:
> [...]
From: Denis C. <dar...@fr...> - 2018-08-09 20:29:27
On 09/08/2018 13:37, Dennis Katsonis wrote:
> Hello,

Hello Dennis,

> I am developing a front end for Dar which is intended not just to
> provide a graphical way of creating archives, but also to provide
> basic backup management. The application will be written using the
> Qt toolkit and using libdar directly.

nice! :)

Be aware that the upcoming major release 2.6.0 brings some API redesign to simplify its use (fewer libdar-specific auxiliary types) and adds new features. The same old API will still be available in the dedicated 'libdar5' namespace, though, and I will be available to help you migrate to API v6 upon request.

> I note that the version of dar compiled for Fedora uses the
> unlimited integer size. The performance of dar on archives with
> large numbers of files is not satisfactory [...]

this is a known limitation of the 'infinint' dar/libdar flavor:
http://dar.linux.free.fr/doc/Limitations.html

[...]

> For smaller archives, the difference is less noticeable. It seems
> that dar operations increase exponentially in CPU time as the
> number of files increases.

the memory requirement is not exponential but proportional to the number of files saved. The CPU requirement is roughly proportional to the volume of data to treat (CRC computation, compression, encryption, ...). This is true for both the 64-bit and infinint flavors, though the infinint flavor does not rely on CPU integer operations, hence its slowness.

> The dar website seems to suggest that the cost of infinint is
> modest, but my testing indicates that for what would be a regular
> backup scenario, the cost is high.

Where have you read that? That would be an error to be fixed.

> I suggest that infinint as an integer type should not be the
> default.

... that's to be considered, though there is a warning at compilation time when you compile using infinint... Thus, if the person compiling does not even read that warning, he will neither read the documentation nor the limitations, and will blindly complain about any problem he meets. Such people drain a lot of time and are always unsatisfied in the end... so it's usually a good thing for me that they do not use dar; it saves me time for more interesting things than trying to justify and explain...

> It adds in some cases unacceptable costs for no practical gain.
> While some distributors compile with 64 bit integers (MacOSX brew),
> others use the default (Fedora), which leads to a dar binary which
> people may consider broken or buggy.

Well, that's correct...

> My other question is that the API uses infinint for values
> internally. How does a libdar compiled with 64 bit integers impact
> what is returned from methods returning an infinint?

The infinint and 64-bit flavors differ only in the way the "infinint" class is implemented. infinint is an alias (a typedef, if you prefer) for either "class real_infinint" or "class limitint" (with a 32- or 64-bit integer underneath). Both classes have the same interface toward the rest of libdar; only their implementations differ.

There is still an infinint class exposed up at the API level... In APIv6 I've pushed away a lot of internal types (including infinint) using the pimpl idiom for some classes, but it was too complicated, or would have hurt performance, to do that for all API-related classes... Thus the API remains indirectly dependent on whichever of real_infinint/limitint is used in libdar. In other words, if your program has been dynamically linked with libdar64, it won't be possible to dynamically link it with libdar (relying on infinint), at least today. I have not done the test with APIv6, but I'm pretty sure it won't work.

> I plan to possibly use a linked-in libdar compiled with 64 bit
> integers to ensure good performance. Does infinint convert
> internally from a native 64 bit to an infinint type?

Not exactly. Both classes (limitint and real_infinint) do the same thing; in particular, they store/read integers into a dar archive the same way, so the resulting archive is the same. If an integer is too large to be handled by class limitint, the class will detect the overflow during an arithmetic operation, or while reading the integer from an archive, and libdar will abort with an Elimitint exception.

Historically libdar relied on real_infinint (the class was simply named infinint at that time), but due to poor performance the limitint class was created and substituted for class infinint (now renamed real_infinint). Internally, dar does not directly manipulate 64-bit integers for dates, sizes, offsets or anything else, except when dealing with system and library calls, where the "infinint" class can convert from and to the classical integer types like size_t and the like.

So if your plan is to statically link your program with libdar64, there is no issue; it will work flawlessly.

> Thanks, Dennis

Cheers,
Denis
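To make the linking point concrete, here is a heavily hedged sketch of statically linking a privately built 64-bit flavor. The install prefix, the dependency libraries, and the C++ standard flag all depend on how libdar was configured locally, so treat every path and -l flag below as an assumption; only the libdar64 library name itself comes from the thread:

```sh
# Link a frontend against a private static libdar64 installed under
# $HOME/opt/dar64 (illustrative prefix); the "64" suffix comes from
# building libdar with --enable-mode=64. The trailing -l flags stand
# in for whatever compression/crypto libraries the local build used.
g++ -std=c++14 -o myfrontend main.cpp \
    -I"$HOME/opt/dar64/include" \
    "$HOME/opt/dar64/lib/libdar64.a" \
    -lz -lbz2 -lgcrypt -lgpg-error -lpthread
```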
From: Dennis K. <de...@ne...> - 2018-08-09 12:07:27
|
Hello,

I am developing a front end for Dar which is intended not just to provide a graphical way of creating archives, but also to provide basic backup management. The application will be written using the Qt toolkit, using libdar directly.

I note that the version of dar compiled for Fedora uses the unlimited integer size. The performance of dar on archives with large numbers of files is not satisfactory, and would unfortunately also mean that the graphical application would stall and delay. The following command on an archive containing about 1 million files takes 10 minutes:

$ time /usr/bin/dar -l root > /dev/null
/usr/bin/dar -l root > /dev/null  615.30s user 1.81s system 107% cpu 9:31.64 total

Memory usage peaks at 2124MB. This delay is seen when listing, when scanning a reference archive while creating a differential backup, and when adding the archive to a dar_manager database. It also causes a delay when extracting a file, which kind of defeats the purpose of having random access to files; it would probably take as long to extract a file from a compressed tarball. It also means that dar cannot complete a backup of my root directory on my laptop with 2G of RAM.

I compiled dar 2.5.16 with the --enable-mode=64 option, and the performance greatly increased. For the exact same archive, using 64 bit integers:

$ time /usr/bin/dar -l root > /dev/null
dar -l root > /dev/null  28.89s user 0.48s system 97% cpu 30.253 total

A 20x speed increase. Memory usage peaked at 879MB, still high, but far better. dar_manager operations were faster, but still slow. It seems that dar operations increase exponentially in CPU time as the number of files increases. For smaller archives, the difference was less noticeable, but still there.

The dar website seems to suggest that the cost of infinint is modest, but my testing indicates that for what would be a regular backup scenario, the cost is high. Looking at the page listing the limitations, the limits of 64 bit integers seem to far, far exceed what is required, what technology today can support, and likely what technology for many years to come can support. I suggest that infinint as an integer type should not be the default. It adds in some cases unacceptable costs for no practical gain. While some distributors compile with 64 bit integers (MacOSX brew), others use the default (Fedora), which leads to a dar binary that people may consider broken or buggy.

My other question is that the API uses infinint for values internally. How does a libdar compiled with 64 bit integers impact what is returned from methods returning an infinint? I plan to possibly use a linked-in libdar compiled with 64 bit integers to ensure good performance. Does infinint convert internally from a native 64 bit to an infinint type?

Thanks,
Dennis |
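The build-and-measure procedure described above boils down to the following steps. This is a sketch only: installation paths and the archive name are examples, not prescriptions.

./configure --enable-mode=64   # build with 64 bit limitint instead of infinint
make
sudo make install

# compare listing time against an infinint build of the same archive
time dar -l root > /dev/null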
From: Denis C. <dar...@fr...> - 2017-09-09 11:56:34
|
Hi all,

this mailing-list is hosted at SourceForge, where the policy has recently changed: subscribed users needed to manually resubscribe before August in order not to be removed from the mailing-list. I have been removed myself, so I may have missed some support requests since July. Well, it seems I missed, or did not pay attention to, the notice sent by SourceForge about that new policy...

... anyway, this mail has two main purposes:
- adding a trace in the mailing-list archive, just in case
- and second, checking that the mailing-list to newsgroup gateway at gmane.org is still operational.

Sorry for the inconvenience.

Cheers,
Denis |
From: Denis C. <dar...@fr...> - 2016-01-09 18:32:40
|
Hi Tobias,

Yep, I missed it, sorry. This is now fixed in GIT and ready for the next release. Thanks for your feedback.

Regards,
Denis.

On 08/01/2016 21:38, Tobias Specht wrote:
> Hi Denis,
>
> since dar version 2.5.1, up to and including 2.5.3, I'm experiencing problems including libdar in my project gdar. The compiler raises the following error:
>
> In file included from /usr/include/dar/storage.hpp:29:0,
>   from /usr/include/dar/real_infinint.hpp:43,
>   from /usr/include/dar/infinint.hpp:31,
>   from /usr/include/dar/compressor.hpp:31,
>   from /usr/include/dar/libdar.hpp:77,
>   from mylibdar.hpp:26, from gdar.cpp:22:
> /usr/include/dar/on_pool.hpp:37:45: fatal error: /usr/include/dar/cygwin_adapt.hpp: No such file or directory
>   #include "/usr/include/dar/cygwin_adapt.hpp"
>
> In dar version 2.5.3, cat_tools.hpp seems also to be missing.
>
> The missing header files are part of the release but are not copied during installation.
>
> I compiled dar from source like:
> ./configure --prefix=/usr
> make
> make install
>
> The system I'm using is LinuxMint 17.3, but the problem probably occurs on other systems as well.
>
> The compiler options when including libdar are: `pkg-config --cflags libdar`
>
> Am I missing some compiler or config options, or are the files just missing?
>
> Best regards, Tobias |
From: Tobias S. <spe...@gm...> - 2016-01-08 20:38:44
|
Hi Denis,

since dar version 2.5.1, up to and including 2.5.3, I'm experiencing problems including libdar in my project gdar. The compiler raises the following error:

In file included from /usr/include/dar/storage.hpp:29:0,
  from /usr/include/dar/real_infinint.hpp:43,
  from /usr/include/dar/infinint.hpp:31,
  from /usr/include/dar/compressor.hpp:31,
  from /usr/include/dar/libdar.hpp:77,
  from mylibdar.hpp:26, from gdar.cpp:22:
/usr/include/dar/on_pool.hpp:37:45: fatal error: /usr/include/dar/cygwin_adapt.hpp: No such file or directory
  #include "/usr/include/dar/cygwin_adapt.hpp"

In dar version 2.5.3, cat_tools.hpp seems also to be missing. The missing header files are part of the release but are not copied during installation.

I compiled dar from source like:
./configure --prefix=/usr
make
make install

The system I'm using is LinuxMint 17.3, but the problem probably occurs on other systems as well. The compiler options when including libdar are: `pkg-config --cflags libdar`

Am I missing some compiler or config options, or are the files just missing?

Best regards,
Tobias |
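For context, a compile invocation of the kind described above would look like the following. This is a single-file sketch; the real gdar build involves more sources and build-system glue.

g++ -o gdar gdar.cpp $(pkg-config --cflags --libs libdar)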
From: Tobias S. <spe...@gm...> - 2014-10-12 23:07:59
|
Hi Denis,

thank you very much for the official feature request. I know it takes time to implement such a complex feature, and you have other open points too. If I can help you at some point, please let me know.

About the hashes: one salt will be used for one archive. An attacker shouldn't be able to use one hash twice (on two different archives); this is the purpose of the salt. The salt does not make the hash more secure in general, because it is public. It has to be stored beside the hash table, outside the encrypted area of the archive. The only attack I can think of is the situation where the attacker can guess the path and file name. Then he could prove that one specific file is in the archive. To prevent this case I added the other values:

H(path+filename + inodeID + mtime + UUID + salt)

The idea is that the inodeID + mtime + UUID alone provide enough entropy to make it inefficient to crack the hash. The time an attacker would need to do so can be calculated as follows. We assume:
* the path and file name can be guessed
* the attacker doesn't have to try more than 100000 inodeIDs
* mtime can be limited to within one day
* a 32Bit UUID (meaning 32 bits that can't be guessed)
* 1 hash round (if it doesn't bother you we can go up to 100 rounds)
* a GPU cluster (http://hashcat.net/oclhashcat/#performance)

The possibilities that have to be brute-forced are: 100000*60*60*24*2^32. When we assume the attacker can generate 2005M c/s, he will need:
((100000*2^32*60*60*24) / (2005*10^6))[s] ~ 580 years
I guess this is quite too long to wait.

I have created a demo application for the hashed catalogue: https://github.com/peckto/hash_dic_test
It creates the hashes as discussed (1*SHA3-512) and stores them, plus the corresponding data, in an std::unordered_map. Afterwards it iterates again through the file system and looks up every hash in the map. It turns out that the most time consuming part is the hashing process. On my notebook it looks like this:

# ./hash_dic_test /
generate hashes
duration: 0:6:986
---------------------------------
build hash table
duration: 0:7:544
---------------------------------
search in hash table
cant find hash! /var/log/journal/ad1d17f14aee4b34a7e6c6a3689ac394/system.journal|655806|1413144061|fdf7c30d-838c-4e16-af37-2a345650590a
duration: 0:7:563
---------------------------------
map entries: 495.425
map size: ~99,085MB

You can see that files that have changed since the hash table was created are not found in the table.

About the performance: it takes about 7s to iterate through the file system and to calculate the hashes. In the second stage the same process is done again, but the hashes and the related data are stored in the map. In the last stage the hashes are generated again and are looked up in the map. In reality the hash must only be calculated once to perform the two different tasks (write the hash to the new table and look it up in the table of reference). Both tasks would need about one second in total (for 500000 files). You can try it on your own system as well. Do you have any concerns regarding the performance?

How flexible is your archive format? Is it possible to get just some space (e.g. 100MB) to store any sort of binary data? Are there any limitations? I'm still working on a method to store the dictionary inside another file (the dar archive). I'm looking at Berkeley DB; it could also replace the whole map structure. It is also possible to just serialize the map (e.g. with Boost), but this is not very portable.
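The core of the demo above can be sketched as follows. This is illustrative only: the struct and function names are made up, and std::hash stands in for the SHA3-512 used by hash_dic_test, since a real cryptographic hash would require an external library.

#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical metadata record stored against each hash.
struct entry_meta
{
    std::uint32_t uid, gid, perm;
    std::uint64_t file_size;
    bool is_dir;
    std::int64_t ctime;
};

// Placeholder for a cryptographic hash: std::hash is NOT cryptographic
// and only keeps this sketch self-contained.
static std::string crypto_hash(const std::string & msg)
{
    return std::to_string(std::hash<std::string>{}(msg));
}

// Build the dictionary key as described in the thread:
// H(path+filename + inodeID + mtime + UUID + salt)
std::string make_key(const std::string & path, std::uint64_t inode,
                     std::int64_t mtime, const std::string & uuid,
                     const std::string & salt)
{
    std::ostringstream s;
    s << path << '|' << inode << '|' << mtime << '|' << uuid << '|' << salt;
    return crypto_hash(s.str());
}

int main()
{
    std::unordered_map<std::string, entry_meta> dict;
    const std::string salt = "per-archive-random-salt"; // one salt per archive, stored in clear

    dict[make_key("/home/tobias/Documents/test.txt", 655806, 1413144061,
                  "fdf7c30d-838c-4e16-af37-2a345650590a", salt)]
        = entry_meta{1000, 1000, 0644, 4096, false, 1413144061};

    // Differential backup: recompute the key from the live filesystem and
    // look it up; a miss means the file is new or has changed.
    return 0;
}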
Regards,
Tobias

On Tuesday, 07.10.2014 at 21:26 +0200, Denis Corbin wrote:
> On 07/10/2014 17:41, Tobias Specht wrote:
> > Hi Denis,
> Hi Tobias,
> > maybe I was not exact enough about what I want to hash and how the dictionary is organized: * I don't want to hash the content of the file
> This I understood,
> > * the dictionary is not organized hierarchically, as your catalogue is
> This I didn't, but OK; that does not change the picture much, and it makes sense to avoid exposing the directory tree structure.
> > * when I'm talking about filename I mean path + file name
> OK,
> > * the dictionary does not replace the catalogue, it is just an extra option
> I understood that the dictionary would be stored in clear text beside the catalogue, which would stay encrypted.
> > It should look like this: { H("/home/tobias/Documents/test.txt" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , H("/home/tobias/Pictures/foo.jpg" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } (H() is a cryptographic hash function like sha256)
> > respectively: { b2144d23ebc9a7f2af44e215b00dce5025bdc227346c6459b989ef8d203f3402 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , 0df9ba289c76d5bb1761a2764593bfe97d64f4c944ecfa08d6f7a16721b5f317 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , }
> > In this scenario the only possibility for a collision to occur is inside the hash function, which is very unlikely to happen: http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice/4014407#4014407
> > => In my opinion the possibility of a hash collision can be ignored.
> I admit the probability is very low, but this has to be documented, at least for the user to know the risk, as low as it may be.
> > > In fact, adding a system/hardware ID in the hash forbids the possibility to restore the whole data (most probably on a new filesystem, due to a crash for example) and then keep using the latest backup of reference as reference for the next incremental backup.
> > Yes, that's right. But this is not only because of the uuid; it's also because I want to use the inode number, which will be different after the restore as well.
> Yes, that's correct. I just wonder: why add the inodeID and UUID? Would salt alone not be sufficient to randomize the data to hash? By the way, I suspect there would be a different salt value per hash? Would the salt for each entry be stored in clear beside the corresponding hash? No offense, my cryptographic knowledge is quite basic! :)
> > In this case the user has to enter the encryption password to use the encrypted catalogue as reference, or a full backup will be created. I think this restriction is acceptable.
> It is for me too. As you say, there is the catalogue for that situation.
> > Of course I could use the same password for all backups of one system and prompt the user for it only once (this can be done without modifying dar, just by using libdar), but that's not the point.
> > I admit the dictionary is not that easy to implement and it will require changes to the archive format as well, but I think it can be quite handy for a lot of users who want to encrypt their backups.
> The archive format is flexible, so it is no problem to add new fields. The point concerns more the algorithm of differential backup (filtre.cpp: the filtre_sauvegarde() routine). It should be able to handle hashes in place of filenames while also performing file comparison on filenames (for normal differential backup).
> Another point to consider is the algorithmic complexity (I mean the time to execute the requested task). Currently, when doing a differential backup, each file from the filesystem under backup is first searched in the reference catalogue, but only in the directory it is located in. Here, due to the hash on the whole path+filename, each new file to consider for backup has to be hashed, and this hash has to be compared more widely against the whole archive hash base. Of course, having a sorted list of hashes (as there is for filenames in each directory) leads to a faster search (binary search), but it remains that the execution time will increase with the number of files in the archive. I guess this hash lookup is not the biggest CPU consuming task in libdar (compared with data compression or encryption), but it is nevertheless a scalability issue.
> I think I now get the picture of your request/idea. This is a reasonable compromise, while not a simple feature to implement... :-/
> I have added it to the Feature Request list on SourceForge: https://sourceforge.net/p/dar/feature-requests/173/
> I can't promise I will have time to implement it for release 2.5.0, the next major release, which I would like to finish developing this year for a release in the first half of 2015. I'm taking more time than expected testing the current feature (multi-threaded libdar), while the performance benefit is not very visible for now... well, I have not yet tuned it all; I first have to make it work as expected. So I can't promise, but I will try.
> > Regards, Tobias
> Regards,
> Denis. |
From: Denis C. <dar...@fr...> - 2014-10-07 19:26:47
|
On 07/10/2014 17:41, Tobias Specht wrote:
> Hi Denis,

Hi Tobias,

> maybe I was not exact enough about what I want to hash and how the dictionary is organized: * I don't want to hash the content of the file

This I understood,

> * the dictionary is not organized hierarchically, as your catalogue is

This I didn't, but OK; that does not change the picture much, and it makes sense to avoid exposing the directory tree structure.

> * when I'm talking about filename I mean path + file name

OK,

> * the dictionary does not replace the catalogue, it is just an extra option

I understood that the dictionary would be stored in clear text beside the catalogue, which would stay encrypted.

> It should look like this: { H("/home/tobias/Documents/test.txt" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , H("/home/tobias/Pictures/foo.jpg" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } (H() is a cryptographic hash function like sha256)
> respectively: { b2144d23ebc9a7f2af44e215b00dce5025bdc227346c6459b989ef8d203f3402 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , 0df9ba289c76d5bb1761a2764593bfe97d64f4c944ecfa08d6f7a16721b5f317 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , }
> In this scenario the only possibility for a collision to occur is inside the hash function, which is very unlikely to happen: http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice/4014407#4014407
> => In my opinion the possibility of a hash collision can be ignored.

I admit the probability is very low, but this has to be documented, at least for the user to know the risk, as low as it may be.

>> In fact, adding a system/hardware ID in the hash forbids the possibility to restore the whole data (most probably on a new filesystem, due to a crash for example) and then keep using the latest backup of reference as reference for the next incremental backup.
> Yes, that's right. But this is not only because of the uuid; it's also because I want to use the inode number, which will be different after the restore as well.

Yes, that's correct. I just wonder: why add the inodeID and UUID? Would salt alone not be sufficient to randomize the data to hash? By the way, I suspect there would be a different salt value per hash? Would the salt for each entry be stored in clear beside the corresponding hash? No offense, my cryptographic knowledge is quite basic! :)

> In this case the user has to enter the encryption password to use the encrypted catalogue as reference, or a full backup will be created. I think this restriction is acceptable.

It is for me too. As you say, there is the catalogue for that situation.

> Of course I could use the same password for all backups of one system and prompt the user for it only once (this can be done without modifying dar, just by using libdar), but that's not the point.
>
> I admit the dictionary is not that easy to implement and it will require changes to the archive format as well, but I think it can be quite handy for a lot of users who want to encrypt their backups.

The archive format is flexible, so it is no problem to add new fields. The point concerns more the algorithm of differential backup (filtre.cpp: the filtre_sauvegarde() routine). It should be able to handle hashes in place of filenames while also performing file comparison on filenames (for normal differential backup).

Another point to consider is the algorithmic complexity (I mean the time to execute the requested task). Currently, when doing a differential backup, each file from the filesystem under backup is first searched in the reference catalogue, but only in the directory it is located in. Here, due to the hash on the whole path+filename, each new file to consider for backup has to be hashed, and this hash has to be compared more widely against the whole archive hash base. Of course, having a sorted list of hashes (as there is for filenames in each directory) leads to a faster search (binary search), but it remains that the execution time will increase with the number of files in the archive. I guess this hash lookup is not the biggest CPU consuming task in libdar (compared with data compression or encryption), but it is nevertheless a scalability issue.

I think I now get the picture of your request/idea. This is a reasonable compromise, while not a simple feature to implement... :-/

I have added it to the Feature Request list on SourceForge:
https://sourceforge.net/p/dar/feature-requests/173/

I can't promise I will have time to implement it for release 2.5.0, the next major release, which I would like to finish developing this year for a release in the first half of 2015. I'm taking more time than expected testing the current feature (multi-threaded libdar), while the performance benefit is not very visible for now... well, I have not yet tuned it all; I first have to make it work as expected. So I can't promise, but I will try.

> Regards, Tobias

Regards,
Denis. |
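The lookup cost Denis describes can be sketched like this. The names are hypothetical; a sorted vector plus std::lower_bound stands in for whatever structure libdar would actually use.

#include <algorithm>
#include <string>
#include <vector>

// Global hash base: one sorted vector for the whole archive instead of
// one small list per directory. Lookup is O(log N) in the total number
// of entries, so the cost grows with archive size rather than with the
// size of a single directory.
bool hash_present(const std::vector<std::string> & sorted_hashes,
                  const std::string & h)
{
    auto it = std::lower_bound(sorted_hashes.begin(), sorted_hashes.end(), h);
    return it != sorted_hashes.end() && *it == h;
}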
From: Tobias S. <spe...@gm...> - 2014-10-07 15:41:23
|
Hi Denis,

maybe I was not exact enough about what I want to hash and how the dictionary is organized:
* I don't want to hash the content of the file
* the dictionary is not organized hierarchically, as your catalogue is
* when I'm talking about filename I mean path + file name
* the dictionary does not replace the catalogue, it is just an extra option

It should look like this:
{ H("/home/tobias/Documents/test.txt" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
  H("/home/tobias/Pictures/foo.jpg" + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
  ... }
(H() is a cryptographic hash function like sha256)

respectively:
{ b2144d23ebc9a7f2af44e215b00dce5025bdc227346c6459b989ef8d203f3402 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] ,
  0df9ba289c76d5bb1761a2764593bfe97d64f4c944ecfa08d6f7a16721b5f317 : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , }

In this scenario the only possibility for a collision to occur is inside the hash function, which is very unlikely to happen:
http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice/4014407#4014407
=> In my opinion the possibility of a hash collision can be ignored.

> In fact, adding a system/hardware ID in the hash forbids the possibility to restore the whole data (most probably on a new filesystem, due to a crash for example) and then keep using the latest backup of reference as reference for the next incremental backup.

Yes, that's right. But this is not only because of the uuid; it's also because I want to use the inode number, which will be different after the restore as well. In this case the user has to enter the encryption password to use the encrypted catalogue as reference, or a full backup will be created. I think this restriction is acceptable.

Of course I could use the same password for all backups of one system and prompt the user for it only once (this can be done without modifying dar, just by using libdar), but that's not the point.

I admit the dictionary is not that easy to implement and it will require changes to the archive format as well, but I think it can be quite handy for a lot of users who want to encrypt their backups.

Regards,
Tobias

On Sunday, 05.10.2014 at 12:38 +0200, Denis Corbin wrote:
> On 01/10/2014 18:04, Tobias Specht wrote:
> > Hi Denis,
> Hi Tobias,
> > I like dar with its philosophy and I want to create a program using libdar to implement some kind of intelligence. Of course there will be some options, but it should be enough to just define a "Backup Drive" and backups will be created automatically every day the user powers on the computer. And encryption should be at least a strongly recommended option.
> > This leads me to the catalogue problem. I agree with you that dar does not need the kind of intelligence I'm planning for my backup tool, but as the catalogue and the process of creating a backup against a reference are elementary features of dar, I think this problem could be better solved within dar.
> > In the catalogue there is stored: * inodeID * filename (with its path) * file permissions * userID * groupID * file size * last modification date (mtime) * last change date (ctime) * if the file is a directory (is_dir) * if the file has children or is an empty dir * file type * flags about saved data / saved EA / compression used (correct me if I'm wrong)
> More or less, yes, but that's a matter of details.
> > I think the most private information that has to be protected is the filename.
> I agree with that.
> > My idea was to create a second "hashed" catalogue which contains only the information necessary to create a backup against a reference. It is structured like a dictionary, with a hash representing the filename on one side and some information about the file on the other side. This dictionary can be stored outside the encrypted area of the archive because it doesn't contain any private information. Yes, it is more data to be stored, and in general it contains only redundant information, but with it we can create a differential backup from an encrypted archive without entering the encryption password, which will lead to more usability. (And do you know any backup tool providing such a feature?)
> > The first idea was just to hash the filename: { H(filename) : [inodeID, userID, groupID, perm, file_size, ctime, mtime, is_dir, type, flags] , ... } This is quite simple to implement, but it's not very resistant to brute-force attacks.
> For this first idea, there is already a point to consider, aside from the brute-force attack. If two different files in the same directory with different filenames produce the same hash, there is a conflict. While this should not occur very often, it is not impossible.
> The same problem arises during the differential backup process: dar checks whether each file found on the filesystem already exists in the reference catalogue. Here, in order to compare, dar has to compute a hash for each filename read from the filesystem and compare it with the list of hashes available in the reference catalogue. But a false match may occur if a new file has the same hash as an old one. Most of the time dar will save that new file as expected, if mtime or another attribute changed compared with the wrong reference, but in some rare cases it may fail to save that new file, wrongly assuming it has not changed because it was compared with a wrong reference.
> So, even if the chances are small that this situation occurs, it is not impossible. How to cope with that? Do we have to inform the user that there is a risk that the backup is not perfect, but not to worry, as it occurs only in very rare situations? Would you find that acceptable as a user? :)
> > So I thought about slipping into the calculation of the hash value some information that is available at the point of creating the reference backup and also when creating the new backup, but isn't known to an attacker who only has access to the archive: { H(filename + inodeID + mtime) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } I had a look at your source code (filtre.cpp/filtre_sauvegarde) and, as far as I understand, you first try to find the file based on its path in the reference backup. When there is a match you perform some optional security checks and afterwards you decide what information to store in the backup: * remove_ea * saving_inode * saving_ea * saving_fsa. If there is no match you have to store the whole file.
> > When using the hash value, the first part leads to a slightly different result, as the hash doesn't consist only of the filename but also of the inodeID and mtime. But this shouldn't be a problem, as the inodeID changes only when mtime changes too.
> Right, that's better: only comparing the hash will let dar know whether a file has to be saved again or not (aside from the hash conflict mentioned above).
> > And in this case the whole file would be saved anyway.
> Right.
> > As the security check is implemented now, it should also be fine with the hash, because it relies on having the same mtime in both archives. The evaluation of which action to perform when mtime hasn't changed should be applicable with the information stored in the dictionary.
> OK, this lets dar see if only EA/FSA have changed and resave that part only if necessary.
> > In addition we should add some sort of UUID which is connected to the system in such a way that it doesn't change during normal system operation. I thought about the partition UUID, but this is not always that simple
> In fact, adding a system/hardware ID in the hash forbids the possibility to restore the whole data (most probably on a new filesystem, due to a crash for example) and then keep using the latest backup of reference as reference for the next incremental backup.
> > when we think about LVM and btrfs, but maybe there is something else we can use. To break rainbow-table attacks we should also add a random salt per archive: { H(filename + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } Originally I also wanted to slip the file_size into the hash value, but this conflicts with the security check and the sparse file detection.
> > The dictionary could be saved for example in a Berkeley DB, which could be stored somewhere in the archive. As hash function I would suggest Keccak with 512 bits and 100 rounds.
> > What do you think about the idea of having a second (hashed) catalogue?
> That's an interesting approach, though not that simple to implement, and there is the point about hash collisions to address.
> I thought about another way to do encrypted differential backups that, from the user's point of view, has the same footprint as doing a full backup: using the same key (symmetric or asymmetric) for the archive of reference and the new differential backup, without having dar ask twice for the password as it does for a full backup.
> Given the encryption key, dar tries to open the encrypted isolated catalogue; if it succeeds, it assumes the user gave the key without typo errors and uses that key to encrypt the new differential backup.
> I guess that when you use a symmetric key, very few people use a different key for each new differential archive, right? I also guess that, when using asymmetric encryption, it is always the same public/private key pair that is used, thus the same passphrase is requested to open the private key (enciphering and signature).
> Would this address your need? This is much easier to implement, from my point of view. |
From: Denis C. <dar...@fr...> - 2014-10-05 10:38:17
|
On 01/10/2014 18:04, Tobias Specht wrote:
> Hi Denis,

Hi Tobias,

> I like dar with its philosophy and I want to create a program using libdar to implement some kind of intelligence. Of course there will be some options, but it should be enough to just define a "Backup Drive" and backups will be created automatically every day the user powers on the computer. And encryption should be at least a strongly recommended option.
>
> This leads me to the catalogue problem. I agree with you that dar does not need the kind of intelligence I'm planning for my backup tool, but as the catalogue and the process of creating a backup against a reference are elementary features of dar, I think this problem could be better solved within dar.
>
> In the catalogue there is stored: * inodeID * filename (with its path) * file permissions * userID * groupID * file size * last modification date (mtime) * last change date (ctime) * if the file is a directory (is_dir) * if the file has children or is an empty dir * file type * flags about saved data / saved EA / compression used (correct me if I'm wrong)

More or less, yes, but that's a matter of details.

> I think the most private information that has to be protected is the filename.

I agree with that.

> My idea was to create a second "hashed" catalogue which contains only the information necessary to create a backup against a reference. It is structured like a dictionary, with a hash representing the filename on one side and some information about the file on the other side. This dictionary can be stored outside the encrypted area of the archive because it doesn't contain any private information. Yes, it is more data to be stored, and in general it contains only redundant information, but with it we can create a differential backup from an encrypted archive without entering the encryption password, which will lead to more usability. (And do you know any backup tool providing such a feature?)
>
> The first idea was just to hash the filename: { H(filename) : [inodeID, userID, groupID, perm, file_size, ctime, mtime, is_dir, type, flags] , ... } This is quite simple to implement, but it's not very resistant to brute-force attacks.

For this first idea, there is already a point to consider, aside from the brute-force attack. If two different files in the same directory with different filenames produce the same hash, there is a conflict. While this should not occur very often, it is not impossible.

The same problem arises during the differential backup process: dar checks whether each file found on the filesystem already exists in the reference catalogue. Here, in order to compare, dar has to compute a hash for each filename read from the filesystem and compare it with the list of hashes available in the reference catalogue. But a false match may occur if a new file has the same hash as an old one. Most of the time dar will save that new file as expected, if mtime or another attribute changed compared with the wrong reference, but in some rare cases it may fail to save that new file, wrongly assuming it has not changed because it was compared with a wrong reference.

So, even if the chances are small that this situation occurs, it is not impossible. How to cope with that? Do we have to inform the user that there is a risk that the backup is not perfect, but not to worry, as it occurs only in very rare situations? Would you find that acceptable as a user? :)

> So I thought about slipping into the calculation of the hash value some information that is available at the point of creating the reference backup and also when creating the new backup, but isn't known to an attacker who only has access to the archive: { H(filename + inodeID + mtime) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } I had a look at your source code (filtre.cpp/filtre_sauvegarde) and, as far as I understand, you first try to find the file based on its path in the reference backup. When there is a match you perform some optional security checks and afterwards you decide what information to store in the backup: * remove_ea * saving_inode * saving_ea * saving_fsa. If there is no match you have to store the whole file.
>
> When using the hash value, the first part leads to a slightly different result, as the hash doesn't consist only of the filename but also of the inodeID and mtime. But this shouldn't be a problem, as the inodeID changes only when mtime changes too.

Right, that's better: only comparing the hash will let dar know whether a file has to be saved again or not (aside from the hash conflict mentioned above).

> And in this case the whole file would be saved anyway.

Right.

> As the security check is implemented now, it should also be fine with the hash, because it relies on having the same mtime in both archives. The evaluation of which action to perform when mtime hasn't changed should be applicable with the information stored in the dictionary.

OK, this lets dar see if only EA/FSA have changed and resave that part only if necessary.

> In addition we should add some sort of UUID which is connected to the system in such a way that it doesn't change during normal system operation. I thought about the partition UUID, but this is not always that simple

In fact, adding a system/hardware ID in the hash forbids the possibility to restore the whole data (most probably on a new filesystem, due to a crash for example) and then keep using the latest backup of reference as reference for the next incremental backup.

> when we think about LVM and btrfs, but maybe there is something else we can use. To break rainbow-table attacks we should also add a random salt per archive: { H(filename + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... } Originally I also wanted to slip the file_size into the hash value, but this conflicts with the security check and the sparse file detection.
>
> The dictionary could be saved for example in a Berkeley DB, which could be stored somewhere in the archive. As hash function I would suggest Keccak with 512 bits and 100 rounds.
>
> What do you think about the idea of having a second (hashed) catalogue?

That's an interesting approach, though not that simple to implement, and there is the point about hash collisions to address.

I thought about another way to do encrypted differential backups that, from the user's point of view, has the same footprint as doing a full backup: using the same key (symmetric or asymmetric) for the archive of reference and the new differential backup, without having dar ask twice for the password as it does for a full backup.

Given the encryption key, dar tries to open the encrypted isolated catalogue; if it succeeds, it assumes the user gave the key without typo errors and uses that key to encrypt the new differential backup.

I guess that when you use a symmetric key, very few people use a different key for each new differential archive, right? I also guess that, when using asymmetric encryption, it is always the same public/private key pair that is used, thus the same passphrase is requested to open the private key (enciphering and signature).

Would this address your need? This is much easier to implement, from my point of view.

> Regards, Tobias

Regards,
Denis. |
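On today's command line, the situation Denis describes looks something like the following; the archive names and passphrase are examples only (-A designates the archive of reference, -K the key for the new archive, -J the key used to read the encrypted reference). His proposal would let the key be given once rather than twice.

dar -c monday_diff -A sunday_full -K aes:mysecret -J aes:mysecret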
From: Tobias S. <spe...@gm...> - 2014-10-01 16:04:45
|
Hi Denis,

I like dar with its philosophy and I want to create a program using libdar to implement some kind of intelligence. Of course there will be some options, but it should be enough to just define a "Backup Drive" and backups will be created automatically every day the user powers on the computer. And encryption should be at least a strongly recommended option.

This leads me to the catalogue problem. I agree with you that dar does not need the kind of intelligence I'm planning for my backup tool, but as the catalogue and the process of creating a backup against a reference are elementary features of dar, I think this problem could be better solved within dar.

In the catalogue there is stored:
* inodeID
* filename (with its path)
* file permissions
* userID
* groupID
* file size
* last modification date (mtime)
* last change date (ctime)
* if the file is a directory (is_dir)
* if the file has children or is an empty dir
* file type
* flags about saved data / saved EA / compression used
(correct me if I'm wrong)

I think the most private information that has to be protected is the filename. My idea was to create a second "hashed" catalogue which contains only the information necessary to create a backup against a reference. It is structured like a dictionary, with a hash representing the filename on one side and some information about the file on the other side. This dictionary can be stored outside the encrypted area of the archive because it doesn't contain any private information. Yes, it is more data to be stored, and in general it contains only redundant information, but with it we can create a differential backup from an encrypted archive without entering the encryption password, which will lead to more usability. (And do you know any backup tool providing such a feature?)

The first idea was just to hash the filename:
{ H(filename) : [inodeID, userID, groupID, perm, file_size, ctime, mtime, is_dir, type, flags] , ... }
This is quite simple to implement, but it's not very resistant to brute-force attacks. So I thought about slipping into the calculation of the hash value some information that is available at the point of creating the reference backup and also when creating the new backup, but isn't known to an attacker who only has access to the archive:
{ H(filename + inodeID + mtime) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... }
I had a look at your source code (filtre.cpp/filtre_sauvegarde) and, as far as I understand, you first try to find the file based on its path in the reference backup. When there is a match you perform some optional security checks and afterwards you decide what information to store in the backup: * remove_ea * saving_inode * saving_ea * saving_fsa. If there is no match you have to store the whole file.

When using the hash value, the first part leads to a slightly different result, as the hash doesn't consist only of the filename but also of the inodeID and mtime. But this shouldn't be a problem, as the inodeID changes only when mtime changes too. And in this case the whole file would be saved anyway. As the security check is implemented now, it should also be fine with the hash, because it relies on having the same mtime in both archives. The evaluation of which action to perform when mtime hasn't changed should be applicable with the information stored in the dictionary.

In addition we should add some sort of UUID which is connected to the system in such a way that it doesn't change during normal system operation. I thought about the partition UUID, but this is not always that simple when we think about LVM and btrfs; maybe there is something else we can use. To break rainbow-table attacks we should also add a random salt per archive:
{ H(filename + inodeID + mtime + UUID + salt) : [userID, groupID, perm, file_size, is_dir, type, flags, ctime] , ... }
Originally I also wanted to slip the file_size into the hash value, but this conflicts with the security check and the sparse file detection.

The dictionary could be saved for example in a Berkeley DB, which could be stored somewhere in the archive. As hash function I would suggest Keccak with 512 bits and 100 rounds.

What do you think about the idea of having a second (hashed) catalogue?

Regards,
Tobias

On Sunday, 28.09.2014 at 10:36 +0200, Denis Corbin wrote:
> On 27/09/2014 18:13, Tobias Specht wrote:
> > Hi Denis,
> Hi Tobias,
> > now I see the problem with signing the symmetric key, and your solution of also signing the catalogue sounds reasonable.
> > Are you interested in discussing the catalogue topic a little bit more?
> Of course!
> > I don't feel comfortable with storing passwords in clear text even on my own system, and storing the catalogue unencrypted is also not very consistent in terms of privacy. In my opinion both options actually deter users from encrypting their backups.
> I guess the main reason is not so much the complexity of encryption as ignorance and blindness to possible marketing/profiling abuses. When you don't even ask yourself whether you can trust the owners of the remote "cloud" storage you send your data to not to read or analyze the content you've sent, you don't even think about encrypting your data before sending it out...
> > To make encryption more popular there should be no disadvantages when using it!
> I agree with that point; the less effort a task requires, the more people will probably do it. However, sometimes doing something safely will always cost more than doing it another way. In that situation, educating users comes to the rescue. For example, many people today use a seat belt when travelling by car, while it is just easier not to use it. :)
> > (I mean the problem with doing backups against a reference when using encryption at the same time.) The effort for the user should be as small as possible.
> Right; for now, as small as possible means issuing a password. Without any key (of any sort) being provided, how do you see a mechanism that could differentiate a user that has the right to access the data from one that does not? How could it be done better/simpler? I guess you have suggestions for the differential backup context? :)
> > As you know, it isn't even simple to convince a user to make backups at all.
> ... education... unless automatic backup is performed by the system. But if users are ignorant of the existence of such an automatic backup mechanism, how would they think to rely on it to restore their system when a single file gets lost by mistake, or a whole system has been destroyed (crash, theft, disaster, ...)? The system has to be even more "smart" (as opposed to the user). Usually having systems get "smarter" removes freedom from the users... so we must also pay attention not to remove freedom from clever users, those that are educated about a subject (here, backup) and/or don't completely (want to) rely on a "smart" system to provide the service they need.
> > But when it means more effort to use the encryption option, it is very unlikely that he will use it. What do you think?
> Nothing more than what I have answered above.
> The new public key encryption is a partial solution to that point, together with the additional feature of the encryption algorithm being recorded in the archive header/trailer:
> * Encrypting an archive is as simple as listing the email recipients we want to encrypt the archive for, with their corresponding public keys.
> * Deciphering an archive is as simple as a clear archive (as soon as you have an adequate private key); no -K option to give...
> But, yes, this does not answer the differential backup case you underline, nor does it address backup/restoration to recover from a disaster, for example: you need the private key to decipher the archive...
> Another point to mention and take into consideration in this discussion: dar/libdar is quite a low-level tool, not targeted at people that need to rely on a smart system... However, it can serve other tools that provide this high-level intelligence about what users need, without the user expressing any request...
> The "philosophy" of dar/libdar is a tool with logical default values and systematically explicit options; no guessing, no "intelligence", in order to preserve the user's freedom to use or activate the features they want.
> By contrast, maybe you have been using MS Word. How annoying it is to have it capitalize a word automatically because it "thinks" it has to be capitalized... But if it were not the case, it would give you additional work to correct what has been changed without you having been asked, and it leads you to become more vigilant so that it does not turn what follows an 'E' into an exponent, and so on... My point of view with such "intelligent" tools is that I am not that stupid: I know how to type an uppercase or a lowercase... let me own my mistakes and keep my freedom to write the way I want.
> In short, too much or badly designed "intelligence" in software may become more painful than helpful.
> As you see, I just want to avoid that with dar/libdar; tools relying on libdar are not my concern, every need has to be satisfied, but at different levels.
> In that context, yes, I am open to considering anything that could simplify the use of encryption within dar/libdar, or any mechanism that could help applications overlying dar/libdar to provide that smart service to users. :)
> > I would be glad to hear other comments on this topic.
> > Regards, Tobias
> Best Regards,
> Denis. |
From: Denis C. <dar...@fr...> - 2014-09-28 08:36:25
|
On 27/09/2014 18:13, Tobias Specht wrote:
> Hi Denis,

Hi Tobias,

> now I see the problem with signing the symmetric key, and your solution of also signing the catalogue sounds reasonable.
>
> Are you interested in discussing the catalogue topic a little bit more?

Of course!

> I don't feel comfortable with storing passwords in clear text even on my own system, and storing the catalogue unencrypted is also not very consistent in terms of privacy. In my opinion both options actually deter users from encrypting their backups.

I guess the main reason is not so much the complexity of encryption as ignorance and blindness to possible marketing/profiling abuses. When you don't even ask yourself whether you can trust the owners of the remote "cloud" storage you send your data to not to read or analyze the content you've sent, you don't even think about encrypting your data before sending it out...

> To make encryption more popular there should be no disadvantages when using it!

I agree with that point; the less effort a task requires, the more people will probably do it. However, sometimes doing something safely will always cost more than doing it another way. In that situation, educating users comes to the rescue. For example, many people today use a seat belt when travelling by car, while it is just easier not to use it. :)

> (I mean the problem with doing backups against a reference when using encryption at the same time.) The effort for the user should be as small as possible.

Right; for now, as small as possible means issuing a password. Without any key (of any sort) being provided, how do you see a mechanism that could differentiate a user that has the right to access the data from one that does not? How could it be done better/simpler? I guess you have suggestions for the differential backup context? :)

> As you know, it isn't even simple to convince a user to make backups at all.

... education... unless automatic backup is performed by the system. But if users are ignorant of the existence of such an automatic backup mechanism, how would they think to rely on it to restore their system when a single file gets lost by mistake, or a whole system has been destroyed (crash, theft, disaster, ...)? The system has to be even more "smart" (as opposed to the user). Usually having systems get "smarter" removes freedom from the users... so we must also pay attention not to remove freedom from clever users, those that are educated about a subject (here, backup) and/or don't completely (want to) rely on a "smart" system to provide the service they need.

> But when it means more effort to use the encryption option, it is very unlikely that he will use it. What do you think?

Nothing more than what I have answered above.

The new public key encryption is a partial solution to that point, together with the additional feature of the encryption algorithm being recorded in the archive header/trailer:
* Encrypting an archive is as simple as listing the email recipients we want to encrypt the archive for, with their corresponding public keys.
* Deciphering an archive is as simple as a clear archive (as soon as you have an adequate private key); no -K option to give...

But, yes, this does not answer the differential backup case you underline, nor does it address backup/restoration to recover from a disaster, for example: you need the private key to decipher the archive...

Another point to mention and take into consideration in this discussion: dar/libdar is quite a low-level tool, not targeted at people that need to rely on a smart system... However, it can serve other tools that provide this high-level intelligence about what users need, without the user expressing any request...

The "philosophy" of dar/libdar is a tool with logical default values and systematically explicit options; no guessing, no "intelligence", in order to preserve the user's freedom to use or activate the features they want.

By contrast, maybe you have been using MS Word. How annoying it is to have it capitalize a word automatically because it "thinks" it has to be capitalized... But if it were not the case, it would give you additional work to correct what has been changed without you having been asked, and it leads you to become more vigilant so that it does not turn what follows an 'E' into an exponent, and so on... My point of view with such "intelligent" tools is that I am not that stupid: I know how to type an uppercase or a lowercase... let me own my mistakes and keep my freedom to write the way I want.

In short, too much or badly designed "intelligence" in software may become more painful than helpful.

As you see, I just want to avoid that with dar/libdar; tools relying on libdar are not my concern, every need has to be satisfied, but at different levels.

In that context, yes, I am open to considering anything that could simplify the use of encryption within dar/libdar, or any mechanism that could help applications overlying dar/libdar to provide that smart service to users. :)

> I would be glad to hear other comments on this topic.
>
> Regards, Tobias

Best Regards,
Denis. |
From: Tobias S. <spe...@gm...> - 2014-09-27 16:13:22
Hi Denis,

now I see the problem with signing the symmetric key, and your solution to also sign the catalogue sounds reasonable. Are you interested in discussing the catalogue topic a little bit more?

I don't feel comfortable with storing passwords in clear text even on my own system, and storing the catalogue unencrypted is also not very consistent in terms of privacy. In my opinion both options actually prevent users from encrypting their backups. To make encryption more popular there should be no disadvantages when using it! (I mean the problem with doing referential backups when using encryption at the same time.) The effort for the user should be as small as possible. As you know, it isn't even simple to convince a user to make backups at all. But when it means more effort to use the encryption option, it is very unlikely that he will use it. What do you think?

I would be glad to hear other comments on this topic.

Regards, Tobias
From: Denis C. <dar...@fr...> - 2014-09-25 19:14:11
On 24/09/2014 17:08, Tobias Specht wrote:
> Hi Denis,

Hi Tobias,

> sorry that I hadn't checked your current master branch, you have done great work!

No problem, the dev branch is not very visible...

> Regarding your problem with the signature. Maybe we are talking past each other, but in general signing goes like this: a check-sum of a document is encrypted with the private key of the sender (not with the public key of the sender nor of the receiver), so that everyone else can verify the signature by calculating the check-sum on their own and comparing it with the value from the signature, as everyone can decrypt it with the public key of the sender.

yep,

> In your case the document is the symmetric key, and as check-sum you could calculate a hash value. When you encrypt the hash value with the private key of the sender (e.g. the person who creates the backup), it is in my opinion ensured that no one else has created or modified this backup.

In other words, the "document", which is the symmetric key, could be copied as is into a new archive. Any recipient can also decrypt the "document" and obtain the symmetric key, and can thus use it to create a new encrypted archive with that same key. This would lead any other recipient to see this faked archive as signed by the same person as the original one, unless you apply the signature to some other part of the archive, like the internal catalogue.

> The thing with the asymmetric encryption was just an idea I had a few days ago. And I think we have designed it for two different use cases. The general idea of using asymmetric encryption was in my case to create backups automatically without prompting the user to enter the encryption password, not to encrypt them for different recipients.

Well, you could also drop the password in a file that only the user can read, and feed that file to dar using the -B option. No need for asymmetrical encryption to get an encrypted archive without manual interaction or exposing the password on the command line.

> And this was also the background of the catalogue question. Of course I can create a differential backup based on an encrypted archive by providing the key for the archive of reference. But in this case user interaction is needed, which I wanted to avoid. You understand?

I better understand what you want to do, but why not use a Dar Command-line File (DCF file) with the -B option, as just described? I do it myself for my own backups that get stored remotely. Moreover, as it is a password that I know, in case of disaster I do not need any private key to decrypt the archives.

> Yes, you're right, an unencrypted catalogue does not ensure privacy. And for sure there is no easy solution for this problem. In my opinion the most sensitive data in the catalogue are the names of files and folders. One quick approach to restore privacy in an unencrypted catalogue would be to store only hash values of this information. However you solve this problem, it would lead to big changes in dar, and maybe the field of application would be quite small.

That's right,

> So never mind.
>
> About the private key, I didn't want to store it inside the archive. I just wanted to store the information about which private key to use for decryption. In your case this is done via the email address of the recipients, and in my case it would be the fingerprint of the public key with which the symmetric key has been encrypted.

OK, I see, sorry for my misunderstanding!

> One last thing I want to add regarding private keys in general. Because the whole security relies on this file, it has to be kept secret, and this is not done by storing it unencrypted on a hard drive (I'm sure you don't write your passwords into a text file either).

Well, yes I do :) but with adequate file permissions, and if I trust the system engineer (which I am, at home). For work backups, of course, I type the password at each new backup...

> When you don't want to have a password but want to ensure privacy, you can move the "security factor" into some kind of hardware token. But also in this case it is recommended to protect access to the private key with a PIN.

Yes, that's right.

> Regards, Tobias

Cheers,
Denis.
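[A minimal sketch of the DCF approach Denis mentions, using the -B and -K options named above; the file name, cipher and password are hypothetical placeholders:

    # ~/.dar/backup.dcf -- make it readable only by the owner (chmod 600);
    # it holds the symmetric encryption key so dar never prompts for it
    -K aes:MySecretBackupPassword

    # unattended encrypted backup: no password on the command line,
    # nothing to type interactively
    dar -c daily_backup -R /home/user -B ~/.dar/backup.dcf
]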
From: Tobias S. <spe...@gm...> - 2014-09-24 15:08:18
Hi Denis,

sorry that I hadn't checked your current master branch, you have done great work!

Regarding your problem with the signature. Maybe we are talking past each other, but in general signing goes like this: a check-sum of a document is encrypted with the private key of the sender (not with the public key of the sender nor of the receiver), so that everyone else can verify the signature by calculating the check-sum on their own and comparing it with the value from the signature, as everyone can decrypt it with the public key of the sender. In your case the document is the symmetric key, and as check-sum you could calculate a hash value. When you encrypt the hash value with the private key of the sender (e.g. the person who creates the backup), it is in my opinion ensured that no one else has created or modified this backup.

The thing with the asymmetric encryption was just an idea I had a few days ago. And I think we have designed it for two different use cases. The general idea of using asymmetric encryption was in my case to create backups automatically without prompting the user to enter the encryption password, not to encrypt them for different recipients. And this was also the background of the catalogue question. Of course I can create a differential backup based on an encrypted archive by providing the key for the archive of reference. But in this case user interaction is needed, which I wanted to avoid. You understand?

Yes, you're right, an unencrypted catalogue does not ensure privacy. And for sure there is no easy solution for this problem. In my opinion the most sensitive data in the catalogue are the names of files and folders. One quick approach to restore privacy in an unencrypted catalogue would be to store only hash values of this information. However you solve this problem, it would lead to big changes in dar, and maybe the field of application would be quite small. So never mind.

About the private key, I didn't want to store it inside the archive. I just wanted to store the information about which private key to use for decryption. In your case this is done via the email address of the recipients, and in my case it would be the fingerprint of the public key with which the symmetric key has been encrypted.

One last thing I want to add regarding private keys in general. Because the whole security relies on this file, it has to be kept secret, and this is not done by storing it unencrypted on a hard drive (I'm sure you don't write your passwords into a text file either). When you don't want to have a password but want to ensure privacy, you can move the "security factor" into some kind of hardware token. But also in this case it is recommended to protect access to the private key with a PIN.

Regards, Tobias
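[To make the generic sign/verify flow Tobias describes concrete, here is a sketch with plain gpg applied to an archive slice. File names are hypothetical, and this shows the textbook mechanism, not how dar implements signing internally:

    # signer: hash the file and sign the hash with the sender's private key
    gpg --detach-sign --output full_backup.1.dar.sig full_backup.1.dar

    # any recipient: verify using the sender's public key from the keyring
    gpg --verify full_backup.1.dar.sig full_backup.1.dar
]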
From: Denis C. <dar...@fr...> - 2014-09-24 12:32:58
Le 24/09/2014 02:00, Tobias Specht wrote:
> Hi Denis,

Hi Tobias,

> As I'm working on a user-friendly automated backup solution based on dar, I have also done some considerations about encryption. For me encryption is a major topic to ensure privacy, and for that it must be easy to use for everyone, not only for IT experts.

You are right... and libdar needs such user-friendly interfaces. :)

> The first question: is it possible to find out whether the archive is encrypted or not before I open it?

Yes, but not at API level. The archive header and trailer contain a flag that tells whether the archive has been encrypted. Starting with the future release 2.5.0, the encryption algorithm is also present in the header/trailer, so the user will only have to specify the password, even if the algorithm is not blowfish.

> When I try to open it without a password I get the error message anyway.

This is due to that flag in the archive header (used when sequential-read is used) and archive trailer (used with direct access (default) mode).

> Now to some deeper considerations about encryption. When creating backups automatically on a regular basis (e.g. every day the computer is running), it is annoying to enter the password every time to encrypt the backup. When thinking about this problem I came up with the following idea: we could use, in addition to the symmetric encryption system of the dar archive, an asymmetric encryption scheme like RSA. In asymmetric encryption a different key is used for decryption than for encryption.

This is implemented in the current development code (what will be released as 2.5.0)! Archive signing is also available! You can at the same time encrypt for several recipients and sign with your own key.

Note that the asymmetrical encryption is used only to cipher a randomly chosen key used for symmetrical encryption. It is always a symmetrical encryption algorithm that encrypts the whole archive. The archive signature is applied to that randomly chosen key. There is thus a weakness if the archive is signed and at the same time encrypted for *several* recipients: each recipient can know the random key used to encrypt the archive, and can thus reuse that key to create a completely different archive, faking the signature of the original sender.

To overcome that weakness in the signature (not the encryption), I have planned to add a hash of the archive catalogue (which contains a CRC of each file's data and EA), and sign this hash too, in addition to the randomly chosen key for symmetrical encryption. If you see other weak points or find a better idea, feel free to expose them here! :)

> Once for every user/computer a pair of public and private keys is generated. The public key can be stored in plain text because it is only used for encryption. But the private key must be encrypted, so that the user has to enter a password to open the key and to decrypt data that has been encrypted with his public key.

The asymmetrical encryption is implemented based on libgpgme, thus it uses the same keyring as gpg. In particular, public key verification, expiration and management are done there. For private keys, dar does not have to store them, just invoke their use by specifying the associated email address: this is the simplest option I found to target a particular public or private key, instead of specifying a full name or Key ID, which are either difficult to write down and associate to a peer, or may lead to ambiguity between different mailboxes of a given person (home/work, etc.).

> When a backup is created, a random password for the encryption of the archive can be generated. This password is only used for this one archive. And now comes the magic thing: the password can be encrypted with the public key, which can be done without user interaction.

Yes, this is the way it is implemented in the current development code.

> To decrypt the password for the backup, the private key is needed, which itself is encrypted.

To my point of view, there is a risk in transmitting the private key, even encrypted. It should not be necessary, thanks precisely to the public/private key separation. To encrypt, I only need the public key of my recipient(s). The archive can be sent without additional information to the expected recipients, who will be able to decipher the archive.

> In this case the user has to enter his password.

So we are back to passwords, not much different from symmetrical encryption, no?

> So a user only needs to remember the password of his private key. With this he can decrypt all the passwords for his backups.

Rather, use dar with symmetrical encryption on your ~/.gnupg configuration directory, which contains all the private and public keys you have! No? :)

Having the private key not transmitted and kept in secured storage (at the discretion of the user) lets the user choose whether or not to have a passphrase on his private key, without compromising security by requiring the private key to be exposed.

> In a backup solution the user should of course not care about all the keys, he only needs to enter his password for the private key and everything else is done by the program.

Yes, you underline the problem of disaster recovery when using public/private keys. An alternative to encrypting the private key alongside the archive would be for the advised user to keep a copy of his key at a secure location (a trusted friend) or to store it remotely (cloud) after having encrypted it (using dar, for example) with a strong but symmetrical algorithm using a password he would have to remember.

To my point of view, symmetrical encryption is suitable for backup to the cloud, where you can recover the whole data with the sole requirement of a password. Asymmetrical encryption instead seems more suited to exchanging data between different persons, either directly (email) or through repositories (cloud, ftp, ...).

> I have implemented a proof of concept to create backups like this and it is working really well. What do you think about this idea?
>
> If you like it, I have a feature request for you. To do all the RSA stuff I need to store some extra data: 1) the encrypted password of the dar archive

This is done in the current dev code,

> 2) the ID of the public key with which the password has been encrypted. (This is useful because the public key is also part of the private key, so it is easier to match the corresponding private key to the key-file of the backup.)

That's not necessary: if you have the private key in your keyring, the fact that the password is asymmetrically encrypted allows libgpgme to decrypt it automatically (check the current dev implementation, the man page is up to date on that point). [The current feature I'm working on is multi-threading inside libdar, which you cannot activate for now, so the current dev code is quite functional for the asymmetrical encryption, though not for production use; if you have any problem compiling the dar dev code and want to play with that asymmetrical feature, tell me.]

> This information can be saved in separate files, but this leads to confusion, and if any of these files gets lost, it is impossible to recover the password of the backup.

As you say; same for me, I don't like external files, and as said above this is not necessary.

> To make things easier it would be nice to store them in the header of the dar archive. To be exact, this would be 256 bytes for the encrypted password and a SHA256 value for the ID of the public key.

The random key has a variable length (+0 to +256 bytes) and at minimum 512 bytes (user-settable with the --key-length option); for the API, check the src/libdar/archive_option.hpp file, everything that is 'key' related.

> As I have just noticed, dar supports a user-defined message to be written inside the archive header (--user-comment).

That's correct,

> You have documented that this message is unencrypted

Yes, see the --user-comment option in the man page.

> even if the archive is encrypted. But I have found no way to read the message without providing the password of the archive.

Right, it is stored in clear, but the API call that opens an archive aborts if the archive contents (which is encrypted) is not readable (due to a wrong key or data corruption).

> Is it possible to read the user message without knowing the password of the archive?

Yes, just open the archive in a text editor! :) You will find it near the beginning of the archive (unless the -at option has been used).

> (In the long run it would be nice to have a separate option in the archive header for the RSA stuff.)

There is, check the --sign and --key options in the man page. Note that you can read the man page by getting the source code (branch master in GIT): 'cd man' then 'man ./dar.1' is the easiest way to get it without installing the whole software.

Same thing once compiled: you can run dar without installing it by 'cd src/dar_suite' and there running "./dar ..."

> Finally, there is another thing which I had to consider. When creating a backup with a reference to an encrypted archive, I need the password. To bypass this problem I have isolated the catalogue. I know it would increase the complexity of the archive header to store the catalogue either encrypted or unencrypted, but it would eliminate the need for the isolated catalogue file. I could imagine that is a quite common issue.

Maybe I don't follow you; if I'm wrong don't hesitate to tell me: you can do a differential/incremental backup with an encrypted archive as reference, using the -ref-key option to provide the necessary credentials to read that archive. The archive you create that way is either unencrypted, or encrypted if you use the -K option with the same or other credentials. (For a merging operation that involves two source archives there is also the -aux-key option.)

An isolated catalogue may be encrypted too (the -ref-key option to read the source archive and --key to encrypt the resulting archive).

But, yes, dar_manager still cannot read encrypted archives; to manage an encrypted archive with dar_manager you need an unciphered isolated catalogue... this feature is still on the todo list, but I guess I will not have time to implement it for release 2.5.0, which I have targeted for the end of the year (end of new feature implementation, then a testing/optimization phase, for what I hope will be a major release in 2015).

The src/build/Changelog file gives you an overview of the new features that will be available with release 2.5.0.

> Are you planning to add a possibility to store the catalogue unencrypted even if the archive data is encrypted?

No, this does not make sense to me. If the archive has to be encrypted, this is to prevent anyone except the authorized person(s) from reading its content, including its table of contents.

> Regards, Tobias

Kind Regards,
Denis.
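[A quick sketch of the differential-backup case Denis describes, using the options he names; option spelling (written --ref-key below) should be checked against the dar man page of that era, and archive names and passwords are placeholders:

    # full backup, symmetrically encrypted
    dar -c full_backup -R /home/user -K aes:OldPassword

    # differential backup taking the encrypted full backup as reference:
    # --ref-key decrypts the reference, -K encrypts the new archive
    dar -c diff_backup -R /home/user -A full_backup --ref-key aes:OldPassword -K aes:NewPassword
]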
From: Tobias S. <spe...@gm...> - 2014-09-24 00:00:35
Hi Denis,

As I'm working on a user-friendly automated backup solution based on dar, I have also done some considerations about encryption. For me encryption is a major topic to ensure privacy, and for that it must be easy to use for everyone, not only for IT experts.

The first question: is it possible to find out whether the archive is encrypted or not before I open it? When I try to open it without a password I get the error message anyway.

Now to some deeper considerations about encryption. When creating backups automatically on a regular basis (e.g. every day the computer is running), it is annoying to enter the password every time to encrypt the backup. When thinking about this problem I came up with the following idea: we could use, in addition to the symmetric encryption system of the dar archive, an asymmetric encryption scheme like RSA. In asymmetric encryption a different key is used for decryption than for encryption.

Once for every user/computer a pair of public and private keys is generated. The public key can be stored in plain text because it is only used for encryption. But the private key must be encrypted, so that the user has to enter a password to open the key and to decrypt data that has been encrypted with his public key.

When a backup is created, a random password for the encryption of the archive can be generated. This password is only used for this one archive. And now comes the magic thing: the password can be encrypted with the public key, which can be done without user interaction. To decrypt the password for the backup, the private key is needed, which itself is encrypted. In this case the user has to enter his password. So a user only needs to remember the password of his private key. With this he can decrypt all the passwords for his backups. In a backup solution the user should of course not care about all the keys; he only needs to enter his password for the private key, and everything else is done by the program.

I have implemented a proof of concept to create backups like this and it is working really well. What do you think about this idea?

If you like it, I have a feature request for you. To do all the RSA stuff I need to store some extra data: 1) the encrypted password of the dar archive, 2) the ID of the public key with which the password has been encrypted. (This is useful because the public key is also part of the private key, so it is easier to match the corresponding private key to the key-file of the backup.) This information can be saved in separate files, but this leads to confusion, and if any of these files gets lost, it is impossible to recover the password of the backup. To make things easier it would be nice to store them in the header of the dar archive. To be exact, this would be 256 bytes for the encrypted password and a SHA256 value for the ID of the public key.

As I have just noticed, dar supports a user-defined message to be written inside the archive header (--user-comment). You have documented that this message is unencrypted even if the archive is encrypted. But I have found no way to read the message without providing the password of the archive. Is it possible to read the user message without knowing the password of the archive? (In the long run it would be nice to have a separate option in the archive header for the RSA stuff.)

Finally, there is another thing which I had to consider. When creating a backup with a reference to an encrypted archive, I need the password. To bypass this problem I have isolated the catalogue. I know it would increase the complexity of the archive header to store the catalogue either encrypted or unencrypted, but it would eliminate the need for the isolated catalogue file. I could imagine that is a quite common issue. Are you planning to add a possibility to store the catalogue unencrypted even if the archive data is encrypted?

Regards, Tobias
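[The hybrid scheme Tobias describes can be sketched with stock tools; the recipient address, paths and key size are hypothetical, and this illustrates the idea rather than his actual proof of concept:

    # generate a random per-archive password, unattended
    PASS=$(head -c 32 /dev/urandom | base64)

    # encrypt the backup symmetrically with it
    dar -c daily_backup -R /home/user -K "aes:$PASS"

    # store the password encrypted with the user's public key, next to
    # the archive; no user interaction is needed at backup time
    echo "$PASS" | gpg --encrypt --recipient backup@example.org > daily_backup.pass.gpg

    # only at restore time is the private-key passphrase requested:
    # gpg --decrypt daily_backup.pass.gpg
]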
From: Tobias S. <spe...@gm...> - 2014-03-11 19:08:09
Hi Denis,

thank you very much. That's exactly what I was looking for.

Regards
Tobias

On Sunday, 09.03.2014, 21:18 +0100, Denis Corbin wrote:
> On 04/03/2014 21:20, Denis Corbin wrote:
> > On 28/02/2014 21:53, Tobias wrote:
> >> Hi Denis,
> [...]
> >
> > Yes, it will be possible with second precision starting with release 2.4.13, and with second and microsecond precision starting with 2.5.0.
>
> All this is now available from GIT, respectively on branch "branch_2.4.x" and on branch "master".
>
> >> Regards, Tobias
> >
> > Regards, Denis.