From: Mantis B. T. <no...@bu...> - 2017-03-14 15:09:37
A NOTE has been added to this issue.
======================================================================
http://bugs.bacula.org/view.php?id=2269
======================================================================
Reported By:                azurit
Assigned To:                kern
======================================================================
Project:                    Bacula Bug Reports
Issue ID:                   2269
Category:                   File Daemon
Reproducibility:            always
Severity:                   minor
Priority:                   high
Status:                     assigned
======================================================================
Date Submitted:             2017-02-28 10:05 UTC
Last Modified:              2017-03-14 15:09 UTC
======================================================================
Summary:                    Possible memory leak
Description:
File daemon is NOT deallocating memory after an accurate backup (so, in our
environment, this leads to 4 GB memory usage by FD). Memory is released only
after a restart of the file daemon.
======================================================================

----------------------------------------------------------------------
 (0007469) kern (administrator) - 2017-03-10 16:19
 http://bugs.bacula.org/view.php?id=2269#c7469
----------------------------------------------------------------------
Please show some evidence that Bacula is really keeping memory. In
particular, I will want to see that the memory usage continuously increases
each time you run an accurate job. So running 10 accurate jobs with some sort
of memory monitoring between the runs should be a good indication.

----------------------------------------------------------------------
 (0007470) azurit (reporter) - 2017-03-10 16:45
 http://bugs.bacula.org/view.php?id=2269#c7470
----------------------------------------------------------------------
I don't understand your request. Is file daemon supposed to use 4 GB of RAM
while there's no job running?
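
Editorial sketch, not part of the original thread: one way to do the "memory monitoring between the runs" requested in note 0007469 on Linux is to read the `VmRSS` field from `/proc/<pid>/status` of the file daemon before and after each job. The function names and the sample text below are hypothetical (the sample value is made up for illustration), and the approach assumes a Linux `/proc` filesystem.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text: str) -> int:
    """Return the VmRSS value (in kB) found in the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kb(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


# Made-up sample text, used only to show the parsing step.
sample = "Name:\tbacula-fd\nVmPeak:\t  200000 kB\nVmRSS:\t  123456 kB\n"
print(parse_vmrss_kb(sample))  # 123456
```

Calling `read_rss_kb(pid)` with the PID of the running file daemon after each job and logging the result would give the kind of per-run series asked for here.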
----------------------------------------------------------------------
 (0007471) azurit (reporter) - 2017-03-11 07:36
 http://bugs.bacula.org/view.php?id=2269#c7471
----------------------------------------------------------------------
Here it is: I started to monitor memory usage of Bacula FD. As you can see on
the graph, memory usage rose with every job started and did not go down.

Jobs
----
1) started: 1:00, accurate info received: 1:07, ended: 2:37
2) started: 1:00, accurate info received: 1:11, ended: 3:03
3) started: 1:11, accurate info received: 1:17, ended: 2:55
4) started: 1:11, accurate info received: 1:18, ended: 2:46

----------------------------------------------------------------------
 (0007472) kern (administrator) - 2017-03-11 09:37
 http://bugs.bacula.org/view.php?id=2269#c7472
----------------------------------------------------------------------
Yes, something seems to be wrong. I will take a look at it. However, I saw
something like this some time ago with RedHat 5 in the case of a restore.
While Bacula released the memory it used, glibc did not release the memory.
That may be the case here. As I say, I will test it shortly.

However, in the meantime, can you do the following tests:

1. At the beginning, and after each job that completes, please do a bconsole
"status client=xxx" where xxx is the client name. This will tell us what
Bacula thinks it has allocated.

2. Does this happen on any other OS or major version of your OS (Debian)?

----------------------------------------------------------------------
 (0007473) azurit (reporter) - 2017-03-12 18:11
 http://bugs.bacula.org/view.php?id=2269#c7473
----------------------------------------------------------------------
1. I will report back when I'm done with this. Meanwhile, I uploaded a new
graph (nothing special happened at 18:00 but FD was using 2.5 GB RAM anyway).

2. It was also happening on Debian Wheezy, but I didn't report it because the
Bacula version in Wheezy is really old (5.6), so you would probably have
recommended that I upgrade.

----------------------------------------------------------------------
 (0007474) azurit (reporter) - 2017-03-14 07:37
 http://bugs.bacula.org/view.php?id=2269#c7474
----------------------------------------------------------------------
Before jobs:

Connecting to Director localhost:9101
1000 OK: 102 server00-dir Version: 7.4.3 (18 June 2016)
Enter a period to cancel a command.
status client=server00-fd
Connecting to Client server00-fd at 127.0.0.1:9102

server00-fd Version: 7.4.3 (18 June 2016)  x86_64-pc-linux-gnu debian 8.5
Daemon started 28-Feb-17 14:27. Jobs: run=66 running=0.
 Heap: heap=160,760 smbytes=477,364 max_bytes=4,312,767,357 bufs=131 max_bufs=6,326
 Sizes: boffset_t=8 size_t=8 debug=0 trace=0 mode=0 bwlimit=0kB/s

Running Jobs:
Director connected at: 14-Mar-17 00:59
No Jobs running.
====

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===================================================================
 32503  Incr    16,576    510.1 M   OK       12-Mar-17 02:21 server00-mail-0-9-a-d
 32505  Incr    14,503    310.4 M   OK       12-Mar-17 02:38 server00-mail-l-p
 32506  Incr    28,036    483.0 M   OK       12-Mar-17 02:39 server00-mail-q-z
 32504  Incr    39,127    646.0 M   OK       12-Mar-17 02:49 server00-mail-e-k
 32554  Incr       499    541.5 M   OK       13-Mar-17 01:05 server00
 32555  Incr    13,812    1.077 G   OK       13-Mar-17 02:29 server00-mail-0-9-a-d
 32557  Incr     9,086    456.6 M   OK       13-Mar-17 02:43 server00-mail-l-p
 32558  Incr    32,795    1.072 G   OK       13-Mar-17 02:45 server00-mail-q-z
 32556  Incr    15,177    773.9 M   OK       13-Mar-17 02:54 server00-mail-e-k
 32579          51,920    19.55 G   OK       13-Mar-17 12:59 Restore_server00-mail-q-z
====

After jobs:

Connecting to Director localhost:9101
1000 OK: 102 server00-dir Version: 7.4.3 (18 June 2016)
Enter a period to cancel a command.
status client=server00-fd
Connecting to Client server00-fd at 127.0.0.1:9102

server00-fd Version: 7.4.3 (18 June 2016)  x86_64-pc-linux-gnu debian 8.5
Daemon started 28-Feb-17 14:27. Jobs: run=71 running=0.
 Heap: heap=160,760 smbytes=412,543 max_bytes=4,312,767,357 bufs=130 max_bufs=6,326
 Sizes: boffset_t=8 size_t=8 debug=0 trace=0 mode=0 bwlimit=0kB/s

Running Jobs:
Director connected at: 14-Mar-17 08:32
No Jobs running.
====

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===================================================================
 32555  Incr    13,812    1.077 G   OK       13-Mar-17 02:29 server00-mail-0-9-a-d
 32557  Incr     9,086    456.6 M   OK       13-Mar-17 02:43 server00-mail-l-p
 32558  Incr    32,795    1.072 G   OK       13-Mar-17 02:45 server00-mail-q-z
 32556  Incr    15,177    773.9 M   OK       13-Mar-17 02:54 server00-mail-e-k
 32579          51,920    19.55 G   OK       13-Mar-17 12:59 Restore_server00-mail-q-z
 32581  Incr       708    541.5 M   OK       14-Mar-17 01:05 server00
 32582  Incr    35,184    4.963 G   OK       14-Mar-17 02:45 server00-mail-0-9-a-d
 32584  Incr    29,351    2.223 G   OK       14-Mar-17 02:50 server00-mail-l-p
 32585  Incr    40,885    5.710 G   OK       14-Mar-17 03:05 server00-mail-q-z
 32583  Incr    44,211    4.377 G   OK       14-Mar-17 03:09 server00-mail-e-k
====

Also attaching new graph.

----------------------------------------------------------------------
 (0007475) kern (administrator) - 2017-03-14 14:00
 http://bugs.bacula.org/view.php?id=2269#c7475
----------------------------------------------------------------------
Well, the output is pretty clear. Bacula has allocated somewhat over 4GB at
a maximum, but it has released it back to the system. If the memory is still
attached to Bacula, it is glibc that is holding it. Unless you know some APIs
that I do not know (Bacula uses malloc() and free()), there is nothing Bacula
can do.

If you want to "see" Bacula tracking memory usage, just do a few status
commands as you did, during the period it is doing the restore. In that case
you should see smbytes get very large.
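
Editorial sketch, not part of the original thread: the comparison kern draws from the two `Heap:` lines can be reproduced by parsing them. The function name `parse_heap_line` is hypothetical. The reading of `smbytes` as the amount Bacula currently has allocated, and of `max_bytes` as its peak, is taken from kern's comments above rather than from Bacula documentation.

```python
import re


def parse_heap_line(line: str) -> dict:
    """Parse 'key=1,234' pairs from a 'Heap:' status line into integers."""
    return {
        key: int(value.replace(",", ""))
        for key, value in re.findall(r"(\w+)=([\d,]+)", line)
    }


# The two Heap lines quoted in note 0007474.
before = parse_heap_line(
    "Heap: heap=160,760 smbytes=477,364 max_bytes=4,312,767,357 bufs=131 max_bufs=6,326"
)
after = parse_heap_line(
    "Heap: heap=160,760 smbytes=412,543 max_bytes=4,312,767,357 bufs=130 max_bufs=6,326"
)

# Change in currently allocated bytes across the five jobs.
print(after["smbytes"] - before["smbytes"])  # -64821
# Peak allocation recorded by the daemon since it started.
print(after["max_bytes"])  # 4312767357
```

By Bacula's own accounting, the daemon held about 0.4 MB after the jobs, against a recorded peak of about 4.3 GB, which is the basis for kern's conclusion that the memory was freed by Bacula.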
----------------------------------------------------------------------
 (0007476) azurit (reporter) - 2017-03-14 15:03
 http://bugs.bacula.org/view.php?id=2269#c7476
----------------------------------------------------------------------
So what can be done? Why is other software not affected by this?

----------------------------------------------------------------------
 (0007477) kern (administrator) - 2017-03-14 15:09
 http://bugs.bacula.org/view.php?id=2269#c7477
----------------------------------------------------------------------
First, I am still planning to test it here to see if I can reproduce it. I
have been pretty busy lately, and am going to release another Bacula
community version in the next couple of days. After that, this is at the top
of my list.

As I mentioned, I have seen something like this before. I think it was on
RedHat 5, with a major Italian bank, and I reported it in detail to RedHat.
They were unable to find a solution, probably because the glibc code is so
complicated. The fix was to upgrade to RedHat version 6.0. I have not seen
the problem since then.

If you are on an older Debian (I am not very familiar with their versions), I
recommend trying a newer one.

I have not 100% excluded a Bacula problem, but at this point everything is
indicating glibc.
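
Editorial sketch, not part of the original thread: glibc provides `malloc_trim()`, which asks the allocator to return free heap memory to the operating system. Whether it would help in the situation described here is an assumption that nothing in this thread confirms, and Bacula is not known from this thread to call it. The snippet only shows the call itself, made from Python through `ctypes` on the current process. The function name `trim_glibc_heap` is hypothetical.

```python
import ctypes
import ctypes.util


def trim_glibc_heap(pad: int = 0):
    """Call glibc's malloc_trim(pad) in the current process.

    Returns 1 if memory was returned to the system, 0 if not, and None
    when the C library does not provide malloc_trim (for example on
    non-glibc systems).
    """
    libname = ctypes.util.find_library("c")
    if libname is None:
        return None
    try:
        libc = ctypes.CDLL(libname)
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(pad)


print(trim_glibc_heap())
```

The result depends on the platform and on the state of the heap, so no expected output is given.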

Issue History
Date Modified    Username       Field                    Change
======================================================================
2017-02-28 10:05 azurit         New Issue
2017-03-10 16:19 kern           Assigned To               => kern
2017-03-10 16:19 kern           Status                   new => feedback
2017-03-10 16:19 kern           Note Added: 0007469
2017-03-10 16:45 azurit         Note Added: 0007470
2017-03-10 16:45 azurit         Status                   feedback => assigned
2017-03-11 07:36 azurit         File Added: bacula-fd-memory.png
2017-03-11 07:36 azurit         Note Added: 0007471
2017-03-11 09:37 kern           Note Added: 0007472
2017-03-12 18:11 azurit         File Added: bacula-fd-memory-2.png
2017-03-12 18:11 azurit         Note Added: 0007473
2017-03-14 07:37 azurit         File Added: bacula-fd-memory-3.png
2017-03-14 07:37 azurit         Note Added: 0007474
2017-03-14 14:00 kern           Note Added: 0007475
2017-03-14 15:03 azurit         Note Added: 0007476
2017-03-14 15:09 kern           Note Added: 0007477
======================================================================