Menu

#92 Remove excess logging messages

2.x
open
nobody
logging (1)
2018-06-14
2018-06-14
Greg Clough
No

Hi,

I want some of the INFO messages, but there are many that I'd prefer to mute as they just seem to create noise without any tangible benefit. I’m a new Barman user, so I'd be happy to be told why they are necessary.

Here is my opinion, reading the log file:

1) Most of the lines in my log file are telling me that there are no xlog segments, which doesn’t seem like something that needs to be captured every 1/10th of a second. Can I request that these are only put into the log file when there is an error, as it seems they are creating noise for no benefit.

[472] (barman@barman01 /var/log/barman)$ wc -l *log
32562 barman.log
[473] (barman@barman01 /var/log/barman)$ grep 'INFO: No xlog segments found from streaming for ' *log | wc -l
22310

...
2018-06-14 08:00:03,294 [701] barman.wal_archiver INFO: No xlog segments found from streaming for pgcluster01.
2018-06-14 08:00:03,395 [701] barman.wal_archiver INFO: No xlog segments found from streaming for pgcluster01.
2018-06-14 08:00:03,495 [701] barman.wal_archiver INFO: No xlog segments found from streaming for pgcluster01.
...

2) The second most frequent log message happens each minute when “barman cron” runs. I’m right in thinking that this should be scheduled each minute, correct? (http://docs.pgbarman.org/release/2.4/ - "Our recommendation is to schedule barman cron to run every minute.")

Can I request that this is only put into the log file when there is an error, as it seems they are also creating noise for no benefit.

[480] (barman@barman01 /var/log/barman)$ wc -l log
32572 barman.log
 [482] (barman@barman01 /var/log/barman)$ grep 'INFO: Another cron process is already running on server ' log | wc -l
8627

...
2018-06-10 04:06:01,388 [17248] barman.server INFO: Another cron process is already running on server postgres01. Skipping to the next server
2018-06-10 04:06:01,441 [17249] barman.server INFO: Another cron process is already running on server pgcluster01. Skipping to the next server
2018-06-10 04:07:01,896 [17262] barman.server INFO: Another cron process is already running on server postgres01. Skipping to the next server
2018-06-10 04:07:01,953 [17261] barman.server INFO: Another cron process is already running on server pgcluster01. Skipping to the next server
2018-06-10 04:08:01,323 [17274] barman.server INFO: Another cron process is already running on server postgres01. Skipping to the next server
2018-06-10 04:08:01,364 [17272] barman.server INFO: Another cron process is already running on server pgcluster01. Skipping to the next server
...

Just doing these two cuts the noise in the logfile by 95%, and allows actual activity to show through:

[486] (barman@barman01 /var/log/barman)$ wc -l *log
32588 barman.log
[487] (barman@barman01 /var/log/barman)$ cat *log | grep -v 'INFO: Another cron process is already running on server ' | grep -v 'INFO: No xlog segments found from streaming for ' | wc -l
1627

[488] (barman@barman01 /var/log/barman)$ cat *log | grep -v 'INFO: Another cron process is already running on server ' | grep -v 'INFO: No xlog segments found from streaming for ' | head -10
2018-06-10 04:00:01,934 [17155] barman.server INFO: Force a CHECKPOINT before pg_switch_wal()
2018-06-10 04:00:02,110 [17156] barman.server INFO: Force a CHECKPOINT before pg_switch_wal()
2018-06-10 04:00:02,295 [17155] barman.server INFO: The WAL file 000000010000000300000018 has been closed on server 'pgcluster01'
2018-06-10 04:00:02,296 [17155] barman.server INFO: Waiting for the WAL file 000000010000000300000018 from server 'pgcluster01' (max: 30 seconds)
2018-06-10 04:00:02,458 [21576] barman.command_wrappers INFO: pgcluster01: pg_receivexlog: finished segment at 3/19000000 (timeline 1)
2018-06-10 04:00:02,459 [17156] barman.server INFO: The WAL file 0000000100000002000000A3 has been closed on server 'postgres01'
2018-06-10 04:00:02,460 [17156] barman.server INFO: Waiting for the WAL file 0000000100000002000000A3 from server 'postgres01' (max: 30 seconds)
2018-06-10 04:00:02,569 [26378] barman.command_wrappers INFO: postgres01: pg_receivexlog: finished segment at 2/A4000000 (timeline 1)
2018-06-10 04:00:02,701 [17155] barman.wal_archiver INFO: Found 1 xlog segments from streaming for pgcluster01. Archive all segments in one run.
2018-06-10 04:00:02,701 [17155] barman.wal_archiver INFO: Archiving segment 1 of 1 from streaming: pgcluster01/000000010000000300000018

3) When backups/WAL are deleted due to a retention policy, the server name should be tagged on all actions. I have two servers that cleanup simultaneously, both of which have the same backup name “20180516T000001”.

The initial reporting of actions includes the backup name and server name:

2018-06-14 08:01:02,352 [746] barman.retention_policies INFO: Reporting backup 20180516T000001 for server postgres01 as OBSOLETE (older than 2018-05-17 08:01:02.352830+01:00)
2018-06-14 08:01:02,361 [746] barman.backup INFO: Enforcing retention policy: removing backup 20180516T000001 for server postgres01
2018-06-14 08:01:02,362 [746] barman.backup INFO: Deleting backup 20180516T000001 for server postgres01
2018-06-14 08:01:02,370 [744] barman.retention_policies INFO: Reporting backup 20180516T000001 for server pgcluster01 as OBSOLETE (older than 2018-05-17 08:01:02.370902+01:00)
2018-06-14 08:01:02,383 [744] barman.backup INFO: Enforcing retention policy: removing backup 20180516T000001 for server pgcluster01
2018-06-14 08:01:02,384 [744] barman.backup INFO: Deleting backup 20180516T000001 for server pgcluster01

Subsequent log entries or deleting the WAL, and the completion of the backup deletion do not include enough information to determine which backup/server they relate to. This could be relevant to debugging, and if I have many more servers on here then it's only going to get worse. I could (and probably will) run them at different times, but IMHO it still would make sense to know which server/cluster these lines relate to:

2018-06-14 08:01:02,686 [746] barman.backup INFO: Delete associated WAL segments:
2018-06-14 08:01:02,743 [746] barman.backup INFO:       000000010000000000000018
2018-06-14 08:01:02,745 [746] barman.backup INFO:       000000010000000000000019
2018-06-14 08:01:02,746 [746] barman.backup INFO:       00000001000000000000001A
...
2018-06-14 08:01:02,784 [746] barman.backup INFO:       000000010000000000000043
2018-06-14 08:01:02,785 [746] barman.backup INFO:       000000010000000000000044
2018-06-14 08:01:02,786 [746] barman.backup INFO:       000000010000000000000045
2018-06-14 08:01:02,787 [746] barman.backup INFO: Deleted backup 20180516T000001 (start time: Thu Jun 14 08:01:02 2018, elapsed time: less than one second)
2018-06-14 08:01:02,790 [744] barman.backup INFO: Delete associated WAL segments:
2018-06-14 08:01:02,843 [744] barman.backup INFO:       000000010000000000000068
2018-06-14 08:01:02,844 [744] barman.backup INFO:       000000010000000000000069
2018-06-14 08:01:02,845 [744] barman.backup INFO:       00000001000000000000006A
...
2018-06-14 08:01:02,910 [744] barman.backup INFO:       00000001000000000000009E
2018-06-14 08:01:02,911 [744] barman.backup INFO:       00000001000000000000009F
2018-06-14 08:01:02,913 [744] barman.backup INFO:       0000000100000000000000A0
2018-06-14 08:01:02,914 [744] barman.backup INFO: Deleted backup 20180516T000001 (start time: Thu Jun 14 08:01:02 2018, elapsed time: less than one second)

Thanks.
Greg Clough

Discussion


Log in to post a comment.

MongoDB Logo MongoDB