
#101 "Error: xx is already present in server xx. File moved to errors directory"

Milestone: 2.x

Status: closed

Owner: nobody

Labels: None

Updated: 2019-12-03

Created: 2019-12-03

Creator: mac

Private: No

Hello,

Barman has run flawlessly for years... literally 2 years without a reboot. All of a sudden, it got stuck with the following error:

Archiving segment 1 of 1 from file archival: star/00000001000004A80000002F
    Error: 00000001000004A80000002F is already present in server star. File moved to errors directory.

I can confirm that this file is present twice on the system, with different contents:

find . | grep 00000001000004A80000002F
./errors/00000001000004A80000002F.20191203T073401Z.duplicate
./wals/00000001000004A8/00000001000004A80000002F

ls -lah ./errors/00000001000004A80000002F.20191203T073401Z.duplicate ./wals/00000001000004A8/00000001000004A80000002F
-rw------- 1 barman barman 16M Dec 3 08:33 ./errors/00000001000004A80000002F.20191203T073401Z.duplicate
-rw------- 1 barman barman 27K Jan 9 2018 ./wals/00000001000004A8/00000001000004A80000002F
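
The size difference alone may not prove that the contents differ: if compression is enabled in the Barman configuration, the copy under wals/ is stored compressed. A minimal sketch for comparing the two copies by checksum, assuming gzip is the compression in use, if any (the function names are made up for illustration):

```shell
# Print the SHA-256 of a WAL file, decompressing it first if it is gzip data.
# Assumption: any compression applied to the archive is gzip.
wal_checksum() {
    if gzip -t "$1" 2>/dev/null; then
        gzip -dc "$1" | sha256sum | cut -d' ' -f1
    else
        sha256sum < "$1" | cut -d' ' -f1
    fi
}

# Succeed only when both files hold the same (uncompressed) content.
same_wal_content() {
    [ "$(wal_checksum "$1")" = "$(wal_checksum "$2")" ]
}

# Example call with the paths from this ticket:
#   same_wal_content ./wals/00000001000004A8/00000001000004A80000002F \
#       ./errors/00000001000004A80000002F.20191203T073401Z.duplicate \
#       && echo "same content" || echo "different content"
```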

Moving the files out of the errors directory doesn't fix the issue: it reappears when the next WAL file gets pulled. I have tried barman switch-wal --force --archive star as well, and it doesn't help.

The detailed logs on the Barman side are:

Setup:

Barman running on Debian 8.11

barman --version
2.3

PostgreSQL running on Debian 9.9

PostgreSQL 10.10 (Debian 10.10-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Streaming replication setup:

cat /etc/postgresql/10/main/conf.d/archive.conf

wal_level = replica
archive_mode = on
archive_command = 'rsync -a %p barman@xxx:/var/lib/barman/star/incoming/%f'
archive_timeout = 60

max_wal_size = 10GB

Note that I have increased max_wal_size to 10GB in an attempt to fix the issue. When checking for anomalies, I saw this in the server logs:

2019-12-03 02:13:09.875 CET [27780] HINT: Consider increasing the configuration parameter "max_wal_size".
2019-12-03 02:13:23.247 CET [27780] LOG: checkpoints are occurring too frequently (14 seconds apart)
2019-12-03 02:13:23.247 CET [27780] HINT: Consider increasing the configuration parameter "max_wal_size".
2019-12-03 02:13:37.152 CET [27780] LOG: checkpoints are occurring too frequently (14 seconds apart)
2019-12-03 02:13:37.152 CET [27780] HINT: Consider increasing the configuration parameter "max_wal_size".
2019-12-03 02:13:51.370 CET [27780] LOG: checkpoints are occurring too frequently (14 seconds apart)
2019-12-03 02:13:51.370 CET [27780] HINT: Consider increasing the configuration parameter "max_wal_size".

But it doesn't seem to fix the issue.

Any idea on how to troubleshoot the issue? Or do you need more information?

Thank you :)

Discussion

Marco Nenciarini - 2019-12-03

Hi,

what you see means that at some time in the past, your postgres server restarted the WAL sequence. That usually happens when you use pg_upgrade or you replace your cluster in another way without changing the name of the server in barman. After some time you reach the old WAL name, hence the error you see.
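
To see why a restarted sequence eventually collides: a WAL segment file name is 24 hexadecimal digits, 8 for the timeline, 8 for the log number and 8 for the segment number. A fresh cluster starts counting again on timeline 1, so it will sooner or later produce a name that the old cluster already archived. A small sketch that decodes a name (decode_wal_name is a made-up helper, not part of Barman or PostgreSQL):

```shell
# Split a 24-hex-digit WAL segment name into its three numeric parts.
# The body runs in a subshell, so its variables do not leak to the caller.
decode_wal_name() (
    name="$1"
    if ! printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{24}$'; then
        echo "not a WAL segment name: $name" >&2
        exit 1
    fi
    timeline=$(printf '%d' "0x$(printf '%s' "$name" | cut -c1-8)")
    log=$(printf '%d' "0x$(printf '%s' "$name" | cut -c9-16)")
    segment=$(printf '%d' "0x$(printf '%s' "$name" | cut -c17-24)")
    echo "timeline=$timeline log=$log segment=$segment"
)

# Example:
#   decode_wal_name 00000001000004A80000002F
#   prints: timeline=1 log=1192 segment=47
```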

Version 2.10 of barman (it will be officially out on 5 December 2019) has a new mechanism to prevent this kind of issue.

It introduces verification of the PostgreSQL instance's system identifier in the check command, in order to prevent users from executing commands in Barman when there is an inconsistency between the situation on disk and the live information coming from the PostgreSQL connection(s). Barman will prevent users from taking a backup or archiving a WAL file on an existing folder that contains data from another instance with a different identifier.

The best way to fix the issue in your installation is to move the whole /var/lib/barman/star directory to a different name and restart from scratch with a new backup.
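
A rough sketch of that procedure, to be run as the barman user. The exact sequence of commands after the move is an assumption, not an official procedure, and restart_barman_server and run are made-up helpers. By default the sketch only prints the commands (dry run); set DRY_RUN=0 to execute them.

```shell
# Echo the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: server name, Barman home directory, suffix for the old directory.
restart_barman_server() (
    server="$1"
    home="${2:-/var/lib/barman}"
    suffix="${3:-old}"
    # Set the old archive aside instead of deleting it.
    run mv "$home/$server" "$home/$server.$suffix"
    # Assumption: check recreates the server directories; it may still
    # report failures until the first WAL file has been archived.
    run barman check "$server"
    # Force a WAL switch and wait for the segment to reach the archive.
    run barman switch-wal --force --archive "$server"
    # Take a new base backup to restart from scratch.
    run barman backup "$server"
)

# Example: restart_barman_server star /var/lib/barman old
```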


Marco Nenciarini - 2019-12-03

status: open --> closed

Milestone: 1.x --> 2.x

mac - 2019-12-03

I had overlooked the creation date of the WAL files... one of them is more than 1 year old, and we actually have a bunch of them.

For some reason they were not purged by Barman - given the age of those files, I don't know if this was linked to a Barman upgrade or to some other reason.

Deleting those old WAL files has fixed the issue.
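
For anyone hitting the same thing, a sketch for spotting such leftovers. It only lists files and deletes nothing; list_old_wals is a made-up name, the 365-day default is an arbitrary threshold, and GNU find is assumed for -printf.

```shell
# List files under a WAL archive directory whose modification time is
# more than the given number of days in the past, oldest first.
list_old_wals() (
    wals_dir="$1"
    days="${2:-365}"
    find "$wals_dir" -type f -mtime +"$days" -printf '%TY-%Tm-%Td %p\n' | sort
)

# Example: list_old_wals /var/lib/barman/star/wals 365
```

Compare the listing against the oldest backup you still want to restore from before removing anything. Barman also keeps its own index of archived WAL files, so after a manual cleanup it is probably worth running barman rebuild-xlogdb star.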

As far as I'm concerned this ticket can be closed.
