Due to failing software I needed to recover a database of a monitoring system (lots of writes) with PITR:
Backup 20131209T080728:
  Server Name       : zbx-sql-002
  Status            : DONE
  PostgreSQL Version: 90202
  PGDATA directory  : /mnt/data/postgresql

  Base backup information:
    Disk usage      : 57.0 GiB
    Timeline        : 1
    Begin WAL       : 00000001000003BD000000C0
    End WAL         : 00000001000003BD000000D5
    WAL number      : 22
    Begin time      : 2013-12-09 08:07:28.426213
    End time        : 2013-12-09 09:33:05.236637
    Begin Offset    : 32
    End Offset      : 15489408
    Begin XLOG      : 3BD/C0000020
    End XLOG        : 3BD/D5EC5980

  WAL information:
    No of files     : 16026
    Disk usage      : 53.0 GiB
    Last available  : 00000001000003FC000000AE

  Catalog information:
    Retention Policy: VALID
    Previous Backup : - (this is the oldest base backup)
    Next Backup     : - (this is the latest base backup)
The original server runs on a 100GB disk. The base backup is 57GB and the WAL files take 53GB, so I figured a 150GB disk would suffice. However, I didn't check the number of WAL files (16026). Once they are uncompressed at 16MB each, they take up more than 250GB (16026*16=256416MB), and therefore the restore process halted with an error that the disk was full.
Instead of copying all the WAL files to another location and then letting recovery.conf pick them up with a simple copy command, could the restore command be a bit smarter, so that not all files have to be copied up front?
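The space estimate above can be written down as a small calculation. This is only a rough sketch: it assumes the default 16 MiB WAL segment size and that every archived WAL file ends up uncompressed next to the restored base backup. The function name `restore_space_gib` is made up for illustration and is not part of barman.

```python
WAL_SEGMENT_MIB = 16  # assumption: default PostgreSQL WAL segment size


def restore_space_gib(base_backup_gib, wal_file_count, segment_mib=WAL_SEGMENT_MIB):
    """Estimate disk space (GiB) for the base backup plus all uncompressed WAL files."""
    wal_gib = wal_file_count * segment_mib / 1024.0
    return base_backup_gib + wal_gib


if __name__ == "__main__":
    # Numbers taken from the backup listing: 57.0 GiB base backup, 16026 WAL files.
    print("%.1f GiB" % restore_space_gib(57.0, 16026))
```

With the numbers from this backup the WAL files alone come to roughly 250 GiB, so a 150GB disk cannot hold a full copy.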
Currently the command is:
restore_command = 'cp barman_xlog/%f %p'
But it could be:
restore_command = 'barman get-wal <servername> <backupname> %f <recovery_folder>/%p'
This way each WAL file would be fetched only when the server asks for it and applied directly, which reduces the disk footprint. The recovery process might also be a bit quicker, since it removes a copy step.
A small proof-of-concept edit to cli.py:
from barman import xlog

@named('get-wal')
@arg('server_name',
     completer=server_completer_all,
     help='specifies the server name for the command')
@arg('wal_id', help='WAL ID')
@arg('wal_dest', help='Destination to place the WAL file')
@expects_obj
def get_wal(args):
    server = get_server(args)
    if server is None:
        raise SystemExit("ERROR: unknown server '%s'" % (args.server_name))
    try:
        backup_manager = server.backup_manager
        compressor = backup_manager.compression_manager.get_decompressor()
        # Build the {hash directory: [WAL names]} mapping for the single requested file
        xlogs = {}
        hashdir = xlog.hash_dir(args.wal_id)
        xlogs[hashdir] = [args.wal_id]
        backup_manager.recover_xlog_copy(compressor, xlogs, args.wal_dest, None)
    except Exception:
        # Any failure is reported to the caller as a non-zero exit status
        sys.exit(1)
And it "works", although I haven't tested it as a restore command yet. The correct WAL file is copied to the directory I specified. However, one of my servers switched from normal to compressed WAL files. If I choose a compressed WAL file it works, but a WAL file that is stored uncompressed ends up as a compressed WAL file in the wal_dest folder.
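One possible way to cope with an archive that mixes compressed and uncompressed WAL files is to look at each file before deciding how to copy it. The following is a minimal standalone sketch and not barman code: it assumes gzip is the compression in use, and the helper names `is_gzip` and `fetch_wal` are hypothetical. It relies on the two-byte gzip magic number at the start of the file, on the assumption that an uncompressed WAL segment does not start with those bytes.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def fetch_wal(src, dest):
    """Copy one archived WAL file to `dest`, decompressing only if it is gzip."""
    if is_gzip(src):
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        # Already uncompressed: copy the bytes through unchanged.
        shutil.copyfile(src, dest)
```

A per-file check like this would avoid depending on the server's current compression setting, which may differ from the setting that was active when an older WAL file was archived.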
See original discussion: https://groups.google.com/forum/#!topic/pgbarman/HmwkdRBxNaI