#42 Recovery Disk Size and get-wal solution

1.x
open
None
2015-02-09
2014-05-05
No

Due to failing software I needed to recover a database of a monitoring system (lots of writes) with PITR:

Backup 20131209T080728:
Server Name       : zbx-sql-002
Status            : DONE
PostgreSQL Version: 90202
PGDATA directory  : /mnt/data/postgresql

Base backup information:
  Disk usage      : 57.0 GiB
  Timeline        : 1
  Begin WAL       : 00000001000003BD000000C0
  End WAL         : 00000001000003BD000000D5
  WAL number      : 22
  Begin time      : 2013-12-09 08:07:28.426213
  End time        : 2013-12-09 09:33:05.236637
  Begin Offset    : 32
  End Offset      : 15489408
  Begin XLOG      : 3BD/C0000020
  End XLOG        : 3BD/D5EC5980

WAL information:
  No of files     : 16026
  Disk usage      : 53.0 GiB
  Last available  : 00000001000003FC000000AE

Catalog information:
  Retention Policy: VALID
  Previous Backup : - (this is the oldest base backup)
  Next Backup     : - (this is the latest base backup)

The original server runs on a 100 GB disk. The base backup is 57 GB and the WAL files take 53 GB, so I figured a 150 GB disk would suffice. However, I didn't check the number of WAL files (16026), which at 16 MB each take more than 250 GB (16026 * 16 MB = 256416 MB), and therefore the restore process halted with an error that the disk was full.
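
The sizing arithmetic can be checked with a few lines of Python. This is only a rough sketch: it assumes the default 16 MiB WAL segment size and that every archived WAL file is expanded to full size on the recovery host. The function name is made up for illustration.

```python
WAL_SEGMENT_MIB = 16  # default PostgreSQL WAL segment size (assumed here)


def recovery_disk_estimate_gib(base_backup_gib, wal_file_count,
                               segment_mib=WAL_SEGMENT_MIB):
    """Return (wal_gib, total_gib) if all WAL files are copied at full size."""
    wal_gib = wal_file_count * segment_mib / 1024.0
    return wal_gib, base_backup_gib + wal_gib


# Figures taken from the backup listing above
wal_gib, total_gib = recovery_disk_estimate_gib(57.0, 16026)
print("WAL files at full size: %.1f GiB" % wal_gib)    # 250.4 GiB
print("Base backup + WAL:      %.1f GiB" % total_gib)  # 307.4 GiB
```

With these numbers a 150 GB disk falls well short when all WAL files are copied up front, which matches the failure described.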

Instead of copying all the WAL files to another location and then having recovery.conf fetch them with a simple copy command, could the restore command be a bit more complex, so that not all files have to be copied first?

Currently the command is:

restore_command = 'cp barman_xlog/%f %p'

But it could be:

restore_command = 'barman get-wal <servername> <backupname> %f <recovery_folder>/%p'

This way each WAL file would be fetched on demand and applied directly by the server, reducing the disk footprint. The recovery process might also be a bit quicker, as it removes a copy step.

A small proof-of-concept edit to cli.py:

import sys

from barman import xlog

@named('get-wal')
@arg('server_name',
     completer=server_completer_all,
     help='specifies the server name for the command')
@arg('wal_id', help='WAL ID')
@arg('wal_dest', help='Destination to place the WAL file')
@expects_obj
def get_wal(args):
    # Resolve the server named on the command line; abort if it is unknown
    server = get_server(args)
    if server is None:
        raise SystemExit("ERROR: unknown server '%s'" % (args.server_name))
    try:
        backup_manager = server.backup_manager
        compressor = backup_manager.compression_manager.get_decompressor()
        # Pass a single WAL file, keyed by its hash directory
        hashdir = xlog.hash_dir(args.wal_id)
        xlogs = {hashdir: [args.wal_id]}
        backup_manager.recover_xlog_copy(compressor, xlogs, args.wal_dest, None)
    except Exception:
        # A non-zero exit status tells PostgreSQL the WAL file was not restored
        sys.exit(1)

And it "works", although I haven't tested it as a restore command. The correct WAL file is copied to the directory I specified. However, one of my servers switched from uncompressed to compressed WAL files. If I choose a compressed WAL file it works, but a WAL file that was archived uncompressed ends up as a compressed WAL file in the wal_dest folder.
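
One possible way to handle a mix of compressed and uncompressed WAL files is to inspect each file rather than rely on the server's current compression setting. The sketch below is not Barman code: it assumes gzip compression, and the function name fetch_wal is hypothetical. It checks for the gzip magic bytes and then either decompresses the file or copies it unchanged.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def fetch_wal(src_path, dest_path):
    """Copy one WAL file to dest_path, decompressing only if it is gzip data.

    Returns True if the source was decompressed, False if copied unchanged.
    """
    # Look at the file content instead of trusting the configured compression
    with open(src_path, "rb") as src:
        is_gzip = src.read(2) == GZIP_MAGIC
    if is_gzip:
        with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
            shutil.copyfileobj(src, dest)
    else:
        shutil.copyfile(src_path, dest_path)
    return is_gzip
```

A per-file check like this would let archives written before and after the compression switch be restored by the same command. Other compressors would need their own detection.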

See original discussion: https://groups.google.com/forum/#!topic/pgbarman/HmwkdRBxNaI
