Anupam Mathur

Background:
Facing a failed hard disk, I came face to face with the fact that I did not have a set backup process. Luckily I had a copy of most of the files, but I nevertheless learned the all-important lesson. I promptly created an image of the remaining data and burned it to an optical disc. Talk about bad luck: the system crashed again a few months later. I was confident that I had a backup image this time, but as luck would have it, the disc had been scratched and the system refused to read it!
After evaluating a few available backup applications, it seemed that no single one served all of my needs, which prompted me to write my own program. The requirements that I formalized:

. Should be file based rather than image based. If a few bytes in an image file get corrupted, you lose the whole backup image.
. Backup format should be non-proprietary.
. Compression, to save space required for backup.
. Handle file modifications (maintain versions).
. Can exclude a configurable list of files.
. Maintain the source file and directory structure, so that in the worst case files can still be recovered manually.
. Batch processing: back up multiple source directories in one run.

Current Program:
Prerequisites:
Python 3.3 or higher
7-zip
Python libraries: logbook

Features:
. Reads input parameters from an external input file. One input file can contain multiple directories for backup.
. Reads configuration parameters from an external file.
. Compresses files individually, creating one archive file for each source file.
. Mirrors the source directory structure under the target directory.
. Files are managed and identified through their checksum (SHA256).
. Uses a SQL database to manage backup file information. A SQL database keeps this information better organised and makes searches faster.
. Uses the excellent 7-zip utility for compression.
. Uses the LZMA2 algorithm with the 'fast' option. This gives one of the fastest response times (only LZMA2 'fastest' is quicker, by a small margin) while still compressing well: it beats zip, rar and other formats, and only the higher compression levels of LZMA/LZMA2 produce smaller archives, at a high performance cost. For details see: http://www.tomshardware.com/reviews/winrar-winzip-7-zip-magicrar,3436-7.html
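
A minimal sketch of how the features above could fit together, not the program's actual code. It assumes SQLite (via the standard sqlite3 module) as the SQL database, a hypothetical table named backup_file, and an archive name formed by appending ".7z" to the source file name. The 7-zip switches shown (a, -t7z, -m0=lzma2, -mx=3) are standard 7-zip command-line options, but whether the program passes exactly these is an assumption. The command is only built here, not executed.

```python
import hashlib
import os
import sqlite3


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirrored_archive_path(source_file, source_root, backup_root):
    """Map a source file to an archive path under the mirrored target tree."""
    relative = os.path.relpath(source_file, source_root)
    # One archive per source file: keep the original name and append ".7z"
    # (the naming scheme is an assumption).
    return os.path.join(backup_root, relative + ".7z")


def seven_zip_command(zip_executable, archive_path, source_file):
    """Build (but do not run) a 7-zip command using LZMA2 at the 'fast' level."""
    return [
        zip_executable, "a",   # "a" adds files to an archive
        "-t7z",                # 7z container format
        "-m0=lzma2",           # LZMA2 compression method
        "-mx=3",               # level 3 is the "Fast" preset in 7-zip
        archive_path, source_file,
    ]


def open_catalog(database_file):
    """Open (or create) the hypothetical catalog table of backed-up files."""
    connection = sqlite3.connect(database_file)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS backup_file ("
        " source_path TEXT NOT NULL,"
        " checksum TEXT NOT NULL,"
        " archive_path TEXT NOT NULL,"
        " PRIMARY KEY (source_path, checksum))"
    )
    return connection


def needs_backup(connection, source_file, checksum):
    """Return True when this path/checksum pair has not been recorded yet."""
    row = connection.execute(
        "SELECT 1 FROM backup_file WHERE source_path = ? AND checksum = ?",
        (source_file, checksum),
    ).fetchone()
    return row is None


def record_backup(connection, source_file, checksum, archive_path):
    """Store the path, checksum and archive location of a backed-up file."""
    connection.execute(
        "INSERT INTO backup_file (source_path, checksum, archive_path)"
        " VALUES (?, ?, ?)",
        (source_file, checksum, archive_path),
    )
    connection.commit()
```

In this sketch a modified file produces a new checksum, so needs_backup returns True for it again. That is the point where version handling would hook in; how the real program names or stores older versions is not shown here.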

Input parameters:
1. Active: Whether this directory should be processed for backup. Accepted values: Y/N.
2. SourcePath: The directory that needs to be backed up.
3. BackupPath: The destination directory. The backup directory structure will be created under this folder.
4. Compress: Whether files in this backup should be compressed (archived with 7-zip). Accepted values: Y/N.
5. Encrypted: Whether files in this backup should be encrypted. Accepted values: Y/N. ToDo: Pending feature implementation.
6. CompressMinSize: Compression threshold size. Files smaller than this size will not be compressed. ToDo: Pending feature implementation.
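
The format of the input file is not described here, so the following is only a sketch under a stated assumption: a CSV file whose header row uses the parameter names listed above. The BackupJob tuple, the parse_flag and read_jobs helpers, the sample paths, and treating CompressMinSize as a plain integer are all hypothetical.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record holding one line of the input file.
BackupJob = namedtuple(
    "BackupJob",
    "active source_path backup_path compress encrypted compress_min_size",
)

# Assumed CSV layout: one header row, then one backup directory per row.
SAMPLE_INPUT = (
    "Active,SourcePath,BackupPath,Compress,Encrypted,CompressMinSize\n"
    "Y,/home/user/docs,/mnt/backup/docs,Y,N,0\n"
    "N,/home/user/tmp,/mnt/backup/tmp,N,N,0\n"
)


def parse_flag(value):
    """Convert a Y/N flag to a bool, rejecting any other value."""
    text = value.strip().upper()
    if text not in ("Y", "N"):
        raise ValueError("expected Y or N, got {!r}".format(value))
    return text == "Y"


def read_jobs(handle):
    """Read every backup directory definition from an open CSV file."""
    jobs = []
    for row in csv.DictReader(handle):
        jobs.append(BackupJob(
            active=parse_flag(row["Active"]),
            source_path=row["SourcePath"],
            backup_path=row["BackupPath"],
            compress=parse_flag(row["Compress"]),
            encrypted=parse_flag(row["Encrypted"]),
            # An empty cell is treated as "no threshold".
            compress_min_size=int(row["CompressMinSize"] or 0),
        ))
    return jobs


# Only directories marked Active = Y would be processed.
active_jobs = [job for job in read_jobs(io.StringIO(SAMPLE_INPUT)) if job.active]
```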

Configuration Parameters:
1. ExcludeFileType: List of file types that are excluded from the backup.
2. DoNotCompressFileType: List of file types that will not be compressed/archived, even if the Compress flag is set to Y. These formats are already compressed, so compressing them again costs the backup process time and resources for only a marginal saving in storage. This is especially true for most media files.
3. LogFileName: Name of the log file.
4. DatabaseFileName: Name of the database file.
5. ZipExecutablePath: Directory path for the 7-zip executable.
6. DatabaseFilePath: Directory path for the database file.
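
As an illustration of how ExcludeFileType and DoNotCompressFileType could interact with the Compress flag, here is a small sketch. It assumes an INI-style configuration file read with the standard configparser module, a hypothetical [backup] section, and file types written as comma-separated extensions; the real configuration format may differ.

```python
import configparser
import os

# Assumed layout of the configuration file; all values are illustrative.
SAMPLE_CONFIG = """
[backup]
ExcludeFileType = .tmp, .bak
DoNotCompressFileType = .jpg, .mp3, .zip
LogFileName = backup.log
DatabaseFileName = backup.db
"""


def parse_extensions(raw):
    """Split a comma-separated list of extensions into a lowercase set."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def load_config(text):
    """Parse the configuration text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["backup"]
    return {
        "exclude": parse_extensions(section.get("ExcludeFileType", "")),
        "do_not_compress": parse_extensions(
            section.get("DoNotCompressFileType", "")),
        "log_file_name": section.get("LogFileName", ""),
        "database_file_name": section.get("DatabaseFileName", ""),
    }


def decide_action(file_name, compress_flag, config):
    """Return 'skip', 'compress' or 'copy' for a single source file."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension in config["exclude"]:
        return "skip"          # excluded types are never backed up
    if compress_flag and extension not in config["do_not_compress"]:
        return "compress"      # archived with 7-zip
    return "copy"              # backed up as-is, without compression


config = load_config(SAMPLE_CONFIG)
```

With this sample configuration, a .jpg file is copied as-is even when Compress is Y, while a .tmp file is skipped entirely.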

Project Members: