NAME File::RsyBak - Backup files/directories with histories, using rsync VERSION This document describes version 0.361 of File::RsyBak (from Perl distribution File-RsyBak), released on 2019-03-11. SYNOPSIS From your Perl program: use File::RsyBak qw(backup); backup( source => '/path/to/mydata', target => '/backup/mydata', histories => [-7, 4, 3], # 7 days, 4 weeks, 3 months extra_rsync_opts => [qw/--exclude Cache --exclude cache --exclude tmp --exclude temp/], ); Or, just use the provided script from the command-line: % rsybak --source /path/to/mydata --target /backup/mydata --extra-rsync-opts-json '["--exclude","Cache","--exclude","cache","--exclude","tmp","--exclude","temp"]' Or, if you have this configuration in "/etc/rsybak.conf" or "~/rsybak.conf": [system] source = /path/to/mydata target = /backup/mydata ; also specify as JSON extra_rsync_opts = ["--exclude","Cache","--exclude","cache","--exclude","tmp","--exclude","temp"] you can just run: % rsybak --config-profile system Example resulting backup (after several runs so that backup history has accumulated): % ls /path/to/mydata/ myfile anotherfile mydir/ % ls /backup/mydata/ current/ hist.2013-10-31@12:04:17+00/ hist.2013-11-01@12:09:31+00/ hist.2013-11-02@12:09:41+00t/ hist.2013-11-03@12:15:02+00/ hist.2013-11-04@12:13:19+00/ hist.2013-11-05@12:11:31+00/ hist2.2013-10-08@12:07:50+00/ hist2.2013-10-15@12:06:03+00/ hist2.2013-10-21@12:02:42+00/ hist2.2013-10-27@12:06:25+00t/ hist3.2013-06-25@12:15:39+00/ hist3.2013-08-31@12:05:31+00/ hist3.2013-10-02@12:05:57+00/ Each directory under "/backup/mydata" is a "snapshot" backup of "/path/to/mydata": % ls /backup/mydata/current/ myfile anotherfile mydir/ % ls /backup/mydata/hist.2013-10-31@12:04:17+00/ myfile anotherfile mydir/ % ls /backup/mydata/hist3.2013-10-02@12:05:57+00/ myfile anotherfile mydir/ someoldfile DESCRIPTION This module is basically just a wrapper around rsync to create a filesystem backup system. Some characteristics of this backup system: * Supports backup histories and history levels For example, you can create 7 level-1 backup histories (equals 7 daily histories if you run backup once daily), 4 level-2 backup histories (equals 4 weekly histories) and 3 level-3 backup histories (roughly equals 3 monthly histories). The number of levels and history per levels are customizable. * Backups (and histories) are not compressed/archived ("tar"-ed) They are just verbatim copies (produced by "rsync -a") of source directory. The upside of this is ease of cherry-picking (taking/restoring individual files from backup). The downside is lack of compression and the backup not being a single archive file. This is because rsync needs two real directory trees when comparing. Perhaps if rsync supports tar virtual filesystem in the future... * Hardlinks are used between backup histories to save disk space This way, we can maintain several backup histories without wasting too much space duplicating data when there are not a lot of differences among them. * High performance Rsync is implemented in C and has been optimized for a long time. rm is also used instead of Perl implementation File::Path::remove_path. * Unix-specific There are ports of rsync and rm on Windows, but this module hasn't been tested on those platforms. HOW IT WORKS First-time backup (when TARGET does not yet exist) First, we lock target directory to prevent other backup process from interfering: mkdir -p TARGET flock TARGET/.lock Then we copy source to temporary directory: rsync SRC TARGET/.tmp If copy finishes successfully, we rename temporary directory to final directory 'current': touch TARGET/.current.timestamp rename TARGET/.tmp TARGET/current If copy fails in the middle, TARGET/.tmp will still be lying around and the next backup run will just continue the rsync process: rsync SRC TARGET/.tmp Finally, we remove lock: unlock TARGET/.lock Subsequent backups (after TARGET/current exists) First, we lock target directory to prevent other backup process from interfering: flock TARGET/.lock Then we rsync source to target directory (using --link-dest=TARGET/current): rsync SRC TARGET/.tmp If rsync finishes successfully, we rename target directories: rename TARGET/current TARGET/hist.<timestamp> touch TARGET/.current.timestamp rename TARGET/.tmp TARGET/current where <timestamp> is the mtime of TARGET/.current.timestamp file. If rsync fails in the middle, TARGET/.tmp will be lying around and the next backup run will just continue the rsync process. Finally, we remove lock: unlock TARGET/.lock Maintenance of histories/history levels TARGET/hist.* are level-1 backup histories. Each backup run will produce a new history: TARGET/hist.<timestamp1> TARGET/hist.<timestamp2> # produced by the next backup TARGET/hist.<timestamp3> # and the next ... TARGET/hist.<timestamp4> # and so on ... TARGET/hist.<timestamp5> ... You can specify the number of histories (or number of days) to maintain. If the number of histories exceeds the limit, older histories will be deleted, or one will be promoted to the next level, if a higher level is specified. For example, with histories being set to [7, 4, 3], after TARGET/hist.<timestamp8> is created, TARGET/hist.<timestamp1> will be promoted to level 2: rename TARGET/hist.<timestamp1> TARGET/hist2.<timestamp1> TARGET/hist2.* directories are level-2 backup histories. After a while, they will also accumulate: TARGET/hist2.<timestamp1> TARGET/hist2.<timestamp8> TARGET/hist2.<timestamp15> TARGET/hist2.<timestamp22> When TARGET/hist2.<timestamp29> arrives, TARGET/hist2.<timestamp1> will be promoted to level 3: TARGET/hist3.<timestamp1>. After a while, level-3 backup histories too will accumulate: TARGET/hist3.<timestamp1> TARGET/hist3.<timestamp29> TARGET/hist3.<timestamp57> Finally, TARGET/hist3.<timestamp1> will be deleted after TARGET/hist3.<timestamp85> comes along. FUNCTIONS backup Usage: backup(%args) -> [status, msg, payload, meta] Backup files/directories with histories, using rsync. Examples: * Example #1: backup( source => "/home/jajang/mydata", target => "/backup/jajang/mydata"); Backup /home/jajang/mydata to /backup/jajang/mydata using the default number of histories ([-7, 4, 3]). This function is not exported by default, but exportable. Arguments ('*' denotes required arguments): * backup => *bool* (default: 1) Whether to do backup or not. If backup=1 and rotate=0 then will only create new backup without rotating histories. * extra_dir => *bool* Whether to force creation of source directory in target. If set to 1, then backup(source => '/a', target => '/backup/a') will create another 'a' directory in target, i.e. /backup/a/current/a. Otherwise, contents of a/ will be directly copied under /backup/a/current/. Will always be set to 1 if source is more than one, but default to 0 if source is a single directory. You can set this to 1 to so that behaviour when there is a single source is the same as behaviour when there are several sources. * extra_rsync_opts => *array[str]* Pass extra options to rsync command. Extra options to pass to rsync command when doing backup. Note that the options will be shell quoted, , so you should pass it unquoted, e.g. ['--exclude', '/Program Files']. * histories => *array[int]* (default: [-7,4,3]) Histories/history levels. Specifies number of backup histories to keep for level 1, 2, and so on. If number is negative, specifies number of days to keep instead (regardless of number of histories). * rotate => *bool* (default: 1) Whether to do rotate after backup or not. If backup=0 and rotate=1 then will only do history rotating. * source* => *str* Director(y|ies) to backup. * target* => *str* Backup destination. Returns an enveloped result (an array). First element (status) is an integer containing HTTP status code (200 means OK, 4xx caller error, 5xx function error). Second element (msg) is a string containing error message, or 'OK' if status is 200. Third element (payload) is optional, the actual result. Fourth element (meta) is called result metadata and is optional, a hash that contains extra information. Return value: (any) FAQ How do I exclude some directories? Just use rsync's --exclude et al. Pass them to extra_rsync_opts. What is a good backup practice (using RsyBak)? Just follow the general practice. While this is not a place to discuss backups in general, some of the principles are: * backup regularly (e.g. once daily or more often) * automate the process (else you'll forget) * backup to another disk partition and computer * verify your backups often (what good are they if they can't be restored) * make sure your backup is secure (encrypted, correct permission, etc) How do I restore backups? Backups are just verbatim copies of files/directories, so just use whatever filesystem tools you like. How to do remote backup? From your backup host: [BAK-HOST]% rsybak --source USER@SRC-HOST:/path --dest /backup/dir Or alternatively, you can backup on SRC-HOST locally first, then send the resulting backup to BAK-HOST. HISTORY The idea for this module came out in 2006 as part of the Spanel hosting control panel project. We need a daily backup system for shared hosting accounts that supports histories and cherry-picking. Previously we had been using a Python-based script rdiff-backup. It was not very robust, the script chose to exit on many kinds of non-fatal errors instead of ignoring the errors and continuning backup. It was also very slow: on a server with hundreds of accounts with millions of files, backup process often took 12 hours or more. After evaluating several other solutions, we realized that nothing beats the raw performance of rsync. Thus we designed a simple backup system based on it. First public release of this module is in Feb 2011. I have since used this script in various production servers as well as personal PCs/laptops. HOMEPAGE Please visit the project's homepage at <https://metacpan.org/release/File-RsyBak>. SOURCE Source repository is at <https://github.com/perlancar/perl-File-RsyBak>. BUGS Please report any bugs or feature requests on the bugtracker website <https://rt.cpan.org/Public/Dist/Display.html?Name=File-RsyBak> When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature. SEE ALSO File::Backup File::Rotate::Backup Snapback2, which is a backup system using the same basic principle (rsync snapshots), created in as early as 2004 (or earlier) by Mike Heins. Do check it out. I wish I had found it first before reinventing it in 2006 :-) AUTHOR perlancar <perlancar@cpan.org> COPYRIGHT AND LICENSE This software is copyright (c) 2019, 2017, 2015, 2014, 2013, 2012, 2011 by perlancar@cpan.org. This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.