some notes

Backup Linux to NTFS Using GNU tar

We have a nice, disk-based backup system at work that uses BackupExec, and a disaster recovery policy that sees full backups stored offsite every week.

Rather than duplicate the system for the unix servers I manage, I've been writing my backups to a share on the Windows system, and they get backed up and archived with everything else.

But because the share is NTFS, I have to make a really nasty choice: preserve unix permissions and file times by using archives (which back up everything every night), or use rsync to back up only what's changed but lose all the file metadata thanks to Windows' brain-dead filesystem.

Additionally, the share that I'm writing to has a 2GB filesize limit, which seems to preclude the use of tar altogether, since full backups are at 6GB and growing for our media server.

But then I discovered that GNU tar can do incremental backups! And there is a handy unix utility called split which will break files (or standard input) into conveniently-sized chunks.

Here, then, is how I solved my backup dilemma: incremental tar piped to split.

Before we get started, I should say that most of the backup utilities I've looked at are too complex (bacula), require a program to run on the backup server (rdiff-backup), or aren't the right fit for some other reason. Many specialize in writing to CD-R; others have really insane dependencies. I have always been really happy with simple solutions involving standard GNU tools.

Warning: you must have a general understanding of the command line and GNU tar to understand what's going on in this writeup. Backup and recovery is NOT as complex as some vendors would have you believe, but there IS a lot of room to get things wrong. Proceed at your own risk.

Backup Commands

So here's what it looks like to do a full backup of the /var/log directory using tar and split. Feel free to follow along yourself by carrying these commands out in your /tmp directory.
# rm -f ~/var-log.snar
# tar czvf - --listed-incremental ~/var-log.snar /var/log \
| split -b 1m - var-log-full.tar.gz.
Tar uses a .snar (snapshot) file to store the information required to perform incremental backups. So the first thing we do, since this is a full backup, is remove any existing .snar file. Next we build the tar command: "czvf -" means create (c) a gzip-compressed (z) archive, print each filename as it is added (v), and write the archive to standard output (f -). Because the archive itself is going to standard output, tar prints the filenames on standard error, so they don't end up in the archive. We provide the --listed-incremental option, followed by the path to the .snar file, and then supply /var/log as the location that should be backed up.

Then we pipe the output of that tar command (which will be the archive itself) to the unix split command, which will split it into 1MB chunks and write them to files named var-log-full.tar.gz.aa, var-log-full.tar.gz.ab, etc.
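On the real share the chunk size would be picked to stay under the 2GB limit rather than 1MB, and it seems worth checking that the chunks reassemble into a readable gzip stream before trusting them. Here is a minimal sketch of that, assuming GNU tar and a split that understands the "m" suffix. The function name backup_full, its argument order and the 1900m default are my own inventions for illustration, not anything tar or split provides.

```shell
# Sketch only: backup_full and its argument order are hypothetical.
# usage: backup_full SRC_DIR SNAR_FILE CHUNK_PREFIX [CHUNK_SIZE]
backup_full() {
    src=$1 snar=$2 prefix=$3 chunk=${4:-1900m}

    # A full backup starts from a fresh snapshot file.
    rm -f "$snar"

    # Archive to stdout and cut the stream into chunks below the 2GB limit.
    # Note: in plain sh the pipeline reports only split's exit status.
    tar czf - --listed-incremental="$snar" "$src" |
        split -b "$chunk" - "$prefix" || return 1

    # Reassemble the chunks in name order and let gzip test the stream.
    cat "$prefix"* | gzip -t
}
```

Something like "backup_full /var/log ~/var-log.snar var-log-full.tar.gz." would then stand in for the two commands above, returning non-zero if the reassembled stream fails the gzip check.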

Now hit the server's website a few times, or log out and log back in, to change the contents of one or more logfiles. Here's what it looks like to generate an incremental backup that will include just the files that changed:
# tar czvf - --listed-incremental ~/var-log.snar /var/log \
| split -b 1m - var-log-mon.tar.gz.
This is basically the same command, only using the existing .snar file for incrementalization (is that a word?) and saving the split archive as var-log-mon.tar.gz.a* (mon for Monday). A similar command is run for tue, wed, thu, etc.
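Run from cron, the full and incremental commands could be folded into one script that picks the label from the day of the week. What follows is only a sketch under my own assumptions: the name nightly_backup, its arguments, the "full backup on Sunday" policy and the 1900m chunk size are all hypothetical. It also clears out old chunks before writing, since a leftover .ab from a bigger backup would otherwise get concatenated with a newer .aa at restore time.

```shell
# Sketch only: nightly_backup, its arguments and the Sunday-full policy
# are assumptions made for illustration.
# usage: nightly_backup SRC_DIR SNAR_FILE DEST_DIR NAME [DAY]
nightly_backup() {
    src=$1 snar=$2 dest=$3 name=$4
    # Lower-case weekday abbreviation (sun, mon, ...); the optional
    # fifth argument overrides it, which is handy for testing.
    day=${5:-$(LC_ALL=C date +%a | tr 'A-Z' 'a-z')}

    if [ "$day" = sun ] || [ ! -f "$snar" ]; then
        # Full backup: drop the snapshot so tar archives everything,
        # and remove last week's chunks, which belong to the old chain.
        rm -f "$snar"
        rm -f "$dest/$name-"*.tar.gz.*
        label=full
    else
        # Incremental: remove only stale chunks carrying today's label.
        label=$day
        rm -f "$dest/$name-$label.tar.gz."*
    fi

    tar czf - --listed-incremental="$snar" "$src" |
        split -b 1900m - "$dest/$name-$label.tar.gz."
}
```

A nightly cron entry might then call "nightly_backup /var/log ~/var-log.snar /mnt/backup var-log", where /mnt/backup is a made-up mount point for the Windows share.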

In case you aren't actually following along, we now have two sets of split backup archives, one full, the other incremental:
# ls -lh
total 1.4M
-rw-r--r--  1 root root 1.0M Mar 15 12:43 var-log-full.tar.gz.aa
-rw-r--r--  1 root root 436K Mar 15 12:43 var-log-full.tar.gz.ab
-rw-r--r--  1 root root  70K Mar 15 12:44 var-log-mon.tar.gz.aa

Restore Commands

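Restoring the files is a matter of concatenating (using the cat command) and untarring the full archive, and then doing the same with each incremental, oldest first. You can't skip the intermediate incrementals: every run updates the .snar file, so each incremental contains only what changed since the previous backup, not since the full one. (If you want each night's archive to be sufficient on its own on top of the full backup, the tar manual suggests running the incremental against a copy of the full backup's .snar file instead.) You can do this in place if you feel brave, but I usually restore to a temporary directory and then copy or move the files into place by hand.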

The command to untar the full archive is:
# cat var-log-full.tar.gz.* | tar xzvf - --listed-incremental=/dev/null
And the command to bring that untarred archive up to date with Monday's incremental backup is:
# cat var-log-mon.tar.gz.* | tar xzvf - --listed-incremental=/dev/null
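Note that the argument to --listed-incremental is /dev/null, per the tar manual. When extracting, tar doesn't read a snapshot file at all, but the option still has to be there to tell tar that it is restoring an incremental archive, so that it also removes files that had already been deleted by the time the archive was made.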
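For restoring to a temporary directory, the whole chain can be wrapped in a loop. This is a rough sketch, assuming GNU tar and assuming the same .snar file was reused every night as in the backup commands above, in which case each incremental holds only the changes since the previous run and all of them have to be applied in order. The name restore_chain and its arguments are hypothetical.

```shell
# Sketch only: restore_chain and its arguments are hypothetical.
# usage: restore_chain TARGET_DIR FULL_PREFIX [INCR_PREFIX ...]
restore_chain() {
    target=$1
    shift
    mkdir -p "$target" || return 1

    # Prefixes must be given oldest first: full, then mon, tue, ...
    for prefix in "$@"; do
        # The shell expands the glob in name order (.aa, .ab, ...),
        # which is the order split wrote the chunks in.
        cat "$prefix"* |
            tar xzf - -C "$target" --listed-incremental=/dev/null || return 1
    done
}
```

For example, "restore_chain /tmp/restore var-log-full.tar.gz. var-log-mon.tar.gz. var-log-tue.tar.gz." would leave the state as of Tuesday's backup under /tmp/restore/var/log, since tar strips the leading slash from member names when it creates the archive.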

By Chris Snyder on March 15, 2006 at 1:04pm
