Most website hosting companies do their own backups, but they often make no guarantees about actually recovering your data. If you put any amount of effort into a website, you owe it to yourself to have a decent backup and recovery strategy. You don’t have to do anything fancy or complicated, but at a minimum you should back up your home directory with tar, along with any databases your website uses.
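As a minimal sketch, the tar half of that looks like the snippet below. The directory name "public_html" and the database name "mydb" are stand-ins for whatever your host actually uses (the stand-in content is created here just so the command has something to archive):

```shell
# Create a stand-in site directory so the archive command has input.
mkdir -p public_html
echo '<h1>hello</h1>' > public_html/index.html

# Archive the site directory into a dated tarball.
tar czf "site-$(date +%F).tar.gz" public_html

# The database half pairs naturally with this (not run here;
# "mydb" is an assumed database name):
#   mysqldump --single-transaction mydb | gzip > "mydb-$(date +%F).sql.gz"
```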
The first thing you need to do is decide what kind of recoverability you need. In enterprise environments, you’ll often hear terms like Recovery Point Objective and Recovery Time Objective. All that really means is how much data you’re willing to lose and how quickly you need it recovered if something goes wrong. For a personal website, the volume of data is likely to be small enough that recovery time is a non-issue. I think the more relevant consideration is whether you want to be able to do point-in-time recovery. The main questions to ask yourself are listed below.
- How often are you going to back up your data?
- Where are you going to run backups?
- How long are you going to keep backup files?
- How are you going to know that your backups are working?
Do Nothing
While not much of a strategy, this is your default option. If you do nothing, you’ll have whatever level of recoverability your hosting service provides. The pro of this strategy is that it’s easy: you don’t have to do anything. The con is that you will most likely lose some of your work if there is ever a failure.
Manual Backups
My web host provides cPanel. While I think it’s an impressive piece of software, the backup features leave something to be desired. There are a few options for specifying what gets backed up; you receive an email when the backup completes, and you then have a few days to download the backup file from the server. While better than nothing, I personally don’t have the time or inclination to do a manual backup at some regular interval. The pro of this strategy is that you don’t have to write any scripts. The con is that you have to remember to do it, and it takes up your time.
Automated Server Side Backups
Ideally you want your backups to run automatically on the server. This can be as easy as setting up a cron job to execute a script once a day. One of the challenges of working with a server you don’t actually own is that you also have to find a way to get the backup files to your personal computer or some other reliable storage location. Having backup files sitting on the server when its file system gets corrupted isn’t going to do much to help you recover your website. Tools like FTP, SCP and rsync can help you copy backup files to your local system.
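A server-side nightly script might look something like the sketch below. The paths, the database name, and the schedule are all assumptions; the script is written to a file here rather than executed, since it needs cron and a real hosting account:

```shell
# Write out a minimal nightly backup script (a sketch; paths and the
# database name "mydb" are assumptions, not real values).
cat > nightly_backup.sh <<'EOF'
#!/bin/sh
set -e
stamp=$(date +%F)
mkdir -p "$HOME/backups"
# Archive the site directory with a dated filename.
tar czf "$HOME/backups/home-$stamp.tar.gz" -C "$HOME" public_html
# Dump the database alongside it.
mysqldump --single-transaction mydb | gzip > "$HOME/backups/mydb-$stamp.sql.gz"
EOF
chmod +x nightly_backup.sh

# Schedule it with cron, e.g. daily at 3 AM:
#   0 3 * * * $HOME/nightly_backup.sh
```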
Automated Client Side Backups
If you can’t run backups on the server, you can always run them on your computer. One downside of this approach is that your personal computer may not always be on or connected to the internet, making it more likely that a scheduled backup will fail. You’ll also probably have to pipe command output back through SSH, which not everyone is familiar with. On the plus side, you don’t have to worry about copying files to your computer; they’ll already be there once your backup completes.
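The SSH piping pattern is worth seeing spelled out. In this sketch the user, host, and database name are assumptions; the script is saved rather than run, since it needs a live server:

```shell
# Write out a client-side pull script (hypothetical host "example.com",
# user "deploy", and database "mydb"). tar and mysqldump run on the
# server; their output streams back through the SSH pipe to local files.
cat > pull_backup.sh <<'EOF'
#!/bin/sh
set -e
stamp=$(date +%F)
ssh deploy@example.com 'tar czf - public_html' > "home-$stamp.tar.gz"
ssh deploy@example.com 'mysqldump --single-transaction mydb | gzip' > "mydb-$stamp.sql.gz"
EOF
chmod +x pull_backup.sh
```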
Hybrid Backup Strategy
Another option is to run server side backups and have a local job automatically pull down new backup files. This option offers the reliability of server side backups, with the risk that something could happen to the server between the time that your backup completes and the local process downloads the files.
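The local half of the hybrid approach can be a small rsync pull, scheduled on your own machine. Host and paths below are assumptions; the script is saved rather than executed since it needs a real server:

```shell
# Write out a local sync script that pulls down whatever backup files
# the server-side job has produced. --ignore-existing skips files that
# were already downloaded on a previous run. Host and paths are
# hypothetical.
cat > sync_backups.sh <<'EOF'
#!/bin/sh
set -e
mkdir -p "$HOME/site-backups"
rsync -av --ignore-existing deploy@example.com:backups/ "$HOME/site-backups/"
EOF
chmod +x sync_backups.sh
```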
Notifications
Once you’ve figured out how to automate your backup process, you should also make sure that you’re notified if something doesn’t work. This is the part that I think many people get wrong. You’ll often see scripts that attempt to build the notification in as part of the backup process. While that seems like a good idea at first, the problem is that if anything goes wrong with the backup script, the notification may never get sent. Recovery time is not when you want to discover a new bug in your backup code.
My preferred way to handle this is to have scripts do something to indicate success, and then have a separate process that checks for success notifications and reports if any are missing. In a work environment this could be as simple as a web page with red/green icons for failure or success. The important thing is that you want something you look at regularly, where the lack of a recent success message is going to really stand out to you. On your personal computer, you could use something like growlnotify if you’re on a Mac.
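One way to sketch that separation: the backup script touches a marker file only when it finishes cleanly, and an independent checker alerts when the marker is missing or stale. The file name and the 25-hour threshold are assumptions:

```shell
# Part 1: the last line of a successful backup script touches a marker.
touch backup.ok

# Part 2: a separate checker (run from its own cron entry) alerts if the
# marker is missing or older than ~25 hours (1500 minutes). find with
# -mmin +1500 prints the file only if it is OLDER than the threshold.
stale=$(find . -maxdepth 1 -name backup.ok -mmin +1500)
if [ ! -f backup.ok ] || [ -n "$stale" ]; then
  echo "ALERT: no recent backup success"
else
  echo "OK: backup ran recently"
fi
```

Because the checker never depends on the backup script's own code, a bug that silently kills the backup still gets noticed: the marker goes stale and the alert fires.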
Offsite Backups
Depending on how paranoid you are, you may want to keep an offsite copy. You could burn a CD/DVD once a month and drop it off at a friend’s house, or maybe mail it to a PO box. There are also an ever-increasing number of “cloud” solutions that let you store files out on the internet. You could use something like Dropbox, although it’s not a pure backup product. I’m currently trying out Mozy, which lets you encrypt backups with your own key. I haven’t used it enough to recommend it yet, but it does seem to address the confidentiality concern that many people have about storing their files with somebody else.
Do a Recovery
Last but not least, you should really do a test recovery. The last thing you want is to put all this thought and effort into backing up your files, only to find out after a problem that your backups are useless. Most likely all you need is an Apache server and a MySQL database, both of which are free and extremely easy to find good documentation for.
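A recovery drill for the file half can be as simple as restoring the archive somewhere else and diffing the trees. The paths here are stand-ins (and the stand-in site is created just so the drill has something to restore):

```shell
# Create a stand-in site, back it up, restore it elsewhere, and verify.
mkdir -p site
echo 'content' > site/page.html

tar czf site.tar.gz site          # the "backup"
mkdir -p restore
tar xzf site.tar.gz -C restore    # the "recovery"

# diff exits non-zero if anything differs between original and restore.
diff -r site restore/site && echo "restore verified"

# Database side (not run here; "mydb" is an assumed name):
#   gunzip -c mydb-2024-01-01.sql.gz | mysql mydb
# then spot-check a few tables before declaring victory.
```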
Backing up your personal website is something you should strongly consider. If you’re familiar with shell scripting and standard UNIX tools, you should be able to put something simple together in a few hours. The basic process is to decide what your recoverability requirements are, choose where to run your backups and create some kind of notification mechanism. I’ve been working on a simple backup script for my own site, and I plan on making it publicly available in the near future.