Maintenance:Backups

Introduction

To fully protect your Cerb4 helpdesk data you need to back up both the MySQL database and the /cerb4/storage/ filesystem. While there are countless good approaches for performing backups, this document will focus on the best practices we've discovered over the past several years of hosting hundreds of helpdesk instances on our On-Demand network. The examples will be Unix-based, since the command line is the land of milk and honey (and flexible automation).

Requirements

  • A Unix-based server with shell access
  • A Cerberus Helpdesk 4.0 installation

Setting up the environment

Creating a backups user

For convenience and permissions, it's a good idea to make a backups user on the local system. If at all possible, you should put the backups user's home directory on a different hard disk than your live databases to provide fault tolerance and better write performance. The examples below will refer to this location as ~backups. A separate location is important -- while a RAID configuration will protect you from the failure of individual storage hardware devices, it won't protect you from filesystem corruption or non-hardware-related data loss (e.g. bugs, errant queries, maliciousness).

  • You can usually accomplish this with something like:
adduser --home /backups --disabled-password --disabled-login backups
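
The examples below write into a couple of subdirectories under ~backups. If they don't exist yet, you can create them up front (a quick sketch; the paths simply match the destinations used later in this document):

# create the destination directories used by the backup examples below
mkdir -p ~backups/dbs ~backups/storage
chown -R backups:backups ~backups/dbs ~backups/storage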

Creating a backups database user with a shadow password

It's a really smart idea to make a separate backup user that is read-only, especially when you start automating backups.

  • In MySQL you can do this with the following query (make up your own password!):
GRANT SELECT, RELOAD, LOCK TABLES ON *.* TO backups@localhost IDENTIFIED BY 's3cretp4ssw0rd';
  • You should then create a shadow file which will "securely" store your password for automation:
echo -n "s3cretp4ssw0rd" > ~backups/.db.shadow; #put the password text inside a hidden file
chown backups:backups ~backups/.db.shadow; #make the backups user the owner
chmod 400 ~backups/.db.shadow; #make the file read-only by the owner and invisible to world

In the examples below we'll use this shadow file in place of literally typing the password on the command line. In addition to enabling automation, this also helps prevent sensitive information from being visible to other users in the global process list.
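
To confirm the new account and shadow file work together before you automate anything, you can run a quick read-only check (a small sketch; it just echoes the privileges granted above):

mysql -u backups -p`cat ~backups/.db.shadow` -e "SHOW GRANTS FOR CURRENT_USER();"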

Backing up the database

The database stores the majority of your helpdesk information.

Using mysqlhotcopy (recommended)

One of the quickest ways to back up (and restore) a MyISAM-based MySQL database is to use the mysqlhotcopy tool, which copies the raw .frm, .MYD, and .MYI files to a new location. This utility will flush any pending row changes from memory to the disk and then lock the tables while copying them. Unless your database is huge (relative to your hardware), or your disk or network I/O is very slow, you should be able to hotcopy a live database with minimal interruption. If you're using replication you can make hotcopies of a slave without any service interruption.

Pros:

  • It's as fast as your disk or network I/O.
  • Restoring a hotcopy on similar hardware is nearly instantaneous.
  • It's very flexible, allowing for table inclusion/exclusion by regexp patterns, truncating indexes, redirecting output over SCP, etc.

Cons:

  • It's Unix only.
  • It's only compatible with MyISAM tables (not InnoDB).
  • You may have issues restoring a hotcopy on a new machine with a significantly different architecture (e.g. 32-bit to 64-bit and endianness).

Usage: (to local filesystem)

mysqlhotcopy -u backups -p`cat ~backups/.db.shadow` --addtodest --noindices c4_database ~backups/dbs/

Usage: (to SCP)

mysqlhotcopy -u backups -p`cat ~backups/.db.shadow` --addtodest --noindices \
 --method='scp -c arcfour -C -2' c4_database backups@remotehost:~backups/dbs/
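
Restoring a hotcopy is mostly a matter of copying the table files back into the MySQL data directory. A rough sketch, assuming a data directory of /var/lib/mysql and an init script named mysql (both are assumptions; adjust for your system):

/etc/init.d/mysql stop                            # stop MySQL before touching the data directory
cp -a ~backups/dbs/c4_database /var/lib/mysql/    # copy the hotcopied table files back into place
chown -R mysql:mysql /var/lib/mysql/c4_database   # MySQL must own its table files
myisamchk -rq /var/lib/mysql/c4_database/*.MYI    # rebuild the indexes if --noindices was used
/etc/init.d/mysql start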

Backing up the storage filesystem

The /cerb4/storage filesystem stores the pending mail parser queue, import queue, and all the file attachments from mail. It's the only filesystem hierarchy you need to back up for a full recovery (the rest of the files are temporary caches, or can be downloaded from the project website). Since the bulk of the storage directory consists of tons of small file attachments that will never be modified (only deleted), it's the ideal candidate for incremental backups.

Using rsync (recommended)

rsync is one of the simplest ways to copy only changed files to a new location. In a nutshell, its purpose is to keep two copies of the same directory in-sync.

Pros:

  • It's available on most Unix distributions.
  • It's fast and flexible.
  • You can use key-based authentication to automate remote copies.

Cons:

  • It's Unix-only, though there are several clones and ports for Windows.

Usage: (to local filesystem)

rsync -a --verbose --delete /path/to/cerb4/storage ~backups/storage

Usage: (to SSH)

rsync -az -e ssh --verbose --delete /path/to/cerb4/storage backups@remotehost:~backups/storage
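
For the remote copy to run unattended, the backups user needs key-based SSH authentication to the remote host (one of the pros listed above). A minimal sketch, assuming the remote account is also named backups:

# as the backups user: generate a passphrase-less key and install it on the remote host
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id backups@remotehost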

Tips:

  • When dealing with directories in rsync, including a trailing slash (storage/) means to copy the directory's contents and not the directory. Excluding the trailing slash (storage) will copy a directory *and* its contents.
  • The --delete option will remove files from the destination directory that are no longer in the source directory. Since this can be dangerous if you mistype a directory, you may omit this option until you're confident things work.
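
Once both commands behave as expected, you can run them unattended from the backups user's crontab. A minimal sketch of a nightly wrapper script (the filename, schedule, and paths are assumptions; adapt them to your setup):

#!/bin/sh
# /backups/nightly.sh -- hypothetical nightly wrapper around the commands shown above
DATE=`date +%Y%m%d`

# hotcopy the database into a dated directory
mkdir -p ~backups/dbs/$DATE
mysqlhotcopy -u backups -p`cat ~backups/.db.shadow` --addtodest --noindices \
 c4_database ~backups/dbs/$DATE

# mirror the storage filesystem
rsync -a --delete /path/to/cerb4/storage ~backups/storage

Schedule it from the backups user's crontab (crontab -e), e.g. 30 3 * * * /bin/sh /backups/nightly.sh to run it at 3:30am every day.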

Keeping off-site backups

It's crucial to assume that anything that *can* go wrong *will* go wrong (at some point). You can't trust your local RAID, your server, or your datacenter to store the only copy of the data your business would be doomed without.

At the simplest, off-site backups may involve downloading a copy of your backups to your office and burning an extra copy to DVD. Keep in mind that it does you no good to have 250GB of backups on your office network with a 256Kbps upstream to your datacenter; restoring over that link would take roughly three months. However, there's something to be said for the secure feeling of having a tangible, offline copy of your critical data.

If you have the resources, you may also choose to have a standby server in a different location than your production server. This would allow you to make server-to-server backups, so two hardware configurations, or two datacenters, would have to fail at the same time before you follow them into the blackness of failure.

Our favorite choice, cloud and utility computing, also provides a great opportunity for off-site backups, since you can store massive amounts of data, highly-redundantly, for a few bucks a month; and you'll likely be able to move data to and from a cloud computing network MUCH faster than using your office DSL.

Using Amazon EC2/S3 (recommended)

Amazon S3 is a storage service. At the time of this writing, Amazon S3 costs 10 cents (USD) per gigabyte stored per month. For that price, your data is protected by being replicated to multiple locations. You also get reasonably fast network access to it (generally 10-20 MB/sec for us at WGM in Southern California). That's $1/mo per redundant 10GB! Be sure to read the terms on their site, as there are also similar rates for bandwidth (though most of the time you'll just be uploading).

Pros:

  • It's highly redundant and secure.
  • It's inexpensive.
  • Amazon is a very well known and respected online technology company.

Cons:

  • Cloud and utility computing are in their infancy, and some growing pains are inevitable.
  • It costs money (but it's worth every penny!)

Usage:

  • Sign up for Amazon Web Services (click 'Sign Up' on the right).
  • Once your registration is complete, click on 'AWS Access Identifiers' from the account menu.
  • Make a note of your Access Key ID and Secret Access Key. You'll need to plug these values into the various tools you use to interact with their services.

Using Jets3t at the server command line

The Jets3t project provides a Synchronize tool that works much like rsync, replicating changes from a local directory structure to a remote S3 bucket. It requires a Java Runtime Environment (JRE) of version 1.5 or later to be available.
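
You can confirm that a suitable JRE is available on the path before running it:

java -version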

Usage:

~backups/jets3t/bin/synchronize.sh -k UP yourbucket/backups/server1/dbs/20080912 *.gz

Tips:

  • The -k option above prevents files from being deleted from S3 if they no longer exist in the local filesystem.
  • If you don't require SSL (i.e. your content is public) you can modify jets3t/config/jets3t.properties and set s3service.https-only to false. This should give you a moderate speed boost on uploading.
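
For example, the relevant line in that properties file would look something like this (double-check the exact path and property name against your Jets3t version):

# jets3t/config/jets3t.properties -- send requests over plain HTTP instead of SSL
s3service.https-only=false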

Using S3Fox in Firefox 3

Our favorite tool for browsing S3 buckets is the S3Fox extension for Firefox. Once installed, you can simply drag files back and forth between your S3 account and your local machine. It will also allow you to modify the access level (ACL) of any file, which gives you the option of creating a publicly sharable URL. This is great, since S3 is useful for far more than just off-site backups -- you can host downloads or high-resolution screencasts without bogging down your servers (and paying a bandwidth ransom at the datacenter) if your content becomes popular.