Python-Based Backup Script for Linux

Here at CDOT, our backup solution was a little archaic and hard to expand on, so I decided to create a new backup method that runs from a single computer and backs up our entire infrastructure. As I write this, the script is not finished, but it works and is usable as a replacement for our previous system. A word of warning: this method of backing up across systems is not very secure and does pose security risks, because it requires giving some users passwordless (NOPASSWD) sudo access to some or all programs. I am looking for a way around this and would appreciate any input on the matter.

Here is a copy of the script:

There were a few goals that were kept in mind with this script:
– Script resides on a single computer (complete)
– Do not run multiple backups using the same hard drive (complete)
– Check space requirements before performing a backup on source and destination (in progress)
– Emails out daily reports on success or failure (not complete)
– Logs all information to /var/log/smart-bk/ (complete)
– Easy(ish) to add a new backup schedule (complete)
– Can view all backups that are currently running (complete)
– Can view all the backups in the queue to run (complete)
– Can view all the schedules that are added (complete)
– Keeps a record of all previously run backups (not complete)
– Website to view status of currently running backups (not complete)
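The space check is still in progress. As a rough illustration only, a local version of it could compare the size of the source directory with the free space on the destination's filesystem using Python's shutil.disk_usage. The function names and the 10% safety margin are hypothetical, not part of sbk. Since sbk backs up across hosts, a real version would have to measure the source directory on the source host (for example by running du there) rather than locally, as this sketch does.

```python
import os
import shutil


def directory_size(path):
    """Return the total size in bytes of regular files under path.

    Symlinks are skipped, and os.walk does not descend into symlinked
    directories by default.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def has_enough_space(source_dir, dest_dir, margin=1.1):
    """Return True if dest_dir's filesystem can hold source_dir plus a margin.

    margin is a hypothetical safety factor (10% extra by default).
    Both paths are local in this sketch.
    """
    needed = directory_size(source_dir) * margin
    free = shutil.disk_usage(dest_dir).free
    return free >= needed
```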

At this time, not all of these goals are complete, but I plan to finish them sooner or later. For now, I’m writing up some documentation on how the script currently works, what it’s missing, and what my next steps will be.

Scheduler System
The core of the script is a scheduler system. A person or script adds the backups they want performed to a schedule using specific parameters. A schedule looks like this:

id|day|time|type|source host|dest host|source dir|dest dir|source user|dest user

What do these fields mean?

id - This is a unique identifier for the schedule.
day - This is the day the backup was last run. It is used to check whether the schedule has expired (is in the past) or has already been completed.
time - This is the time at which the backup will start. This allows you to order different schedules to happen earlier or later in the day.
type - This is the type of backup. There are currently three:
     - archive: wraps the specified directory in a tar archive and compresses it with bzip2. Uses options: tar -cpjvf
     - rsync: a very simple rsync that preserves most file attributes. Uses options: rsync -aHAXEvz
     - dbdump: currently specific to backing up the koji database. Uses options: pg_dump koji
source_host - This host is the target of the backup. The files are backed up from here.
dest_host - This host is your backup storage location. All files backed up will go here.
source_dir - This directory correlates to source_host. This is the directory that is backed up.
dest_dir - This directory correlates to dest_host. This is where the backup is stored.
source_user - User to use on the source host.
dest_user - User to use on the dest host.
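To make the field list concrete, here is a minimal sketch of a schedule row as a Python dataclass, parsed from the pipe-delimited format above, with each backup type mapped to the base command and options listed. The class and helper names are hypothetical rather than taken from sbk, and treating the pg_dump type shown by sbk -h as an alias of dbdump is an assumption. How sbk combines these commands with the hosts, users and directories isn't described in this post, so the sketch stops at the base command.

```python
from dataclasses import dataclass

# Base command and options for each backup type, as listed above.
# Assumption: "pg_dump" (the name used in sbk -h) is an alias of "dbdump".
BASE_COMMANDS = {
    "archive": ["tar", "-cpjvf"],
    "rsync": ["rsync", "-aHAXEvz"],
    "dbdump": ["pg_dump", "koji"],
}
BASE_COMMANDS["pg_dump"] = BASE_COMMANDS["dbdump"]


@dataclass
class Schedule:
    """One schedule row, with fields in the order sbk -s prints them."""
    id: int
    day: str
    time: str
    type: str
    source_host: str
    dest_host: str
    source_dir: str
    dest_dir: str
    source_user: str
    dest_user: str

    @classmethod
    def from_line(cls, line):
        """Parse one pipe-delimited schedule line into a Schedule."""
        fields = line.strip().split("|")
        if len(fields) != 10:
            raise ValueError(f"expected 10 fields, got {len(fields)}")
        return cls(int(fields[0]), *fields[1:])

    def base_command(self):
        """Return a fresh copy of the base command for this backup type."""
        try:
            return list(BASE_COMMANDS[self.type])
        except KeyError:
            raise ValueError(f"unknown backup type: {self.type!r}") from None
```

The values in any parsed line are up to you; the sketch does not interpret the day or time formats.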

All data for this script is stored in an sqlite3 database:

sqlite> .schema 
CREATE TABLE Queue(scheduleid INTEGER, queuetime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Running(scheduleid INTEGER, starttime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Schedule(id INTEGER PRIMARY KEY, day TEXT, time TEXT, type TEXT, source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT, source_user TEXT, dest_user TEXT);
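One detail worth knowing about this schema: SQLite only enforces FOREIGN KEY clauses when PRAGMA foreign_keys is turned on for the connection. A minimal sketch of opening the database with Python's sqlite3 module and creating the tables above might look like this. open_db is a hypothetical name, and the post doesn't say whether sbk enables the pragma.

```python
import sqlite3

# The same tables as the .schema output above; Schedule comes first
# because Queue and Running reference it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Schedule(id INTEGER PRIMARY KEY, day TEXT, time TEXT,
    type TEXT, source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT,
    source_user TEXT, dest_user TEXT);
CREATE TABLE IF NOT EXISTS Queue(scheduleid INTEGER, queuetime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE IF NOT EXISTS Running(scheduleid INTEGER, starttime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
"""


def open_db(path):
    """Open (or create) the sbk database at path and ensure the tables exist."""
    conn = sqlite3.connect(path)
    # Without this, SQLite silently ignores the FOREIGN KEY clauses.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```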

How To Use sbk
Checking all the available options:

[backup@bahamas ~]$ sbk -h


Usage: sbk [options]

The smart backup scheduler program sbk is used to run backups from computer to
computer. sbk does this by adding and removing schedules from a schedule
database. Once schedules are added to the schedule database, sbk should be run
with '--queue' in order to intelligently add hosts to a queue and start running
backups. It is recommended to run this as a cron job fairly often, more
frequently depending on the number of schedules.

  -h, --help          show this help message and exit
  -q, --queue         queue schedules and start backups
  -a, --add           add new schedule at specific time
  -s, --show          show the schedule and host info
  -r, --remove        remove existing schedule
  --remove-queue      remove existing schedule from queue
  --remove-run        remove existing schedule from running
  --expire            expire the day in schedule
  --add-queue         add a single schedule to queue
  --sid=scheduleid    specify schedule id for removing schedules
  --time=18:00        specify the time to run the backup
  --backup-type=type  archive, pg_dump, rsync
  --source-host=host  specify the source backup host
  --source-dir=dir    specify the source backup dir
  --source-user=user  specify the source user
  --dest-host=host    specify the destination backup host
  --dest-dir=dir      specify the destination backup dir
  --dest-user=user    specify the destination user
  --log-dir=dir       specify the directory to save logs

Showing Schedule Information
Show all schedules, schedules in queue, and running schedules:

[backup@bahamas ~]$ sbk -s


id|day|time|type|source host|dest host|source dir|dest dir|source user|dest user

|schedule id|queue time|

|schedule id|start time|

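As a sketch of how sbk -s might produce that output, the three tables can be read with plain SELECT queries and their columns joined with "|". The show function is hypothetical, and ordering schedules by id is an assumption; sbk's actual formatting may differ.

```python
import sqlite3


def show(conn):
    """Return the schedule, queue and running sections as pipe-delimited lines."""
    lines = ["id|day|time|type|source host|dest host|source dir|dest dir|source user|dest user"]
    # SELECT * returns columns in table-definition order, matching the header.
    for row in conn.execute("SELECT * FROM Schedule ORDER BY id"):
        lines.append("|".join(str(v) for v in row))
    lines.append("|schedule id|queue time|")
    for sid, qtime in conn.execute("SELECT scheduleid, queuetime FROM Queue"):
        lines.append(f"|{sid}|{qtime}|")
    lines.append("|schedule id|start time|")
    for sid, stime in conn.execute("SELECT scheduleid, starttime FROM Running"):
        lines.append(f"|{sid}|{stime}|")
    return lines
```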
Adding new schedules
Unfortunately, all of these options are required.
Add a new schedule:

[backup@bahamas ~]$ sbk --add --time=11:00 --backup-type=archive --source-host=japan --dest-host=bahamas --source-dir=/etc/ --dest-dir=/data/backup/japan/etc/ --source-user=backup --dest-user=backup
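A rough sketch of what --add amounts to at the database level: check that every option was given, then insert the row with a parameterized query so that paths and hostnames never need manual quoting. The add_schedule name is hypothetical, and leaving day as NULL for a schedule that has never run is an assumption.

```python
import sqlite3

# Every option sbk --add requires, in Schedule column order (after day).
REQUIRED = ("time", "type", "source_host", "dest_host", "source_dir",
            "dest_dir", "source_user", "dest_user")


def add_schedule(conn, **opts):
    """Insert a new schedule and return its id."""
    missing = [name for name in REQUIRED if not opts.get(name)]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    # Column names come from the fixed tuple above, never from user input;
    # the values themselves are passed as ? parameters.
    columns = ", ".join(REQUIRED)
    placeholders = ", ".join("?" * len(REQUIRED))
    cur = conn.execute(
        f"INSERT INTO Schedule ({columns}) VALUES ({placeholders})",
        tuple(opts[name] for name in REQUIRED),
    )
    conn.commit()
    return cur.lastrowid  # day is left NULL (never run)
```

Calling add_schedule(conn, time="11:00", type="archive", source_host="japan", dest_host="bahamas", source_dir="/etc/", dest_dir="/data/backup/japan/etc/", source_user="backup", dest_user="backup") mirrors the sbk --add command above.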

Removing schedules
To remove a schedule, you must specify a “sid”. This is simply the schedule’s “id”, which is unique to each schedule.
Remove a schedule:

[backup@bahamas ~]$ sbk --remove --sid=1
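Because Queue and Running both reference Schedule(id), removing a schedule cleanly means dealing with those rows too. Here is a minimal sketch, assuming (the post doesn't say) that a running schedule should be left alone until it is cleared with --remove-run, and that any queued entry is dropped along with the schedule. remove_schedule is a hypothetical name.

```python
import sqlite3


def remove_schedule(conn, sid):
    """Delete schedule sid and its queue entry; return True if it existed.

    Raises RuntimeError if the schedule is currently running.
    """
    with conn:  # one transaction: commit on success, roll back on error
        if conn.execute("SELECT 1 FROM Running WHERE scheduleid = ?", (sid,)).fetchone():
            raise RuntimeError(f"schedule {sid} is still running")
        # Delete the child row first so the foreign key holds when enforced.
        conn.execute("DELETE FROM Queue WHERE scheduleid = ?", (sid,))
        cur = conn.execute("DELETE FROM Schedule WHERE id = ?", (sid,))
    return cur.rowcount == 1
```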

Start the Backups
Start intelligently queuing schedules and starting backups (best run from crontab):

sbk -q
sbk --queue
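The post doesn't spell out what “intelligently” means for sbk -q, so the following is only a sketch of one plausible approach built on the schema above. It assumes day holds the ISO date of the last run and time is a zero-padded HH:MM string (so plain string comparison orders times correctly), and it uses dest_host as a stand-in for “the same hard drive”, allowing one running backup per destination host. queue_due and start_ready are hypothetical names.

```python
import datetime
import sqlite3


def queue_due(conn, now=None):
    """Queue schedules whose time has passed today and that haven't run today.

    Schedules already queued or running are skipped. Returns how many were queued.
    """
    now = now or datetime.datetime.now()
    today = now.date().isoformat()
    hhmm = now.strftime("%H:%M")
    with conn:
        cur = conn.execute(
            """INSERT INTO Queue (scheduleid, queuetime)
               SELECT id, ? FROM Schedule
               WHERE (day IS NULL OR day != ?) AND time <= ?
                 AND id NOT IN (SELECT scheduleid FROM Queue)
                 AND id NOT IN (SELECT scheduleid FROM Running)""",
            (hhmm, today, hhmm),
        )
    return cur.rowcount


def start_ready(conn, now=None):
    """Move queued schedules to Running, at most one per destination host.

    Returns the ids that were moved; launching the backups is not shown.
    """
    now = now or datetime.datetime.now()
    stamp = now.strftime("%H:%M")
    started = []
    with conn:
        busy = {host for (host,) in conn.execute(
            "SELECT s.dest_host FROM Running r JOIN Schedule s ON s.id = r.scheduleid")}
        rows = conn.execute(
            "SELECT q.scheduleid, s.dest_host FROM Queue q"
            " JOIN Schedule s ON s.id = q.scheduleid ORDER BY s.time, q.scheduleid").fetchall()
        for sid, host in rows:
            if host in busy:
                continue  # another backup already writes to this host
            busy.add(host)
            conn.execute("DELETE FROM Queue WHERE scheduleid = ?", (sid,))
            conn.execute("INSERT INTO Running (scheduleid, starttime) VALUES (?, ?)", (sid, stamp))
            started.append(sid)
    return started
```

Launching the actual backup commands, and writing today’s date into day once a backup finishes, are left out of the sketch.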

If you found this post interesting, there is more information about this backup system and its uses on the zenit wiki.


About oatleywillisa

Computer Networking Student
This entry was posted in SBR600.
