Add My Backup Strategy article

Tanner Collin 2021-04-09 05:18:18 +00:00
parent 21c25413fc
commit 8fb1b29aef
2 changed files with 338 additions and 0 deletions

content/backup-strategy.md Normal file

@ -0,0 +1,337 @@
Title: My Backup Strategy
Date: 2021-04-08
Category: Writing
Summary: Details about the backup system for my data.
Wide: true
[TOC]
Regularly backing up all the data I care about is very important to me. This
article outlines my strategy to make sure I never lose essential data.
## Motivation
Backups should be as automatic as possible, so that laziness and forgetfulness
don't interfere with their regularity.
All software used to create and store the backups should be free and open source
so I'm not depending on the survival of a company.
Backups need to be tested to ensure they are correct and happening regularly.
Multiple copies of the backups should exist, including at least one offsite to
protect against my building burning down.
Backups should also be incremental when possible (rather than mirror copies) so
an accidental deletion isn't propagated into the backups, making the file
irrecoverable.
## Strategy
I have one backup folder `/mnt/backup` on my media server at home that serves as
the destination for all my backup sources. All scheduled automatic backups write
to their own subfolder inside of it.
This backup folder is then synced to encrypted 2.5" 1 TB hard drives which I
rotate between my bag, an offsite location, and my parents' houses.
## Backup Sources
I use the tool `rdiff-backup` extensively because it allows me to take
incremental backups locally or over SSH. It works much like `rsync` and needs
no configuration.
### Email
I have every email since 2010 backed up continuously in case my email provider
disappears.
I use `offlineimap` to sync my mail to the directory `~/email` on my media
server as a Maildir. Since offlineimap is only a syncing tool, the emails need
to be copied elsewhere to be backed up. I run `rdiff-backup` from a weekly cron
job:
```
*/15 * * * * offlineimap > /var/log/offlineimap.log 2>&1
00 12 * * 1 date -Iseconds > /home/email/email/backup_check.txt
20 12 * * 1 rdiff-backup /home/email/email /mnt/backup/local/email/
40 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/email/
```
Here's my `.offlineimaprc` for reference:
```
[general]
accounts = main
[Account main]
localrepository = Local
remoterepository = Remote
[Repository Local]
type = Maildir
localfolders = ~/email
[Repository Remote]
type = IMAP
readonly = True
folderfilter = lambda foldername: foldername not in ['Trash', 'Spam', 'Drafts']
remotehost = example.com
remoteuser = mail@example.com
remotepass = supersecret
sslcacertfile = /etc/ssl/certs/ca-certificates.crt
```
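The `folderfilter` value is an ordinary Python expression that is called with each remote folder name. As a rough illustration of what it does (the folder names below are made up for the example, not taken from a real account):

```python
# Same expression as the `folderfilter` line in the config above.
folderfilter = lambda foldername: foldername not in ['Trash', 'Spam', 'Drafts']

# Hypothetical remote folder names, for illustration only.
remote_folders = ['INBOX', 'Sent', 'Trash', 'Spam', 'Drafts', 'Archive']

# Folders for which the filter returns True are the ones that get synced.
synced = [name for name in remote_folders if folderfilter(name)]
print(synced)  # ['INBOX', 'Sent', 'Archive']
```

The comparison is an exact match on the whole name, so a folder with a different name that merely contains "Trash" would still pass the filter.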
### Notes
I use Standard Notes to take notes and wrote the tool
[standardnotes-fs](https://github.com/tannercollin/standardnotes-fs) to mount my
notes as a file system to view and edit them as plain text files.
I take weekly backups of the mounted file system on my media server with cron:
```
00 12 * * 1 date -Iseconds > /home/notes/notes/backup_check.txt
15 12 * * 1 rdiff-backup /home/notes/notes /mnt/backup/local/notes/
```
### Nextcloud
I self-host a Nextcloud instance to store all my personal documents (non-code
projects, tax forms, spreadsheets, etc.). Since Nextcloud is only syncing
software, the files need to be copied elsewhere to be backed up.
I take weekly backups of the Nextcloud data folder with cron:
```
00 12 * * 1 rdiff-backup /var/www/nextcloud/data/tanner/files /mnt/backup/local/nextcloud/
30 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/nextcloud/
```
### Gitea
I self-host a Gitea instance to store all my git repositories for code-based
projects. My home folder is also a git repo so I can easily sync my config files
and password database between servers and machines.
I take weekly backups of the Gitea data folder with cron:
```
00 12 * * 1 date -Iseconds > /home/gitea/gitea/data/backup_check.txt
10 12 * * 1 rdiff-backup --exclude '**data/indexers' --exclude '**data/sessions' /home/gitea/gitea/data /mnt/backup/local/gitea/
35 12 * * 1 rdiff-backup --remove-older-than 12B --force /mnt/backup/local/gitea/
```
### Telegram
Telegram Messenger is my main app for communication. My parents, most of my
friends, and friend groups are on there so I don't want to lose those messages
in case Telegram disappears or my account gets banned.
Telegram includes a data export feature, but it can't be automated. Instead I
run the deprecated software
[telegram-export](https://github.com/expectocode/telegram-export) hourly with
cron:
```
0 * * * * bash -c 'timeout 50m /home/tanner/opt/telegram-export/env/bin/python -m telegram_export' > /var/log/telegramexport.log 2>&1
```
It likes to hang, so `timeout` kills it if it's still running after 50 minutes.
This hasn't corrupted the database yet.
### Phone
[Signal
Messenger](https://play.google.com/store/apps/details?id=org.thoughtcrime.securesms&hl=en_CA&gl=US)
automatically exports a copy of my text messages database, and
[Aegis](https://play.google.com/store/apps/details?id=com.beemdevelopment.aegis&hl=en_CA&gl=US)
allows me to export an encrypted JSON file of my two-factor authentication
codes.
I mount my phone's internal storage as a file system on my desktop using
[adbfs-rootless](https://github.com/spion/adbfs-rootless). I then rsync the
files over to my media server:
```
$ ./adbfs ~/mntphone
$ time rsync -Wav \
--exclude '*cache' --exclude nobackup \
--exclude '*thumb*' --exclude 'Telegram *' \
--exclude 'collection.media' \
--exclude 'org.thunderdog.challegram' \
--exclude '.trashed-*' --exclude '.pending-*' \
~/mntphone/storage/emulated/0/ \
localmediaserver:/mnt/backup/files/phone/
```
Unfortunately this is a manual process because I need to plug my phone in each
time. Ideally it would happen automatically while I'm asleep and the phone is
charging.
### Miscellaneous Files
The directory `/mnt/backup/files` is a repository for any kind of files I want
to keep forever: my phone data, old archives, computer files, Minecraft worlds,
files from previous jobs, and so on.
All the files will be included in the 1 TB hard drive backup rotations.
### Web Services
Web services that I run like [txt.t0.vc](https://txt.t0.vc) and
[QotNews](https://news.t0.vc) are backed up daily, weekly, and monthly depending
on how frequently the data changes.
I run `rdiff-backup` on the remote server with cron:
```
00 14 * * * date -Iseconds > /home/tanner/tbot/t0txt/data/backup_check.txt
04 14 * * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/
14 14 * * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/daily/t0txt/
24 14 * * 1 rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/
34 14 * * 1 rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/weekly/t0txt/
44 14 1 * * rdiff-backup /home/tanner/tbot/t0txt/data tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/
55 14 1 * * rdiff-backup --remove-older-than 12B --force tbotbak@remotebackup::/mnt/backup/remote/tbotbak/monthly/t0txt/
```
The user `tbotbak` has write access only to the `/mnt/backup/remote/tbotbak`
directory. It has its own passwordless SSH key that's only permitted to run the
`rdiff-backup --server` command for security.
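The article doesn't show how the key is restricted. One common way is a forced command in the backup user's `~/.ssh/authorized_keys` file. The sketch below builds such an entry; the helper name `restricted_key_line` and the placeholder public key are assumptions for illustration, and the actual setup may differ.

```python
def restricted_key_line(public_key, command="rdiff-backup --server"):
    """Build an authorized_keys entry that forces `command` for this key."""
    options = [
        f'command="{command}"',   # sshd runs this instead of whatever the client asks for
        "no-port-forwarding",
        "no-agent-forwarding",
        "no-X11-forwarding",
        "no-pty",
    ]
    return ",".join(options) + " " + public_key.strip()


if __name__ == "__main__":
    # Placeholder key material, not a real key.
    print(restricted_key_line("ssh-ed25519 AAAAexamplekey backup@webserver"))
```

The printed line would go into `authorized_keys` for the backup user on the receiving server. Limiting that user's write access to its own directory is a separate matter of file system permissions.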
### Protospace
I run a lot of services for [Protospace](https://protospace.ca/), my city's
makerspace.
The member portal I wrote called [Spaceport](https://my.protospace.ca/) creates
an archive I download daily:
```
40 10 * * * wget --content-disposition \
--header="Authorization: secretkeygoeshere" \
--directory-prefix /mnt/backup/remote/portalbak/ \
--no-verbose --append-output=/var/log/portalbackup.log \
https://api.my.protospace.ca/backup/
```
The main website and [wiki](https://wiki.protospace.ca) that I sysadmin get
backed up weekly:
```
0 12 * * 1 mysqldump --all-databases > /var/www/dump.sql
15 12 * * 1 date -Iseconds > /var/www/backup_check.txt
20 12 * * 1 rdiff-backup /var/www pshostbak@remotebackup::/mnt/backup/remote/pshostbak/weekly/www/
```
The Protospace [Minecraft
server](http://games.protospace.ca:8123/?worldname=world&mapname=flat&zoom=3&x=74&y=64&z=354)
I run gets backed up daily:
```
00 15 * * * date -Iseconds > /home/tanner/minecraft/backup_check.txt
00 15 * * * rdiff-backup --exclude '**CoreProtect' --exclude '**dynmap' /home/tanner/minecraft psminebak@remotebackup::/mnt/backup/remote/psminebak/
30 15 * * * rdiff-backup --remove-older-than 12B --force psminebak@remotebackup::/mnt/backup/remote/psminebak/
```
I also back up our Google Drive with rclone:
```
45 12 * * 1 rclone copy -v protospace: /mnt/backup/files/protospace/google-drive/
```
## Backup Copies
My backup folder `/mnt/backup` now looks like this:
```
/mnt/backup/
├── files
│   ├── docs
│   ├── phone
│   ├── protospace
│   ├── telegram
│   ├── usbsticks
│   └── ... and so on
├── local
│   ├── email
│   ├── gitea
│   ├── nextcloud
│   └── notes
└── remote
    ├── portalbak
    ├── pshostbak
    ├── psminebak
    ├── tbotbak
    └── telebak
```
This directory tree is the master backup and I make a copy of the entire tree
every Saturday to a hard drive.
The directory is copied over with the following script:
```text
#!/bin/bash
cryptsetup luksOpen /dev/sdf external
mount /dev/mapper/external /mnt/external
time rsync -av --delete /mnt/backup/local/ /mnt/external/backup/local/
time rsync -av --delete /mnt/backup/remote/ /mnt/external/backup/remote/
time rdiff-backup --force -v5 /mnt/backup/files/ /mnt/external/backup/files/
python3 /home/tanner/scripts/checkbackup.py
umount /mnt/external
cryptsetup luksClose external
```
I wrote a Python script `checkbackup.py` that goes through each backup and
compares the timestamp in its `backup_check.txt` file to the current time. This
confirms that cron ran, the backups were taken, and they were transferred over
correctly.
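The script itself isn't included in the article, so the following is only a minimal sketch of the idea rather than the actual `checkbackup.py`. It assumes each check file holds a single `date -Iseconds` timestamp. The function name `check_backups`, the listed paths, and the 8-day threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def check_backups(check_files, max_age, now=None):
    """Return a list of problem descriptions for missing, unreadable, or stale check files."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for path in map(Path, check_files):
        if not path.is_file():
            problems.append(f"{path}: missing")
            continue
        try:
            # `date -Iseconds` writes e.g. 2021-04-05T12:00:01-06:00
            stamp = datetime.fromisoformat(path.read_text().strip())
        except ValueError:
            problems.append(f"{path}: unreadable timestamp")
            continue
        if stamp.tzinfo is None:
            # Treat a timestamp without an offset as local time.
            stamp = stamp.astimezone()
        age = now - stamp
        if age > max_age:
            problems.append(f"{path}: last stamped {age.days} days ago")
    return problems


if __name__ == "__main__":
    # Hypothetical list; adjust to the check files that exist on the drive.
    files = [
        "/mnt/external/backup/local/email/backup_check.txt",
        "/mnt/external/backup/local/notes/backup_check.txt",
    ]
    for problem in check_backups(files, max_age=timedelta(days=8)):
        print("PROBLEM:", problem)
```

A threshold slightly longer than the backup interval (8 days for a weekly job here) leaves some slack for a backup that runs a little late.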
## Rotating Hard Drives
I rotate through 2.5" 1 TB hard drives each Saturday when I do a backup. They
are quite cheap at [$65 CAD](https://www.memoryexpress.com/Products/MX65194)
each so I can have a bunch floating around.
I keep one connected to the server, one in my bag, one offsite, one at my
mother's house, and one at my dad's house. Every Saturday I run the script above
to take a copy and then swap the drive with the one in my bag. It then gets
swapped when I visit my offsite location. Same for when I visit my parents. This
means that all hard drives eventually get rotated through with new data and
don't sit too long unpowered.
The drives are all encrypted with full-disk LUKS encryption using a password I'm
unlikely to forget.
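I run the checksumming `btrfs` file system on them with the `dup` profile, which
keeps two copies of all data and metadata on the same drive (like RAID-1, but on
a single disk), to protect against bitrot. This means I can only use 0.5 TB of
storage for my backups, but the data is stored redundantly.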
Here's how I set up new hard drives to do this:
```
$ sudo cryptsetup luksFormat /dev/sdf
$ sudo cryptsetup luksOpen /dev/sdf external
$ sudo mkfs.btrfs -f -m dup -d dup /dev/mapper/external
$ sudo mount /dev/mapper/external /mnt/external/
$ sudo mkdir /mnt/external/backup
$ sudo chown -R tanner:tanner /mnt/external/backup
$ sudo umount /mnt/external
$ sudo cryptsetup luksClose external
```
## Future Improvements
I'm working on a system to automatically back up all my home directories to my
media server. I need this to grab Bash histories and code that's
work-in-progress. I've been burned by not having this once when a server died.
I'd like to automate backing up my phone by connecting it to a Raspberry Pi when
I go to sleep.
I need to get better at fully testing my backups by restoring them on a blank
machine.


@ -2,6 +2,7 @@ Title: Choosing a Linux Flavour
Date: 2020-10-31
Category: Writing
Summary: A recommendation on which flavour of Linux to run.
Wide: true
[TOC]