Disaster Recovery
Azimuth uses Velero as a disaster recovery solution. Velero provides the ability to back up Kubernetes API resources to an object store and has a plugin-based system to enable snapshotting of a cluster's persistent volumes.
Warning
Backup and restore is only available for production-grade HA installations of Azimuth.
The Azimuth playbooks install Velero on the HA management cluster and the Velero CLI tool on the seed node. Once configured with the appropriate credentials, the installation process creates a Schedule on the HA cluster, which triggers a daily backup at midnight and cleans up backups that are more than one week old.
The AWS Velero plugin is used for S3 support and the CSI plugin for volume snapshots. The CSI plugin uses Kubernetes generic support for Volume Snapshots, which is implemented for OpenStack by the Cinder CSI plugin.
Configuration
To enable backup and restore functionality, the following variables should be set in your environment:
velero_enabled: true
velero_s3_url: <object-store-endpoint-url>
velero_bucket_name: <name-of-an-existing-bucket>
velero_aws_access_key_id: <S3-access-key-id>
velero_aws_secret_access_key: <S3-secret-value>
Danger
The S3 credentials should be kept secret. If you want to keep them in Git, which is recommended, then they must be encrypted.
Velero CLI
The Velero installation process also installs the Velero CLI on the Azimuth seed node, which can be used to inspect the state of the backups:
# List the configured backup locations
velero backup-location get
# List the backups and their statuses
velero backup get
See velero -h for other useful commands.
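When a backup reports a failed or partially failed status, the describe and logs subcommands usually give more detail. The sketch below wraps both in a small shell helper. The function name inspect_backup is hypothetical (it is not part of Azimuth or Velero), and the sketch assumes the velero CLI on the seed node is already configured to talk to the HA cluster.

```shell
# Hypothetical helper: print details and logs for a single backup.
# Usage: inspect_backup <backup-name>
inspect_backup() {
  if [ -z "$1" ]; then
    echo "usage: inspect_backup <backup-name>" >&2
    return 1
  fi
  # Show the backup's status, contents and any volume snapshot information
  velero backup describe "$1" --details
  # Fetch the logs recorded while the backup ran
  velero backup logs "$1"
}
```

Call it with a name taken from the output of velero backup get.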
Restoring from a backup
To restore from a backup, you must first know the name of the target backup. This can be inferred from the object names in S3 if the Velero CLI is no longer available.
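As a rough sketch of that inference: Velero stores each backup's files under a backups/<backup-name>/ directory in the bucket (optionally beneath a prefix), so the backup names can be recovered from a listing of object keys. The helper below is hypothetical and only does the string handling; how the keys are listed (for example with an S3 client of your choice) is left as an assumption.

```shell
# Hypothetical helper: read S3 object keys on stdin (one per line) and print
# the unique Velero backup names found in them.
# Assumes keys look like "[<prefix>/]backups/<backup-name>/<file>".
backup_names_from_keys() {
  awk -F/ '{
    for (i = 1; i < NF; i++) {
      # The path component after "backups" is the backup name
      if ($i == "backups" && $(i + 1) != "") { print $(i + 1); break }
    }
  }' | sort -u
}
```

Backups created by a schedule are named after the schedule followed by a timestamp, so the most recent one normally sorts last.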
Once you have the name of the backup to restore, run the following command with your environment activated (similar to a provision):
This will provision a new HA cluster, restore the backup onto it and then bring the installation up-to-date with your configuration.
Performing ad-hoc backups
In order to perform ad-hoc backups using the same config parameters as the installed backup schedule, run the following Velero CLI command from the seed node:
velero backup create --from-schedule default
This will begin the backup process in the background. The status of this backup (and others) can be viewed with the velero backup get command shown above.
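If it is more convenient to block until the backup finishes, velero backup create also accepts a --wait flag. A minimal sketch, assuming the schedule name matches velero_backup_schedule_name (default unless overridden); the wrapper name adhoc_backup is hypothetical:

```shell
# Hypothetical wrapper: create a backup from a schedule and wait for it to finish.
# Usage: adhoc_backup [schedule-name]   (the schedule name defaults to "default")
adhoc_backup() {
  schedule="${1:-default}"
  velero backup create --from-schedule "$schedule" --wait
}
```

Running adhoc_backup with no arguments uses the default schedule; pass a different name if the schedule has been renamed in your configuration.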
Tip
Ad-hoc backups will have the same time-to-live as the configured schedule backups (default = 7 days).
To change this, pass the --ttl <duration> option (for example, --ttl 72h) to the velero backup create command.
Modifying the backup schedule
The following config options are available for modifying the regular backup schedule:
# Whether or not to perform scheduled backups
velero_backup_schedule_enabled: true
# Name for the backup schedule Kubernetes resource
velero_backup_schedule_name: default
# Schedule to use for backups (defaults to every day at midnight)
# See https://en.wikipedia.org/wiki/Cron for format options
velero_backup_schedule_timings: "0 0 * * *"
# Time-to-live for existing backups (defaults to 1 week)
# See https://pkg.go.dev/time#ParseDuration for duration format options
velero_backup_schedule_ttl: "168h"
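The duration format linked above has no unit for days or weeks (hours are the largest unit), so longer retention periods have to be written in hours. A small sketch of the arithmetic, using a hypothetical helper name:

```shell
# Hypothetical helper: convert a number of days into an hours-based duration
# string suitable for velero_backup_schedule_ttl.
days_to_ttl() {
  echo "$(( $1 * 24 ))h"
}

days_to_ttl 7    # prints 168h (the default of 1 week)
days_to_ttl 30   # prints 720h
```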
Note
Setting velero_backup_schedule_enabled: false does not prevent the backup schedule from being installed - instead it sets the schedule state to paused.
This allows for ad-hoc backups to still be run on demand using the configured backup parameters.
Created: April 9, 2024