Backing Up and Restoring a k0rdent Management Cluster#
Any production system needs to provide Disaster Recovery features, and the heart of these capabilities is the ability to perform backup and restore operations. In this chapter we'll look at backing up and restoring k0rdent management cluster so that in case of an emergency, you can restore your system to its previous condition or recreate it on another cluster.
While it's possible to back up a Kubernetes cluster manually, it's better to build on the work
of others. In this case we're going to leverage the velero
project for backup management
on the backend and see how it integrates with k0rdent to ensure data persistence and recovery.
Motivation#
The primary goal of this feature is to provide a reliable and efficient way to back up and restore
a k0rdent deployment in the event of a disaster that impacts the management cluster.
By using velero
as the backup provider, we can create consistent backups across
different cloud storage options while maintaining the integrity of critical resources.
The main goal of the feature is to provide:
- Management Backup: The ability to backup all configuration objects created and managed by k0rdent, including into an offsite location.
- Restore: The ability to create configuration objects from a specific Management Backup in order to create a management cluster in the same state that existed at the time of backup without (re)provisioning of cloud resources.
- Disaster Recovery: The ability to restore k0rdent on another management cluster, plus ensuring that clusters are not recreated or lost.
- Rollback: The possibility to manually restore after a specific event, such as a failed k0rdent upgrade
Velero as Provider for Management Backups#
Velero
is an open-source tool that simplifies backing up and restoring clusters as well as individual resources. It seamlessly integrates into the k0rdent management environment to provide robust disaster recovery capabilities.
The velero
instance is part of the Helm chart that installs k0rdent, which means that it
can be customized if necessary.
k0rdent manages the schedule and is responsible for collecting data to be included in a backup.
Scheduled Management Backups#
Backups should be scheduled on a regular basis, depending on how often information changes.
Preparation#
Before you create a scheduled backup, you need to perform a few preparatory steps:
-
If no
velero
plugins have been installed as suggested in the corresponding section, install it by modifying theManagement
object:apiVersion: k0rdent.mirantis.com/v1alpha1 kind: Management metadata: name: kcm spec: # ... core: kcm: config: velero: initContainers: - name: velero-plugin-for-<provider-name> image: velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> imagePullPolicy: IfNotPresent volumeMounts: - mountPath: /target name: plugins # ...
-
Prepare a cloud storage location, such as an Amazon S3 bucket, to which to save backups.
-
Create a
BackupStorageLocation
object referencing aSecret
with credentials to access the cloud storage (if the multiple credentials feature is supported by the plugin).For example, if you are using Amazon S3, your
BackupStorageLocation
and the relatedSecret
might look like this:--- apiVersion: v1 data: # base64-encoded credentials for Amazon S3 in the following format: # [default] # aws_access_key_id = EXAMPLE_ACCESS_KEY_ID # aws_secret_access_key = EXAMPLE_SECRET_ACCESS_KEY cloud: WW2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gRVhBTVBMRV9BQ0NFU1NfS0VZX0lECmF3c19zZWNyZXRfYWNjZXNzX2tleSA9IEVYQU1QTEVfU0VDUkVUX0FDQ0VTU19LRVkKICA= kind: Secret metadata: name: cloud-credentials namespace: kcm-system type: Opaque --- apiVersion: velero.io/v1 kind: BackupStorageLocation metadata: name: aws-s3 namespace: kcm-system spec: config: region: <your-region-name> default: true # optional, if not set, then storage location name must always be set in ManagementBackup objectStorage: bucket: <your-bucket-name> provider: aws backupSyncPeriod: 1m credential: name: cloud-credentials key: cloud
You can get more information on how to build these objects at the official Velero documentation.
Create a Management Backup#
Periodic backups are handled by a ManagementBackup
object, which uses a Cron expression
for its .spec.schedule
field.
If the .spec.schedule
field is not set, a backup on demand will be created instead.
Optionally, set the name of the BackupStorageLocation
.spec.backup.storageLocation
.
The default location is the BackupStorageLocation
object with .spec.default
set to true
.
For example, you can create a ManagementBackup
object that backs up to the storage object created in the previous step
every 6 minutes would look like this:
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ManagementBackup
metadata:
name: kcm
spec:
schedule: "0 */6 * * *"
storageLocation: aws-s3
Management Backup on Demand#
To create a single backup of the existing k0rdent management cluster information, you can create a ManagementBackup
object
using a YAML document and the kubectl
CLI. The object then creates only one instance of a backup. For example you can backup
to the location created earlier:
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ManagementBackup
metadata:
name: example-backup
spec:
storageLocation: aws-s3
What's Included in the Management Backup#
The backup includes all of k0rdent kcm
component resources, parts of the cert-manager
components required for other components creation, and all the required resources
of CAPI
and the ClusterDeployment
s currently in use in the management cluster.
By default, objects satisfying these labels will be included in the backup:
cluster.x-k8s.io/cluster-name="<cluster-deployment-name>"
helm.toolkit.fluxcd.io/name="<cluster-deployment-name>"
cluster.x-k8s.io/provider="bootstrap-<provider>"
cluster.x-k8s.io/provider="control-plane-<provider>"
cluster.x-k8s.io/provider="infrastructure-<provider>"
cluster.x-k8s.io/provider="cluster-api"
controller.cert-manager.io/fao="true"
k0rdent.mirantis.com/component="kcm"
An example sorted set of labels, objects satisfying these labels will be included in the backup:
cluster.x-k8s.io/cluster-name="some-cluster-deployment-name"
cluster.x-k8s.io/provider="bootstrap-k0sproject-k0smotron"
cluster.x-k8s.io/provider="cluster-api"
cluster.x-k8s.io/provider="control-plane-k0sproject-k0smotron"
cluster.x-k8s.io/provider="infrastructure-aws"
controller.cert-manager.io/fao="true"
helm.toolkit.fluxcd.io/name="some-cluster-deployment-name"
k0rdent.mirantis.com/component="kcm"
Restoration#
Note
Please refer to the official migration documentation to familiarize yourself with potential limitations of the Velero backup system.
In the event of disaster, you can restore from a backup by doing the following:
-
Create a clean k0rdent installation, including
velero
and its plugins. Specifically, you want to avoid creating aManagement
object and similar objects because they will be part of your restored cluster. You can remove these objects after installation, but you can also install k0rdent without them in the first place:helm install kcm oci://ghcr.io/k0rdent/kcm/charts/kcm \ --version <version> \ --create-namespace \ --namespace kcm-system \ --set controller.createManagement=false \ --set controller.createAccessManagement=false \ --set controller.createRelease=false \ --set controller.createTemplates=false \ --set velero.initContainers[0].name=velero-plugin-for-<provider-name> \ --set velero.initContainers[0].image=velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> \ --set velero.initContainers[0].volumeMounts[0].mountPath=/target \ --set velero.initContainers[0].volumeMounts[0].name=plugins
-
Create the
BackupStorageLocation
/Secret
objects that were created during the preparation stage of creating a backup (preferably the same depending on a plugin). -
Restore the
kcm
system creating theRestore
object. Note that it is important to set the.spec.existingResourcePolicy
field value toupdate
:apiVersion: velero.io/v1 kind: Restore metadata: name: <restore-name> namespace: kcm-system spec: backupName: <backup-name> existingResourcePolicy: update includedNamespaces: - '*'
-
Wait until the
Restore
status isCompleted
and allkcm
components are up and running.
Caveats#
For some CAPI providers it is necessary to make changes to the Restore
object due to the large number of different resources and logic in each provider.
The resources described below are not excluded from a ManagementBackup
by
default to avoid logical dependencies on one or another provider, and to create a provider-agnostic system.
Note
The described caveats apply only to the Restore
object creation step and do not affect the other steps.
Azure (CAPZ)#
The following resources should be excluded from the Restore
object:
natgateways.network.azure.com
resourcegroups.resources.azure.com
virtualnetworks.network.azure.com
virtualnetworkssubnets.network.azure.com
Due to the webhook conversion,
objects of these resources cannot be restored, and they will
be created in the management cluster by the CAPZ
provider
automatically with the same spec
as in the backup.
The resulting Restore
object:
apiVersion: velero.io/v1
kind: Restore
metadata:
name: <restore-name>
namespace: kcm-system
spec:
backupName: <backup-name>
existingResourcePolicy: update
excludedResources:
- natgateways.network.azure.com
- resourcegroups.resources.azure.com
- virtualnetworks.network.azure.com
- virtualnetworkssubnets.network.azure.com
includedNamespaces:
- '*'
vSphere (CAPV)#
The following resources should be excluded from the Restore
object:
mutatingwebhookconfiguration.admissionregistration.k8s.io
validatingwebhookconfiguration.admissionregistration.k8s.io
Due to the Velero Restoration Order,
some of the CAPV
core objects cannot be restored,
and they will not be recreated automatically.
Because all of the objects have already passed both mutations
and validations, there is not much sense in validating them again.
The webhook configurations will be restored during installation
of the CAPV
provider.
The resulting Restore
object:
apiVersion: velero.io/v1
kind: Restore
metadata:
name: <restore-name>
namespace: kcm-system
spec:
backupName: <backup-name>
existingResourcePolicy: update
excludedResources:
- mutatingwebhookconfiguration.admissionregistration.k8s.io
- validatingwebhookconfiguration.admissionregistration.k8s.io
includedNamespaces:
- '*'
Upgrades and rollbacks#
The Disaster Recovery Feature provides a way to create backups
on each kcm
upgrade automatically.
Automatic Management Backups#
Each ManagementBackup
with a non-empty .spec.schedule
field
can enable the automatic creation of backups before
upgrading to a new version.
To enable, set the .spec.performOnManagementUpgrade
to true
.
For example, a ManagementBackup
object with enabled auto-backup before the kcm
version upgrade looks like this:
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ManagementBackup
metadata:
name: example-backup
spec:
schedule: "0 */6 * * *"
performOnManagementUpgrade: true
After the enablement, before each upgrade of kcm
to a new version,
a new backup will be created.
Automatically created backups have the
following name template to make it easier to find them:
the name of the ManagementBackup
object with enabled performOnManagementUpgrade
concatenates with the name of the release before the upgrade,
for example, example-backup-kcm-0-1-0
.
Automatically created backups have the label k0rdent.mirantis.com/release-backup
with the name of the release before the upgrade as its value
to simplify querying if required.
Rollbacks#
If during the kcm
upgrade a failure happens, a rollback operation
should be performed to restore the kcm
to its before-the-upgrade state:
-
Follow the first 2 steps from the restoration section, creating a clean
kcm
installation andBackupStorageLocation
/Secret
.Warning
Please consider the restoration caveats section before proceeding.
-
Create the
ConfigMap
object with patches to revert theManagement
.spec.release
, substitute the<version-before-upgrade>
with the version ofkcm
before the upgrade, and create theRestore
object, propagating theConfigMap
to it:--- apiVersion: v1 data: patch-mgmt-spec-release: | version: v1 resourceModifierRules: - conditions: groupResource: managements.k0rdent.mirantis.com patches: - operation: replace path: "/spec/release" value: "<version-before-upgrade>" kind: ConfigMap metadata: name: patch-mgmt-spec-release namespace: kcm-system --- apiVersion: velero.io/v1 kind: Restore metadata: name: <restore-name> namespace: kcm-system spec: backupName: <backup-name> existingResourcePolicy: update includedNamespaces: - '*' resourceModifier: # propagate patches kind: ConfigMap name: patch-mgmt-spec-release
-
Wait until the
Restore
status isCompleted
and allkcm
components are up and running. - Optionally delete the created
ConfigMap
.
Caveats / Limitations#
The credentials stored in backups can and will get stale, so a proper rotation should be considered beforehand.
All velero
caveats and limitations are transitively implied in k0rdent
. In particular, that
means no backup encryption is provided until it is implemented by a velero
plugin that supports
encryption and cloud storage backups.
Velero Backups / Restores deletion#
Delete Restores#
To delete a velero
Restore
from the management cluster
and from cloud storage, delete restores.velero.io
object(s),
such as with the following command:
kubectl delete restores.velero.io -n kcm-system <restore-name>
Warning
Deletion of a Restore
object deletes it from both
the management cluster and from cloud storage.
Delete Backups#
To remove a velero
Backup
from the management cluster,
delete backups.velero.io
object(s), such as with the following command:
kubectl delete backups.velero.io -n kcm-system <velero-backup-name>
Hint
The command above only removes objects from the cluster; the data continues to persist on the cloud storage.
The deleted object will be recreated in the
cluster if its BackupStorageLocation
.spec.backupSyncPeriod
is set and does not equal 0
.
To delete a velero
Backup
from the management cluster
and from cloud storage, create the following DeleteBackupRequest
object:
apiVersion: velero.io/v1
kind: DeleteBackupRequest
metadata:
name: delete-backup-completely
namespace: kcm-system
spec:
backupName: <velero-backup>
Warning
Deletion of a Backup
object via the DeleteBackupRequest
deletes it from both
the management cluster and from the cloud storage.
Optionally, delete the created DeleteBackupRequest
object
from the cluster after Backup
has been deleted.
For reference, follow the official documentation.
Customization#
This section covers different topics of customization regarding backing up and restoring k0rdent.
Velero installation#
The Velero helm chart is supplied with the k0rdent helm chart and is enabled by default. There are 2 ways of customizing the chart values:
-
Install using
helm
and add corresponding parameters to thehelm install
command.Note
Only a plugin is required during restoration; the other parameters are optional.
For example, this command installs k0rdent via
helm install
with a configured plugin,BackupStorageLocation
and propagated credentials:helm install kcm oci://ghcr.io/k0rdent/kcm/charts/kcm \ --version <version> \ --create-namespace \ --namespace kcm-system \ --set-file velero.credentials.secretContents.cloud=<FULL PATH TO FILE> \ --set velero.credentials.useSecret=true \ --set velero.backupsEnabled=true \ --set velero.configuration.backupStorageLocation[0].name=<backup-storage-location-name> \ --set velero.configuration.backupStorageLocation[0].provider=<provider-name> \ --set velero.configuration.backupStorageLocation[0].bucket=<bucket-name> \ --set velero.configuration.backupStorageLocation[0].config.region=<region> \ --set velero.initContainers[0].name=velero-plugin-for-<provider-name> \ --set velero.initContainers[0].image=velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> \ --set velero.initContainers[0].volumeMounts[0].mountPath=/target \ --set velero.initContainers[0].volumeMounts[0].name=plugins
-
Create or modify the existing
Management
object in the.spec.config.kcm
.Note
Only a plugin is required during restoration; the other parameters are optional.
For example, this is a
Management
object with a configured plugin and enabled metrics:apiVersion: k0rdent.mirantis.com/v1alpha1 kind: Management metadata: name: kcm spec: # ... core: kcm: config: velero: initContainers: - name: velero-plugin-for-<provider-name> image: velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> imagePullPolicy: IfNotPresent volumeMounts: - mountPath: /target name: plugins metrics: enabled: true # ...
To fully disable velero
, set the velero.enabled
parameter to false
.
Schedule Expression Format#
The ManagementBackup
.spec.schedule
field accepts a correct
Cron expression,
along with the
nonstandard predefined scheduling definitions
and an extra definition @every
with a number and a valid time unit
(valid time units are ns
, us
(or µs
), ms
, s
, m
, h
).
The following list contains acceptable .spec.schedule
example values:
0 */1 * * *
(standard Cron expression)@hourly
(nonstandard predefined definition)@every 1h
(extra definition)
Putting Extra Objects in a Management Backup#
If you need to back up objects other than those backed up by default,
you can add the label k0rdent.mirantis.com/component="kcm"
to these objects.
All objects containing the label will be automatically added to the management backup.