# Restoring From Backup
> **Note**
> Please refer to the official migration documentation to familiarize yourself with potential limitations of the Velero backup system.
In the event of a disaster, you can restore from a backup by doing the following:
1. Create a clean k0rdent installation, including `velero` and its plugins. Specifically, you want to avoid creating a `Management` object and similar objects, because they will be part of your restored cluster. You can remove these objects after installation, but you can also install k0rdent without them in the first place:

    ```bash
    helm install kcm oci://ghcr.io/k0rdent/kcm/charts/kcm \
      --version <version> \
      --create-namespace \
      --namespace kcm-system \
      --set controller.createManagement=false \
      --set controller.createAccessManagement=false \
      --set controller.createRelease=false \
      --set controller.createTemplates=false \
      --set regional.velero.initContainers[0].name=velero-plugin-for-<provider-name> \
      --set regional.velero.initContainers[0].image=velero/velero-plugin-for-<provider-name>:<provider-plugin-tag> \
      --set regional.velero.initContainers[0].volumeMounts[0].mountPath=/target \
      --set regional.velero.initContainers[0].volumeMounts[0].name=plugins
    ```
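    Optionally, confirm that the Velero components came up before proceeding. A minimal check; the label selector is an assumption based on common Helm chart conventions rather than something stated on this page:

    ```bash
    kubectl get pods -n kcm-system -l app.kubernetes.io/name=velero
    ```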
2. Create the `BackupStorageLocation` and `Secret` objects that were created during the preparation stage of creating a backup (preferably the same ones, depending on the plugin); see the sketch below.
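    For illustration only, a `BackupStorageLocation` with its credentials `Secret` might look like the following sketch. It assumes the AWS plugin, and the names, bucket, and credential values are placeholders; use the exact objects from your backup preparation stage instead:

    ```yaml
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-credentials
      namespace: kcm-system
    type: Opaque
    stringData:
      # Placeholder AWS credentials file; substitute your own values
      cloud: |
        [default]
        aws_access_key_id = <access-key-id>
        aws_secret_access_key = <secret-access-key>
    ---
    apiVersion: velero.io/v1
    kind: BackupStorageLocation
    metadata:
      name: <bsl-name>
      namespace: kcm-system
    spec:
      provider: aws
      objectStorage:
        bucket: <bucket-name>
      config:
        region: <aws-region>
      credential:
        # References the Secret above
        name: cloud-credentials
        key: cloud
    ```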
3. If the k0rdent management cluster version is less than `1.2.0`, apply the following workaround to avoid a failure during the restoration. Related known issue: Velero #9023.

    The fix is to "rename" the Velero deployment via an override: in the `Management` object, add a new key `fullnameOverride` with the value `velero` under the path `spec.core.kcm.config.velero`.

    Example of a patch if the path `spec.core.kcm.config.velero` does not yet exist:

    ```bash
    kubectl patch managements kcm \
      --type=json \
      -p='[{"op": "add", "path": "/spec/core/kcm/config/velero", "value": {"fullnameOverride": "velero"}}]'
    ```

    Example of how it should look after the required change:

    ```yaml
    spec:
      core:
        kcm:
          config:
            velero:
              fullnameOverride: velero
    ```

    This ensures that the Velero `Deployment` name is exactly `velero`, which is a requirement due to the aforementioned known issue.
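    To confirm the override took effect, you can check that the deployment exists under the expected name; a hedged sanity check, assuming Velero runs in the `kcm-system` namespace as in the installation step above:

    ```bash
    kubectl get deployment velero -n kcm-system
    ```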
4. Restore the `kcm` system by creating the `Restore` object. Follow whichever of the following cases is applicable to the clusters' configuration in use:

    - If there are no regional clusters, or all regional clusters' infrastructure is healthy:

        Note that it is important to set the `.spec.existingResourcePolicy` field value to `update`.

        ```yaml
        apiVersion: velero.io/v1
        kind: Restore
        metadata:
          name: <restore-name>
          namespace: kcm-system
        spec:
          backupName: <backup-name>
          existingResourcePolicy: update
          includedNamespaces:
          - '*'
        ```
    - If one or more regional clusters require reprovisioning:

        The following listing creates a `ConfigMap` object along with the `Restore` object, which allows Velero to set the pause annotation on all `regions` objects.

        Substitute `<cluster-deployment-name>` with the names of the `ClusterDeployment` objects used for provisioning the corresponding regional clusters.

        Note that it is important to set the `.spec.existingResourcePolicy` field value to `update`.

        ```yaml
        ---
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: add-region-pause-anno
          namespace: kcm-system
        data:
          add-region-pause-anno: |
            version: v1
            resourceModifierRules:
            - conditions:
                groupResource: regions.k0rdent.mirantis.com
              mergePatches:
              - patchData: |
                  {
                    "metadata": {
                      "annotations": {
                        "k0rdent.mirantis.com/region-pause": "true"
                      }
                    }
                  }
        ---
        apiVersion: velero.io/v1
        kind: Restore
        metadata:
          name: <restore-name>
          namespace: kcm-system
        spec:
          backupName: <backup-name>
          existingResourcePolicy: update
          includedNamespaces:
          - '*'
          labelSelector:
            matchExpressions:
            - key: cluster.x-k8s.io/cluster-name
              operator: NotIn
              values: ["<cluster-deployment-name>"]
            # Add new entries accordingly if more regional clusters require reprovisioning
            # - key: cluster.x-k8s.io/cluster-name
            #   operator: NotIn
            #   values: ["<cluster-deployment-name>"]
          resourceModifier:
            kind: ConfigMap
            name: add-region-pause-anno
        ```
5. Wait until the `Restore` status is `Completed` and all `kcm` components are up and running.
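    Instead of checking manually, you can block until the restore finishes; a minimal sketch (the `--for=jsonpath` form of `kubectl wait` requires kubectl v1.23 or newer):

    ```bash
    kubectl wait restore <restore-name> -n kcm-system \
      --for=jsonpath='{.status.phase}'=Completed --timeout 30m
    ```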
6. If there were one or more regional clusters that required reprovisioning, then:

    1. On the management cluster, wait for the `regions` object readiness:

        ```bash
        kubectl wait regions kcm --for=condition=Ready=True --timeout 30m
        ```
    2. Manually ensure that the freshly reprovisioned regional cluster runs and is accessible.
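        For example, assuming you have obtained the regional cluster's kubeconfig (the path below is a placeholder), a basic reachability check could be:

        ```bash
        kubectl --kubeconfig <regional-cluster-kubeconfig> get nodes
        ```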
    3. On the regional cluster, repeat the second step, creating the `BackupStorageLocation` and `Secret` objects that were created during the preparation stage.
    4. On the regional cluster, restore the cluster by creating a new `Restore` object. Note that in this case the `.spec.existingResourcePolicy` field is not set:

        ```yaml
        apiVersion: velero.io/v1
        kind: Restore
        metadata:
          name: <restore-name>
          namespace: kcm-system
        spec:
          backupName: <region-name>-<backup-name>
          excludedResources:
          - mutatingwebhookconfiguration.admissionregistration.k8s.io
          - validatingwebhookconfiguration.admissionregistration.k8s.io
          includedNamespaces:
          - '*'
        ```
    5. On the regional cluster, wait until the `Restore` status is `Completed` and all `ClusterDeployment` objects are ready.
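        A sketch of the same check as a single command, assuming `ClusterDeployment` objects report a `Ready` condition:

        ```bash
        kubectl wait clusterdeployments --all --all-namespaces \
          --for=condition=Ready=True --timeout 30m
        ```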
    6. On the management cluster, unpause provisioning of the regional `ClusterDeployment` objects by removing the annotation from the `regions` object:

        ```bash
        kubectl annotate regions <region-name> 'k0rdent.mirantis.com/region-pause-'
        ```
## Caveats
For some CAPI providers it is necessary to make changes to the `Restore` object due to the large number of different resources and the distinct logic in each provider. The resources described below are not excluded from a `ManagementBackup` by default, to avoid logical dependencies on one provider or another and to keep the system provider-agnostic.
> **Note**
> The described caveats apply only to the `Restore` object creation step and do not affect the other steps.
> **Note**
> The exclusions mentioned below (`excludedResources`) are applicable to any of the `Restore` examples on this page, including those tailored for regional clusters.
### Azure (CAPZ)
The following resources should be excluded from the `Restore` object:

- `natgateways.network.azure.com`
- `resourcegroups.resources.azure.com`
- `virtualnetworks.network.azure.com`
- `virtualnetworkssubnets.network.azure.com`
Due to the webhook conversion,
objects of these resources cannot be restored, and they will
be created in the management cluster by the CAPZ provider
automatically with the same spec as in the backup.
The resulting `Restore` object:
```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name>
  namespace: kcm-system
spec:
  backupName: <backup-name>
  existingResourcePolicy: update
  excludedResources:
  - natgateways.network.azure.com
  - resourcegroups.resources.azure.com
  - virtualnetworks.network.azure.com
  - virtualnetworkssubnets.network.azure.com
  includedNamespaces:
  - '*'
```
### vSphere (CAPV)
The following resources should be excluded from the `Restore` object:

- `mutatingwebhookconfiguration.admissionregistration.k8s.io`
- `validatingwebhookconfiguration.admissionregistration.k8s.io`
Due to the Velero Restoration Order,
some of the CAPV core objects cannot be restored,
and they will not be recreated automatically.
Because all of the objects have already passed both mutations
and validations, there is not much sense in validating them again.
The webhook configurations will be restored during installation
of the CAPV provider.
The resulting `Restore` object:
```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore-name>
  namespace: kcm-system
spec:
  backupName: <backup-name>
  existingResourcePolicy: update
  excludedResources:
  - mutatingwebhookconfiguration.admissionregistration.k8s.io
  - validatingwebhookconfiguration.admissionregistration.k8s.io
  includedNamespaces:
  - '*'
```