
Installing k0rdent Observability and FinOps#

Prerequisites#

Before beginning KOF installation, you should have the following components in place:

  • A k0rdent management cluster - You can find instructions for creating one in the quickstart guide
    • To test on macOS, you can install one using: brew install kind && kind create cluster -n k0rdent
  • You will also need your infrastructure provider credentials, such as those shown in the guide for AWS
    • Note that you should skip the "Create your ClusterDeployment" and later sections of that guide.
  • Finally, you need either access to create DNS records for service endpoints such as kof.example.com, or the ability to configure Istio instead.
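
For example, if you plan to manage DNS in AWS Route 53 as in the examples below, a quick check that you can see the hosted zone (a sketch assuming a configured AWS CLI; substitute your own domain) is:

  aws route53 list-hosted-zones-by-name --dns-name kof.example.com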

DNS auto-config#

To avoid manual configuration of DNS records for service endpoints later, you can automate the process now using external-dns.

AWS#

For AWS, use the Node IAM Role or IRSA methods in production.

For this demo, however, which is based on the aws-standalone template, you can use the most straightforward (though less secure) static credentials method:

  1. Create an external-dns IAM user with this policy.
  2. Create an access key and external-dns-aws-credentials file, as in:
    [default]
    aws_access_key_id = <EXAMPLE_ACCESS_KEY_ID>
    aws_secret_access_key = <EXAMPLE_SECRET_ACCESS_KEY>
    
  3. Create the external-dns-aws-credentials secret in the kof namespace:
    kubectl create namespace kof
    kubectl create secret generic \
      -n kof external-dns-aws-credentials \
      --from-file external-dns-aws-credentials
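
    As an optional sanity check, confirm that the secret exists and contains the credentials file as a key:

    kubectl describe secret -n kof external-dns-aws-credentials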
    

Azure#

To enable DNS auto-config on Azure, use a service principal with the DNS Zone Contributor role.

  1. Create an Azure service principal with the DNS Zone Contributor permissions. You can find an example here.
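
    For reference, creating such a service principal with the Azure CLI can look roughly like the sketch below (the zone name example.com, the resource group MyDnsResourceGroup, and the service principal name are placeholders; the linked example is authoritative):

    az ad sp create-for-rbac --name external-dns \
      --role "DNS Zone Contributor" \
      --scopes $(az network dns zone show --name example.com \
        --resource-group MyDnsResourceGroup --query id --output tsv)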

  2. Create the azure.json text file containing the service principal configuration data:

    {
      "tenantId": "<SP_TENANT>",
      "subscriptionId": "<SUBSCRIPTION_ID>",
      "resourceGroup": "MyDnsResourceGroup",
      "aadClientId": "<SP_APP_ID>",
      "aadClientSecret": "<SP_PASSWORD>"
    }
    

  3. Create the external-dns-azure-credentials secret in the kof namespace:

    kubectl create namespace kof
    kubectl create secret generic \
      -n kof external-dns-azure-credentials \
      --from-file azure.json
    
    See the external-dns Azure documentation for more details.

Management Cluster#

To install KOF on the management cluster, review the default values of the kof-mothership and kof-operators charts, then apply this example, or use it as a reference:
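
One way to inspect those default values locally is helm show values (assuming a Helm version with OCI registry support, 3.8 or newer):

  helm show values oci://ghcr.io/k0rdent/kof/charts/kof-operators --version 0.2.0
  helm show values oci://ghcr.io/k0rdent/kof/charts/kof-mothership --version 0.2.0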

  1. Install kof-operators required by kof-mothership:

    helm install --wait --create-namespace -n kof kof-operators \
      oci://ghcr.io/k0rdent/kof/charts/kof-operators --version 0.2.0
    

  2. Create the mothership-values.yaml file:

    kcm:
      installTemplates: true
    
    This enables installation of ServiceTemplates such as cert-manager and kof-storage, so that they can be referenced from the Regional and Child ClusterDeployments.

  3. If you want to use the default storage class but kubectl get sc shows no class marked (default), create one or mark an existing class as the default (see the sketch below). Otherwise, you can specify a non-default storage class in the mothership-values.yaml file:

    global:
      storageClass: <EXAMPLE_STORAGE_CLASS>
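
    One common way to mark an existing class as the default is to patch it (a sketch; substitute your storage class name):

    kubectl patch storageclass <EXAMPLE_STORAGE_CLASS> -p \
      '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'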
    

  4. If you've applied the DNS auto-config section, add the following to the kcm: object in the mothership-values.yaml file.

    For AWS, add:

      kof:
        clusterProfiles:
          kof-aws-dns-secrets:
            matchLabels:
              k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
            secrets:
              - external-dns-aws-credentials
    

    For Azure, add:

      kof:
        clusterProfiles:
          kof-azure-dns-secrets:
            matchLabels:
              k0rdent.mirantis.com/kof-azure-dns-secrets: "true"
            secrets:
              - external-dns-azure-credentials
    

    This enables Sveltos to auto-distribute the DNS secret to regional clusters.
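
    Putting steps 2 and 4 together for the AWS path, the resulting mothership-values.yaml would look roughly like this (a sketch; include the global: storageClass block from step 3 only if you need a non-default storage class):

      kcm:
        installTemplates: true
        kof:
          clusterProfiles:
            kof-aws-dns-secrets:
              matchLabels:
                k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
              secrets:
                - external-dns-aws-credentials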

  5. Two secrets are auto-created by default:

    • storage-vmuser-credentials is a secret used by VictoriaMetrics. You don't need to use it directly. It is auto-distributed to other clusters by the Sveltos ClusterProfile here.
    • grafana-admin-credentials is a secret that we will use in the Grafana section. It is auto-created here.
  6. Install kof-mothership:

    helm install --wait -f mothership-values.yaml -n kof kof-mothership \
      oci://ghcr.io/k0rdent/kof/charts/kof-mothership --version 0.2.0
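
    If you enabled installTemplates in step 2, you can optionally confirm that the ServiceTemplates have appeared (exact names and namespaces may vary between versions):

    kubectl get servicetemplate -A | grep -e kof- -e cert-manager -e ingress-nginx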
    

  7. Install kof-regional and kof-child charts into the management cluster:

    helm install --wait -n kof kof-regional \
      oci://ghcr.io/k0rdent/kof/charts/kof-regional --version 0.2.0
    helm install --wait -n kof kof-child \
      oci://ghcr.io/k0rdent/kof/charts/kof-child --version 0.2.0
    

  8. Wait for all pods to show that they're Running:

    kubectl get pod -n kof
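
    If you prefer a non-interactive check, kubectl wait can block until the pods are ready (adjust the timeout as needed):

    kubectl wait --for=condition=Ready pod --all -n kof --timeout=600s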
    

Regional Cluster#

To install KOF on the regional cluster, review the default values of the kof-storage chart, then apply this example for AWS, or use it as a reference:

  1. Set your KOF variables using your own values:

    REGIONAL_CLUSTER_NAME=cloud1-region1
    REGIONAL_DOMAIN=$REGIONAL_CLUSTER_NAME.kof.example.com
    ADMIN_EMAIL=$(git config user.email)
    echo "$REGIONAL_CLUSTER_NAME, $REGIONAL_DOMAIN, $ADMIN_EMAIL"
    

  2. Use an up-to-date ClusterTemplate, as in:

    kubectl get clustertemplate -n kcm-system | grep aws
    TEMPLATE=aws-standalone-cp-0-2-0
    

  3. Compose the regional ClusterDeployment:

    For AWS:

    cat >regional-cluster.yaml <<EOF
    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: $REGIONAL_CLUSTER_NAME
      namespace: kcm-system
      labels:
        k0rdent.mirantis.com/kof-storage-secrets: "true"
        k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
        k0rdent.mirantis.com/kof-cluster-role: regional
    spec:
      template: $TEMPLATE
      credential: aws-cluster-identity-cred
      config:
        clusterAnnotations:
          k0rdent.mirantis.com/kof-regional-domain: $REGIONAL_DOMAIN
          k0rdent.mirantis.com/kof-cert-email: $ADMIN_EMAIL
        clusterIdentity:
          name: aws-cluster-identity
          namespace: kcm-system
        controlPlane:
          instanceType: t3.large
        controlPlaneNumber: 1
        publicIP: false
        region: us-east-2
        worker:
          instanceType: t3.medium
        workersNumber: 3
    EOF
    

    For Azure:

    REGION=<AZURE_LOCATION>
    AZURE_SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
    TEMPLATE=azure-standalone-cp-0-2-0
    cat >regional-cluster.yaml <<EOF
    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: $REGIONAL_CLUSTER_NAME
      namespace: kcm-system
      labels:
        kof: storage
    spec:
      template: $TEMPLATE
      credential: azure-cluster-identity-cred
      config:
        clusterIdentity:
          name: azure-cluster-identity
          namespace: kcm-system
        subscriptionID: $AZURE_SUBSCRIPTION_ID
        controlPlane:
          vmSize: Standard_A4_v2
        controlPlaneNumber: 1
        location: $REGION
        worker:
          vmSize: Standard_A4_v2
        workersNumber: 3
        clusterLabels:
          k0rdent.mirantis.com/kof-storage-secrets: "true"
          k0rdent.mirantis.com/kof-azure-dns-secrets: "true"
      serviceSpec:
        priority: 100
        services:
          - name: ingress-nginx
            namespace: ingress-nginx
            template: ingress-nginx-4-11-3
            values: |
              ingress-nginx:
                controller:
                  service:
                    annotations:
                      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
          - name: cert-manager
            namespace: cert-manager
            template: cert-manager-1-16-2
            values: |
              cert-manager:
                crds:
                  enabled: true
          - name: kof-storage
            namespace: kof
            template: kof-storage-0-2-0
            values: |
              external-dns:
                enabled: true
                provider:
                  name: azure
                extraVolumeMounts:
                  - name: azure-config-file
                    mountPath: /etc/kubernetes
                    readOnly: true
                extraVolumes:
                  - name: azure-config-file
                    secret:
                      secretName: external-dns-azure-credentials
              victoriametrics:
                vmauth:
                  ingress:
                    host: vmauth.$REGIONAL_DOMAIN
                security:
                  username_key: username
                  password_key: password
                  credentials_secret_name: storage-vmuser-credentials
              grafana:
                ingress:
                  host: grafana.$REGIONAL_DOMAIN
                security:
                  credentials_secret_name: grafana-admin-credentials
              cert-manager:
                email: sample@example.com
    ---
    apiVersion: kof.k0rdent.mirantis.com/v1alpha1
    kind: PromxyServerGroup
    metadata:
      labels:
        app.kubernetes.io/name: promxy-operator
        k0rdent.mirantis.com/promxy-secret-name: kof-mothership-promxy-config
      name: $REGIONAL_CLUSTER_NAME-metrics
      namespace: kof
    spec:
      cluster_name: $REGIONAL_CLUSTER_NAME
      targets:
        - "vmauth.$REGIONAL_DOMAIN:443"
      path_prefix: /vm/select/0/prometheus/
      scheme: https
      http_client:
        dial_timeout: "5s"
        tls_config:
          insecure_skip_verify: true
        basic_auth:
          credentials_secret_name: storage-vmuser-credentials
          username_key: username
          password_key: password
    ---
    apiVersion: grafana.integreatly.org/v1beta1
    kind: GrafanaDatasource
    metadata:
      labels:
        app.kubernetes.io/managed-by: Helm
      name: $REGIONAL_CLUSTER_NAME-logs
      namespace: kof
    spec:
      valuesFrom:
        - targetPath: "basicAuthUser"
          valueFrom:
            secretKeyRef:
              key: username
              name: storage-vmuser-credentials
        - targetPath: "secureJsonData.basicAuthPassword"
          valueFrom:
            secretKeyRef:
              key: password
              name: storage-vmuser-credentials
      datasource:
        name: $REGIONAL_CLUSTER_NAME
        url: https://vmauth.$REGIONAL_DOMAIN/vls
        access: proxy
        isDefault: false
        type: "victoriametrics-logs-datasource"
        basicAuth: true
        basicAuthUser: \${username}
        secureJsonData:
          basicAuthPassword: \${password}
      instanceSelector:
        matchLabels:
          dashboards: grafana
      resyncPeriod: 5m
    EOF
    
  4. This ClusterDeployment relies on propagation of its .metadata.labels to the resulting Cluster, which works because there are no .spec.config.clusterLabels here. If you do add .spec.config.clusterLabels, copy the .metadata.labels there as well.

  5. The ClusterTemplate above provides the default storage class (ebs-csi-default-sc for AWS). If you want to use a non-default storage class, add it to the regional-cluster.yaml file in the .spec.config.clusterAnnotations:

    k0rdent.mirantis.com/kof-storage-class: <EXAMPLE_STORAGE_CLASS>
    

  6. The kof-operator creates and configures the PromxyServerGroup and GrafanaDatasource automatically, using the endpoints listed below by default. Only if you want to disable the built-in metrics, logs, and traces and use your own existing instances instead, add custom endpoints to .spec.config.clusterAnnotations in the regional-cluster.yaml file:

    k0rdent.mirantis.com/kof-write-metrics-endpoint: https://vmauth.$REGIONAL_DOMAIN/vm/insert/0/prometheus/api/v1/write
    k0rdent.mirantis.com/kof-read-metrics-endpoint: https://vmauth.$REGIONAL_DOMAIN/vm/select/0/prometheus
    k0rdent.mirantis.com/kof-write-logs-endpoint: https://vmauth.$REGIONAL_DOMAIN/vls/insert/opentelemetry/v1/logs
    k0rdent.mirantis.com/kof-read-logs-endpoint: https://vmauth.$REGIONAL_DOMAIN/vls
    k0rdent.mirantis.com/kof-write-traces-endpoint: https://jaeger.$REGIONAL_DOMAIN/collector
    

  7. The MultiClusterService named kof-regional-cluster configures and installs the cert-manager, ingress-nginx, and kof-storage charts automatically. To pass any custom values to the kof-storage chart or its subcharts, such as victoria-logs-single, add them to .spec.config.clusterAnnotations in the regional-cluster.yaml file, for example:

    k0rdent.mirantis.com/kof-storage-values: |
      victoria-logs-single:
        server:
          replicaCount: 2
    

  8. Verify and apply the Regional ClusterDeployment:

    cat regional-cluster.yaml
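
    Optionally, you can also have the API server validate the manifest without creating anything:

    kubectl apply --dry-run=server -f regional-cluster.yaml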
    
    kubectl apply -f regional-cluster.yaml
    

  9. Watch the cluster being deployed to AWS or Azure until all values of READY are True:

    clusterctl describe cluster -n kcm-system $REGIONAL_CLUSTER_NAME \
      --show-conditions all
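
    If clusterctl is not available, you can also watch the ClusterDeployment object itself (the columns shown depend on the k0rdent version):

    kubectl get clusterdeployment -n kcm-system $REGIONAL_CLUSTER_NAME --watch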
    

Child Cluster#

To install KOF on the actual cluster to be monitored, review the default values of the kof-operators and kof-collectors charts, then apply this example for AWS, or use it as a reference:

  1. Set your own value below, verifying the variables:

    CHILD_CLUSTER_NAME=$REGIONAL_CLUSTER_NAME-child1
    echo "$CHILD_CLUSTER_NAME, $REGIONAL_DOMAIN"
    

  2. Use an up-to-date ClusterTemplate, as in:

    kubectl get clustertemplate -n kcm-system | grep aws
    TEMPLATE=aws-standalone-cp-0-2-0
    

  3. Compose the child ClusterDeployment:

    For AWS:

    cat >child-cluster.yaml <<EOF
    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: $CHILD_CLUSTER_NAME
      namespace: kcm-system
      labels:
        k0rdent.mirantis.com/kof-storage-secrets: "true"
        k0rdent.mirantis.com/kof-cluster-role: child
    spec:
      template: $TEMPLATE
      credential: aws-cluster-identity-cred
      config:
        clusterIdentity:
          name: aws-cluster-identity
          namespace: kcm-system
        controlPlane:
          instanceType: t3.large
        controlPlaneNumber: 1
        publicIP: false
        region: us-east-2
        worker:
          instanceType: t3.small
        workersNumber: 3
    EOF
    

    For Azure:

    REGION=<AZURE_LOCATION>
    AZURE_SUBSCRIPTION_ID=<SUBSCRIPTION_ID>
    TEMPLATE=azure-standalone-cp-0-2-0
    cat >child-cluster.yaml <<EOF
    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: $CHILD_CLUSTER_NAME
      namespace: kcm-system
      labels:
        kof: collector
    spec:
      template: $TEMPLATE
      credential: azure-cluster-identity-cred
      config:
        clusterIdentity:
          name: azure-cluster-identity
          namespace: kcm-system
        subscriptionID: $AZURE_SUBSCRIPTION_ID
        controlPlane:
          vmSize: Standard_A4_v2
        controlPlaneNumber: 1
        location: $REGION
        worker:
          vmSize: Standard_A4_v2
        workersNumber: 3
        clusterLabels:
          k0rdent.mirantis.com/kof-storage-secrets: "true"
      serviceSpec:
        priority: 100
        services:
          - name: cert-manager
            namespace: kof
            template: cert-manager-1-16-2
            values: |
              cert-manager:
                crds:
                  enabled: true
          - name: kof-operators
            namespace: kof
            template: kof-operators-0-2-0
          - name: kof-collectors
            namespace: kof
            template: kof-collectors-0-2-0
            values: |
              global:
                clusterName: $CHILD_CLUSTER_NAME
              opencost:
                enabled: true
                opencost:
                  prometheus:
                    username_key: username
                    password_key: password
                    existingSecretName: storage-vmuser-credentials
                    external:
                      url: https://vmauth.$REGIONAL_DOMAIN/vm/select/0/prometheus
                  exporter:
                    defaultClusterId: $CHILD_CLUSTER_NAME
              kof:
                logs:
                  username_key: username
                  password_key: password
                  credentials_secret_name: storage-vmuser-credentials
                  endpoint: https://vmauth.$REGIONAL_DOMAIN/vls/insert/opentelemetry/v1/logs
                metrics:
                  username_key: username
                  password_key: password
                  credentials_secret_name: storage-vmuser-credentials
                  endpoint: https://vmauth.$REGIONAL_DOMAIN/vm/insert/0/prometheus/api/v1/write
    EOF
    
  4. This ClusterDeployment relies on propagation of its .metadata.labels to the resulting Cluster, which works because there are no .spec.config.clusterLabels here. If you do add .spec.config.clusterLabels, copy the .metadata.labels there as well.

  5. The kof-operator discovers the regional cluster based on the location of the child cluster. Only if you have more than one regional cluster in the same AWS region or Azure location, and you want to connect the child cluster to a specific regional cluster, add that regional cluster's name to .metadata.labels in the child-cluster.yaml file:

    k0rdent.mirantis.com/kof-regional-cluster-name: $REGIONAL_CLUSTER_NAME
    

  6. The MultiClusterService named kof-child-cluster configures and installs the cert-manager, kof-operators, and kof-collectors charts automatically. To pass any custom values to the kof-collectors chart or its subcharts, such as opencost, add them to .spec.config in the child-cluster.yaml file, for example:

    clusterAnnotations:
      k0rdent.mirantis.com/kof-collectors-values: |
        opencost:
          opencost:
            exporter:
              replicas: 2
    
    Note: the first opencost key references the subchart, and the second opencost key is part of its values.

  7. Verify and apply the ClusterDeployment:

    cat child-cluster.yaml
    
    kubectl apply -f child-cluster.yaml
    

  8. Watch the cluster being deployed to AWS or Azure until all values of READY are True:

    clusterctl describe cluster -n kcm-system $CHILD_CLUSTER_NAME \
      --show-conditions all
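
    Once the cluster is Ready, you can optionally fetch its kubeconfig (assuming the standard Cluster API secret layout used by k0rdent, where the secret is named <cluster-name>-kubeconfig in kcm-system with a value key) and confirm that the collectors are running:

    kubectl get secret -n kcm-system ${CHILD_CLUSTER_NAME}-kubeconfig \
      -o jsonpath='{.data.value}' | base64 -d > child-kubeconfig
    KUBECONFIG=child-kubeconfig kubectl get pod -n kof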