
k0rdent Observability and FinOps (kof)#

Overview#

k0rdent Observability and FinOps (kof) provides enterprise-grade observability and FinOps capabilities for k0rdent-managed Kubernetes clusters. It enables centralized metrics, logging, and cost management through a unified OpenTelemetry-based architecture.

Architecture#

High-level#

Three layers: Management, Storage, and Collection.

           ┌────────────────┐
           │   Management   │
           │   UI, promxy   │
           └────────┬───────┘
                    │
             ┌──────┴──────┐
             │             │
        ┌────┴─────┐ ┌─────┴────┐
        │ Storage  │ │ Storage  │
        │ region 1 │ │ region 2 │
        └────┬─────┘ └─────┬────┘
             │             │
      ┌──────┴──────┐     ...
      │             │
┌─────┴─────┐ ┌─────┴─────┐
│ Collect   │ │ Collect   │
│ managed 1 │ │ managed 2 │
└───────────┘ └───────────┘

Mid-level#

Data flows up, from observed resources to centralized Grafana:

management cluster_____________________
│                                     │
│  kof-mothership chart_____________  │
│  │                               │  │
│  │ grafana-operator              │  │
│  │ victoria-metrics-operator     │  │
│  │ cluster-api-visualizer        │  │
│  │ sveltos-dashboard             │  │
│  │ k0rdent service templates     │  │
│  │ promxy                        │  │
│  │_______________________________│  │
│                                     │
│  kof-operators chart_____________   │
│  │                              │   │
│  │  opentelemetry-operator      │   │
│  │  prometheus-operator-crds    │   │
│  │______________________________│   │
│_____________________________________│


cloud 1...
│
│  region 1__________________________________________  region 2...
│  │                                                │  │
.  │  storage cluster_____________________          │  │
.  │  │                                  │          │  │
.  │  │  kof-storage chart_____________  │          │  .
   │  │  │                            │  │          │  .
   │  │  │ grafana-operator           │  │          │  .
   │  │  │ victoria-metrics-operator  │  │          │
   │  │  │ victoria-logs-single       │  │          │
   │  │  │ external-dns               │  │          │
   │  │  │____________________________│  │          │
   │  │                                  │          │
   │  │  cert-manager (grafana, vmauth)  │          │
   │  │  ingress-nginx                   │          │
   │  │__________________________________│          │
   │                                                │
   │                                                │
   │  managed cluster 1_____________________  2...  │
   │  │                                    │  │     │
   │  │  cert-manager (OTel-operator)      │  │     │
   │  │                                    │  │     │
   │  │  kof-operators chart_____________  │  .     │
   │  │  │                              │  │  .     │
   │  │  │  opentelemetry-operator____  │  │  .     │
   │  │  │  │                        │  │  │        │
   │  │  │  │ OpenTelemetryCollector │  │  │        │
   │  │  │  │________________________│  │  │        │
   │  │  │                              │  │        │
   │  │  │  prometheus-operator-crds    │  │        │
   │  │  │______________________________│  │        │
   │  │                                    │        │
   │  │  kof-collectors chart________      │        │
   │  │  │                          │      │        │
   │  │  │ opencost                 │      │        │
   │  │  │ kube-state-metrics       │      │        │
   │  │  │ prometheus-node-exporter │      │        │
   │  │  │__________________________│      │        │
   │  │                                    │        │
   │  │  observed resources                │        │
   │  │____________________________________│        │
   │________________________________________________│

Low-level#

(kof architecture diagram)

Helm Charts#

kof-mothership#

kof-storage#

kof-operators#

kof-collectors#

Installation#

Prerequisites#

  • k0rdent management cluster - quickstart guide
    • To test on macOS: brew install kind && kind create cluster -n k0rdent
  • Infrastructure provider credentials, e.g. guide for AWS
    • Skip the "Create your ClusterDeployment" and later sections.
  • Access to create DNS records for service endpoints, e.g. kof.example.com

DNS auto-config#

To avoid manual configuration of DNS records for service endpoints later, you can automate it now using external-dns.

For example, on AWS you should use the Node IAM Role or IRSA method in production.

For the sake of this demo, based on the aws-standalone template, we're using the simplest but least secure method, Static credentials:

  • Create external-dns IAM user with this policy.
  • Create an access key and external-dns-aws-credentials file:
    [default]
    aws_access_key_id = REDACTED
    aws_secret_access_key = REDACTED
    
  • Create external-dns-aws-credentials secret in kof namespace:
    kubectl create namespace kof
    kubectl create secret generic \
      -n kof external-dns-aws-credentials \
      --from-file external-dns-aws-credentials
    
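Before creating the secret, it may help to sanity-check the credentials file format locally. A sketch using the placeholder values from above (REDACTED stands in for your real keys):

```shell
# Recreate the credentials file with placeholder values, as shown above,
# then confirm both AWS keys are present before creating the secret.
cat > external-dns-aws-credentials <<'EOF'
[default]
aws_access_key_id = REDACTED
aws_secret_access_key = REDACTED
EOF
grep -c '^aws_' external-dns-aws-credentials   # → 2
```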

Management Cluster#

  • Install kof-operators required by kof-mothership:

    helm install --create-namespace -n kof kof-operators \
      oci://ghcr.io/k0rdent/kof/charts/kof-operators --version 0.1.0
    

  • Compose the values for kof-mothership:

    cat >mothership-values.yaml <<EOF
    kcm:
      installTemplates: true
      kof:
        clusterProfiles:
          kof-aws-dns-secrets:
            matchLabels:
              k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
            secrets:
              - external-dns-aws-credentials
    EOF
    

  • Why we override some default values here:

    • kcm.installTemplates installs templates such as cert-manager and kof-storage into the management cluster. This makes it possible to reference them from .spec.serviceSpec.services[].template in the AWS ClusterDeployment below.
    • The external-dns-aws-credentials secret created in the DNS auto-config section is auto-distributed to storage clusters by Sveltos. If you've opted out of DNS auto-config, don't add the kof-aws-dns-secrets cluster profile above.
    • The storage-vmuser-credentials secret is auto-created by default and auto-distributed to other clusters by a Sveltos ClusterProfile here.
    • The grafana-admin-credentials secret is auto-created by default here. We will use it in the Grafana section.
  • Install kof-mothership:

    helm install -f mothership-values.yaml -n kof kof-mothership \
      oci://ghcr.io/k0rdent/kof/charts/kof-mothership --version 0.1.0
    

  • Wait for all pods to become Running:

    kubectl get pod -n kof
    

Storage Cluster#

  • Look through the default values of kof-storage chart.
  • Apply the following quick start example for AWS, or use it as a reference.

  • Set the following variables using your own values:

    STORAGE_CLUSTER_NAME=cloud1-region1
    STORAGE_DOMAIN=$STORAGE_CLUSTER_NAME.kof.example.com
    ADMIN_EMAIL=$(git config user.email)
    echo "$STORAGE_CLUSTER_NAME, $STORAGE_DOMAIN, $ADMIN_EMAIL"
    
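For reference, the service endpoints used later in this guide are derived from these variables. With the sample values above (and the hypothetical kof.example.com domain) they expand as follows:

```shell
# How the vmauth and Grafana hostnames are derived from the variables above
STORAGE_CLUSTER_NAME=cloud1-region1
STORAGE_DOMAIN=$STORAGE_CLUSTER_NAME.kof.example.com
echo "vmauth.$STORAGE_DOMAIN"    # → vmauth.cloud1-region1.kof.example.com
echo "grafana.$STORAGE_DOMAIN"   # → grafana.cloud1-region1.kof.example.com
```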

  • Use an up-to-date template, e.g.:

    kubectl get clustertemplate -n kcm-system | grep aws
    TEMPLATE=aws-standalone-cp-0-1-0
    
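If several versions of a template are listed, the newest can be picked by version-sorting the names. A sketch on sample names (the version-suffix format is an assumption; in practice, feed the real template names into the same pipeline):

```shell
# Pick the newest aws-standalone template by version-sorting sample names.
# In practice, replace the printf with:
#   kubectl get clustertemplate -n kcm-system -o name | cut -d/ -f2 | grep aws-standalone-cp
TEMPLATE=$(printf '%s\n' aws-standalone-cp-0-0-9 aws-standalone-cp-0-1-0 \
  | sort -V | tail -n1)
echo "$TEMPLATE"   # → aws-standalone-cp-0-1-0
```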

  • Compose:

    • ClusterDeployment - storage cluster
    • PromxyServerGroup - for metrics
    • GrafanaDatasource - for logs
cat >storage-cluster.yaml <<EOF
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: $STORAGE_CLUSTER_NAME
  namespace: kcm-system
  labels:
    kof: storage
spec:
  template: $TEMPLATE
  credential: aws-cluster-identity-cred
  config:
    clusterIdentity:
      name: aws-cluster-identity
      namespace: kcm-system
    controlPlane:
      instanceType: t3.large
    controlPlaneNumber: 1
    publicIP: true
    region: us-east-2
    worker:
      instanceType: t3.medium
    workersNumber: 3
    clusterLabels:
      k0rdent.mirantis.com/kof-storage-secrets: "true"
      k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
  serviceSpec:
    priority: 100
    services:
      - name: ingress-nginx
        namespace: ingress-nginx
        template: ingress-nginx-4-11-3
      - name: cert-manager
        namespace: cert-manager
        template: cert-manager-1-16-2
        values: |
          cert-manager:
            crds:
              enabled: true
      - name: kof-storage
        namespace: kof
        template: kof-storage-0-1-0
        values: |
          external-dns:
            enabled: true
          victoriametrics:
            vmauth:
              ingress:
                host: vmauth.$STORAGE_DOMAIN
            security:
              username_key: username
              password_key: password
              credentials_secret_name: storage-vmuser-credentials
          grafana:
            ingress:
              host: grafana.$STORAGE_DOMAIN
            security:
              credentials_secret_name: grafana-admin-credentials
          cert-manager:
            email: $ADMIN_EMAIL
---
apiVersion: kof.k0rdent.mirantis.com/v1alpha1
kind: PromxyServerGroup
metadata:
  labels:
    app.kubernetes.io/name: promxy-operator
    k0rdent.mirantis.com/promxy-secret-name: kof-mothership-promxy-config
  name: promxyservergroup-sample
  namespace: kof
spec:
  cluster_name: $STORAGE_CLUSTER_NAME
  targets:
    - "vmauth.$STORAGE_DOMAIN:443"
  path_prefix: /vm/select/0/prometheus/
  scheme: https
  http_client:
    dial_timeout: "5s"
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      credentials_secret_name: storage-vmuser-credentials
      username_key: username
      password_key: password
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: victoria-logs-storage0
  namespace: kof
spec:
  valuesFrom:
    - targetPath: "basicAuthUser"
      valueFrom:
        secretKeyRef:
          key: username
          name: storage-vmuser-credentials
    - targetPath: "secureJsonData.basicAuthPassword"
      valueFrom:
        secretKeyRef:
          key: password
          name: storage-vmuser-credentials
  datasource:
    name: $STORAGE_CLUSTER_NAME
    url: https://vmauth.$STORAGE_DOMAIN/vls
    access: proxy
    isDefault: false
    type: "victoriametrics-logs-datasource"
    basicAuth: true
    basicAuthUser: \${username}
    secureJsonData:
      basicAuthPassword: \${password}
  instanceSelector:
    matchLabels:
      dashboards: grafana
  resyncPeriod: 5m
EOF
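
A note on the `\${username}` and `\${password}` escapes above: the heredoc delimiter `EOF` is unquoted, so the shell expands `$STORAGE_DOMAIN` and friends, and the backslash is what keeps `${username}` literal for the Grafana operator (which substitutes it from `valuesFrom`). A minimal illustration of the rule:

```shell
# In an unquoted heredoc, $VAR is expanded by the shell; \$VAR stays literal.
NAME=demo
cat <<EOF
expanded: $NAME
literal: \${NAME}
EOF
```

This prints `expanded: demo` followed by the literal `literal: ${NAME}`.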
  • Verify and apply the Storage ClusterDeployment:

    cat storage-cluster.yaml
    
    kubectl apply -f storage-cluster.yaml
    

  • Watch how the cluster is deployed to AWS until all READY conditions are True:

    clusterctl describe cluster -n kcm-system $STORAGE_CLUSTER_NAME \
      --show-conditions all
    

Managed Cluster#

  • Look through the default values of kof-operators and kof-collectors charts.

  • Apply the following quick start example for AWS, or use it as a reference.

  • Set your own value below and verify the variables:

    MANAGED_CLUSTER_NAME=$STORAGE_CLUSTER_NAME-managed1
    echo "$MANAGED_CLUSTER_NAME, $STORAGE_DOMAIN"
    

  • Use an up-to-date template, e.g.:

    kubectl get clustertemplate -n kcm-system | grep aws
    TEMPLATE=aws-standalone-cp-0-1-0
    

  • Compose the Managed ClusterDeployment:

cat >managed-cluster.yaml <<EOF
apiVersion: k0rdent.mirantis.com/v1alpha1
kind: ClusterDeployment
metadata:
  name: $MANAGED_CLUSTER_NAME
  namespace: kcm-system
  labels:
    kof: collector
spec:
  template: $TEMPLATE
  credential: aws-cluster-identity-cred
  config:
    clusterIdentity:
      name: aws-cluster-identity
      namespace: kcm-system
    controlPlane:
      instanceType: t3.large
    controlPlaneNumber: 1
    publicIP: false
    region: us-east-2
    worker:
      instanceType: t3.small
    workersNumber: 3
    clusterLabels:
      k0rdent.mirantis.com/kof-storage-secrets: "true"
  serviceSpec:
    priority: 100
    services:
      - name: cert-manager
        namespace: kof
        template: cert-manager-1-16-2
        values: |
          cert-manager:
            crds:
              enabled: true
      - name: kof-operators
        namespace: kof
        template: kof-operators-0-1-0
      - name: kof-collectors
        namespace: kof
        template: kof-collectors-0-1-0
        values: |
          global:
            clusterName: $MANAGED_CLUSTER_NAME
          opencost:
            enabled: true
            opencost:
              prometheus:
                username_key: username
                password_key: password
                existingSecretName: storage-vmuser-credentials
                external:
                  url: https://vmauth.$STORAGE_DOMAIN/vm/select/0/prometheus
              exporter:
                defaultClusterId: $MANAGED_CLUSTER_NAME
          kof:
            logs:
              username_key: username
              password_key: password
              credentials_secret_name: storage-vmuser-credentials
              endpoint: https://vmauth.$STORAGE_DOMAIN/vls/insert/opentelemetry/v1/logs
            metrics:
              username_key: username
              password_key: password
              credentials_secret_name: storage-vmuser-credentials
              endpoint: https://vmauth.$STORAGE_DOMAIN/vm/insert/0/prometheus/api/v1/write
EOF
  • Verify and apply the Managed ClusterDeployment:

    cat managed-cluster.yaml
    
    kubectl apply -f managed-cluster.yaml
    

  • Watch how the cluster is deployed to AWS until all READY conditions are True:

    clusterctl describe cluster -n kcm-system $MANAGED_CLUSTER_NAME \
      --show-conditions all
    

Verification#

kubectl get clustersummaries -A -o wide
Wait until Provisioning becomes Provisioned.

kubectl get secret -n kcm-system $STORAGE_CLUSTER_NAME-kubeconfig \
  -o=jsonpath={.data.value} | base64 -d > storage-kubeconfig

kubectl get secret -n kcm-system $MANAGED_CLUSTER_NAME-kubeconfig \
  -o=jsonpath={.data.value} | base64 -d > managed-kubeconfig

KUBECONFIG=storage-kubeconfig kubectl get pod -A
  # Namespaces: cert-manager, ingress-nginx, kof, kube-system, projectsveltos

KUBECONFIG=managed-kubeconfig kubectl get pod -A
  # Namespaces: kof, kube-system, projectsveltos
Wait for all pods to become Running.

Manual DNS config#

If you've opted out of DNS auto-config, then:

  • Get the EXTERNAL-IP of ingress-nginx:

    KUBECONFIG=storage-kubeconfig kubectl get svc \
      -n ingress-nginx ingress-nginx-controller
    
    It should look like REDACTED.us-east-2.elb.amazonaws.com

  • Create the following two DNS records pointing to that EXTERNAL-IP (since the value is an ELB hostname, use CNAME records, or alias records in Route 53):

    echo vmauth.$STORAGE_DOMAIN
    echo grafana.$STORAGE_DOMAIN
    

Sveltos#

Use the Sveltos dashboard to verify that the secrets were auto-distributed to the required clusters:

kubectl create sa platform-admin
kubectl create clusterrolebinding platform-admin-access \
  --clusterrole cluster-admin --serviceaccount default:platform-admin

kubectl create token platform-admin --duration=24h
kubectl port-forward -n kof svc/dashboard 8081:80

(Sveltos dashboard demo)

Grafana#

Access to Grafana#

  • Get Grafana username and password:

    kubectl get secret -n kof grafana-admin-credentials -o yaml | yq '{
      "user": .data.GF_SECURITY_ADMIN_USER | @base64d,
      "pass": .data.GF_SECURITY_ADMIN_PASSWORD | @base64d
    }'
    
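`@base64d` in the `yq` expression above is plain base64 decoding, the same as piping a field through `base64 -d` (shown here on a sample value, not the real credentials):

```shell
# Secret .data fields are base64-encoded strings; decode them with base64 -d.
printf 'YWRtaW4=' | base64 -d; echo   # → admin
```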

  • Start Grafana dashboard:

    kubectl port-forward -n kof svc/grafana-vm-service 3000:3000
    

  • Log in to http://127.0.0.1:3000/dashboards with the user/pass printed above.

  • Open a dashboard:

(Grafana dashboard demo)

Cluster Overview#

  • Health metrics
  • Resource utilization
  • Performance trends
  • Cost analysis

Logging Interface#

  • Real-time log streaming
  • Full-text search
  • Log aggregation
  • Alert correlation

Cost Management#

  • Resource cost tracking
  • Usage analysis
  • Budget monitoring
  • Optimization recommendations

Scaling Guidelines#

Regional Expansion#

  • Deploy Storage Cluster in new region
  • Update promxy configuration
  • Configure collector routing
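
The promxy update in the second step amounts to one more PromxyServerGroup CR on the management cluster, mirroring the one from the Storage Cluster section. A sketch, assuming a hypothetical second storage cluster named cloud1-region2:

```yaml
apiVersion: kof.k0rdent.mirantis.com/v1alpha1
kind: PromxyServerGroup
metadata:
  labels:
    app.kubernetes.io/name: promxy-operator
    k0rdent.mirantis.com/promxy-secret-name: kof-mothership-promxy-config
  name: promxyservergroup-region2   # hypothetical name
  namespace: kof
spec:
  cluster_name: cloud1-region2      # hypothetical second storage cluster
  targets:
    - "vmauth.cloud1-region2.kof.example.com:443"
  path_prefix: /vm/select/0/prometheus/
  scheme: https
  http_client:
    dial_timeout: "5s"
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      credentials_secret_name: storage-vmuser-credentials
      username_key: username
      password_key: password
```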

Managed Cluster Addition#

  • Apply templates as in Managed Cluster section
  • Verify data flow
  • Configure custom dashboards

Maintenance#

Backup Requirements#

  • Grafana configurations
  • Alert definitions
  • Custom dashboards
  • Retention policies

Health Monitoring#

Uninstallation#

Warning

  • This does not just uninstall kof; it also deletes clusters that may contain your data.
  • Please double-check that these are demo clusters with no valuable data.
kubectl delete -f managed-cluster.yaml
kubectl delete -f storage-cluster.yaml
helm uninstall -n kof kof-mothership

Resource Limits#

Resources of Management Cluster#

  • promxy:

    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m
        memory: 128Mi
    

  • promxy-deployment:

    resources:
      requests:
        cpu: 0.02
        memory: 20Mi
      limits:
        cpu: 0.02
        memory: 20Mi
    

  • promxy-operator:

    resources:
      limits:
        cpu: 500m
        memory: 128Mi
      requests:
        cpu: 10m
        memory: 64Mi
    

Resources of Managed Cluster#

  • opentelemetry:

    resourceRequirements:
      limits:
        memory: 128Mi
      requests:
        memory: 128Mi
    

Version Compatibility#

Component       | Version | Notes
--------------- | ------- | -----------------------------
k0rdent         | ≥ 0.0.7 | Required for template support
Kubernetes      | ≥ 1.32  | Earlier versions untested
OpenTelemetry   | ≥ 0.75  | Recommended minimum
VictoriaMetrics | ≥ 0.40  | Required for clustering


More#

  • If you've applied this guide, you should have kof up and running.
  • Check k0rdent/kof/docs for advanced guides, such as configuring alerts.