# k0rdent Observability and FinOps (kof)

## Overview
k0rdent Observability and FinOps (kof) provides enterprise-grade observability and FinOps capabilities for k0rdent-managed Kubernetes clusters. It enables centralized metrics, logging, and cost management through a unified OpenTelemetry-based architecture.
## Architecture

### High-level

Three layers: Management, Storage, and Collection.
```
                ┌────────────────┐
                │   Management   │
                │   UI, promxy   │
                └────────┬───────┘
                         │
            ┌────────────┴────────────┐
            │                         │
      ┌─────┴────┐              ┌─────┴────┐
      │ Storage  │              │ Storage  │
      │ region 1 │              │ region 2 │
      └─────┬────┘              └─────┬────┘
            │                         │
     ┌──────┴──────┐                 ...
     │             │
┌────┴──────┐ ┌────┴──────┐
│  Collect  │ │  Collect  │
│ managed 1 │ │ managed 2 │
└───────────┘ └───────────┘
```
### Mid-level

Data flows up, from observed resources to the centralized Grafana:
```
management cluster_____________________
│                                     │
│  kof-mothership chart_____________  │
│  │                               │  │
│  │  grafana-operator             │  │
│  │  victoria-metrics-operator    │  │
│  │  cluster-api-visualizer       │  │
│  │  sveltos-dashboard            │  │
│  │  k0rdent service templates    │  │
│  │  promxy                       │  │
│  │_______________________________│  │
│                                     │
│  kof-operators chart_____________   │
│  │                              │   │
│  │  opentelemetry-operator      │   │
│  │  prometheus-operator-crds    │   │
│  │______________________________│   │
│_____________________________________│

cloud 1...
│
│  region 1__________________________________________   region 2...
│  │                                                │   │
.  │  storage cluster_____________________          │   │
.  │  │                                  │          │   │
.  │  │  kof-storage chart_____________  │          │   .
   │  │  │                            │  │          │   .
   │  │  │  grafana-operator          │  │          │   .
   │  │  │  victoria-metrics-operator │  │          │
   │  │  │  victoria-logs-single      │  │          │
   │  │  │  external-dns              │  │          │
   │  │  │____________________________│  │          │
   │  │                                  │          │
   │  │  cert-manager (grafana, vmauth)  │          │
   │  │  ingress-nginx                   │          │
   │  │__________________________________│          │
   │                                                │
   │                                                │
   │  managed cluster 1_____________________   2... │
   │  │                                    │   │    │
   │  │  cert-manager (OTel-operator)      │   │    │
   │  │                                    │   │    │
   │  │  kof-operators chart_____________  │   .    │
   │  │  │                              │  │   .    │
   │  │  │  opentelemetry-operator____  │  │   .    │
   │  │  │  │                        │  │  │        │
   │  │  │  │ OpenTelemetryCollector │  │  │        │
   │  │  │  │________________________│  │  │        │
   │  │  │                              │  │        │
   │  │  │  prometheus-operator-crds    │  │        │
   │  │  │______________________________│  │        │
   │  │                                    │        │
   │  │  kof-collectors chart________      │        │
   │  │  │                          │      │        │
   │  │  │  opencost                │      │        │
   │  │  │  kube-state-metrics      │      │        │
   │  │  │  prometheus-node-exporter│      │        │
   │  │  │__________________________│      │        │
   │  │                                    │        │
   │  │  observed resources                │        │
   │  │____________________________________│        │
   │________________________________________________│
```
### Low-level

#### Helm Charts

##### kof-mothership
- Centralized Grafana dashboard, managed by `grafana-operator`
- Local VictoriaMetrics storage for alerting rules only, managed by `victoria-metrics-operator`
- `cluster-api-visualizer` for insight into the multicluster configuration
- Sveltos dashboard and automatic secret distribution
- k0rdent service templates to deploy other charts to regional clusters
- `promxy` for aggregating Prometheus metrics from regional clusters
##### kof-storage

- Regional Grafana dashboard, managed by `grafana-operator`
- Regional VictoriaMetrics storage with the main data, managed by `victoria-metrics-operator`
- `vmauth` entrypoint proxy for VictoriaMetrics components
- `vmcluster`, the highly available, fault-tolerant version of the VictoriaMetrics database
- `victoria-logs-single` for high-performance, cost-effective, scalable log storage
- `external-dns` to communicate with other clusters
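For context, `victoria-metrics-operator` manages `vmcluster` through a `VMCluster` resource, roughly of this shape (an illustrative sketch with a made-up name and replica counts, not the chart's actual defaults):

```yaml
# Illustrative only: the kof-storage chart creates the real VMCluster.
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: example
spec:
  retentionPeriod: "1"
  vmstorage:            # persistent shards holding the raw samples
    replicaCount: 2
  vmselect:             # query layer, fanned out over vmstorage
    replicaCount: 2
  vminsert:             # ingestion layer, sharding writes across vmstorage
    replicaCount: 2
```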
##### kof-operators

- `prometheus-operator-crds`, required to create the OpenTelemetry collectors below, and also required to monitor `kof-mothership` itself
- OpenTelemetry collectors, managed by `opentelemetry-operator`
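For illustration, `opentelemetry-operator` watches resources of the following shape and runs a collector for each (a minimal sketch with an assumed OTLP-to-debug pipeline, not the collector configuration kof actually deploys):

```yaml
# Illustrative only: kof's charts create the real OpenTelemetryCollector.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: example-collector
spec:
  mode: daemonset
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      debug: {}
    service:
      pipelines:
        logs:
          receivers: [otlp]
          exporters: [debug]
```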
##### kof-collectors

- `prometheus-node-exporter` for hardware and OS metrics
- `kube-state-metrics` for metrics about the state of Kubernetes objects
- OpenCost, which "shines a light into the black box of Kubernetes spend"
## Installation

### Prerequisites

- A k0rdent management cluster: see the quickstart guide. To test on macOS:

  ```shell
  brew install kind && kind create cluster -n k0rdent
  ```

- Infrastructure provider credentials, e.g. the guide for AWS.
  Skip the "Create your ClusterDeployment" and later sections.
- Access to create DNS records for service endpoints, e.g. `kof.example.com`.
### DNS auto-config

To avoid manual configuration of DNS records for service endpoints later, you can automate it now using external-dns.

For AWS in production you should use the Node IAM Role or IRSA methods. Just for the sake of this demo, based on the `aws-standalone` template for now, we're using the simplest but least secure Static credentials method:

- Create an `external-dns` IAM user with this policy.
- Create an access key and an `external-dns-aws-credentials` file:

  ```ini
  [default]
  aws_access_key_id = REDACTED
  aws_secret_access_key = REDACTED
  ```

- Create the `external-dns-aws-credentials` secret in the `kof` namespace:

  ```shell
  kubectl create namespace kof
  kubectl create secret generic \
    -n kof external-dns-aws-credentials \
    --from-file external-dns-aws-credentials
  ```
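If you prefer to script the step above, the credentials file can be written with a heredoc (a sketch; the key values below are placeholders, not real credentials):

```shell
# Placeholder values; substitute the real access key created for the
# external-dns IAM user.
AWS_KEY_ID="AKIA_PLACEHOLDER"
AWS_KEY_SECRET="SECRET_PLACEHOLDER"

# Write the file that the secret is created from.
cat > external-dns-aws-credentials <<EOF
[default]
aws_access_key_id = $AWS_KEY_ID
aws_secret_access_key = $AWS_KEY_SECRET
EOF

cat external-dns-aws-credentials   # review what the secret will contain
```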
### Management Cluster

1. Install `kof-operators`, required by `kof-mothership`:

   ```shell
   helm install --create-namespace -n kof kof-operators \
     oci://ghcr.io/k0rdent/kof/charts/kof-operators --version 0.1.0
   ```

2. Compose the values for `kof-mothership`:

   ```shell
   cat >mothership-values.yaml <<EOF
   kcm:
     installTemplates: true
   kof:
     clusterProfiles:
       kof-aws-dns-secrets:
         matchLabels:
           k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
         secrets:
           - external-dns-aws-credentials
   EOF
   ```

3. Why we override some default values here:
   - `kcm.installTemplates` installs templates like `cert-manager` and `kof-storage` into the management cluster. This allows referencing them from `.spec.serviceSpec.services[].template` in the AWS `ClusterDeployment` below.
   - The `external-dns-aws-credentials` secret created in the DNS auto-config section is auto-distributed to storage clusters by Sveltos. If you've opted out of DNS auto-config, don't add the `kof-aws-dns-secrets` cluster profile above.
   - The `storage-vmuser-credentials` secret is auto-created by default and auto-distributed to other clusters by a Sveltos `ClusterProfile` here.
   - The `grafana-admin-credentials` secret is auto-created by default here. We will use it in the Grafana section.
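If you've opted out of DNS auto-config, a minimal `mothership-values.yaml` reduces to just the template installation (a sketch of the same values file without the cluster profile):

```yaml
# mothership-values.yaml without the kof-aws-dns-secrets profile:
kcm:
  installTemplates: true
```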
4. Install `kof-mothership`:

   ```shell
   helm install -f mothership-values.yaml -n kof kof-mothership \
     oci://ghcr.io/k0rdent/kof/charts/kof-mothership --version 0.1.0
   ```

5. Wait for all pods to become `Running`:

   ```shell
   kubectl get pod -n kof
   ```
### Storage Cluster

1. Look through the default values of the `kof-storage` chart.
2. Apply the next quick start example for AWS, or use it as a reference.
3. Set the next variables using your own values:

   ```shell
   STORAGE_CLUSTER_NAME=cloud1-region1
   STORAGE_DOMAIN=$STORAGE_CLUSTER_NAME.kof.example.com
   ADMIN_EMAIL=$(git config user.email)
   echo "$STORAGE_CLUSTER_NAME, $STORAGE_DOMAIN, $ADMIN_EMAIL"
   ```

4. Use an up-to-date template, e.g.:

   ```shell
   kubectl get clustertemplate -n kcm-system | grep aws
   TEMPLATE=aws-standalone-cp-0-1-0
   ```
5. Compose the next resources:

   - `ClusterDeployment`: the storage cluster
   - `PromxyServerGroup`: for metrics
   - `GrafanaDatasource`: for logs

   ```shell
   cat >storage-cluster.yaml <<EOF
   apiVersion: k0rdent.mirantis.com/v1alpha1
   kind: ClusterDeployment
   metadata:
     name: $STORAGE_CLUSTER_NAME
     namespace: kcm-system
     labels:
       kof: storage
   spec:
     template: $TEMPLATE
     credential: aws-cluster-identity-cred
     config:
       clusterIdentity:
         name: aws-cluster-identity
         namespace: kcm-system
       controlPlane:
         instanceType: t3.large
       controlPlaneNumber: 1
       publicIP: true
       region: us-east-2
       worker:
         instanceType: t3.medium
       workersNumber: 3
       clusterLabels:
         k0rdent.mirantis.com/kof-storage-secrets: "true"
         k0rdent.mirantis.com/kof-aws-dns-secrets: "true"
     serviceSpec:
       priority: 100
       services:
         - name: ingress-nginx
           namespace: ingress-nginx
           template: ingress-nginx-4-11-3
         - name: cert-manager
           namespace: cert-manager
           template: cert-manager-1-16-2
           values: |
             cert-manager:
               crds:
                 enabled: true
         - name: kof-storage
           namespace: kof
           template: kof-storage-0-1-0
           values: |
             external-dns:
               enabled: true
             victoriametrics:
               vmauth:
                 ingress:
                   host: vmauth.$STORAGE_DOMAIN
                 security:
                   username_key: username
                   password_key: password
                   credentials_secret_name: storage-vmuser-credentials
             grafana:
               ingress:
                 host: grafana.$STORAGE_DOMAIN
               security:
                 credentials_secret_name: grafana-admin-credentials
             cert-manager:
               email: $ADMIN_EMAIL
   ---
   apiVersion: kof.k0rdent.mirantis.com/v1alpha1
   kind: PromxyServerGroup
   metadata:
     labels:
       app.kubernetes.io/name: promxy-operator
       k0rdent.mirantis.com/promxy-secret-name: kof-mothership-promxy-config
     name: promxyservergroup-sample
     namespace: kof
   spec:
     cluster_name: $STORAGE_CLUSTER_NAME
     targets:
       - "vmauth.$STORAGE_DOMAIN:443"
     path_prefix: /vm/select/0/prometheus/
     scheme: https
     http_client:
       dial_timeout: "5s"
       tls_config:
         insecure_skip_verify: true
       basic_auth:
         credentials_secret_name: storage-vmuser-credentials
         username_key: username
         password_key: password
   ---
   apiVersion: grafana.integreatly.org/v1beta1
   kind: GrafanaDatasource
   metadata:
     labels:
       app.kubernetes.io/managed-by: Helm
     name: victoria-logs-storage0
     namespace: kof
   spec:
     valuesFrom:
       - targetPath: "basicAuthUser"
         valueFrom:
           secretKeyRef:
             key: username
             name: storage-vmuser-credentials
       - targetPath: "secureJsonData.basicAuthPassword"
         valueFrom:
           secretKeyRef:
             key: password
             name: storage-vmuser-credentials
     datasource:
       name: $STORAGE_CLUSTER_NAME
       url: https://vmauth.$STORAGE_DOMAIN/vls
       access: proxy
       isDefault: false
       type: "victoriametrics-logs-datasource"
       basicAuth: true
       basicAuthUser: \${username}
       secureJsonData:
         basicAuthPassword: \${password}
     instanceSelector:
       matchLabels:
         dashboards: grafana
     resyncPeriod: 5m
   EOF
   ```
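To keep the moving parts straight, this sketch derives the endpoints the manifests above rely on; the domain is the example value from the variables step:

```shell
# A sketch of the endpoints used later in this guide, derived from STORAGE_DOMAIN.
# The domain below is the example value from the "Set the next variables" step.
STORAGE_DOMAIN=cloud1-region1.kof.example.com

echo "https://vmauth.$STORAGE_DOMAIN/vm/select/0/prometheus/"              # metrics read path (promxy)
echo "https://vmauth.$STORAGE_DOMAIN/vm/insert/0/prometheus/api/v1/write"  # metrics write path (collectors)
echo "https://vmauth.$STORAGE_DOMAIN/vls"                                  # logs endpoint (Grafana datasource)
echo "https://grafana.$STORAGE_DOMAIN"                                     # regional Grafana UI
```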
6. Verify and apply the Storage `ClusterDeployment`:

   ```shell
   cat storage-cluster.yaml

   kubectl apply -f storage-cluster.yaml
   ```

7. Watch the cluster being deployed to AWS until all `READY` conditions become `True`:

   ```shell
   clusterctl describe cluster -n kcm-system $STORAGE_CLUSTER_NAME \
     --show-conditions all
   ```
### Managed Cluster

1. Look through the default values of the `kof-operators` and `kof-collectors` charts.
2. Apply the next quick start example for AWS, or use it as a reference.
3. Set your own value below, then verify the variables:

   ```shell
   MANAGED_CLUSTER_NAME=$STORAGE_CLUSTER_NAME-managed1
   echo "$MANAGED_CLUSTER_NAME, $STORAGE_DOMAIN"
   ```

4. Use an up-to-date template, e.g.:

   ```shell
   kubectl get clustertemplate -n kcm-system | grep aws
   TEMPLATE=aws-standalone-cp-0-1-0
   ```

5. Compose the Managed `ClusterDeployment`:
   ```shell
   cat >managed-cluster.yaml <<EOF
   apiVersion: k0rdent.mirantis.com/v1alpha1
   kind: ClusterDeployment
   metadata:
     name: $MANAGED_CLUSTER_NAME
     namespace: kcm-system
     labels:
       kof: collector
   spec:
     template: $TEMPLATE
     credential: aws-cluster-identity-cred
     config:
       clusterIdentity:
         name: aws-cluster-identity
         namespace: kcm-system
       controlPlane:
         instanceType: t3.large
       controlPlaneNumber: 1
       publicIP: false
       region: us-east-2
       worker:
         instanceType: t3.small
       workersNumber: 3
       clusterLabels:
         k0rdent.mirantis.com/kof-storage-secrets: "true"
     serviceSpec:
       priority: 100
       services:
         - name: cert-manager
           namespace: kof
           template: cert-manager-1-16-2
           values: |
             cert-manager:
               crds:
                 enabled: true
         - name: kof-operators
           namespace: kof
           template: kof-operators-0-1-0
         - name: kof-collectors
           namespace: kof
           template: kof-collectors-0-1-0
           values: |
             global:
               clusterName: $MANAGED_CLUSTER_NAME
             opencost:
               enabled: true
               opencost:
                 prometheus:
                   username_key: username
                   password_key: password
                   existingSecretName: storage-vmuser-credentials
                   external:
                     url: https://vmauth.$STORAGE_DOMAIN/vm/select/0/prometheus
                 exporter:
                   defaultClusterId: $MANAGED_CLUSTER_NAME
             kof:
               logs:
                 username_key: username
                 password_key: password
                 credentials_secret_name: storage-vmuser-credentials
                 endpoint: https://vmauth.$STORAGE_DOMAIN/vls/insert/opentelemetry/v1/logs
               metrics:
                 username_key: username
                 password_key: password
                 credentials_secret_name: storage-vmuser-credentials
                 endpoint: https://vmauth.$STORAGE_DOMAIN/vm/insert/0/prometheus/api/v1/write
   EOF
   ```
6. Verify and apply the Managed `ClusterDeployment`:

   ```shell
   cat managed-cluster.yaml

   kubectl apply -f managed-cluster.yaml
   ```

7. Watch the cluster being deployed to AWS until all `READY` conditions become `True`:

   ```shell
   clusterctl describe cluster -n kcm-system $MANAGED_CLUSTER_NAME \
     --show-conditions all
   ```
## Verification

Watch until each `Provisioning` becomes `Provisioned`:

```shell
kubectl get clustersummaries -A -o wide
```

Download the kubeconfigs of the new clusters:

```shell
kubectl get secret -n kcm-system $STORAGE_CLUSTER_NAME-kubeconfig \
  -o=jsonpath={.data.value} | base64 -d > storage-kubeconfig

kubectl get secret -n kcm-system $MANAGED_CLUSTER_NAME-kubeconfig \
  -o=jsonpath={.data.value} | base64 -d > managed-kubeconfig
```

Watch until all pods become `Running`:

```shell
KUBECONFIG=storage-kubeconfig kubectl get pod -A
# Namespaces: cert-manager, ingress-nginx, kof, kube-system, projectsveltos

KUBECONFIG=managed-kubeconfig kubectl get pod -A
# Namespaces: kof, kube-system, projectsveltos
```
### Manual DNS config

If you've opted out of DNS auto-config:

1. Get the `EXTERNAL-IP` of `ingress-nginx`:

   ```shell
   KUBECONFIG=storage-kubeconfig kubectl get svc \
     -n ingress-nginx ingress-nginx-controller
   ```

   It should look like `REDACTED.us-east-2.elb.amazonaws.com`.

2. Create the next two DNS records of type `A`, both pointing to that `EXTERNAL-IP`:

   ```shell
   echo vmauth.$STORAGE_DOMAIN
   echo grafana.$STORAGE_DOMAIN
   ```
### Sveltos

Use the Sveltos dashboard to verify that secrets were auto-distributed to the required clusters:

```shell
kubectl create sa platform-admin
kubectl create clusterrolebinding platform-admin-access \
  --clusterrole cluster-admin --serviceaccount default:platform-admin

kubectl create token platform-admin --duration=24h
kubectl port-forward -n kof svc/dashboard 8081:80
```

- Open http://127.0.0.1:8081/login and paste the token printed above.
- Open the `ClusterAPI` tab: http://127.0.0.1:8081/sveltos/clusters/ClusterAPI/1
- Check both storage and managed clusters:
  - Cluster profiles should be `Provisioned`.
  - Secrets should be distributed.
## Grafana

### Access to Grafana

1. Get the Grafana username and password:

   ```shell
   kubectl get secret -n kof grafana-admin-credentials -o yaml | yq '{
     "user": .data.GF_SECURITY_ADMIN_USER | @base64d,
     "pass": .data.GF_SECURITY_ADMIN_PASSWORD | @base64d
   }'
   ```

2. Start the Grafana dashboard:

   ```shell
   kubectl port-forward -n kof svc/grafana-vm-service 3000:3000
   ```

3. Login to http://127.0.0.1:3000/dashboards with the user/pass printed above.
4. Open a dashboard:
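The `@base64d` operator above only reverses the base64 encoding Kubernetes applies to secret values; the same decoding by hand looks like this (the value is a placeholder, not the generated password):

```shell
# Kubernetes stores secret values base64-encoded; @base64d in yq decodes them.
# "admin" is a placeholder here, not the generated password.
ENCODED=$(printf 'admin' | base64)
echo "$ENCODED"                      # prints "YWRtaW4=", the form stored in the secret
printf '%s' "$ENCODED" | base64 -d   # prints "admin"
echo
```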
### Cluster Overview
- Health metrics
- Resource utilization
- Performance trends
- Cost analysis
### Logging Interface
- Real-time log streaming
- Full-text search
- Log aggregation
- Alert correlation
### Cost Management
- Resource cost tracking
- Usage analysis
- Budget monitoring
- Optimization recommendations
## Scaling Guidelines

### Regional Expansion

- Deploy a storage cluster in the new region
- Update the promxy configuration
- Configure collector routing
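Updating the promxy configuration means adding a `PromxyServerGroup` for the new region, mirroring the sample from the Storage Cluster section (the region-2 names below are hypothetical):

```yaml
# Hypothetical second region; adjust cluster_name and the vmauth target
# to match the new storage cluster's domain.
apiVersion: kof.k0rdent.mirantis.com/v1alpha1
kind: PromxyServerGroup
metadata:
  labels:
    app.kubernetes.io/name: promxy-operator
    k0rdent.mirantis.com/promxy-secret-name: kof-mothership-promxy-config
  name: promxyservergroup-region2
  namespace: kof
spec:
  cluster_name: cloud1-region2
  targets:
    - "vmauth.cloud1-region2.kof.example.com:443"
  path_prefix: /vm/select/0/prometheus/
  scheme: https
  http_client:
    dial_timeout: "5s"
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      credentials_secret_name: storage-vmuser-credentials
      username_key: username
      password_key: password
```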
### Managed Cluster Addition

- Apply templates as in the Managed Cluster section
- Verify data flow
- Configure custom dashboards
## Maintenance

### Backup Requirements

- Grafana configurations
- Alert definitions
- Custom dashboards
- Retention policies

### Health Monitoring

- Apply the Verification section
- Apply the Sveltos section
## Uninstallation

Warning:

- This not only uninstalls kof, but also deletes clusters that may contain your data.
- Please double-check these are demo clusters with no valuable data.

```shell
kubectl delete -f managed-cluster.yaml
kubectl delete -f storage-cluster.yaml
helm uninstall -n kof kof-mothership
```
## Resource Limits

### Resources of Management Cluster

```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 100m
    memory: 128Mi
```

```yaml
resources:
  requests:
    cpu: 0.02
    memory: 20Mi
  limits:
    cpu: 0.02
    memory: 20Mi
```

```yaml
resources:
  limits:
    cpu: 500m
    memory: 128Mi
  requests:
    cpu: 10m
    memory: 64Mi
```

### Resources of Managed Cluster

- opentelemetry:

  ```yaml
  resourceRequirements:
    limits:
      memory: 128Mi
    requests:
      memory: 128Mi
  ```
## Version Compatibility

| Component       | Version | Notes                         |
|-----------------|---------|-------------------------------|
| k0rdent         | ≥ 0.0.7 | Required for template support |
| Kubernetes      | ≥ 1.32  | Earlier versions untested     |
| OpenTelemetry   | ≥ 0.75  | Recommended minimum           |
| VictoriaMetrics | ≥ 0.40  | Required for clustering       |
Detailed:
## More

- If you've applied this guide, you should have kof up and running.
- Check k0rdent/kof/docs for advanced guides, like configuring alerts.