Kubernetes: Installing Prometheus Monitoring with the Rancher Management Platform

Reference: https://ranchermanager.docs.rancher.com/zh/how-to-guides/advanced-user-guides/monitoring-alerting-guides/enable-monitoring

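Note:

I had previously installed rook-ceph and used Prometheus to monitor the rook-ceph cluster, so prometheus-operator was already installed. The Prometheus that Rancher installs also uses prometheus-operator, so the existing one has to be deleted first. Otherwise the two conflict and the Prometheus installed by Rancher fails to start.

Once Rancher's Prometheus is installed, reinstall the Prometheus used for rook-ceph.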
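
Before deleting anything, it can help to see which prometheus-operator CRDs are present. The helper below is a minimal sketch; the function name `list_operator_crds` is made up for illustration, and it only assumes that the operator's CRDs live in the `monitoring.coreos.com` API group, as the delete output in this post shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the CRDs that belong to prometheus-operator.
# `kubectl get crd -o name` prints one "<resource>/<crd-name>" per line;
# we keep only names ending in the monitoring.coreos.com API group.
list_operator_crds() {
  kubectl get crd -o name | grep 'monitoring\.coreos\.com$' || true
}

# Usage (against a live cluster):
#   list_operator_crds
```

Keep in mind that deleting a CRD also deletes every custom resource of that kind, so existing Prometheus and ServiceMonitor objects disappear along with the CRDs in bundle.yaml.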

root@k8s-master71u:~/rook/deploy/examples/monitoring# kubectl delete -f bundle.yaml
customresourcedefinition.apiextensions.k8s.io "alertmanagerconfigs.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "alertmanagers.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "podmonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "probes.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheusagents.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "prometheusrules.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "scrapeconfigs.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "servicemonitors.monitoring.coreos.com" deleted
customresourcedefinition.apiextensions.k8s.io "thanosrulers.monitoring.coreos.com" deleted
clusterrolebinding.rbac.authorization.k8s.io "prometheus-operator" deleted
clusterrole.rbac.authorization.k8s.io "prometheus-operator" deleted
deployment.apps "prometheus-operator" deleted
serviceaccount "prometheus-operator" deleted
service "prometheus-operator" deleted

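Installing Prometheus

In the lower-left corner, click Cluster Tools -> Monitoring.

For Prometheus, change the data retention period (in days) and enable persistent storage.

Enable persistent storage for Grafana.

Installation complete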

root@k8s-master71u:~/rook/deploy/examples/monitoring# kubectl get pod -n cattle-monitoring-system
NAME                                                     READY   STATUS    RESTARTS      AGE
alertmanager-rancher-monitoring-alertmanager-0           2/2     Running   1 (49s ago)   58s
prometheus-rancher-monitoring-prometheus-0               3/3     Running   0             56s
rancher-monitoring-grafana-568d4fc6d5-zsnq7              3/4     Running   0             88s
rancher-monitoring-kube-state-metrics-56b4477cc-2zgmf    1/1     Running   0             88s
rancher-monitoring-operator-c66c76fd9-cns8h              1/1     Running   0             88s
rancher-monitoring-prometheus-adapter-7494f789f6-x2v6p   1/1     Running   0             88s
rancher-monitoring-prometheus-node-exporter-65n2k        1/1     Running   0             88s
rancher-monitoring-prometheus-node-exporter-7t4nt        1/1     Running   0             88s
rancher-monitoring-prometheus-node-exporter-s9kwh        1/1     Running   0             88s
rancher-monitoring-prometheus-node-exporter-w8tv9        1/1     Running   0             88s
rancher-monitoring-prometheus-node-exporter-zzqpc        1/1     Running   0             88s
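
In the listing above, the Grafana pod is still at 3/4 shortly after installation. A small filter like the following can show which pods have not reached full readiness yet. This is a sketch only; the function name `not_ready_pods` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print pods in cattle-monitoring-system whose READY
# column (e.g. "3/4") shows fewer ready containers than total containers.
not_ready_pods() {
  kubectl get pod -n cattle-monitoring-system --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2 }'
}

# Usage (against a live cluster):
#   not_ready_pods
```

An empty result means every pod reports all of its containers as ready.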

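Testing the Monitoring feature

View Prometheus

View Alertmanager

View Grafana

Reinstalling Prometheus for rook-ceph

When reinstalling the Prometheus used for rook-ceph, prometheus-operator does not need to be installed again. In other words, skip bundle.yaml, the file that was deleted at the very beginning.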

root@k8s-master71u:~/rook/deploy/examples/monitoring# kubectl create -f prometheus.yaml
prometheus.monitoring.coreos.com/rook-prometheus created

root@k8s-master71u:~/rook/deploy/examples/monitoring# kubectl create -f prometheus-service.yaml
The Service "rook-prometheus" is invalid: spec.ports[0].nodePort: Invalid value: 30900: provided port is already allocated
root@k8s-master71u:~/rook/deploy/examples/monitoring# kubectl create -f service-monitor.yaml
servicemonitor.monitoring.coreos.com/rook-ceph-mgr created
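
The error for prometheus-service.yaml means some Service already holds nodePort 30900. One possible cause, which is an assumption here, is that the rook-prometheus Service from the earlier installation still exists, since bundle.yaml does not contain it. The sketch below looks up which Service owns a given nodePort; the function name `nodeport_owner` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "namespace/name" of every Service that
# exposes the given nodePort.
# The jsonpath template emits one line per Service:
#   <namespace>/<name> <nodePort> <nodePort> ...
nodeport_owner() {
  kubectl get svc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" "}{.spec.ports[*].nodePort}{"\n"}{end}' |
    awk -v port="$1" '{ for (i = 2; i <= NF; i++) if ($i == port) print $1 }'
}

# Usage (against a live cluster):
#   nodeport_owner 30900
```

If the owner turns out to be the old rook-prometheus Service, nothing more is needed. Otherwise, picking a different nodePort in prometheus-service.yaml avoids the clash.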

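Kubelet's built-in cAdvisor cannot obtain certain labels

Reference article: https://ithelp.ithome.com.tw/m/articles/10331330

Since Kubernetes removed its dependency on dockershim in v1.24, the cAdvisor service built into Kubelet, which collects container metrics, can no longer obtain image, container, and related information for those metrics. The problem still occurs today.

As a result, many of the resource-usage charts in Kube-Prometheus-Stack receive no data, and most charts fail to display properly.

The current community workaround is to deploy your own cAdvisor in the Kubernetes cluster, in place of Kubelet's, to produce correct container resource metrics. This addresses the problem commonly seen with Kubelet after the v1.24 upgrade.

First, disable Kubelet's cAdvisor feature: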
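
As a rough sketch of this step: the upstream kube-prometheus-stack chart has a `kubelet.serviceMonitor.cAdvisor` value that controls whether Kubelet's cAdvisor endpoint is scraped. Whether the Rancher Monitoring chart exposes the same key is an assumption here, and the file name `kubelet-cadvisor-values.yaml` is made up for illustration.

```shell
#!/usr/bin/env bash
# Write a values override that turns off scraping of Kubelet's cAdvisor
# endpoint. The key is taken from upstream kube-prometheus-stack; its
# presence in the Rancher Monitoring chart is assumed, not verified.
cat > kubelet-cadvisor-values.yaml <<'EOF'
kubelet:
  serviceMonitor:
    cAdvisor: false
EOF
```

The same three lines can be merged into the chart's YAML when editing the Monitoring app in Rancher.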

Next, install cAdvisor as a custom service and create a ServiceMonitor resource so that Prometheus scrapes the container metrics:

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: cadvisor
  name: cadvisor
rules:
  - apiGroups:
      - policy
    resourceNames:
      - cadvisor
    resources:
      - podsecuritypolicies
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app: cadvisor
  name: cadvisor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cadvisor
subjects:
  - kind: ServiceAccount
    name: cadvisor
    namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
        - args:
            - --housekeeping_interval=10s
            - --max_housekeeping_interval=15s
            - --event_storage_event_limit=default=0
            - --event_storage_age_limit=default=0
            - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
            - --docker_only
            - --store_container_labels=false
            - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
          image: gcr.io/cadvisor/cadvisor:v0.45.0
          name: cadvisor
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          resources:
            limits:
              cpu: 800m
              memory: 2000Mi
            requests:
              cpu: 400m
              memory: 400Mi
          volumeMounts:
            - mountPath: /rootfs
              name: rootfs
              readOnly: true
            - mountPath: /var/run
              name: var-run
              readOnly: true
            - mountPath: /sys
              name: sys
              readOnly: true
            - mountPath: /var/lib/docker
              name: docker
              readOnly: true
            - mountPath: /dev/disk
              name: disk
              readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      volumes:
        - hostPath:
            path: /
          name: rootfs
        - hostPath:
            path: /var/run
          name: var-run
        - hostPath:
            path: /sys
          name: sys
        - hostPath:
            path: /var/lib/docker
          name: docker
        - hostPath:
            path: /dev/disk
          name: disk
---
apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  labels:
    app: cadvisor
  namespace: kube-system
spec:
  selector:
    app: cadvisor
  ports:
    - name: cadvisor
      port: 8080
      protocol: TCP
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
    release: prometheus-stack
  name: cadvisor
  namespace: kube-system
spec:
  endpoints:
    - metricRelabelings:
        - sourceLabels:
            - container_label_io_kubernetes_pod_name
          targetLabel: pod
        - sourceLabels:
            - container_label_io_kubernetes_container_name
          targetLabel: container
        - sourceLabels:
            - container_label_io_kubernetes_pod_namespace
          targetLabel: namespace
        - action: labeldrop
          regex: container_label_io_kubernetes_pod_name
        - action: labeldrop
          regex: container_label_io_kubernetes_container_name
        - action: labeldrop
          regex: container_label_io_kubernetes_pod_namespace
      port: cadvisor
      relabelings:
        - sourceLabels:
            - __meta_kubernetes_pod_node_name
          targetLabel: node
        - sourceLabels:
            - __metrics_path__
          targetLabel: metrics_path
          replacement: /metrics/cadvisor
        - sourceLabels:
            - job
          targetLabel: job
          replacement: kubelet
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app: cadvisor

Ref: https://github.com/rancher/rancher/issues/38934#issuecomment-1294585708

After applying the configuration above, cAdvisor starts collecting metrics for us:

root@k8s-master71u:~/kube-prometheus-stack# kubectl create -f cAdvisor.yaml
serviceaccount/cadvisor created
clusterrole.rbac.authorization.k8s.io/cadvisor created
clusterrolebinding.rbac.authorization.k8s.io/cadvisor created
daemonset.apps/cadvisor created
service/cadvisor created
servicemonitor.monitoring.coreos.com/cadvisor created
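
The ServiceMonitor above carries the label `release: prometheus-stack`, which comes from the referenced setup rather than from Rancher Monitoring. Whether Prometheus picks it up depends on the `serviceMonitorSelector` of the Prometheus resource. The sketch below prints that selector; the function name is made up, and the resource name `rancher-monitoring-prometheus` is inferred from the pod name `prometheus-rancher-monitoring-prometheus-0`, so treat it as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which ServiceMonitor labels the Rancher-installed
# Prometheus selects on. An empty selector ({}) matches all ServiceMonitors
# in the namespaces Prometheus is allowed to watch.
service_monitor_selector() {
  kubectl get prometheus -n cattle-monitoring-system rancher-monitoring-prometheus \
    -o jsonpath='{.spec.serviceMonitorSelector}'
}

# Usage (against a live cluster):
#   service_monitor_selector
```

If the selector requires a label the cadvisor ServiceMonitor lacks, adjusting the label at `metadata.labels` is one way to make it match.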

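With this in place, the charts display correctly.

Granting a User permission to view monitoring data

Create a Project named Monitoring.

Add the User as a member of this Project.

Move the cattle-monitoring-system namespace into this Project.

By default, cattle-monitoring-system does not belong to any Project.

Log in with the User account to test.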
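
To confirm from the command line that the namespace was moved, one option is to read the project annotation Rancher sets on namespaces, `field.cattle.io/projectId`. This is a minimal sketch; the function name `namespace_project` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Rancher project a namespace belongs to.
# Rancher records this in the field.cattle.io/projectId annotation; the
# dots in the key are escaped so jsonpath treats it as one field name.
namespace_project() {
  kubectl get namespace "$1" \
    -o jsonpath='{.metadata.annotations.field\.cattle\.io/projectId}'
}

# Usage (against a live cluster):
#   namespace_project cattle-monitoring-system
```

An empty result suggests the namespace is still outside any Project.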
