
CRDs and Operators


CustomResourceDefinitions (CRDs) and the Operator pattern are the core of Kubernetes extensibility: they let users define their own resources and automate operational logic.


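Overview

A CustomResourceDefinition (CRD) lets users extend the Kubernetes API by defining their own resource types. The Operator pattern combines CRDs with controller logic to automate application management. The CKA exam expects a basic understanding of both.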


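Part 1: CRD (CustomResourceDefinition)

1. CRD concepts

A CRD is a Kubernetes extension mechanism that lets users define new resource types (for example Database, Backup, or Application). The Kubernetes API server then serves a full RESTful API for these custom resources.

Once the CRD is registered:
- The API server automatically serves RESTful endpoints: /apis/<group>/<version>/namespaces/<ns>/<resource-plural>/
- kubectl operations are supported (get, create, delete, and so on)
- RBAC access control applies
- Objects are stored in etcd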
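
The endpoint layout above can be sketched as a small shell helper. This is only an illustration: the function name cr_path and the sample values (example.com, v1, databases, default) are assumptions chosen for the example, and the namespace-less form corresponds to the cluster-wide collection path.

```shell
# Build the REST path the API server exposes for a custom resource.
# Usage: cr_path <group> <version> <plural> [namespace]
# Without a namespace, the cluster-wide collection path is printed.
cr_path() {
  group=$1; version=$2; plural=$3; namespace=${4:-}
  if [ -n "$namespace" ]; then
    printf '/apis/%s/%s/namespaces/%s/%s\n' "$group" "$version" "$namespace" "$plural"
  else
    printf '/apis/%s/%s/%s\n' "$group" "$version" "$plural"
  fi
}

cr_path example.com v1 databases default   # namespaced collection
cr_path example.com v1 databases           # all namespaces / cluster scope
```

On a cluster where such a CRD is installed, the printed path could be passed to kubectl get --raw to query the endpoint directly.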

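CRDs compared with native resources

Feature       Native resources (Pod, Service, ...)   CRD
Defined in    Kubernetes source code                 CRD YAML file
Validation    Built in                               OpenAPI v3 schema
Controller    Built-in controllers                   Custom controller (Operator)
Storage       etcd                                   etcd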

2. CRD YAML structure

2.1 Basic CRD example

# crd-database.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com    # required format: <plural>.<group>
spec:
  group: example.com             # API group
  names:
    plural: databases            # plural name (used by kubectl)
    singular: database           # singular name
    shortNames:                  # short names
    - db
    kind: Database               # resource kind (the kind field in YAML manifests)
    listKind: DatabaseList       # list kind
  scope: Namespaced              # scope: Namespaced or Cluster
  versions:
  - name: v1                     # API version
    served: true                 # whether the API server serves this version
    storage: true                # whether this is the version stored in etcd
    schema:                      # OpenAPI v3 validation schema
      openAPIV3Schema:
        type: object
        required:
        - spec
        properties:
          spec:
            type: object
            required:
            - engine
            - version
            properties:
              engine:
                type: string
                enum:
                - mysql
                - postgres
                - mongodb
              version:
                type: string
                pattern: '^\d+\.\d+\.\d+$'
              replicas:
                type: integer
                minimum: 1
                maximum: 10
                default: 1
              storage:
                type: string
                pattern: '^\d+(Gi|Ti)$'
                default: "10Gi"
              adminUser:
                type: string
                default: "admin"
              backup:
                type: object
                properties:
                  enabled:
                    type: boolean
                    default: false
                  schedule:
                    type: string
                    pattern: '^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$'
                    description: "Cron schedule for backups"
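
To get a feel for what the version and storage patterns in this schema accept, they can be tried locally with grep -E. This is a rough sketch, not how the API server validates: \d is rewritten as [0-9] because POSIX extended regular expressions have no \d shorthand, and the helper name matches and the sample values are assumptions for illustration.

```shell
# Return success when the value matches the extended regular expression.
matches() {
  printf '%s\n' "$1" | grep -Eq "$2"
}

# The schema's patterns with \d spelled out as [0-9].
VERSION_RE='^[0-9]+\.[0-9]+\.[0-9]+$'
STORAGE_RE='^[0-9]+(Gi|Ti)$'

for v in 14.5.0 14.5; do
  if matches "$v" "$VERSION_RE"; then
    echo "version $v: accepted"
  else
    echo "version $v: rejected"
  fi
done

for s in 100Gi 100GB; do
  if matches "$s" "$STORAGE_RE"; then
    echo "storage $s: accepted"
  else
    echo "storage $s: rejected"
  fi
done
```

The version pattern requires three numeric components, so a two-part value such as 14.5 would be refused when the resource is created.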

2.2 Creating a custom resource from the CRD

# database-sample.yaml
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-production-db
  labels:
    environment: production
    team: backend
spec:
  engine: postgres
  version: "14.5.0"              # must match the schema pattern ^\d+\.\d+\.\d+$
  replicas: 3
  storage: 100Gi
  adminUser: dbadmin
  backup:
    enabled: true
    schedule: "0 2 * * *"

# Apply the CRD
kubectl apply -f crd-database.yaml

# Inspect the CRD
kubectl get crd
kubectl get crd databases.example.com -o yaml

# Create the custom resource
kubectl apply -f database-sample.yaml

# Work with custom resources (same commands as for native resources)
kubectl get databases
kubectl get db                                 # using the shortName
kubectl get databases --all-namespaces
kubectl describe database my-production-db
kubectl delete database my-production-db
kubectl get databases -o wide
kubectl get databases -o yaml

2.3 Additional CRD features

# Advanced CRD configuration example
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  # ... basic configuration ...
  versions:
  - name: v1
    # ... schema ...
    additionalPrinterColumns:    # extra columns for kubectl get
    - name: Engine
      type: string
      jsonPath: .spec.engine
      description: Database engine type
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Status
      type: string
      jsonPath: .status.phase
    subresources:                # subresources
      status: {}                 # enable the /status subresource
      scale:                     # enable the /scale subresource
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
  conversion:                    # version conversion
    strategy: None               # or Webhook

# With additionalPrinterColumns in place:
kubectl get databases
# NAME                ENGINE    VERSION   REPLICAS   STATUS
# my-production-db    postgres  14.5.0    3          Running

# Inspect the resource's status (written through the /status subresource)
kubectl get database my-production-db -o json | jq '.status'

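2.4 CRD version management

# A CRD can serve several versions side by side; version conversion enables smooth upgrades
# List the versions of a CRD
kubectl get crd databases.example.com -o json | jq '.spec.versions[].name'

# Access resources through a specific version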
kubectl get databases.v1.example.com
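
The fully qualified name in the command above follows the order <plural>.<version>.<group>. A minimal helper that composes it might look as follows; the function name qualified_resource is an assumption made for this sketch.

```shell
# Compose the fully qualified resource name: <plural>.<version>.<group>
# Usage: qualified_resource <plural> <version> <group>
qualified_resource() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

qualified_resource databases v1 example.com
```

The result can be passed to kubectl get when a CRD serves more than one version and a specific one is wanted.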

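3. The Operator pattern

3.1 Operator concepts

An Operator is a pattern for packaging, deploying, and managing Kubernetes applications. It extends the API through CRDs and uses a custom controller to keep each custom resource at its desired state.

Operator = CRD + Controller + Domain Knowledge

How it works:
1. The user creates a custom resource (for example, a Database instance)
2. The Operator's controller watches that resource for changes
3. The controller creates and manages the underlying resources (StatefulSet, Service, PVC, ...) according to its business logic
4. The controller keeps adjusting the actual state until it matches the state the user asked for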

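3.2 Operators compared with traditional deployment

Aspect             Traditional approach                  Operator approach
Deployment         Manually create several YAML files    Create a single CR instance
Scaling            Manually edit the Deployment          Change the CR's replicas field
Upgrade            Manually update the image version     Change the CR's version field
Backup/restore     Manually run scripts                  Handled automatically by the Operator
Failure recovery   Manually diagnose and repair          Detected and recovered by the Operator
Upgrade rollback   Performed manually                    Automated by the Operator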

3.3 Operator workflow

          ┌──────────────────────────┐
          │    etcd (CR storage)     │
          └────────────┬─────────────┘
                       │ watch
          ┌────────────▼─────────────┐
          │     Reconciler Loop      │
          │  ──────────────────────  │
          │  1. Read CR (desired)    │
          │  2. Query actual state   │
          │  3. Compare the two      │
          │  4. Apply adjustments    │
          │  5. Update CR status     │
          └──────────────────────────┘
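
The loop in the diagram can be imitated with a toy shell simulation. This is only a sketch under strong assumptions: real controllers are typically written against the Kubernetes client libraries and react to watch events, whereas here the desired and actual replica counts are plain numbers and the function names reconcile_step and reconcile are made up for the example.

```shell
# One reconcile pass: compare desired and actual replica counts and print
# the actual count after a single corrective step.
reconcile_step() {
  desired=$1; actual=$2
  if [ "$actual" -lt "$desired" ]; then
    echo $((actual + 1))      # too few replicas: add one
  elif [ "$actual" -gt "$desired" ]; then
    echo $((actual - 1))      # too many replicas: remove one
  else
    echo "$actual"            # already converged, nothing to do
  fi
}

# Repeat passes until the states match, reporting a status after each pass.
reconcile() {
  desired=$1; actual=$2
  while [ "$actual" -ne "$desired" ]; do
    actual=$(reconcile_step "$desired" "$actual")
    echo "status: replicas=$actual desired=$desired"
  done
  echo "status: phase=Running replicas=$actual"
}

reconcile 3 1
```

Each pass makes one small change and re-reads the state, which is why a reconcile loop keeps working even if it is interrupted and restarted halfway through.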

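4. Using existing Operators

4.1 etcd-operator

# Install etcd-operator
# (the coreos/etcd-operator repository is archived; the manifests below are a historical example and may not apply on current clusters)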
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/service_account.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/rbac/cluster-role-binding.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/deployment.yaml

# Check the Operator
kubectl get pods -l name=etcd-operator

# Create an etcd cluster
cat <<EOF | kubectl apply -f -
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
spec:
  size: 3
  version: "3.5.15"
EOF

# List the etcd cluster
kubectl get etcdclusters
kubectl get pods -l etcd_cluster=example-etcd-cluster

# Scale the etcd cluster up
kubectl patch etcdcluster example-etcd-cluster --type='json' -p='[{"op": "replace", "path": "/spec/size", "value": 5}]'

# Change the etcd version (the Operator performs a rolling upgrade).
# The value must differ from the currently deployed version for an upgrade to happen.
kubectl patch etcdcluster example-etcd-cluster --type='json' -p='[{"op": "replace", "path": "/spec/version", "value": "3.5.15"}]'

4.2 prometheus-operator

# Install kube-prometheus-stack with Helm (it includes prometheus-operator)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack

# Check the operator
kubectl get pods -l app.kubernetes.io/name=prometheus-operator

# List the custom resources (prometheus-operator registers many CRDs)
kubectl get crd | grep monitoring.coreos.com
# prometheuses.monitoring.coreos.com
# alertmanagers.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com
# podmonitors.monitoring.coreos.com
# prometheusrules.monitoring.coreos.com
# thanosrulers.monitoring.coreos.com

# Configure scraping with a ServiceMonitor
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 15s
EOF

# List ServiceMonitors
kubectl get servicemonitors

4.3 cert-manager Operator

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Check the cert-manager Pods
kubectl get pods -n cert-manager

# Registered custom resources
kubectl get crd | grep cert-manager
# certificates.cert-manager.io
# issuers.cert-manager.io
# clusterissuers.cert-manager.io

# Create a self-signed Issuer and a Certificate issued by it
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-tls-cert
spec:
  dnsNames:
  - example.com
  secretName: my-tls-secret
  issuerRef:
    name: selfsigned-issuer
    kind: Issuer
EOF

# Inspect the certificate
kubectl get certificates
kubectl get certificate my-tls-cert -o yaml
kubectl get secret my-tls-secret

5. OLM (Operator Lifecycle Manager)

OLM is a package manager for Operators; it handles their installation, upgrades, and lifecycle management.

# Install OLM
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.25.0/install.sh | bash -s v0.25.0

# Check the OLM components
kubectl get pods -n olm

# List the available Operators (from OperatorHub)
kubectl get packagemanifests -n olm

# Install an Operator (through a Subscription)
cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: operators
spec:
  channel: stable
  name: prometheus
  source: operatorhubio-catalog
  sourceNamespace: olm
EOF

# List installed Operators
kubectl get operators
kubectl get subscriptions

# Check Operator versions
kubectl get clusterserviceversion
kubectl get csv -n operators

6. kubebuilder / operator-sdk (tools for building Operators)

# Install operator-sdk
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
curl -sL https://github.com/operator-framework/operator-sdk/releases/download/v1.33.0/operator-sdk_${OS}_${ARCH} -o operator-sdk
chmod +x operator-sdk && sudo mv operator-sdk /usr/local/bin/

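# Create an Operator project
operator-sdk init --domain example.com --repo github.com/example/database-operator

# Create the API (CRD)
# Note: the resulting API group is <group>.<domain>, here example.example.com
operator-sdk create api --group example --version v1 --kind Database --resource --controller

# Generate the CRD manifests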
make generate
make manifests

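CKA exam key points

  1. Basic CRD structure -- know the group, names, scope, versions, and schema fields
  2. kubectl with CRD resources -- custom resources are used through kubectl just like native ones
  3. Operator = CRD + Controller -- understand the core idea of the Operator pattern
  4. additionalPrinterColumns -- customize the columns printed by kubectl get
  5. CRD scope -- the difference between Namespaced and Cluster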

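🧪 Complete walkthrough: create a CRD and use an Operator

Scenario

Create a CRD named databases.example.com, then install prometheus-operator (via kube-prometheus-stack) with Helm and verify that both the custom resource and the Operator work.

Prerequisites

  • A running Kubernetes cluster
  • kubectl configured for that cluster
  • Helm v3 installed

Steps

Step 1: Write the CRD YAML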

cat <<'EOF' > ~/crd-database.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    plural: databases
    singular: database
    shortNames:
    - db
    kind: Database
    listKind: DatabaseList
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        required:
        - spec
        properties:
          spec:
            type: object
            required:
            - engine
            - version
            properties:
              engine:
                type: string
                enum:
                - mysql
                - postgres
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 10
                default: 1
EOF

Step 2: Apply the CRD

kubectl apply -f ~/crd-database.yaml
# customresourcedefinition.apiextensions.k8s.io/databases.example.com created

# Verify the CRD
kubectl get crd
# NAME                     CREATED AT
# databases.example.com    2026-05-27T10:00:00Z

kubectl get crd databases.example.com -o yaml

Step 3: Create a custom resource

cat <<'EOF' | kubectl apply -f -
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-production-db
spec:
  engine: postgres
  version: "14.5"
  replicas: 3
EOF
# database.example.com/my-production-db created

# Work with the custom resource through kubectl (just like a native resource).
# The CRD from Step 1 defines no additionalPrinterColumns, so only NAME and AGE are printed.
kubectl get databases
# NAME               AGE
# my-production-db   10s

kubectl get db   # using the shortName
# NAME               AGE
# my-production-db   10s

kubectl describe database my-production-db

Step 4: Install prometheus-operator (kube-prometheus-stack)

# Add the prometheus-community repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack (it includes prometheus-operator)
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
# NAME: prometheus
# LAST DEPLOYED: Tue May 27 10:00:00 2026
# NAMESPACE: monitoring
# STATUS: deployed
# REVISION: 1

Step 5: List the CRDs registered by the Operator

# List the CRDs registered by prometheus-operator
kubectl get crd | grep monitoring.coreos.com
# alertmanagerconfigs.monitoring.coreos.com
# alertmanagers.monitoring.coreos.com
# podmonitors.monitoring.coreos.com
# prometheuses.monitoring.coreos.com
# prometheusrules.monitoring.coreos.com
# servicemonitors.monitoring.coreos.com

# Check the Operator Pods
kubectl get pods -n monitoring
# NAME                                                     READY   STATUS    RESTARTS   AGE
# prometheus-kube-prometheus-operator-xxxxxxxxx-xxxxx      1/1     Running   0          2m
# prometheus-kube-state-metrics-xxxxxxxxx-xxxxx            1/1     Running   0          2m
# prometheus-prometheus-node-exporter-xxxxx                1/1     Running   0          2m

Step 6: Create a ServiceMonitor using the Operator's CRD

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
EOF
# servicemonitor.monitoring.coreos.com/example-app-monitor created

# List ServiceMonitors
kubectl get servicemonitors -n monitoring
# NAME                    AGE
# example-app-monitor     10s

Verifying the result

# Verify the custom resource (only NAME and AGE, since the Step 1 CRD has no additionalPrinterColumns)
kubectl get databases
# NAME               AGE
# my-production-db   5m

# Verify that the Operator is running
kubectl get pods -n monitoring | grep operator
# prometheus-kube-prometheus-operator-xxxxxxxxx-xxxxx   1/1     Running   0   5m

# Verify the Prometheus instance managed by the Operator
kubectl get prometheus -n monitoring
# NAME                                    VERSION   REPLICAS   AGE
# prometheus-kube-prometheus-prometheus   v2.xx.x   1          5m

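Exam tips

  • Custom resources are operated exactly like native ones: kubectl get/create/delete/describe <crd-resource>
  • A CRD's metadata.name must follow the <plural>.<group> format (for example databases.example.com)
  • scope determines whether a resource is Namespaced or Cluster level
  • Operator = CRD + Controller; installing an Operator normally registers its CRDs automatically
  • The CKA exam only requires a basic understanding of CRDs and Operators, not writing Operator code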
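
The <plural>.<group> naming rule is easy to get wrong under time pressure, so it may help to see it as a one-line check. The function name crd_name_ok is an assumption for this sketch; the API server performs the real validation when the CRD is applied.

```shell
# Succeed only when a CRD's metadata.name equals <plural>.<group>.
# Usage: crd_name_ok <metadata.name> <plural> <group>
crd_name_ok() {
  name=$1; plural=$2; group=$3
  [ "$name" = "$plural.$group" ]
}

crd_name_ok databases.example.com databases example.com && echo "ok"
crd_name_ok database.example.com databases example.com || echo "rejected: singular used instead of plural"
```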

Official documentation