Control Plane Troubleshooting
CKA Exam Domain 5 — API Server, Scheduler, Controller Manager, and etcd troubleshooting
The control plane is the brain of a Kubernetes cluster. In the CKA exam, control plane component troubleshooting commonly involves scenarios such as static Pod configuration, etcd health checks, and component restarts.
1. Control Plane Component Overview
| Component | Function | Deployment Method |
|---|---|---|
| kube-apiserver | Entry point for all API requests | Static Pod (/etc/kubernetes/manifests/) |
| kube-scheduler | Pod scheduling decisions | Static Pod |
| kube-controller-manager | Controller management | Static Pod |
| etcd | Cluster data storage | Static Pod |
# View control plane Pods
kubectl get pods -n kube-system
# View the static Pod configuration directory
ls /etc/kubernetes/manifests/
# kube-apiserver.yaml
# kube-scheduler.yaml
# kube-controller-manager.yaml
# etcd.yaml
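A quick way to confirm that all four manifests are present is a small shell helper. This is a minimal sketch: the function name `missing_manifests` is hypothetical, and the file names are the ones listed above.

```shell
# List expected static Pod manifests that are absent from a directory.
# Usage: missing_manifests [DIR]   (DIR defaults to /etc/kubernetes/manifests)
missing_manifests() {
    dir="${1:-/etc/kubernetes/manifests}"
    for name in kube-apiserver kube-scheduler kube-controller-manager etcd; do
        # A missing file means kubelet has nothing to start for that component
        [ -f "$dir/$name.yaml" ] || echo "$name.yaml"
    done
}
```

Running `missing_manifests` on the master node prints nothing when every manifest is in place.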
2. API Server Troubleshooting
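The API Server is the core component of the cluster. When it is inaccessible, kubectl and every other component lose access to cluster state: workloads that are already running keep running, but nothing can be created, updated, or rescheduled.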
Check API Server Status
# Check API Server Pod
kubectl get pods -n kube-system | grep apiserver
# View API Server logs
kubectl logs -n kube-system kube-apiserver-<node-name>
kubectl logs -n kube-system kube-apiserver-<node-name> --tail=100
# If the API Server is completely unavailable, query the container runtime directly with crictl on the master node
crictl ps | grep apiserver
crictl logs <container-id>
Static Pod Configuration Repair
# Check the static Pod configuration file
cat /etc/kubernetes/manifests/kube-apiserver.yaml
# Common issues: incorrect certificate paths, incorrect etcd addresses, incorrect service-cluster-ip-range
# After modifying the configuration, kubelet will automatically recreate the static Pod
vi /etc/kubernetes/manifests/kube-apiserver.yaml
Troubleshooting Steps When API Server Is Unavailable
# Step 1: SSH to the master node
ssh <master-node>
# Step 2: Check if the static Pod configuration file exists
ls -la /etc/kubernetes/manifests/kube-apiserver.yaml
# Step 3: Check if kubelet is running
systemctl status kubelet
# Step 4: Check the container runtime
crictl ps | grep apiserver
# Step 5: View kubelet logs to locate the issue
journalctl -u kubelet -n 50 --no-pager
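Because incorrect certificate paths are one of the common issues, it can help to check every file path referenced by the manifest's flags. The sketch below is an illustration, not part of any Kubernetes tooling: `missing_flag_files` is a hypothetical helper, and it assumes the paths in the manifest are also valid on the host, which holds when the directories are mounted at the same location inside the container.

```shell
# Print every "--<flag>=/path" argument in a manifest whose path
# does not exist on this host.
# Usage: missing_flag_files MANIFEST
missing_flag_files() {
    manifest="$1"
    # Match list items such as "    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt"
    sed -n 's|^[[:space:]]*-[[:space:]]*\(--[a-z0-9-]*\)=\(/[^[:space:],]*\)[[:space:]]*$|\1 \2|p' "$manifest" |
    while read -r flag path; do
        [ -e "$path" ] || echo "$flag -> $path (missing)"
    done
}
```

Flags whose values are not absolute paths, such as --etcd-servers or --service-cluster-ip-range, are skipped and still need a manual look.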
3. Scheduler Troubleshooting
Pod Not Being Scheduled
# View unscheduled Pods
kubectl get pods --all-namespaces | grep Pending
# View scheduling failure reasons
kubectl describe pod <pod-name>
# Events:
# FailedScheduling 30s default-scheduler 0/2 nodes are available
# View Scheduler logs
kubectl logs -n kube-system kube-scheduler-<master-name>
Common Scheduling Issues
| Issue | Cause | Solution |
|---|---|---|
| Pod Pending | Insufficient node resources | Add nodes or adjust resource requests |
| Pod Pending | Node has taints | Add tolerations |
| Pod Pending | Node selector mismatch | Modify nodeSelector |
| Pod not scheduled to desired node | Incorrect weight or affinity configuration | Check affinity configuration |
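The table can be turned into a rough first-pass classifier for FailedScheduling messages. This is only a sketch: `classify_scheduling_failure` is a hypothetical name, and the matched substrings are assumptions about the scheduler's wording, which differs between Kubernetes versions.

```shell
# Map a FailedScheduling event message to a likely cause from the table.
# When a message lists several reasons, the first matching pattern wins.
classify_scheduling_failure() {
    case "$1" in
        *Insufficient*)
            echo "insufficient resources" ;;
        *taint*)
            echo "untolerated taint" ;;
        *"node affinity/selector"*|*"node selector"*)
            echo "nodeSelector or affinity mismatch" ;;
        *)
            echo "unknown" ;;
    esac
}
```

A message that matches none of the patterns is reported as unknown, in which case the full output of kubectl describe pod is still the authoritative source.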
Scheduler Configuration Check
# Check Scheduler configuration file
cat /etc/kubernetes/manifests/kube-scheduler.yaml
# Check Scheduler startup parameters
kubectl get pods -n kube-system kube-scheduler-<master-name> -o yaml
4. Controller Manager Troubleshooting
# View Controller Manager logs
kubectl logs -n kube-system kube-controller-manager-<master-name> --tail=50
# Check Controller Manager configuration
cat /etc/kubernetes/manifests/kube-controller-manager.yaml
# Check if control loops are functioning normally
# Logs should contain normal output from controllers: replicaset, deployment, node, serviceaccount, etc.
Common issues:
- Node Controller not correctly marking node status
- Deployment Controller not creating ReplicaSets
- Service Account Controller not creating Tokens
5. etcd Member Health Check
# Method 1: Use etcdctl
# Note: etcdctl requires setting endpoint, certificate, and other environment variables
# Check etcd endpoint health
kubectl exec -it -n kube-system etcd-<master-name> -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health --cluster
# List etcd members
kubectl exec -it -n kube-system etcd-<master-name> -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
# Check etcd logs
kubectl logs -n kube-system etcd-<master-name> --tail=100
Health check output example:
https://192.168.1.10:2379 is healthy: successfully committed proposal: took = 2.345ms
https://192.168.1.11:2379 is healthy: successfully committed proposal: took = 3.012ms
https://192.168.1.12:2379 is healthy: successfully committed proposal: took = 1.987ms
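In a multi-member cluster it is easier to spot a bad member by filtering this output. The helper below is a minimal sketch with a hypothetical name, `unhealthy_endpoints`; it assumes each line starts with the endpoint URL, as in the example above, and that an unhealthy member is reported without the "is healthy:" text.

```shell
# Read `etcdctl endpoint health` output on stdin and print the endpoints
# that are not reported as healthy. Exit status is 1 if any were found.
unhealthy_endpoints() {
    awk '
        /^https?:\/\// && $0 !~ / is healthy:/ { print $1; bad = 1 }
        END { exit bad + 0 }
    '
}
```

Pipe the health check into it, adding 2>&1 in case the etcdctl version in use writes the report to stderr.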
6. etcd Data Directory Full Handling
# Check etcd data directory size
du -sh /var/lib/etcd/
# Check disk space
df -h
# Compact etcd history (discards old revisions; the data file shrinks only after defragmentation)
# Step 1: read the current revision (no -it, so the output stays clean; jq runs on the host)
REV=$(kubectl exec -n kube-system etcd-<master-name> -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
# Step 2: compact up to that revision
kubectl exec -n kube-system etcd-<master-name> -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
compaction "$REV"
# Defragmentation (actually frees disk space)
kubectl exec -it -n kube-system etcd-<master-name> -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
defrag
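# History compaction normally needs no manual step: kube-apiserver requests it periodically (--etcd-compaction-interval, default 5m)
# If etcd raised a NOSPACE alarm, clear it with `etcdctl alarm disarm` after compacting and defragmenting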
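To judge whether defragmentation is worth running, compare the size of the backend file with the space actually in use. The sketch below assumes those two numbers come from the `dbSize` and `dbSizeInUse` fields of `etcdctl endpoint status --write-out=json`; treat the field names as an assumption to verify against your etcd version. `reclaimable_percent` is a hypothetical helper.

```shell
# Percentage of the etcd backend file that defragmentation could release,
# given the total size and the in-use size in bytes (integer arithmetic).
# Usage: reclaimable_percent DB_SIZE DB_SIZE_IN_USE
reclaimable_percent() {
    db_size="$1"
    in_use="$2"
    if [ "$db_size" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( (db_size - in_use) * 100 / db_size ))
}
```

For example, `reclaimable_percent 1000 250` prints 75, meaning three quarters of the file is free space that defrag could return to the filesystem.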
7. Restarting Control Plane Components
# Method 1: Delete the Pod (kubelet recreates the mirror Pod, but the underlying container may keep running, so this is not a reliable restart)
kubectl delete pod -n kube-system kube-apiserver-<master-name>
kubectl delete pod -n kube-system kube-scheduler-<master-name>
kubectl delete pod -n kube-system kube-controller-manager-<master-name>
kubectl delete pod -n kube-system etcd-<master-name>
# Method 2: Move the static Pod configuration file (temporary removal)
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 30
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
# Method 3: Modify the static Pod configuration to trigger a restart
# kubelet detects file changes and recreates the container
vi /etc/kubernetes/manifests/kube-apiserver.yaml
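Note: Control plane components are static Pods. kubectl delete removes only the mirror Pod object in the API server; kubelet recreates it from /etc/kubernetes/manifests/, and the running container may be left untouched. Use Method 2 when a component must actually restart.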
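Method 2 can be wrapped in a small function so the manifest is always moved back. This is a sketch under stated assumptions: `restart_static_pod` is a hypothetical name, and the 30 second default simply mirrors the sleep used above rather than any guaranteed kubelet timing.

```shell
# Restart one static Pod by moving its manifest out of the watched directory
# and back. Usage: restart_static_pod COMPONENT [MANIFEST_DIR] [WAIT_SECONDS]
restart_static_pod() {
    component="$1"
    dir="${2:-/etc/kubernetes/manifests}"
    wait_s="${3:-30}"
    src="$dir/$component.yaml"
    [ -f "$src" ] || { echo "no manifest: $src" >&2; return 1; }
    # Park the file outside the manifest directory so kubelet stops the Pod
    holding=$(mktemp -d) || return 1
    mv "$src" "$holding/" || return 1
    sleep "$wait_s"
    # Restoring the file makes kubelet start a fresh container
    mv "$holding/$component.yaml" "$src" && rmdir "$holding"
}
```

On the master node this would be run as root, for example `restart_static_pod kube-scheduler`.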
8. General Troubleshooting Command Reference
# View status of all control plane components
kubectl get pods -n kube-system
# View component logs (latest 50 lines)
kubectl logs -n kube-system <pod-name> --tail=50
# View component events
kubectl get events -n kube-system --sort-by='.lastTimestamp'
# View control plane component configuration
kubectl get pods -n kube-system <pod-name> -o yaml
# Check the static Pod directory on the master node
ls -la /etc/kubernetes/manifests/
9. Exam Key Points
- Control plane components are static Pods, configured under /etc/kubernetes/manifests/
- kubectl delete on static Pods does not delete them; kubelet recreates them automatically
- etcd health check uses etcdctl endpoint health
- The etcd endpoint is typically https://127.0.0.1:2379
- etcd certificate path: /etc/kubernetes/pki/etcd/
- When API Server is unavailable, check static Pod configuration and kubelet logs
- Common CKA exam issues: incorrect certificate paths, incorrect etcd endpoint addresses
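Since the same endpoint and certificate flags recur in every etcdctl call, a tiny helper can reduce typing mistakes. This is a convenience sketch, not an etcd feature: `etcd_flags` is a hypothetical function, and its defaults are the endpoint and certificate directory listed above.

```shell
# Print the etcdctl connection flags used throughout this guide.
# Usage: etcd_flags [PKI_DIR] [ENDPOINT]
etcd_flags() {
    pki="${1:-/etc/kubernetes/pki/etcd}"
    endpoint="${2:-https://127.0.0.1:2379}"
    echo "--endpoints=$endpoint --cacert=$pki/ca.crt --cert=$pki/server.crt --key=$pki/server.key"
}

# Example (left unquoted on purpose so the shell splits the flags):
#   kubectl exec -n kube-system etcd-<master-name> -- etcdctl $(etcd_flags) endpoint health
```

In the exam itself it is still worth being able to type the flags from memory.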
🧪 Complete Hands-on Example: Troubleshoot API Server Failure
Scenario Description
The API Server is responding abnormally, and kubectl commands are not working. Walk through the troubleshooting process from checking control plane Pod status, static Pod configuration, and etcd health checks to full recovery.
Prerequisites
- A cluster with a Master node
- SSH access to the Master node
- kubeadm tool available on the Master node
Steps
Step 1: Detect API Server anomaly
kubectl get nodes
# The connection to the server <master-ip>:6443 was refused - did you specify the right host or port?
# API Server is completely unavailable
Step 2: SSH to the Master node and check control plane Pods
ssh master-node
# Use crictl to query the container runtime directly (since kubectl is unavailable)
crictl ps | grep apiserver
# If no output, the API Server container is not running
# Check all kube-system containers
crictl ps -a | grep -E "apiserver|scheduler|controller|etcd"
# CONTAINER ID IMAGE CREATED STATUS NAME
# ... ... 10m ago Exited kube-apiserver
Step 3: Check the static Pod configuration file
ls -la /etc/kubernetes/manifests/
# total 16
# -rw------- 1 root root 2153 May 27 09:00 kube-apiserver.yaml
# -rw------- 1 root root 2000 May 27 09:00 kube-controller-manager.yaml
# -rw------- 1 root root 1585 May 27 09:00 kube-scheduler.yaml
# -rw------- 1 root root 1466 May 27 09:00 etcd.yaml
# Inspect API Server configuration (look for common configuration errors)
cat /etc/kubernetes/manifests/kube-apiserver.yaml
# Focus on:
# - --etcd-servers: Is the address correct?
# - --tls-cert-file / --tls-private-key-file: Do the certificate paths exist?
# - --service-cluster-ip-range: Is it valid?
Step 4: View kubelet logs to get API Server startup errors
sudo journalctl -u kubelet -n 50 --no-pager
# May 27 10:00:00 master-node kubelet[1234]: E0527 10:00:00.123456 1234 kubelet.go:1234] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: ..."
# May 27 10:00:01 master-node kubelet[1234]: E0527 10:00:01.123456 1234 kubelet.go:5678] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests/kube-apiserver.yaml"
If the output shows that the configuration path does not exist, the configuration file has been accidentally deleted or moved.
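When the kubelet log is long, the affected paths can be pulled out with a filter. This is a minimal sketch: `missing_config_paths` is a hypothetical helper, and the matched message text follows the sample log above, so it may need adjusting for other kubelet versions.

```shell
# Read kubelet log lines on stdin and print each manifest path that kubelet
# reported as unreadable, without duplicates.
missing_config_paths() {
    sed -n 's/.*"Unable to read config path".* path="\([^"]*\)".*/\1/p' | sort -u
}

# Example:
#   sudo journalctl -u kubelet -n 200 --no-pager | missing_config_paths
```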
Step 5: Check if certificates have expired
# Check API Server certificate
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
# notBefore=May 27 09:00:00 2025 GMT
# notAfter=May 27 09:00:00 2026 GMT
# -> Certificate has expired!
# Check all certificate validity periods
sudo kubeadm certs check-expiration
# [check-expiration] Checking expiration for all certificates ...
# CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
# apiserver                  May 27, 2026 09:00 UTC   <invalid>       ca                      no    <- expired
# apiserver-kubelet-client   ...
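The expiry comparison can also be scripted instead of read by eye. The sketch below uses a hypothetical helper, `is_expired`, and assumes GNU date, which can parse the date format that openssl prints. The simpler built-in alternative is `openssl x509 -checkend 0 -in <cert>`, which exits non-zero when the certificate has already expired.

```shell
# Succeed (exit 0) when the notAfter date printed by
# `openssl x509 -noout -enddate` is earlier than a reference time.
# Usage: is_expired "notAfter=May 27 09:00:00 2026 GMT" [NOW_EPOCH]
is_expired() {
    not_after="${1#notAfter=}"
    now="${2:-$(date +%s)}"
    # Convert the certificate end date to seconds since the epoch (GNU date)
    end=$(date -u -d "$not_after" +%s) || return 2
    [ "$end" -lt "$now" ]
}

# Example:
#   is_expired "$(openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -enddate)" && echo expired
```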
Step 6: Renew certificates and restart components
# Renew all certificates
sudo kubeadm certs renew all
# certificate renewal succeeded
# Verify certificates have been renewed
sudo kubeadm certs check-expiration
# apiserver                  May 27, 2027 10:30 UTC   364d            ca                      no    ← Extended
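# Restart the control plane static Pods so they load the renewed certificates
# (restarting kubelet alone does not restart containers that are already running)
sudo sh -c 'mkdir -p /tmp/manifests && mv /etc/kubernetes/manifests/*.yaml /tmp/manifests/'
sleep 30
sudo sh -c 'mv /tmp/manifests/*.yaml /etc/kubernetes/manifests/'
# Wait for kubelet to recreate the static Pods
sleep 30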
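A fixed sleep either wastes time or is too short. Polling until the API Server answers is more dependable. The helper below is a generic retry loop under a hypothetical name, `wait_until`; the suggested probe in the example uses the API Server's /readyz endpoint.

```shell
# Run a command repeatedly until it succeeds or a timeout elapses.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        # Give up once the time budget is spent
        [ "$elapsed" -ge "$timeout_s" ] && return 1
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
}

# Example:
#   wait_until 120 5 kubectl get --raw=/readyz && echo "API Server is ready"
```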
Step 7: Verify etcd health
# Use etcdctl to check etcd endpoint health
kubectl exec -it -n kube-system etcd-master-node -- \
etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
# https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.345ms
Step 8: Verify cluster recovery
# Verify from the Master node
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-node Ready control-plane 10d v1.28.0
# worker-node1 Ready <none> 10d v1.28.0
kubectl get pods -n kube-system | grep apiserver
# kube-apiserver-master-node 1/1 Running 0 2m
Verification Results
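# Confirm control plane components are healthy
# (componentstatuses is deprecated since v1.19 and prints a warning; kubectl get --raw='/readyz?verbose' is the current alternative)
kubectl get componentstatuses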
# NAME STATUS MESSAGE ERROR
# controller-manager Healthy ok
# scheduler Healthy ok
# etcd-0 Healthy {"health":"true"}
# Verify certificate validity
echo | openssl s_client -connect localhost:6443 2>/dev/null | openssl x509 -noout -dates
Exam Tips
- When API Server is unavailable, SSH directly to the Master node and use crictl or docker to check container status
- Control plane components are static Pods; configuration files are under /etc/kubernetes/manifests/
- Certificate expiration is a common exam topic; use kubeadm certs renew all to renew
- After renewing certificates, restart the static Pods by moving their manifests out of /etc/kubernetes/manifests/ and back; systemctl restart kubelet alone does not restart containers that are already running
- Use etcdctl endpoint health for etcd health checks; memorize the certificate paths