Node Troubleshooting
CKA Exam Domain 5 — Node NotReady troubleshooting, kubelet inspection, system resource diagnosis, certificate handling
Nodes are the worker machines of a Kubernetes cluster, so node failures directly affect the Pods running on them. Node troubleshooting is a high-frequency topic in the CKA exam.
1. Node NotReady Status Troubleshooting Flow
# 1. Check node status
kubectl get nodes
# 2. View node details (look at the Conditions section)
kubectl describe node <node-name>
# 3. SSH into the problematic node
ssh <user>@<node-ip>
# 4. Check kubelet status
sudo systemctl status kubelet
# 5. View kubelet logs
sudo journalctl -u kubelet -n 100 --no-pager
# 6. Check the container runtime
sudo systemctl status containerd
# or
sudo systemctl status docker
Troubleshooting flowchart:
Node NotReady
│
├─ SSH to node
│
├─ systemctl status kubelet
│ ├─ inactive → systemctl start kubelet
│ └─ active → check logs
│
├─ journalctl -u kubelet -n 50
│ ├─ Certificate error → check certificates
│ ├─ Network plugin error → check CNI
│ └─ Insufficient resources → check system resources
│
├─ Check disk space
├─ Check memory
└─ Check container runtime
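The log-reading branch of the flowchart can be sketched as a small shell helper. This is only an illustration: the function name `suggest_next_check` and the keyword patterns are assumptions chosen for the example, not an exhaustive list of kubelet error messages.

```shell
# Hypothetical helper: read kubelet log text on stdin and print which branch
# of the flowchart to follow next. The keyword patterns are illustrative
# assumptions; real kubelet messages vary between versions.
suggest_next_check() {
  awk '
    {
      line = tolower($0)
      if (line ~ /x509|certificate/)   cert = 1
      if (line ~ /cni|network plugin/) cni = 1
      if (line ~ /no space left|eviction|out of memory|pressure/) res = 1
    }
    END {
      if (cert) print "check certificates"
      if (cni)  print "check CNI"
      if (res)  print "check system resources"
      if (!cert && !cni && !res) print "no known pattern matched; read the log manually"
    }'
}

# Usage on the node:
#   sudo journalctl -u kubelet -n 50 --no-pager | suggest_next_check
```

The helper only narrows down where to look; the log lines themselves remain the source of truth.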
2. kubelet Status Check
# View kubelet service status
sudo systemctl status kubelet
# Start / Stop / Restart kubelet
sudo systemctl start kubelet
sudo systemctl stop kubelet
sudo systemctl restart kubelet
# Enable kubelet to start on boot
sudo systemctl enable kubelet
3. Viewing kubelet Logs
# Show the last 100 kubelet log lines and keep following new entries (recommended)
sudo journalctl -u kubelet -n 100 -f
# View logs from a specific time range
sudo journalctl -u kubelet --since "5 min ago"
# View all logs (printed directly to the terminal, without a pager)
sudo journalctl -u kubelet --no-pager
# Output logs to a file for analysis
sudo journalctl -u kubelet --no-pager > /tmp/kubelet.log
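Once the logs are saved to a file, it often helps to keep only the error lines. kubelet writes klog-style entries whose first character is the severity (`I`, `W`, `E`, `F`) followed by the month and day, for example `E0527`. The sketch below relies on that shape; the function name `kubelet_errors` is hypothetical.

```shell
# Hypothetical helper: keep only error (E) and fatal (F) klog lines from a
# kubelet journal on stdin, e.g. "... kubelet[1234]: E0527 09:55:00.123456 ...".
# "|| true" keeps the exit status at 0 when nothing matches.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} ' || true
}

# Usage on the node:
#   kubelet_errors < /tmp/kubelet.log
```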
4. kubelet Configuration Check
# View kubelet configuration (kubeadm deployment)
cat /var/lib/kubelet/config.yaml
# View the kubelet version and container runtime reported by each node
kubectl get nodes -o wide
# kubelet startup parameters
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# or
ps aux | grep kubelet
# Check kubelet certificates
ls /var/lib/kubelet/pki/
5. System Resource Diagnosis
Disk Space
# Check disk usage
df -h
# Check the /var directory (Docker/containerd image storage)
sudo du -sh /var/lib/containerd/
sudo du -sh /var/lib/docker/
# Clean up unused container images
docker image prune -a
# or
crictl rmi --prune
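To spot a nearly full filesystem without reading the whole table, the `df` output can be filtered by usage percentage. The following is a minimal sketch: the function name `mounts_over` and the 85% default are assumptions for illustration, not kubelet settings, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print each mount
# point whose usage is at or above the given percentage (default 85).
# In `df -P` output, column 5 is the usage and column 6 is the mount point.
mounts_over() {
  threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= t + 0) print $6 " " $5
    }'
}

# Usage on the node:
#   df -P | mounts_over 85
```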
Memory Usage
# Check memory
free -h
# View memory-consuming processes
top
# or
htop
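For a single number instead of an interactive view, the share of memory still available can be computed from `/proc/meminfo`. This is a sketch assuming a Linux node that exposes the `MemTotal` and `MemAvailable` fields; the function name `mem_available_pct` is hypothetical.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print the
# percentage of memory still available (MemAvailable / MemTotal), rounded down.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%d\n", (avail * 100) / total }'
}

# Usage on the node:
#   mem_available_pct < /proc/meminfo
```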
Docker / containerd Status
# containerd (the usual runtime on current clusters)
sudo systemctl status containerd
sudo crictl ps
# Docker (older clusters, before dockershim was removed in Kubernetes 1.24)
sudo systemctl status docker
sudo docker ps
6. Handling Expired Node Certificates
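# Check certificate validity (kubeadm deployment)
# API server certificate (present on control-plane nodes only)
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
# kubelet serving certificate
openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -dates
# kubelet client certificate (used to authenticate to the API server)
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -dates
# Renew control-plane certificates with kubeadm (run on a control-plane node)
sudo kubeadm certs renew all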
# Update kubeconfig
sudo kubeadm init phase kubeconfig all
# Restart kubelet
sudo systemctl restart kubelet
# Check certificate expiration time
kubeadm certs check-expiration
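The `notAfter=` line printed by `openssl x509 -noout -enddate` can be turned into a number of days remaining. The sketch below assumes GNU `date` (for `date -d`); the function name `cert_days_left` and its optional "now" argument are hypothetical and exist mainly to make the calculation easy to check.

```shell
# Hypothetical helper: given an openssl end-date line such as
# "notAfter=May 27 10:00:00 2026 GMT", print the whole days until expiry.
# A negative result means the certificate has already expired.
# Assumes GNU date; the optional second argument overrides "now" (Unix epoch).
cert_days_left() {
  end_str="${1#notAfter=}"
  now="${2:-$(date +%s)}"
  end="$(date -d "$end_str" +%s)" || return 1
  echo $(( (end - now) / 86400 ))
}

# Usage on the node:
#   cert_days_left "$(sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate)"
```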
7. kubectl describe node -- View Node Details
# Comprehensive node information view
kubectl describe node <node-name>
# Key areas to focus on:
# - Conditions: Ready, DiskPressure, MemoryPressure, PIDPressure
# - Capacity / Allocatable: CPU, Memory, Pod count
# - Non-terminated Pods: Pods running on this node
# - Events: Node-related events
Conditions explained:
| Condition | Description |
|---|---|
| Ready | Whether the node is healthy and able to accept Pods |
| DiskPressure | Whether disk space is insufficient |
| MemoryPressure | Whether memory is insufficient |
| PIDPressure | Whether there are too many processes (PIDs) on the node |
| NetworkUnavailable | Whether the node's network is not correctly configured |
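The conditions in the table can also be checked mechanically: `Ready` is healthy when its status is `True`, while the other conditions are healthy when their status is `False`. The following sketch filters a list of "Type Status" pairs down to the abnormal ones; the function name `abnormal_conditions` is hypothetical.

```shell
# Hypothetical helper: read "Type Status" pairs on stdin and print the
# conditions that are in an abnormal state. Ready is healthy when True;
# the pressure and NetworkUnavailable conditions are healthy when False.
abnormal_conditions() {
  awk '
    NF < 2 { next }
    $1 == "Ready" && $2 != "True"  { print $1 "=" $2; next }
    $1 != "Ready" && $2 != "False" { print $1 "=" $2 }'
}

# Usage from a machine with kubectl access:
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{"\n"}{end}' \
#     | abnormal_conditions
```

An empty result means every reported condition is in its healthy state.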
8. Node Recovery Steps
# Step 1: SSH to the node for diagnosis
ssh <user>@<node-ip>
# Step 2: Restart kubelet
sudo systemctl restart kubelet
# Step 3: Verify kubelet status
sudo systemctl status kubelet
# Step 4: Return to the master node and verify
kubectl get nodes
kubectl describe node <node-name>
# Step 5: If the node is still unavailable, try cordon/drain
kubectl cordon <node-name> # Mark as unschedulable
kubectl drain <node-name> --ignore-daemonsets # Evict Pods
9. Exam Key Points
- When a node is NotReady, the first step is to SSH into the node
- `journalctl -u kubelet` is the most important diagnostic command
- A full disk (`/var` directory) is a common cause of failure
- After certificate expiry, use `kubeadm certs renew all` to renew
- The Condition fields in `kubectl describe node` are key to pinpointing issues
- The exam environment does not support rebooting nodes; focus on kubelet restarts
🧪 Complete Hands-on Example: Troubleshooting a Node NotReady Failure
Scenario
Simulate a node entering the NotReady state and walk through the complete troubleshooting flow: view the node status, SSH into the node, check the kubelet logs, and recover the node.
Prerequisites
- A cluster with a Master node and Worker nodes
- SSH access to the Worker node
- kubelet managed by systemd on the node
Steps
Step 1: Discover the Abnormal Node Status
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-node Ready control-plane 10d v1.28.0
# worker-node1 NotReady <none> 10d v1.28.0
Step 2: View Node Details to Find Diagnostic Clues
kubectl describe node worker-node1
# ...
# Conditions:
# Type Status LastHeartbeatTime Reason
# ---- ------ ----------------- ------
# Ready Unknown 2026-05-27T10:00:00Z NodeStatusUnknown
# ...
# Message: Kubelet stopped posting node status.
Step 3: SSH into the Problem Node and Check kubelet Status
ssh worker-node1
sudo systemctl status kubelet
# ● kubelet.service - kubelet: The Kubernetes Node Agent
# Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
# Active: inactive (dead) ← kubelet is not running
Step 4: View kubelet Logs to Determine the Root Cause
sudo journalctl -u kubelet -n 50 --no-pager
# May 27 09:55:00 worker-node1 kubelet[1234]: E0527 09:55:00.123456 1234 kubelet.go:1234] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
# May 27 09:55:00 worker-node1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Step 5: Check System Resources (Disk Space and Container Runtime)
# Check disk space
df -h
# Filesystem Size Used Avail Use% Mounted on
# /dev/sda1 50G 12G 35G 26% /
# → Disk space is sufficient
# Check container runtime
sudo systemctl status containerd
# ● containerd.service - Container Runtime
# Active: active (running)
# → Container runtime is normal
Step 6: Fix the Configuration and Restart kubelet
# Based on the logs, make the kubelet cgroup driver match the container runtime's
# (the cgroupDriver field in /var/lib/kubelet/config.yaml), then start kubelet
sudo systemctl start kubelet
# Check startup status
sudo systemctl status kubelet
# ● kubelet.service - kubelet: The Kubernetes Node Agent
# Active: active (running) ← Now running
# Enable on boot (ensure it starts automatically after a reboot)
sudo systemctl enable kubelet
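The cgroup driver mismatch from Step 4 can be confirmed by comparing the two configuration files before editing anything. This is a minimal sketch assuming containerd is the runtime (as checked in Step 5) and that it is configured through the `SystemdCgroup` option; the function names are hypothetical, and the file paths in the usage lines are the usual kubeadm and containerd defaults, which may differ on your node.

```shell
# Hypothetical helpers: report the cgroup driver configured for kubelet and
# for containerd so the two values can be compared.

kubelet_cgroup_driver() {
  # KubeletConfiguration field, e.g. "cgroupDriver: systemd"
  awk '$1 == "cgroupDriver:" { print $2 }' "$1"
}

containerd_cgroup_driver() {
  # containerd runc option, e.g. "SystemdCgroup = true".
  # Anything other than an explicit "true" is reported as cgroupfs.
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1"; then
    echo systemd
  else
    echo cgroupfs
  fi
}

# Usage on the node (paths are assumed defaults):
#   kubelet_cgroup_driver /var/lib/kubelet/config.yaml
#   containerd_cgroup_driver /etc/containerd/config.toml
```

If the two values differ, change one side so they agree, then restart the affected service.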
Step 7: Return to the Master Node and Verify Recovery
exit
# Back on the Master node
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-node Ready control-plane 10d v1.28.0
# worker-node1 Ready <none> 10d v1.28.0
# → Node has recovered to normal
Verification
# Verify the node's Ready condition
kubectl get nodes worker-node1 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
# True
# Verify kubelet is running normally
ssh worker-node1 'sudo systemctl is-active kubelet'
# active
# Confirm Pods on this node have recovered (all namespaces)
kubectl get pods -A -o wide --field-selector spec.nodeName=worker-node1
Exam Tips
- When a node is NotReady, the first step is to SSH into the node and check `systemctl status kubelet`
- `journalctl -u kubelet -n 50` is the most critical diagnostic command; it reveals the specific error messages
- Common causes: kubelet not running, disk full (`df -h`), expired certificates, container runtime failure
- After fixing, run `systemctl restart kubelet`, then return to the Master node and verify with `kubectl get nodes`
- If the node remains NotReady, check the Conditions field of `kubectl describe node` for more information