Pod Troubleshooting
CKA Exam Domain 5 — Common Pod troubleshooting, CrashLoopBackOff, ImagePullBackOff, Pending state
Pods are the smallest scheduling unit in Kubernetes. Pod failures are the most common troubleshooting scenario in the CKA exam.
1. Pod Status Quick Reference
| Status | Meaning |
|---|---|
| Pending | Pod not yet scheduled, or image still being pulled |
| Running | Pod running normally |
| CrashLoopBackOff | Container repeatedly crashes; kubelet waits longer before each restart |
| ImagePullBackOff | Image pull failed; kubelet is backing off before retrying |
| ErrImagePull | Image pull attempt failed |
| OOMKilled | Container killed for exceeding its memory limit |
| CreateContainerConfigError | Container configuration error (e.g., referenced ConfigMap or Secret does not exist) |
| Init:Error / Init:CrashLoopBackOff | Init container failed |
| Terminating | Pod is terminating (may be stuck) |
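As a study aid, the table above can be condensed into a lookup that suggests the first command to run for a given status. This is only a sketch: the function name triage_hint and the mapping are illustrative choices based on the sections of this guide, not anything built into kubectl.

```shell
# triage_hint STATUS: print the first command worth running for a Pod status.
# The name and the mapping are illustrative assumptions, not a kubectl feature.
triage_hint() {
  case "$1" in
    CrashLoopBackOff|OOMKilled)
      # The crashed instance's logs usually hold the error message.
      echo 'kubectl logs <pod-name> --previous' ;;
    Init:*)
      # Init containers must be addressed by name.
      echo 'kubectl logs <pod-name> -c <init-container-name>' ;;
    *)
      # Pending, image pull errors, config errors, Terminating: read Events.
      echo 'kubectl describe pod <pod-name>' ;;
  esac
}
```

For example, `triage_hint Init:Error` prints the init container log command, while `triage_hint Pending` points at kubectl describe.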
2. CrashLoopBackOff Troubleshooting
# 1. Check Pod status
kubectl get pods
# 2. View container logs
kubectl logs <pod-name>
# 3. View logs from the previous crashed instance
kubectl logs <pod-name> --previous
# 4. View Pod details (look for error reasons in the Events section)
kubectl describe pod <pod-name>
# 5. Enter the container for inspection
kubectl exec -it <pod-name> -- /bin/sh
Common Causes:
| Cause | Troubleshooting Method |
|---|---|
| Application code error | kubectl logs to check errors |
| Startup command failure | Check Dockerfile ENTRYPOINT / CMD |
| Configuration error | Check ConfigMap / Secret mounts |
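| Health check failure | Check liveness / startup probe configuration (a failing readiness probe only removes the Pod from Service endpoints, it does not restart the container) |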
| Port conflict | Check containerPort configuration |
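When kubectl describe output is long, the exit reason of the previous instance can be pulled out directly with a jsonpath query. The sketch below wraps that query in a helper; the function name last_termination is hypothetical, while the field paths come from the Pod status API.

```shell
# last_termination POD: print name, reason and exit code of each container's
# previous (terminated) instance, one container per line.
# The function name is a hypothetical helper for this guide.
last_termination() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Usage: `last_termination <pod-name>`. Containers that have never restarted print empty reason and exit code columns.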
3. ImagePullBackOff / ErrImagePull Troubleshooting
# 1. View Pod details
kubectl describe pod <pod-name>
# Output will show something like:
# Failed to pull image "nginx:latst": rpc error: ...
# Error: ErrImagePull
# Back-off pulling image "nginx:latst"
Common Causes and Solutions:
| Cause | Solution |
|---|---|
| Image name typo | Check the image field, e.g., nginx:latst should be nginx:latest |
| Image tag does not exist | Use kubectl edit pod to modify the tag |
| Private registry not authenticated | Create an ImagePullSecret |
| Registry unreachable | Check network connectivity |
| Image does not exist | Confirm image has been pushed to the registry |
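For the typo and wrong-tag cases, a quick way to see and correct the image is sketched below. The helper names show_images and fix_image are assumptions made for this example; the underlying commands are kubectl get with jsonpath and kubectl set image.

```shell
# show_images POD: list each container and the image it references.
show_images() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"="}{.image}{"\n"}{end}'
}

# fix_image POD CONTAINER IMAGE: point one container at a corrected image.
# The image field is one of the few Pod spec fields that can be changed in place.
fix_image() {
  kubectl set image "pod/$1" "$2=$3"
}
```

Usage: `show_images <pod-name>` to confirm the typo, then for example `fix_image <pod-name> <container-name> nginx:latest`.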
Private Registry Authentication:
# Create Docker registry Secret
kubectl create secret docker-registry regcred \
--docker-server=<registry> \
--docker-username=<user> \
--docker-password=<pass> \
--docker-email=<email>
# Reference in Pod
# spec:
#   imagePullSecrets:
#   - name: regcred
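Instead of referencing the Secret in every Pod spec, it can be attached to a ServiceAccount so that new Pods using that account pick it up. A minimal sketch, assuming the Secret is named regcred as above; the function name attach_pull_secret is hypothetical.

```shell
# attach_pull_secret SECRET [SERVICEACCOUNT]
# Adds the Secret to the ServiceAccount's imagePullSecrets (default: "default").
# Only Pods created after the patch are affected.
attach_pull_secret() {
  kubectl patch serviceaccount "${2:-default}" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$1\"}]}"
}
```

Usage: `attach_pull_secret regcred`, then delete and recreate the failing Pod so it is admitted with the new setting.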
4. Pending State Troubleshooting
kubectl describe pod <pod-name>
The Events section will show the reason for scheduling failure:
| Event message | Cause |
|---|---|
| 0/1 nodes are available: Insufficient cpu | Insufficient node CPU resources |
| 0/1 nodes are available: Insufficient memory | Insufficient node memory resources |
| 0/1 nodes are available: node(s) had taint | Node has taints, toleration needed |
| 0/1 nodes are available: pod has unbound PVC | PVC not bound or does not exist |
| 0/1 nodes are available: node(s) didn't match node selector | Node labels do not match |
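The mapping from scheduler message to next action can be written down as a small lookup, which is handy for drilling before the exam. This is a sketch only: the function name scheduling_hint and the wording of the suggestions are assumptions, and the patterns match the abbreviated messages shown in the table rather than every variant the scheduler can emit.

```shell
# scheduling_hint MESSAGE: suggest a fix for a FailedScheduling event message.
# Hypothetical helper; patterns follow the abbreviated messages in this guide.
scheduling_hint() {
  case "$1" in
    *"Insufficient cpu"*)    echo 'lower the CPU request or free CPU on a node' ;;
    *"Insufficient memory"*) echo 'lower the memory request or free memory on a node' ;;
    *taint*)                 echo 'add a matching toleration or remove the taint' ;;
    *unbound*)               echo 'create or fix the PVC so it can bind' ;;
    *"node selector"*)       echo 'fix nodeSelector or label a node to match' ;;
    *)                       echo 'unrecognized message: read the full Events section' ;;
  esac
}
```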
Check Resources:
# View node resource capacity
kubectl describe node <node-name>
# View actual node resource usage (requires metrics-server)
kubectl top node
# View Pod resource requests
kubectl get pod <pod-name> -o yaml | grep -A 5 resources
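Node capacity and Pod requests are often printed in different units (for example Ki on the node and Mi or Gi in the Pod spec), which makes them awkward to compare by eye. The helper below is a rough sketch that normalizes the binary suffixes to MiB; the name to_mib is hypothetical, and it deliberately ignores other quantity formats such as plain bytes or decimal suffixes.

```shell
# to_mib QUANTITY: convert a Kubernetes memory quantity with a Ki, Mi or Gi
# suffix to whole MiB (integer division, so Ki values are rounded down).
# Hypothetical helper; other suffixes are rejected.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}
```

With both values in MiB, a request that exceeds what remains allocatable on every node explains an Insufficient memory event.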
5. OOMKilled (Memory Overrun)
# Status is OOMKilled
kubectl get pod
# NAME STATUS RESTARTS
# my-pod OOMKilled 5
# View logs (logs may be lost after container is OOM killed)
kubectl logs <pod-name> --previous
# View container exit reason
kubectl describe pod <pod-name>
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137
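Exit code 137 follows the usual convention that a process killed by a signal reports 128 plus the signal number, so 137 is 128 + 9 (SIGKILL). A small sketch of that arithmetic, with a hypothetical function name:

```shell
# exit_signal CODE: print the signal number encoded in a container exit code,
# or 0 if the code does not indicate death by signal (code <= 128).
exit_signal() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}
```

`exit_signal 137` prints 9, the number of SIGKILL, which is the signal used when a container is OOM killed.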
Solutions:
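# Increase the memory limit on the owning Deployment (the resources of a running Pod generally cannot be changed in place)
kubectl set resources deployment <deployment-name> --limits=memory=512Mi
# Or edit the Deployment and adjust the container's resources
kubectl edit deployment <deployment-name>
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"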
6. Init Container Failure
# View Init container status
kubectl describe pod <pod-name>
# View Init container logs
kubectl logs <pod-name> -c <init-container-name>
# View previous Init container logs
kubectl logs <pod-name> -c <init-container-name> --previous
Example:
spec:
  initContainers:
  - name: init-setup
    image: busybox
    command: ["sh", "-c", "echo 'init done'"]
  containers:
  - name: app
    image: nginx
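Because kubectl logs needs the init container's name, it helps to list the init containers and their state first. A minimal sketch using the Pod status fields; the function name init_statuses is an assumption for this example.

```shell
# init_statuses POD: print name, ready flag and restart count of every
# init container, one per line. Hypothetical helper name.
init_statuses() {
  kubectl get pod "$1" -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}
```

Usage: `init_statuses <pod-name>`, then pass the name of the container that is not ready to `kubectl logs <pod-name> -c <init-container-name>`.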
7. Readiness / Liveness Probe Failure
kubectl describe pod <pod-name>
The Events section will show:
Warning Unhealthy 3s (x5 over 30s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Warning Unhealthy 10s (x3 over 50s) kubelet Readiness probe failed: Get "http://10.244.1.2:8080/healthz": dial tcp 10.244.1.2:8080: connect: connection refused
Troubleshooting Steps:
# 1. Confirm the application port
kubectl exec <pod-name> -- netstat -tlnp
# 2. Test the probe path
kubectl exec <pod-name> -- wget -qO- http://localhost:8080/healthz
# 3. Check probe configuration
kubectl get pod <pod-name> -o yaml | grep -A 15 livenessProbe
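The two event messages shown above point to different problems, and telling them apart quickly saves time. The sketch below encodes that distinction; the function name probe_hint and the suggestion texts are assumptions, and only the two message shapes from this section are recognized.

```shell
# probe_hint MESSAGE: interpret a probe failure message from the Events section.
# Hypothetical helper covering only the two message shapes shown in this guide.
probe_hint() {
  case "$1" in
    *"connection refused"*)
      echo 'nothing is listening on the probed port: compare the probe port with the port the application binds' ;;
    *"statuscode:"*)
      echo 'the application answered with an error status: check the probe path and the application logs' ;;
    *)
      echo 'unrecognized probe failure: inspect the probe configuration' ;;
  esac
}
```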
8. kubectl exec for Container Diagnostics
# Enter container shell
kubectl exec -it <pod-name> -- /bin/sh
kubectl exec -it <pod-name> -- /bin/bash
# Execute commands in the container
kubectl exec <pod-name> -- ls /app
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- cat /etc/config/config.yaml
# Specify container (multi-container Pod)
kubectl exec -it <pod-name> -c <container-name> -- /bin/sh
9. kubectl debug for Temporary Debug Containers
Kubernetes v1.25+ supports temporary debug containers (ephemeral containers) via kubectl debug.
# Add a debug container to a running Pod
kubectl debug <pod-name> -it --image=busybox
# Copy a Pod and replace the image for debugging
kubectl debug <pod-name> -it --copy-to=<debug-name> --container=<container> --image=busybox
# Create a debug Pod for a node
kubectl debug node/<node-name> -it --image=busybox
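When the goal is to inspect the processes of one specific container, kubectl debug accepts a --target flag naming that container, provided the container runtime supports process namespace targeting. A hedged sketch with a hypothetical wrapper name:

```shell
# debug_target POD CONTAINER [IMAGE]
# Starts an ephemeral debug container that targets the named container's
# process namespace. Defaults to the busybox image used elsewhere in this guide.
debug_target() {
  kubectl debug "$1" -it --image="${3:-busybox}" --target="$2"
}
```

Usage: `debug_target <pod-name> <container-name>`, then run `ps` inside the debug shell to see the target container's processes.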
10. General Troubleshooting Command Quick Reference
# Pod status overview
kubectl get pods -o wide
kubectl get pods --all-namespaces | grep -v Running
# View events
kubectl get events --sort-by='.lastTimestamp'
kubectl get events -w
# View full YAML
kubectl get pod <pod-name> -o yaml
# View events across all namespaces
kubectl get events --all-namespaces
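On a busy cluster the event list is noisy, so narrowing it to the warnings of one Pod is often quicker. The sketch below combines a field selector with the sort flag used above; the function name pod_events is hypothetical.

```shell
# pod_events POD: show only Warning events that refer to the given Pod,
# oldest first. Runs in the current namespace.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,type=Warning" \
    --sort-by=.lastTimestamp
}
```

Usage: `pod_events <pod-name>`.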
11. Exam Key Points
- For CrashLoopBackOff, first check kubectl logs, then kubectl describe
- ImagePullBackOff is usually caused by a typo in the image name
- For Pending, check the scheduling failure reason in Events
- The exit code for OOMKilled is 137
- Init container logs are viewed with -c <container-name>
- The Events section of kubectl describe pod is the most important diagnostic information
🧪 Complete Hands-On Example: Troubleshooting CrashLoopBackOff
Scenario
A Pod repeatedly crashes and restarts (CrashLoopBackOff). Walk through a complete troubleshooting process from start to finish, covering log inspection, configuration checking, fixing, and result verification.
Prerequisites
- There is a Pod in CrashLoopBackOff state in the cluster
- Permission to use kubectl logs and kubectl describe
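If no such Pod exists yet, one way to set up a practice case is sketched below. It is an assumption about how the scenario could be reproduced, not the original lab setup: a ConfigMap without an nginx.conf key is mounted over /etc/nginx, which should hide the image's own configuration file and make nginx exit at startup. The names nginx-crash and nginx-config match the steps that follow; the function name and the placeholder.txt key are hypothetical.

```shell
# create_crash_pod: apply a ConfigMap and a Pod that is expected to crash-loop
# because the mount hides /etc/nginx/nginx.conf. Practice setup only.
create_crash_pod() {
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  # Deliberately no nginx.conf key.
  placeholder.txt: "empty on purpose"
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-crash
spec:
  containers:
  - name: nginx
    image: nginx:latest
    volumeMounts:
    - name: config
      mountPath: /etc/nginx
  volumes:
  - name: config
    configMap:
      name: nginx-config
EOF
}
```

Run `create_crash_pod` in a scratch namespace and wait a minute or two for the restart count to climb.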
Steps
Step 1: Identify the abnormal Pod
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# nginx-crash 0/1 CrashLoopBackOff 5 (15s ago) 2m
# web-app 1/1 Running 0 10m
Step 2: View Pod details (look for clues in Events)
kubectl describe pod nginx-crash
# ...
# Containers:
# nginx:
# Container ID: containerd://abc123
# State: Waiting
# Reason: CrashLoopBackOff
# Last State: Terminated
# Reason: Error
# Exit Code: 1
# Finished At: 2026-05-27T10:01:00Z
# ...
# Events:
# Type Reason Age From Message
# ---- ------ ---- ---- -------
# Normal Scheduled 3m default-scheduler Successfully assigned default/nginx-crash to worker-node1
# Normal Pulled 3m kubelet Successfully pulled image "nginx:latest" in 2.345s
# Normal Created 3m kubelet Created container nginx
# Normal Started 3m kubelet Started container nginx
# Warning BackOff 15s (x5 over 2m40s) kubelet Back-off restarting failed container
Exit Code is 1, indicating the application process exited abnormally.
Step 3: View current instance logs
kubectl logs nginx-crash
# 2026/05/27 10:00:00 [emerg] 1#1: open() "/etc/nginx/nginx.conf" failed (2: No such file or directory)
# nginx: [emerg] open() "/etc/nginx/nginx.conf" failed (2: No such file or directory)
The logs show that Nginx cannot find its configuration file.
Step 4: View logs from the previous crashed instance (if needed)
kubectl logs nginx-crash --previous
# (Same as current logs, indicating consistent crash cause)
Step 5: Inspect the configuration in a debug copy of the Pod
# Since the container keeps crashing, kubectl exec into it is unreliable.
# Create a copy of the Pod whose nginx container runs a shell instead of nginx, keeping the same volume mounts
kubectl debug nginx-crash -it --copy-to=nginx-debug --container=nginx -- /bin/sh
# Inside that shell (or from another terminal), list the configuration directory
kubectl exec -it nginx-debug -c nginx -- ls -la /etc/nginx/
# Found missing nginx.conf file → configuration issue
# Remove the debug copy when done
kubectl delete pod nginx-debug
Step 6: Fix the problem
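# Check the Pod (or owning Deployment) configuration to find the root cause
# The original Pod YAML may have mounted an incorrect ConfigMap that overwrote nginx.conf
# Method 1: If a Deployment manages the Pod, edit the Deployment to fix the mounted ConfigMap name or path
kubectl edit deployment <deployment-name>
# For a standalone Pod, volumes cannot be edited in place: export the YAML, fix it, then recreate the Pod
kubectl get pod nginx-crash -o yaml > nginx-crash.yaml
kubectl replace --force -f nginx-crash.yaml
# Method 2: If ConfigMap content is wrong, edit the ConfigMap
kubectl edit configmap nginx-config
# Ensure it contains the correct nginx.conf content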
Step 7: Verify Pod is running again
kubectl get pods -w
# nginx-crash 1/1 Running 0 30s
# → Pod has returned to normal running state
kubectl describe pod nginx-crash
# State: Running
# Started: ...
# No more CrashLoopBackOff in Events
Verification
# Verify Pod status is stable
kubectl get pods nginx-crash
# NAME READY STATUS RESTARTS AGE
# nginx-crash 1/1 Running 0 1m
# Verify service responds normally (give the port-forward a moment to start)
kubectl port-forward pod/nginx-crash 8080:80 &
sleep 2
curl http://localhost:8080
# <!DOCTYPE html>
# <html>...(Nginx homepage returns normally)
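Rather than watching the status column by hand, kubectl wait can block until the Pod reports Ready or a timeout expires. A minimal sketch; the function name confirm_ready and the 60s default are assumptions.

```shell
# confirm_ready POD [TIMEOUT]
# Blocks until the Pod's Ready condition is true, or fails after TIMEOUT.
confirm_ready() {
  kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-60s}"
}
```

Usage: `confirm_ready nginx-crash`. A non-zero exit status means the Pod did not become Ready in time and the fix needs another look.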
Exam Tips
- CrashLoopBackOff troubleshooting order: kubectl describe → kubectl logs → kubectl logs --previous
- Exit code meanings: 0=normal exit, 1=application error, 137=killed by SIGKILL (typically OOMKilled), 143=terminated by SIGTERM (graceful termination)
- kubectl logs --previous shows the logs of the previous (crashed) instance, which is very valuable when the container restarts repeatedly
- If the container exits too quickly to inspect, use kubectl debug to create a debug copy
- Misconfigured liveness probes are another common cause of repeated restarts, so check the probe configuration as well