Scaling and HPA
Manual scaling, HorizontalPodAutoscaler auto-scaling, metrics-server configuration
Overview
The CKA exam requires mastering manual scaling of Deployments and configuring HPA auto-scaling. HPA relies on metrics-server to provide Pod resource metrics.
1. Manual Scaling
1.1 kubectl scale
kubectl scale deployment/nginx --replicas=5
kubectl scale deployment nginx --replicas=3
# Scale ReplicaSet
kubectl scale rs/web-rs --replicas=4
# Scale StatefulSet
kubectl scale sts/web --replicas=5
# Check current replica count
kubectl get deployment nginx
kubectl get rs
kubectl get pods
# Conditional scale-down (--current-replicas validates current replica count)
kubectl scale deployment/nginx --current-replicas=5 --replicas=3
# If the current count is not 5, the command fails and the replica count is left unchanged
1.2 Scaling After Rolling Back to a Specific Revision
# View revision history
kubectl rollout history deployment nginx
# Roll back to revision 2
kubectl rollout undo deployment nginx --to-revision=2
# Scale up
kubectl scale deployment nginx --replicas=5
2. HorizontalPodAutoscaler (HPA)
HPA automatically adjusts the replica count of Deployments / StatefulSets based on CPU/memory utilization.
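To make the scaling decision concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas x currentUtilization / targetUtilization), clamped to the configured minimum and maximum. The function name desired_replicas is hypothetical, and the sketch deliberately leaves out the controller's tolerance band, stabilization windows, and Pod readiness handling, so treat it as an approximation and not as the controller's actual code.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT UTIL TARGET MIN MAX   (hypothetical helper)
# Approximates: desired = ceil(CURRENT * UTIL / TARGET), clamped to [MIN, MAX].
# UTIL and TARGET are whole percentages, e.g. 70 and 50.
desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  local desired=$(( (current * util + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

desired_replicas 2 70 50 2 10    # prints 3  (ceil(2.8))
desired_replicas 4 200 50 2 10   # prints 10 (16 clamped to max)
desired_replicas 2 10 50 2 10    # prints 2  (1 raised to min)
```

For example, 2 replicas averaging 70% CPU against a 50% target give ceil(2.8) = 3 replicas.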
2.1 Prerequisites: metrics-server
# Install metrics-server (usually pre-installed in the exam environment)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify installation
kubectl get pods -n kube-system | grep metrics-server
# View node and Pod metrics
kubectl top nodes
kubectl top pods
# If the top command returns "metrics not available yet", wait for metrics-server to collect data (about 30s)
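Instead of re-running kubectl top by hand while metrics-server warms up, a small retry wrapper can poll for you. The sketch below is a generic helper; the name retry and its argument order are assumptions made for this illustration, not part of kubectl.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the final one
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (requires a cluster): poll for up to about one minute
# retry 6 10 kubectl top nodes
```

The kubectl call is left commented out so the snippet can be sourced without a cluster; uncomment it once metrics-server is installed.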
2.2 Create HPA
Method 1: Imperative (kubectl autoscale)
# Create HPA: target 50% average CPU utilization (relative to CPU requests), min 2 replicas, max 10
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10
# Generate HPA YAML
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10 --dry-run=client -o yaml
Method 2: Declarative YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
2.3 View HPA Status
# View HPA
kubectl get hpa
kubectl get horizontalpodautoscaler
# View HPA details
kubectl describe hpa nginx-hpa
# View HPA YAML
kubectl get hpa nginx-hpa -o yaml
# Monitor HPA
kubectl get hpa -w
2.4 Example HPA Scaling Behavior Output
NAME        REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa   Deployment/nginx   30%/50%   2         10        2          5m
nginx-hpa   Deployment/nginx   70%/50%   2         10        4          6m
nginx-hpa   Deployment/nginx   45%/50%   2         10        4          7m
2.5 Generate Load to Test Scaling
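# Start a Pod that sends continuous HTTP requests to the nginx Service, which drives up CPU usage on the nginx Pods
# (replace nginx-service with the name of your Service)
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx-service; done"
# Or run it interactively (use one variant only, since both create a Pod named load-generator)
kubectl run -i --tty load-generator --image=busybox --restart=Never -- sh -c "while true; do wget -q -O- http://nginx:80; done"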
# Check if HPA is scaling
kubectl get hpa -w
# Delete the load generator after testing
kubectl delete pod load-generator
3. Advanced HPA Configuration
3.1 Custom Metrics (autoscaling/v2)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1000
  behavior:                             # Scaling behavior control
    scaleDown:
      stabilizationWindowSeconds: 300   # Scale-down stabilization window (default 5 minutes)
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0     # No waiting needed for scale-up
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
3.2 Behavior Policy Explanation
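| Policy | Description |
|---|---|
| stabilizationWindowSeconds | Stabilization window: how long past recommendations are considered before the replica count changes, which prevents frequent scaling (flapping) |
| scaleDown.policies | Scale-down policies: maximum percentage or number of Pods that may be removed per periodSeconds |
| scaleUp.policies | Scale-up policies: maximum percentage or number of Pods that may be added per periodSeconds |
| selectPolicy | Which policy wins when several apply: Max (largest change, the default), Min (smallest change), or Disabled (no scaling in that direction) |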
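To see how the scaleUp policies from section 3.1 interact, the sketch below computes the highest replica count a single period would allow when a Percent policy and a Pods policy are combined with selectPolicy: Max. The function name scale_up_limit is hypothetical, and the model is simplified: it assumes the replica count at the start of the period is known and that no other scaling happened within that period.

```shell
#!/usr/bin/env bash
# scale_up_limit CURRENT PERCENT PODS   (hypothetical helper)
# CURRENT: replicas at the start of the period
# PERCENT: value of the "Percent" policy (100 means the count may double)
# PODS:    value of the "Pods" policy (absolute number of Pods that may be added)
# Prints the larger of the two limits, which is what selectPolicy: Max selects.
scale_up_limit() {
  local current=$1 percent=$2 pods=$3
  # ceil(current * (1 + percent/100)) using integer arithmetic
  local by_percent=$(( (current * (100 + percent) + 99) / 100 ))
  local by_pods=$(( current + pods ))
  if (( by_percent > by_pods )); then
    echo "$by_percent"
  else
    echo "$by_pods"
  fi
}

scale_up_limit 2 100 4    # prints 6  (Pods policy wins: 2 + 4)
scale_up_limit 10 100 4   # prints 20 (Percent policy wins: 10 * 2)
```

With few replicas the Pods policy dominates, which lets a small Deployment grow quickly; with many replicas the Percent policy takes over.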
4. metrics-server
4.1 Installation
# Quick install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# If it doesn't work after installation, you may need to modify parameters
kubectl edit deployment metrics-server -n kube-system
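# Add to spec.template.spec.containers[0].args:
# - --kubelet-insecure-tls   (skips kubelet certificate verification; only for lab clusters such as kind or minikube)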
# - --kubelet-preferred-address-types=InternalIP
4.2 Verification
# Wait for Pod to be ready
kubectl wait --namespace kube-system --for=condition=ready pod -l k8s-app=metrics-server --timeout=120s
# Test metric collection
kubectl top nodes
kubectl top pods
# If no data for a long time, check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server
5. Useful Exam Commands
# 1. Generate a Deployment manifest (HPA requires Pods to have CPU requests)
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx.yaml
# 2. Edit the manifest to add resources.requests.cpu, then apply it
vim nginx.yaml
kubectl apply -f nginx.yaml
# 3. Create HPA
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=5
# 4. Verify
kubectl get hpa
kubectl get pods -w
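# 5. Create CronJobs for scheduled scaling (not HPA, but useful for the exam)
# Note: the Job's ServiceAccount needs RBAC permission to scale the Deployment; the default ServiceAccount normally lacks it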
kubectl create cronjob scale-up --image=bitnami/kubectl --schedule="0 8 * * 1-5" -- kubectl scale deployment nginx --replicas=10
kubectl create cronjob scale-down --image=bitnami/kubectl --schedule="0 18 * * 1-5" -- kubectl scale deployment nginx --replicas=2
🧪 Complete Hands-on Example: Manual Scaling + Configure HPA Auto-scaling
Scenario
First manually scale a Deployment, then configure CPU-based HPA auto-scaling.
Prerequisites
- A working Kubernetes cluster (minikube or kind recommended)
- kubectl is configured to connect to the cluster
- Pods in the Deployment must have CPU requests set (required by HPA)
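The CPU request matters because HPA expresses utilization as a percentage of the requested CPU, not of node capacity. The sketch below illustrates that arithmetic under the assumption that every Pod has the same request: total usage across Pods divided by total requests, truncated to a whole percent. The function name cpu_utilization is hypothetical and values are in millicores.

```shell
#!/usr/bin/env bash
# cpu_utilization REQUEST_M USAGE_M [USAGE_M...]   (hypothetical helper)
# REQUEST_M: CPU request per Pod in millicores (assumed identical for all Pods)
# USAGE_M:   current CPU usage of each Pod in millicores
# Prints the average utilization as a whole percentage.
cpu_utilization() {
  local request=$1
  shift
  local total_usage=0 total_request=0 usage
  for usage in "$@"; do
    total_usage=$(( total_usage + usage ))
    total_request=$(( total_request + request ))
  done
  if (( total_request == 0 )); then
    # Without a request there is nothing to divide by
    echo "no CPU request set; utilization is undefined" >&2
    return 1
  fi
  echo $(( total_usage * 100 / total_request ))
}

cpu_utilization 200 130 150   # prints 70 (280m used of 400m requested)
cpu_utilization 200 1 2       # prints 0  (idle nginx Pods)
```

The second call mirrors an idle Deployment with 200m requests, where a few millicores of usage round down to 0%.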
Steps
Step 1: Create a Deployment with Resource Requests
cat <<'EOF' > nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "200m"
            memory: "128Mi"
        ports:
        - containerPort: 80
EOF
kubectl apply -f nginx-deploy.yaml
# Expected output: deployment.apps/nginx created
kubectl get deployment nginx
# Expected output: NAME READY UP-TO-DATE AVAILABLE AGE
# nginx 2/2 2 2 <seconds>
Step 2: Manual Scaling
# Scale up to 5 replicas
kubectl scale deployment nginx --replicas=5
# Expected output: deployment.apps/nginx scaled
kubectl get deployment nginx
# Expected output: NAME READY UP-TO-DATE AVAILABLE AGE
# nginx 5/5 5 5 <seconds>
# Scale down back to 2 replicas
kubectl scale deployment nginx --replicas=2
# Expected output: deployment.apps/nginx scaled
Step 3: Verify metrics-server is Installed
kubectl get pods -n kube-system | grep metrics-server
# Expected output: metrics-server-<hash> 1/1 Running 0 <time>
# If not installed, install it
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics are available
kubectl top pods
# Expected output: NAME CPU(cores) MEMORY(bytes)
# nginx-<hash>-<pod> 1m 10Mi
# nginx-<hash>-<pod> 2m 12Mi
Step 4: Create HPA (CPU-based)
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10
# Expected output: horizontalpodautoscaler.autoscaling/nginx autoscaled
kubectl get hpa nginx
# Expected output: NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# nginx Deployment/nginx 0%/50% 2 10 2 <seconds>
# (TARGETS may read <unknown>/50% until the first metrics have been collected)
Step 5: Generate CPU Load to Trigger Auto-scaling
# Expose a Service for the load generator to access
kubectl expose deployment nginx --port=80 --target-port=80
# Expected output: service/nginx exposed
# Start load generator
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"
# Expected output: pod/load-generator created
# Monitor HPA (open another terminal or append &)
kubectl get hpa nginx -w
# Expected output (changes start after about 1-2 minutes):
# nginx Deployment/nginx 0%/50% 2 10 2 2m
# nginx Deployment/nginx 65%/50% 2 10 4 3m
# nginx Deployment/nginx 80%/50% 2 10 5 4m
# nginx Deployment/nginx 45%/50% 2 10 5 5m
Verification
# View HPA final state
kubectl get hpa nginx
# Expected output: TARGETS at or below 50% once the extra replicas have absorbed the load
# View replica count changes
kubectl get deployment nginx
# Expected output: READY may show more than 2 replicas (auto-scaled)
# Observe scale-down after stopping the load (returns to 2 replicas once the default 5-minute stabilization window has passed)
kubectl delete pod load-generator
# Expected output: pod "load-generator" deleted
# Cleanup
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete hpa nginx
Exam Tips
- Pods must have CPU requests set to be usable by HPA; otherwise HPA cannot calculate CPU utilization
- kubectl autoscale is the fastest way to create an HPA, which suits the time-pressured CKA exam
- HPA evaluates metrics every 15 seconds by default; scale-up has no stabilization window, while scale-down has a default 5-minute stabilization window
- metrics-server is usually pre-installed in the exam environment, but it's recommended to verify with kubectl top nodes first
- If kubectl top returns "metrics not available yet", wait about 30-60 seconds and retry
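The default 5-minute scale-down stabilization window explains why replicas linger after load stops. As a simplified sketch (the function name stabilized_replicas is hypothetical, and the real controller tracks timestamps, which are omitted here), the controller can be thought of as scaling down only to the highest replica recommendation it has seen within the window:

```shell
#!/usr/bin/env bash
# stabilized_replicas CURRENT RECOMMENDATION [RECOMMENDATION...]   (hypothetical helper)
# CURRENT:        current replica count
# RECOMMENDATION: replica counts recommended during the stabilization window
#                 (at least one value is expected)
# Prints the replica count after applying scale-down stabilization:
# the highest recommendation in the window, but never more than CURRENT.
stabilized_replicas() {
  local current=$1
  shift
  local highest=0 r
  for r in "$@"; do
    if (( r > highest )); then
      highest=$r
    fi
  done
  if (( highest < current )); then
    echo "$highest"
  else
    echo "$current"
  fi
}

stabilized_replicas 5 5 4 2 2   # prints 5 (a recommendation of 5 is still inside the window)
stabilized_replicas 5 2 2 2     # prints 2 (every recommendation in the window is 2)
```

This is why the hands-on example only drops back to 2 replicas several minutes after the load generator is deleted.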