Scaling and HPA

2026-05-27·CKA k8s Practice

Manual scaling, HorizontalPodAutoscaler auto-scaling, metrics-server configuration


Overview

The CKA exam expects you to scale workloads manually and to configure HorizontalPodAutoscaler (HPA) auto-scaling. For CPU and memory targets, HPA reads Pod resource metrics from the Metrics API, which metrics-server provides.

1. Manual Scaling

1.1 kubectl scale

# Scale a Deployment (the two forms are equivalent)
kubectl scale deployment/nginx --replicas=5
kubectl scale deployment nginx --replicas=3

# Scale ReplicaSet
kubectl scale rs/web-rs --replicas=4

# Scale StatefulSet
kubectl scale sts/web --replicas=5

# Check current replica count
kubectl get deployment nginx
kubectl get rs
kubectl get pods

# Conditional scale-down: --current-replicas is a precondition on the current replica count
kubectl scale deployment/nginx --current-replicas=5 --replicas=3
# If the Deployment does not currently have 5 replicas, kubectl reports an error and makes no change

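1.2 Scaling After a Rollback

A rollback restores the Pod template of an earlier revision; it does not change the replica count. Scale separately if the task asks for a specific number of replicas.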

# View revision history
kubectl rollout history deployment nginx

# Roll back to revision 2
kubectl rollout undo deployment nginx --to-revision=2

# Scale up
kubectl scale deployment nginx --replicas=5

2. HorizontalPodAutoscaler (HPA)

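HPA periodically adjusts the replica count of a scalable workload (Deployment, StatefulSet, ReplicaSet) based on observed metrics: CPU/memory utilization from metrics-server, or custom and external metrics with autoscaling/v2. It cannot scale objects that have no scale subresource, such as DaemonSets.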
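The core calculation documented for the HPA controller is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within the default 10% tolerance, then clamped to minReplicas/maxReplicas. Below is a simplified sketch in shell integer arithmetic; the function name `hpa_desired` is hypothetical, and it ignores details the real controller handles (missing metrics, not-yet-ready Pods, behavior policies).

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Hypothetical helper: simplified HPA replica calculation, integers only.
hpa_desired() {
  current=$1; util=$2; target=$3; min=$4; max=$5

  # Tolerance check: skip scaling when |util/target - 1| <= 0.1,
  # rewritten as 10 * |util - target| <= target to avoid floating point.
  diff=$((util - target))
  if [ "$diff" -lt 0 ]; then diff=$((-diff)); fi
  if [ $((10 * diff)) -le "$target" ]; then
    desired=$current
  else
    # Integer ceiling of current * util / target
    desired=$(( (current * util + target - 1) / target ))
  fi

  # Clamp to minReplicas / maxReplicas
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 2 Pods averaging 70% CPU against a 50% target, min 2, max 10
hpa_desired 2 70 50 2 10   # prints 3
```

At 45% against a 50% target the ratio is exactly 0.9, which is inside the tolerance band, so `hpa_desired 4 45 50 2 10` keeps 4 replicas.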

2.1 Prerequisites: metrics-server

# Install metrics-server (usually pre-installed in the exam environment)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get pods -n kube-system | grep metrics-server

# View node and Pod metrics
kubectl top nodes
kubectl top pods

# If kubectl top returns "metrics not available yet", metrics-server has not finished its first scrape; wait (usually 30-60 seconds) and retry
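Rather than re-running kubectl top by hand, a small retry loop can poll until metrics appear. This is a sketch: `retry` is a hypothetical helper name, and the 12 attempts × 5 seconds budget in the commented example is an arbitrary choice.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it succeeds or ATTEMPTS run out.
retry() {
  attempts=$1; delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "retry: '$*' still failing after $attempts attempts" >&2
  return 1
}

# Example (needs cluster access): poll up to 12 times, 5 seconds apart
# retry 12 5 kubectl top nodes
```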

2.2 Create HPA

Method 1: Imperative (kubectl autoscale)

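# Create HPA: target 50% average CPU utilization (measured against the Pods' CPU requests), min 2, max 10 replicas
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10
# An HPA created this way takes the Deployment's name (nginx)

# Print the equivalent HPA YAML without creating it
kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10 --dry-run=client -o yaml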

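Method 2: Declarative YAML

When several metrics are listed, the HPA computes a desired replica count for each one and uses the largest. A memory Utilization target, like a CPU one, requires the Pods to set memory requests.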

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
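When several metrics are listed, as in the YAML above, each metric produces its own replica proposal and the HPA acts on the highest. A sketch of just that selection step, with a hypothetical `hpa_multi` helper that leaves out the tolerance band and min/max clamping for brevity:

```shell
# ceil_div A B: integer ceiling of A / B
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# hpa_multi REPLICAS CPU_UTIL CPU_TARGET MEM_UTIL MEM_TARGET
# Hypothetical helper: one proposal per metric, keep the highest.
hpa_multi() {
  cpu_proposal=$(ceil_div $(($1 * $2)) "$3")
  mem_proposal=$(ceil_div $(($1 * $4)) "$5")
  if [ "$cpu_proposal" -ge "$mem_proposal" ]; then
    echo "$cpu_proposal"
  else
    echo "$mem_proposal"
  fi
}

# 2 replicas, CPU at 30% (target 50%), memory at 100% (target 80%)
hpa_multi 2 30 50 100 80   # CPU proposes 2, memory proposes 3 -> prints 3
```

Taking the maximum means a single metric over its target is enough to scale up, even when the other is well below.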

2.3 View HPA Status

# View HPA
kubectl get hpa
kubectl get horizontalpodautoscaler

# View HPA details
kubectl describe hpa nginx-hpa

# View HPA YAML
kubectl get hpa nginx-hpa -o yaml

# Monitor HPA
kubectl get hpa -w

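2.4 Example HPA Scaling Behavior Output

TARGETS shows current/target average utilization and REPLICAS the current replica count; the rows below are illustrative.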

NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-hpa    Deployment/nginx    30%/50%   2         10        2          5m
nginx-hpa    Deployment/nginx    70%/50%   2         10        4          6m
nginx-hpa    Deployment/nginx    45%/50%   2         10        4          7m

2.5 Generate Load to Test Scaling

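# The load generator sends requests through a Service, so expose the Deployment first if needed:
# kubectl expose deployment nginx --port=80

# Option 1: a background Pod that continuously requests the Service, driving up CPU in the nginx Pods
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"

# Option 2: run interactively; --rm deletes the Pod when you stop it with Ctrl+C
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx:80; done"

# Check if HPA is scaling
kubectl get hpa -w

# Delete the Option 1 load generator after testing
kubectl delete pod load-generator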

3. Advanced HPA Configuration

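3.1 Custom Metrics (autoscaling/v2)

The Pods metric requests-per-second is not provided by metrics-server: it needs a custom metrics adapter (for example, Prometheus Adapter) serving the custom.metrics.k8s.io API. The behavior values shown are the Kubernetes defaults, written out explicitly.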

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: 1000
  behavior:                           # Scaling behavior control
    scaleDown:
      stabilizationWindowSeconds: 300 # Scale-down stabilization window (default 5 minutes)
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0   # Scale up without waiting (default 0)
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max

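3.2 Behavior Policy Explanation

Policy	Description
`stabilizationWindowSeconds`	Look-back window that prevents flapping: when scaling down, the HPA uses the highest recommendation within the window; when scaling up, the lowest
`scaleDown.policies`	Limits on scale-down: at most `value` Pods (type `Pods`) or `value` percent of current replicas (type `Percent`) per `periodSeconds`
`scaleUp.policies`	The same kind of limits for scale-up
`selectPolicy: Max/Min/Disabled`	How multiple policies combine: `Max` applies the policy allowing the largest change, `Min` the smallest; `Disabled` turns off scaling in that direction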
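To see how `selectPolicy` combines the two scale-up policies in 3.1 (Percent 100 and Pods 4 per 15 seconds), here is a sketch of the per-period upper bound, plus the scale-down stabilization rule. It assumes a Percent policy allows ceil(replicas × (100 + value) / 100) and a Pods policy allows replicas + value, which is how the upstream controller computes the scale-up limit; `scale_up_limit` and `stabilized_scale_down` are hypothetical names.

```shell
# scale_up_limit REPLICAS PERCENT PODS MODE
# Hypothetical helper: highest replica count reachable in one policy period
# with one Percent policy and one Pods policy. MODE is Max, Min or Disabled.
scale_up_limit() {
  replicas=$1; percent=$2; pods=$3; mode=$4
  # Percent policy: ceil(replicas * (100 + percent) / 100)
  by_percent=$(( (replicas * (100 + percent) + 99) / 100 ))
  # Pods policy: add a fixed number of Pods
  by_pods=$(( replicas + pods ))
  case "$mode" in
    Max) if [ "$by_percent" -ge "$by_pods" ]; then echo "$by_percent"; else echo "$by_pods"; fi ;;
    Min) if [ "$by_percent" -le "$by_pods" ]; then echo "$by_percent"; else echo "$by_pods"; fi ;;
    Disabled) echo "$replicas" ;;   # no scale-up allowed
  esac
}

# stabilized_scale_down REC [REC...]
# Hypothetical helper: when scaling down, act on the highest recommendation
# seen within stabilizationWindowSeconds.
stabilized_scale_down() {
  best=$1; shift
  for rec in "$@"; do
    if [ "$rec" -gt "$best" ]; then best=$rec; fi
  done
  echo "$best"
}

scale_up_limit 2 100 4 Max     # Percent allows 4, Pods allows 6 -> 6
scale_up_limit 10 100 4 Max    # Percent allows 20, Pods allows 14 -> 20
stabilized_scale_down 3 5 2    # -> 5 until the 5 ages out of the window
```

With `Max`, the Pods policy dominates for small Deployments and the Percent policy takes over once doubling adds more than 4 Pods.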

4. metrics-server

4.1 Installation

# Quick install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

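# If kubectl top still fails after installation (common in lab clusters whose kubelets use self-signed serving certificates), adjust the container args
kubectl edit deployment metrics-server -n kube-system
# Add to spec.template.spec.containers[0].args:
# - --kubelet-insecure-tls   (skips kubelet certificate verification; lab clusters only)
# - --kubelet-preferred-address-types=InternalIP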
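The same change can be made without an interactive editor by writing a JSON Patch and applying it with `kubectl patch --patch-file`. A sketch, assuming the metrics-server container is the first container and already has an `args` list (as in the upstream components.yaml); the file name is arbitrary. Only `--kubelet-insecure-tls` is appended here, since the upstream manifest already sets `--kubelet-preferred-address-types`.

```shell
# JSON Patch: append --kubelet-insecure-tls to the first container's args
# ("/-" means "append to the end of the array")
cat <<'EOF' > metrics-server-patch.json
[
  {
    "op": "add",
    "path": "/spec/template/spec/containers/0/args/-",
    "value": "--kubelet-insecure-tls"
  }
]
EOF

# Apply it (requires cluster access), then wait for the new Pod:
# kubectl patch deployment metrics-server -n kube-system --type=json --patch-file=metrics-server-patch.json
# kubectl rollout status deployment metrics-server -n kube-system
```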

4.2 Verification

# Wait for Pod to be ready
kubectl wait --namespace kube-system --for=condition=ready pod -l k8s-app=metrics-server --timeout=120s

# Test metric collection
kubectl top nodes
kubectl top pods

# If no data for a long time, check metrics-server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

5. Useful Exam Commands

# 1. Quickly create a Deployment with resources (HPA requires Pods to have CPU requests)
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > nginx.yaml
# Edit to add spec.template.spec.containers[].resources.requests.cpu (for example cpu: 200m)

# 2. Apply resource settings
vim nginx.yaml
kubectl apply -f nginx.yaml

# 3. Create HPA
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=5

# 4. Verify
kubectl get hpa
kubectl get pods -w

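# 5. Scheduled scaling with CronJobs (not HPA, but useful for the exam): scale up at 08:00 and down at 18:00, Monday to Friday
# Schedules use the kube-controller-manager's time zone unless spec.timeZone is set; the Jobs run as the namespace's default ServiceAccount, which cannot scale Deployments without extra RBAC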
kubectl create cronjob scale-up --image=bitnami/kubectl --schedule="0 8 * * 1-5" -- kubectl scale deployment nginx --replicas=10
kubectl create cronjob scale-down --image=bitnami/kubectl --schedule="0 18 * * 1-5" -- kubectl scale deployment nginx --replicas=2
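Without extra permissions, the Jobs started by the two CronJobs above are typically rejected with a Forbidden error when they call `kubectl scale`. A sketch that writes a ServiceAccount, Role and RoleBinding to a file, assuming the `default` namespace; the names `scaler` and `deployment-scaler` are hypothetical.

```shell
# Write RBAC objects that allow scaling Deployments in the default namespace
cat <<'EOF' > scaler-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler                # hypothetical name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler     # hypothetical name
  namespace: default
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-scaler
  namespace: default
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-scaler
EOF

# Then, with cluster access, create the objects and point both CronJobs at the ServiceAccount:
# kubectl apply -f scaler-rbac.yaml
# kubectl patch cronjob scale-up -p '{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"serviceAccountName":"scaler"}}}}}}'
# kubectl patch cronjob scale-down -p '{"spec":{"jobTemplate":{"spec":{"template":{"spec":{"serviceAccountName":"scaler"}}}}}}'
```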

🧪 Complete Hands-on Example: Manual Scaling + Configure HPA Auto-scaling

Scenario

First manually scale a Deployment, then configure CPU-based HPA auto-scaling.

Prerequisites

A working Kubernetes cluster (minikube or kind recommended)
kubectl is configured to connect to the cluster
Pods in the Deployment must have CPU requests set (required by HPA)

Steps

Step 1: Create a Deployment with Resource Requests

cat <<'EOF' > nginx-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: "200m"
            memory: "128Mi"
        ports:
        - containerPort: 80
EOF

kubectl apply -f nginx-deploy.yaml
# Expected output: deployment.apps/nginx created

kubectl get deployment nginx
# Expected output: NAME    READY   UP-TO-DATE   AVAILABLE   AGE
#          nginx   2/2     2            2           <seconds>

Step 2: Manual Scaling

# Scale up to 5 replicas
kubectl scale deployment nginx --replicas=5
# Expected output: deployment.apps/nginx scaled

kubectl get deployment nginx
# Expected output: NAME    READY   UP-TO-DATE   AVAILABLE   AGE
#          nginx   5/5     5            5           <seconds>

# Scale down back to 2 replicas
kubectl scale deployment nginx --replicas=2
# Expected output: deployment.apps/nginx scaled

Step 3: Verify metrics-server is Installed

kubectl get pods -n kube-system | grep metrics-server
# Expected output: metrics-server-<hash>   1/1     Running   0   <time>

# If not installed, install it
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics are available
kubectl top pods
# Expected output: NAME                    CPU(cores)   MEMORY(bytes)
#          nginx-<hash>-<pod>     1m           10Mi
#          nginx-<hash>-<pod>     2m           12Mi

Step 4: Create HPA (CPU-based)

kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10
# Expected output: horizontalpodautoscaler.autoscaling/nginx autoscaled

kubectl get hpa nginx
# Expected output: NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
#          nginx   Deployment/nginx   0%/50%    2         10        2          <seconds>
# TARGETS may show <unknown>/50% until metrics-server reports the first metrics

Step 5: Generate CPU Load to Trigger Auto-scaling

# Expose a Service for the load generator to access
kubectl expose deployment nginx --port=80 --target-port=80
# Expected output: service/nginx exposed

# Start load generator
kubectl run load-generator --image=busybox --restart=Never -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"
# Expected output: pod/load-generator created

# Monitor HPA (open another terminal or append &)
kubectl get hpa nginx -w
# Expected output (illustrative; changes start after about 1-2 minutes and exact values depend on load and node capacity):
# nginx   Deployment/nginx   0%/50%   2         10        2          2m
# nginx   Deployment/nginx   65%/50%  2         10        4          3m
# nginx   Deployment/nginx   80%/50%  2         10        5          4m
# nginx   Deployment/nginx   45%/50%  2         10        5          5m

Verification

# View HPA final state
kubectl get hpa nginx
# Expected output: once enough replicas are running, average CPU utilization settles near or below the 50% target

# View replica count changes
kubectl get deployment nginx
# Expected output: READY shows more than 2 replicas (for example 5/5) if the HPA scaled up

# Stop the load; the HPA scales back to 2 replicas after the default 5-minute scale-down stabilization window
kubectl delete pod load-generator
# Expected output: pod "load-generator" deleted

# Cleanup
kubectl delete deployment nginx
kubectl delete service nginx
kubectl delete hpa nginx

Exam Tips

Pods must have CPU requests set for a CPU utilization target, because utilization is measured relative to requests; without them HPA cannot calculate CPU utilization
kubectl autoscale is the fastest way to create an HPA, which matters under CKA time pressure
The HPA controller evaluates metrics every 15 seconds by default; scale-up has no stabilization window by default, while scale-down uses a 5-minute stabilization window
metrics-server is usually pre-installed in the exam environment, but it's recommended to verify with kubectl top nodes first
If kubectl top returns metrics not available yet, wait about 30-60 seconds and retry
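To make the first tip concrete: the CPU utilization the HPA compares against its target is, in essence, total CPU usage divided by total CPU requests across the target's Pods, as a percentage. A sketch with a hypothetical `avg_cpu_util` helper taking usage/request pairs in millicores; with no requests there is nothing to divide by, which is why the HPA cannot compute a value.

```shell
# avg_cpu_util USAGE_m REQUEST_m [USAGE_m REQUEST_m ...]
# Hypothetical helper: sum(usage) * 100 / sum(requests), integer percent.
avg_cpu_util() {
  total_usage=0; total_request=0
  while [ "$#" -ge 2 ]; do
    total_usage=$((total_usage + $1))
    total_request=$((total_request + $2))
    shift 2
  done
  if [ "$total_request" -eq 0 ]; then
    echo "unknown: no CPU requests" >&2
    return 1
  fi
  echo $(( total_usage * 100 / total_request ))
}

# Two nginx Pods with 200m requests each, using 60m and 140m
avg_cpu_util 60 200 140 200   # prints 50
```

Utilization can exceed 100% when Pods use more CPU than they request (for example `avg_cpu_util 300 200` gives 150), so a 50% target leaves headroom below the request.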