Operations

Something is broken. The pod is not running. Traffic is not reaching your service. You don't know where to start.

This page is a diagnostic guide for the most common failures. Every section follows the same pattern: what you see → why it happens → how to find the cause → how to fix it.


CrashLoopBackOff

What you see:

NAME     READY   STATUS             RESTARTS   AGE
my-app   0/1     CrashLoopBackOff   5          3m

What it means: The container starts, crashes, and is restarted by Kubernetes, only to crash again. The BackOff part means Kubernetes waits longer before each restart (10s, 20s, 40s, and so on, capped at five minutes) so that it does not hammer a broken container.
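
To make the back-off concrete, here is a small sketch of the delay before the Nth restart. It assumes the documented kubelet defaults (10 seconds, doubling each time, capped at 300 seconds); the function name backoff_delay is made up for this example, and the timing you observe on a real cluster can differ.

```shell
# Sketch: approximate delay (in seconds) before restart number N.
# Assumes the default kubelet back-off: 10s, doubling, capped at 300s.
backoff_delay() {
  n=$1        # restart number, starting at 1
  delay=10
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}
```

For example, backoff_delay 3 prints 40, which is why a pod with only a handful of restarts can already be several minutes old.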

How to diagnose:

kubectl logs my-app
kubectl logs my-app --previous # logs from last crash
kubectl describe pod my-app | grep -A 10 "Last State"

Common causes:

Exit code                   Likely cause
1                           Application error: check the logs for a stack trace
137                         Killed by SIGKILL (128 + 9), most often OOMKilled: the container exceeded its memory limit
139                         Segmentation fault (SIGSEGV, 128 + 11)
1 with "can't open file"    Wrong entrypoint or missing file in the image
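
Exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number equal to the code minus 128. The helper below is a minimal sketch of that rule; explain_exit_code is a hypothetical name, and the messages are only hints about where to look next.

```shell
# Sketch: turn a container exit code into a probable cause.
# Codes above 128 conventionally mean "terminated by signal (code - 128)".
explain_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "signal 9 (SIGKILL): often OOMKilled, check the Reason field" ;;
      11) echo "signal 11 (SIGSEGV): segmentation fault" ;;
      15) echo "signal 15 (SIGTERM): asked to stop" ;;
      *)  echo "terminated by signal $sig" ;;
    esac
  elif [ "$code" -eq 0 ]; then
    echo "exited cleanly (code 0): the main process finished instead of staying up"
  else
    echo "application error (exit $code): read the logs"
  fi
}
```

Assuming a single-container pod, the exit code of the last crash can be read with kubectl get pod my-app -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}' and passed to the function.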

Fixes:

  • Application crash: fix the bug shown in the logs
  • OOMKilled: raise the memory limit, or reduce the application's memory use if it is leaking
  • Missing env var or config: verify that the ConfigMap/Secret exists and is mounted or referenced correctly
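
As a sketch of the OOMKilled fix, the two helpers below first confirm the termination reason and then raise the limit. They assume the pod has a single container and is managed by a Deployment; the function names and the 512Mi value in the usage line are illustrative, not recommendations.

```shell
# Sketch: print why the container last terminated (for example OOMKilled).
# Assumes a single-container pod, hence index 0.
last_termination_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Sketch: raise the memory limit on the owning Deployment.
# $1: Deployment name (assumed), $2: new limit such as 512Mi
raise_memory_limit() {
  kubectl set resources deployment "$1" --limits=memory="$2"
}
```

Usage: run last_termination_reason my-app, and only if it prints OOMKilled run raise_memory_limit my-app 512Mi. Changing the Deployment triggers a rollout, so the pod is replaced.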

Pod stuck in Pending

What you see:

NAME     READY   STATUS    RESTARTS   AGE
my-app   0/1     Pending   0          5m

What it means: The scheduler cannot find a node to place this pod.

How to diagnose:

kubectl describe pod my-app

Look at the Events section at the bottom. It will tell you exactly why scheduling failed.

Common causes and fixes:

Event message                                Cause                          Fix
Insufficient cpu                             No node has enough CPU         Reduce the CPU request, or add nodes
Insufficient memory                          No node has enough memory      Reduce the memory request, or add nodes
0/1 nodes are available: 1 node has taints   Node taint blocks scheduling   Add a toleration, or use a different node
no PersistentVolumes available               PVC can't bind                 Check kubectl get pv and kubectl get pvc
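
If the describe output is long, the scheduling events can also be fetched on their own. This is a sketch using event field selectors; scheduling_failures is a hypothetical helper name, and the exact message text varies between Kubernetes versions.

```shell
# Sketch: show only the FailedScheduling events for one pod, oldest first.
# $1: pod name in the current namespace
scheduling_failures() {
  kubectl get events \
    --field-selector "involvedObject.name=$1,reason=FailedScheduling" \
    --sort-by=.lastTimestamp
}
```

Usage: scheduling_failures my-app. The MESSAGE column then holds the text to compare against the table above.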

Pod stuck in ImagePullBackOff

What you see:

NAME     READY   STATUS             RESTARTS   AGE
my-app   0/1     ImagePullBackOff   0          2m

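What it means: The kubelet cannot pull the container image. As with CrashLoopBackOff, the BackOff part means it waits longer between attempts.

How to diagnose: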

kubectl describe pod my-app | grep -A 5 Events

Common causes and fixes:

  • Image does not exist: fix the image name and tag
  • Private registry without credentials: create an imagePullSecret
  • Typo in tag: nginx:lates fails; nginx:latest works
  • Registry rate limited: Docker Hub has pull limits; authenticate or use a mirror
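
For the private-registry case, the following sketch creates a pull secret and attaches it to the default ServiceAccount so that new pods in the namespace use it. The function names and the arguments in the usage line are placeholders; passing a password on the command line leaves it in shell history, so treat this as an illustration only.

```shell
# Sketch: create a registry credential secret.
# $1: secret name, $2: registry host, $3: username, $4: password or token
create_pull_secret() {
  kubectl create secret docker-registry "$1" \
    --docker-server="$2" \
    --docker-username="$3" \
    --docker-password="$4"
}

# Sketch: make the default ServiceAccount in the current namespace use it.
# $1: secret name
attach_pull_secret() {
  kubectl patch serviceaccount default \
    -p "{\"imagePullSecrets\": [{\"name\": \"$1\"}]}"
}
```

Usage: create_pull_secret regcred registry.example.com alice "$TOKEN", then attach_pull_secret regcred. Pods created before the patch do not pick up the secret and need to be recreated.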

Service not routing traffic

How to diagnose:

kubectl get service my-service
kubectl get endpoints my-service

If ENDPOINTS shows <none>, either the service selector does not match any pod labels, or the matching pods are not Ready. Check the selector first:

kubectl describe service my-service | grep Selector
kubectl get pods --show-labels

Fix: Make the pod labels match the service selector exactly. Labels are case-sensitive.
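
Rather than comparing labels by eye, you can ask the cluster which pods the selector actually matches. This is a sketch: service_selector and pods_matching_service are hypothetical helpers, and they assume the service defines a selector (with an empty selector, the second command would list every pod).

```shell
# Sketch: print a service's selector as key=value,key=value.
# $1: service name in the current namespace
service_selector() {
  sel=$(kubectl get service "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  echo "${sel%,}"   # drop the trailing comma left by the template
}

# Sketch: list the pods that the service's selector matches.
pods_matching_service() {
  kubectl get pods -l "$(service_selector "$1")" -o wide
}
```

Usage: pods_matching_service my-service. An empty result means no pod carries all of the selector's labels; a non-empty result with READY at 0/1 points to a readiness problem instead.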


Container running but app not responding

What you see: The pod is Running but READY shows 0/1, the readiness probe is failing, and the service has no endpoints.

# Test the app from inside the container
kubectl exec -it my-app -- sh
# Inside: curl localhost:8080

If the app responds locally but not via the service, the service's port, targetPort, or selector is wrong, or the readiness probe points at the wrong path or port. If the app does not respond locally, it has not started or is listening on a different port.
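
A quick way to check the port side of this is to compare the service's targetPort with the port the container declares. The sketch below assumes a single-port service and a single-container pod that declares its port; check_ports is a hypothetical helper. A declared containerPort is informational only, so the real test is still whether the app answers on that port, and a named targetPort will show up as a mismatch against a numeric containerPort.

```shell
# Sketch: compare a service's targetPort with the pod's declared containerPort.
# $1: service name, $2: pod name (both in the current namespace)
check_ports() {
  target=$(kubectl get service "$1" \
    -o jsonpath='{.spec.ports[0].targetPort}')
  declared=$(kubectl get pod "$2" \
    -o jsonpath='{.spec.containers[0].ports[0].containerPort}')
  if [ "$target" = "$declared" ]; then
    echo "match: service targetPort $target equals containerPort $declared"
  else
    echo "mismatch: service targetPort $target, containerPort $declared"
  fi
}
```

Usage: check_ports my-service my-app.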


Diagnostic flow
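
One way to summarize the triage order used on this page is as a lookup from the STATUS column to the first command to run. This is a sketch, not an exhaustive decision tree; next_step is a hypothetical name and <name> and <service> are placeholders.

```shell
# Sketch: map a pod STATUS to the first command worth running.
next_step() {
  case "$1" in
    CrashLoopBackOff)              echo "kubectl logs <name> --previous" ;;
    Pending)                       echo "kubectl describe pod <name>" ;;
    ImagePullBackOff|ErrImagePull) echo "kubectl describe pod <name>" ;;
    Running)                       echo "kubectl get endpoints <service>" ;;
    *)                             echo "kubectl describe pod <name>" ;;
  esac
}
```

For example, next_step Running prints the endpoints check, because a Running pod that receives no traffic is usually a selector, port, or readiness problem.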


Quick reference

# Three commands that solve 80% of problems
kubectl describe pod <name> # events, config, state
kubectl logs <name> --previous # last crash logs
kubectl get endpoints <service> # is traffic routing?

# Additional tools
kubectl get events --sort-by='.lastTimestamp' # cluster-wide event log
kubectl top pods # CPU/memory usage (requires metrics-server)
kubectl exec -it <name> -- sh # shell into container
kubectl port-forward pod/<name> 8080:8080 # test without a service
kubectl get pod <name> -o yaml # full spec as applied