Operations

Something is broken. The pod is not running. Traffic is not reaching your service. You don't know where to start.

This page is a diagnostic guide for the most common failures. Every section follows the same pattern: what you see → why it happens → how to find the cause → how to fix it.

CrashLoopBackOff

What you see:

NAME        READY   STATUS             RESTARTS   AGE
my-app      0/1     CrashLoopBackOff   5          3m

What it means: The container starts, crashes, Kubernetes restarts it, it crashes again. The BackOff part means Kubernetes is increasing the delay between restarts to avoid hammering a broken container.
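To make the back-off concrete: according to the Kubernetes documentation, the kubelet's default restart delay follows the sequence 10s, 20s, 40s and so on, capped at 300 seconds, and resets once a container has run for 10 minutes without problems. The sketch below reproduces that sequence. `backoff_delay` is a hypothetical helper name, and a cluster with a tuned kubelet may behave differently.

```shell
# Sketch of the default CrashLoopBackOff sequence: 10s, doubling, capped at 300s.
# backoff_delay is a hypothetical helper, not part of kubectl.
# The ( ) body runs in a subshell, so its variables stay local.
backoff_delay() (
  step=$1             # position in the sequence, starting at 1
  delay=10
  i=1
  while [ "$i" -lt "$step" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge 300 ]; then
      delay=300       # cap reached, later steps stay at 300s
      break
    fi
    i=$((i + 1))
  done
  echo "$delay"
)
```

For example, `backoff_delay 3` prints the third delay in the sequence. This is why a pod with several restarts can sit idle for minutes before the next attempt.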

How to diagnose:

kubectl logs my-app
kubectl logs my-app --previous          # logs from last crash
kubectl describe pod my-app | grep -A 10 "Last State"

Common causes:

Exit code	Likely cause
1	Application error: check the logs for a stack trace
137	Killed by SIGKILL (128 + 9), most often OOMKilled: the container exceeded its memory limit
139	Segfault (SIGSEGV, 128 + 11)
1 with "can't open file"	Wrong entrypoint or missing file in image
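Exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number being the exit code minus 128. A minimal sketch of that arithmetic follows. `explain_exit_code` is a hypothetical helper, and its hints only mirror the table above.

```shell
# explain_exit_code is a hypothetical helper, not part of kubectl.
# It applies the "128 + signal number" convention to a container exit code.
explain_exit_code() (
  code=$1
  if [ "$code" -gt 128 ]; then
    signal=$((code - 128))
    case $signal in
      9)  echo "signal 9 (SIGKILL): often OOMKilled, check the Reason field" ;;
      11) echo "signal 11 (SIGSEGV): segfault" ;;
      *)  echo "terminated by signal $signal" ;;
    esac
  elif [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  else
    echo "application error: check logs"
  fi
)

# Usage against a live cluster (container index 0 is an assumption):
#   code=$(kubectl get pod my-app \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}')
#   explain_exit_code "$code"
```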

Fixes:

Application crash: fix the bug shown in logs
OOMKilled: raise the container's memory limit, or reduce the application's memory use
Missing env var or config: verify that the ConfigMap or Secret exists and is mounted or referenced correctly

Pod stuck in Pending

What you see:

NAME        READY   STATUS    RESTARTS   AGE
my-app      0/1     Pending   0          5m

What it means: The scheduler cannot find a node to place this pod.

How to diagnose:

kubectl describe pod my-app

Look at the Events section at the bottom. A FailedScheduling event states why no node was suitable; the exact wording varies between Kubernetes versions.

Common causes and fixes:

Event message	Cause	Fix
`Insufficient cpu`	No node has enough unreserved CPU for the pod's request	Reduce the cpu request, or add nodes
`Insufficient memory`	No node has enough unreserved memory for the pod's request	Reduce the memory request, or add nodes
`0/1 nodes are available: 1 node has taints`	Node taint blocks scheduling	Add toleration or use different node
`no PersistentVolumes available`	PVC can't bind	Check `kubectl get pv` and `kubectl get pvc`
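The `Insufficient cpu` case is easier to reason about once you know that the scheduler compares resource requests, not live usage, so a node can look idle in `kubectl top` and still be full. A simplified sketch of that check follows. Both function names are hypothetical, and only whole cores (`2`) and millicores (`250m`) are handled, not fractional values such as `0.5`.

```shell
# Hypothetical helpers illustrating request-based scheduling arithmetic.
# Convert a CPU quantity ("250m" or a whole number of cores) to millicores.
cpu_to_millicores() (
  case $1 in
    *m) echo "${1%m}" ;;
    *)  echo $(($1 * 1000)) ;;
  esac
)

# Succeeds when the pod's request fits into what is left on the node.
# Arguments: pod request, node allocatable, sum of requests already on the node.
cpu_request_fits() (
  request=$(cpu_to_millicores "$1")
  allocatable=$(cpu_to_millicores "$2")
  used=$(cpu_to_millicores "$3")
  [ $((used + request)) -le "$allocatable" ]
)

# Usage: cpu_request_fits 500m 2 1500m && echo "fits"
```

The inputs can be read from `kubectl describe node <node>`, which lists the node's Allocatable capacity and the resources already allocated to pods.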

Pod stuck in ImagePullBackOff

What you see:

NAME        READY   STATUS             RESTARTS   AGE
my-app      0/1     ImagePullBackOff   0          2m

How to diagnose:

kubectl describe pod my-app | grep -A 5 Events

Common causes and fixes:

Image does not exist: fix the image name and tag
Private registry without credentials: create an imagePullSecret
Typo in tag: nginx:lates fails; nginx:latest works
Registry rate limit: Docker Hub limits pulls, especially anonymous ones; authenticate or use a mirror
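Tag typos are easier to spot when the image reference is split into name and tag. A minimal sketch follows. `split_image_ref` is a hypothetical helper, and it assumes a tag-style reference (digests such as `name@sha256:...` are not handled). A reference without a tag defaults to `latest`.

```shell
# split_image_ref is a hypothetical helper, not part of kubectl.
# Prints "<name> <tag>" for an image reference without a digest.
split_image_ref() (
  ref=$1
  last=${ref##*/}                 # final path component, e.g. "app:1.2"
  case $last in
    *:*) echo "${ref%:*} ${last##*:}" ;;   # tag present after the last colon
    *)   echo "$ref latest" ;;             # no tag: the default is "latest"
  esac
)

# Usage against a live cluster:
#   split_image_ref "$(kubectl get pod my-app -o jsonpath='{.spec.containers[0].image}')"
```

Only the last path component is checked for a colon, so a registry port such as `registry.example.com:5000/team/app` is not mistaken for a tag.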

Service not routing traffic

How to diagnose:

kubectl get service my-service
kubectl get endpoints my-service

If ENDPOINTS shows <none>, no Ready pod matches the service selector: either the selector does not match any pod labels, or the matching pods are failing their readiness probe.

kubectl describe service my-service | grep Selector
kubectl get pods --show-labels

Fix: Make the pod labels match the service selector exactly. Labels are case-sensitive.
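The matching rule can be sketched as follows: every `key=value` pair in the selector must appear, exactly and case-sensitively, among the pod's labels. `selector_matches` is a hypothetical helper that takes both as comma-separated strings, the form printed by `kubectl describe service` and `kubectl get pods --show-labels`.

```shell
# selector_matches is a hypothetical helper, not part of kubectl.
# Succeeds when every selector pair is present in the pod's labels.
# The ( ) body runs in a subshell, so the IFS change and exit stay local.
selector_matches() (
  selector=$1                       # e.g. "app=my-app,tier=web"
  labels=$2                         # e.g. "app=my-app,tier=web,version=v2"
  [ -n "$selector" ] || exit 1      # this sketch treats an empty selector as no match
  IFS=,
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;               # exact, case-sensitive match found
      *) exit 1 ;;                  # one missing pair means no match
    esac
  done
  exit 0
)

# Usage: selector_matches "app=my-app" "app=my-app,tier=web" && echo "match"
```

On a live cluster, `kubectl get pods -l app=my-app` performs the same comparison for you: if it returns no pods, the selector and the labels disagree.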

Container running but app not responding

What you see: the pod is Running, but its readiness probe is failing, so the service has no endpoints.

# Test the app from inside the container
kubectl exec -it my-app -- sh
# Inside: curl localhost:8080

If the app responds locally but not via the service, the service port, targetPort, or selector is wrong, or the readiness probe checks the wrong path or port. If the app does not respond locally, it has not started or is listening on a different port.

Diagnostic flow
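
The sections above reduce to a simple triage: read the pod's STATUS, then run the first diagnostic command from the matching section. A minimal sketch of that mapping follows. `next_step` is a hypothetical helper that only prints the suggested command, and `<service>` is left as a placeholder for you to fill in.

```shell
# next_step is a hypothetical helper, not part of kubectl.
# Arguments: pod STATUS as shown by "kubectl get pods", pod name.
# Prints the first command this guide suggests for that status.
next_step() (
  case $1 in
    CrashLoopBackOff) echo "kubectl logs $2 --previous" ;;
    Pending)          echo "kubectl describe pod $2" ;;
    ImagePullBackOff) echo "kubectl describe pod $2 | grep -A 5 Events" ;;
    Running)          echo "kubectl get endpoints <service>" ;;
    *)                echo "kubectl describe pod $2" ;;   # fallback for other states
  esac
)

# Usage against a live cluster (STATUS is the third column of the default output):
#   next_step "$(kubectl get pod my-app --no-headers | awk '{print $3}')" my-app
```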

Quick reference

# Three commands that diagnose most problems
kubectl describe pod <name>                       # events, config, state
kubectl logs <name> --previous                    # last crash logs
kubectl get endpoints <service>                   # is traffic routing?

# Additional tools
kubectl get events --sort-by='.lastTimestamp'     # events in the current namespace (add -A for all namespaces)
kubectl top pods                                   # CPU/memory usage (requires metrics-server)
kubectl exec -it <name> -- sh                     # shell into container
kubectl port-forward pod/<name> 8080:8080         # test without a service
kubectl get pod <name> -o yaml                    # full spec as applied
