Failure diagnostics

kubeagent scans your cluster, finds unhealthy pods, and explains why they are failing. It covers the most common pod failure modes: CrashLoopBackOff, ImagePullBackOff/ErrImagePull, OOMKilled, and Pending/Unschedulable.

Read-only operation

kubeagent talks to the cluster directly via the official Kubernetes Go client (client-go) — the same library kubectl and operators use — and operates read-only. It never creates, updates, patches, or deletes cluster resources.
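Because the tool only reads, it can run under credentials that grant nothing beyond the get, list, and watch verbs. The following is a minimal sketch of how you might check a role for that property. It assumes the role has already been parsed into a Python dict; the role name kubeagent-reader and the helper write_verbs are hypothetical and are not part of kubeagent.

```python
# Verbs that only read cluster state in Kubernetes RBAC.
READ_ONLY_VERBS = {"get", "list", "watch"}


def write_verbs(role: dict) -> set:
    """Return every verb in an RBAC role that is not read-only.

    A wildcard verb ("*") is reported too, since it grants write access.
    """
    found = set()
    for rule in role.get("rules", []):
        found.update(v for v in rule.get("verbs", []) if v not in READ_ONLY_VERBS)
    return found


# Hypothetical ClusterRole, shaped like a parsed RBAC manifest.
reader_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "kubeagent-reader"},  # hypothetical name
    "rules": [
        {
            "apiGroups": [""],
            "resources": ["pods", "events"],
            "verbs": ["get", "list", "watch"],
        },
    ],
}

print(write_verbs(reader_role))  # set()
```

An empty result means the role cannot create, update, patch, or delete anything, which matches the read-only behavior described above.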

Failure modes detected

CrashLoopBackOff

The container keeps crashing and being restarted. Kubernetes waits exponentially longer between restart attempts, capped at five minutes by default. kubeagent surfaces the exit code and last termination reason so you can diagnose the crash without tailing logs manually.
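To make the exit code and termination reason concrete, here is a minimal sketch of where they live in a pod object. It assumes the pod is a plain dict shaped like the output of kubectl get pod -o json; the function name crash_loop_findings and the sample pod are hypothetical, and this is not kubeagent's actual implementation.

```python
def crash_loop_findings(pod: dict) -> list:
    """Collect exit code and last termination reason for crash-looping containers."""
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        # The previous run's outcome is recorded under lastState.terminated.
        last = cs.get("lastState", {}).get("terminated") or {}
        findings.append({
            "container": cs["name"],
            "restarts": cs.get("restartCount", 0),
            "exit_code": last.get("exitCode"),
            "last_reason": last.get("reason"),
        })
    return findings


# Hypothetical pod status, trimmed to the fields the sketch reads.
sample_pod = {
    "metadata": {"namespace": "staging", "name": "api-server-0"},
    "status": {
        "containerStatuses": [{
            "name": "api",
            "restartCount": 47,
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}},
        }],
    },
}

print(crash_loop_findings(sample_pod))
```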

ImagePullBackOff / ErrImagePull

The image cannot be pulled, most often because the image name or tag does not exist or the node lacks credentials for the registry. kubeagent reports the image reference and the pull error from the pod's status.
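As an illustration, the sketch below pulls the image reference and error message out of a pod object. It assumes a pod dict shaped like kubectl get pod -o json and reads the container's waiting state under status.containerStatuses, which is where the Kubernetes API exposes these reasons; kubeagent itself may read them elsewhere. The function name, registry host, and message text are hypothetical.

```python
PULL_FAILURE_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def image_pull_findings(pod: dict) -> list:
    """Report image reference and pull error for containers stuck pulling."""
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in PULL_FAILURE_REASONS:
            findings.append({
                "container": cs["name"],
                "image": cs.get("image"),
                "reason": waiting["reason"],
                "message": waiting.get("message", ""),
            })
    return findings


# Hypothetical pod status; the registry host and message are illustrative.
sample_pod = {
    "status": {
        "containerStatuses": [{
            "name": "builder",
            "image": "registry.example.com/image-builder:v9",
            "state": {"waiting": {
                "reason": "ImagePullBackOff",
                "message": 'Back-off pulling image "registry.example.com/image-builder:v9"',
            }},
        }],
    },
}

for finding in image_pull_findings(sample_pod):
    print(finding["image"], finding["reason"])
```

Init containers report the same reasons under status.initContainerStatuses, which this sketch leaves out for brevity.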

OOMKilled

The container exceeded its memory limit and was killed by the kernel OOM killer. kubeagent annotates the finding with the container's configured requests and limits (see Resource context) so you can judge whether to raise the limit or reduce memory pressure.
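The pairing of an OOM kill with the container's configured memory can be sketched as follows. This assumes a pod dict shaped like kubectl get pod -o json; the function name oom_findings and the sample values are hypothetical, and the real tool may gather this differently.

```python
def oom_findings(pod: dict) -> list:
    """Pair each OOM-killed container with its memory request and limit."""
    # Index the spec's resource settings by container name.
    resources = {
        c["name"]: c.get("resources", {})
        for c in pod.get("spec", {}).get("containers", [])
    }
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # The kill may be the current state or the previous run's state.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                res = resources.get(cs["name"], {})
                findings.append({
                    "container": cs["name"],
                    "exit_code": terminated.get("exitCode"),
                    "memory_request": res.get("requests", {}).get("memory"),
                    "memory_limit": res.get("limits", {}).get("memory"),
                })
                break  # one finding per container
    return findings


# Hypothetical pod; 137 is the usual exit code for a SIGKILL-ed process.
sample_pod = {
    "spec": {"containers": [{
        "name": "worker",
        "resources": {"requests": {"memory": "128Mi"}, "limits": {"memory": "256Mi"}},
    }]},
    "status": {"containerStatuses": [{
        "name": "worker",
        "restartCount": 12,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
    }]},
}

print(oom_findings(sample_pod))
```

A missing limit shows up as None, which is itself useful context: a container with no memory limit that was still OOM-killed points to pressure on the node rather than a limit set too low.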

Pending / Unschedulable

The scheduler cannot place the pod on any node. Causes include insufficient CPU or memory, a node taint the pod does not tolerate, an unsatisfied node affinity rule, or a cluster with no nodes at all. kubeagent reports the scheduler message from the pod's events.
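A minimal sketch of matching scheduler events to a pending pod is shown below. It assumes the pod and its events are plain dicts shaped like the output of kubectl get -o json, and it filters on the FailedScheduling event reason that the Kubernetes scheduler emits. The function name scheduling_messages and the sample message are hypothetical.

```python
def scheduling_messages(pod: dict, events: list) -> list:
    """Return scheduler failure messages recorded for a Pending pod."""
    if pod.get("status", {}).get("phase") != "Pending":
        return []
    name = pod["metadata"]["name"]
    namespace = pod["metadata"]["namespace"]
    return [
        e.get("message", "")
        for e in events
        if e.get("reason") == "FailedScheduling"
        and e.get("involvedObject", {}).get("kind") == "Pod"
        and e["involvedObject"].get("name") == name
        and e["involvedObject"].get("namespace") == namespace
    ]


# Hypothetical pod and events; the message text is illustrative.
sample_pod = {
    "metadata": {"namespace": "production", "name": "batch-processor-x1"},
    "status": {"phase": "Pending"},
}
sample_events = [
    {
        "reason": "FailedScheduling",
        "message": "0/3 nodes are available: 3 Insufficient memory.",
        "involvedObject": {"kind": "Pod", "namespace": "production", "name": "batch-processor-x1"},
    },
    {
        "reason": "Pulled",
        "message": "unrelated event for another pod",
        "involvedObject": {"kind": "Pod", "namespace": "staging", "name": "api-server-0"},
    },
]

print(scheduling_messages(sample_pod, sample_events))
```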

Status

kubeagent scan performs a read-only, whole-cluster scan and reports CrashLoopBackOff, ImagePullBackOff/ErrImagePull, OOMKilled, and Pending/Unschedulable pods, in text or JSON.

The optional --explain flag makes a single Claude API call to summarize findings in plain English. The deterministic core still works offline with no API key.

Example output

P2 — Workload issues

  NAMESPACE   NAME               KIND        READY   STATUS              RESTARTS
  staging     api-server         Deployment  0/2     CrashLoopBackOff    47
  staging     image-builder      Deployment  0/1     ImagePullBackOff    0
  production  worker             Deployment  0/3     OOMKilled           12
  production  batch-processor    Job         0/1     Pending             0