Failure diagnostics¶

kubeagent scans your cluster, finds unhealthy pods, and explains why they are failing — covering the most common pod failure modes.

Read-only operation¶

kubeagent talks to the cluster directly via the official Kubernetes Go client (client-go) — the same library kubectl and operators use — and operates read-only. It never creates, updates, patches, or deletes cluster resources.

Failure modes detected¶

CrashLoopBackOff¶

The container keeps restarting. Kubernetes backs off exponentially between attempts. kubeagent surfaces the exit code and last termination reason so you can spot crash loops without tailing logs manually.

ImagePullBackOff / ErrImagePull¶

The image cannot be pulled — either the image tag does not exist or the node lacks credentials for the registry. kubeagent reports the image reference and the pull error from the pod's conditions.

OOMKilled¶

The container exceeded its memory limit and was killed by the kernel OOM killer. kubeagent annotates the finding with the container's configured requests and limits (see Resource context) so you can judge whether to raise the limit or reduce memory pressure.

Pending / Unschedulable¶

No node can place the pod. This covers insufficient CPU or memory, a missing taint toleration, an unsatisfied node affinity, or no nodes at all. kubeagent reports the scheduler message from the pod's events.

Status¶

kubeagent scan performs a read-only, whole-cluster scan and reports CrashLoopBackOff, ImagePullBackOff/ErrImagePull, OOMKilled, and Pending/Unschedulable pods, in text or JSON.

The optional --explain flag makes a single Claude API call to summarize findings in plain English. The deterministic core still works offline with no API key.

Example output¶

P2 — Workload issues

  NAMESPACE   NAME               KIND        READY   STATUS              RESTARTS
  staging     api-server         Deployment  0/2     CrashLoopBackOff    47
  staging     image-builder      Deployment  0/1     ImagePullBackOff    0
  production  worker             Deployment  0/3     OOMKilled           12
  production  batch-processor    Job         0/1     Pending             0