
Overview

The Kubernetes tools enable IncidentFox to troubleshoot pods, deployments, and services in your clusters.

Configuration

{
  "tools": {
    "kubernetes": {
      "enabled": true,
      "kubeconfig_path": "~/.kube/config",
      "default_namespace": "production",
      "default_context": "prod-cluster"
    }
  }
}
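
As a rough sketch of how a client might consume this block, the Python below parses the JSON and expands `~` in `kubeconfig_path`. The helper name `load_kubernetes_config` is hypothetical and only illustrates the shape of the settings; IncidentFox's own loader may behave differently.

```python
import json
import os

# Same settings as the configuration block above.
EXAMPLE = """
{
  "tools": {
    "kubernetes": {
      "enabled": true,
      "kubeconfig_path": "~/.kube/config",
      "default_namespace": "production",
      "default_context": "prod-cluster"
    }
  }
}
"""


def load_kubernetes_config(raw: str) -> dict:
    """Return the kubernetes tool settings from a config JSON string.

    Hypothetical helper for illustration, not an IncidentFox API.
    """
    settings = json.loads(raw)["tools"]["kubernetes"]
    # Expand "~" so the path can be handed directly to a Kubernetes client.
    settings["kubeconfig_path"] = os.path.expanduser(settings["kubeconfig_path"])
    return settings


if __name__ == "__main__":
    cfg = load_kubernetes_config(EXAMPLE)
    print(cfg["default_namespace"], cfg["default_context"])
```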

Available Tools

get_pod_logs

Fetch logs from a pod.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pod_name | string | Yes | Pod name or pattern |
| namespace | string | No | Namespace (uses default) |
| container | string | No | Container name |
| tail_lines | int | No | Number of lines (default: 100) |
| since | string | No | Time duration (e.g., "1h") |

Example:
@incidentfox get logs from cart-pod in production namespace, last 50 lines
Response:
{
  "pod": "cart-7f9d8b6c4f-abc12",
  "container": "cart",
  "logs": [
    "2024-01-15T10:30:00Z INFO Starting cart service",
    "2024-01-15T10:30:01Z ERROR Connection refused to redis"
  ]
}
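
If the response is post-processed in a script, pulling out the error lines is often the first step. The sketch below assumes every log line follows the `<timestamp> <LEVEL> <message>` layout seen in the sample; the function name `error_lines` is made up for this example, and other log formats would need a different parser.

```python
SAMPLE = {
    "pod": "cart-7f9d8b6c4f-abc12",
    "container": "cart",
    "logs": [
        "2024-01-15T10:30:00Z INFO Starting cart service",
        "2024-01-15T10:30:01Z ERROR Connection refused to redis",
    ],
}


def error_lines(response: dict) -> list[str]:
    """Return the log lines whose second field is ERROR."""
    matches = []
    for line in response.get("logs", []):
        # Split into timestamp, level, and the rest of the message.
        parts = line.split(" ", 2)
        if len(parts) >= 2 and parts[1] == "ERROR":
            matches.append(line)
    return matches


if __name__ == "__main__":
    for line in error_lines(SAMPLE):
        print(line)
```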

describe_pod

Get pod status and configuration.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pod_name | string | Yes | Pod name |
| namespace | string | No | Namespace |

Example:
@incidentfox describe cart-pod in production
Response:
{
  "name": "cart-7f9d8b6c4f-abc12",
  "namespace": "production",
  "status": "Running",
  "node": "ip-10-0-1-123.ec2.internal",
  "ip": "10.0.1.45",
  "containers": [
    {
      "name": "cart",
      "image": "acme/cart:v2.3.0",
      "state": "Running",
      "restarts": 0
    }
  ],
  "conditions": [
    {"type": "Ready", "status": "True"},
    {"type": "ContainersReady", "status": "True"}
  ]
}
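
Note that condition statuses in the response are the strings "True" and "False", not JSON booleans. A minimal sketch of a health check over this response shape follows; `is_pod_healthy` and `restart_counts` are hypothetical helpers, and the rule "Running plus all conditions True" is an assumption about what counts as healthy.

```python
SAMPLE = {
    "name": "cart-7f9d8b6c4f-abc12",
    "namespace": "production",
    "status": "Running",
    "containers": [
        {"name": "cart", "image": "acme/cart:v2.3.0", "state": "Running", "restarts": 0}
    ],
    "conditions": [
        {"type": "Ready", "status": "True"},
        {"type": "ContainersReady", "status": "True"},
    ],
}


def is_pod_healthy(pod: dict) -> bool:
    """True when the pod is Running and every listed condition is "True".

    A pod with no conditions listed is treated as healthy if it is Running.
    """
    if pod.get("status") != "Running":
        return False
    # Compare against the string "True"; the values are not booleans.
    return all(c.get("status") == "True" for c in pod.get("conditions", []))


def restart_counts(pod: dict) -> dict[str, int]:
    """Map each container name to its restart count."""
    return {c["name"]: c.get("restarts", 0) for c in pod.get("containers", [])}
```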

list_pods

List pods in a namespace with status.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| namespace | string | No | Namespace |
| label_selector | string | No | Label filter (e.g., "app=cart") |
| field_selector | string | No | Field filter |

Example:
@incidentfox list pods in production namespace with app=checkout label
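
To make the `label_selector` semantics concrete, here is a simplified sketch of equality-based matching as Kubernetes defines it: comma-separated requirements are ANDed, and `!=` also matches pods that lack the key. The function `matches_selector` is illustrative only; it ignores set-based selectors (`in`, `notin`) and says nothing about how IncidentFox evaluates the filter, which presumably happens on the Kubernetes API server.

```python
def matches_selector(labels: dict[str, str], selector: str) -> bool:
    """Check labels against an equality-based selector such as "app=cart,tier!=db"."""
    for requirement in selector.split(","):
        requirement = requirement.strip()
        if not requirement:
            continue
        if "!=" in requirement:
            key, value = requirement.split("!=", 1)
            # "!=" rejects only pods that have the key with exactly this value.
            if labels.get(key.strip()) == value.strip():
                return False
        elif "=" in requirement:
            # Treat "==" and "=" the same way.
            key, value = requirement.replace("==", "=", 1).split("=", 1)
            if labels.get(key.strip()) != value.strip():
                return False
        else:
            raise ValueError(f"unsupported requirement: {requirement!r}")
    return True
```

For example, `matches_selector({"app": "checkout", "tier": "web"}, "app=checkout,tier!=db")` is true, while the same labels do not match `"app=cart"`.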

get_pod_events

Get Kubernetes events for a pod or namespace.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Resource name |
| namespace | string | No | Namespace |
| type | string | No | Event type: Normal or Warning |

Example:
@incidentfox get warning events for cart pods
Response:
{
  "events": [
    {
      "type": "Warning",
      "reason": "BackOff",
      "message": "Back-off restarting failed container",
      "last_timestamp": "2024-01-15T10:30:00Z",
      "count": 5
    }
  ]
}
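
A repeated warning such as BackOff is usually more telling than a one-off event. The sketch below filters the response for warnings that have fired at least a given number of times and lists the most recent first; `recurring_warnings` and the threshold of 3 are assumptions chosen for illustration.

```python
SAMPLE = {
    "events": [
        {
            "type": "Warning",
            "reason": "BackOff",
            "message": "Back-off restarting failed container",
            "last_timestamp": "2024-01-15T10:30:00Z",
            "count": 5,
        }
    ]
}


def recurring_warnings(response: dict, min_count: int = 3) -> list[dict]:
    """Return Warning events seen at least min_count times, newest first."""
    events = [
        e
        for e in response.get("events", [])
        if e.get("type") == "Warning" and e.get("count", 1) >= min_count
    ]
    # UTC timestamps in this fixed ISO 8601 layout sort correctly as strings.
    return sorted(events, key=lambda e: e.get("last_timestamp", ""), reverse=True)
```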

describe_deployment

Get deployment status and configuration.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| deployment_name | string | Yes | Deployment name |
| namespace | string | No | Namespace |

Example:
@incidentfox describe checkout deployment

get_deployment_history

View rollout history.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| deployment_name | string | Yes | Deployment name |
| namespace | string | No | Namespace |

Example:
@incidentfox show rollout history for payments deployment

describe_service

Get service details and endpoints.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| service_name | string | Yes | Service name |
| namespace | string | No | Namespace |

get_pod_resource_usage

Get CPU/memory usage for pods.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| namespace | string | No | Namespace |
| pod_name | string | No | Specific pod |

Requires metrics-server to be installed in the cluster.

Example:
@incidentfox check resource usage for checkout pods
Response:
{
  "pods": [
    {
      "name": "checkout-abc12",
      "cpu": "250m",
      "memory": "512Mi",
      "cpu_request": "200m",
      "memory_request": "256Mi",
      "cpu_limit": "500m",
      "memory_limit": "1Gi"
    }
  ]
}
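
The values in this response are Kubernetes quantity strings, so they need parsing before they can be compared. The sketch below converts the common suffixes (`m` for milli, `Mi`/`Gi` for binary multiples, and so on) and reports usage as a fraction of the limit. `parse_quantity` and `utilization` are hypothetical helpers, and the suffix table covers only the common cases rather than the full Kubernetes quantity grammar.

```python
# Multipliers for common Kubernetes quantity suffixes (not exhaustive).
_SUFFIXES = {
    "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(value: str) -> float:
    """Convert a quantity such as "250m" or "512Mi" to a plain number."""
    # Try two-character suffixes first so "Mi" is not misread as "M".
    for length in (2, 1):
        suffix = value[-length:]
        if suffix in _SUFFIXES and len(value) > length:
            return float(value[:-length]) * _SUFFIXES[suffix]
    return float(value)


def utilization(pod: dict) -> dict[str, float]:
    """Return current CPU and memory usage as a fraction of each limit."""
    return {
        "cpu": parse_quantity(pod["cpu"]) / parse_quantity(pod["cpu_limit"]),
        "memory": parse_quantity(pod["memory"]) / parse_quantity(pod["memory_limit"]),
    }
```

Applied to the sample pod above, both fractions come out at about one half: usage sits at 50% of each limit, while already exceeding both requests.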

docker_exec

Execute commands in containers.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pod_name | string | Yes | Pod name |
| namespace | string | No | Namespace |
| container | string | No | Container name |
| command | string | Yes | Command to execute |

This tool may be disabled by default for security. Enable only if needed.

Use Cases

Investigating Pod Crashes

@incidentfox why is the cart pod crashing?
IncidentFox will:
  1. list_pods - Check pod status
  2. get_pod_events - Find crash reasons
  3. get_pod_logs - Read logs before crash
  4. describe_pod - Check configuration
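
The sequence above can be pictured as four tool calls run in order. The sketch below is only an illustration of that ordering: `call_tool(tool_name, params)` is a hypothetical stand-in for however a tool gets invoked, not an IncidentFox API, and in practice IncidentFox decides which tools to call and with what arguments. The parameter names are the ones documented in the tool tables.

```python
from typing import Any, Callable


def investigate_crash(
    call_tool: Callable[[str, dict], Any], pod_name: str, namespace: str
) -> dict[str, Any]:
    """Run the four crash-investigation tools in order and collect results."""
    steps = [
        ("list_pods", {"namespace": namespace}),
        ("get_pod_events", {"name": pod_name, "namespace": namespace, "type": "Warning"}),
        ("get_pod_logs", {"pod_name": pod_name, "namespace": namespace, "tail_lines": 100}),
        ("describe_pod", {"pod_name": pod_name, "namespace": namespace}),
    ]
    # Dict comprehensions evaluate in order, so the calls run as listed.
    return {tool: call_tool(tool, params) for tool, params in steps}


if __name__ == "__main__":
    # Stub that echoes its input, standing in for real tool calls.
    results = investigate_crash(lambda tool, params: params, "cart-7f9d8b6c4f-abc12", "production")
    print(list(results))
```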

Checking Resource Issues

@incidentfox check if checkout pods have resource issues
IncidentFox will:
  1. get_pod_resource_usage - Current usage
  2. get_pod_events - OOMKilled events
  3. describe_deployment - Configured limits

Verifying Deployments

@incidentfox verify the latest deployment of payments service
IncidentFox will:
  1. describe_deployment - Check status
  2. get_deployment_history - Recent rollouts
  3. list_pods - Pod status

Required RBAC

rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "services", "events"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods"]
  verbs: ["get", "list"]
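
To reason about what these rules permit, the sketch below mirrors them as Python data and checks a request against them, including the `*` wildcard that Kubernetes RBAC supports. `is_allowed` is a simplified illustration, not the Kubernetes authorizer. One thing it makes visible: the rules grant nothing on the `pods/exec` subresource, which Kubernetes normally requires for running commands in a container. Whether `docker_exec` depends on that permission is an assumption worth verifying before enabling the tool.

```python
# The rules from the block above, expressed as Python data.
RULES = [
    {
        "apiGroups": [""],
        "resources": ["pods", "pods/log", "services", "events"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": ["apps"],
        "resources": ["deployments", "replicasets"],
        "verbs": ["get", "list", "watch"],
    },
    {"apiGroups": ["metrics.k8s.io"], "resources": ["pods"], "verbs": ["get", "list"]},
]


def _covers(allowed: list[str], wanted: str) -> bool:
    """True if the list names the value explicitly or via the "*" wildcard."""
    return "*" in allowed or wanted in allowed


def is_allowed(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """Return True if any single rule grants the verb on the resource."""
    return any(
        _covers(rule["apiGroups"], api_group)
        and _covers(rule["resources"], resource)
        and _covers(rule["verbs"], verb)
        for rule in rules
    )
```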
