Kubernetes Agent

Overview

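Connect your on-premises or private Kubernetes clusters to IncidentFox SaaS without opening inbound firewall rules. IncidentFox uses an outbound agent pattern: an agent installed in your cluster opens an outbound HTTPS/SSE connection to the IncidentFox gateway, so nothing has to reach into your network: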

Your Kubernetes Cluster              IncidentFox SaaS
====================                 ================

┌──────────────────┐                ┌──────────────────┐
│ incidentfox-     │   outbound     │  K8s Gateway     │
│ k8s-agent        │───────────────>│  Service         │
│ (Helm chart)     │   HTTPS/SSE    │                  │
└────────┬─────────┘                └────────┬─────────┘
         │                                   │
         ▼                                   ▼
┌──────────────────┐                ┌──────────────────┐
│ K8s API Server   │                │ AI Agent         │
│ (your cluster)   │                │ (investigations) │
└──────────────────┘                └──────────────────┘

Key benefits:

- No inbound firewall rules needed
- Agent connects outbound to IncidentFox (port 443)
- You control RBAC permissions via Helm values
- Multiple clusters supported per team

Prerequisites

Before you start:

- IncidentFox SaaS account with a team created
- Kubernetes cluster (v1.24+)
- kubectl configured and able to access your cluster
- helm v3.x installed
- Outbound HTTPS access to ui.incidentfox.ai (or your self-hosted gateway)
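The tool and network items in this list can be checked from a terminal before you start. The following is a minimal sketch, not an official IncidentFox script: the function names are hypothetical, and using curl for the connectivity probe is an assumption (any HTTPS client would do).

```shell
#!/bin/sh
# Preflight sketch: report missing prerequisite tools and probe outbound HTTPS.
# Function names are hypothetical; curl is an assumed dependency.

# Print the name of every listed command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Return 0 if an HTTPS request to the given host completes.
# Any HTTP status counts as reachable; only network or TLS failures fail.
can_reach_https() {
  curl --silent --show-error --output /dev/null --max-time 10 "https://$1"
}
```

Running `missing_tools kubectl helm` prints nothing when both tools are installed, and `can_reach_https ui.incidentfox.ai` succeeds when outbound HTTPS is allowed. If you override `gatewayUrl`, probe that host instead.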

Setup

Generate API Key

1. Log in to the IncidentFox dashboard
2. Navigate to Settings → Integrations → Kubernetes
3. Click “Add Cluster”
4. Enter a Cluster Name (e.g., prod-us-east-1, staging)
5. Click “Generate API Key”
6. Copy the API key (starts with ixfx_k8s_). You won’t see it again!

The API key authenticates your agent with IncidentFox. Each cluster needs its own key.
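Because the key is shown only once, a quick shape check can catch a truncated or mis-pasted value before it reaches Helm. This is a small sketch: the function name is hypothetical, and it tests only the documented `ixfx_k8s_` prefix, not whether IncidentFox will accept the key.

```shell
# Return 0 when the value starts with the documented ixfx_k8s_ prefix and has
# at least one character after it. This checks shape only, not validity.
looks_like_agent_key() {
  case "$1" in
    ixfx_k8s_?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

One way to use it is to hold the key in an environment variable (for example a hypothetical `INCIDENTFOX_API_KEY`) and run `looks_like_agent_key "$INCIDENTFOX_API_KEY"` before installing.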

Add the Helm Repository

helm repo add incidentfox https://charts.incidentfox.ai
helm repo update

Install the Agent

Create a namespace and install the agent:

# Create namespace
kubectl create namespace incidentfox

# Install the agent
helm install incidentfox-agent incidentfox/incidentfox-k8s-agent \
  --namespace incidentfox \
  --set apiKey=ixfx_k8s_YOUR_API_KEY \
  --set clusterName=prod-us-east-1

Configuration options:

Parameter	Description	Default
`apiKey`	API key from the Generate API Key step (required)	none
`clusterName`	Name shown in IncidentFox dashboard	none
`gatewayUrl`	IncidentFox gateway URL	`https://orchestrator.incidentfox.ai/gateway`
`replicaCount`	Number of agent replicas	`1`
`logLevel`	Logging verbosity (`DEBUG`, `INFO`, `WARNING`)	`INFO`
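The same parameters can be kept in a values file instead of being passed as `--set` flags, since Helm treats `--set apiKey=...` and a top-level `apiKey:` entry the same way. The sketch below writes such a file; `VALUES_FILE` and the file name are assumptions, and the key shown is a placeholder.

```shell
# Write a values file covering every parameter in the table above.
# VALUES_FILE is a hypothetical variable; replace the placeholder API key.
VALUES_FILE="${VALUES_FILE:-incidentfox-values.yaml}"

cat > "$VALUES_FILE" <<'EOF'
apiKey: ixfx_k8s_YOUR_API_KEY
clusterName: prod-us-east-1
gatewayUrl: https://orchestrator.incidentfox.ai/gateway
replicaCount: 1
logLevel: INFO
EOF

# Restrict access, since the file contains the API key.
chmod 600 "$VALUES_FILE"
```

Pass the file to `helm install` or `helm upgrade` with `-f "$VALUES_FILE"`, and keep it out of version control.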

Verify Connection

Check that the agent pod is running:

kubectl get pods -n incidentfox

You should see:

NAME                                READY   STATUS    RESTARTS   AGE
incidentfox-agent-xxx-yyy           1/1     Running   0          30s

Check the agent logs for a successful connection:

kubectl logs -n incidentfox -l app.kubernetes.io/name=incidentfox-k8s-agent

Look for:

{"event": "connected_to_gateway", "cluster_name": "prod-us-east-1"}
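To script this check instead of reading the logs by eye, the log stream can be filtered for the event. A minimal sketch, assuming the log format matches the sample line above; the function name is hypothetical, and matching with grep rather than a JSON parser is a simplification.

```shell
# Read agent logs on stdin and succeed if a connected_to_gateway event appears.
# The pattern tolerates any number of spaces after the colon.
saw_gateway_connection() {
  grep -q '"event": *"connected_to_gateway"'
}
```

For example, `kubectl logs -n incidentfox -l app.kubernetes.io/name=incidentfox-k8s-agent | saw_gateway_connection && echo connected` prints `connected` once the event has been logged.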

Verify in dashboard:
- Go to Settings → Integrations → Kubernetes
- Your cluster should show Status: Connected

Usage

Once connected, ask IncidentFox about your cluster:

@incidentfox show me failing pods in prod-us-east-1
@incidentfox what's happening with deployment nginx in staging?
@incidentfox get logs from pod api-server-xxx in production

If you have multiple clusters, specify which one:

@incidentfox list pods in namespace payments on cluster prod-us-east-1

RBAC Permissions

The agent uses a ClusterRole to access Kubernetes resources. By default, it has read-only access to:

Resource	Permissions
Pods	get, list, watch
Pod logs	get
Deployments	get, list, watch
ReplicaSets	get, list, watch
Services	get, list, watch
Nodes	get, list, watch
Events	get, list, watch
ConfigMaps	get, list, watch
Namespaces	get, list

Customizing RBAC

To restrict or expand permissions, use Helm values:

# values.yaml
rbac:
  # Only allow access to specific namespaces
  namespaceRestriction:
    enabled: true
    namespaces:
      - production
      - staging

  # Add custom rules
  additionalRules:
    - apiGroups: ["apps"]
      resources: ["statefulsets"]
      verbs: ["get", "list", "watch"]

Apply with:

helm upgrade incidentfox-agent incidentfox/incidentfox-k8s-agent \
  --namespace incidentfox \
  -f values.yaml

Managing Multiple Clusters

Add multiple clusters by repeating the setup for each:

1. Generate a new API key for each cluster
2. Install the agent in each cluster with a unique release name:

# Production cluster
helm install incidentfox-agent-prod incidentfox/incidentfox-k8s-agent \
  --namespace incidentfox \
  --set apiKey=ixfx_k8s_PROD_KEY \
  --set clusterName=prod-us-east-1

# Staging cluster (run against the staging cluster's context; create the incidentfox namespace there first)
helm install incidentfox-agent-staging incidentfox/incidentfox-k8s-agent \
  --namespace incidentfox \
  --set apiKey=ixfx_k8s_STAGING_KEY \
  --set clusterName=staging

In the dashboard, you’ll see all connected clusters and can query any of them.
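When several clusters are managed from one workstation, each install has to target the right kubeconfig context. The sketch below wraps the install in a function that uses Helm's global `--kube-context` flag together with `upgrade --install` and `--create-namespace`. The function name, the `HELM` override, and the context names in the usage note are assumptions.

```shell
# Install (or upgrade) one agent release in the cluster behind a given context.
# HELM can be overridden, e.g. HELM=echo, to preview the command. Note that
# a preview prints the API key.
HELM="${HELM:-helm}"

# Usage: install_agent CONTEXT RELEASE CLUSTER_NAME API_KEY
install_agent() {
  "$HELM" upgrade --install "$2" incidentfox/incidentfox-k8s-agent \
    --kube-context "$1" \
    --namespace incidentfox --create-namespace \
    --set apiKey="$4" \
    --set clusterName="$3"
}
```

With hypothetical contexts named `prod` and `staging`, this becomes `install_agent prod incidentfox-agent-prod prod-us-east-1 "$PROD_KEY"` followed by `install_agent staging incidentfox-agent-staging staging "$STAGING_KEY"`.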

Revoking Access

To disconnect a cluster:

Uninstall the agent:

helm uninstall incidentfox-agent -n incidentfox

Revoke the API key in the dashboard:
- Go to Settings → Integrations → Kubernetes
- Find the cluster and click “Revoke”

Revoking the key immediately disconnects the agent, even if it’s still running.

Troubleshooting

Agent not connecting

Check pod status:

kubectl describe pod -n incidentfox -l app.kubernetes.io/name=incidentfox-k8s-agent

Common issues:

Symptom	Cause	Solution
`ImagePullBackOff`	Can’t pull agent image	Check network/registry access
`CrashLoopBackOff`	Invalid API key	Verify API key in secret
`Running` but not connected	Network blocked	Allow outbound HTTPS to gateway

Check logs:

kubectl logs -n incidentfox -l app.kubernetes.io/name=incidentfox-k8s-agent --tail=100

Connection drops frequently

The agent automatically reconnects with exponential backoff. Frequent disconnections may indicate:

- Unstable network connection
- Gateway maintenance (check status.incidentfox.ai)
- Resource constraints on the agent pod
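For readers unfamiliar with exponential backoff, the wait between reconnect attempts doubles after each failure up to a ceiling. The sketch below illustrates the general technique only: the 1-second base and 60-second cap are assumed values, not the agent's documented settings, and real implementations often add random jitter.

```shell
# Usage: backoff_delay ATTEMPT [BASE] [CAP]
# Print the wait in seconds before retry number ATTEMPT (starting at 0):
# BASE doubled ATTEMPT times, never exceeding CAP. Defaults are illustrative.
backoff_delay() {
  attempt=$1
  delay=${2:-1}
  cap=${3:-60}
  while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    attempt=$((attempt - 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  printf '%s\n' "$delay"
}
```

With these defaults the sequence of waits is 1, 2, 4, 8, 16, 32, then 60 seconds for every later attempt, which is why a brief outage recovers quickly while a long one does not flood the gateway with retries.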

Check resource usage:

kubectl top pod -n incidentfox

Increase resources if needed:

helm upgrade incidentfox-agent incidentfox/incidentfox-k8s-agent \
  --namespace incidentfox \
  --set resources.requests.memory=256Mi \
  --set resources.limits.memory=512Mi

Permission denied errors

If IncidentFox reports permission errors when querying resources:

Check the ClusterRole exists:

kubectl get clusterrole incidentfox-agent

Verify ClusterRoleBinding:

kubectl get clusterrolebinding incidentfox-agent

Test permissions manually:

kubectl auth can-i list pods --as=system:serviceaccount:incidentfox:incidentfox-agent
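The same check can be repeated for every resource in the default read-only set. This is a sketch under assumptions: the function name and `KUBECTL` override are hypothetical, and the service account name is taken from the command above. If namespace restriction is enabled, cluster-wide checks may report `no`, so add `-n <namespace>` as needed.

```shell
# Check the default read-only grants for the agent's service account.
# KUBECTL can be overridden, e.g. KUBECTL=echo, to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"
AGENT_SA="system:serviceaccount:incidentfox:incidentfox-agent"

check_agent_rbac() {
  for resource in pods deployments replicasets services nodes events configmaps namespaces; do
    printf '%s: ' "$resource"
    # can-i exits non-zero on "no"; keep going so every resource is reported.
    "$KUBECTL" auth can-i list "$resource" --as="$AGENT_SA" || true
  done
  # Pod logs are a subresource, so they need --subresource rather than pods/log.
  printf 'pods/log: '
  "$KUBECTL" auth can-i get pods --subresource=log --as="$AGENT_SA" || true
}
```

Calling `check_agent_rbac` prints one `yes` or `no` per resource, which makes a missing rule easy to spot after changing `rbac` values.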

Security

Concern	How we address it
API key security	Keys are hashed with SHA-256 + pepper; plaintext never stored
Transport	All traffic encrypted via TLS (HTTPS)
Agent permissions	You control RBAC; default is read-only
Multi-tenant isolation	Each team’s clusters are isolated; agents can only access their team’s data
Audit logging	All commands from IncidentFox are logged

Support

Email: support@incidentfox.ai
Documentation: docs.incidentfox.ai
Status: status.incidentfox.ai

Next Steps

- Kubernetes Tools: Learn about Kubernetes tool capabilities
- Slack: Set up the Slack bot
- GitHub: Configure the GitHub integration
- Configuration: Customize agent behavior
