Welcome to IncidentFox

IncidentFox is an AI SRE / AI On-Call engineer that integrates with your observability stack, infrastructure, and collaboration tools to automatically investigate incidents, find root causes, and suggest fixes.

Key Features

IncidentFox uses two powerful agent runtimes:
  • OpenAI SDK Agent - Production automation with multi-agent orchestration (Planner + Specialists)
  • Claude SDK SRE Agent - Interactive debugging with Kubernetes sandbox isolation
Specialized agents (K8s, AWS, Metrics, Coding, and Investigation) work together on each investigation.

Pre-built integrations across 20+ categories:
  • Kubernetes: Pod logs, events, deployments, resource usage (9 tools)
  • AWS: EC2, Lambda, RDS, ECS, CloudWatch (8+ tools)
  • Observability: Grafana, Datadog, Prometheus, Coralogix, Sentry, New Relic (15+ tools)
  • Log Analysis: Statistics, sampling, pattern search, anomaly detection (7 tools)
  • Docker: Container logs, stats, exec, events (15 tools)
  • GitHub: Code search, PRs, issues, Actions, commits (16 tools)
  • Database: MySQL, PostgreSQL, Snowflake, BigQuery (70+ tools)
  • And more: PagerDuty, Slack, Linear, Jira, Confluence, Terraform…

Hierarchical knowledge retrieval system (RAPTOR), based on ICLR 2024 research:
  • Handles 100+ page runbooks without context loss
  • Knowledge graphs for service dependencies and ownership
  • Learns from past investigations to improve over time
  • Multi-level abstraction: procedural, factual, temporal, policy

Invoke IncidentFox from wherever your team works:
  • Slack - Mention the bot in any channel
  • GitHub - Comment on issues or PRs
  • PagerDuty - Automatic investigation on alerts
  • Incident.io - Integrated incident response
  • REST API - Programmatic access
  • Web UI - Dashboard for investigations and configuration

Intelligent analysis powered by statistical and ML methods:
  • Anomaly Detection - Z-score, Prophet-based seasonal detection
  • Forecasting - Capacity planning with uncertainty bounds
  • Correlation Analysis - Cross-service metric relationships
  • Change Point Detection - Identify when issues started
  • Pattern Learning - Records and reuses incident patterns
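
To make the Z-score method named above concrete, here is a minimal sketch of Z-score anomaly detection over a metric series. It is an illustration only, not IncidentFox's implementation: the function name `zscore_anomalies`, the default threshold of 3.0, and the sample latency data are all assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value, z) for every point whose |z-score| exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    if len(values) < 2:
        return []  # a sample standard deviation needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # a flat series has no outliers
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Made-up request latencies in milliseconds with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
                100, 102, 98, 450, 101, 99, 100, 103, 97, 100]

for index, value, z in zscore_anomalies(latencies_ms):
    print(f"index={index} value={value} z={z:.2f}")
```

A plain Z-score assumes a roughly stationary series, and on very short windows a single spike inflates the standard deviation enough to hide itself. Seasonal metrics are better served by a seasonal model such as the Prophet-based detection listed above.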

Built for enterprise security and compliance:
  • SOC 2 compliant infrastructure
  • Claude Sandbox isolation with Kubernetes + gVisor
  • Credentials proxy (Envoy) - secrets never reach the agent
  • SSO/OIDC authentication (Google, Azure AD, Okta)
  • Approval workflows for critical changes
  • Full audit logging
  • On-premise and air-gapped deployment options

What Can IncidentFox Do?

Incident Investigation

When an incident occurs, IncidentFox automatically:
  1. Gathers Context - Pulls logs, metrics, and recent changes from your observability stack
  2. Analyzes Root Cause - Correlates data across services to identify the issue
  3. Provides Timeline - Reconstructs what happened and when
  4. Suggests Fixes - Recommends actionable remediation steps

For example, in Slack:

    @incidentfox investigate why the payments service is slow
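
The timeline step can be pictured as merging events from several sources into one chronological list. The sketch below is a simplified illustration under stated assumptions, not the product's actual logic: the `Event` type, the `build_timeline` function, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: when it happened, where it came from, what it says."""
    timestamp: datetime
    source: str
    message: str


def build_timeline(*streams):
    """Merge any number of event lists into one list ordered by timestamp."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event.timestamp)


# Made-up sample data standing in for deploys, alerts, and logs.
deploys = [Event(datetime(2024, 1, 1, 12, 0), "deploy", "payments v2 rolled out")]
alerts = [Event(datetime(2024, 1, 1, 12, 7), "alert", "p99 latency above SLO")]
logs = [
    Event(datetime(2024, 1, 1, 12, 3), "logs", "connection pool exhausted"),
    Event(datetime(2024, 1, 1, 12, 5), "logs", "request timeouts rising"),
]

for event in build_timeline(alerts, logs, deploys):
    print(f"{event.timestamp:%H:%M} [{event.source}] {event.message}")
```

Ordering events this way puts the deploy ahead of the first error logs and the alert, which is the kind of sequence a root-cause analysis starts from.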

CI/CD Auto-Fix

When your CI pipeline fails, IncidentFox can:
  1. Detect Failures - Monitors GitHub Actions, CodePipeline, and other CI systems
  2. Analyze Logs - Reads test output and build errors
  3. Identify Root Cause - Correlates failures with code changes in the PR
  4. Propose Fixes - Suggests code changes to resolve the issue

Proactive Monitoring

IncidentFox can monitor your systems and alert you before issues escalate:
  • Anomaly Detection - Prophet-based forecasting identifies unusual patterns
  • Correlation Analysis - Links metrics across services to find relationships
  • Knowledge Base - RAPTOR hierarchical retrieval learns from runbooks and past incidents
  • Alert Correlation - Connects Prometheus, Alertmanager, and PagerDuty alerts
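
As an illustration of the correlation analysis mentioned above, the following sketch computes the Pearson correlation coefficient between two metric series. It shows one common way to quantify a relationship between metrics and is not a description of how IncidentFox does it: the function name `pearson` and the sample series are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series.

    Returns 0.0 when either series is constant, where the coefficient is undefined.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up samples: database CPU percent and checkout latency over the same intervals.
db_cpu_percent = [35, 40, 42, 55, 70, 85, 90]
checkout_latency_ms = [120, 125, 130, 160, 210, 290, 310]

r = pearson(db_cpu_percent, checkout_latency_ms)
print(f"correlation: {r:.3f}")
```

A coefficient near 1 or -1 suggests the two metrics move together, but it does not show which one drives the other, so it is best treated as a pointer for further investigation.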

Getting Started

  1. Connect Your Data Sources - Configure connections to your observability stack (Coralogix, Datadog, Grafana, etc.)
  2. Set Up Integrations - Connect IncidentFox to Slack, GitHub, or PagerDuty for triggering investigations
  3. Configure Your Team - Customize agent prompts and enable/disable tools for your specific needs
  4. Start Investigating - Mention @incidentfox in Slack or trigger via your preferred integration

Support

Need help? Contact us at support@incidentfox.ai