Architecture Overview
IncidentFox uses a multi-agent architecture in which specialized AI agents collaborate to investigate incidents. Each agent has expertise in a specific domain and access to relevant tools.
The Agents
Planner Agent
The Planner is the orchestrator. When you trigger an investigation, it:
- Analyzes the request - Understands what you’re asking
- Creates a plan - Determines which agents and tools are needed
- Delegates tasks - Assigns work to specialized agents
- Synthesizes results - Combines findings into a coherent response
The Planner doesn’t execute tools directly. It coordinates other agents that have the specialized capabilities.
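The plan-and-delegate pattern described above can be sketched as follows. This is a minimal illustration, not IncidentFox's implementation: the keyword routing table, the `Plan` class, and the `create_plan` and `synthesize` functions are all hypothetical names, and how the real Planner selects agents is not described here.

```python
from dataclasses import dataclass, field

# Hypothetical keyword hints per agent (assumption for illustration only).
ROUTING_HINTS = {
    "k8s": ("pod", "deployment", "kubernetes", "container"),
    "aws": ("ec2", "lambda", "rds", "cloudwatch", "ecs"),
    "metrics": ("latency", "anomaly", "spike", "error rate"),
    "coding": ("commit", "pull request", "failing test"),
}


@dataclass
class Plan:
    request: str
    agents: list[str] = field(default_factory=list)


def create_plan(request: str) -> Plan:
    """Pick the agents whose hint keywords appear in the request."""
    text = request.lower()
    agents = [
        agent
        for agent, keywords in ROUTING_HINTS.items()
        if any(keyword in text for keyword in keywords)
    ]
    # Fall back to the general-purpose Investigation Agent when nothing matches.
    return Plan(request=request, agents=agents or ["investigation"])


def synthesize(findings: dict[str, str]) -> str:
    """Combine per-agent findings into one response, one line per agent."""
    return "\n".join(
        f"[{agent}] {finding}" for agent, finding in sorted(findings.items())
    )
```

Note that the planner function only decides who does the work; it never calls a tool itself, which mirrors the separation described above.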
K8s Agent
Specializes in Kubernetes troubleshooting with 9 dedicated tools:

| Tool | Description |
|---|---|
| get_pod_logs | Fetch container logs from pods |
| describe_pod | Get pod status, events, and configuration |
| list_pods | List pods in a namespace with status |
| get_pod_events | Get Kubernetes events for pods |
| describe_deployment | Get deployment status and replica info |
| get_deployment_history | View rollout history |
| describe_service | Get service details and endpoints |
| get_pod_resource_usage | CPU/memory usage metrics |
| docker_exec | Execute commands in containers |
AWS Agent
Handles AWS infrastructure debugging with 8 tools:

| Tool | Description |
|---|---|
| describe_ec2_instance | EC2 instance details and status |
| get_cloudwatch_logs | Fetch logs from CloudWatch Log Groups |
| describe_lambda_function | Lambda configuration and metrics |
| get_rds_instance_status | RDS database status and metrics |
| query_cloudwatch_insights | Run CloudWatch Insights queries |
| get_cloudwatch_metrics | Query CloudWatch metrics |
| list_ecs_tasks | List ECS Fargate tasks |
| describe_codepipeline | Get CodePipeline execution status |
Metrics Agent
Focuses on anomaly detection and correlation with 22 tools, including:
- Anomaly Detection - Prophet-based forecasting, Z-score detection
- Correlation Analysis - Find relationships between metrics
- Change Point Detection - Identify when metrics behavior changed
- Grafana Integration - Query Prometheus, view dashboards
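Of the techniques listed above, Z-score detection is the simplest to illustrate. The sketch below flags points that lie more than a chosen number of standard deviations from the series mean. The function name `zscore_anomalies` and the default threshold of 3.0 are assumptions for illustration; the Metrics Agent's actual tools and parameters are not shown here.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of points more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the series
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Example: nineteen steady latency samples followed by one spike.
latencies_ms = [100.0] * 19 + [500.0]
spike_indices = zscore_anomalies(latencies_ms)
```

A global Z-score like this assumes roughly stationary data; seasonal metrics are presumably why forecasting-based detection is listed alongside it.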
Coding Agent
Handles code analysis and CI/CD with 15 tools:
- File Operations - Read, search, and analyze code
- Git Operations - Diff, blame, log analysis
- GitHub Integration - PR analysis, code search
- Test Execution - Run tests and analyze failures
Investigation Agent
The “jack of all trades” agent with access to 30+ tools from all categories. Used for complex, cross-domain investigations that require multiple types of analysis.
Investigation Flow
Here’s what happens when you trigger an investigation:
1. Trigger Received
A user mentions @incidentfox in Slack with a request like “investigate high latency in payments service”.
2. Planner Activates
The Planner agent analyzes the request and determines:
- What systems might be involved (payments, database, etc.)
- What data sources to query (logs, metrics, recent changes)
- Which specialized agents to involve
3. Data Gathering
Specialized agents execute their tools in parallel:
- K8s Agent checks pod status and logs
- AWS Agent queries CloudWatch metrics
- Metrics Agent runs anomaly detection
4. Correlation
The Investigation Agent correlates findings:
- Timeline reconstruction
- Root cause identification
- Impact assessment
5. Response
Results are synthesized and posted back to Slack with:
- Summary of findings
- Root cause with confidence score
- Evidence (logs, metrics, events)
- Recommended actions
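The parallel data-gathering step in the flow above can be sketched with `asyncio.gather`. This is only an illustration under stated assumptions: `run_agent` and `investigate` are hypothetical names, the agent body is a stub that returns a canned finding, and correlation is reduced to simple formatting.

```python
import asyncio


async def run_agent(name: str, task: str) -> tuple[str, str]:
    """Stand-in for a specialized agent; a real agent would call its tools here."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound tool call would
    return name, f"{task}: no issues found"


async def investigate(request: str) -> str:
    # Data Gathering: run the specialized agents concurrently.
    tasks = {
        "k8s": "check pod status and logs",
        "aws": "query CloudWatch metrics",
        "metrics": "run anomaly detection",
    }
    # gather() returns results in the same order the awaitables were passed.
    results = await asyncio.gather(
        *(run_agent(name, task) for name, task in tasks.items())
    )
    # Correlation and Response: reduced here to building a plain-text report.
    lines = [f"Investigation: {request}"]
    lines += [f"- {name}: {finding}" for name, finding in results]
    return "\n".join(lines)


report = asyncio.run(investigate("high latency in payments service"))
```

Running the agents concurrently means total gathering time is bounded by the slowest agent rather than the sum of all of them.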
Configuration Inheritance
IncidentFox uses hierarchical configuration that flows from the organization level, through groups, down to individual teams. Each level can override settings from the level above. This allows:
- Org-wide defaults - Set sensible defaults for all teams
- Group-specific settings - Configure for platform vs. application teams
- Team overrides - Fine-tune for specific team needs
Example Configuration Flow
Once inheritance is resolved, the team in this example ends up with:
- All three MCP servers
- The org’s base prompt
- The snowflake tool enabled
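One way to picture how a team arrives at that result is a layered merge in which each level overlays the one above it. The sketch below is an assumption-laden illustration: the config keys (`base_prompt`, `mcp_servers`, `tools`), the server names, and the `merge_config` and `effective_config` helpers are all hypothetical and do not reflect IncidentFox's actual schema.

```python
from typing import Any


def merge_config(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Return parent overlaid with child; nested dicts merge key by key."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value  # the lower level overrides the higher one
    return merged


def effective_config(*levels: dict[str, Any]) -> dict[str, Any]:
    """Fold levels from broadest (org) to narrowest (team)."""
    result: dict[str, Any] = {}
    for level in levels:
        result = merge_config(result, level)
    return result


# Hypothetical levels: every name below is made up for illustration.
org = {
    "base_prompt": "You are an incident investigator.",
    "mcp_servers": {"org-server": {"enabled": True}},
}
group = {
    "mcp_servers": {"group-server": {"enabled": True}},
    "tools": {"snowflake": True},
}
team = {"mcp_servers": {"team-server": {"enabled": True}}}

team_config = effective_config(org, group, team)
```

Here the team defines only one MCP server of its own, yet its effective configuration contains all three servers, the org's base prompt, and the snowflake tool enabled at the group level.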
Data Flow
- Triggers send investigation requests to the Agent Runtime
- Config Service provides team-specific configuration
- Tools query external data sources
- Results flow back through the agent to the trigger source
Tool Loading
Tools are loaded dynamically based on:
- Installation - Is the integration package installed?
- Configuration - Are credentials configured?
- Team Settings - Is the tool enabled for this team?
MCP Integration
IncidentFox supports the Model Context Protocol (MCP) for extending capabilities with custom tools.
What is MCP?
MCP is an open protocol that allows AI agents to access external tools and data sources in a standardized way.
Using MCP with IncidentFox
- Configure MCP servers in the Web UI
- Equip tools to agents via configuration
- Agents automatically use MCP tools during investigations

