Overview
The PagerDuty integration enables IncidentFox to:
- Automatically investigate when alerts trigger
- Post findings to configured Slack channels
- Provide context before the oncall engineer responds
- Correlate alerts with recent deployments
Prerequisites
- PagerDuty account with admin access
- Webhook configuration permissions
- Slack integration configured (for responses)
Setup
Step 1: Create Generic Webhook
- Log in to PagerDuty
- Go to Services and select your service
- Click the Integrations tab
- Add a Generic Webhooks (v3) integration
- Configure:
  - URL: https://api.incidentfox.ai/api/pagerduty/webhook
  - Events: incident.triggered, incident.acknowledged
- Copy the signing secret
- Save
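If you manage PagerDuty configuration as code, the same subscription can also be created through PagerDuty's REST API (the webhook_subscriptions endpoint). The request body below is a rough sketch; the description is arbitrary and the service ID is a placeholder for your own service:

```json
{
  "webhook_subscription": {
    "type": "webhook_subscription",
    "description": "IncidentFox auto-investigation",
    "delivery_method": {
      "type": "http_delivery_method",
      "url": "https://api.incidentfox.ai/api/pagerduty/webhook"
    },
    "events": ["incident.triggered", "incident.acknowledged"],
    "filter": {
      "type": "service_reference",
      "id": "PABC123"
    }
  }
}
```

As with the UI flow, copy the signing secret PagerDuty provides for the new subscription; you will need it in Step 2.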
Step 2: Add to IncidentFox
```json
{
  "integrations": {
    "pagerduty": {
      "enabled": true,
      "webhook_secret": "vault://secrets/pagerduty-webhook-secret",
      "auto_investigate": true,
      "notification_channel": "#incidents"
    }
  }
}
```
Configuration Options
| Option | Description | Default |
|---|---|---|
| auto_investigate | Auto-start an investigation when an incident triggers | true |
| notification_channel | Slack channel for findings | Required |
| urgency_filter | Only investigate alerts at this urgency level | all |
| service_filter | List of services to investigate | All services |
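To illustrate service_filter, which is not used elsewhere on this page, here is a minimal sketch that restricts investigations to two services alongside your existing settings. It assumes service_filter accepts a list of PagerDuty service names:

```json
{
  "integrations": {
    "pagerduty": {
      "service_filter": ["checkout-api", "payments"]
    }
  }
}
```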
How It Works
When a PagerDuty incident triggers:
- Webhook fires to IncidentFox
- Agent extracts context from alert details
- Investigation runs against configured data sources
- Findings are posted to Slack before the oncall engineer responds
Automatic Investigation
Alert Context
IncidentFox extracts the following from the PagerDuty webhook payload:
- Service name
- Alert title/description
- Urgency level
- Custom details (if provided)
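These fields come from the incident data in the webhook body. The abridged example below is a sketch of roughly what a v3 incident.triggered event looks like, with placeholder values; it is not a complete PagerDuty payload reference:

```json
{
  "event": {
    "event_type": "incident.triggered",
    "resource_type": "incident",
    "occurred_at": "2024-05-01T12:00:00Z",
    "data": {
      "type": "incident",
      "title": "High Error Rate on checkout-api",
      "urgency": "high",
      "service": {
        "summary": "checkout-api"
      }
    }
  }
}
```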
Example Flow
PagerDuty Alert:

```
Service: checkout-api
Title: High Error Rate on checkout-api
Urgency: High
Details: Error rate exceeded 5% threshold
```

IncidentFox Response (in #incidents):

```
PagerDuty Alert Investigation
Service: checkout-api
Alert: High Error Rate
Urgency: High
Investigation Started...
---
Findings:

Summary: Checkout API experiencing elevated 5xx errors due to database connection issues.

Root Cause (Confidence: 91%):
• RDS connection pool exhausted
• 100/100 connections in use
• New connections failing with timeout

Evidence:
• CloudWatch: RDS connections at max (100)
• Application logs: "connection pool exhausted"
• Error spike started 5 minutes ago

Recent Changes:
• checkout-api v2.3.0 deployed 15 minutes ago
• Change: Added new batch processing job

Recommendations:
1. Check if new batch job is holding connections
2. Consider increasing RDS max_connections
3. Rollback v2.3.0 if issue persists

Response Time: 23 seconds
```
Service Configuration
Configure per-service investigation:
```json
{
  "integrations": {
    "pagerduty": {
      "services": {
        "checkout-api": {
          "investigation_prompt": "Focus on database and payment gateway issues",
          "data_sources": ["coralogix", "rds", "grafana"],
          "notification_channel": "#checkout-incidents"
        },
        "payments": {
          "investigation_prompt": "Check PCI logs and card processor status",
          "data_sources": ["coralogix", "cloudwatch"],
          "notification_channel": "#payments-oncall"
        }
      }
    }
  }
}
```
Urgency Filtering
To investigate only high-urgency alerts:
```json
{
  "integrations": {
    "pagerduty": {
      "urgency_filter": "high"
    }
  }
}
```
Enriching Alert Context
Add custom details to your PagerDuty alerts for better investigations:
```json
{
  "custom_details": {
    "service": "checkout-api",
    "environment": "production",
    "namespace": "checkout",
    "recent_deploy": "v2.3.0"
  }
}
```
IncidentFox will use these details to target the investigation.
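How you attach these details depends on how your alerts reach PagerDuty. As one example, if alerts are sent through the PagerDuty Events API v2, custom_details sits inside the event payload; the routing key and field values below are placeholders:

```json
{
  "routing_key": "YOUR_INTEGRATION_ROUTING_KEY",
  "event_action": "trigger",
  "payload": {
    "summary": "High Error Rate on checkout-api",
    "source": "checkout-api",
    "severity": "error",
    "custom_details": {
      "service": "checkout-api",
      "environment": "production",
      "namespace": "checkout",
      "recent_deploy": "v2.3.0"
    }
  }
}
```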
Response Time
Typical investigation times:
| Alert Complexity | Response Time |
|---|---|
| Single service | 15-30 seconds |
| Multi-service | 30-60 seconds |
| Complex correlation | 60-90 seconds |
IncidentFox aims to provide findings before the oncall engineer opens their laptop.
Best Practices
- Add custom details to alerts for targeted investigations
- Configure per-service investigation prompts
- Use dedicated channels per service/team
- Review investigation accuracy to improve prompts
- Combine with Incident.io for full incident workflow
Troubleshooting
Webhook Not Triggering
- Check that the webhook URL is correct
- Verify that the signing secret matches
- Check the PagerDuty webhook delivery logs
- Ensure the incident.triggered event is enabled
Investigation Not Starting
- Check that the service is not being filtered out by service_filter
- Verify that the alert urgency meets the urgency_filter setting
- Review the agent logs in the Web UI
Slow Investigations
- Reduce number of data sources queried
- Add more specific investigation prompts
- Check data source connectivity
Next Steps