Overview
The PagerDuty integration enables IncidentFox to:
- Automatically investigate when alerts trigger
- Post findings to configured Slack channels
- Provide context before the oncall engineer responds
- Correlate alerts with recent deployments
Prerequisites
- PagerDuty account with admin access
- Webhook configuration permissions
- Slack integration configured (for responses)
Setup
Step 1: Create Generic Webhook
- Log in to PagerDuty
- Go to Services and select your service
- Click the Integrations tab
- Add a Generic Webhooks (v3) integration
- Configure:
  - URL: https://api.incidentfox.ai/api/pagerduty/webhook
  - Events: incident.triggered, incident.acknowledged
- Copy the signing secret
- Save
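If you manage PagerDuty configuration as code, the same subscription can also be created through PagerDuty's REST API (the webhook_subscriptions endpoint). The request body below is a rough sketch; the description is arbitrary and the service ID is a placeholder for your own service:

```json
{
  "webhook_subscription": {
    "type": "webhook_subscription",
    "description": "IncidentFox auto-investigation",
    "delivery_method": {
      "type": "http_delivery_method",
      "url": "https://api.incidentfox.ai/api/pagerduty/webhook"
    },
    "events": ["incident.triggered", "incident.acknowledged"],
    "filter": {
      "type": "service_reference",
      "id": "PABC123"
    }
  }
}
```

As with the UI flow, copy the signing secret PagerDuty provides for the new subscription; you will need it in Step 2.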
Step 2: Add to IncidentFox
```json
{
  "integrations": {
    "pagerduty": {
      "enabled": true,
      "webhook_secret": "vault://secrets/pagerduty-webhook-secret",
      "auto_investigate": true,
      "notification_channel": "#incidents"
    }
  }
}
```
Configuration Options
| Option | Description | Default |
|---|---|---|
| auto_investigate | Auto-start an investigation when an incident triggers | true |
| notification_channel | Slack channel for findings | Required |
| urgency_filter | Only investigate alerts at this urgency level | all |
| service_filter | List of services to investigate | All services |
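To illustrate service_filter, which is not used elsewhere on this page, here is a minimal sketch that restricts investigations to two services alongside your existing settings. It assumes service_filter accepts a list of PagerDuty service names:

```json
{
  "integrations": {
    "pagerduty": {
      "service_filter": ["checkout-api", "payments"]
    }
  }
}
```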
How It Works
When a PagerDuty incident triggers:
- Webhook fires to IncidentFox
- Agent extracts context from alert details
- Investigation runs against configured data sources
- Findings are posted to Slack before the oncall engineer responds
Automatic Investigation
Alert Context
IncidentFox extracts the following from the PagerDuty webhook payload:
- Service name
- Alert title/description
- Urgency level
- Custom details (if provided)
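These fields come from the incident data in the webhook body. The abridged example below is a sketch of roughly what a v3 incident.triggered event looks like, with placeholder values; it is not a complete PagerDuty payload reference:

```json
{
  "event": {
    "event_type": "incident.triggered",
    "resource_type": "incident",
    "occurred_at": "2024-05-01T12:00:00Z",
    "data": {
      "type": "incident",
      "title": "High Error Rate on checkout-api",
      "urgency": "high",
      "service": {
        "summary": "checkout-api"
      }
    }
  }
}
```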
Example Flow
PagerDuty Alert:

```
Service: checkout-api
Title: High Error Rate on checkout-api
Urgency: High
Details: Error rate exceeded 5% threshold
```

IncidentFox Response (in #incidents):

```
PagerDuty Alert Investigation
Service: checkout-api
Alert: High Error Rate
Urgency: High
Investigation Started...
---
Findings:

Summary: Checkout API experiencing elevated 5xx errors due to database connection issues.

Root Cause (Confidence: 91%):
• RDS connection pool exhausted
• 100/100 connections in use
• New connections failing with timeout

Evidence:
• CloudWatch: RDS connections at max (100)
• Application logs: "connection pool exhausted"
• Error spike started 5 minutes ago

Recent Changes:
• checkout-api v2.3.0 deployed 15 minutes ago
• Change: Added new batch processing job

Recommendations:
1. Check if new batch job is holding connections
2. Consider increasing RDS max_connections
3. Rollback v2.3.0 if issue persists

Response Time: 23 seconds
```
Service Configuration
Configure per-service investigation:
```json
{
  "integrations": {
    "pagerduty": {
      "services": {
        "checkout-api": {
          "investigation_prompt": "Focus on database and payment gateway issues",
          "data_sources": ["coralogix", "rds", "grafana"],
          "notification_channel": "#checkout-incidents"
        },
        "payments": {
          "investigation_prompt": "Check PCI logs and card processor status",
          "data_sources": ["coralogix", "cloudwatch"],
          "notification_channel": "#payments-oncall"
        }
      }
    }
  }
}
```
Urgency Filtering
To investigate only high-urgency alerts:
```json
{
  "integrations": {
    "pagerduty": {
      "urgency_filter": "high"
    }
  }
}
```
Enriching Alert Context
Add custom details to your PagerDuty alerts for better investigations:
```json
{
  "custom_details": {
    "service": "checkout-api",
    "environment": "production",
    "namespace": "checkout",
    "recent_deploy": "v2.3.0"
  }
}
```
IncidentFox will use these details to target the investigation.
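How you attach these details depends on how your alerts reach PagerDuty. As one example, if alerts are sent through the PagerDuty Events API v2, custom_details sits inside the event payload; the routing key and field values below are placeholders:

```json
{
  "routing_key": "YOUR_INTEGRATION_ROUTING_KEY",
  "event_action": "trigger",
  "payload": {
    "summary": "High Error Rate on checkout-api",
    "source": "checkout-api",
    "severity": "error",
    "custom_details": {
      "service": "checkout-api",
      "environment": "production",
      "namespace": "checkout",
      "recent_deploy": "v2.3.0"
    }
  }
}
```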
Response Time
Typical investigation times:
| Alert Complexity | Response Time |
|---|---|
| Single service | 15-30 seconds |
| Multi-service | 30-60 seconds |
| Complex correlation | 60-90 seconds |
IncidentFox aims to provide findings before the oncall engineer opens their laptop.
Best Practices
- Add custom details to alerts for targeted investigations
- Configure per-service investigation prompts
- Use dedicated channels per service/team
- Review investigation accuracy to improve prompts
- Combine with Incident.io for full incident workflow
Troubleshooting
Webhook Not Triggering
- Check that the webhook URL is correct
- Verify that the signing secret matches
- Check the PagerDuty webhook delivery logs
- Ensure the incident.triggered event is enabled
Investigation Not Starting
- Check that the service is not being filtered out by service_filter
- Verify that the alert urgency meets the urgency_filter setting
- Review the agent logs in the Web UI
Slow Investigations
- Reduce number of data sources queried
- Add more specific investigation prompts
- Check data source connectivity
Next Steps