Mayur Rathi

⭐ 34.1k GitHub stars

AWS CloudWatch Investigation

AWS CloudWatch Investigation is a code AI skill that provides reusable patterns for investigating production incidents with CloudWatch Logs, Metrics, and Alarms. It helps developers triage incidents faster by automating repetitive query and correlation steps.

Last verified on: 2026-06-28

mkdir -p ./skills/aws-cloudwatch-investigation && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/aws-cloudwatch-investigation/SKILL.md -o ./skills/aws-cloudwatch-investigation/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).

Skill Content

# AWS CloudWatch Investigation Skill

Reusable patterns for investigating production incidents using CloudWatch Logs, Metrics, and Alarms. These patterns are designed to be composed together during incident triage.

---

Pattern 1: Logs Insights Query Templates

Error Spike Detection

Find error spikes in a time window, counted per 5-minute bin and log stream:

fields @timestamp, @message, @logStream
| filter @message like /(?i)(error|exception|fatal|critical)/
| stats count(*) as errorCount by bin(5m), @logStream
| sort errorCount desc
| limit 20
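
A minimal sketch of running a query like the one above programmatically. It assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) is passed in as `client`; the function name `run_insights_query` and its polling parameters are illustrative and not part of the skill. It relies only on the `start_query` and `get_query_results` calls of the Logs API.

```python
import time
from typing import Any, Dict, List


def run_insights_query(client: Any, log_group: str, query: str,
                       start_epoch: int, end_epoch: int,
                       poll_seconds: float = 1.0,
                       max_polls: int = 60) -> List[Dict[str, str]]:
    """Start a Logs Insights query and poll until it reaches a final state."""
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start_epoch,   # epoch seconds, start of the incident window
        endTime=end_epoch,       # epoch seconds, end of the incident window
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each result row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise with a stub during tests.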

P99 Latency Breakdown by Operation

Identify which operations are driving latency spikes:

fields @timestamp, @duration, operation
| filter ispresent(@duration)
| stats avg(@duration) as avgMs,
        pct(@duration, 50) as p50Ms,
        pct(@duration, 95) as p95Ms,
        pct(@duration, 99) as p99Ms,
        count(*) as invocations
  by operation
| sort p99Ms desc
| limit 15
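
To illustrate why the query sorts by p99 rather than by average, here is a small sketch using a nearest-rank percentile. This is an assumption for illustration only: Logs Insights may compute `pct` with a different method, and the sample durations are made up.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: 98 fast invocations and 2 slow ones.
durations_ms = [20.0] * 98 + [900.0, 1500.0]

avg_ms = sum(durations_ms) / len(durations_ms)  # 43.6, looks almost healthy
p50_ms = percentile(durations_ms, 50)           # 20.0
p99_ms = percentile(durations_ms, 99)           # 900.0, exposes the slow tail
```

The average barely moves when a few invocations are slow, while p99 shows the tail latency that users actually hit.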

Lambda Cold Start Detection

Quantify cold start impact during an incident:

fields @timestamp, @duration, @initDuration, @memorySize, @maxMemoryUsed
| filter ispresent(@initDuration)
| stats count(*) as coldStarts,
        avg(@initDuration) as avgInitMs,
        max(@initDuration) as maxInitMs,
        avg(@duration) as avgDurationMs
  by bin(5m)
| sort @timestamp desc

Out-of-Memory (OOM) Detection

Find Lambda functions or containers killed by memory pressure:

fields @timestamp, @message, @logStream, @memorySize, @maxMemoryUsed
| filter @message like /Runtime exited|out of memory|OOMKilled|Cannot allocate memory|MemoryError/
| stats count(*) as oomEvents by @logStream, bin(10m)
| sort oomEvents desc
| limit 10

For memory utilization trending before OOM:

fields @timestamp, @maxMemoryUsed, @memorySize
| filter ispresent(@maxMemoryUsed)
| stats max(@maxMemoryUsed / @memorySize * 100) as peakMemPct,
        avg(@maxMemoryUsed / @memorySize * 100) as avgMemPct
  by bin(5m)
| sort @timestamp desc
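
The same ratio can be checked offline against exported log lines. The sketch below assumes the usual Lambda `REPORT` line layout (`Memory Size: N MB` followed by `Max Memory Used: N MB`); the function name and the sample line are illustrative.

```python
import re
from typing import Optional

# Matches the memory fields of a Lambda REPORT log line.
REPORT_MEMORY = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)


def memory_utilization_pct(report_line: str) -> Optional[float]:
    """Return max memory used as a percentage of configured memory, or None if not a REPORT line."""
    match = REPORT_MEMORY.search(report_line)
    if match is None:
        return None
    return int(match["used"]) / int(match["size"]) * 100


sample = ("REPORT RequestId: 1234\tDuration: 102.25 ms\tBilled Duration: 103 ms"
          "\tMemory Size: 128 MB\tMax Memory Used: 96 MB")
utilization = memory_utilization_pct(sample)  # 75.0
```

Values that trend toward 100 in the minutes before an OOM event suggest the function needs more memory or has a leak.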

Timeout Detection

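Find invocations that hit or approach the configured timeout. The 28000 ms threshold is a fixed value; adjust it to sit just below your function's configured timeout: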

fields @timestamp, @duration, @logStream, @requestId
| filter @message like /Task timed out/ or @duration > 28000
| stats count(*) as timeouts by @logStream, bin(5m)
| sort timeouts desc
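
Because the duration threshold depends on each function's configured timeout, one option is to generate the query instead of hardcoding the number. This is a sketch only: `build_timeout_query` and its `near_fraction` parameter are hypothetical names, and the 0.9 default is an arbitrary choice.

```python
def build_timeout_query(timeout_seconds: float, near_fraction: float = 0.9) -> str:
    """Build the timeout-detection query with a threshold derived from the configured timeout."""
    if not 0 < near_fraction <= 1:
        raise ValueError("near_fraction must be in (0, 1]")
    # Flag invocations that used at least near_fraction of the allowed time.
    threshold_ms = round(timeout_seconds * 1000 * near_fraction)
    return "\n".join([
        "fields @timestamp, @duration, @logStream, @requestId",
        f"| filter @message like /Task timed out/ or @duration > {threshold_ms}",
        "| stats count(*) as timeouts by @logStream, bin(5m)",
        "| sort timeouts desc",
    ])


query_for_30s = build_timeout_query(30)  # threshold becomes 27000 ms
```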

---

Pattern 2: Alarm History to Deploy-Event Correlation

Process

1. **Get alarm transition time** — note the exact timestamp when the alarm entered ALARM state.

2. **Query CloudTrail** for deployment-related events in a window of [alarm_time - 30min, alarm_time]:

-- CloudTrail Lake query for deployment events
SELECT eventTime, eventName, userIdentity.arn, requestParameters
FROM <event-data-store-id>
WHERE eventTime > '<alarm_time_minus_30m>'
  AND eventTime < '<alarm_time>'
  AND eventName IN (
    'UpdateFunctionCode', 'UpdateFunctionConfiguration',
    'UpdateService', 'CreateDeployment', 'RegisterTaskDefinition',
    'CreateChangeSet', 'ExecuteChangeSet',
    'StartPipelineExecution', 'PutImage'
  )
ORDER BY eventTime DESC

3. **Correlation criteria** — a deploy is "correlated" if:

- It targets the same service/resource as the alarm

- It completed within 15 minutes before the alarm transition

- The deployer identity matches a CI/CD role (not a human applying a hotfix)

4. **Strengthening the correlation:**

- Check if the same alarm was healthy in the previous deployment cycle

- Verify no other environmental changes (scaling events, config changes) in the same window

- Look for canary/synthetic monitor failures that started at the same time
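
The lookback window from step 2 can be computed from the alarm transition time. The sketch below assumes the query accepts UTC timestamp literals in `YYYY-MM-DD HH:MM:SS` form; verify the format against your event data store. `deploy_window` is an illustrative helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed literal format, in UTC


def deploy_window(alarm_time: datetime, lookback_minutes: int = 30) -> Tuple[str, str]:
    """Return (start, end) strings for [alarm_time - lookback, alarm_time] in UTC."""
    if alarm_time.tzinfo is None:
        raise ValueError("alarm_time must be timezone-aware")
    alarm_utc = alarm_time.astimezone(timezone.utc)
    start_utc = alarm_utc - timedelta(minutes=lookback_minutes)
    return start_utc.strftime(TIMESTAMP_FORMAT), alarm_utc.strftime(TIMESTAMP_FORMAT)


alarm = datetime(2024, 3, 15, 14, 35, 7, tzinfo=timezone.utc)
window_start, window_end = deploy_window(alarm)
# window_start fills <alarm_time_minus_30m>, window_end fills <alarm_time>
```

Rejecting naive datetimes avoids silently shifting the window by the local UTC offset.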

Output Format

Deploy Correlation:
  Event: UpdateFunctionCode
  Time: 2024-03-15T14:23:07Z (12 min before alarm)
  Actor: arn:aws:sts::123456789012:assumed-role/github-actions-deploy/session
  Resource: arn:aws:lambda:us-east-1:123456789012:function:payment-processor
  Correlation: STRONG — same resource, CI/CD actor, alarm was OK prior cycle
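
The correlation criteria and the label in the output above could be encoded roughly as follows. This is a sketch under stated assumptions: the `DeployEvent` type, the `ci_role_markers` substring check, and the `MODERATE` and `NONE` labels are hypothetical; only `STRONG` appears in the skill itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence


@dataclass
class DeployEvent:
    event_name: str
    event_time: datetime
    actor_arn: str
    resource_arn: str


def classify_correlation(event: DeployEvent, alarm_time: datetime,
                         alarm_resource_arn: str,
                         alarm_ok_prior_cycle: bool,
                         ci_role_markers: Sequence[str] = ("github-actions",)) -> str:
    """Apply the three correlation criteria, then one strengthening signal."""
    lead = alarm_time - event.event_time
    same_resource = event.resource_arn == alarm_resource_arn
    within_window = timedelta(0) <= lead <= timedelta(minutes=15)
    ci_actor = any(marker in event.actor_arn for marker in ci_role_markers)

    if not (same_resource and within_window and ci_actor):
        return "NONE"
    return "STRONG" if alarm_ok_prior_cycle else "MODERATE"


FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:payment-processor"
deploy = DeployEvent(
    event_name="UpdateFunctionCode",
    event_time=datetime(2024, 3, 15, 14, 23, 7, tzinfo=timezone.utc),
    actor_arn="arn:aws:sts::123456789012:assumed-role/github-actions-deploy/session",
    resource_arn=FUNCTION_ARN,
)
alarm_at = datetime(2024, 3, 15, 14, 35, 7, tzinfo=timezone.utc)  # 12 min after deploy
label = classify_correlation(deploy, alarm_at, FUNCTION_ARN, alarm_ok_prior_cycle=True)
```

With the example values, all three criteria hold and the alarm was OK in the prior cycle, so `label` is `STRONG`.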

---

Pattern 3: Narrow the Blast Radius Decision Tree

Use this tree to sy

🎯 Best For

GitHub Copilot users
Claude users
Software engineers
Development teams
Tech leads

💡 Use Cases

Code quality improvement
Best practice enforcement

📖 How to Use This Skill

1. Install the Skill

Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.

2. Load into Your AI Assistant

Open GitHub Copilot or Claude and reference the skill. Paste the SKILL.md content or use the system prompt tab.

3. Apply AWS CloudWatch Investigation to Your Work

Open your project in the AI assistant and ask it to apply the skill. Start with a small module to verify the output quality.

4. Review and Refine

Review AI suggestions before committing. Run tests, check for regressions, and iterate on the skill output.

❓ Frequently Asked Questions

Is AWS CloudWatch Investigation compatible with Cursor and VS Code?

Yes — this skill works with any AI coding assistant including Cursor, VS Code with Copilot, and JetBrains IDEs.

Do I need specific dependencies for AWS CloudWatch Investigation?

Check the install command and Works With section. Most code skills only require the AI assistant and your codebase.

How do I install AWS CloudWatch Investigation?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/aws-cloudwatch-investigation/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Skipping validation

Always test AI-generated code changes, even for simple refactors.

Missing dependency updates

Check if the skill requires updated dependencies or new packages.
