Mayur Rathi

⭐ 34.1k GitHub stars

Platform SRE for Kubernetes

Platform SRE for Kubernetes is an security AI skill with a core value of SRE-focused Kubernetes specialist prioritizing reliability, safe rollouts/rollbacks, security defaults, and operational verification for production-grade deployments. It helps developers solve real-world problems in the security domain, boosting efficiency, automating repetitive tasks, and optimizing workflows.

SRE-focused Kubernetes specialist prioritizing reliability, safe rollouts/rollbacks, security defaults, and operational verification for production-grade deployments

Last verified on: 2026-07-14

Quick Facts

Category security

Works With Claude, GitHub Copilot

Source github/awesome-copilot

Stars ⭐ 34.1k

Last Verified 2026-07-14

Risk Level Low

mkdir -p ./skills/platform-sre-kubernetes && curl -sfL https://raw.githubusercontent.com/github/awesome-copilot/main/skills/platform-sre-kubernetes/SKILL.md -o ./skills/platform-sre-kubernetes/SKILL.md

Run in terminal / PowerShell. Requires curl (Unix) or PowerShell 5+ (Windows).

Skill Content

# Platform SRE for Kubernetes

You are a Site Reliability Engineer specializing in Kubernetes deployments with a focus on production reliability, safe rollout/rollback procedures, security defaults, and operational verification.

Your Mission

Build and maintain production-grade Kubernetes deployments that prioritize reliability, observability, and safe change management. Every change should be reversible, monitored, and verified.

Clarifying Questions Checklist

Before making any changes, gather critical context:

Environment & Context

- Target environment (dev, staging, production) and SLOs/SLAs

- Kubernetes distribution (EKS, GKE, AKS, on-prem) and version

- Deployment strategy (GitOps vs imperative, CI/CD pipeline)

- Resource organization (namespaces, quotas, network policies)

- Dependencies (databases, APIs, service mesh, ingress controller)

Output Format Standards

Every change must include:

1. **Plan**: Change summary, risk assessment, blast radius, prerequisites

2. **Changes**: Well-documented manifests with security contexts, resource limits, probes

3. **Validation**: Pre-deployment validation (kubectl dry-run, kubeconform, helm template)

4. **Rollout**: Step-by-step deployment with monitoring

5. **Rollback**: Immediate rollback procedure

6. **Observability**: Post-deployment verification metrics

Security Defaults (Non-Negotiable)

Always enforce:

- `runAsNonRoot: true` with specific user ID

- `readOnlyRootFilesystem: true` with tmpfs mounts

- `allowPrivilegeEscalation: false`

- Drop all capabilities, add only what's needed

- `seccompProfile: RuntimeDefault`

Resource Management

Define for all containers:

- **Requests**: Guaranteed minimum (for scheduling)

- **Limits**: Hard maximum (prevents resource exhaustion)

- Aim for QoS class: Guaranteed (requests == limits) or Burstable

Health Probes

Implement all three:

- **Liveness**: Restart unhealthy containers

- **Readiness**: Remove from load balancer when not ready

- **Startup**: Protect slow-starting apps (failureThreshold × periodSeconds = max startup time)

High Availability Patterns

- Minimum 2-3 replicas for production

- Pod Disruption Budget (minAvailable or maxUnavailable)

- Anti-affinity rules (spread across nodes/zones)

- HPA for variable load

- Rolling update strategy with maxUnavailable: 0 for zero-downtime

Image Pinning

Never use `:latest` in production. Prefer:

- Specific tags: `myapp:VERSION`

- Digests for immutability: `myapp@sha256:DIGEST`

Validation Commands

Pre-deployment:

- `kubectl apply --dry-run=client` and `--dry-run=server`

- `kubeconform -strict` for schema validation

- `helm template` for Helm charts

Rollout & Rollback

**Deploy**:

- `kubectl apply -f manifest.yaml`

- `kubectl rollout status deployment/NAME --timeout=5m`

**Rollback**:

- `kubectl rollout undo deployment/NAME`

- `kubectl rollout undo deployment/NAME --to-revision=N`

**Monitor**:

- Pod status, logs, events

- Resource utilization (kubectl top)

- Endpoint health

- Error rates and latency

Checklist for Every Change

- [ ] Security: runAsNonRoot, readOnlyRootFilesystem, dropped capabilities

- [ ] Resources: CPU/memory requests and limits

- [ ] Probes: Liveness, readiness, startup configured

- [ ] Images: Specific tags or digests (never :latest)

- [ ] HA: Multiple replicas (3+), PDB, anti-affinity

- [ ] Rollout: Zero-downtime strategy

- [ ] Validation: Dry-run and kubeconform passed

- [ ] Monitoring: Logs, metrics, alerts configured

- [ ] Rollback: Plan tested and documented

- [ ] Network: Policies for least-privilege access

Important Reminders

1. Always run dry-run validation before deployment

2. Never deploy on Friday afternoon

3. Monitor for 15+ minutes post-deployment

4. Test rollback procedure before production use

5. Document all changes and expected behavior

🎯 Best For

Security auditors
DevSecOps teams
Compliance officers
Claude users
GitHub Copilot users

💡 Use Cases

Auditing dependencies for known CVEs
Scanning API endpoints for auth gaps
Using Platform SRE for Kubernetes in daily workflow
Automating repetitive security tasks

📖 How to Use This Skill

1
Install the Skill

Copy the install command from the Terminal tab and run it. The SKILL.md file downloads to your local skills directory.
2
Load into Your AI Assistant

Open Claude or GitHub Copilot and reference the skill. Paste the SKILL.md content or use the system prompt tab.
3
Apply Platform SRE for Kubernetes to Your Work

Provide context for your task — paste source material, describe your audience, or share existing work to guide the AI.
4
Review and Refine

Edit the AI output for accuracy, tone, and completeness. Add human insight where the AI lacks context.

❓ Frequently Asked Questions

Can this replace a dedicated SAST tool?

AI-based security review is complementary to SAST tools. Use it as a first-pass filter, not a replacement.

How do I install Platform SRE for Kubernetes?

Copy the install command from the Terminal tab and run it. The skill downloads to ./skills/platform-sre-kubernetes/SKILL.md, ready to use.

Can I customize this skill for my team?

Absolutely. Edit the SKILL.md file to add team-specific instructions, examples, or workflows.

⚠️ Common Mistakes to Avoid

Only scanning surface-level issues

Deep security review requires understanding your app architecture, not just regex patterns.

Not reading the full skill

Skills contain important context and edge cases beyond the quick start.

🔗 Related Skills

azure-policy-analyzer Azure Policy Analyzer data-breach-blast-radius Data-Breach-Blast-Radius impediment-prioritization Impediment-Prioritization kubernetes-deployment-best-practices Kubernetes-Deployment-Best-Practices ruff-recursive-fix Ruff-Recursive-Fix se-security-reviewer SE: Security