21 Agent Skills for DevOps & Platform
1 stack
Skills for CI/CD, infrastructure, observability, incident response, and platform engineering.
Pipeline design, secrets management, runbooks, cloud infrastructure, and the platform engineering workflow from deployment to incident postmortem.
Read the guide: The best Agent Skills for devops & platform →
New to Agent Skills? Learn how to install one in under a minute →
Infrastructure work is complex enough without the documentation debt. Runbooks don't write themselves, incident postmortems go unfinished, and onboarding docs for new engineers lag behind the actual state of the system. These skills are meant to close that gap.
The skills here cover CI/CD pipeline design, secrets management, cloud infrastructure setup, runbook writing, incident response workflows, observability setup, and platform engineering documentation. They're built for the workflows that happen around the code, not just in it.
They are most useful for platform engineers managing shared infrastructure and for DevOps engineers who need consistent operational docs across a growing stack.
Stacks for DevOps & Platform
Skills for DevOps & Platform
AWS CDK Development
by @zxkane
AWS CDK expert skill for building cloud infrastructure with TypeScript or Python using best-practice CDK patterns.
AWS Cost Operations
by @zxkane
AWS cost optimization and operations skill for pricing analysis, CloudWatch monitoring, budget review, and operational excellence.
AWS Solution Architect
by @alirezarezvani
Cloud infrastructure design and optimization on AWS — VPCs, IAM, compute, databases, serverless, and cost optimization from a certified architect perspective.
CI/CD Pipeline Builder
by @alirezarezvani
Build production CI/CD pipelines for GitHub Actions, GitLab CI, and CircleCI — from lint and test to deploy with environment promotion and rollbacks.
Engineering Deploy Checklist
by @anthropics
Pre-deployment verification checklist. Use when about to ship a release, deploying a change with database migrations or feature flags, verifying CI status and approvals before going to production, or documenting rollback triggers ahead of time.
Engineering Incident Response
by @anthropics
Run an incident response workflow: triage, communicate, and write a postmortem. Trigger with "we have an incident", "production is down", an alert that needs severity assessment, a status update mid-incident, or when writing a blameless postmortem after resolution.
Environment & Secrets Manager
by @alirezarezvani
Design secure secrets management workflows — vaults, rotation policies, environment variable hygiene, and developer-friendly secret distribution.
gstack: Post-Deploy Canary Monitor
by @garrytan
Watches the live app after a deploy for console errors, performance regressions, and page failures. Takes periodic screenshots, compares against pre-deploy baselines, and alerts on anomalies.
gstack: Destructive Command Guardrails
by @garrytan
Warns before running rm -rf, DROP TABLE, force-push, git reset --hard, kubectl delete, and similar destructive operations. You can override each warning. Scoped to the current session.
gstack: Chief Security Officer Audit
by @garrytan
Infrastructure-first security audit: secrets archaeology, dependency supply chain, CI/CD security, OWASP Top 10, and STRIDE threat modelling. Zero noise — 8/10 confidence gate, 17 false positive exclusions. Every finding includes a concrete exploit scenario.
gstack: Edit Scope Lock
by @garrytan
Restricts all file edits to a single directory for the session. Blocks Edit and Write operations outside the allowed path — prevents accidentally changing unrelated code while debugging.
gstack: Full Safety Mode
by @garrytan
Combines /careful (warns before destructive commands) and /freeze (locks edits to one directory) in a single command. Maximum safety for production work or high-stakes debugging.
gstack: Land and Deploy
by @garrytan
Merges the PR, waits for CI to pass, deploys to production, and verifies production health via canary checks. One command from approved PR to verified live deploy.
gstack: Deployment Configurator
by @garrytan
One-time setup for /land-and-deploy. Detects your deploy platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom), production URL, and health check endpoints.
Incident Commander
by @alirezarezvani
Lead incident response from detection to resolution — coordinate teams, run war rooms, draft status updates, and produce postmortems.
Migration Architect
by @alirezarezvani
Plan and execute code and system migrations — database migrations, framework upgrades, cloud migrations, and monolith-to-microservices transitions.
Observability Designer
by @alirezarezvani
Design comprehensive observability for distributed systems — metrics, logs, traces, alerting rules, and dashboards that surface real problems fast.
Operations Change Request
by @anthropics
Create a change management request with impact analysis and rollback plan. Use when proposing a system or process change that needs approval, preparing a change record for CAB review, documenting risk and rollback steps before a deployment, or planning stakeholder communications for a rollout.
Operations Runbook
by @anthropics
Create or update an operational runbook for a recurring task or procedure. Use when documenting a task that on-call or ops needs to run repeatably, turning tribal knowledge into exact step-by-step commands, adding troubleshooting and rollback steps to an existing procedure, or writing escalation paths for when things go wrong.
Runbook Generator
by @alirezarezvani
Generate clear operational runbooks — step-by-step procedures for deployments, incident response, disaster recovery, and routine maintenance tasks.
Senior Security Engineer
by @alirezarezvani
Threat modeling, penetration testing guidance, zero-trust architecture design, and security code review from a senior security engineering perspective.
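To make one of the entries above more concrete: the Edit Scope Lock description says edits are restricted to a single directory for the session. The listing does not show how gstack implements this, so the following is only a minimal Python sketch of that kind of containment check. The function name `is_edit_allowed` and the example paths are hypothetical.

```python
from pathlib import Path


def is_edit_allowed(target: str, allowed_dir: str) -> bool:
    """Return True if `target` resolves to a location inside `allowed_dir`.

    Hypothetical helper for illustration; not taken from any listed skill.
    """
    # Resolve both paths so that ".." segments and symlinks cannot be
    # used to step outside the allowed directory.
    allowed = Path(allowed_dir).resolve()
    candidate = Path(target).resolve()
    try:
        # relative_to compares whole path components, so "/repo/src-old"
        # is not mistaken for a child of "/repo/src".
        candidate.relative_to(allowed)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Example paths are assumptions chosen for the demo.
    print(is_edit_allowed("/repo/src/app.py", "/repo/src"))
    print(is_edit_allowed("/repo/src/../infra/main.tf", "/repo/src"))
```

Comparing resolved path components, rather than raw string prefixes, is the design choice that matters here: a prefix check would wrongly accept a sibling directory whose name merely starts with the allowed one.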