⚙️

21 Agent Skills for DevOps & Platform

1 stack

Skills for CI/CD, infrastructure, observability, incident response, and platform engineering.

Pipeline design, secrets management, runbooks, cloud infrastructure, and the platform engineering workflow from deployment to incident postmortem.

Read the guide: The best Agent Skills for DevOps & Platform →

New to Agent Skills? Learn how to install one in under a minute →

Infrastructure work is complex enough without the documentation debt. Runbooks don't write themselves, incident postmortems don't get completed, and onboarding docs for new engineers lag behind the actual state of the system. These skills fix that.

The skills here cover CI/CD pipeline design, secrets management, cloud infrastructure setup, runbook writing, incident response workflows, observability setup, and platform engineering documentation. They're built for the workflows that happen around the code, not just in it.

These skills are most useful for platform engineers managing shared infrastructure and for DevOps engineers who need consistent operational docs across a growing stack.

Skills for DevOps & Platform

🏗️

AWS CDK Development

by @zxkane

Development

AWS CDK expert skill for building cloud infrastructure with TypeScript or Python using best-practice CDK patterns.

View & install →
💸

AWS Cost Operations

by @zxkane

Data & Analysis

AWS cost optimization and operations skill for pricing analysis, CloudWatch monitoring, budget review, and operational excellence.

View & install →
☁️

AWS Solution Architect

by @alirezarezvani

Development

Cloud infrastructure design and optimization on AWS — VPCs, IAM, compute, databases, serverless, and cost optimization from a certified architect perspective.

View & install →
⚙️

CI/CD Pipeline Builder

by @alirezarezvani

Development

Build production CI/CD pipelines for GitHub Actions, GitLab CI, and CircleCI — from lint and test to deploy with environment promotion and rollbacks.

View & install →
🛠️

Engineering Deploy Checklist

by @anthropics

Development

Pre-deployment verification checklist. Use when about to ship a release, deploying a change with database migrations or feature flags, verifying CI status and approvals before going to production, or documenting rollback triggers ahead of time.

View & install →
🛠️

Engineering Incident Response

by @anthropics

Development

Run an incident response workflow: triage, communicate, and write a postmortem. Trigger with "we have an incident", "production is down", an alert that needs severity assessment, a status update mid-incident, or when writing a blameless postmortem after resolution.

View & install →
🔒

Environment & Secrets Manager

by @alirezarezvani

Development

Design secure secrets management workflows — vaults, rotation policies, environment variable hygiene, and developer-friendly secret distribution.

View & install →
🐦

gstack: Post-Deploy Canary Monitor

by @garrytan

Development

Watches the live app after a deploy for console errors, performance regressions, and page failures. Takes periodic screenshots, compares against pre-deploy baselines, and alerts on anomalies.

View & install →
⚠️

gstack: Destructive Command Guardrails

by @garrytan

Development

Warns before running rm -rf, DROP TABLE, force-push, git reset --hard, kubectl delete, and similar destructive operations. You can override each warning. Scoped to the current session.

View & install →
🔒

gstack: Chief Security Officer Audit

by @garrytan

Development

Infrastructure-first security audit: secrets archaeology, dependency supply chain, CI/CD security, OWASP Top 10, and STRIDE threat modelling. Zero noise — 8/10 confidence gate, 17 false positive exclusions. Every finding includes a concrete exploit scenario.

View & install →
🧊

gstack: Edit Scope Lock

by @garrytan

Development

Restricts all file edits to a single directory for the session. Blocks Edit and Write operations outside the allowed path — prevents accidentally changing unrelated code while debugging.

View & install →
🛡️

gstack: Full Safety Mode

by @garrytan

Development

Combines /careful (warns before destructive commands) and /freeze (locks edits to one directory) in a single command. Maximum safety for production work or high-stakes debugging.

View & install →
🚀

gstack: Land and Deploy

by @garrytan

Development

Merges the PR, waits for CI to pass, deploys to production, and verifies production health via canary checks. One command from approved PR to verified live deploy.

View & install →
⚙️

gstack: Deployment Configurator

by @garrytan

Development

One-time setup for /land-and-deploy. Detects your deploy platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom), production URL, and health check endpoints.

View & install →
🚨

Incident Commander

by @alirezarezvani

Development

Lead incident response from detection to resolution — coordinate teams, run war rooms, draft status updates, and produce postmortems.

View & install →
🚚

Migration Architect

by @alirezarezvani

Development

Plan and execute code and system migrations — database migrations, framework upgrades, cloud migrations, and monolith-to-microservices transitions.

View & install →
📡

Observability Designer

by @alirezarezvani

Development

Design comprehensive observability for distributed systems — metrics, logs, traces, alerting rules, and dashboards that surface real problems fast.

View & install →
⚙️

Operations Change Request

by @anthropics

Operations

Create a change management request with impact analysis and rollback plan. Use when proposing a system or process change that needs approval, preparing a change record for CAB review, documenting risk and rollback steps before a deployment, or planning stakeholder communications for a rollout.

View & install →
⚙️

Operations Runbook

by @anthropics

Operations

Create or update an operational runbook for a recurring task or procedure. Use when documenting a task that on-call or ops needs to run repeatably, turning tribal knowledge into exact step-by-step commands, adding troubleshooting and rollback steps to an existing procedure, or writing escalation paths for when things go wrong.

View & install →
📋

Runbook Generator

by @alirezarezvani

Development

Generate clear operational runbooks — step-by-step procedures for deployments, incident response, disaster recovery, and routine maintenance tasks.

View & install →
🔐

Senior Security Engineer

by @alirezarezvani

Development

Threat modeling, penetration testing guidance, zero-trust architecture design, and security code review from a senior security engineering perspective.

View & install →