cheatsheets Jun 28, 2026 updated Jun 28, 2026
Kubernetes Operational Checklist
A small operational checklist for Kubernetes services and AI workloads.
- Status: evergreen
- Visibility: public
- Category: Infrastructure
- Difficulty: intermediate
- Published: Jun 28, 2026
- Updated: Jun 28, 2026
Deployment
- Requests and limits are set.
- Readiness and liveness probes exist.
- Rollout strategy is understood (RollingUpdate vs. Recreate, and the maxSurge and maxUnavailable settings).
- Config and secrets are separated.
- Service account permissions are scoped.
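Several of the deployment items above can be checked mechanically before a manifest ever reaches the cluster. The following is a minimal sketch, not a complete linter: `audit_deployment` is a hypothetical helper, it assumes the manifest has already been parsed into a Python dict (for example from YAML), and the secret-name heuristic (`PASSWORD`, `TOKEN`, `SECRET`) is an illustrative assumption.

```python
from typing import Any


def audit_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable findings for an apps/v1 Deployment manifest dict."""
    findings: list[str] = []
    spec = manifest.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})

    # Rollout strategy: Kubernetes falls back to RollingUpdate when unset,
    # so flag manifests that rely on the implicit default.
    if "strategy" not in spec:
        findings.append("spec.strategy is not set explicitly")

    # Service account: the namespace's "default" account is shared by every
    # pod that does not name one, so its permissions are hard to scope.
    if pod_spec.get("serviceAccountName", "default") == "default":
        findings.append("pod uses the default service account")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        for key in ("requests", "limits"):
            if not resources.get(key):
                findings.append(f"{name}: resources.{key} is missing")
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                findings.append(f"{name}: {probe} is missing")
        # Config vs. secrets: a secret-looking env var with a literal value
        # should come from a Secret via valueFrom.secretKeyRef instead.
        for env in container.get("env", []):
            env_name = env.get("name", "")
            looks_secret = any(
                word in env_name.upper() for word in ("PASSWORD", "TOKEN", "SECRET")
            )
            if looks_secret and "value" in env:
                findings.append(f"{name}: env {env_name} is an inline literal")
    return findings


# Hypothetical manifest with several checklist gaps.
example = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "api",
                        "resources": {"requests": {"cpu": "250m", "memory": "256Mi"}},
                        "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
                        "env": [{"name": "DB_PASSWORD", "value": "change-me"}],
                    }
                ]
            }
        }
    },
}

findings = audit_deployment(example)
for finding in findings:
    print(finding)
```

A check like this fits in CI as a pre-merge gate. It does not replace admission policies or a review of the Role and RoleBinding objects attached to the service account.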
Debugging
- Logs are searchable by deployment, pod, and request ID.
- Dashboards show error rate, latency, CPU, memory, and restarts.
- Runbook includes kubectl commands.
- Rollback command is documented.
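One way to keep the runbook's kubectl commands consistent is to generate them from the deployment name and namespace. The sketch below only builds the command lines and does not run them; `runbook_commands` is a hypothetical helper, and the deployment name, namespace, and 15-minute log window are placeholder assumptions.

```python
import shlex


def runbook_commands(deployment: str, namespace: str = "default") -> dict[str, list[str]]:
    """Build argv lists for common triage and rollback commands (not executed)."""
    target = f"deployment/{deployment}"
    ns = ["-n", namespace]
    return {
        # Is a rollout in progress or stuck?
        "rollout status": ["kubectl", "rollout", "status", target, *ns],
        # Which revisions exist to roll back to?
        "rollout history": ["kubectl", "rollout", "history", target, *ns],
        # Logs from one pod of the deployment, all of its containers.
        "recent logs": ["kubectl", "logs", target, "--since=15m", "--all-containers", *ns],
        # Scheduling failures, OOM kills, and probe failures appear as events.
        "recent events": ["kubectl", "get", "events", "--sort-by=.lastTimestamp", *ns],
        # Roll back to the previous revision.
        "rollback": ["kubectl", "rollout", "undo", target, *ns],
    }


commands = runbook_commands("checkout-api", "prod")
for label, argv in commands.items():
    print(f"{label}: {shlex.join(argv)}")
```

Keeping the commands as argv lists means they can be pasted into a runbook via `shlex.join` or passed to `subprocess.run` without shell quoting problems.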
AI Workloads
- GPU scheduling is explicit (GPU resource limits, node selection, and tolerations are declared).
- Model artifact storage is documented.
- Warmup behavior is known (model load time and first-request latency).
- Queue depth and job failure rate are visible.
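As an illustration of explicit GPU scheduling, documented model storage, and warmup handling, the sketch below builds a pod spec fragment as a Python dict. It assumes the NVIDIA device plugin is installed (which exposes the `nvidia.com/gpu` resource). The node label `accelerator: nvidia-gpu`, the taint key, the `/models` mount path, the `/ready` endpoint, port 8000, and the function `gpu_pod_spec` are all hypothetical and will differ per cluster.

```python
import json
from typing import Any


def gpu_pod_spec(image: str, gpus: int, model_pvc: str) -> dict[str, Any]:
    """Build a pod spec fragment for a GPU inference container (illustrative)."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        # Hypothetical node label; use whatever label the GPU nodes carry.
        "nodeSelector": {"accelerator": "nvidia-gpu"},
        # Assumes GPU nodes are tainted with this key; adjust to the cluster.
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
        "containers": [
            {
                "name": "inference",
                "image": image,
                # Extended resources such as GPUs are declared under limits.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                # Warmup: allow up to 30 * 10s = 300s for model load before
                # liveness and readiness checks take over.
                "startupProbe": {
                    "httpGet": {"path": "/ready", "port": 8000},
                    "periodSeconds": 10,
                    "failureThreshold": 30,
                },
                "volumeMounts": [
                    {"name": "models", "mountPath": "/models", "readOnly": True}
                ],
            }
        ],
        # Model artifacts come from a named PersistentVolumeClaim, so the
        # storage location is visible in the manifest itself.
        "volumes": [
            {"name": "models", "persistentVolumeClaim": {"claimName": model_pvc}}
        ],
    }


spec = gpu_pod_spec("registry.example.com/llm-server:1.0", gpus=1, model_pvc="model-cache")
print(json.dumps(spec, indent=2))
```

The startup probe budget should be sized from measured model load times rather than guessed. Queue depth and job failure rate still need to be exported as metrics by the workload itself.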
Related Notes
- Kubernetes Basics for AI Workloads (Docs, intermediate, Jun 28, 2026): A practical map of Kubernetes concepts that matter for backend and AI infrastructure work.
- FastAPI Production Checklist (Cheat Sheets, intermediate, Jun 28, 2026): A compact checklist for taking a FastAPI service from useful prototype to production-ready backend.
- LLM API Reliability Checklist (Cheat Sheets, advanced, Jun 28, 2026): A checklist for integrating external LLM and model APIs safely.
- Observability and Reliability Checklist (Cheat Sheets, intermediate, Jun 28, 2026): A checklist for making backend services debuggable before they are painful.
- Observability and Reliability Basics (Docs, intermediate, Jun 28, 2026): A backend engineer's starting point for logs, metrics, traces, alerts, and incident-ready systems.