cheatsheets Jun 28, 2026 updated Jun 28, 2026
Observability and Reliability Checklist
A checklist for making backend services debuggable before debugging them becomes painful.
- Status: evergreen
- Visibility: public
- Category: Reliability
- Difficulty: intermediate
- Published: Jun 28, 2026
- Updated: Jun 28, 2026
Logs
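- Logs are structured (for example, JSON with consistent field names), so they can be filtered by field.
- Every request has a request ID that appears on each log line it produces.
- Async work carries a job ID, so background jobs can be followed the same way.
- Sensitive data (passwords, tokens, personal data) is redacted before it is logged.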
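The four log items above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a recommended production setup: the names `request_id_var`, `job_id_var`, `SENSITIVE_KEYS`, `JsonFormatter`, and the `fields` key passed through `extra` are all hypothetical choices made for this example.

```python
import contextvars
import io
import json
import logging

# Hypothetical names for this sketch; a real service would set these in
# request middleware or at the start of a background job.
request_id_var = contextvars.ContextVar("request_id", default=None)
job_id_var = contextvars.ContextVar("job_id", default=None)

# Assumed list of keys to mask; extend it to match your own data.
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key"}


def redact(fields):
    """Return a copy of fields with sensitive values masked."""
    return {
        key: ("[REDACTED]" if key.lower() in SENSITIVE_KEYS else value)
        for key, value in fields.items()
    }


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with request and job IDs."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
            "job_id": job_id_var.get(),
        }
        # Extra structured fields arrive via logger.info(..., extra={"fields": {...}}).
        payload.update(redact(getattr(record, "fields", {})))
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checklist.example")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo_log_line():
    """Emit one log line and return it parsed back into a dict."""
    stream = io.StringIO()
    logger = build_logger(stream)
    request_id_var.set("req-123")
    logger.info("login attempt", extra={"fields": {"user": "ada", "password": "hunter2"}})
    return json.loads(stream.getvalue())
```

Calling `demo_log_line()` returns a dict in which `request_id` is set, `job_id` is `None` because no job is active, and the `password` value is masked. Context variables are used here so the IDs follow the current request or task without being passed to every log call.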
Metrics
- Request rate, error rate, and latency are visible.
- Latency of calls to dependencies (databases, external APIs) is visible.
- Queue depth is visible wherever the service uses queues.
- Usage of cost-sensitive APIs is tracked.
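To make "request rate, error rate, and latency" concrete, here is a small in-memory sketch of how the three numbers are derived from raw request samples. It is an illustration only: `RequestStats` and its methods are hypothetical names, the percentile uses the nearest-rank method, and a real service would normally export these values through a metrics library instead of computing them by hand.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RequestStats:
    """Samples collected over one observation window (hypothetical helper)."""

    window_seconds: float
    latencies_ms: list = field(default_factory=list)
    errors: int = 0

    def record(self, latency_ms, ok):
        """Store one finished request and whether it succeeded."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def request_rate(self):
        """Requests per second over the window."""
        return len(self.latencies_ms) / self.window_seconds

    def error_rate(self):
        """Fraction of requests that failed (0.0 when there were none)."""
        total = len(self.latencies_ms)
        return self.errors / total if total else 0.0

    def latency_percentile(self, pct):
        """Nearest-rank percentile of latency, or None with no samples."""
        if not self.latencies_ms:
            return None
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


def demo_stats():
    """Record four requests in a 10-second window, one of them failing."""
    stats = RequestStats(window_seconds=10)
    for latency_ms, ok in [(10, True), (20, True), (30, True), (40, False)]:
        stats.record(latency_ms, ok)
    return stats
```

The same structure can be instantiated once per dependency to cover dependency latency, assuming each outbound call is timed and recorded separately.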
Alerts
- Each alert maps to user impact.
- Each alert has a clear owner.
- Each alert links to a runbook.
- The rollback path is documented.
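One way to keep the alert items from drifting is to lint alert definitions for the fields the checklist asks for. The sketch below assumes alerts are available as plain dictionaries; the field names `user_impact`, `owner`, `runbook_url`, and `rollback_doc`, and the example URLs, are hypothetical and not tied to any particular alerting tool.

```python
# Hypothetical field names, one per checklist item above.
REQUIRED_FIELDS = ("user_impact", "owner", "runbook_url", "rollback_doc")


def missing_alert_fields(alert):
    """Return the required fields that are absent or blank for one alert."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(alert.get(name) or "").strip()
    ]


def lint_alerts(alerts):
    """Map each incomplete alert's name to the fields it is missing."""
    problems = {}
    for alert in alerts:
        missing = missing_alert_fields(alert)
        if missing:
            problems[alert.get("name", "<unnamed>")] = missing
    return problems


# Example definitions; every value here is made up for illustration.
EXAMPLE_ALERTS = [
    {
        "name": "checkout-error-rate",
        "user_impact": "Customers cannot complete purchases",
        "owner": "payments-team",
        "runbook_url": "https://example.com/runbooks/checkout-errors",
        "rollback_doc": "https://example.com/docs/checkout-rollback",
    },
    {
        "name": "disk-usage-high",
        "owner": "",
        "runbook_url": "https://example.com/runbooks/disk",
    },
]
```

Running `lint_alerts(EXAMPLE_ALERTS)` reports only the second alert, which has no user impact, owner, or rollback document. A check like this could run in CI so incomplete alerts are caught before they page anyone.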
Related Notes
- Observability and Reliability Basics (Docs, Jun 28, 2026, intermediate): A backend engineer's starting point for logs, metrics, traces, alerts, and incident-ready systems.
- LLM API Reliability Checklist (Cheat Sheets, Jun 28, 2026, advanced): A checklist for integrating external LLM and model APIs safely.
- Why I'm Building an AI Infrastructure Learning OS (Blog, Jun 28, 2026, intermediate): A personal operating system for turning backend and AI infrastructure learning into durable, searchable engineering knowledge.
- FastAPI Production Checklist (Cheat Sheets, Jun 28, 2026, intermediate): A compact checklist for taking a FastAPI service from useful prototype to production-ready backend.
- Kubernetes Operational Checklist (Cheat Sheets, Jun 28, 2026, intermediate): A small operational checklist for Kubernetes services and AI workloads.