series
MLOps Journey
Building a production ML platform on Kubernetes from scratch — infrastructure to chaos engineering.
11 episodes · Mar 2025 – Mar 2026
→ post
SLOs as a Conversation Tool, Not a Metric
The most valuable thing about Service Level Objectives isn't the number — it's what defining one forces you to discuss
Mar 10, 2026
SRE · Observability · Culture
→ post
Python Dependency Hell in ML Projects
Why your ML environment works on your laptop and breaks in production — and how to fix it for good
Feb 24, 2026
MLOps · Python · Containers
→ post
Writing Runbooks That Actually Help
Most runbooks are useless at 3 a.m. Here's how to write ones that aren't
Feb 10, 2026
SRE · Culture · Observability
→ post
Requests, Limits, and the Lies We Tell the Scheduler
Why misconfigured resource requests are the root cause of half your mysterious cluster problems
Jan 28, 2026
K8s · Performance · SRE
→ post
The On-Call Tax
What pager fatigue actually costs you — and how to measure it
Jan 15, 2026
SRE · Observability · Culture