Building production-grade ML systems on Kubernetes. I care about reliability, observability, and security — not just model accuracy.
Senior Platform & SRE Engineer with 9+ years of experience designing large-scale cloud platforms, Kubernetes infrastructure, and high-volume observability stacks.
At DeliveryHero I architected a cloud bootstrapping platform that automated AWS account provisioning across the org. I also maintained an observability stack handling 3M+ active series, with SLO monitoring that brought average incident response time down to 3 minutes.
I’m drawn to the intersection of reliability and machine learning: where production discipline meets model chaos. I write about the trade-offs, the failures, and the tooling that actually holds up under load.
Open to conversations about MLOps, platform engineering, SRE, or just building stuff on Kubernetes. Find me on GitHub or send me an email.