SRE | DevOps | MLOps

Thomas Nyambati

Building production-grade ML systems on Kubernetes. I care about reliability, observability, and security — not just model accuracy.

About

Senior Platform & SRE Engineer with 9+ years of experience designing large-scale cloud platforms, Kubernetes infrastructure, and high-volume observability stacks.

I’m drawn to the intersection of reliability and machine learning: where production discipline meets model chaos. I write about the trade-offs, the failures, and the tooling that actually holds up under load.

Current stack
  • Kubernetes / EKS / GKE
  • ArgoCD / GitOps
  • Prometheus + Mimir
  • Grafana / Loki / Tempo
  • Terraform / Terragrunt
  • Go / Python
  • Karpenter / HPA / VPA
  • Helm / Helmfile
  • GitHub Actions
  • AWS / GCP
Blog
Migrating Mimir's KV Store to Memberlist: What We Learned
A narrative account of migrating from Consul to Memberlist — the why, the phases, and the lessons.
Apr 3, 2026
Kubernetes | Grafana | Mimir | Observability
Contact

Let's talk.

Open to conversations about MLOps, platform engineering, SRE, or just building stuff on Kubernetes. Find me on GitHub or send me an email.