The Misconception
GitOps sounds simple: store your infrastructure in Git and apply it. But that description misses the key insight: an agent running inside the cluster pulls the desired state from Git, rather than an external CI job pushing changes into the cluster.
This inversion matters enormously for ML systems, where you often have multiple pipelines, experiment runs, and model versions in flight simultaneously.
How FluxCD Works
FluxCD runs as a set of controllers inside the cluster:
- Source Controller — watches your Git repo (or Helm repo, OCI registry)
- Kustomize Controller — applies Kustomize overlays
- Helm Controller — manages Helm releases as Kubernetes objects
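The source controller fetches the repository at the interval set on each source object (1m is a common choice). When a new commit arrives, the downstream controllers apply it. The Kustomize controller also re-applies the declared state on its own interval, which reverts drift in the objects it manages.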
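The fetch-then-reconcile loop described above can be sketched as two Flux objects. This is a minimal illustration, assuming the GA v1 APIs for both kinds; the repository URL, object names, and path are hypothetical placeholders, not taken from a real setup.

```yaml
# Hypothetical names, URL, and path; adjust to your own repository layout.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: ml-platform
  namespace: flux-system
spec:
  interval: 1m            # how often the source controller fetches the repo
  url: https://github.com/example-org/ml-platform
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: mlops
  namespace: flux-system
spec:
  interval: 10m           # how often the declared state is re-applied
  sourceRef:
    kind: GitRepository
    name: ml-platform
  path: ./clusters/prod/mlops
  prune: true             # delete objects that were removed from Git
```

The two intervals are independent: the first controls how quickly a new commit is noticed, the second how often the cluster is checked against what Git declares. Setting prune to true is what lets Flux clean up resources that no longer appear in the repository.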
Why This Matters for ML
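ML systems have a tendency to accumulate “snowflake” state: a model that was manually deployed, an experiment that left orphaned resources, a config that was edited in place. GitOps makes this visible and reversible, because every intended change is a commit that can be reviewed and reverted, and anything in the cluster that differs from Git shows up as drift.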
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mlflow
  namespace: mlops
spec:
  interval: 10m
  chart:
    spec:
      chart: mlflow
      version: ">=1.0.0"
      sourceRef:
        kind: HelmRepository
        name: community-charts
  values:
    backendStore:
      postgres:
        enabled: true
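This HelmRelease is the source of truth for MLflow. Every 10 minutes the Helm controller compares the release with this spec and upgrades it if the chart or values have changed. Reverting manual in-cluster edits, such as someone scaling down the deployment, additionally requires the Helm controller's drift detection, which is off by default. With it enabled, FluxCD restores the declared replica count at the next reconciliation.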
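Reverting manual changes to a Helm release can be requested explicitly. The sketch below assumes a Flux version that serves the helm.toolkit.fluxcd.io/v2 API, where HelmRelease has a spec.driftDetection field; older installations using v2beta1 may need a controller feature gate instead, so check the API version your cluster actually serves.

```yaml
# Sketch only: same release as above, on the v2 API with drift detection on.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: mlflow
  namespace: mlops
spec:
  interval: 10m
  driftDetection:
    mode: enabled         # "warn" reports drift without correcting it
  chart:
    spec:
      chart: mlflow
      version: ">=1.0.0"
      sourceRef:
        kind: HelmRepository
        name: community-charts
  values:
    backendStore:
      postgres:
        enabled: true
```

Starting with mode set to warn is a cautious way to find out how much manual change a cluster has accumulated before letting the controller undo it.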
The Culture Shift
GitOps only works if the team commits to the constraint: if it’s not in Git, it doesn’t exist. That’s a cultural change as much as a technical one.