The Misconception

GitOps sounds simple: store your infrastructure in Git and apply it. But that description misses the key insight: the cluster pulls its desired state from Git, rather than an external pipeline pushing changes into the cluster.

This inversion matters enormously for ML systems, where you often have multiple pipelines, experiment runs, and model versions in flight simultaneously.

How FluxCD Works

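FluxCD runs as a set of controllers inside the cluster, among them the source controller (fetches Git and Helm sources), the kustomize controller (applies manifests), and the helm controller (manages Helm releases).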

Every minute (the interval is configurable per source), the source controller fetches the latest commit. If the cluster has drifted from the state declared in that commit, the relevant controller reconciles it.
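
The polling loop described above could be declared roughly as follows. This is a minimal sketch, not a definitive setup: the repository URL, the resource name ml-platform, and the path ./clusters/production are placeholders assumed for illustration, and the API versions assume a recent Flux release.

```yaml
# Source: tells the source controller which repository to watch and how often.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: ml-platform            # hypothetical name
  namespace: flux-system
spec:
  interval: 1m                 # how often to check Git for a new commit
  url: https://github.com/example-org/ml-platform   # placeholder URL
  ref:
    branch: main
---
# Reconciler: tells the kustomize controller what to apply from that source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: ml-platform            # hypothetical name
  namespace: flux-system
spec:
  interval: 10m                # how often to re-apply and correct drift
  sourceRef:
    kind: GitRepository
    name: ml-platform
  path: ./clusters/production  # hypothetical path inside the repository
  prune: true                  # delete cluster objects that were removed from Git
```

The two intervals are independent: the first controls how quickly a new commit is noticed, the second how often the applied manifests are checked against the cluster. Setting prune to true is what lets resources removed from Git be cleaned up rather than left orphaned.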

Why This Matters for ML

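ML systems tend to accumulate “snowflake” state: a model that was manually deployed, an experiment that left orphaned resources, a config that was edited in place. GitOps makes this visible and reversible, because every intended change is a commit: drift shows up as a difference from Git, and a bad change can be undone with a revert.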

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mlflow
  namespace: mlops
spec:
  interval: 10m
  chart:
    spec:
      chart: mlflow
      version: ">=1.0.0"
      sourceRef:
        kind: HelmRepository
        name: community-charts
  values:
    backendStore:
      postgres:
        enabled: true

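This HelmRelease is the source of truth for MLflow. One caveat: the helm controller only reverts manual in-cluster changes when drift detection is turned on, which is not the default and depends on the Flux version. With it enabled, if someone manually scales down the deployment, FluxCD will scale it back up on the next reconciliation, within 10 minutes.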
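
Reverting a manual scale-down relies on the helm controller's drift detection, which is opt-in. A sketch of enabling it, assuming a Flux release that serves the helm.toolkit.fluxcd.io/v2 API (older API versions, including the one shown above, may not support this field):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2   # assumes a Flux version that serves v2
kind: HelmRelease
metadata:
  name: mlflow
  namespace: mlops
spec:
  interval: 10m
  # Compare the live objects against the Helm release on each reconciliation
  # and correct any differences. Use "warn" to report drift without fixing it.
  driftDetection:
    mode: enabled
  chart:
    spec:
      chart: mlflow
      version: ">=1.0.0"
      sourceRef:
        kind: HelmRepository
        name: community-charts
```

Starting with mode set to warn is a reasonable way to see how much drift exists before letting the controller correct it automatically.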

The Culture Shift

GitOps only works if the team commits to the constraint: if it’s not in Git, it doesn’t exist. That’s a cultural change as much as a technical one.