Securing ML Systems on Kubernetes

Why Security Is an MLOps Problem, Not Just an Infra Problem

ML systems have unique attack surfaces. A compromised training pipeline can inject poisoned data. A model artifact that’s been tampered with can produce subtly wrong predictions at scale. An exposed MLflow API can leak proprietary model architectures and training data.

Standard Kubernetes security hygiene doesn’t cover these cases, and ML-specific controls don’t replace that hygiene. You need both.

The CIS Benchmark Baseline

I ran kube-bench against my k3s cluster after Phase 1. The results were humbling — not because k3s is insecure by default, but because defaults are designed for usability, not security.

Key findings that mattered for ML workloads:

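Anonymous authentication enabled: The API server accepted requests without credentials and treated them as the system:anonymous user, so anything RBAC granted to that user was reachable by anyone who could reach the endpoint. For a cluster running ML experiments on proprietary data, this is a critical finding. Fix: pass --anonymous-auth=false to the API server (on k3s, --kube-apiserver-arg=anonymous-auth=false).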
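A quick way to confirm the fix is to send a request with no credentials and look only at the status code. The helper below is a hypothetical sketch, not part of kube-bench, and it assumes you know the API server URL (for k3s, typically https://<server>:6443). A 401 means anonymous auth is off; a 403 still means it is on, because the request was authenticated as system:anonymous and only then refused by RBAC. It probes /api because, unlike /version or /healthz, the default RBAC roles do not grant it to unauthenticated users.

```python
import ssl
import urllib.error
import urllib.request


def classify_anonymous_status(status: int) -> str:
    """Map the status of an unauthenticated API request to a verdict."""
    if status == 401:
        # Rejected at the authentication stage: anonymous auth is disabled.
        return "disabled"
    if status in (200, 403):
        # 200: served to system:anonymous. 403: authenticated as
        # system:anonymous, then denied by RBAC. Either way it is enabled.
        return "enabled"
    return "inconclusive"


def probe_anonymous_auth(api_server: str, path: str = "/api") -> str:
    """Send a request with no credentials and classify the response.

    TLS verification is skipped because only the status code matters here;
    do not reuse this SSL context for real traffic.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before dropping verification
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(api_server + path, context=ctx, timeout=5) as resp:
            return classify_anonymous_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return classify_anonymous_status(err.code)
```

Run from a server node, `probe_anonymous_auth("https://127.0.0.1:6443")` should return "disabled" once the flag is in place.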

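No Pod Security Admission enforcement: PSA is built into Kubernetes, but a namespace without pod-security labels runs at the privileged level, so pods could run as root, mount arbitrary host paths, and use host networking. A compromised training pod could exfiltrate data via the host network. Fix: label the ML namespaces with pod-security.kubernetes.io/enforce=restricted so PSA enforces the restricted profile.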
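Enforcement is switched on per namespace with the standard PSA labels shown below. To see which pod fields the restricted profile cares about before a rollout starts rejecting training jobs, the hypothetical checker approximates a subset of the profile's rules. It is a sketch for pre-flight review of manifests, not a replacement for PSA, which checks more fields than these.

```python
# Namespace labels that make PSA enforce the restricted profile.
PSA_LABELS = {
    "pod-security.kubernetes.io/enforce": "restricted",
    "pod-security.kubernetes.io/enforce-version": "latest",
}


def restricted_violations(pod_spec: dict) -> list[str]:
    """Check a pod spec against a *subset* of the restricted profile."""
    problems = []
    # Host namespaces and hostPath volumes are forbidden outright.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            problems.append(f"{flag} must not be true")
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol['name']} uses hostPath")

    pod_ctx = pod_spec.get("securityContext", {})
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for c in containers:
        ctx = c.get("securityContext", {})
        name = c["name"]
        # Container-level settings override pod-level ones.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities must drop ALL")
        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```

Applying the labels is a one-liner such as `kubectl label namespace ml pod-security.kubernetes.io/enforce=restricted` (the namespace name `ml` is an assumption). Setting the `pod-security.kubernetes.io/warn` label first surfaces violations as warnings before anything is rejected.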

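No NetworkPolicies: Any pod could talk to any other pod. A compromised inference service could reach the MLflow API and tamper with model registry state. Fix: a default-deny NetworkPolicy in each ML namespace, plus explicit allow rules for the flows that are actually needed, such as training jobs reaching the MLflow tracking server.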
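The default-deny pattern is easiest to see with the manifests next to a small model of how they combine. Below, the two NetworkPolicy manifests are written as Python dicts; the namespace `ml`, the labels `app=training` and `app=mlflow`, and port 5000 (MLflow's default server port) are assumptions. The `ingress_allowed` function is a deliberately simplified, hypothetical model of the ingress semantics: same namespace only, `podSelector` peers with `matchLabels` only, egress not modeled.

```python
# Default-deny for every pod in the (hypothetical) "ml" namespace.
DEFAULT_DENY = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny", "namespace": "ml"},
    "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
}

# Explicit allow: training pods may reach MLflow on TCP 5000.
ALLOW_TRAINING_TO_MLFLOW = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-training-to-mlflow", "namespace": "ml"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "mlflow"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "training"}}}],
            "ports": [{"protocol": "TCP", "port": 5000}],
        }],
    },
}


def selects(selector: dict, labels: dict) -> bool:
    # An empty selector ({}) matches every pod.
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def ingress_allowed(policies, src_labels, dst_labels, port) -> bool:
    """Simplified same-namespace ingress check (podSelector peers only)."""
    isolating = [
        p for p in policies
        if "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
        and selects(p["spec"]["podSelector"], dst_labels)
    ]
    if not isolating:
        return True  # no policy selects the destination, so it is not isolated
    # Allowed if ANY rule in ANY selecting policy matches (rules are additive).
    for policy in isolating:
        for rule in policy["spec"].get("ingress", []):
            peers = rule.get("from")
            ports = rule.get("ports")
            # An omitted "from" or "ports" list matches everything.
            peer_ok = not peers or any(selects(p.get("podSelector", {}), src_labels) for p in peers)
            port_ok = not ports or any(p.get("port") == port for p in ports)
            if peer_ok and port_ok:
                return True
    return False
```

With both policies in place, an inference pod is refused at MLflow while a training pod gets through on 5000. Because the default-deny policy also lists Egress, it blocks DNS as well; a real rollout needs an egress rule for port 53 to the cluster DNS pods and an egress rule from training pods to MLflow, neither of which this ingress-only sketch models.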

OPA/Gatekeeper for ML-Specific Policies

Some security requirements are specific to ML systems and not covered by generic Kubernetes policies:

# Deny model deployments without required labels
package mlops.model.labels

violation[{"msg": msg}] {
  input.review.object.kind == "Deployment"
  # A missing key makes the lookup undefined (not null), so an
  # `== null` comparison would never fire; negate the lookup instead.
  not input.review.object.metadata.labels["mlops.io/model-name"]
  msg := "Model deployments must have mlops.io/model-name label"
}

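This policy rejects any Deployment that lacks the mlops.io/model-name label, a prerequisite for drift tracking and audit logging. The Rego itself matches every Deployment; limiting it to the ML namespace is the job of the Gatekeeper Constraint's match block, which selects the namespaces the template applies to.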
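Rego's treatment of missing keys is easy to get wrong, so it helps to pin the intended behavior down with fixtures. As a sketch, assuming the ML namespace is called `ml`, this Python mirror reproduces the rule together with the namespace scoping that a Constraint's `spec.match.namespaces` would provide. It is a hypothetical test aid for reasoning about cases, not something Gatekeeper runs; the Rego itself can be unit-tested with OPA's `opa test` runner.

```python
REQUIRED_LABEL = "mlops.io/model-name"
ML_NAMESPACES = {"ml"}  # hypothetical; mirrors the Constraint's spec.match.namespaces


def violations(review: dict) -> list[str]:
    """Mirror of the Rego rule plus the Constraint's namespace scoping.

    `review` has the shape of Gatekeeper's input.review (an AdmissionRequest).
    """
    if review.get("namespace") not in ML_NAMESPACES:
        return []  # outside the match block, so the policy never sees it
    obj = review.get("object", {})
    if obj.get("kind") != "Deployment":
        return []
    labels = obj.get("metadata", {}).get("labels") or {}
    # Like `not labels[...]` in Rego: a missing key (or missing labels map) fires.
    if REQUIRED_LABEL not in labels:
        return [f"Model deployments must have {REQUIRED_LABEL} label"]
    return []
```

The fixture that matters most is a Deployment with no labels map at all: with the original `== null` comparison the Rego would have admitted it.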

HashiCorp Vault for Secrets

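Credentials for MinIO, PostgreSQL (the MLflow backend store), and the model registry should never be committed to Git, and they should not be hand-managed as plain Kubernetes Secrets, which are only base64-encoded and are stored unencrypted in etcd unless encryption at rest is enabled.

Vault provides dynamic credentials, automatic rotation, and an audit log of every secret access. The External Secrets Operator pulls those values from Vault at runtime, so nothing sensitive lives in Git. It does, however, write them into ordinary Kubernetes Secrets, so they still persist in etcd: enable encryption at rest (k3s has a --secrets-encryption flag) or, to keep credentials out of etcd entirely, mount them with the Vault Agent Injector or the Secrets Store CSI Driver instead.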
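Dynamic credentials only help if the workload picks up the new value after a rotation. Secrets exposed as environment variables are fixed for the life of the container, while secrets mounted as files are refreshed in place, so a long-running training or serving process can re-read the file when it changes. A minimal sketch, assuming the credential is rendered to a file such as /vault/secrets/minio-password (a hypothetical path) and that a change in modification time signals a rotation:

```python
from pathlib import Path


class RotatingCredential:
    """Return a secret from a file, re-reading it whenever the file changes."""

    def __init__(self, path: str):
        self._path = Path(path)
        self._mtime_ns = None  # no read yet
        self._value = None

    def get(self) -> str:
        # Compare modification times so the file is only re-read after a change.
        mtime_ns = self._path.stat().st_mtime_ns
        if mtime_ns != self._mtime_ns:
            self._value = self._path.read_text().strip()
            self._mtime_ns = mtime_ns
        return self._value
```

Calling `get()` each time a MinIO or PostgreSQL connection is opened, rather than once at startup, lets a rotated password take effect without restarting the pod. The mtime comparison is a cheap heuristic; a process holding long-lived connections also needs to reconnect when the old credential's lease expires.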