Why ML CI/CD Is Different
Software CI gates check: does the code compile, do the tests pass, does the linter approve?
ML CI gates need to check additional things:
- Does the training pipeline complete without errors?
- Does the resulting model meet quality thresholds (accuracy, latency, fairness metrics)?
- Has the model been signed and its provenance recorded?
- Does the serving container pass security scanning?
The last two points are where SRE habits pay dividends: ML teams that come from research backgrounds rarely think about supply chain security.
The Pipeline
```yaml
# .github/workflows/model-release.yml
name: Model Release
on:
  push:
    paths: ['src/model/**', 'configs/**']
jobs:
  train-and-evaluate:
    runs-on: self-hosted
    steps:
      - name: Run training pipeline
        run: argo submit workflows/train.yaml --wait
      - name: Check quality gates
        run: |
          ACCURACY=$(python scripts/get_metric.py accuracy)
          if (( $(echo "$ACCURACY < 0.90" | bc -l) )); then
            echo "Accuracy $ACCURACY below threshold"
            exit 1
          fi
      - name: Sign model artifact
        run: cosign sign ${{ env.MODEL_IMAGE }}
      - name: Promote to staging
        run: |
          # MLflow has no transition-to-stage CLI command;
          # stage transitions go through the registry client API
          python - <<'PY'
          from mlflow.tracking import MlflowClient
          client = MlflowClient()
          version = client.get_latest_versions("classifier", stages=["None"])[0].version
          client.transition_model_version_stage("classifier", version, stage="Staging")
          PY
```
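The workflow above gates only on accuracy, but the checklist at the top also names latency and fairness. Those gates follow the same pattern; here is a sketch of an extra step reusing get_metric.py, where the metric names (latency_p95_ms, demographic_parity_gap) and the thresholds are illustrative assumptions, not values from a real pipeline:

```yaml
# Hypothetical additional gate step; metric names and thresholds are placeholders
- name: Check latency and fairness gates
  run: |
    P95_MS=$(python scripts/get_metric.py latency_p95_ms)
    GAP=$(python scripts/get_metric.py demographic_parity_gap)
    # Fail the job if p95 latency exceeds the serving budget
    if (( $(echo "$P95_MS > 50" | bc -l) )); then
      echo "p95 latency ${P95_MS}ms above 50ms threshold"
      exit 1
    fi
    # Fail the job if the fairness gap exceeds the allowed disparity
    if (( $(echo "$GAP > 0.05" | bc -l) )); then
      echo "Demographic parity gap $GAP above 0.05 threshold"
      exit 1
    fi
```

Keeping each gate as its own step makes the failing check obvious in the Actions UI.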
Supply Chain Security
The cosign sign step is often skipped in ML pipelines. It shouldn’t be. A signed model artifact gives you:
- Proof that the model was produced by a specific pipeline run
- Tamper evidence — if the artifact is modified after signing, verification fails
- An audit trail when something goes wrong in production
The ML supply chain (training data → code → artifact → deployment) has the same attack surface as software supply chains, and deserves the same treatment.
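Signing only pays off if something verifies the signature before the model serves traffic. A minimal sketch of a verification step in the deploy job, assuming key-based signing (cosign also supports keyless); the cosign.pub path is a placeholder:

```yaml
# Hypothetical deploy-job step; cosign.pub is a placeholder key path
- name: Verify model signature before deploy
  run: |
    # Exits non-zero (failing the job) if the artifact
    # was modified after signing
    cosign verify --key cosign.pub ${{ env.MODEL_IMAGE }}
```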
Self-Hosted Runners
For pipelines that submit to Argo Workflows in a homelab cluster, you need a self-hosted GitHub Actions runner with cluster access. I run the runner as a Kubernetes Deployment with a service account that has limited permissions: just enough to submit workflows and read pod logs.
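A minimal sketch of what that service account's RBAC could look like, assuming the Argo Workflows CRDs live in the same namespace as the runner; the ci-runner name and ml-pipelines namespace are illustrative:

```yaml
# Hypothetical least-privilege RBAC for the runner's service account
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-runner
  namespace: ml-pipelines
rules:
  # Submit and watch Argo Workflows (argo submit --wait)
  - apiGroups: ["argoproj.io"]
    resources: ["workflows"]
    verbs: ["create", "get", "list", "watch"]
  # Read pod logs to debug failed workflow steps
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-runner
  namespace: ml-pipelines
subjects:
  - kind: ServiceAccount
    name: ci-runner
    namespace: ml-pipelines
roleRef:
  kind: Role
  name: ci-runner
  apiGroup: rbac.authorization.k8s.io
```

A namespaced Role (rather than a ClusterRole) keeps the blast radius small if the runner is ever compromised.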