The Mental Model
If you’ve operated Airflow, cron jobs, or Jenkins pipelines, Argo Workflows will feel familiar. Each step in a pipeline is a container. Dependencies between steps form a DAG. Argo schedules and executes the steps, retries failures, and archives the results.
The key difference: Argo is Kubernetes-native. Each step is a Kubernetes Pod. That means you get all of Kubernetes — resource limits, GPU scheduling, node affinity, service accounts — for free.
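As a concrete illustration of that point, here is a hypothetical container template (image name, service account, and node-selector label are all illustrative, not from the original) that uses resource requests, GPU limits, node affinity, and a service account — ordinary Kubernetes pod features, available per step:

```yaml
# Hypothetical template: every knob here is standard Kubernetes, applied to one step.
- name: train-model
  serviceAccountName: training-sa          # step-level identity (e.g. for cloud IAM)
  nodeSelector:
    accelerator: nvidia-tesla-t4           # illustrative label pinning the step to GPU nodes
  container:
    image: my-registry/trainer:latest      # illustrative image
    command: [python, train.py]
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
      limits:
        nvidia.com/gpu: "1"                # GPU scheduling via the device plugin
```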
A Simple Training Pipeline
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: train-model
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: preprocess
            template: preprocess-data
          - name: train
            template: train-model
            dependencies: [preprocess]
          - name: evaluate
            template: evaluate-model
            dependencies: [train]
          - name: register
            template: register-model
            dependencies: [evaluate]
            when: "{{tasks.evaluate.outputs.parameters.accuracy}} > 0.90"
The when condition on the register step is the interesting part — it implements a quality gate. Models that don’t hit the accuracy threshold never get registered, and therefore never get deployed.
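For the `when` condition to have something to compare, the evaluate-model template must expose `accuracy` as an output parameter. A minimal sketch of what that could look like, assuming the evaluation script writes a bare number to a file (the image name, script, and path are illustrative):

```yaml
# Sketch: expose a metric as an output parameter via valueFrom.path.
- name: evaluate-model
  container:
    image: my-registry/evaluator:latest    # illustrative image
    command: [python, evaluate.py]         # assumed to write e.g. "0.93" to /tmp/accuracy
  outputs:
    parameters:
      - name: accuracy
        valueFrom:
          path: /tmp/accuracy              # Argo reads this file into the parameter
```

Downstream tasks can then reference the value as `{{tasks.evaluate.outputs.parameters.accuracy}}`.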
Parameterized Workflows
For hyperparameter search, you can use Argo’s withItems to fan out a single task into parallel training runs, one per item:
- name: hyperparameter-search
  dag:
    tasks:
      - name: train
        template: train-model
        arguments:
          parameters:
            - name: learning-rate
              value: "{{item.lr}}"
            - name: batch-size
              value: "{{item.batch}}"
        withItems:
          - { lr: "0.001", batch: "32" }
          - { lr: "0.001", batch: "64" }
          - { lr: "0.0001", batch: "32" }
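On the receiving end, the train-model template declares matching input parameters and passes them to the training command. A sketch, with the image and CLI flags being illustrative assumptions:

```yaml
# Sketch: consume the fanned-out parameters as template inputs.
- name: train-model
  inputs:
    parameters:
      - name: learning-rate
      - name: batch-size
  container:
    image: my-registry/trainer:latest      # illustrative image
    command: [python, train.py]
    args:
      - "--lr"
      - "{{inputs.parameters.learning-rate}}"
      - "--batch-size"
      - "{{inputs.parameters.batch-size}}"
```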
Comparing to Airflow
Argo Workflows wins on Kubernetes-nativeness: no Python DAG code to maintain, no Airflow scheduler, no separate metadata database. The trade-off is that complex Python-native orchestration logic is less ergonomic — Argo YAML can get verbose for deeply conditional logic.
For an MLOps platform that’s already Kubernetes-first, Argo is the right choice.