Introduction
The most reliable sign of an immature ML project is the phrase “it works on my machine.” In ML this phrase is especially dangerous because the gap between “works locally” and “works in production” is wider than in most software. You have Python version differences, CUDA version mismatches, transitive dependency conflicts, packages that behave differently on different OS versions, and models that produce subtly different outputs depending on which version of numpy was used to preprocess the training data.
This post is about treating ML dependencies as a first-class engineering problem.
Why ML dependencies are harder than regular software dependencies
In web services, a dependency is usually just code. In ML projects, dependencies affect numerical outputs. A BLAS library update can change floating-point results. A different numpy version can change array operation behaviour in edge cases. A PyTorch minor version upgrade can change the default initialisation scheme for a layer type.
This means dependency drift is not just a build problem — it’s a reproducibility problem. A model that performed at 94% accuracy at training time might perform at 92% in production, and the root cause might be that scikit-learn 1.2 and 1.3 handle feature scaling slightly differently.
The four layers of the problem
1. Python interpreter version
Lock this. Not to a minor version — to a patch. python3.11.7, not python3.11. Use pyenv or, better, build a base container image that pins it.
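One cheap way to enforce the pin at runtime is a startup guard. A minimal sketch, assuming the pinned version is 3.11.7 (adjust the tuple to match your base image):

```python
import sys

PINNED = (3, 11, 7)  # assumed pin; keep in sync with your base image

def check_interpreter(actual=None, pinned=PINNED):
    """Return True iff the running interpreter matches the pinned patch version."""
    actual = actual if actual is not None else sys.version_info[:3]
    return tuple(actual) == tuple(pinned)

if __name__ == "__main__":
    if not check_interpreter():
        sys.exit(f"Expected Python {'.'.join(map(str, PINNED))}, "
                 f"got {sys.version.split()[0]}")
```

Running this as the container's health check (or at the top of your training entrypoint) turns a silent version drift into a loud failure.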
2. Direct dependencies
Use a proper lockfile. pip freeze > requirements.txt produces a flat list but doesn’t capture why each package is there or manage conflicts well. pip-tools, poetry, or uv all give you a separation between what you depend on and what gets installed, which is the right model.
For ML projects specifically, treat your training environment and inference environment as separate dependency sets. They share a core (numpy, your model library) but have different extras (Jupyter, data loaders, augmentation libraries vs. a lean serving stack).
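One way to express that split is with optional dependency groups. A sketch using pyproject.toml (the project name, extra names, and the specific train/serve packages here are illustrative, not a convention of any tool):

```toml
[project]
name = "my-model"            # hypothetical project name
dependencies = [
    "numpy==1.26.4",         # the shared core
    "torch==2.2.0",
]

[project.optional-dependencies]
train = [
    "jupyter",
    "albumentations",        # example augmentation library
]
serve = [
    "fastapi",               # example lean serving stack
    "uvicorn",
]
```

Then `pip install .[train]` in the training image and `pip install .[serve]` in the inference image, and each environment can be compiled to its own lockfile.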
3. Native libraries and CUDA
This is where most container-based setups break. pip install torch inside a container gives you no guarantee the wheel matches your CUDA stack: depending on your platform and index configuration you may get a CPU-only build, or a build compiled against a different CUDA version than the one in your base image. And the wheel's CUDA version must be compatible with the CUDA toolkit in the image, which in turn must be compatible with the driver version on your GPU nodes.
The only reliable approach is to start from a known base — nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04 or the PyTorch base images — and build up from there. Never use a generic Python image for GPU workloads.
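A startup sanity check catches a wheel/base-image mismatch early. A sketch (the torch import is guarded so the snippet also loads in a CPU-only or torch-free environment):

```python
def cuda_report() -> str:
    """Summarise the installed torch build and the visible CUDA stack."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        # Either a CPU-only wheel, no GPU visible, or a driver/runtime mismatch.
        return f"torch {torch.__version__}: CUDA not available"
    return (
        f"torch {torch.__version__}, built against CUDA {torch.version.cuda}, "
        f"running on {torch.cuda.get_device_name(0)}"
    )

if __name__ == "__main__":
    print(cuda_report())
```

Logging this once at container start makes "which CUDA am I actually running?" a grep away instead of a debugging session.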
4. OS-level dependencies
Some Python packages are wrappers around native libraries: opencv-python, lightgbm, faiss. These have OS-level requirements (libgomp, libGL, etc.) that don’t appear in your Python lockfile. Document and install them explicitly in your Dockerfile.
A practical setup
FROM nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3.11 python3.11-venv libgomp1 && \
    rm -rf /var/lib/apt/lists/* && \
    python3.11 -m ensurepip --upgrade
COPY requirements.txt .
RUN python3.11 -m pip install --no-cache-dir -r requirements.txt
And requirements.txt is the output of pip-compile (pip-tools) from a requirements.in that lists only your direct dependencies:
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.2.0+cu121  # torch 2.2 ships cu118/cu121 wheels; cu121 runs on a CUDA 12.2 base
transformers==4.38.1
scikit-learn==1.4.0
numpy==1.26.4
Lock the full transitive tree. Commit the lockfile. Treat changes to it as requiring the same review attention as code changes.
For training reproducibility
Beyond dependencies, ML training reproducibility requires:
- Seeding: torch.manual_seed, numpy.random.seed, random.seed, and, for GPU, torch.cuda.manual_seed_all
- Deterministic ops: torch.use_deterministic_algorithms(True), with the CUBLAS_WORKSPACE_CONFIG env var set
- Data order: shuffle with a fixed seed, and version your dataset alongside your model
None of this matters if your dependencies aren’t locked, because the same seed applied to different numerical primitives gives different results.
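The seeding and determinism steps above can be collected into one helper. A sketch (the torch import is guarded so the function also runs where torch is absent):

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # allow the helper to run in torch-free environments
    torch = None

def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of randomness for a reproducible training run."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)           # CPU RNG
        torch.cuda.manual_seed_all(seed)  # all visible GPUs
        # cuBLAS needs this env var for deterministic matmuls; it must be
        # set before the first CUDA call in the process.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.use_deterministic_algorithms(True)
```

Call seed_everything once, at the very top of your training entrypoint, before any data loading or model construction.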
The payoff
A properly pinned ML environment means:
- A colleague can reproduce your training run 6 months later
- Production inference matches development behaviour
- Debugging is about your code and data, not your environment
- Upgrading a dependency is an explicit, reviewed decision rather than an accident
Dependency management is not glamorous work. But it is the foundation that makes everything else in MLOps reliable.