Introduction
The most reliable sign of an immature ML project is the phrase “it works on my machine.” In ML this phrase is especially dangerous because the gap between “works locally” and “works in production” is wider than in most software. You have Python version differences, CUDA version mismatches, transitive dependency conflicts, packages that behave differently on different OS versions, and models that produce subtly different outputs depending on which version of numpy was used to preprocess the training data.
This post is about treating ML dependencies as a first-class engineering problem.
Why ML dependencies are harder than regular software dependencies
In web services, a dependency is usually just code. In ML projects, dependencies affect numerical outputs. A BLAS library update can change floating-point results. A different numpy version can change array operation behaviour in edge cases. A PyTorch minor version upgrade can change the default initialisation scheme for a layer type.
This means dependency drift is not just a build problem — it’s a reproducibility problem. A model that performed at 94% accuracy at training time might perform at 92% in production, and the root cause might be that scikit-learn 1.2 and 1.3 handle feature scaling slightly differently.
The four layers of the problem
1. Python interpreter version
Lock this. Not to a minor version — to a patch. python3.11.7, not python3.11. Use pyenv or, better, build a base container image that pins it.
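One cheap way to enforce the pin at runtime is a startup guard. A minimal sketch, assuming the pinned version is 3.11.7 (adjust the tuple to match your base image):

```python
import sys

PINNED = (3, 11, 7)  # assumed pin; keep in sync with your base image

def check_interpreter(actual=None, pinned=PINNED):
    """Return True iff the running interpreter matches the pinned patch version."""
    actual = actual if actual is not None else sys.version_info[:3]
    return tuple(actual) == tuple(pinned)

if __name__ == "__main__":
    if not check_interpreter():
        sys.exit(f"Expected Python {'.'.join(map(str, PINNED))}, "
                 f"got {sys.version.split()[0]}")
```

Running this as the container's health check (or at the top of your training entrypoint) turns a silent version drift into a loud failure.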
2. Direct dependencies
Use a proper lockfile. pip freeze > requirements.txt produces a flat list but doesn’t capture why each package is there or manage conflicts well. pip-tools, poetry, or uv all give you a separation between what you depend on and what gets installed, which is the right model.
For ML projects specifically, treat your training environment and inference environment as separate dependency sets. They share a core (numpy, your model library) but have different extras (Jupyter, data loaders, augmentation libraries vs. a lean serving stack).
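One way to express that split is with optional dependency groups. A sketch using pyproject.toml (the project name, extra names, and the specific train/serve packages here are illustrative, not a convention of any tool):

```toml
[project]
name = "my-model"            # hypothetical project name
dependencies = [
    "numpy==1.26.4",         # the shared core
    "torch==2.2.0",
]

[project.optional-dependencies]
train = [
    "jupyter",
    "albumentations",        # example augmentation library
]
serve = [
    "fastapi",               # example lean serving stack
    "uvicorn",
]
```

Then `pip install .[train]` in the training image and `pip install .[serve]` in the inference image, and each environment can be compiled to its own lockfile.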
3. Native libraries and CUDA
This is where most container-based setups break. pip install torch inside a container gives you no guarantee the wheel matches your CUDA stack: depending on your platform and index configuration you may get a CPU-only build, or a build compiled against a different CUDA version than the one in your base image. And the wheel's CUDA version must be compatible with the CUDA toolkit in the image, which in turn must be compatible with the driver version on your GPU nodes.
The only reliable approach is to start from a known base — nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04 or the PyTorch base images — and build up from there. Never use a generic Python image for GPU workloads.
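A startup sanity check catches a wheel/base-image mismatch early. A sketch (the torch import is guarded so the snippet also loads in a CPU-only or torch-free environment):

```python
def cuda_report() -> str:
    """Summarise the installed torch build and the visible CUDA stack."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        # Either a CPU-only wheel, no GPU visible, or a driver/runtime mismatch.
        return f"torch {torch.__version__}: CUDA not available"
    return (
        f"torch {torch.__version__}, built against CUDA {torch.version.cuda}, "
        f"running on {torch.cuda.get_device_name(0)}"
    )

if __name__ == "__main__":
    print(cuda_report())
```

Logging this once at container start makes "which CUDA am I actually running?" a grep away instead of a debugging session.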
4. OS-level dependencies
Some Python packages are wrappers around native libraries: opencv-python, lightgbm, faiss. These have OS-level requirements (libgomp, libGL, etc.) that don’t appear in your Python lockfile. Document and install them explicitly in your Dockerfile.
A practical setup
FROM nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3.11 python3.11-venv libgomp1 && \
    rm -rf /var/lib/apt/lists/* && \
    python3.11 -m ensurepip --upgrade
COPY requirements.txt .
RUN python3.11 -m pip install --no-cache-dir -r requirements.txt
And requirements.txt is the output of pip-compile (pip-tools) from a requirements.in that lists only your direct dependencies:
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.2.0+cu121  # torch 2.2 ships cu118/cu121 wheels; cu121 runs on a CUDA 12.2 base
transformers==4.38.1
scikit-learn==1.4.0
numpy==1.26.4
Lock the full transitive tree. Commit the lockfile. Treat changes to it as requiring the same review attention as code changes.
For training reproducibility
Beyond dependencies, ML training reproducibility requires:
- Seeding: torch.manual_seed, numpy.random.seed, random.seed, and, for GPU, torch.cuda.manual_seed_all
- Deterministic ops: torch.use_deterministic_algorithms(True), with the CUBLAS_WORKSPACE_CONFIG env var set
- Data order: shuffle with a fixed seed, and version your dataset alongside your model
None of this matters if your dependencies aren’t locked, because the same seed applied to different numerical primitives gives different results.
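The seeding and determinism steps above can be collected into one helper. A sketch (the torch import is guarded so the function also runs where torch is absent):

```python
import os
import random

import numpy as np

try:
    import torch
except ImportError:  # allow the helper to run in torch-free environments
    torch = None

def seed_everything(seed: int = 42) -> None:
    """Seed the common sources of randomness for a reproducible training run."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)           # CPU RNG
        torch.cuda.manual_seed_all(seed)  # all visible GPUs
        # cuBLAS needs this env var for deterministic matmuls; it must be
        # set before the first CUDA call in the process.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.use_deterministic_algorithms(True)
```

Call seed_everything once, at the very top of your training entrypoint, before any data loading or model construction.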
The payoff
A properly pinned ML environment means:
- A colleague can reproduce your training run 6 months later
- Production inference matches development behaviour
- Debugging is about your code and data, not your environment
- Upgrading a dependency is an explicit, reviewed decision rather than an accident
Dependency management is not glamorous work. But it is the foundation that makes everything else in MLOps reliable.