# Docker images for GitHub CI and CD

This directory contains everything needed to build the Docker images
that are used in our CI and CD jobs.

The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
`docker build`. This lets us use only a few Dockerfiles for many
images. The different configurations are identified by a freeform
string that we call a _build environment_. This string is persisted in
each image as the `BUILD_ENVIRONMENT` environment variable.

See the large `case` statement in `build.sh` for the valid build environments.
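
Because the string is persisted in the image, a script running inside a container can branch on it. The following is a minimal sketch, assuming the build environment name contains a substring such as `cuda`, `rocm`, or `xpu`; the helper `describe_build_env` is hypothetical and not part of this directory:

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a build environment string by substring.
# The substrings matched here are assumptions made for illustration.
describe_build_env() {
  local env_name="${1:-}"
  case "${env_name}" in
    "")     echo "unknown" ;;
    *cuda*) echo "cuda" ;;
    *rocm*) echo "rocm" ;;
    *xpu*)  echo "xpu" ;;
    *)      echo "cpu" ;;
  esac
}

# Inside a CI container, BUILD_ENVIRONMENT is set by the image.
# Outside one it is usually unset, so this falls back to "unknown".
describe_build_env "${BUILD_ENVIRONMENT:-}"
```

To inspect the value stored in an image you built locally, something like `docker run --rm myimage:latest printenv BUILD_ENVIRONMENT` should work, assuming `printenv` is available in the image.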

## Docker CI builds

* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu` -- Dockerfile for Ubuntu image for CPU build and test jobs
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
* `ubuntu-rocm` -- Dockerfile for Ubuntu image with ROCm support
* `ubuntu-xpu` -- Dockerfile for Ubuntu image with XPU support

## Docker CD builds

* `conda` -- Dockerfile and `build.sh` to build Docker images used in nightly conda builds
* `manywheel` -- Dockerfile and `build.sh` to build Docker images used in nightly manywheel builds
* `libtorch` -- Dockerfile and `build.sh` to build Docker images used in nightly libtorch builds

## Usage

```bash
# Build a specific image
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest

# Set flags (see build.sh) and build image
sudo bash -c 'TRITON=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
```

## [Guidance] Adding a New Base Docker Image

### Background

The base Docker images in the `.ci/docker/` directory are built by the `docker-builds.yml` workflow and are used throughout the PyTorch CI/CD pipeline. Create or modify a base Docker image only if you need specific environment changes or dependencies before building PyTorch on CI. The images are built and consumed as follows:

1. **Automatic Rebuilding**:
   - The Docker image build is triggered automatically when files under `.ci/docker/` change
   - This ensures all images stay up-to-date with the latest dependencies and configurations

2. **Image Reuse in PyTorch Build Workflows** (example: linux-build):
   - The images generated by `docker-builds.yml` are reused in `_linux-build.yml` through the `calculate-docker-image` step
   - The `_linux-build.yml` workflow:
     - Pulls the Docker image determined by the `calculate-docker-image` step
     - Runs a Docker container with that image
     - Executes `.ci/pytorch/build.sh` inside the container to build PyTorch

3. **Usage in Test Workflows** (example: linux-test):
   - The same Docker images are also used in `_linux-test.yml` for running tests
   - The `_linux-test.yml` workflow follows a similar pattern:
     - It uses the `calculate-docker-image` step to determine which Docker image to use
     - It pulls the Docker image and runs a container with that image
     - It installs the wheels from the artifacts generated by PyTorch build jobs
     - It executes test scripts (like `.ci/pytorch/test.sh` or `.ci/pytorch/multigpu-test.sh`) inside the container
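
The pull, run, and execute sequence described above can be approximated locally. The sketch below only prints the commands instead of running them. The `/workspace` mount point, the Docker flags, and the helper name `print_build_steps` are assumptions for illustration; the real workflows may pass different options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a local approximation of the
# container steps of a build or test job.
# Assumptions: the checkout is mounted at /workspace and the script is
# invoked with bash; neither is taken from the actual workflow files.
print_build_steps() {
  local image="$1"
  local script="${2:-.ci/pytorch/build.sh}"
  echo "docker pull ${image}"
  echo "docker run --rm -v \"\$PWD\":/workspace -w /workspace ${image} bash ${script}"
}

# Show the commands for a locally tagged image.
print_build_steps "myimage:latest"
```

Passing a second argument, for example `.ci/pytorch/test.sh`, prints the equivalent commands for a test run.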

### Understanding File Purposes

#### `.ci/docker/build.sh` vs `.ci/pytorch/build.sh`
- **`.ci/docker/build.sh`**:
  - Used for building base Docker images
  - Executed by the `docker-builds.yml` workflow to pre-build Docker images for CI
  - Contains configurations for different Docker build environments

- **`.ci/pytorch/build.sh`**:
  - Used for building PyTorch inside a Docker container
  - Called by workflows like `_linux-build.yml` after the Docker container is started
  - Builds PyTorch wheels and other artifacts

#### `.ci/docker/ci_commit_pins/` vs `.github/ci_commit_pins`
- **`.ci/docker/ci_commit_pins/`**:
  - Used for pinning dependency versions during base Docker image building
  - Ensures consistent environments for building PyTorch
  - Changes here trigger base Docker image rebuilds

- **`.github/ci_commit_pins`**:
  - Used for pinning dependency versions during PyTorch building and tests
  - Ensures consistent dependencies for PyTorch across different builds
  - Used by build scripts running inside Docker containers
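
Either directory can be read the same way from a script. The following is a minimal sketch, assuming each pin is a text file named `<name>.txt` that holds a single commit; the helper `read_pin` and the name `mylib` are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the commit stored in <pins_dir>/<name>.txt
# with all whitespace removed. Assumes one commit per file.
read_pin() {
  local pins_dir="$1" name="$2"
  local pin_file="${pins_dir}/${name}.txt"
  if [ ! -f "${pin_file}" ]; then
    echo "missing pin file: ${pin_file}" >&2
    return 1
  fi
  tr -d '[:space:]' < "${pin_file}"
  echo
}

# Example (mylib is a placeholder name):
#   read_pin .ci/docker/ci_commit_pins mylib
```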

### Step-by-Step Guide for Adding a New Base Docker Image

#### 1. Add Pinned Commits (If Applicable)

We use pinned commits for build stability. The `nightly.yml` workflow checks and updates pinned commits for certain repository dependencies daily.

If your new Docker image needs a library installed from a specific pinned commit or built from source:

1. Add the repository you want to track in `nightly.yml` and `merge-rules.yml`
2. Add the initial pinned commit in `.ci/docker/ci_commit_pins/`. The name of the text file must match the one defined in step 1
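
Creating the initial pin file can be sketched as follows. The `.txt` extension, the full-SHA check, the helper `write_pin`, and the example repository URL are assumptions for illustration, not requirements stated by the workflows:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an initial pin after checking that the value
# looks like a full 40-character lowercase Git commit SHA.
write_pin() {
  local pins_dir="$1" name="$2" commit="$3"
  if ! printf '%s' "${commit}" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "not a full commit SHA: ${commit}" >&2
    return 1
  fi
  mkdir -p "${pins_dir}"
  printf '%s\n' "${commit}" > "${pins_dir}/${name}.txt"
}

# Example (placeholder URL and name):
#   sha="$(git ls-remote https://example.com/org/mylib.git HEAD | cut -f1)"
#   write_pin .ci/docker/ci_commit_pins mylib "${sha}"
```

Rejecting branch names and short hashes up front keeps the pin unambiguous, which is the point of pinning in the first place.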

#### 2. Configure the Base Docker Image
1. **Add a new base Docker image configuration** (if applicable):

   Add the configuration in `.ci/docker/build.sh`. For example:
   ```bash
   pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1)
     CUDA_VERSION=12.8.1
     ANACONDA_PYTHON_VERSION=3.12
     GCC_VERSION=11
     VISION=yes
     KATEX=yes
     UCX_COMMIT=${_UCX_COMMIT}
     UCC_COMMIT=${_UCC_COMMIT}
     TRITON=yes
     NEW_ARG_1=yes
     ;;
   ```

2. **Add build arguments to Docker build command**:

   If you're introducing a new argument to the Docker build, make sure to add it in the Docker build step in `.ci/docker/build.sh`:
   ```bash
   docker build \
     ....
     --build-arg "NEW_ARG_1=${NEW_ARG_1}"
   ```

3. **Update Dockerfile logic**:

   Update the Dockerfile to use the new argument. For example, in `ubuntu/Dockerfile`:
   ```dockerfile
   ARG NEW_ARG_1
   # Set up environment for NEW_ARG_1
   RUN if [ -n "${NEW_ARG_1}" ]; then bash ./do_something.sh; fi
   ```

4. **Add the Docker configuration** in `.github/workflows/docker-builds.yml`:

   The `docker-builds.yml` workflow pre-builds the Docker images whenever files in the `.ci/docker/` directory change, including pinned commit updates.
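
After completing the steps above, a quick consistency check can catch an image that was added to one file but not the other. This is a minimal sketch, assuming the image name is written literally in both `.ci/docker/build.sh` and `.github/workflows/docker-builds.yml`; the helper `check_image_registered` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check: confirm an image name appears in both the build
# script and the workflow file. This is a plain substring match, so a
# name that is a prefix of another image name also counts as found.
check_image_registered() {
  local image="$1" build_script="$2" workflow="$3"
  local f status=0
  for f in "${build_script}" "${workflow}"; do
    if ! grep -qF -- "${image}" "${f}"; then
      echo "${image} not found in ${f}" >&2
      status=1
    fi
  done
  return "${status}"
}

# Example, run from the repository root:
#   check_image_registered pytorch-linux-jammy-cuda12.8-cudnn9-py3.12-gcc11-new1 \
#     .ci/docker/build.sh .github/workflows/docker-builds.yml
```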
