Creating Python Wheels for Linux

I maintain a Python library for computational topology called BATS. It mostly consists of bindings for a C++ library, created using pybind11. One of the annoying things about installing these bindings on a new machine is that compilation can take a few minutes (compared to what you might usually expect from pip) - you also need to make sure OpenMP is installed, and that you are using a recent, C++17-compliant version of g++. The end result is that I've done a bit of troubleshooting helping people get the BATS bindings installed.
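
For reference, this is roughly what a source install looks like on a machine without a pre-built wheel (bats-tda is the package name used later in this post; the exact prerequisites depend on your distribution):

g++ --version                    # needs to be a C++17-compliant release
python3 -m pip install bats-tda  # compiles the pybind11 bindings, can take a few minutes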

Python Wheels

Python has a mechanism for distributing pre-built packages called wheels. This is why, when you pip install numpy, you can just download the relevant files and never actually need to compile its interfaces to numerical libraries from source. The catch is that wheels need to be fairly generic - you can't have dependencies that change from system to system.
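
You can see which form pip would use by restricting it to wheels only - if no compatible wheel exists for your platform, the command fails instead of falling back to a source build (numpy here is purely an example):

python3 -m pip download numpy --only-binary=:all: -d /tmp/wheels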

In PEP 513, the manylinux platform tag was introduced to limit dependencies to a standard subset of the Linux kernel interfaces and core system libraries (a common ABI), so you can produce a binary which runs on (almost) any Linux machine.
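
The manylinux_2_24 tag used below corresponds to a glibc 2.24 baseline (Debian 9), so a quick, rough check of whether a target machine can use such a wheel is its glibc version:

ldd --version   # the first line reports the glibc version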

Building manylinux wheels

To build manylinux wheels, you can't just upload binaries built the standard way on your own system. The easiest route is to use a Docker image provided by the Python Packaging Authority (PyPA). You can find instructions and an example in this GitHub repository.

Installing Docker

If you don't already have Docker, you need to install it. I use Fedora, so I followed the guide here.

sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io
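
On Fedora the Docker daemon is not started automatically after installation, so you will probably also need to start the service (and enable it if you want it running after reboots).

sudo systemctl start docker
sudo systemctl enable docker  # optional: start on boot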

Next, pull the container image used for building manylinux wheels.

sudo docker pull quay.io/pypa/manylinux_2_24_x86_64

Running the Docker container

To build wheels for BATS, I run the following command from the root of the project directory

sudo docker run --rm -e PLAT=manylinux_2_24_x86_64 -v `pwd`:/io quay.io/pypa/manylinux_2_24_x86_64 bash /io/build_wheels.sh

This binds the current directory (pwd) to the location /io in the container, so the container can read from and write to your filesystem. The final argument executes the build_wheels.sh script inside the Docker container, which runs Debian 9.
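
If the build fails partway through, one option is to open an interactive shell in the same image with the same bind mount and run the steps by hand - this is just the standard Docker invocation, nothing specific to BATS:

sudo docker run -it --rm -v `pwd`:/io quay.io/pypa/manylinux_2_24_x86_64 bash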

Building

BATS imports matplotlib for visualizations, and I found that pip will sometimes compile Pillow (PIL, a dependency of matplotlib) from source within the container. This requires libjpeg, which the script installs inside the container first

apt update
apt install -y libjpeg-dev # debian

The provided Docker image has several Python versions in /opt/python/, so we iterate over them, building a wheel for each version

for PYBIN in /opt/python/*/bin; do
    "${PYBIN}/pip" wheel /io/ --no-deps -w wheelhouse/
done
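
As an aside, if you only care about particular interpreter versions, you can narrow the glob - the image lays interpreters out in directories named like cp38-cp38, though the exact set varies between image versions, so treat this as a sketch:

for PYBIN in /opt/python/cp3{8,9}*/bin; do
    "${PYBIN}/pip" wheel /io/ --no-deps -w wheelhouse/
done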

Next, external libraries are bundled into the wheel using the auditwheel utility. Because BATS uses OpenMP, which is not part of the manylinux ABI, the OpenMP runtime must be bundled.

function repair_wheel {
    wheel="$1"
    if ! auditwheel show "$wheel"; then
        echo "Skipping non-platform wheel $wheel"
    else
        auditwheel repair "$wheel" --plat "$PLAT" -w /io/wheelhouse/
    fi
}

for whl in wheelhouse/*.whl; do
    repair_wheel "$whl"
done

The repaired wheels are stored in /io/wheelhouse so they will be visible in a wheelhouse directory on my filesystem.
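
To confirm that the OpenMP runtime was actually vendored in, you can inspect a repaired wheel from inside the container with auditwheel show (the filename glob below is just illustrative):

auditwheel show /io/wheelhouse/bats_tda-*-manylinux_2_24_x86_64.whl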

Finally, the wheels are installed and tested

for PYBIN in /opt/python/*/bin; do
    "${PYBIN}/pip" install -r /io/requirements.txt
    "${PYBIN}/pip" install bats-tda --no-index -f /io/wheelhouse
    "${PYBIN}/python3" -m unittest discover -s /io/test/ -p "*.py"
done

Upload to PyPI

First, we test the upload on TestPyPI

python3 -m twine upload --repository testpypi wheelhouse/*2_24_x86_64.whl
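
One way to sanity-check the TestPyPI upload is to install the package from it in a clean environment - the extra index URL is needed because dependencies like numpy and matplotlib are not hosted on TestPyPI:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ bats-tda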

And assuming everything looks ok, we can upload to PyPI

python3 -m twine upload --repository pypi wheelhouse/*2_24_x86_64.whl