Building Python wheels with Fortran for Windows

Published on 2017-10-08 00:00:00

Problem statement: how to build Python wheel binary packages that contain compiled Fortran code, on Windows, so that they are compatible with the Python binaries from Python.org?

This is not completely trivial, because the compiler choices are limited in two ways: (i) the C runtime library (CRT) must be compatible with that of the main Python executable, which uses a specific version of the Microsoft C runtime, and (ii) as far as free software [1] Fortran 90+ compilers go, GNU Fortran from the mingw project is still largely the only game in town.

The F90 compiler issue is what was blocking Scipy from putting Windows wheels on PyPI. This plumbing issue was solved with some work, as detailed below.

Solution to CRT problem

Figure 1. Build process (artist’s depiction).

The solution to the runtime library issue is in principle relatively straightforward, and its implementation in CMake has been discussed elsewhere. The main approach is:

  1. Compile Fortran code with gfortran into a separate DLL, which is linked against the GNU runtime, usually statically.
  2. The Fortran functions that need to be called are marked as exported from the DLL.
  3. The Python .pyd extension is compiled with the correct version of the Microsoft Visual C++ compiler for the specific Python version (consult the list on the python.org wiki).
  4. The .pyd file is dynamically linked against the DLL containing the Fortran routines.

As long as no CRT resources (e.g. file handles) are passed explicitly from the .pyd extension to the Fortran DLL, the two runtime libraries are isolated from each other and coexist peacefully. This holds in most cases: typical Fortran code does not deal with e.g. FILE* pointers, or return newly allocated blocks of memory to calling C code, because these things require language additions that came long after Fortran 77.
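The four steps above can be sketched as a sequence of command lines. This is only a rough illustration, not the numpy.distutils implementation: the file names (fcode.f90, wrapper.c, _ext.pyd) and the helper build_commands are hypothetical, a 64-bit target is assumed, and the Python include and library paths that a real extension build needs are left out.

```python
def build_commands(name="fcode", ext="_ext"):
    """Return the command lines for a hypothetical MSVC+gfortran build.

    Nothing is executed; the lists only show the order of the steps.
    """
    dll, deffile, implib = name + ".dll", name + ".def", name + ".lib"
    return [
        # Steps 1-2: gfortran builds a DLL with the GNU runtime linked
        # statically, exports its symbols and writes a .def file listing them.
        ["gfortran", "-shared", "-static", "-o", dll, name + ".f90",
         "-Wl,--export-all-symbols", "-Wl,--output-def," + deffile],
        # Bridge: MSVC's lib tool turns the .def file into an import library.
        ["lib", "/DEF:" + deffile, "/OUT:" + implib, "/MACHINE:X64"],
        # Step 3: the C wrapper is compiled with MSVC
        # (Python include directories omitted here).
        ["cl", "/c", "/MD", "wrapper.c"],
        # Step 4: the .pyd is linked dynamically against the Fortran DLL
        # through its import library (Python's own .lib omitted here).
        ["link", "/DLL", "/OUT:" + ext + ".pyd", "wrapper.obj", implib],
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Only plain data (numbers, array pointers) should cross the boundary between wrapper.c and the Fortran DLL in such a setup, for the reasons given above.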

Figure 2. MSVC+gfortran build (schematic).

Implementation

The above approach was implemented in Numpy pull request #9431, by @xoviat and me, and the Scipy build was set up in Scipy PR #7616. Both are merged, and are due to go live in the upcoming Numpy 1.14.0 and Scipy 1.0.0.

All projects using numpy.distutils can benefit from this work, and it generally works almost out of the box, as long as the necessary compilers are installed.

However, the solution is not fully complete: if your Fortran code needs to call routines you implemented in C, you may not be as lucky, as the numpy.distutils work only exports routines from the DLL and does not import routines into it. There are some hacks that could work around this, but currently this should be considered unsupported.

The implementation itself is somewhat more convoluted than the ideas it’s based on, because it has to deal with distutils. The main points, however, are:

  • For each Python extension, treat C and Fortran source files separately.

  • Fortran sources are compiled as usual, and linked via:

    gfortran $FFLAGS -o extra-dll/$DLLNAME \
        $F_OBJECTS                         \
        -Wl,--allow-multiple-definition    \
        -Wl,--output-def,$DLLNAME.def      \
        -Wl,--export-all-symbols           \
        -Wl,--enable-auto-import           \
        -static                            \
        -mlong-double-64                   \
        $F_CHAINED_DLLS
    

    For generating DLLs out of .a static libs, $F_OBJECTS is replaced by:

    -Wl,--whole-archive $F_STATIC_LIBS -Wl,--no-whole-archive \
    

    Namely, the constructed DLL exports all symbols from the object files and from any static .a archives. The latter part is useful, as it enables linking of e.g. static OpenBLAS to an MSVC compilation unit. The --allow-multiple-definition flag is required to emulate standard link behavior. Finally, -static links the GNU runtime libraries statically.

  • To avoid including static OpenBLAS in every DLL, the system builds a separate DLL file for each .a archive, which it then links in dynamically.

  • Importantly, to avoid DLL hell, the generated $DLLNAME file name contains an encoded SHA1 hash of the contents of the object files and of the dynamically linked DLLs. The contents of extra-dll\ typically look like:

    scipy\
      __init__.py
      extra-dll\
        libopenblas.UWVN3XTD2LSS7SFIFK6TIQ5GONFDBJKU.gfortran-win32.dll
        libansari.Q4BAGRNANLWD2YZJOKYPOAUIOLXW2LXK.gfortran-win32.dll
        lib_arpack-.OANI5DHXTTJ2LE4Q42I5J55AXWCBKCF4.gfortran-win32.dll
        libbanded5x.6GPWIPEFGX4CZJ6AYQBC2CY2JJYW433R.gfortran-win32.dll
        libbispeu.OAANPWJKKXZRFOCA7BPAXPEXKORTJQMF.gfortran-win32.dll
        ...
      ...
    

    This ensures that DLL files from different Python extensions cannot interfere with each other — if the file names match, then either the DLL files are completely interchangeable, or an accidental SHA1 hash collision has occurred (effectively impossible).

  • The export .def file generated by gfortran is turned into an MSVC import library (.lib), which is then used to link the DLL to the .pyd extension in standard MSVC link commands.
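The hashed file naming described above could look roughly like the following. This is a sketch under assumptions: the base32 encoding, the stem truncated to 8 characters and the helper names hash_files and wrapper_dll_name are guessed from the directory listing above, and are not a statement of what numpy.distutils does internally.

```python
import base64
import hashlib
import os


def hash_files(filenames):
    """Return a base32-encoded SHA1 digest of the concatenated file contents."""
    h = hashlib.sha1()
    for fn in filenames:
        with open(fn, "rb") as f:
            # Read in blocks so that large archives need not fit in memory.
            for block in iter(lambda: f.read(131072), b""):
                h.update(block)
    return base64.b32encode(h.digest()).decode("ascii").rstrip("=")


def wrapper_dll_name(objects, chained_dlls=(), tag="win32"):
    """Build a file name of the form lib<stem>.<hash>.gfortran-<tag>.dll."""
    # Assumed convention: the stem comes from the first object file.
    stem = os.path.splitext(os.path.basename(objects[0]))[0][:8]
    # The hash covers the objects and the DLLs they are linked against.
    digest = hash_files(list(objects) + list(chained_dlls))
    return "lib{}.{}.gfortran-{}.dll".format(stem, digest, tag)
```

Because the name depends only on the file contents, rebuilding identical inputs yields the same DLL name, while any change to an object file or a chained DLL yields a new one.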

Building with numpy.distutils

Finally, let’s comment on how to use the MSVC+gfortran combo to build wheels for your own project that uses numpy.distutils. You may also be interested in the corresponding appveyor.yml build configuration for Scipy.

Setting up the build environment works as follows:

  1. Install Numpy >= 1.14.0 (at the time of writing, this exists only as the development version). If you want to build against older Numpy, you can just replace the numpy/distutils and maybe the numpy/compat directories with those from the new version.

  2. Go to https://wiki.python.org/moin/WindowsCompilers and install the MSVC compilers and SDKs corresponding to your Python version.

  3. Go to http://www.msys2.org/ to install the mingw-w64 toolchain. Make sure to install the compiler packages, including gfortran.

  4. pip install --upgrade pip setuptools wheel — this is necessary for MSVC support.

  5. If you need LAPACK/BLAS, build OpenBLAS with the Mingw-w64 toolchain into a static openblas.a library.

    To use it, drop the .a file under the Lib directory of your Python install, or write a site.cfg file to put next to your setup.py:

    [openblas]
    libraries = openblas
    library_dirs = c:\INSERT-PATH-HERE\
    include_dirs = c:\INSERT-PATH-HERE\include
    

    Prebuilt OpenBLAS binaries can also be found (but don’t rely on these being present forever, build it yourself):

    These openblas builds come from https://github.com/matthew-brett/build-openblas

  6. set PATH=%PATH%;c:\msys2\mingw64\bin

    python setup.py bdist_wheel

    Note that only Mingw-w64 needs to be in PATH; setuptools finds the correct MSVC version itself by looking in the registry.
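A small pre-flight check can catch a missing compiler or a mistyped site.cfg before a long build starts. The following is a minimal sketch using only the standard library; the helper names find_gfortran and read_openblas_section are hypothetical and not part of numpy.distutils, and the way numpy.distutils itself reads site.cfg may differ.

```python
import configparser
import shutil


def find_gfortran(path=None):
    """Return the full path of gfortran on the given search path, or None."""
    return shutil.which("gfortran", path=path)


def read_openblas_section(site_cfg):
    """Return the [openblas] settings from a site.cfg file as a dict, or None."""
    # Interpolation is disabled so that '%' in Windows paths is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(site_cfg)  # a missing file is silently skipped
    if not parser.has_section("openblas"):
        return None
    return dict(parser["openblas"])


if __name__ == "__main__":
    print("gfortran:", find_gfortran() or "not found on PATH")
    print("openblas:", read_openblas_section("site.cfg") or "no [openblas] section")
```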

The build will result in a bunch of DLL files placed into an extra-dll directory inside the project installation/wheel. You need to insert a statement into your top-level __init__.py, in order to add the directory to the DLL search path:

import os
# Add the bundled extra-dll directory (if present) to the DLL search path
extra_dll_dir = os.path.join(os.path.dirname(__file__), 'extra-dll')
if os.path.isdir(extra_dll_dir):
    os.environ["PATH"] += os.pathsep + extra_dll_dir

If you are using the config.make_config_py() stuff from numpy.distutils, the necessary lines are inserted in the generated __config__.py.
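One caveat for newer interpreters: Python 3.8 and later on Windows no longer consult PATH when resolving the DLL dependencies of extension modules, and provide os.add_dll_directory() instead. A variant of the snippet that covers both cases might look like this; the function name add_extra_dll_dir is hypothetical, and this is a sketch rather than the code numpy.distutils generates.

```python
import os


def add_extra_dll_dir(package_dir):
    """Make <package_dir>/extra-dll visible to the Windows DLL loader."""
    extra_dll_dir = os.path.abspath(os.path.join(package_dir, "extra-dll"))
    if not os.path.isdir(extra_dll_dir):
        return None
    # Python 3.8+ on Windows: register the directory explicitly, since PATH
    # is ignored for the dependencies of extension modules.
    if hasattr(os, "add_dll_directory"):
        os.add_dll_directory(extra_dll_dir)
    # Older Python versions resolve dependent DLLs through PATH.
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + extra_dll_dir
    return extra_dll_dir
```

In a package's top-level __init__.py this would be called as add_extra_dll_dir(os.path.dirname(__file__)) before any extension module is imported.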

Future?

The MSVC + gfortran solution is a pragmatic approach that relies only on widely available tools that are maintained by someone else(TM). It should be noted here that the mingwpy project made some progress in getting everything to work with the mingw-w64 toolchain only. If this turns out to work in the end in a sustainable way, we will likely take a second look at the build setup, in order to drop MSVC and get to 100% free software.

Footnotes

[1] A non-free solution is to use the Intel Fortran compilers. The Scipy project, however, was not happy with some requirements in the compiler license, so this option was out. However, 3rd parties did provide binaries built in this way.
