WavePacket (C++/Python) Blog

Time-dependent simulation of open and closed quantum systems

Status: Alpha

Brought to you by: bsch63, ulflor

Building a Python interface to a CMake library

I have recently rewritten how the Python module for Wavepacket is built, and learned a lot about Python along the way. The learning curve was rather steep, and it felt a lot like learning the intricate details of NuGet and .NET assembly loading (this is not a compliment). However, while extension libraries for Python may not be ideally documented, the implementation is pretty robust and neat. So I thought I'd write up what I tried and found out to assist others in a similar situation.

Side note: I do appreciate feedback.

To briefly summarize my initial situation: I have a C++ library, WavePacket, for which I want to provide an easy-to-install and easy-to-use Python interface using Pybind11.

Things start out simple ...

At the end of the day, Python's approach to loading modules, which may include libraries with native code, is pretty robust. Neglecting topics like shadowing where multiple modules have the same name, the algorithm works roughly like this:

Python has search paths where it looks for modules. You can extend these paths either by setting the environment variable PYTHONPATH to a list of directories (colon-separated under Unix, semicolon-separated under Windows), or within Python by adding a path string to the list in sys.path.
When you import a module, Python looks in every search path directory. It searches for either a library with the module name ("<module>.so", optionally with an ABI tag in front of the extension, under Unix; "<module>.pyd" under Windows) or a subdirectory with the module name.
When it finds a library, it looks for an exported symbol "PyInit_<module>" with a certain signature, which then tells Python about the functions, classes etc. that it exports. Such a library can be constructed relatively easily, for example with the help of pybind11.
If a subdirectory with the name of the module exists, it has to contain a file __init__.py, which is evaluated on loading the module.
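To see these rules in action, the standard library can report both the search path and the file suffixes it accepts for extension libraries. The following is only a minimal sketch: the helper name describe_import_search is made up for this post, and json merely stands in for any importable module.

```python
import importlib.machinery
import importlib.util
import sys


def describe_import_search(module_name):
    """Collect what the import system consults for a top-level module.

    Hypothetical helper for illustration only.
    """
    # File names that would be accepted as an extension library.
    candidates = [module_name + suffix
                  for suffix in importlib.machinery.EXTENSION_SUFFIXES]
    # find_spec() locates a top-level module without executing it.
    spec = importlib.util.find_spec(module_name)
    return {
        "search_path": list(sys.path),
        "extension_candidates": candidates,
        "origin": spec.origin if spec is not None else None,
    }


info = describe_import_search("json")
print(info["extension_candidates"])
print(info["origin"])
```

Since json is a package (a directory), the reported origin is its __init__.py; for a compiled extension it would be the library file instead.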

One interesting feature of __init__.py is that it can load more modules, also relative to the module's path.

...but they can get pretty complicated

On top of all this functionality, however, there is the Python package management system (pip), which provides a simple-to-use repository of common packages and so on. This part I was actually much less interested in, but it was way easier to find documentation on how to create a Python package than on how the whole thing works. So my original approach was to bundle the Python interface in a package.

Python used to have a big blob called setuptools that would magically do all the building, packaging, installation, distribution and so on for you. Using internet sources, I finally cobbled together the following solution:

You build and install Wavepacket using CMake with whatever backend (e.g., make).
As a side effect of the build, my CMake scripts output a setup.py file for the Python compilation in the build directory.
You run this setup.py script to build and install the Python bindings.
Afterwards, you can run the various Python integration tests in the build directory using CMake tools again (ctest).

The good thing was that this solution worked. However, it had some glaring deficits. For one, if you wanted to test that your Python interface worked, you had to use CMake/make, then Python setuptools, then CMake again, which is ugly. What was worse, especially when compiling under Windows, there were two different build systems (CMake, setuptools) with different compiler flags, potentially different compilers, and downright different logic.

So I decided to rewrite this part.

Scikit-build

Fortunately, since my first try, the Python world has moved on. Basically, the behemoth setuptools has retreated to providing only the framework functionality for packaging. Other tools have sprung up to provide the actual frontend that deals with, for example, the compilation of a Python extension library.

An interesting approach is used by scikit-build. The basic idea here is that you already have a library that is built with CMake, and add the Python bindings there. Scikit-build then drives the CMake build, which includes the Python bindings, takes the artefacts that come out, and wraps them in a Python package. While this already sounds great, scikit-build goes further. It provides some CMake extensions (functions) for building and linking against a given Python installation, and, when calling CMake, sets the module path so that these extensions are easily found.

With such nice tooling at my side, my idea was to remove the split between the Python bindings and the C++ library. If the CMake build was driven by scikit-build, I would directly compile the Python bindings together with the actual wavepacket code into a single library, and have it packaged and installed. So I would have a Python setup script on top of my CMake build that would drive the compilation if you want to get out a Python package.

Alas, while trying, I discovered some stumbling blocks that eventually made me give up this approach:

First of all, the documentation does not cover all the functionality, so when you have some edge cases, you are left with trying out things or reading the code.
One of the issues I learned by trial and error is that scikit-build expects a very particular directory layout. When you want to install a module named "wavepacket", you must have certain data (such as an __init__.py file) in a directory called "wavepacket", otherwise things do not turn out well by default. In other words, follow the published examples to the letter.
Another issue was the CMake extensions themselves. To use the full functionality, I had to compile the Wavepacket library as a module (CMake terminology for a library that is loaded at run-time). But CMake correctly prohibits linking against modules, so for example all my C++ unit tests would have to be disabled as well when compiling for Python. Also, CMake supports two different notations for declaring link dependencies (with and without the PUBLIC/PRIVATE/INTERFACE keywords), but you may only pick one per target. Scikit-build uses the legacy notation, but I prefer the modern one for Boost dependencies, so the two clash.
setuptools originally had some functionality to run unit tests, which is not present in scikit-build. While I fully understand and support the motivation (different concerns should be addressed by different tools), I need yet another tool (e.g., tox) besides CMake and scikit-build for driving my Python tests.

So while there was no truly insurmountable obstacle, things became annoying enough that I started looking for alternatives. Mind you, if you keep these points in mind, scikit-build does look like a viable option.

However, while playing around with scikit-build, I eventually understood enough of Python's internals to be able to build my own module by hand. Furthermore, pondering the problem, it occurred to me that I do not want to provide a Python package at all. Wavepacket has some pretty non-standard dependencies (namely the underlying tensor library) that are not easily included in a distribution. So it seems more honest not to build a Python package, but only a module that can be loaded from Python.

The final approach

And that is where my final concept ended up: I use only a CMake build, which creates the Python bindings as a byproduct. The drawback is that this can become tricky for edge cases (multiple Python installations and such), but CMake has facilities to help there.

As a quick run-down: My top-level CMakeLists.txt offers an option to build the Python bindings and searches for Python:

# ...
option(WP_BUILD_PYTHON "Build the Python interface" ON)
# ...
if (WP_BUILD_PYTHON)
     find_package(Python3 COMPONENTS Interpreter Development)
     if (NOT Python3_FOUND)
          message(FATAL_ERROR "Need Python3 interpreter and development environment to compile Python module")
     endif()
endif()
#...
add_subdirectory(src)
if (WP_BUILD_PYTHON)
     add_subdirectory(python)
endif()

It then descends into two subdirectories: src, which builds the C++ library with the Wavepacket functionality, and python, which builds the Python bindings.

The CMakeLists.txt under python/ does some general setup (https://sourceforge.net/p/wavepacket/cpp/git/ci/master/tree/python/CMakeLists.txt):

It checks for the existence of certain Python modules. This can be done by importing the module with a Python command line and checking the return value of the interpreter.
It queries the include directories for Pybind11. These are fortunately accessible from Python.
It adds a CMake test to run the various Python tests. This requires some fiddling, because we need to tell Python where to find our Python bindings once they are compiled, and I did not quite grasp the idea of Python's unittest package, so I execute it in the directory of the tests. Also, I copy the tests to the binary directory where everything is built, to avoid cluttering my source directory with residues.
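The first two checks boil down to one-line interpreter calls whose exit code or output the build script inspects. As a rough sketch of the same idea written in Python itself (the helper names are invented for this illustration; pybind11.get_include() is the function that pybind11 provides for querying its header location):

```python
import subprocess
import sys


def module_available(name, interpreter=sys.executable):
    """Return True if `interpreter -c "import name"` exits with code 0."""
    result = subprocess.run(
        [interpreter, "-c", f"import {name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pybind11_include_dir(interpreter=sys.executable):
    """Ask the interpreter for the pybind11 headers; None if not installed."""
    if not module_available("pybind11", interpreter):
        return None
    result = subprocess.run(
        [interpreter, "-c", "import pybind11; print(pybind11.get_include())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(module_available("json"))   # a standard-library module
print(pybind11_include_dir())     # a path, or None without pybind11
```

A build script would run the same command lines, for example through CMake's execute_process, and abort the configuration when the exit code is non-zero.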

Finally, under python/ I create another directory wavepacket/ that will hold my Python bindings. The directory contains the source code for the Python bindings, a CMakeLists.txt to drive the compilation and installation, and an __init__.py script.

The compilation of the Python bindings is pretty standard now. The rest is a bit particular:

configure_file (
     "${CMAKE_CURRENT_SOURCE_DIR}/__init__.py"
     "${CMAKE_CURRENT_BINARY_DIR}/__init__.py"
     COPYONLY)

# Installation rules
# Note that we need to link against the wavepacket library, so we need to set an rpath
set(pythonInstallDir ${CMAKE_INSTALL_LIBDIR}/wavepacket_python/wavepacket)
set_target_properties(wavepacket_python PROPERTIES INSTALL_RPATH $ORIGIN/../..)

install(TARGETS wavepacket_python
     LIBRARY DESTINATION ${pythonInstallDir})
install(FILES __init__.py
     DESTINATION ${pythonInstallDir})

The first configure_file copies the __init__.py to the binary directory. As a consequence, once I build the Python bindings, I only need to add ${CMAKE_BINARY_DIR}/python to PYTHONPATH, and any Python installation can find my wavepacket package just as if it had been installed. This is useful for running the Python tests and demos directly in the binary directory before installation.
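To make the mechanism concrete, here is a small self-contained sketch of running unittest against an uninstalled package by extending PYTHONPATH. Everything in it is a stand-in: the package demo_bindings, the directory names, and the helper function are hypothetical, and a temporary directory plays the role of the build tree.

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def run_tests_against_build_tree(package_parent, test_dir):
    """Run unittest discovery in test_dir with package_parent on PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = str(package_parent) + (
        os.pathsep + existing if existing else "")
    return subprocess.run(
        [sys.executable, "-m", "unittest", "discover"],
        cwd=test_dir, env=env, capture_output=True, text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    # Fake "build tree": <tmp>/python/demo_bindings/__init__.py
    build_python = pathlib.Path(tmp) / "python"
    package = build_python / "demo_bindings"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("VERSION = 'demo'\n")

    # Tests live elsewhere and simply import the package by name.
    tests = pathlib.Path(tmp) / "tests"
    tests.mkdir()
    (tests / "test_import.py").write_text(
        "import unittest\n"
        "import demo_bindings\n\n"
        "class ImportTest(unittest.TestCase):\n"
        "    def test_version(self):\n"
        "        self.assertEqual(demo_bindings.VERSION, 'demo')\n"
    )
    completed = run_tests_against_build_tree(build_python, tests)

print(completed.returncode)
```

The tests never mention where the package lives; only the environment of the test run does, which is what allows the same tests to run before and after installation.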

When installing the libraries under ${Install}/lib, I decided to put the Python files under ${Install}/lib/wavepacket_python/wavepacket. This way, you always add ${Install}/lib/wavepacket_python to your Python path and can access the "wavepacket" module. Into this directory, I install my Python binding library and the __init__.py file.

One last thing to be taken into account is that my Python binding library needs to know the location of the C++ library. This I solved by adding a relative ("origin") rpath that tells the loader where to look for other libraries; this feature should be supported on most modern Unix variants. A Windows build would need some special processing, but building Wavepacket under Windows is nothing a normal user would want anyway.
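The effect of the relative rpath can be worked out by hand, because the loader replaces $ORIGIN with the directory that contains the library being loaded. The sketch below only mimics that substitution; the install prefix /opt/wavepacket and the library file name are assumptions for the example, and the function is not part of any real tool.

```python
import posixpath


def resolve_origin_rpath(library_path, rpath):
    """Expand $ORIGIN the way the loader does and normalise the result."""
    origin = posixpath.dirname(library_path)
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


# Hypothetical install location of the Python binding library.
binding = "/opt/wavepacket/lib/wavepacket_python/wavepacket/wavepacket_python.so"
resolved = resolve_origin_rpath(binding, "$ORIGIN/../..")
print(resolved)
```

Two levels up from the package directory is the lib directory, which is where the C++ library is installed in this layout.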

Now the __init__.py file is pretty straightforward:

from .wavepacket_python import *

It just directs Python to the wavepacket_python module (i.e., library) in the current directory (hence the dot) and imports all symbols from there. This nicely decouples the name of the library with the Python bindings (libwavepacket_python.so) from the name of the Python module (encoded in the directory name).
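The decoupling can be tried out without any compiled code. In the following sketch, a plain Python file stands in for the binding library; the names demo_pkg and _native_stub are invented for the illustration.

```python
import importlib
import pathlib
import sys
import tempfile


def make_demo_package(root):
    """Create <root>/demo_pkg with an __init__.py that re-exports a submodule."""
    pkg = pathlib.Path(root) / "demo_pkg"
    pkg.mkdir()
    # Stand-in for the compiled binding library inside the package directory.
    (pkg / "_native_stub.py").write_text("def answer():\n    return 42\n")
    # The leading dot makes the import relative to the package directory.
    (pkg / "__init__.py").write_text("from ._native_stub import *\n")
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    make_demo_package(tmp)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        result = demo_pkg.answer()
    finally:
        sys.path.remove(tmp)

print(result)
```

Users only ever see the name demo_pkg, which comes from the directory; the inner module could be renamed without changing any calling code.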

While this solution is still quite some way from perfect, it solves my initial problems. I can now compile and install the C++ library and the Python bindings with a single CMake call, and also run all the tests using only CMake.

Posted by 2020-10-08