Building a Python interface to a CMake library

I have recently rewritten how the Python module for Wavepacket is built, and learned a lot about Python along the way. The learning curve was rather steep, and it felt a lot like learning the intricate details of Nuget and .Net assembly loading (this is not a compliment). However, while extension libraries for Python may not be ideally documented, the implementation is pretty robust and neat. So I thought I'd write up what I tried and found out, to assist others in a similar situation.

Side note: I do appreciate feedback.

To briefly summarize my initial situation: I have a C++ library, Wavepacket, for which I want to provide an easy-to-install and easy-to-use Python interface using pybind11.

Things start out simple ...

At the end of the day, Python's approach to loading modules, which may include libraries with native code, is pretty robust. Neglecting topics like shadowing where multiple modules have the same name, the algorithm works roughly like this:

  • Python has search paths where it looks for modules. You can extend these paths either by setting the environment variable PYTHONPATH to a colon-separated list of directories (semicolon-separated under Windows), or within Python by appending a path string to the list sys.path.
  • When you import a module, Python looks in every search path directory. It searches for either a library named after the module ("<module>.so" under Unix, possibly with a platform-specific tag in the suffix, "<module>.pyd" under Windows) or a subdirectory with the module name.
  • When it finds a library, it looks for an exported symbol "PyInit_<module>" with a certain signature, which then tells Python about the functions, classes etc. that the library exports. Such a library can be built relatively easily, for example with the help of pybind11.
  • If a subdirectory with the name of the module exists, it has to contain a file __init__.py, which is executed on loading the module.
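
The lookup described above can be inspected from within Python. The following is a minimal sketch using only the standard library; the helper function and the directory name are made up for illustration.

```python
import importlib.machinery
import importlib.util
import os
import sys


def describe_module_lookup(module_name):
    """Return the file Python would load for module_name, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# File name suffixes that Python accepts for compiled extension modules on
# this platform (variants of ".so" under Unix, ".pyd" under Windows).
extension_suffixes = importlib.machinery.EXTENSION_SUFFIXES

# Extend the search path at run time; this directory is a hypothetical example.
hypothetical_dir = os.path.join(os.getcwd(), "build", "python")
if hypothetical_dir not in sys.path:
    sys.path.append(hypothetical_dir)

print(extension_suffixes)
print(describe_module_lookup("json"))  # a package from the standard library
```

Calling describe_module_lookup() with the name of your own module is a quick way to check which file Python would actually pick up.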

One interesting feature of __init__.py is that it can load more modules, also relative to the module's path.
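
To illustrate, here is a small self-contained sketch that creates a throwaway package in a temporary directory. The names "demo_pkg" and "_core" are invented; in a real project, the sibling module would typically be the compiled extension library rather than a Python file.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package "demo_pkg" (hypothetical name) in a temporary directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

# A submodule that stands in for the compiled extension library.
with open(os.path.join(pkg_dir, "_core.py"), "w") as f:
    f.write("def answer():\n    return 42\n")

# __init__.py runs on "import demo_pkg" and pulls in the sibling module
# with a relative import (the leading dot means "this package").
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from ._core import answer\n")

# Make the temporary directory a search path and import the package.
sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
result = demo_pkg.answer()
print(result)
```

The user only ever writes "import demo_pkg"; which files sit behind that name is decided entirely by __init__.py.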

...but they can get pretty complicated

On top of all this functionality, however, there is the Python package management system (pip), which provides a simple-to-use repository of common packages and so on. This part I was actually much less interested in, but it was way easier to find documentation on how to create a Python package than on how the whole thing works. So my original approach was to bundle the Python interface in a package.

Python used to have a big blob called setuptools that would magically do all the building, packaging, installation, distribution and so on for you. Using internet sources, I finally cobbled together the following solution:

  • You build and install Wavepacket using CMake with whatever backend (e.g., make).
  • As a side effect of the build, my CMake scripts output a setup.py file for the Python compilation in the build directory.
  • You run this setup.py script to build and install the Python bindings.
  • Afterwards, you can run the various Python integration tests in the build directory using CMake tools again (ctest).
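
Spelled out, these four steps amount to roughly the following sequence of commands. This is only a sketch: the exact invocations, the directory names and both helper functions are assumptions for illustration, not the actual Wavepacket scripts.

```python
import subprocess
import sys


def old_workflow_commands(source_dir, build_dir):
    """Return (command, working directory) pairs for the three-stage workflow."""
    return [
        # 1. Configure, build and install the C++ library with CMake.
        (["cmake", "-S", source_dir, "-B", build_dir], None),
        (["cmake", "--build", build_dir, "--target", "install"], None),
        # 2. Run the generated setup.py (assumed to sit in the build directory).
        ([sys.executable, "setup.py", "install"], build_dir),
        # 3. Back to the CMake tooling for the Python integration tests.
        (["ctest"], build_dir),
    ]


def run_old_workflow(source_dir, build_dir):
    """Run all steps in order and stop at the first failure."""
    for command, cwd in old_workflow_commands(source_dir, build_dir):
        subprocess.run(command, cwd=cwd, check=True)
```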

The good thing was that this solution worked. However, it had some glaring deficits. For one, if you wanted to test that your Python interface worked, you had to use CMake/make, then Python setuptools, then CMake again, which is ugly. Worse, especially when compiling under Windows, there were two different build systems (CMake, setuptools) with different compiler flags, potentially different compilers, and downright different logic.

So I decided to rewrite this part.

Scikit

Fortunately, since my first try, the Python world has moved on. Basically, the moloch setuptools has withdrawn to provide only framework functionality for the packaging. Other tools have sprung up to provide the actual frontend that deals with, for example, the compilation of a Python extension library.

An interesting approach is used by scikit-build. The basic idea here is that you already have a library that is built with CMake, and add the Python bindings there. Scikit-build then drives the CMake build, which includes the Python bindings, takes the artefacts that come out, and wraps them in a Python package. While this already sounds great, scikit-build goes further. It provides some CMake extensions (functions) for building and linking against a given Python installation, and, when calling CMake, sets the module path so that these extensions are easily found.

With such nice tooling at my side, my idea was to remove the split between the Python bindings and the C++ library. If the CMake build was driven by scikit-build, I would directly compile the Python bindings together with the actual Wavepacket code into a single library, and have it packaged and installed. So I would have a Python setup script on top of my CMake build that would drive the compilation if you wanted to get a Python package out of it.
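
Such a setup script could look roughly like the following sketch. The keywords packages, cmake_install_dir and cmake_args are options of scikit-build's setup() function; the version number and the CMake option are placeholders invented for this example, and this is not the script I actually used.

```python
# Keyword arguments for scikit-build's setup() function.
setup_arguments = dict(
    name="wavepacket",
    version="0.0.0",                 # placeholder version
    packages=["wavepacket"],         # directory "wavepacket" containing __init__.py
    cmake_install_dir="wavepacket",  # put the files installed by CMake inside the package
    cmake_args=["-DWAVEPACKET_BUILD_PYTHON=ON"],  # hypothetical CMake option
)

# In an actual setup.py, the file would end with:
#     from skbuild import setup
#     setup(**setup_arguments)
```

The only visible difference to a plain setuptools script is that setup() is imported from skbuild, which then runs CMake before packaging.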

Alas, while trying, I discovered some stumbling blocks that eventually made me give up this approach:

  • First of all, the documentation does not cover all the functionality, so when you have some edge cases, you are left with trying out things or reading the code.
  • One of the issues I learned about by trial and error is that scikit-build expects a very particular directory layout. When you want to install a module named "wavepacket", you must have certain data (such as an __init__.py file) in a directory called "wavepacket", otherwise things do not turn out well by default. In other words, follow the published examples to the letter.
  • Another issue was the CMake extensions themselves. To use the full functionality, I had to compile the Wavepacket library as a module (CMake terminology for a library that is loaded at run-time). But CMake correctly prohibits linking against modules, so for example all my C++ unit tests would have to be disabled as well when compiling for Python. Also, CMake supports two different notations for declaring link dependencies, but you may only pick one per target. Scikit-build uses the legacy notation, but I prefer the modern one for Boost dependencies, so the two clash.
  • setuptools originally had some functionality to run unit tests, which is not present in scikit-build. While I fully understand and support the motivation (different concerns should be addressed by different tools), I need yet another tool (e.g., tox) besides CMake and scikit-build for driving my Python tests.

So while there was no truly insurmountable obstacle, things became annoying enough that I started looking for alternatives. Mind you, if you keep these things in mind, scikit-build does look like a viable option.

However, while playing around with scikit-build, I eventually understood enough of Python's internals to be able to build my own module by hand. Furthermore, pondering the problem, it occurred to me that I do not want to provide a Python package at all. Wavepacket has some pretty non-standard dependencies (namely the underlying tensor library) that are not easily included in a distribution. So it seems more honest not to build a Python package, but only a module that can be loaded from Python.

Posted by Ulf Lorenz 2020-10-08 | Draft
