I recently rewrote how the Python module for Wavepacket is built, and learned a lot about Python along the way. The learning curve was rather steep, and it felt a lot like learning the intricate details of NuGet and .NET assembly loading (this is not a compliment). However, while extension libraries for Python may not be ideally documented, the implementation is pretty robust and neat. So I thought I'd write up what I tried and found out, to assist others in a similar situation.
Side note: I do appreciate feedback.
To briefly summarize my initial situation: I have a C++ library, Wavepacket, for which I want to provide an easy-to-install and easy-to-use Python interface using pybind11.
At the end of the day, Python's approach to loading modules, which may include libraries with native code, is pretty robust. Neglecting topics like shadowing where multiple modules have the same name, the algorithm works roughly like this:
One interesting feature of __init__.py is that it can load further modules, including ones located relative to the module's own path.
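To make the relative loading concrete, here is a minimal sketch in pure Python. The package name demo_pkg and the helper module _core are hypothetical names invented for this illustration; in a real binding, _core would typically be the compiled extension library rather than a .py file.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny package 'demo_pkg' (hypothetical name) into root."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # A sibling module that __init__.py pulls in with a relative import.
    (pkg / "_core.py").write_text("def answer():\n    return 42\n")
    # __init__.py runs on 'import demo_pkg'. The leading dot means
    # "look inside this package's directory", not along sys.path.
    (pkg / "__init__.py").write_text("from ._core import answer\n")


def import_demo_package():
    """Import the package from a temporary directory and clean up again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_package(root)
        # Python only finds top-level modules in directories on sys.path.
        sys.path.insert(0, str(root))
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("demo_pkg")
            return pkg.answer(), Path(pkg.__file__).name
        finally:
            # Undo the global side effects of this demonstration.
            sys.path.remove(str(root))
            sys.modules.pop("demo_pkg", None)
            sys.modules.pop("demo_pkg._core", None)


if __name__ == "__main__":
    print(import_demo_package())
```

The point of the sketch is that only the directory containing demo_pkg has to be on the search path; everything inside the package is located relative to __init__.py.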
On top of all this functionality, however, there is the Python package management system (pip), which provides a simple-to-use repository of common packages and so on. This part I was actually much less interested in, but it was way easier to find documentation on how to create a Python package than on how the whole thing works. So my original approach was to bundle the Python interface in a package.
Python used to have a big blob called setuptools that would magically do all the building, packaging, installation, distribution and so on for you. Using internet sources, I finally cobbled together the following solution:
The good thing was that this solution worked. However, it had some glaring deficits. For one thing, if you wanted to test that your Python interface worked, you had to use CMake/make, then Python setuptools, then CMake again, which is ugly. What was worse, especially when compiling under Windows, there were two different build systems (CMake and setuptools) with different compiler flags, potentially different compilers, and downright different logic.
So I decided to rewrite this part.
Fortunately, since my first try, the Python world has moved on. Basically, the Moloch that is setuptools has retreated to providing only the framework functionality for packaging. Other tools have sprung up to provide the actual frontend that deals with, for example, the compilation of a Python extension library.
An interesting approach is used by scikit-build. The basic idea here is that you already have a library that is built with CMake, and you add the Python bindings there. Scikit-build then drives the CMake build, which includes the Python bindings, takes the artefacts that come out, and wraps them in a Python package. While this already sounds great, scikit-build goes further: it provides some CMake extensions (functions) for building and linking against a given Python installation and, when calling CMake, sets the module path so that these extensions are easily found.
With such nice tooling at my side, my idea was to remove the split between the Python bindings and the C++ library. If the CMake build was driven by scikit-build, I would directly compile the Python bindings together with the actual Wavepacket code into a single library, and have it packaged and installed. So I would have a Python setup script on top of my CMake build that would drive the compilation whenever you want a Python package.
Alas, while trying, I discovered some stumbling blocks that eventually made me give up this approach:
So while there was no real insurmountable obstacle, things became annoying enough that I started looking for alternatives. Mind you, if you keep these things in mind, scikit-build does look like a viable option.
However, while playing around with scikit-build, I eventually understood enough of Python's internals to be able to build my own module by hand. Furthermore, while pondering the problem, I realized that I do not want to provide a Python package at all. Wavepacket has some pretty non-standard dependencies (namely the underlying tensor library) that are not easily included in a distribution. So it seems more honest not to build a Python package, but only a module that can be loaded from Python.
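As a rough illustration of what "by hand" involves, the interpreter itself can report the two pieces of information a native build mostly needs: the file name suffix it accepts for extension libraries, and the directory with the Python headers. The following is a minimal sketch using only the standard library; the module name example_module is a hypothetical placeholder, not the name used by Wavepacket.

```python
import importlib.machinery
import sysconfig


def native_module_build_info(module_name: str) -> dict:
    """Collect what a hand-written build needs to know about this Python."""
    # Preferred suffix for compiled modules; the exact value depends on
    # the interpreter version and platform.
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    if not suffix:
        # Fallback for interpreters that do not define EXT_SUFFIX.
        suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    return {
        # File name the shared library must have to be importable.
        "filename": module_name + suffix,
        # Directory containing Python.h, needed to compile the bindings.
        "include_dir": sysconfig.get_paths()["include"],
        # All suffixes this interpreter accepts for extension modules.
        "accepted_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    for key, value in native_module_build_info("example_module").items():
        print(f"{key}: {value}")
```

A shared library with one of the accepted suffixes, placed in a directory on the module search path and exporting the matching initialization function, is all Python needs in order to import it; no package metadata is involved.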