I have recently rewritten the way the Python module for Wavepacket is built, and learned a lot about Python along the way. The learning curve was rather steep, and it felt a lot like learning the intricate details of NuGet and .NET assembly loading (this is not a compliment). However, while extension libraries for Python may not be ideally documented, the implementation is pretty robust and neat. So I thought I'd write up what I tried and found out to assist others in a similar situation.
Side note: I do appreciate feedback.
To briefly summarize my initial situation: I have a C++ library, Wavepacket, for which I want to provide an easy-to-install and easy-to-use Python interface using pybind11.
At the end of the day, Python's approach to loading modules, which may include libraries with native code, is pretty robust. Neglecting topics like shadowing where multiple modules have the same name, the algorithm works roughly like this:
One interesting feature of __init__.py is that it can load further modules, including ones located relative to the package's own path.
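To make the lookup a bit more tangible, here is a minimal sketch of how such a search could work. It is a deliberately simplified illustration and not Python's actual implementation: the function name find_module_file and the demo file names are made up for this example, and the real import system additionally handles built-in modules, caches, namespace packages, zip files and more.

```python
import importlib.machinery
import os
import tempfile


def find_module_file(name, search_path):
    """Return the first file that could provide `name`, or None.

    Simplified illustration only (hypothetical helper, not a Python API).
    """
    # Suffixes of native extension modules (platform dependent) are tried
    # before plain source suffixes such as ".py".
    suffixes = (importlib.machinery.EXTENSION_SUFFIXES
                + importlib.machinery.SOURCE_SUFFIXES)
    for directory in search_path:
        # A directory containing an __init__.py is a regular package.
        package_init = os.path.join(directory, name, "__init__.py")
        if os.path.isfile(package_init):
            return package_init
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Hypothetical demo layout: one plain module and one package.
demo_root = tempfile.mkdtemp()
with open(os.path.join(demo_root, "plain_mod.py"), "w") as handle:
    handle.write("VALUE = 1\n")
os.makedirs(os.path.join(demo_root, "some_pkg"))
with open(os.path.join(demo_root, "some_pkg", "__init__.py"), "w") as handle:
    handle.write("")

found_module = find_module_file("plain_mod", [demo_root])
found_package = find_module_file("some_pkg", [demo_root])
missing = find_module_file("no_such_module", [demo_root])
```

In a real session, the search path is the list in sys.path, which is where PYTHONPATH entries end up.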
On top of all this functionality, however, there is the Python package management system (pip), which provides a simple-to-use repository of common packages and so on. This part I was actually much less interested in, but it was way easier to find documentation on how to create a Python package than on how the whole thing works. So my original approach was to bundle the Python interface in a package.
Python used to have a big blob called setuptools that would magically do all the building, packaging, installation, distribution and so on for you. Using internet sources, I finally cobbled together the following solution:
The good thing was that this solution worked. However, it had some glaring deficits. For one thing, if you wanted to test that your Python interface worked, you had to use CMake/make, then Python setuptools, then CMake again, which is ugly. Worse, especially when compiling under Windows, there were two different build systems (CMake, setuptools) with different compiler flags, potentially different compilers, and downright different logic.
So I decided to rewrite this part.
Fortunately, since my first try, the Python world has moved on. Basically, the moloch setuptools has withdrawn to provide only framework functionality for the packaging. Other tools have sprung up to provide the actual frontend that deals with, for example, the compilation of a Python extension library.
An interesting approach is used by scikit-build. The basic idea here is that you already have a library that is built with CMake, and add the Python bindings there. Scikit-build then drives the CMake build, which includes the Python bindings, takes the artefacts that come out, and wraps them in a Python package. While this already sounds great, scikit-build goes further. It provides some CMake extensions (functions) for building and linking against a given Python installation, and, when calling CMake, sets the module path so that these extensions are easily found.
With such nice tooling at my side, my idea was to remove the split between the Python bindings and the C++ library. If the CMake build was driven by scikit-build, I would directly compile the Python bindings together with the actual wavepacket code into a single library, and have it packaged and installed. So I would have a Python setup script on top of my CMake build that would drive the compilation if you want to get out a Python package.
Alas, while trying, I discovered some stumbling blocks that eventually made me give up this approach:
So while there was no truly insurmountable obstacle, things became annoying enough that I started looking for alternatives. Mind you, if you keep these things in mind, scikit-build does look like a viable alternative.
However, while playing around with scikit-build, I eventually understood enough of Python's internals to be able to build my own module by hand. Furthermore, pondering the problem, it occurred to me that I do not want to provide a Python package at all. Wavepacket has some pretty non-standard dependencies (namely the underlying tensor library) that are not easily included in a distribution. So it seems more honest not to build a Python package, but only a module that can be loaded from Python.
And that is where my final concept ended up: I use only a CMake build, which creates the Python bindings as a byproduct. The drawback is that this can become tricky for edge cases (multiple Python installations and such), but CMake has facilities to help there.
As a quick run-down: My top-level CMakeLists.txt offers an option to build the Python bindings and searches for Python:
# ...
option(WP_BUILD_PYTHON "Build the Python interface" ON)
# ...
if (WP_BUILD_PYTHON)
    find_package(Python3 COMPONENTS Interpreter Development)
    if (NOT Python3_FOUND)
        # FATAL_ERROR aborts the configuration; a bare ERROR is not a message mode
        message(FATAL_ERROR "Need Python3 interpreter and development environment to compile Python module")
    endif()
endif()
# ...
add_subdirectory(src)
if (WP_BUILD_PYTHON)
    add_subdirectory(python)
endif()
It then descends into two subdirectories: src, which builds the C++ library with the Wavepacket functionality, and python, which builds the Python bindings.
The CMakeLists.txt under python/ does some general setup (https://sourceforge.net/p/wavepacket/cpp/git/ci/master/tree/python/CMakeLists.txt):
Finally, under python/ I create another directory wavepacket/ that will hold my Python bindings. The directory contains the source code for the Python bindings, a CMakeLists.txt to drive the compilation and installation, and an __init__.py script.
The compilation of the Python bindings is pretty standard now. The rest is a bit particular:
configure_file(
    "${CMAKE_CURRENT_SOURCE_DIR}/__init__.py"
    "${CMAKE_CURRENT_BINARY_DIR}/__init__.py"
    COPYONLY)

# Installation rules
# Note that we need to link against the wavepacket library, so we need to set an rpath
set(pythonInstallDir ${CMAKE_INSTALL_LIBDIR}/wavepacket_python/wavepacket)
set_target_properties(wavepacket_python PROPERTIES INSTALL_RPATH $ORIGIN/../..)
install(TARGETS wavepacket_python LIBRARY DESTINATION ${pythonInstallDir})
install(FILES __init__.py DESTINATION ${pythonInstallDir})
The initial configure_file call copies the __init__.py to the binary directory. As a consequence, once I build the Python bindings, I only need to add ${CMAKE_BINARY_DIR}/python to PYTHONPATH, and Python can find my wavepacket package just as if it had been installed. This is useful for running the Python tests and demos directly in the binary directory before installation.
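As a rough sketch of what running a test against the build directory amounts to, the following snippet starts a child interpreter with PYTHONPATH extended by one directory. Everything here is a stand-in: the temporary directory plays the role of ${CMAKE_BINARY_DIR}/python, and the package name wp_build_demo with its ANSWER attribute is invented for the example; the real directory would contain the compiled binding library next to the copied __init__.py.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for ${CMAKE_BINARY_DIR}/python (assumption for this demo).
build_python_dir = tempfile.mkdtemp()
package_dir = os.path.join(build_python_dir, "wp_build_demo")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("ANSWER = 42\n")

# Prepend the directory to PYTHONPATH for the child process only,
# keeping whatever the caller had already set.
env = dict(os.environ)
existing = env.get("PYTHONPATH")
env["PYTHONPATH"] = (build_python_dir if not existing
                     else build_python_dir + os.pathsep + existing)

# Run a fresh interpreter that imports the package by name.
result = subprocess.run(
    [sys.executable, "-c", "import wp_build_demo; print(wp_build_demo.ANSWER)"],
    env=env, capture_output=True, text=True, check=True)
output = result.stdout.strip()
```

Setting the variable only in the child's environment keeps the surrounding shell or test runner unaffected.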
When installing the libraries under ${Install}/lib, I decided to put the Python files under ${Install}/lib/wavepacket_python/wavepacket. This way, you always add ${Install}/lib/wavepacket_python to your Python path and can access the "wavepacket" module. Into the wavepacket subdirectory, I install my Python binding library and the __init__.py file.
One last thing to be taken into account is that my Python binding library needs to know the location of the C++ library. This I solved by adding a relative ("origin") rpath that tells the loader where to look for other libraries; this feature should be supported on most modern Unix variants. A Windows build would need some special processing, but building Wavepacket under Windows is nothing a normal user would want anyway.
Now the __init__.py file is pretty straightforward:
from .wavepacket_python import *
It just directs Python to the wavepacket_python module (i.e., library) in the current directory (hence the dot) and imports all symbols from there. This nicely decouples the name of the library with the Python bindings (libwavepacket_python.so) from the name of the Python module (encoded in the directory name).
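The re-export can be tried out without any compiled code. In the sketch below, a pure-Python file stands in for the binding library; the package name wavepacket_demo and the function greet are hypothetical and only serve the illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in layout:
#   <root>/wavepacket_demo/__init__.py          re-exports everything
#   <root>/wavepacket_demo/wavepacket_python.py pure Python here; in the
#       real build this would be the compiled binding library
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "wavepacket_demo")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "wavepacket_python.py"), "w") as handle:
    handle.write("def greet():\n    return 'hello from the bindings'\n")
with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
    handle.write("from .wavepacket_python import *\n")

# Make the package findable, then import it under the directory's name.
sys.path.insert(0, root)
importlib.invalidate_caches()
wavepacket_demo = importlib.import_module("wavepacket_demo")

# The function defined in the inner module is visible on the package.
greeting = wavepacket_demo.greet()
```

User code only ever sees the package name, so the inner module could be renamed without touching any caller.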
While this solution is still quite some way from perfect, it solves my initial problems. I can now compile and install the C++ library and the Python bindings with a single CMake call, and also run all the tests using only CMake.