python devirtualizer documentation.
Version 0.0.8 2020/07/28
This package supports a single repository of installed
Python3 packages which may safely contain multiple versions
of installed packages. When installed with this tool, packages
that require these different versions can run outside of
a virtualenv without conflicts and without having
to explicitly define a PYTHONPATH for each one before running it.
Packages installed with this tool will share a single copy of each
installed package version.
Terminology used in this package:
  pkg      A package, like setuptools, or an end user program
           like johnnydep.
  pkgver   A package-version combination, like pyFoo-1.2.3.
  target   Directory for an installed package which has been
           converted from a virtualenv.
  wrapper  A small C program which replaces "bin" level Python scripts,
           sets up the information they need to run, and then
           calls the original (which has been renamed from "script"
           to "script._wrapped").
  dd__     Replaces "../" when a file name must represent a collapsed
           relative path. For example, "../foo" becomes "dd__foo".
           Used in Maps files.
  __       Replaces "/" when a file name must represent a collapsed
           absolute path. For example, "/tmp/foo" becomes "__tmp__foo".
           Used in Deps file names.
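
The two name-collapsing rules above can be sketched in shell as
follows. This is only an illustration: the function names
collapse_abs and collapse_rel are hypothetical and are not taken
from pdvctrl. How pdvctrl treats a "/" that remains after a "../"
prefix is not described here, so the sketch leaves it alone.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of pdvctrl.

# Collapse an absolute path into one file name: every "/" becomes "__".
collapse_abs() {
    printf '%s\n' "$1" | sed 's|/|__|g'
}

# Collapse a relative path: every "../" becomes "dd__".
collapse_rel() {
    printf '%s\n' "$1" | sed 's|\.\./|dd__|g'
}

collapse_abs /tmp/foo    # prints __tmp__foo
collapse_rel ../foo      # prints dd__foo
```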
Directory organization of shared data:
The top level directory is referenced at all times by
symbol $PDV_ROOT which MUST be defined for pdvctrl to function.
($PDV_ROOT2 has the same directory structure and is used when
python2 modules are to be handled instead.)
It contains these subdirectories:
  site-packages  Unused, reserved for possible future use.
  site-pkgvers   All installed package-version combinations.
  Envs           Temporary virtualenvs used during installation.
  Deps           Contains files whose name is an installed target
                 directory, modified so that "/" in its path is
                 converted to "__". Inside the file is a list
                 of pkgvers that this target uses, one per line.
  Maps           One file for each pkgver containing the map of
                 its site-packages and other entries to the actual
                 location of the single shared copy, like:
                   $PDV_ROOT/site-pkgvers/networkx-2.4/networkx-2.4.dist-info networkx-2.4.dist-info
                 This is used during preinstallation to place these links in
                 the virtualenv before the installation.
                 Entries in ../../bin and elsewhere are also noted.
                 On preinstallation all entries are linked except those
                 in the bin directory, which are copied.
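
The Deps layout described above can be read back with a few lines
of shell. The sketch below assumes only what is stated here: the
Deps file name is the target path with "/" converted to "__", and
the file holds one pkgver per line. The helper name
list_target_deps and the pkgver names in the demonstration are
made up for illustration.

```shell
#!/bin/sh
# Sketch only: list the pkgvers recorded for one target.
# list_target_deps is a hypothetical helper, not a pdvctrl function.
list_target_deps() {
    root=$1      # value of PDV_ROOT
    target=$2    # absolute path of the target directory
    # The Deps file name is the target path with every "/" turned into "__".
    deps_file="$root/Deps/$(printf '%s\n' "$target" | sed 's|/|__|g')"
    [ -f "$deps_file" ] || { echo "no Deps file for $target" >&2; return 1; }
    cat "$deps_file"
}

# Demonstration on a throwaway directory tree (pkgver names are made up).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Deps"
printf '%s\n' networkx-2.4 numpy-1.19.0 \
    > "$demo_root/Deps/__usr__local__etc__pdv__scanpy"
list_target_deps "$demo_root" /usr/local/etc/pdv/scanpy
```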
Testing:
echo $PWD
#should appear in first line of report output where <PWD_VALUE>
#is shown below.
ln -s example_wrapper.h wrapper.h
gcc -std=c17 -Wall -Wextra -I. -o report._wrapped report.c
gcc -std=c17 -Wall -Wextra -pedantic \
-D_XOPEN_SOURCE=700 -I. -I/usr/include/SDL2 \
-o report wrapper.c -lSDL2
./report
#should emit the following:
0 <PWD_VALUE>/report._wrapped
0 PYTHONPATH=/lib/python3.6/site-packages
1 ALT_PYTHONPATH=/home/common/lib/python3.6/site-packages
2 my_database=/share/my_database
3 python3=/usr/bin/python3
4 python=/usr/bin/python3
#REMOVE THE LINK!!!!
/bin/rm wrapper.h
Installation:
0. Note, all locations are suggestions only. If Lmod is
used let the path to its "software" directory be LMOD_SOFT.
The directories for PDV_TOOLS, PDV_ROOT, drm_tools, and
all other target locations should be subdirectories of LMOD_SOFT.
1. Install drm_tools from sourceforge (for "extract").
If Lmod is used make a module for it.
2. Install gcc if not already present. If gcc is not part
of the OS and Lmod is used make a module for it.
3. Install libSDL2 if not already present. (Typically it
is available as a package in the OS.) Adjust SDLINC and
SDLLIB in pdvctrl to match your machine.
4. Unpack the python_devirtualizer distribution in a
directory PDV_TOOLS.
Decide where to put PDV_ROOT (and PDV_ROOT2 if python2
is also to be used.)
If Lmod is used the module for it should have dependencies
for drm_tools and gcc (if those are also
modules.) (Once it is installed johnnydep will also be
a dependency.) The PDV module should also define PDV_ROOT.
To run the python devirtualizer with Lmod do:
% module load python_devirtualizer
Otherwise, do:
% export PATH=$PATH:$PDV_TOOLS
% export PDV_ROOT=/usr/local/lib/python3.6
% #optionally
% #export PDV_ROOT2=/usr/local/lib/python2.7
5. Install johnnydep and place it on PATH. The
location shown is just a suggestion.
% pdvctrl install johnnydep
% pdvctrl migrate johnnydep /usr/local/etc/pdv/johnnydep
If Lmod is used, add johnnydep as a dependency for
the python_devirtualizer module, i.e., in its Lua
file:
if not ( isloaded("johnnydep") ) then
load("johnnydep")
end
Installing other packages (examples):
1. Set up PDV_ROOT and the final value of PATH as above,
either with:
% module load python_devirtualizer
or with
% export PATH=$PATH:$PDV_TOOLS
% export PDV_ROOT=/usr/local/lib/python3.6
2A. Install all dependencies and the package itself
into a virtualenv, then migrate it to its final
location.
% pdvctrl install scanpy
% pdvctrl migrate scanpy /usr/local/etc/pdv/scanpy
2B. Look up all dependencies and make links to existing
ones in the virtualenv. Then install everything else,
then migrate.
% pdvctrl preinstall scanpy
% pdvctrl install scanpy
% pdvctrl migrate scanpy /usr/local/etc/pdv/scanpy
2C. Partial manual install. Use this for a package which
cannot be installed using pip.
% package=busco
% pversion=4.1.0
% pdvctrl install busco #just creates a venv for it
% TOPDIR=/usr/common/modules/el8/x86_64/software/${package}/${pversion}-CentOS-vanilla
% TMPTOP=$PDV_ROOT/Envs/${package}
% module load python_devirtualizer
% cd $TMPTOP
% source bin/activate
#requires biopython and sepp, but does not install them itself
% python -m pip install biopython sepp
% wget https://gitlab.com/ezlab/busco/-/archive/${pversion}/${package}-${pversion}.tar.gz
% gunzip -c ${package}-${pversion}.tar.gz | tar -xf -
% /bin/rm ${package}-${pversion}.tar.gz
% cd ${package}-${pversion}
% #many dependencies, as Lmod modules
% module load prodigal hmmer sepp R augustus
% python3 setup.py install --install-scripts=$TMPTOP/bin --prefix $TMPTOP 2>&1 \
    | tee install_2020_07_07.log
% cd ..
% bin/busco --help #runs (in virtualenv)
% deactivate
% cd /tmp
% pdvctrl migrate busco $TOPDIR
#requires a config file be created
% cd $TOPDIR/${package}-${pversion}
% scripts/busco_configurator.py config/config.ini ../config.ini
#Not shown: data file downloads and other configuration
#needed for this package.
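
Option 2B above (preinstall, install, migrate) can be chained so
that a failing step stops the sequence. The sketch below assumes
that pdvctrl exits with a nonzero status on failure, which is not
stated in this document. The function name install_2b and the
PDVCTRL override are hypothetical; PDVCTRL exists only so the
sequence can be shown as a dry run.

```shell
#!/bin/sh
# Sketch only: option 2B as one function that stops at the first failing step.
# PDVCTRL is a hypothetical override for dry runs; pdvctrl does not read it.
install_2b() {
    pkg=$1       # package name, e.g. scanpy
    target=$2    # final target directory
    ctl=${PDVCTRL:-pdvctrl}
    "$ctl" preinstall "$pkg" &&
    "$ctl" install "$pkg" &&
    "$ctl" migrate "$pkg" "$target"
}

# Dry run in a subshell: "echo" stands in for pdvctrl and prints each step.
( PDVCTRL=echo; install_2b scanpy /usr/local/etc/pdv/scanpy )
```

For a real run, call install_2b without setting PDVCTRL, after
PDV_ROOT and PATH have been set up as in step 1.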
Files in this package:
example_wrapper.h An example wrapper.h file.
pdvctrl Bash script which implements most of the
python devirtualizer functions. PDV_ROOT
must be defined and there must be a python3
on PATH.
pdv2ctrl Wrapper for pdvctrl which instructs it to use
python2 instead of python3.
PDV_ROOT2 must be defined and there must be
a python2 on PATH.
README.TXT This file
report._wrapped Report argv and environment values, small
test program.
report.c Source for report._wrapped
wrapper.c Source for program which will call
the python scripts in "bin". If the
wrapped script is "example" that is
renamed to "example._wrapped". This
program is then compiled to produce
a binary file "example". During
compilation it obtains some values from
a specially generated wrapper.h file
to pass on to the script, including
PYTHONPATH. For more details read the
top of wrapper.c and the bottom of function
do_migrate in pdvctrl.
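
The rename-and-wrap step described for wrapper.c can be
illustrated with a shell stand-in. The real wrapper is a compiled
C program that takes its values from wrapper.h and, per the 0.0.3
change note, does not use argv[0]. The sketch below instead
generates a small shell launcher and relies on "$0", purely to
show the idea. The name wrap_script and all paths are
hypothetical.

```shell
#!/bin/sh
# Sketch only: a shell stand-in for the wrapping step. The real wrapper is
# a compiled C program; wrap_script is a hypothetical name.
wrap_script() {
    script=$1        # path of the script in "bin", e.g. .../bin/example
    pythonpath=$2    # value the launcher should export as PYTHONPATH
    name=$(basename "$script")
    mv "$script" "$script._wrapped"
    # Unquoted EOF: $pythonpath and $name expand now, \$ is kept for later.
    cat > "$script" <<EOF
#!/bin/sh
PYTHONPATH='$pythonpath'
export PYTHONPATH
exec "\$(dirname "\$0")/$name._wrapped" "\$@"
EOF
    chmod +x "$script"
}

# Demonstration with a stand-in "script" that reports what it was given.
demo=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo "$PYTHONPATH $1"' > "$demo/example"
chmod +x "$demo/example"
wrap_script "$demo/example" /opt/demo/site-packages
"$demo/example" hello    # prints: /opt/demo/site-packages hello
```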
File versions:
pdv2ctrl 0.0.1 28-JUL-2020
pdvctrl 0.0.10 17-JUL-2020
README.TXT x.x.x 17-JUL-2020
report._wrapped 0.0.1 30-JUN-2020
report.c 0.0.1 30-JUN-2020
wrapper.c 0.0.2 07-JUL-2020
pdvctrl functions:
countdepends [package|file]
Count all target directories which depend on package.
Must include version number, as in: pkg-1.2.3
Case insensitive, but otherwise only exact matches are counted.
help
Print help.
findorphans
Identify orphaned pkgver, Deps, and Maps files.
install package
Create $PDV_ROOT/Envs/package if it does not
already exist.
Install package into $PDV_ROOT/Envs/package.
Install any remaining dependencies left over
from a previous preinstall.
If "package" cannot be found by "pip" the
resulting virtualenv contains only pip, setuptools,
and wheel.
install package "pkg1 pkg2 pkg3"
Install the space delimited set of packages into
existing virtualenv package.
list
List all installed package versions.
list package
List all installed package versions starting
with "package".
mkpath "pkg1 pkg2 pkg3"
Emit a PYTHONPATH with one entry for each space
delimited package.
This capability is intended primarily for testing.
Packages which are to be installed and migrated
should instead be built with these packages.
mkvenv package
Create a venv for package without using johnnydep
to determine and load prereqs.
The resulting virtualenv contains only pip, setuptools,
and wheel.
Used for manual installs within the virtualenv.
move src_dir dst_dir
Move the ROOT level of a normal environment from one
location to another.
Updates $PDV_ROOT/Deps.
migrate package target
Convert a virtualenv to a normal environment, creating
a new target directory.
Migrate dependencies from package to $PDV_ROOT/site-pkgvers
replacing them with links.
List the links in $PDV_ROOT/Maps for each pkgver.
List the pkgvers needed by target in $PDV_ROOT/Deps/target
overlay package target
Like migrate, but overlay the files onto an existing target directory.
Only a single overlay may be applied to a target, usually
for the purpose of supplying a site-packages directory.
Migrate dependencies from package to $PDV_ROOT/site-pkgvers
replacing them with links.
List the links in $PDV_ROOT/Maps for each pkgver.
List the pkgvers needed by target in $PDV_ROOT/Deps/target
preinstall package
Create $PDV_ROOT/Envs/package.
Link existing dependencies for package already in
$PDV_ROOT/site-pkgvers into $PDV_ROOT/Envs/package
using the info in $PDV_ROOT/Maps.
If package has previously been installed and
migrated this should fully install it from the
saved pieces.
remove target
Remove a normal environment.
Updates $PDV_ROOT/Deps.
Currently does NOT remove unused pkgvers in site-pkgvers
or Maps.
reshebang target
Replace the shebang (a first line starting with "#!")
of all python scripts in directory target.
The new shebang uses the python3 defined for $PDV_ROOT
unless the existing shebang contains an explicit "python2",
in which case the system python2 is used.
If target ends with "..." then remove those characters
to define the real target directory, and also scan and
process it and all of its subdirectories, not following
any symbolic links.
reshebang target "shebang_contents"
Like the preceding but append the supplied "shebang_contents"
string to "#!" to form the new shebang. If it contains spaces
it must be quoted as shown. It must start with an
executable file or it will be rejected.
rmoverlay target
Like remove, but only remove the overlay from target.
rmvenv package
Remove a venv. Normally used when an installation
has failed and a clean start is needed.
version
Print version of pdvctrl
whatdepends [package|file]
List all target directories which depend on package.
Must include version number, as in: pkg-1.2.3
Case insensitive, but otherwise only exact matches are listed.
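
The matching rule given for whatdepends and countdepends (full
pkgver including version, any letter case, no partial matches)
can be sketched with grep over the Deps files. This is not how
pdvctrl is known to implement it; what_depends and count_depends
are hypothetical names, and the sketch prints the Deps file
names rather than converting them back to target paths.

```shell
#!/bin/sh
# Sketch only: hypothetical stand-ins for whatdepends and countdepends.
what_depends() {
    root=$1      # value of PDV_ROOT
    pkgver=$2    # e.g. pkg-1.2.3
    # -l print names of matching files, -i ignore case,
    # -x match whole lines only, -F treat the pattern as literal text
    grep -l -i -x -F -e "$pkgver" "$root"/Deps/* 2>/dev/null
}

count_depends() {
    what_depends "$1" "$2" | wc -l | tr -d ' '
}

# Demonstration on a throwaway tree (target and pkgver names are made up).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Deps"
printf '%s\n' pyFoo-1.2.3 bar-2.0 > "$demo_root/Deps/__opt__a"
printf '%s\n' pyFoo-1.2.30 > "$demo_root/Deps/__opt__b"
count_depends "$demo_root" pyfoo-1.2.3    # prints 1
```

The whole-line match is what keeps pyFoo-1.2.30 from being
counted when pyFoo-1.2.3 is requested.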
Project status:
This is early days, mostly a proof of concept. Ideally python would
improve its import methodology so that multiple pkgvers could reside
in site-packages and could load without conflict. To do that the use
of nonversioned and alternate names placed in the site-packages directory
would have to be eliminated, no more:
apkg
apkg-1.2.3.dist-info
because that cannot coexist with
apkg
apkg-1.2.4.dist-info
The directory structure used in site-pkgvers here avoids
this issue, by placing all of pkgver's files in a "pkgver"
directory.
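
The coexistence described above can be tried on a throwaway
directory tree. The sketch below builds two versions of a made-up
package "apkg" side by side in a site-pkgvers directory and links
each one into a different target. The target names and the
flattened site-packages location are simplifications for
illustration, not the layout pdvctrl produces.

```shell
#!/bin/sh
# Sketch only: two versions of a made-up package coexisting in site-pkgvers.
demo=$(mktemp -d)

# Each pkgver gets its own directory, so "apkg" never collides with itself.
for v in 1.2.3 1.2.4; do
    mkdir -p "$demo/site-pkgvers/apkg-$v/apkg" \
             "$demo/site-pkgvers/apkg-$v/apkg-$v.dist-info"
done

# Each target links only the version it needs into its own site-packages.
for t in targetA:1.2.3 targetB:1.2.4; do
    name=${t%%:*}
    v=${t##*:}
    sp="$demo/$name/site-packages"
    mkdir -p "$sp"
    ln -s "$demo/site-pkgvers/apkg-$v/apkg" "$sp/apkg"
    ln -s "$demo/site-pkgvers/apkg-$v/apkg-$v.dist-info" "$sp/apkg-$v.dist-info"
done

ls -l "$demo/targetA/site-packages" "$demo/targetB/site-packages"
```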
Similarly, many packages use the common __pycache__ directory, which would
again result in conflicts between the compiled forms of
different versions of the same package, and possibly in name collisions
between packages. This is why __pycache__ is special
cased in the present code, leaving the single directory
in each target's site-packages directory. Placing a separate
__pycache__ under each pkgver might work but Python would then
have to know to scan through all of those directories.
Targets should not be updated by setting PYTHONPATH and using
pip (for instance). It has not been tried but it will probably corrupt
the data structure. Partly to avoid that, "pip" is removed from
all migrated targets.
The mechanism used by wrapper.c to determine the location of
its binary is as portable as the SDL2 library, but the current
wrapper.c assumes linux (POSIX) path syntax, which will not
be correct on Windows except in posixy subsystems.
This code has only been tested on CentOS 8.
Changes
0.0.8 2020/07/28
added pdv2ctrl, support for python2.
0.0.7 2020/07/17
pdvctrl: added reshebang
0.0.6 2020/07/17
pdvctrl: added overlay rmoverlay
0.0.5 2020/07/08
pdvctrl: added mkvenv rmvenv mkpath
pdvctrl: added install package "pkg1 pkg2 pkg3" (into virtualenv)
pdvctrl: slight code rearrangement
0.0.4 2020/07/07
pdvctrl: force migrated bin scripts to have a final EOL character.
pdvctrl: handle johnnydep errors
Update documentation
0.0.3 2020/07/06
wrapper.c: removed all references to argv[0] in wrapper.c
pdvctrl: check that parent of TARGET exists.
pdvctrl: update pip, setuptools, and install wheel on every
virtualenv.
0.0.1 2020/07/02
Initial release
David Mathog