Python packaging for developers in a hurry
08 January 2023
Python packaging is hard (I will spare you that xkcd). But it can be manageable if you’ve hit your head on the wall enough times to know what to do and what to avoid.
“Wait a minute, is this guy about to tell me my problems are imaginary and point me to yet another magical tool that does it all?” you ask yourself. No, of course not! That would be psychological abuse.
Sadly, this is a long mountainous road that you’ll mostly have to walk by yourself. I hope that my experience gives you a head start in figuring out what is best for you.
Even though the title says “Python packaging”, I will cover four main topics:
- Managing Python environments
- Managing dependencies
- Structuring a Python project
- Packaging and publishing
If you already have a well-rounded workflow for any of those, feel free to skip some sections (you don’t have a lot of time, after all). I know the entire post is long, but I hope that you can just navigate the sections, find what you need, and go back to work.
If you’re reaaaally in a hurry and just want the code, here’s the repo: giovannipcarvalho/sample-python-project
“Is this for me?”
If you’re overwhelmed by the amount of information on these subjects, this article strives to be a comprehensive summary of most things you’ll need to consider. It also demonstrates a tried and tested workflow that may work well for you.
However, if you have the time and patience, then reading the official resources from PyPA (Python Packaging Authority), the relevant PEPs (Python Enhancement Proposals), and at least one of the many guides on managing Python versions will be a more thorough – albeit longer – approach.
Managing Python environments
In this section I assume we agree that your system’s Python is for your system, not for you. If you rely on it for your project you better be doing something that is fully-compatible with the way your system works.
“I need no dependencies, just a Python interpreter”
Then fine, you can use your system’s pre-installed Python.
If you need different (perhaps multiple) Python versions than what comes with your system, or if you need to install new dependencies (which might interfere with your own system’s dependencies and break it), you are better off isolating them with virtual environments.
“My system’s version is okay, but I need to add some dependencies”
Use Python’s builtin venv to create virtual environments. On some distros it might not come pre-installed, so you’ll need to install python3-venv to get it.
```shell
python -m venv .venv   # create the environment
. .venv/bin/activate   # activate it
deactivate             # deactivate when you're done
```
On Windows, I recommend you use git-bash (shipped with git-for-windows) if WSL (Windows Subsystem for Linux) is too slow on your machine.
```shell
# on git-bash for windows
python -m venv .venv       # same command
. .venv/Scripts/activate   # slightly different
deactivate                 # deactivate when you're done
```
“venv is too slow or too easy to break”
venv’s greatest advantage is that you don’t need any extra dependencies. If you have problems with it, use virtualenv instead. More often than not you can just install it from your distro’s repository and be done with it.
Once installed, the workflow is pretty much the same:
```shell
virtualenv .venv       # slightly different than venv
. .venv/bin/activate   # same activation command as venv
deactivate             # deactivate when you're done
```
“I’m on Windows. HELP!”, or
“My system does not have the Python version I want – or any Python version at all”, or
“I need compiled dependencies (or non-Python dependencies) which are hard to install”, or
“I want a consistent workflow across Linux, Windows and macOS”
Short answer: conda (actually mamba, using the mambaforge distribution)
If you’re disappointed with my answer, I am sorry, but our paths have diverged and I am no longer a better guide for you than yourself. If you’re not on Windows, or don’t need compiled and other non-Python dependencies, you might still want to check it out anyway.
Anaconda is an entire Python distribution that comes with MANY packages, including the conda package and virtual environment manager. I am not recommending Anaconda. Use conda – or rather mamba, the compatible C++ reimplementation that is faster – and avoid Anaconda entirely if you can.
The mambaforge distribution is a lightweight Python distribution that comes with conda-forge pre-configured as the default repository. Pick the installer from here according to your platform and architecture, and follow the instructions for unix-like systems or Windows.
virtualenv does support multiple Python versions, but you need to get them installed first: it merely finds and uses them to create new environments, and won’t install them for you.
Without spending too much more time justifying my choice:
- mamba (the package manager) officially supports Linux, Windows and macOS (all of which I need to use)
- the conda-forge repository has multiple Python versions, from Python 2.x (God help you) to the more recent 3.x versions
- it’s the only sane way to install MKL/BLAS-accelerated NumPy and CUDA-accelerated PyTorch/Tensorflow that I know of (Gohlke’s builds are not a thing anymore)
- it allows me to rootlessly install non-Python dependencies, such as gcc and other languages’ toolsets
Here’s how to use it:
```shell
mamba create -n myproj   # create environment
mamba activate myproj    # activate it
# get to work
mamba deactivate         # deactivate when you're done
```
An added bonus is that you can create an environment.yml at the root of your project and simply run mamba env create or mamba env update.
```yaml
# file: environment.yml
name: myproj  # set this to the actual environment name you want to use
channels:
  - conda-forge
dependencies:
  - python=3.10
```
I recommend that you install from conda-forge only the packages that you cannot obtain from PyPI (the official Python package index). That is because mamba is usually slower than pip in resolving dependencies, and because pip has better ways to separately declare direct and transitive dependencies (conda-lock is not as good, IMO). We’ll get to why this is important in the next section.
If you are not going to need to upgrade dependencies (short-term or a one-off project), it’s better to have your environment.yml include all your dependencies so that you can reproduce it later, if needed (use conda export for that).
If you are going to maintain the project for long enough and expect to upgrade some dependencies, make your environment.yml contain your non-pip dependencies (such as Python itself) and manage your pip-installable dependencies with a better tool.
note: I think venv and virtualenv still have their place in this setup: for short-lived or disposable environments where you just want to make some small tests (e.g. test if a newer version of a library does what you want, without having to upgrade or mess with your current mamba environment for the project), since they will usually be faster than mamba.
Troubleshooting your Python environment
Nine out of ten times, you’re using the wrong python, the wrong pip, or the wrong environment altogether.
Take note of the output of the following commands:
```shell
which python
which pip
which whichever-other-command-youre-running  # e.g. pytest or jupyter
```
Compare their base paths and identify which virtual environment each of them is coming from. Are they different? They shouldn’t be.
Are they the same? Check your $PATH environment variable for anything Python related.
```shell
# use git-bash if on Windows
echo $PATH | tr ':' '\n'
```
A very common culprit is Windows 10+’s default Python (somewhere under %APPDATA%/Microsoft/WindowsApps) taking priority over your desired virtual environment. Get rid of it.
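When which isn’t conclusive, you can also ask Python itself where it lives and whether a virtual environment is active. A quick diagnostic sketch:

```python
import sys

# the interpreter binary actually running this code
print(sys.executable)

# sys.base_prefix differs from sys.prefix when a virtual environment is active
in_venv = sys.prefix != sys.base_prefix
print("inside a virtual environment:", in_venv)
```

If sys.executable is not inside the environment you think you activated, that mismatch is your bug.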
Managing dependencies
I assume you value having a consistent way to reproduce your environment. In essence, you want to:
- Not be susceptible to hard-to-track bugs that only happen with some mysterious combination of dependencies
- i.e. ensure you’re running your application with dependencies you have tested against
- Track your primary dependencies and transitive dependencies (the dependencies of your dependencies)
- Easily and selectively upgrade dependencies, rather than always upgrading all at once
Poetry actually does it all and a bit more (including automatically creating virtual environments for you), but then again, you miss out on packages that are only (or more easily) installable from conda repositories, and in my experience it is very fragile on Windows.
important: If you’re going with Poetry, remember it should live outside your environment – i.e. do not include it in your environment.yml, if you plan to use it along with mamba.
I won’t go into detail about the other disadvantages of Poetry, but I will just emphasize that it has way too many dependencies (it currently pulls in a total of 44 dependencies, according to my test in a clean environment).
“What should I use?”
If you’re developing a tool or application, use pip-tools.
If you’re developing a library, use nothing. Let your users decide, and give them as much flexibility as possible to maximize compatibility (don’t pin, unless to exclude some known-to-fail version ranges). In other words, only declare your direct dependencies under install_requires in your setup.cfg and let pip do the rest.
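For a library, that declaration might look like the sketch below (package names and version numbers are purely illustrative): broad ranges, with only known-bad versions excluded.

```ini
# file: setup.cfg (library) -- loose constraints; exclude only known-to-fail versions
[options]
install_requires =
    requests>=2.20,!=2.31.0
    numpy>=1.20
```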
pip-tools is actually a combination of two tools: pip-compile and pip-sync. It is very lightweight and only pulls in 6 dependencies (tested in a clean environment). It also produces reasonably human-readable requirements.txt files, with pinned dependencies followed by a comment showing their parent package:
```
asgiref==3.2.3
    # via django
django==3.0.3
    # via -r requirements.in
```
Which makes identifying why some dependency was installed very easy compared to poetry’s lock file.
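Because the format is so regular, you can even recover the “why is this here?” mapping mechanically. A small sketch (the lock-file text is inlined for illustration) that pairs each pin with its “via” comment:

```python
import re

# a fragment in pip-compile's output format, inlined for illustration
lock_text = """\
asgiref==3.2.3
    # via django
django==3.0.3
    # via -r requirements.in
"""

def parents(text):
    # map each pinned package to the requirement that pulled it in
    result = {}
    current = None
    for line in text.splitlines():
        pin = re.match(r"^(\S+)==", line)
        via = re.match(r"^\s+# via (.+)$", line)
        if pin:
            current = pin.group(1)
        elif via and current:
            result[current] = via.group(1)
    return result

print(parents(lock_text))  # {'asgiref': 'django', 'django': '-r requirements.in'}
```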
Here are the steps:
```shell
echo einops >> requirements.in   # declares a new dependency

# generate or update requirements.txt with pinned versions
# by default, it pins them to the latest available and compatible versions
pip-compile

# at a later point in time, when you need to selectively upgrade a package
pip-compile --upgrade-package einops  # upgrades it to the latest available and compatible version

# sync environment with requirements.txt
pip-sync requirements.txt
```
- Keep your non-PyPI dependencies in environment.yml
- Keep your PyPI dependencies in a requirements.in file
- Use pip-tools to lock your dependencies and sync your environment
- Check all of them into your version control system
I have only scraped the surface of what pip-tools is capable of, and I highly recommend you read their documentation.
Structuring a Python project
A well-structured project is easier to maintain, package and publish/deploy. There are many resources on this subject, but here’s a simple layout that works well for me:
```
$ tree -F --dirsfirst
./
├── src/
│   └── __init__.py
├── tests/
│   └── __init__.py
├── environment.yml
├── README.md
├── setup.cfg
└── setup.py
```
I write all my imports as absolute imports:
```python
from src.subpackage.module import something
```
unless I’m exposing something in an __init__.py for nicer imports:
```python
# file: src/subpkg/__init__.py
from ._internal_subpkg_module import something_else

__all__ = ["something_else"]

# so that I can:
#   from src.subpkg import something_else
# instead of:
#   from src.subpkg._internal_subpkg_module import something_else
```
Note that the src folder is the actual package. Rename it to something meaningful if you’re developing a library, because that’s what your users are going to write in their imports, no matter what you set the name attribute to in pyproject.toml (actually, you can change the import name – but I find it more error-prone than just using a proper folder name).
I usually don’t bother doing it for my applications, but do it for my libraries (otherwise all my libraries would be imported with a conflicting src name). That’s because I’d rather write import src.whatever than import some_longer_name.whatever – especially since some_longer_name varies per-project.
You can use a meaningful name for both applications and libraries (and probably should, as it’s more easily identifiable by setuptools’ auto-discovery – more on this later).
There are also some recommendations on following a “src layout”, which is basically having a meaningful package name and stuffing it inside a folder named src. I also don’t bother, but if you need a quick overview, there’s a very good and short video by Anthony Sottile on the subject, so that at least you’re making an informed decision about it.
I also like tests as a separate package, which makes it easier to not accidentally include them when packaging a final source or wheel distribution.
Moving along. We’ve already covered what the contents of environment.yml should look like. But what about setup.cfg, and why both of them? And why not pyproject.toml?
To get the questions out of the way:
- First of all, pip-tools supports all of them; you’re free to choose whichever you prefer.
- I like setuptools and it works fine for me. It does not support pyproject.toml yet and I don’t want to use poetry or whatever else supports it (no real reason, just preference), so I use the supported setup.cfg:
  - setup.cfg is an ini file with plain data, which is easier to parse and manipulate; setup.py is code, and may contain complex logic that is not easy to update programmatically
  - That said, you’ll still need a dummy setup.py file, because setup.cfg alone is meaningless, unless you’re already using a PEP 517-style build with pyproject.toml
- If you’re doing anything more complicated (such as compiling non-Python dependencies of your own as part of your package), then you’ll need to stick with just setup.py
```ini
# file: setup.cfg
[metadata]
name = mypkg
version = 0.1.0

[options]
packages = find:
install_requires =
    flask

[options.extras_require]
dev =
    pytest
    coverage

[options.packages.find]
exclude =
    tests*
```
- Declare your dependencies under install_requires
- Declare development-only dependencies under [options.extras_require]
  - You may create additional dependency groups. Just dev is enough for me.
- Let pip-tools do the pinning, unless you want to restrict some known-to-fail versions
You can use find_namespace: instead of find: if you don’t want to add multiple __init__.py files to explicitly turn folders into Python packages. I prefer to make packages explicit.
You may also entirely omit the packages attribute and the [options.packages.find] section if you use a properly-named flat-layout package (not named src) or a src-layout (with properly named packages inside an src folder), and setuptools’ auto-discovery feature will handle those for you. Beware that this feature is in beta (at the time of writing) and may be subject to change in the future.
And the dummy setup.py:
```python
# file: setup.py
from setuptools import setup

setup()
```
Now that you’re up to speed with this declarative way of defining your package, let’s see how to use it in combination with pip-tools:
```shell
pip-compile setup.cfg --resolver backtracking -o requirements.txt
pip-compile setup.cfg --resolver backtracking -o requirements-dev.txt --extra dev
```
This generates two lock-files: requirements.txt and requirements-dev.txt (both of which should be checked into version control).
“Wait! What are those new options?”
I just want to be explicit about where to get the dependency list from (setup.cfg), which dependency group to consider (base dependencies from install_requires, or the dev group) and where to save it (the -o option).
The backtracking resolver is slower, but will resolve in some situations where legacy (the current default) will not. It’s still not the default in pip-tools at the time of writing this, but will be eventually.
“But how do I single-source my package version?”
Good thing you asked. I have dabbled with setuptools_scm to extract the package version directly from source control (usually git tags), but eventually settled on just having a single location and source of truth for the package version: setup.cfg. This minimizes the chances of forgetting to update some locations, and also makes it easy to do so automatically with a tool such as bump2version or bumpver (I just do it manually – it’s a single location, after all – but you’re free to try them out and see if they work for you).
Just remember to:
- Only set the version number in setup.cfg
- Use importlib.metadata to fetch the version from the package name (Python 3.8+ only)
```python
# file: mypkg/__init__.py
import importlib.metadata

__version__ = importlib.metadata.version("mypkg")
# "mypkg" above must be the same as metadata.name in setup.cfg
```
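One wrinkle: importlib.metadata looks up the installed distribution, so it raises PackageNotFoundError if the package was never pip-installed (e.g. when running straight from a source checkout). A hedged sketch of a fallback (the dev default value is arbitrary):

```python
import importlib.metadata

def get_version(dist_name: str) -> str:
    # read the version of an installed distribution, with a fallback
    # for source checkouts that were never pip-installed
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return "0.0.0+dev"

print(get_version("surely-not-installed-xyz"))  # falls back to the dev marker
```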
Packaging and Publishing
This process is different for libraries and applications. Whereas with libraries you want to package source and wheel distributions for publishing in a package index (a public one such as PyPI, or a private one under your control), with applications it depends on what kind of application and where it’s going to run.
My most common use case is packaging stateless Python applications as Docker images to be run in a remote host, exposing some functionality via an HTTP API. Docker images themselves are also usually published to a container registry, but that’s well-covered by better resources and outside the scope of this post.
If you need to package a desktop or mobile Python application, the remainder of this article won’t be of much use to you.
If you’re using version control (you should), remember to tag your versions. For example, in git you can:
```shell
# create an annotated git tag for the current version, prefixed by `v`
git tag -a v`python setup.py --version`
```
This command will open $EDITOR, where you should include a brief title and description of your release. Upon saving, the git tag will be created.
Packaging & publishing Python Libraries
Fortunately, you don’t need a lot here. All you need is build and twine – respectively, to build your source and wheel distributions, and to publish your package. You don’t really need build, but it helps you get around many common mistakes.
```shell
pip install --upgrade build twine
```
Now is also probably a good time to add both build and twine to your extras_require under the dev dependency group. Then it’s just:
```shell
# build
python -m build
# generates:
#   mypkg-version.tar.gz (source distribution)
#   mypkg-version-py3-none-any.whl (wheel distribution)

# publish
python -m twine upload dist/mypkg-version*  # upload both sdist and wheel of `version`
# use pypi user:pass or __token__:token
```
The published name on PyPI will be a normalized version of the name attribute in your setup.cfg.
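That normalization is specified by PEP 503: names compare case-insensitively, with runs of -, _ and . collapsed into a single dash. A minimal sketch:

```python
import re

def normalize(name: str) -> str:
    # PEP 503 name normalization: collapse runs of -, _, . into "-", then case-fold
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize("My_Package.Name"))  # my-package-name
```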
“But wait! I want reproducible builds”
You can get that for wheels by setting SOURCE_DATE_EPOCH:
```shell
SOURCE_DATE_EPOCH=0 python -m build
# other sources of non-determinism might affect your build's reproducibility
```
You can check that the md5sum of the generated wheel does not change.
I am not sure if it’s possible to get reproducible source distributions, but I haven’t looked too hard.
Packaging & publishing Python Applications (Docker)
Here I aim for a reasonably lean image (~150MB for a simple Flask app; not great, not too bad either) that is fast to build and cache-friendly. We want fast iteration times, and not having to rebuild the entire virtual environment layer every time a line of code is changed (even if no dependencies were added or removed) is crucial.
This is achieved by separating the virtual environment creation and copying the application code. Shipping the application source and building it in-place (with
pip install --use-pep517) is less ideal than building the wheel in a build stage and then copying it over to the final runtime stage. But in my use case it’s often faster and simpler to do it this way, and for me there are no major drawbacks.
Using a virtual environment inside a Docker container is perhaps over-the-top, but it provides extra isolation from the base image’s own Python dependencies, and the entire /venv folder can be copied between stages if you need.
Remember that if your project contains dependencies from conda’s repositories, you’ll need to create a conda environment instead of a regular Python environment using venv or virtualenv. Using the continuumio/miniconda3 base image will get you rolling much faster than setting it all up by yourself. You might still want to install mamba to improve build times if you have many conda-only dependencies.
```dockerfile
# file: Dockerfile

# --- base image -----------------------------------------------------------------------
FROM python:3.10-slim-bullseye as base

ENV \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONFAULTHANDLER=1 \
    PYTHONHASHSEED=random \
    PIP_DEFAULT_TIMEOUT=100 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1

# venv
RUN python -m venv /venv
ENV PATH=/venv/bin:$PATH

# base dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
RUN pip install build

WORKDIR /src

# --- dev stage ------------------------------------------------------------------------
FROM base as dev

# dev-only dependencies
COPY requirements-dev.txt .
RUN pip install -r requirements-dev.txt

COPY myapp /src/myapp
COPY setup.* /src/
RUN pip install . --use-pep517 --no-deps

# --- runtime stage --------------------------------------------------------------------
FROM base as runtime

COPY myapp /src/myapp
COPY setup.* /src/
RUN pip install . --use-pep517 --no-deps

CMD ["python", "-m", "myapp"]
```
Which you can test with:
```shell
docker build . -t $(basename `pwd`)
docker run --rm -it $(basename `pwd`)
```
Remember to update the Dockerfile to match your project:
- Use the correct Python version for your project as a base image
- Update the copied paths to reflect your actual package name (rather than myapp)
Also remember to create a .dockerignore to restrict the directories considered in the build context, which should also improve build times:
```
# file: .dockerignore
# vcs
.git/

# python
.venv/
*.pyc
*.egg-info/
.coverage
.mypy_cache/
.pytest_cache/

# other
notebooks/
tmp/
.env
```
And that’s a wrap. You’ve climbed halfway through the mountain of Python packaging, and learned to create and manage your environments, properly manage your dependencies, and structure your project in a way that facilitates development, packaging and publishing – either as a library or containerized application.
I originally intended this post to be a quick read, but it turned out at least twice as long as I had expected. That might be a testament to how complex the current state of Python packaging is: certainly manageable, but it took a lot of digging over the years. At the very least, even though this article is not really a quick read, I hope it serves as a helpful resource containing one single, consistent workflow that covers most of what you’ll need.
Thank you for reading. If something I wrote is inaccurate or you have a better alternative, please do let me know!