Sin number 6: Packaging not done right
Packaging Python projects properly can be a true pain in the neck. However, learning how to properly set up a package is extremely valuable, because adherence to standards means more people can use your package efficiently.
👉 The biggest sin of all things package-related is storing Python projects as simple collections of directories.
Whence the temptation?
Python is a simple scripting language, so the natural tendency newcomers have is to simply create folders with a bunch of scripts and then run those scripts one by one. They tend to skip thinking about how `pip install` is able to seemingly magically conjure foreign code out of nothingness, and when they start reading up on it, they claim it’s too much work to set things up properly.
I used to be one of those people when I started out with Python. Today, I tend to freak out massively if I see Python projects that are not set up as a formal package and have some strange custom installation patterns.
Why is this a sin?
This is predominantly a sin of hubris and laziness combined; we all succumb to these sometimes. Hubris, because people often think they can come up with simpler and better packaging solutions than the entire community of Python developers combined. Laziness, because instead of reading up on the standards, they often take the path of least resistance, which frequently ends in extremely poor scalability.
Not to mention the confusion other Python developers familiar with the existing toolchains will experience when looking at a codebase with a bespoke packaging solution.
How to recognize a sinner?
The most common symptoms are easy to spot:
- No `setup.py` at the Python project root
- Inability to `pip install` the package
- Strange instructions on how to set up and use the project (which often boil down to "run `main.py`" or something akin)
- Profuse usage of quasi-anti-patterns like `sys.path.append`
How to repent?
Your only way to redemption is to structure your Python package properly, i.e. you need to learn a bit about what packages are and how to make them. It is actually quite simple, and I’ll attempt to explain it in an easy-to-understand form.
A Python package is a collection of modules defined in `*.py` files, structured in such a way that the `pip` package manager can install the collection and make it globally usable in any other script that runs under a particular Python interpreter.
👀 Modules are Python files and directories of Python files that contain a special `__init__.py` file, which makes it possible to `import` them. Packages are just collections of modules.
There are a few important peculiarities about Python packages:
- A Python package is not equivalent to a single Git repo -> a Git repo can contain multiple Python packages at once
- A Python package must contain an `__init__.py` file in each directory that is to be treated as a package
- A Python package must have a `setup.py` file at the root of the repository; it might contain an explicit map of Python package directory names, or it might use the `find_packages()` function, which automatically looks for packages in subdirectories
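For reference, those two options from the last bullet look like this in declarative form in a `setup.cfg` (the article's recommended companion to a minimal `setup.py`; the package names `x`, `x.y`, `x.z` are hypothetical):

```ini
[options]
# Option 1: list the package directories explicitly
# packages =
#     x
#     x.y
#     x.z

# Option 2: discover them automatically (equivalent to find_packages())
packages = find:
```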
Packages are described in detail in the Python documentation.
Package structure
If your package has a structure:

```
X
|
|-> Y
|-> Z
|-> setup.py
```

… then `Y` and `Z` will be treated as packages, not `X`. If you want `X` as a package, the conventional pattern is to do:

```
X
|
|-> X
|   |-> Y
|   |-> Z
|
|-> setup.py
```
⚠ Git Repo != Python Package. A single Git repo, according to Python standards, can contain multiple Python packages.
Name conflicts resulting from improper structure
Suppose you have designed two packages that look like this:
```
X
|
|-> Y
|-> Z
|-> setup.py
```

```
W
|
|-> Y
|-> Z
|-> setup.py
```
The way Python installs packages is by running `setup.py`, and in `setup.py` you’ve usually got a `find_packages()` function call. Note that the top-level package names come from the directories inside the repo, not from the repo itself. So if you do `pip install X`, you would actually be adding two packages named `Y` and `Z` to the list of installed packages. If you were to then do `pip install W`, you would get a name conflict, because `Y` and `Z` already refer to something that came from `X`, and `pip` would resolve the conflict by removing what was previously installed and installing the new thing.
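You can see this in action with a quick sketch: recreating the `X` layout from above in a scratch directory (with the `__init__.py` files that packages require) and running `find_packages()` over it shows that the discovered names are `Y` and `Z`, with no trace of `X`:

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the X repo layout from the example above in a scratch directory
root = tempfile.mkdtemp()
for pkg in ("Y", "Z"):
    os.makedirs(os.path.join(root, pkg))
    # each directory needs an __init__.py to be discovered as a package
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

# The repo-level name X appears nowhere in the discovered names, which is
# exactly why W's Y and Z would collide with X's on install
print(sorted(find_packages(root)))  # -> ['Y', 'Z']
```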
Please don’t use plain folders
If you were to say "let’s not use a Python package, just use folders", then in every script you would have to modify the import lookup path for Python to be able to find the modules you’re trying to import. So if I want to get e.g. `X/Y/modules/somescript.py` from a file that is located in `Z`, I would need to do something like:

```python
import sys, os

# Hack the lookup path so that the sibling directory Y becomes importable
sys.path.append(os.path.join(os.path.dirname(__file__), ".."))

from Y.modules import somescript
```
Instead of doing just:

```python
from Y.modules import somescript
```
👀 That’s because Python uses the `sys.path` list of directories to look up `*.py` files and load modules from. By default `sys.path` contains the directory of the script you run, but not its subdirectories, so you would have to add each relevant subdirectory to the `sys.path` list if you were not using a standard Python package.
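You can inspect that lookup list directly; the exact entries vary per machine and per invocation, but you will see the script’s directory and the installed-packages directories, not arbitrary subfolders of your project:

```python
import sys

# Python resolves imports by scanning these directories in order; your
# project's subdirectories are not on the list unless something put them there
for entry in sys.path:
    print(entry)
```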
🙏 So please do yourself a favor and follow the standards outlined in the PEP documents to avoid running into such strange problems. Many open-source tools that work with Python packages may downright assume this structure, and if you don’t follow the standard you risk missing out on goodies, as well as making the project way less maintainable.
The `__init__.py`
For the `find_packages()` function idiomatically used in `setup.py` to automatically detect all the Python files belonging to a particular package, you need to add at least an empty `__init__.py` file at each directory level.
👀 `__init__.py` is used as the module definition file and is the first file to be executed when you import a directory-level module into any other Python script. It can be left empty, or it can contain some initialization logic, depending on how you want to set up your package.
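A quick way to convince yourself that `__init__.py` runs on first import is a throwaway sketch like the one below; the package name is made up, and the `sys.path` tweak is only for the demo (a properly installed package doesn’t need it):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write('print("demo_pkg is being initialized")\n')
    f.write('__version__ = "0.0.1"\n')

sys.path.insert(0, root)  # demo only; a real package would be pip-installed
demo_pkg = importlib.import_module("demo_pkg")  # runs __init__.py, printing the message
print(demo_pkg.__version__)  # -> 0.0.1
```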
So the proper structure for a package `X` with two directory-level modules `Y` and `Z` would normally look like this:

```
X
|
|-> X
|   |-> Y
|   |   |-> __init__.py
|   |
|   |-> Z
|   |   |-> __init__.py
|   |
|   |-> __init__.py
|
|-> setup.py
```
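Recreating this exact layout in a temporary directory and pointing `find_packages()` at it shows the dotted package names `setuptools` would discover:

```python
import os
import tempfile

from setuptools import find_packages

# Recreate the nested X/X/{Y,Z} layout from the tree above
root = tempfile.mkdtemp()  # stands in for the repo root, where setup.py lives
for d in ("X", os.path.join("X", "Y"), os.path.join("X", "Z")):
    os.makedirs(os.path.join(root, d))
    open(os.path.join(root, d, "__init__.py"), "w").close()

print(sorted(find_packages(root)))  # -> ['X', 'X.Y', 'X.Z']
```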
`setup.py` and `setup.cfg`
My recommendation is to use a minimal `setup.py` and a second, declarative configuration file named `setup.cfg`, since it’s always easier to read a declarative file than a custom Python script. So if you can, keep your `setup.py` as close to this as possible:
```python
from setuptools import setup

if __name__ == "__main__":
    setup()
```
Whereas your `setup.cfg` should be the file describing your package through and through. Here is an example from mkcommit:
```ini
[metadata]
name = mkcommit
author = Krzysztof J. Czarnecki
author_email = kjczarne@gmail.com
version = attr: mkcommit.__version__
url = https://github.com/kjczarne/mkcommit
description = Dead-simple tool for enforcing clean commit messages.
long_description = file: README.md
long_description_content_type = text/markdown
keywords = git, commit, style, templates
license = MIT
classifiers =
    License :: OSI Approved :: MIT License
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.7

[options]
zip_safe = False
include_package_data = True
packages = find:
python_requires = >=3.6
install_requires =
    InquirerPy
    pyperclip
    prettyprinter
    pyyaml
    requests

[options.entry_points]
console_scripts =
    mkcommit = mkcommit.main:main

[options.extras_require]
dev =
    pdoc
    flake8
    twine
    setuptools
    autopep8
    nox

[flake8]
exclude =
    .git,
    __pycache__,
    build,
    dist,
    .tox,
    .nox,
    .pytest_cache
max_complexity = 12
max_line_length = 100
ignore =
    E731
    E116
    E114
    E221
    E241
```
Skipping the obvious stuff like `author` or `url` that points to the repo, there are a few entries that deserve an explanation:

- `version` -> you could always store a version tag here as a literal, but what I like to do is use the `attr: project_name.__version__` directive. In the `__init__.py` at the root of my Python package I place a `__version__ = "0.0.1"` variable, which I can later use from within the modules to e.g. decide what API version to use.
  👀 The `attr:` part is a directive that will try to find a particular attribute (variable) in a Python module.
- `long_description` -> I usually use `file: README.md`, which is a directive that opens a `README.md` file at the root of the repo and uses its contents as the long description. This is exactly what you see when you inspect Python packages on PyPI.
- `long_description_content_type` -> the default format used by `setuptools` is RST. I prefer Markdown in most cases, so I set `text/markdown`. Otherwise you should prepare a `README.rst` file, but it doesn’t look nearly as good as the Markdown variant on GitHub or GitLab.
- `license` -> if you’re planning to upload the package to `pypi.org` for everybody to use, this should be a valid open-source license name.
- `classifiers` -> this is used by registries like `pypi.org` to categorize the package. Nobody ever remembers the classifiers; everybody copies them from PyPI. The `License` and `Programming Language` classifiers are the minimum required by PyPI.
- `zip_safe` -> believe it or not, Python packages can be installed in a compressed (ZIP) format that makes them a bit smaller in size and usually a tiny bit faster to run. Turning this on is however not recommended in larger projects, since not all resources can be used in a compressed form.
- `include_package_data` -> when set to true, you can specify patterns in a `MANIFEST.in` file at the root of your repo (or wherever `setup.py` is) to include non-Python files along with the package. By default Python packages consist only of the Python source code files, so if you need other files for the package to operate correctly, create the `MANIFEST.in`. For example:

```
include project\spec.yaml
include project/spec.yaml
include project\spec.schema.json
include project/spec.schema.json
recursive-include project/pdoc *.*
recursive-include project\pdoc *.*
```

  👀 If a file pattern doesn’t match, the file will simply not be included. But because Windows and *NIX systems have different path separators, you will likely need to duplicate the includes if you’re building the project on more than one platform.
- `packages` -> if you follow the standard package structure that I outlined, you will always use the `find:` directive here. This is equivalent to calling `find_packages()` in `setup.py` and it makes `setuptools` discover the packages without having to list them explicitly.
- `python_requires` -> the minimum version of Python required for the project to work.
- `install_requires` -> a list of `pip` packages that your package uses and depends on.
- `console_scripts` -> a list of commands that will automatically be exposed when `pip` installs your package. For example `mkcommit = mkcommit.main:main` maps an `mkcommit` command that you can run in the shell to the function `main` in the module `mkcommit.main`. This corresponds to executing your `def main()` from the `main.py` located in the `mkcommit` folder in your repo. Take a look at mkcommit’s code if this sounds confusing and trace it through. You’ll realize it makes sense.
- `dev` -> these are tools that are necessary for the package only during development, but not when a third party wants to use your package. For example `nox` is used to automate package testing and deployment, but the end user should never need to touch `nox`, and the package itself works perfectly without it installed.
- `[flake8]` section -> this is the configuration I use for my linter of choice, which is `flake8`. The linter is used to force yourself and other people to maintain a good, standardized code style.
Building your package
When you want to create a package and share it with others, the easiest thing to do is:

```shell
python setup.py bdist_wheel
# or, if you have the build package installed:
python -m build
```

If you run this in the same directory where your `setup.py` lives, your package will be zipped into a `*.whl` file inside the `dist` directory.
👀 You can unzip `*.whl` files just as if they were normal `*.zip` archives.
You can now send the file to somebody and tell them to install it:
```shell
pip install <name>.whl
```
That’s it. `setuptools` might make it a pain for developers, but it makes things so much simpler for the users. After all, developers should manage complexity, not the end users.
Deploying your package
If you want to deploy your package to a registry, be it an internal company registry or `pypi.org`, you should use `twine`. For PyPI it’s quite simple:

```shell
pip install twine                # install twine
twine upload -r testpypi dist/*  # test if the upload works
twine upload dist/*              # upload your package
```
You can learn way more from the documentation.
⚠ Beware: a release once uploaded to PyPI can never be re-uploaded. All package versions are immutable, so make sure you don’t make mistakes here. If you exposed some credentials this way, change them immediately!
Final words
So yeah, it might be a bit dense, but with the minimal amount of information outlined in this article you should be able to hit the ground running with Python packages. I will leave you with a thought: I’ve seen large commercial projects set up as folders of scripts where stringing any two parts together was an incredibly painful mess. You don’t want the same to happen to your project if you want it to enter production. The patterns in use today took long years of development and involved a lot of discussions and compromises. There are newer, more convenient systems like Poetry, but until they take the industry by storm, sticking to the standards is your best chance for success, particularly because most Python developers are already familiar with them, which makes working in teams and open-source communities nicer. If you want to use Poetry, fine, but don’t ignore its older brother. That’s simply not an option (yet).