A Practical Guide to Using Setup.py

Rogier van der Geer/
25 March, 2019


When you are using python professionally it pays to set up your projects
in a consistent manner. This helps your collaborators quickly understand the
structure of a project, and makes it easier for them to set up the project
on their machine. The key to setting up your project is the setup.py file.
In this blog I'll go into the details of this file.

Where we start

Here I assume that you already have a package that you want to set up.
This does not need to be a finished package - ideally you should create the
setup.py long before your project is finished. It could even be an empty package;
just make sure the package folder exists
and contains a file named init.py (which may be empty).

If you follow my colleague Henk's structure
for your project, your starting situation should look something like this:

example_project/
├── exampleproject/      Python package with source code.
│   ├── __init__.py      Make the folder a package.
│   └── example.py       Example module.
└── README.md            README with info of the project.

You may have other files or folders in your structure, for example
folders named notebooks/, tests/ or data/, but these aren't required.

The case for a setup.py

Once you have created a package like this, then you are likely
to use some of the code in other places. For example, you might want
to do this in a notebook:

from exampleproject.example import example_function

This would work if your current working directory is example_project/, but in
all other cases python will give you output like:

ModuleNotFoundError: No module named 'exampleproject'

You could tell python where to look for the package by setting the PYTHONPATH
environment variable or adding the path to sys.path,
but that is far from ideal: it would require different actions on different
platforms, and the path you need to set depends on the location of your code.
A much better way is to install your package using a setup.py and pip,
since pip is the standard way to install all other packages, and it is bound
it work the same on all platforms.

A minimal example

So what does a setup.py file look like? Here is a minimal example0:

from setuptools import setup, find_packages

setup(
    name='example',
    version='0.1.0',
    packages=find_packages(include=['exampleproject', 'exampleproject.*'])
)

Here we specify three things:

Now all that you need to do in order to install your package is to run the following
from inside the example_project/ directory3:

pip install -e .

The . here refers to the current working directory, which I assume to be the directory
where the setup.py can be found. The -e flag specifies that we want to install
in editable mode, which means
that when we edit the files in our package we do not need to re-install the
package before the changes come into effect. You will need to either restart
python or reload the package though!

When you edit information in the setup.py itself you will need to re-install
the package in most cases, and also if you add new (sub)packages.
When in doubt, it can never hurt to re-install. Just run pip install -e . again.

Requirements

Most projects have some dependencies. You have most likely used
a requirements.txt
file before, or an environment.yml
if you are using conda. Now that you are creating a setup.py, you can specify your
dependencies in the install_requires argument.
For example, for a typical data science project you may have:

setup(
    name='example',
    version='0.1.0',
    packages=find_packages(include=['exampleproject', 'exampleproject.*']),
    install_requires=[
        'PyYAML',
        'pandas==0.23.3',
        'numpy>=1.14.5',
        'matplotlib>=2.2.0,,
        'jupyter'
    ]
)

You may specify requirements without a version (PyYAML), pin a version (pandas==0.23.3), specify a minimum
version ('numpy>=1.14.5) or set a range of versions (matplotlib>=2.2.0,<3.0.0). These
requirements will automatically be installed by pip when you install your package.

Extras-require

Sometimes you may have dependencies that are only required in certain situations. As a data scientist
I often make packages which I use to train a model. When I work on such a model interactively
I may need to have matplotlib and jupyter installed in order to interactively work with the
data and to create visualizations
of the performance of the model. On the other hand, if the model runs in production I do not
want to install matplotlib nor jupyter on the machine (or container) where I train
or do inference. Luckily setuptools allows to specify optional dependencies in extras_require:

setup(
    name='example',
    version='0.1.0',
    packages=find_packages(include=['exampleproject', 'exampleproject.*']),
    install_requires=[
        'PyYAML',
        'pandas==0.23.3',
        'numpy>=1.14.5'
    ],
    extras_require={
        'interactive': ['matplotlib>=2.2.0,, 'jupyter'],
    }
)

Now if we install the package normally (pip install example from PyPI or pip install -e . locally)
it will only install the dependencies PyYAML, pandas and numpy. However, when we specify
that we want the optional interactive dependencies (pip install example[interactive]
or pip install -e .[interactive]),
then matplotlib and jupyter will also be installed.

Scripts and entry points

The main use case of most python packages that you install from PyPI is to provide functionality
that can be used in other python code. In other words, you can import from those packages.
As a data scientist I often make packages that aren't meant to be used by other python code but
are meant to do something, for example to train a model. As such, I often have a python script that
I want to execute from the command line.

The best way4 to expose functionality of your package to the command line is to define
an entry_point as such:

setup(
    # ...,
    entry_points={
        'console_scripts': ['my-command=exampleproject.example:main']
    }
)

Now you can use the command my-command from the command line, which will in turn execute the main
function inside exampleproject/example.py. Do not forget to re-install - otherwise the command
will not be registered.

Tests

Whenever you write any code, I strongly encourage you to also write tests for this code. For testing
with python I suggest you use pytest. Of course you do not want to add pytest to your dependencies
in install_requires: it isn't required by the users of your package. In order to have it installed
automatically when you run tests you can add the following to your setup.py:

setup(
    # ...,
    setup_requires=['pytest-runner'],
    tests_require=['pytest'],
)

Additionally you will have to create a file named setup.cfg with the following contents:

[aliases]
test=pytest

Now you can simply run python setup.py test and setuptools will ensure the necessary dependencies
are installed and run pytest for you! Have a look here if
you want to provide arguments or set configuration options for pytest.

If you have any additional requirements for testing (e.g. pytest-flask) you can add them to tests_require.

Flake8

Personally I think it is a good idea to run Flake8 to
check the formatting of your code. Just like with pytest, you do not want to add flake8 to the
install_requires dependencies: it does not need to be installed in order to use your
package. Instead, you can add it to setup_requires:

setup(
    # ...,
    setup_requires=['flake8']
)

Now you can simply run python setup.py flake8. Of course you can also pin the version
of flake8 (or any other package) in setup_requires.

If you want to change some of the configuration parameters of Flake8 you can add a [flake8] section to
your setup.cfg. For example:

[flake8]
max-line-length=120

Package data

Sometimes you may want to include some non-python files in your package. These
may for example be schema files or a small lookup table. Be aware that such files
will be packaged together with your code, so it is in general a bad idea to include
any large files.

Suppose we have a schema.json in our project, which we place in exampleproject/data/schema.json.
If we want to include this in our package, we must use the package_data argument of setup:

setup(
    # ...,
    package_data={'exampleproject': ['data/schema.json']}
)

This will make sure the file is included in the package. We can also choose to include
all files based on a pattern, for example:

setup(
    # ...,
    package_data={'': ['*.json']}
)

This will add all *.json files in any package it encounters.

Now don't try to figure out the installed files' location yourself, as
pkg_resources has some very handy convenience functions:

For example, we could read in our schema using:

from json import load
from pkg_resources import resource_stream

schema = load(resource_stream('exampleproject', 'data/schema.json'))

Metadata

If you are going to publish your package, then you probably want to give your
potential users some more information about your package, including a description,
the name of the author or maintainer, and the url to the package's home page.
You can find a complete list of all allowed metadata in the setuptools
docs.

Additionally, if you are going to publish to PyPI, then you may want to
automatically load the contents of your README.md
into the long_description
,
and provide classifiers to tell pip even
more about your package.

Wrap-up

This blog should be a good starting point to set up most of your python projects.
If you want to read more about python packaging have a look
at the docs. Here is an example setup.py
which combines all parts shown in this blog:

from setuptools import setup, find_packages

setup(
    name='example',
    version='0.1.0',
    description='Setting up a python package',
    author='Rogier van der Geer',
    author_email='rogiervandergeer@godatadriven.com',
    url='https://blog.godatadriven.com/setup-py',
    packages=find_packages(include=['exampleproject', 'exampleproject.*']),
    install_requires=[
        'PyYAML',
        'pandas==0.23.3',
        'numpy>=1.14.5'
    ],
    extras_require={'plotting': ['matplotlib>=2.2.0,, 'jupyter']},
    setup_requires=['pytest-runner', 'flake8'],
    tests_require=['pytest'],
    entry_points={
        'console_scripts': ['my-command=exampleproject.example:main']
    },
    package_data={'exampleproject': ['data/schema.json']}
)

and the accompanying setup.cfg:

[aliases]
test=pytest

[flake8]
max-line-length=120

Improve your Python skills, learn from the experts!

At GoDataDriven we offer a host of Python courses from beginner to expert, taught by the very best professionals in the field. Join us and level up your Python game:

Footnotes

0: In this blog I have used setuptools
to set up my example project. Alternatively
you could also use distutils,
which is the standard tool for packaging in python, but it lacks features
such as the find_packages() function and entry_points.
Since the use of setuptools is very common nowadays and many of its features
can be particularly useful, I suggest that you should use setuptools.

1: If you want the version of your package to also be available inside python,
have a look here.

2: You could also list your packages manually, but this is particularly error-prone.

3: Alternatively you could run python setup.py install, but using pip has
many benefits, among which are automatic installation of dependencies and the
ability to uninstall or update your package.

4: You could also use the scripts argument (see for
example here)
but as this requires you to create a python shell script it may not work
as well (or at all) on Windows.

Subscribe to our newsletter

Stay up to date on the latest insights and best-practices by registering for the GoDataDriven newsletter.