Pytest Tutorial: An Introduction To Unit Testing

Background
Imagine you are a data scientist who has just developed an awesome new model that is going to bring the company a lot of money. The next step is to send it to production. You spend a few days making the code PEP 8 compliant, applying [linting](https://en.wikipedia.org/wiki/Lint%28software%29), etc. Finally, you create a _pull request_ on GitHub, excited about your new release. Then a software engineer asks: 'I don't see any tests here?'
This scenario has happened to me and is quite common among junior data scientists. Testing is an essential part of any software project, and data science is no different. It is therefore an important concept and tool to nail down, as it will be invaluable in your career. In this post, I dive into why we need tests and how we can easily write and run them using Pytest.
What are Tests?
We all test naturally: we run the code and check by eye whether the output is what we expected. This is called exploratory testing. However, it is not ideal, especially when you have a large codebase with numerous steps, as it is hard to detect where a problem is occurring.
Therefore, it is common practice to write tests for your code. Each test has some input and an expected output. This _automates_ the testing process and speeds up debugging.
The most common written tests are _unit tests_. These test small blocks of code, typically functions and classes, to verify that each block does what it should.
The general advantages of unit tests are:
- Faster debugging and issue finding
- Earlier identification of bugs
- More robust and maintainable code
- Better code design with less complexity
Unit tests are the foundation of the testing process, with integration and _system_ testing following.

What is Pytest?
Pytest is an easy-to-use Python package for unit testing. It is one of the most popular testing packages, alongside Python's built-in `unittest` framework. Pytest has several advantages over other testing frameworks:
- Open source
- Skip and label tests
- Parallelized test execution (through plugins such as pytest-xdist)
- Very easy and intuitive to use
Now let's begin some testing!
Installation and Setup
You can install `pytest` through pip by running the following in your terminal or command line:
pip install pytest
If you want a certain version, add it after the `==`:
pip install pytest==
You can verify it is installed on your machine through:
pytest --version
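If you would rather check from inside Python, the following is a minimal sketch. It assumes pytest is installed in the environment your interpreter is using, and it relies on the `__version__` attribute that the package exposes.

```python
# Check which pytest version the current interpreter can import.
import pytest

installed_version = pytest.__version__
print(f"pytest version: {installed_version}")
```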
The best practice is to keep the tests in a directory separate from the main code, such as `tests/`. Another requirement is that all test files are named `test_*.py` or `*_test.py`, using _snake case_. Similarly, all test functions and classes should start with `test` or `Test` ([camel case](https://en.wikipedia.org/wiki/Camel_case)) respectively. This ensures that `pytest` knows which functions, classes, and files are tests.
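To make the naming rules concrete, here is a small sketch of what a test file could look like. The file name `tests/test_naming.py` and the helper `add_one` are hypothetical and only serve as illustration. The comments describe pytest's default discovery rules, which can be changed in the configuration.

```python
# Contents of a hypothetical file named tests/test_naming.py


def add_one(x: int) -> int:
    """Helper under test; not collected because its name lacks the test prefix."""
    return x + 1


def test_add_one():
    # Collected: the function name starts with "test".
    assert add_one(1) == 2


class TestAddOne:
    # Collected: the class name starts with "Test" and it defines no __init__.
    def test_negative_input(self):
        assert add_one(-1) == 0


def check_add_one():
    # Not collected under the default rules: the name does not start with "test".
    assert add_one(0) == 1
```

Running `pytest` on this file should therefore report two tests, `test_add_one` and `TestAddOne::test_negative_input`.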
Basic Example
Let's go through a very simple example.
First, we will create a new directory `pytest-example/` containing two files: `calculations.py` and `test_calculations.py`. In the `calculations.py` file, we will code the following function:
def sum(a: float, b: float) -> float:
    """
    Calculate the sum of the two numbers.
    :param a: The first number to be added.
    :param b: The second number to be added.
    :return: The sum of the two numbers.
    """
    return a + b
And in the `test_calculations.py` file, we write its corresponding unit test:
from calculations import sum

def test_sum():
    assert sum(5, 10) == 15
This test can be run by executing either of these commands:
pytest
pytest test_calculations.py
And the output looks like this:

Good news, our test passed!
However, if our `assert` is incorrect:
def test_sum():
    assert sum(5, 10) == 10
The output would be:

Several Tests
It is possible to have several tests for different functions. For example, let's add another function to `calculations.py`:
def sum(a: float, b: float) -> float:
    """
    Calculate the sum of the two numbers.
    :param a: The first number to be added.
    :param b: The second number to be added.
    :return: The sum of the two numbers.
    """
    return a + b

def multiply(a: float, b: float) -> float:
    """
    Calculate the product of the two numbers.
    :param a: The first number to be multiplied.
    :param b: The second number to be multiplied.
    :return: The product of the two numbers.
    """
    return a * b
And then add the test for the `multiply` function in `test_calculations.py`:
from calculations import sum, multiply

def test_sum():
    assert sum(5, 10) == 15

def test_multiply():
    assert multiply(5, 10) == 50
Executing `pytest`:

The two tests have passed!
However, what if you wanted, say, to run just the `test_multiply` function? Well, all you need to do is append that function name to the file name, separated by `::`, when executing `pytest`:
pytest test_calculations.py::test_multiply

As we can see, `pytest` only ran `test_multiply`, as we wanted!
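The same selection can also be made from Python through `pytest.main`, which takes the command-line arguments as a list. The sketch below is only an illustration under a few assumptions: the helper name `run_selected_tests` is hypothetical, the two example files are recreated (without type hints or docstrings) in a temporary directory so the snippet is self-contained, and the `-k` keyword filter is shown as an alternative way to select tests by name.

```python
import tempfile
import textwrap
from pathlib import Path

import pytest

# Trimmed copies of the two example files, written to a temporary directory below.
CALCULATIONS_SOURCE = textwrap.dedent(
    '''
    def sum(a, b):
        return a + b


    def multiply(a, b):
        return a * b
    '''
)

TESTS_SOURCE = textwrap.dedent(
    '''
    from calculations import sum, multiply


    def test_sum():
        assert sum(5, 10) == 15


    def test_multiply():
        assert multiply(5, 10) == 50
    '''
)


def run_selected_tests() -> tuple[int, int]:
    """Run only test_multiply, first by node ID and then by keyword."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "calculations.py").write_text(CALCULATIONS_SOURCE)
        test_file = folder / "test_calculations.py"
        test_file.write_text(TESTS_SOURCE)

        # Options shared by both runs: quiet output, no cache directory on disk.
        common = ["-q", "-p", "no:cacheprovider"]
        # Same selection as `pytest test_calculations.py::test_multiply`.
        by_node_id = pytest.main([f"{test_file}::test_multiply", *common])
        # Alternative: -k keeps only the tests whose names match the expression.
        by_keyword = pytest.main([str(test_file), "-k", "multiply", *common])
    return int(by_node_id), int(by_keyword)


if __name__ == "__main__":
    print(run_selected_tests())
```

An exit code of 0 means every selected test passed. For day-to-day work the command-line form is usually simpler; the Python entry point is mostly useful inside scripts.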
If we now wanted to add a `divide` function, it would be best practice to group the functions into a class:
class Calculations:
    def __init__(self, a: float, b: float) -> None:
        """
        Initialize the Calculation object with two numbers.
        :param a: The first number.
        :param b: The second number.
        """
        self.a = a
        self.b = b

    def sum(self) -> float:
        """
        Calculate the sum of the two numbers.
        :return: The sum of the two numbers.
        """
        return self.a + self.b

    def multiply(self) -> float:
        """
        Calculate the product of the two numbers.
        :return: The product of the two numbers.
        """
        return self.a * self.b

    def divide(self) -> float:
        """
        Calculate the quotient of the two numbers.
        :return: The quotient of the two numbers.
        """
        return self.a / self.b
The tests in `test_calculations.py` can then be grouped into a test class in the same way:
from calculations import Calculations
import pytest

class TestCalculations:
    def test_sum(self):
        calculations = Calculations(5, 10)
        assert calculations.sum() == 15

    def test_multiply(self):
        calculations = Calculations(5, 10)
        assert calculations.multiply() == 50

    def test_divide(self):
        calculations = Calculations(5, 10)
        assert calculations.divide() == 0.5
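Division is a natural place to think about edge cases. The sketch below is not part of the original example: it uses a trimmed copy of `Calculations` and a hypothetical test class `TestDivideEdgeCases` to show two pytest helpers, `pytest.raises` for expected exceptions and `pytest.approx` for floating-point comparisons.

```python
import pytest


class Calculations:
    """Trimmed copy of the class above, keeping only divide."""

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def divide(self) -> float:
        return self.a / self.b


class TestDivideEdgeCases:
    def test_divide_by_zero(self):
        # The with-block passes only if the expected exception is raised inside it.
        with pytest.raises(ZeroDivisionError):
            Calculations(5, 0).divide()

    def test_divide_repeating_decimal(self):
        # 1 / 3 has no exact float representation, so compare within a tolerance.
        assert Calculations(1, 3).divide() == pytest.approx(0.3333, abs=1e-4)
```

Whether dividing by zero should raise an error or return some sentinel value is a design decision for your own code. The test simply documents the current behaviour.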
Pytest Fixtures
In the above `TestCalculations` class, notice that we initialise the `Calculations` class several times. This is not optimal, and luckily `pytest` has fixtures to address this exact scenario:
from calculations import Calculations
import pytest

@pytest.fixture
def calculations():
    return Calculations(5, 10)

class TestCalculations:
    def test_sum(self, calculations):
        assert calculations.sum() == 15

    def test_multiply(self, calculations):
        assert calculations.multiply() == 50

    def test_divide(self, calculations):
        assert calculations.divide() == 0.5
Instead of initialising `Calculations` in every test, we define a function decorated with `@pytest.fixture` that builds the input data. Any test that lists `calculations` as an argument receives the fixture's return value.
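One detail worth knowing is that a fixture is function-scoped by default, so pytest calls it again for every test that requests it. The fixture removes the duplicated code rather than the repeated construction. If building the object were expensive, the `scope` argument could be used to share one instance. The following is a minimal sketch under that assumption. The fixture name `shared_calculations` is hypothetical, and `Calculations` is a trimmed copy of the class from `calculations.py`.

```python
import pytest


class Calculations:
    """Trimmed copy of the class from calculations.py."""

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def sum(self) -> float:
        return self.a + self.b

    def multiply(self) -> float:
        return self.a * self.b


@pytest.fixture(scope="module")
def shared_calculations():
    # With scope="module", pytest calls this function once per test module
    # and hands the same object to every test that requests it.
    return Calculations(5, 10)


def test_sum(shared_calculations):
    assert shared_calculations.sum() == 15


def test_multiply(shared_calculations):
    assert shared_calculations.multiply() == 50
```

Sharing an instance is only safe when the tests do not modify it, which is why the default of one fresh object per test is usually the better choice.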
Pytest Parametrize
Up to this point, we have only passed one test case for each test function. However, there may be multiple edge cases you want to test and verify. Pytest makes this process very easy through the parametrize decorator:
from calculations import Calculations
import pytest

@pytest.fixture
def calculations():
    return Calculations(5, 10)

class TestCalculations:
    @pytest.mark.parametrize("a, b, expected_output",
                             [(1, 3, 4), (10, 50, 60), (100, 0, 100)])
    def test_sum(self, a, b, expected_output):
        assert Calculations(a, b).sum() == expected_output

    def test_multiply(self, calculations):
        assert calculations.multiply() == 50

    def test_divide(self, calculations):
        assert calculations.divide() == 0.5
Here we have used the `pytest.mark.parametrize` decorator to test several inputs for the `sum` function. The output looks like this:

Notice that we have 5 tests passing instead of 3. This is because each of the three parameter sets for the `sum` function counts as its own test, which adds two extra tests.
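The same decorator works for the other methods too. As a sketch, the snippet below parametrizes a `divide` test and uses the optional `ids` argument to give each case a readable label in the test report. The input values and labels are made up for illustration, and `Calculations` is again a trimmed copy of the original class.

```python
import pytest


class Calculations:
    """Trimmed copy of the class from calculations.py, keeping only divide."""

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def divide(self) -> float:
        return self.a / self.b


@pytest.mark.parametrize(
    "a, b, expected_output",
    [(10, 2, 5.0), (5, 10, 0.5), (-9, 3, -3.0)],
    # One label per parameter set, shown in the report instead of the raw values.
    ids=["whole result", "fraction", "negative numerator"],
)
def test_divide(a, b, expected_output):
    assert Calculations(a, b).divide() == expected_output
```

Each parameter set is reported as a separate test, so this single function contributes three entries to the test run.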
Summary & Further Thoughts
Testing, particularly unit testing, is an essential skill to learn and understand as a data scientist, as it helps prevent bugs and speeds up development. One of the most common testing packages in Python is Pytest, an easy-to-use framework with an intuitive testing procedure. In this article, we have shown how to use Pytest along with its fixture and parametrize features.
The full code used in this article is available here:
Medium-Articles/Software Engineering /pytest-example at main · egorhowell/Medium-Articles