Software Engineering Best Practices for Writing Maintainable ML Code

Author: Murphy  |  Views: 26757  |  Time: 2025-03-23 13:10:51

Unlike traditional software engineering projects, ML codebases tend to lag behind in code quality because of their complex and evolving nature, which increases technical debt and makes collaboration harder. Prioritizing maintainability is essential for building robust ML solutions that can adapt, scale, and deliver value over time.

In recent years, machine learning has taken the world by storm, transforming industries from healthcare to finance and more. As more organizations jump on the ML bandwagon to discover new possibilities and insights, the significance of writing maintainable and robust ML code becomes crucial. By crafting ML code that's easy to work with and stands the test of time, teams can collaborate better and guarantee success as models and projects grow and adapt. The following section will show common examples from ML codebases and explain how to handle those properly.


Don't Create Monoliths

This tip is probably irrelevant for you, but it's written for the single person who is not aware of this (until now)!

Monolithic scripts, a.k.a. a single script for the whole project, may arise when you reuse your experimental code in production. Copy, paste, done! It's always a bad idea to create one single script for a project. It's difficult to read (even for the author), hard to debug and inefficient. You can't easily add new features or modify the code, because every change means running the whole thing again. Adding unit tests is impossible as well, because the monolith is 'the whole unit'.

Another problem with a single script is reusability. You can't reuse the code in other projects, because everything is entangled in one file.

There is only one reason to write a monolith; that is if you don't like the colleague who takes over your work. If you want to get this person frustrated, it's an easy way to accomplish that.

What to do instead? Write modules and classes. Create different code files that have one specific purpose. Every file should contain functions or classes and methods. By doing this, the code becomes way easier to read, debug, reuse and test. In the next tip you can find a commonly used directory structure.
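
To make this concrete, here is a minimal sketch of that split, with hypothetical file and function names: a tiny preprocessing module with a single purpose, and the unit test that the split makes possible.

```python
# preprocessing.py -- one module with one clear purpose
# (hypothetical names; a minimal sketch of "one file, one purpose")

def normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# test_preprocessing.py -- the function is now small enough to unit test
def test_normalize():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
```

Each file stays short, readable, and testable on its own, which is exactly what the monolith makes impossible.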

Don't Over-Engineer the Repository Structure

This might seem counterintuitive, but it's quite an important one. Over-engineering the repository structure refers to creating a complex and unnecessarily convoluted organization for your code and project files. It involves introducing layers of abstraction, excessive folder structures, and intricate naming conventions that may not provide any significant benefits in terms of maintainability, scalability, or collaboration. Instead, it adds unnecessary complexity, making it harder for team members to understand, navigate, and contribute to the project.

To name a few issues that may arise: development slows down because of the increased cognitive load on developers, the learning curve for new team members becomes steeper, and the complexity of the repo can lead to code duplication and fragmentation.

How can you maintain a healthy repository structure? Here are some tips that might help:

  • Follow standard directory layouts and naming conventions that are widely accepted in the ML community (see below).
  • Group related files and modules together in appropriate directories. For example, keep data preprocessing code separate from model training code.
  • Provide clear and concise documentation to guide team members on how to navigate and contribute to the project.
  • Regularly review the repository structure with the team to identify areas that can be simplified or improved.
  • And last but not least: Keep it simple! Avoid adding unnecessary layers of abstraction or complicated folder hierarchies unless they genuinely improve organization and readability.

Here is an example of a standard directory structure:

project_root/
|-- data/
|   |-- raw/                   # Raw data files (read-only)
|   |-- processed/             # Processed data files (generated by scripts)
|   |-- intermediate/          # Intermediate data files (generated by scripts)
|-- notebooks/                # Jupyter notebooks (for exploration, analysis, and visualization)
|-- src/
|   |-- data/                  # Data processing scripts
|   |-- models/                # Model implementation and training scripts
|   |-- evaluation/            # Model evaluation and testing scripts
|   |-- utils/                 # Utility functions and helper scripts
|-- experiments/              # Experiment logs, metrics, and model checkpoints
|-- configs/                  # Configuration files (hyperparameters, settings)
|-- tests/                    # Unit tests and test data
|-- docs/                     # Documentation files (if separate from notebooks)
|-- README.md                 # Project overview and instructions
|-- pyproject.toml            # Poetry project file for package management
|-- poetry.lock               # Poetry lock file for pinned dependencies
|-- Dockerfile                # Dockerfile for containerization
|-- .gitignore                # List of files to ignore in version control (e.g., data files, virtual environments)
Many nested folders can make a repo too complex.

Understand your Programming Language Behavior

In some situations, a programming language does not behave as you would expect. This can cause frustration and a lot of time wasted on debugging. To prevent this, it helps a lot to be aware of the quirks of your programming language.

If we take a look at Python, here are some examples that you should be aware of.

The first one you might encounter as a programmer is the following. You want to round numbers and then you discover this:

print(round(2.5))  # output: 2 (expectation: 3) (!)
print(round(3.5))  # output: 4 (expectation: 4)
print(round(4.5))  # output: 4 (expectation: 5) (!)
print(round(5.5))  # output: 6 (expectation: 6)

What's going on? Python 3 uses bankers' rounding, which means that 0.5 is rounded to the nearest even number. Why does this make sense? I like this explanation on Stack Overflow.
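
If you do need the "school" rounding where 0.5 always rounds up, the standard library's decimal module lets you state the rounding mode explicitly. A small sketch (the function name is illustrative):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str) -> int:
    # Passing the number as a string avoids binary floating-point surprises
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(round_half_up("2.5"))  # output: 3
print(round_half_up("4.5"))  # output: 5
```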

Another example of strange behavior in Python:

def append_to_list(item, my_list=[]):
    my_list.append(item)
    return my_list

print(append_to_list(1))  # output: [1]
print(append_to_list(2))  # output: [1, 2] (!)

In the example above the default argument is an empty list. But if you run the function a second time, the output is surprising: it returns the previous list with the new item appended! The reason is that default arguments in Python are evaluated only once, when the function is defined, so changes made to a mutable default persist across function calls. To avoid this, set the default to None and create a new instance of the mutable object inside the function.
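
The fix, a None default with a fresh list created inside the function, looks like this:

```python
def append_to_list(item, my_list=None):
    if my_list is None:
        my_list = []  # a fresh list is created on every call
    my_list.append(item)
    return my_list

print(append_to_list(1))  # output: [1]
print(append_to_list(2))  # output: [2]
```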

And another one, floating-point arithmetic precision:

print(0.1 + 0.2 == 0.3)  # output: False (!)

It's better to allow a small tolerance when comparing floating-point values. If you compare floats during testing and you use pytest, you can solve this problem with approx:

from pytest import approx

def test_example():
    assert 0.1 + 0.2 == approx(0.3)  # passes
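
Outside of tests, the standard library offers math.isclose for the same tolerance-based comparison:

```python
import math

# math.isclose compares with a relative tolerance (1e-09 by default)
print(math.isclose(0.1 + 0.2, 0.3))  # output: True
```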

If you are still on Python 2, the following can be confusing: variables used in a list comprehension "leak" into the outer scope (Python 3 fixed this):

x = 5
my_list = [x for x in range(3)]
print(x)  # output: 2 (!)

Sorting a list that contains None raises a TypeError in Python 3:

my_list = [None, 3, 1, 2]
sorted_list = sorted(my_list)  # raises TypeError (!)
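
If your data may contain None (missing values are common in ML pipelines), one way to handle it is to filter the Nones out, or to sort them to the end with a key function:

```python
my_list = [None, 3, 1, 2]

# Option 1: drop the Nones before sorting
sorted_clean = sorted(v for v in my_list if v is not None)
print(sorted_clean)  # output: [1, 2, 3]

# Option 2: keep them, but sort them to the end with a key function
# (the key compares booleans first, so None never gets compared to an int)
sorted_all = sorted(my_list, key=lambda v: (v is None, v))
print(sorted_all)  # output: [1, 2, 3, None]
```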

You can also assign an attribute to an instance that is not present in its class:

class MyClass:
    pass

obj = MyClass()
obj.new_attribute = 42
print(obj.new_attribute)  # output: 42 (!)
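
If this dynamic behavior is unwanted, `__slots__` restricts instances to a fixed set of attributes, so typos and stray assignments fail loudly:

```python
class MyClass:
    __slots__ = ("value",)  # instances may only have this attribute

    def __init__(self, value):
        self.value = value

obj = MyClass(1)
try:
    obj.new_attribute = 42
except AttributeError as err:
    print(f"rejected: {err}")  # assigning an undeclared attribute now fails
```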

Another confusing example is about inconsistent closures. A closure is a function that captures and remembers the environment in which it was created, including any variables from its outer scope. While closures are a powerful feature, their behavior in loops can lead to surprising results:

def create_multipliers():
    multipliers = []
    for i in range(5):
        def multiplier(x):
            return i * x
        multipliers.append(multiplier)
    return multipliers

multipliers_list = create_multipliers()

# calling the closure functions from the list
print(multipliers_list[0](2))  # output: 8 (4 * 2)
print(multipliers_list[1](2))  # output: 8 (4 * 2)
print(multipliers_list[2](2))  # output: 8 (4 * 2)
print(multipliers_list[3](2))  # output: 8 (4 * 2)
print(multipliers_list[4](2))  # output: 8 (4 * 2)

All functions return the product of the last value of i in the loop (which is 4) with x. This happens because closures in Python close over variables, not their values, which means they retain a reference to the variable i. When the closure is called later (outside the loop), it looks up the current value of i in its enclosing scope, which is now 4 (the last value in the loop).

An easy way to solve this would be to use default arguments:

def create_multipliers_fixed():
    multipliers = []
    for i in range(5):
        def multiplier(x, i=i):
            return i * x
        multipliers.append(multiplier)
    return multipliers

multipliers_list_fixed = create_multipliers_fixed()

# calling the fixed closure functions from the list
print(multipliers_list_fixed[0](2))  # output: 0 (0 * 2)
print(multipliers_list_fixed[1](2))  # output: 2 (1 * 2)
print(multipliers_list_fixed[2](2))  # output: 4 (2 * 2)
print(multipliers_list_fixed[3](2))  # output: 6 (3 * 2)
print(multipliers_list_fixed[4](2))  # output: 8 (4 * 2)
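
An alternative fix, if you prefer not to repurpose default arguments, is functools.partial, which binds the current value of i at creation time:

```python
from functools import partial

def multiply(i, x):
    return i * x

# partial binds the current value of i immediately, so there is no late binding
multipliers = [partial(multiply, i) for i in range(5)]

print(multipliers[3](2))  # output: 6 (3 * 2)
```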

There are many more examples besides these. Being aware of them will make your code more robust and reliable, and it will help you avoid unwanted surprises!

Handling Multiple Return Values

Returning more and more values from a function can make Python code messy and harder to maintain, especially when the number of return values grows large. Every time you change the return statement, you need to update all the calling code, a maintenance nightmare. Developers might also unpack the values in the wrong order, leading to logical errors.

Python provides an elegant solution for this: [namedtuple](https://docs.python.org/3/library/collections.html) from Python's collections module.

Here's how you can use namedtuples to improve the clarity and maintainability of your code:

from collections import namedtuple

Statistics = namedtuple('Statistics', ['sum', 'mean', 'maximum'])

def calculate_statistics(numbers):
    total = sum(numbers)
    mean = total / len(numbers)
    maximum = max(numbers)
    return Statistics(sum=total, mean=mean, maximum=maximum)

# example
data = [12, 5, 8, 14, 10]
result = calculate_statistics(data)

print("Sum:", result.sum)         # output: Sum: 49
print("Mean:", result.mean)       # output: Mean: 9.8
print("Maximum:", result.maximum) # output: Maximum: 14

Easy, right? Using a namedtuple has many benefits: it makes your code more readable, instances are immutable, and they are as memory-efficient as plain tuples. And maybe the biggest one for you as a programmer: callers that access fields by name keep working when you add another field to the return value.

Note: namedtuples in Python are similar to case classes in Scala.
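
If you want type hints as well, typing.NamedTuple gives the same behavior with a class-based syntax, and defining the class at module level means it is created only once:

```python
from typing import NamedTuple

class Statistics(NamedTuple):
    sum: float
    mean: float
    maximum: float

def calculate_statistics(numbers) -> Statistics:
    total = sum(numbers)
    return Statistics(sum=total, mean=total / len(numbers), maximum=max(numbers))

result = calculate_statistics([12, 5, 8, 14, 10])
print(result.mean)  # output: 9.8
```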

A Note on Exception Handling

The worst way of handling exceptions is blindly continuing when one occurs:

try:
    result = do_something()
except:
    pass

Never do this. If something happens and the result couldn't be obtained, the script continues like everything is normal. The exceptions are silently ignored. This can lead to hidden bugs and unexpected behavior.

In some repos, you will find many try-except blocks. Avoid using try-except for normal control flow: exceptions should handle exceptional situations, not regular logic.

Another bad practice is catching generic exceptions:

try:
    result = do_something()
except Exception as e:
    log_error(f'Exception occurred: {str(e)}')

This can hide specific errors and make debugging difficult. Also avoid bare except blocks. You won't have a clue about what's going on.

How should you handle exceptions? Here are some useful tips.

Try to be specific, and only catch generic exceptions for debugging. In some cases it can be useful to add a finally block. Code in the finally block will always run.

try:
    result = do_something()
except FileNotFoundError:
    log_error("File not found.")
except ValueError:
    log_error("Invalid input.")
except Exception as e:
    # catch any other unexpected exceptions and log them for debugging
    log_error(f"An unexpected error occurred: {str(e)}")
finally:
    # this runs always, whether an exception occurred or not
    close_resources()

You can also create custom exception classes. With custom exceptions, you can provide more specific error messages and help distinguish different types of errors. It can be as easy as this:

class CustomError(Exception):
    def __init__(self, message, *args):
        self.message = message
        super().__init__(message, *args)

def some_function():
    if some_condition:
        raise CustomError("This is a custom error message.")

try:
    some_function()
except CustomError as ce:
    log_error(str(ce))

Dealing Properly with Large Conditional Logic Trees

The complexity of (business) logic can quickly escalate. It begins with a simple if-else statement, but as it expands, it turns into a massive conditional structure that becomes difficult to manage, leading to frustration. Fortunately, various approaches exist to address this challenge and enhance maintainability.

One solution is to separate the logic from the processing, for example in a dictionary. Let's look at the following if-else tree:

def process_input(x):
    if x == 'A':
        return 'Apple'
    elif x == 'B':
        return 'Banana'
    elif x == 'C':
        return 'Cherry'
    elif x == 'D':
        return 'Date'
    # ... and so on for many more cases
    else:
        return 'Unknown'

We can refactor this using a dictionary:

def process_input(x):
    mapping = {
        'A': 'Apple',
        'B': 'Banana',
        'C': 'Cherry',
        'D': 'Date',
        # ... and so on for many more cases
    }
    return mapping.get(x, 'Unknown')

It's best to define the mapping outside the function, for example in a separate settings or configuration file. This approach is okay, but the dictionary can still become quite large if you aren't careful.
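
A minimal sketch of such an externalized mapping, assuming a JSON settings file (the file name settings.json is hypothetical):

```python
import json
from pathlib import Path

# The mapping lives in a settings file, not in the function body
# (written here only to make the sketch self-contained)
Path("settings.json").write_text(json.dumps({"A": "Apple", "B": "Banana"}))

def load_mapping(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

def process_input(x: str, mapping: dict) -> str:
    return mapping.get(x, "Unknown")

mapping = load_mapping("settings.json")
print(process_input("B", mapping))  # output: Banana
```

Now the business logic can change without touching the code at all.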

Another way to handle this is by using polymorphism. Create a base class with a common interface, and then implement subclasses for each specific case. Each subclass will handle its unique logic.

Suppose we want to perform different mathematical operations based on the input op and apply them to x and y:

class Operation:
    def perform(self, x, y):
        raise NotImplementedError

class Addition(Operation):
    def perform(self, x, y):
        return x + y

class Subtraction(Operation):
    def perform(self, x, y):
        return x - y

class Multiplication(Operation):
    def perform(self, x, y):
        return x * y

class Division(Operation):
    def perform(self, x, y):
        if y != 0:
            return x / y
        else:
            raise ValueError("Cannot divide by zero.")

def calculate(op, x, y):
    operations = {
        '+': Addition(),
        '-': Subtraction(),
        '*': Multiplication(),
        '/': Division(),
    }

    operation = operations.get(op)
    if operation:
        return operation.perform(x, y)
    else:
        raise ValueError("Invalid operation.")
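
A possible refinement, not part of the original example: let each operation class register itself, so calculate never has to change when you add a new operation. A sketch with illustrative names:

```python
# Registry of operation instances, filled by the decorator below
OPERATIONS = {}

def register(symbol):
    def wrap(cls):
        OPERATIONS[symbol] = cls()
        return cls
    return wrap

@register('+')
class Addition:
    def perform(self, x, y):
        return x + y

@register('-')
class Subtraction:
    def perform(self, x, y):
        return x - y

def calculate(op, x, y):
    try:
        return OPERATIONS[op].perform(x, y)
    except KeyError:
        raise ValueError("Invalid operation.")

print(calculate('+', 2, 3))  # output: 5
```

Adding a new operation is now a single decorated class; the dispatch code stays closed for modification.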

And a final warning: beware of code repetition. If you have multiple models and you want to combine their outputs into a final score, don't create nested conditional logic trees. Instead, use multiple small functions to compute the final score.

Let's look at a toy example from school. Your final grade for math will be calculated out of your attendance percentage and your exam percentage score.

First look at this:

def final_score(attendance: float, exam: float) -> int:
    if attendance < 0.25:
        if exam < 0.25:
            return 4
        elif exam < 0.5:
            return 5
        elif exam < 0.75:
            return 6
        else:
            return 7
    elif attendance < 0.5:
        if exam < 0.25:
            return 5
        elif exam < 0.5:
            return 6
        elif exam < 0.75:
            return 7
        else:
            return 8
    elif attendance < 0.75:
        if exam < 0.25:
            return 6
        elif exam < 0.5:
            return 7
        elif exam < 0.75:
            return 8
        else:
            return 9
    else:
        if exam < 0.25:
            return 7
        elif exam < 0.5:
            return 8
        elif exam < 0.75:
            return 9
        else:
            return 10

Looks complex, right? And what if more factors are involved in calculating the grade? The tree grows exponentially with every added parameter.

Fortunately, there exists a more concise and efficient solution for such scenarios:

from typing import List

def map_score(score: float, score_ranges: List[float]) -> int:
    for i, threshold in enumerate(score_ranges):
        if score < threshold:
            return 2 * i
    return 2 * len(score_ranges)

def final_score(parameter_scores: List[float], base_score: int = 4) -> float:
    parameter_range = [0.25, 0.5, 0.75]
    scores = [map_score(score, parameter_range) / len(parameter_scores)
              for score in parameter_scores]
    return sum(scores) + base_score

# example
attendance = 0.6
exam = 0.85
result = final_score([attendance, exam])
print("Final Score:", result)

This solution scales really well! You can add a parameter to the list with its corresponding score and it will be included in the result.

Note: You might want to round the final solution when adding more parameters. Make sure to do it properly (see the tip about programming language behavior).

In the future you can use these examples as inspiration to create smarter code and avoid long conditional logic trees.


Conclusion

Congratulations on reaching the end of this post! The provided tips can be instrumental in keeping codebases structured and maintainable. Striking a balance between chaotic monolithic files and overly complex nested directories is key to a good development experience.

By understanding the programming language's behavior, adopting named tuples for multiple return values, handling errors effectively and simplifying conditional logic, developers and teams can invest less time in maintenance and more in adding useful functionality or exploring new projects. Remember to fortify codebases with tests, linting, code formatting and documentation to ensure long-term health and productivity.

Happy coding, and until next time!
