A Guide to Powerful Python Enumerations

Author:Murphy  |  View: 20339  |  Time: 2025-03-22 21:38:45

PYTHON PROGRAMMING

Python enumerations offer useful data types. Photo by Waldemar on Unsplash

Enumeration types are used in various programming languages, such as C, C#, C++, Java, Go, Rust – and, of course, Python. For reasons I'm unaware of, Python enumerations are undervalued and underused by data scientists. I myself have used them relatively frequently and have seen them in use – but definitely far too rarely. I hope this article will constitute a small step in changing this situation.


Python enumerations offer too powerful a tool to be ignored. If one doesn't know how enumerations work, one may consider them overly complicated – which is why it's best to first learn how they work and only then try to use them. I hope that after reading this article, you'll agree with me that enumerations not only don't complicate code, but can also make it significantly simpler, shorter and less prone to errors – and even more performant.

We'll discuss enumerations using a particular example from data science. To this end, we will define and compare four classes, two based on the combination of strings and the typing.Literal type, and the other two based on enum.Enum, the standard-library class constituting the enumeration standard in Python. Each of these four classes will aim to represent a time-series model from among a number of possible models, and the model is to be selected by the user.

This article introduces the basics of Python enumerations. It'll equip you with sufficient knowledge to use this programming tool and benefit from it in the use cases in which such types offer the most. In your daily work, this basic knowledge will usually be enough, but there's more to enumerations. I'll dedicate one of my future articles to the intricacies of Python enumerations.

Introduction to Python enumerations

A very good source on enumerations is the official Python documentation:

enum – Support for enumerations

Enumerations – called enums for short – are a data type that enables one to define a set of named values. They are called enumerations because an enumeration type contains a number of possible members, and you can enumerate them. For instance, these can be days of the week:

  • Monday (1)
  • Tuesday (2)
  • Wednesday – Saturday (3–6)
  • Sunday (7)

or months of a year:

  • January (1)
  • February (2)
  • March – November (3–11)
  • December (12)

In these two examples, the items are numbered according to their natural ordering, the values creating a meaningful list of ordered items. Sometimes, however, enumerations – despite the name – don't have to be enumerated, like:

  • colors, e.g., red, green, blue
  • crop plant species, e.g., wheat, barley, oat, maize
  • a company's products; e.g., pencil, ball pen, fountain pen

As you see, an enumeration is a type that enables you to define a fixed set of choices, whether in a particular order or not.

Warning: Don't confuse enumerations with the enumerate() function!

Note that in Python, "enumerations" are understood as classes inheriting from one of the enum module's enumeration classes. Therefore, don't confuse these enumerations with the built-in enumerate() function, which works as shown below:

>>> x = [1, 2, "AMAZING", 56]
>>> for index, value in enumerate(x):
...     print(f"Index {index}: value of {value}")
Index 0: value of 1
Index 1: value of 2
Index 2: value of AMAZING
Index 3: value of 56

Enums are not the only type you can use for such data, but they are worth using for several reasons. We'll analyze them below, based on our first enumeration class:

from enum import Enum

class DayOfWeek(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

The upper case used for member names (MONDAY, TUESDAY, …) is a standard. Use it unless you've got a sensible reason for not doing so.

Instead of using this enumeration, we could use various alternatives:

# Global variables
MONDAY = 1
TUESDAY = 2
WEDNESDAY = 3
THURSDAY = 4
FRIDAY = 5
SATURDAY = 6
SUNDAY = 7

# Integers
1, 2, 3, 4, 5, 6, 7

# Strings
"Monday", "Tuesday", "Wednesday", ...

# List / tuple
DAYS = ["Monday", "Tuesday", "Wednesday", ...]

# Dict
DAYS = {1: "Monday", 2: "Tuesday", 3: "Wednesday", ...}
# or
DAYS = {"Monday": 1, "Tuesday": 2, "Wednesday": 3, ...}

Let's analyze the advantages of the enumeration type:

  • The DayOfWeek enum class makes the code more readable and easier to maintain. Its members are DayOfWeek.MONDAY, DayOfWeek.TUESDAY, and so on.
  • Above, I used a term "member" to describe the elements the class contains, and this is indeed the official term used to describe these seven enumerated entries in the class. Each entry, like MONDAY or TUESDAY in DayOfWeek, is called "an enum member" or just "a member". Each enum member has a name and a value. For example, DayOfWeek.MONDAY has a name of MONDAY and the corresponding value of 1, which is an ordinal number representing the position of the day in the week. So, the term "name" refers to the string label of the enum member; e.g., "MONDAY" is the name of the first enum member and "SUNDAY" of the last one.
  • The definition of the class is very clear and simple. The meaning of each member is informative, thanks to the class's name (DayOfWeek) and the name and value of each member (MONDAY, 1). In some cases, values don't convey any particular meaning; they are just indices in the enumeration. Otherwise, not only names but also values can be very meaningful. In our case, values do convey an important piece of information; we'll later use an example in which values will be not only meaningful, but also quite useful as Python objects.
  • That a class is an enumeration provides a significant piece of information itself – one that none of the alternatives provided above do: this is an enumeration of a particular type. For instance, the DayOfWeek enumeration informs that the days of week are counted from 1 to 7 and can be represented by integer values. While some of the alternatives do convey a similar message, the enumeration type does so in the clearest form – it is exactly what it was designed for.
  • You can check if a particular value has the type of DayOfWeek, using isinstance(value, DayOfWeek).
  • Enums help prevent errors, since the number of an enum's members is restricted. You cannot use any other value than those defined in the class definition. Thanks to this, the user can't pass another value, one that is not defined in the type (class) definition, without an error.
  • When enums are stored as integers, they can make code more efficient – likely as efficient as integers themselves, but far clearer. However, as mentioned above, values can also be other objects, in which case each member will keep its own object. As we'll see soon, this can be quite helpful.
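A short sketch of the isinstance() check and the restricted set of members mentioned in the list above. The describe() helper is a hypothetical name I use only for illustration; it is not part of the enum module:

```python
from enum import Enum


class DayOfWeek(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


def describe(day: DayOfWeek) -> str:
    """Hypothetical helper: accept only DayOfWeek members."""
    if not isinstance(day, DayOfWeek):
        raise TypeError(f"Expected a DayOfWeek member, got {day!r}")
    return f"{day.name} is day number {day.value}"


print(describe(DayOfWeek.MONDAY))               # MONDAY is day number 1
print(isinstance(DayOfWeek.SUNDAY, DayOfWeek))  # True
print(isinstance(7, DayOfWeek))                 # False: a bare integer is not a member
```

A function written this way rejects a plain 7 or "Monday" early, instead of silently accepting a value outside the defined set.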

To summarize, enums can make code readable, maintainable, efficient, and less error-prone.

Using enums in Python

As shown above, creating a Python enumeration class is simple and its definition is very informative (or rather can be, depending on how it's defined). Using an enumeration class is simple, too:

>>> DayOfWeek.TUESDAY
<DayOfWeek.TUESDAY: 2>
>>> day = DayOfWeek.TUESDAY
>>> day.name
'TUESDAY'
>>> day.value
2

A great thing about enumerations is that they have a built-in checker of possible values:

>>> DayOfWeek.TUESDAI
Traceback (most recent call last):
  ...
AttributeError: TUESDAI

So, you don't need to worry about incorrect values. If a user provides an incorrect value, the code will raise an exception. So, there's no need to implement additional runtime checking: it's already there.

Let's see what attributes the instance of such a class has:

>>> dir(day)
['FRIDAY', 'MONDAY', 'SATURDAY', 'SUNDAY',
 'THURSDAY', 'TUESDAY', 'WEDNESDAY',
 '__class__', '__doc__', '__eq__', '__hash__',
 '__module__', 'name', 'value']

Of these, we typically use the members (the days in our case) as well as name and value, which we've used above. However, we should also look into the attributes of the class itself (not of its members):

>>> dir(DayOfWeek)
['FRIDAY', 'MONDAY', 'SATURDAY', 'SUNDAY', 'THURSDAY',
 'TUESDAY', 'WEDNESDAY', '__class__', '__contains__',
 '__doc__', '__getitem__', '__init_subclass__',
 '__iter__', '__len__', '__members__', '__module__',
 '__name__', '__qualname__']

These are not all attributes of Python enums but the most important ones from a user point of view. I discussed this in the following article:

Don't Let the Python dir() Function Trick You!

You'll see there that a class inheriting from the enum.Enum class, as well as members of such a class, have many more attributes than the dir() function shows. We won't analyze these attributes in detail here, as such knowledge is unnecessary for you to use enums in your daily Python coding. I do encourage you, however, to perform such analysis, especially if you're interested in the intricacies of enums.

Let's consider several features of our enumeration class, DayOfWeek. It is these features that make Python enumerations so useful, helping one to write readable and concise code.

First, the class is iterable:

>>> for day in DayOfWeek: print(day)
DayOfWeek.MONDAY
DayOfWeek.TUESDAY
DayOfWeek.WEDNESDAY
DayOfWeek.THURSDAY
DayOfWeek.FRIDAY
DayOfWeek.SATURDAY
DayOfWeek.SUNDAY

The class enables you to access members via a string with a member's name (e.g., "SUNDAY") using brackets (like in the case of a dictionary), a feature that you can find useful in various situations:

>>> DayOfWeek["SUNDAY"]
<DayOfWeek.SUNDAY: 7>
>>> DayOfWeek["SUNDAI"]
Traceback (most recent call last):
  ...
KeyError: 'SUNDAI'

This will enable you to find if a string represents one of the names of the class. You can handle this exception:

>>> day = "SUNDAI"
>>> try:
...     DayOfWeek[day]
... except KeyError:
...     print(f"Incorrect day: {day}")
Incorrect day: SUNDAI

In addition, our enumeration class has a built-in method to make such a validation easier. It implements the .__contains__() method, so we can use the in keyword to check membership:

>>> DayOfWeek.SUNDAY in DayOfWeek
True
>>> "SUNDAY" in DayOfWeek.__members__
True
>>> "SUNDAI" in DayOfWeek.__members__
False

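In Python 3.11, the version used in this article, we cannot directly use the in keyword for values (in our case, these are integers from 1 to 7, representing the order of the days in a week); since Python 3.12, such a check on values is supported as well. In any version, we can do this indirectly, for instance in the following way: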

>>> any(day.value == 7 for day in DayOfWeek)
True

This doesn't look like the nicest code ever, but sometimes you may wish to use it. You can create a function to help you with this:

>>> def is_value_day(v):
...     return any(day.value == v for day in DayOfWeek)
>>> is_value_day(7)
True
>>> is_value_day(8)
False

Since this function is directly related to the class, you can make it a class method:

class DayOfWeek(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

    @classmethod
    def is_value_day(cls, v: int) -> bool:
        return any(day.value == v for day in cls)

Let's see the method in action:

>>> DayOfWeek.is_value_day(7)
True
>>> DayOfWeek.is_value_day(8) 
False
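As an alternative sketch, the enum module also supports looking a member up by its value using call syntax, which raises ValueError for an unknown value. The value_to_day() helper below is a hypothetical name used only for illustration:

```python
from enum import Enum
from typing import Optional


class DayOfWeek(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7


def value_to_day(v: int) -> Optional[DayOfWeek]:
    """Hypothetical helper: return the member for a value, or None."""
    try:
        return DayOfWeek(v)  # by-value lookup; raises ValueError if unknown
    except ValueError:
        return None


print(value_to_day(7))  # DayOfWeek.SUNDAY
print(value_to_day(8))  # None
```

Compared with the any()-based check, this version also gives you the member itself, which is often what you need next.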

In the DayOfWeek class, we used values from 1 to 7 for the days of the week, but sometimes you might prefer values that don't follow such a natural sequence. There's no problem in doing so:

class Options(Enum):
    ESCAPE = -1
    IGNORE = 0
    GETIN = 1
    COMMENT = 2
    QUESTION = 3

What's more, you can use non-integer values:

class Option(Enum):
    ESCAPE = "Esc"
    IGNORE = None
    GETIN = "Enter"
    COMMENT = "Ctrl+C"
    QUESTION = "Ctrl+Shift+Q"

Let's see how comparisons of instances of this enumeration class work:

>>> from enum import Enum
>>> class Option(Enum):
...     ESCAPE = "Esc"
...     IGNORE = None
...     GETIN = "Enter"
...     COMMENT = "Ctrl+C"
...     QUESTION = "Ctrl+Shift+Q"
>>> option = Option.ESCAPE
>>> option
<Option.ESCAPE: 'Esc'>
>>> option == Option.ESCAPE
True
>>> option == Option["ESCAPE"]
True
>>> op = "Esc"
>>> any(o.value == op for o in Option)
True
>>> option == op
False

But:

>>> option == Option[op]
Traceback (most recent call last):
  ...
KeyError: 'Esc'

Sometimes, we might want a comparison such as option == op to be true – or, in other words, we might also want to use strings in comparisons. Then, option (which is Option.ESCAPE) would return True when compared to another Option.ESCAPE, to "ESCAPE", but also to "Esc".

This could be helpful when the code uses plenty of string values instead of names of the class's options. As we saw in the code block above, the default (built-in) comparisons of enum.Enum instances are not performed using their values, but using the members themselves, which we access by name (repeated from the code block above):

>>> option == Option.ESCAPE
True
>>> option == Option["ESCAPE"]
True

In order to achieve such behavior, we can implement a custom comparison method. Even better, we can override the default .__eq__() method. An example of such an implementation is as follows:

from enum import Enum
from typing import Any

class Option(Enum):
    ESCAPE = "Esc"
    IGNORE = "None"
    GETIN = "Enter"
    COMMENT = "Ctrl+C"
    QUESTION = "Ctrl+Shift+Q"

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, Option):
            return self.name == other.name
        if isinstance(other, str):
            return self.value == other
        return False

Unlike before, the value for Option.IGNORE is a string, not None. This is to make comparisons simpler and more natural – otherwise, Option.IGNORE could appear equal to a None value, which would be rather unnatural behavior.

The class passes [mypy](https://mypy-lang.org/) static checking. Let's see this .__eq__() method in action:

>>> option1 = Option.ESCAPE
>>> option1 == 1
False
>>> option1 == Option
False
>>> option1 == Option.ESCAPE
True
>>> option1 == Option["ESCAPE"]
True
>>> option1 == "Esc"
True
>>> option1 == "ESCAPE"
False

Compared with the previous example, the one in which we compared Option members using the built-in .__eq__() method, the new implementation makes these comparisons work differently – while previously both would be negative, this time one of the two comparisons turns out to be true:

>>> option1 == "Esc"
True
>>> option1 == "ESCAPE"
False

Exercise: Think how the class would behave if we changed the .__eq__() method to the following:

def __eq__(self, other: Any) -> bool:
    if isinstance(other, Option):
        return self.name == other.name
    if isinstance(other, str):
        try:
            cond = Option[other] in Option
        except KeyError:
            cond = False
        return self.value == other or cond
    return False

Now that we know we can compare an enumeration class's members with a string, and we know how to do this – let's ponder whether we should do this. Meaning, should you override the .__eq__() method in order to change the behavior of comparisons of the class's members?

That's a very valid question. Generally, you should avoid overriding built-in methods of standard-library classes such as enum.Enum – because the resulting code behaves differently from code based on the built-in methods. Generally, overwriting objects can be not only confusing but also quite dangerous. You can read about this in the following article:

Overwriting in Python: Tricky. Dangerous. Powerful

On the one hand, this can come in quite handy in some use cases, but on the other, such code may appear confusing to other developers. Hence, do such things only when you're certain that it'll help, and that the code won't be confusing (or at least it won't be too confusing). If, eventually, you decide to go for it, document this clearly, best with convincing examples (using doctest tests).
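One concrete pitfall is worth sketching. A Python class that defines .__eq__() without defining .__hash__() gets its .__hash__() set to None, so its instances can no longer be used in sets or as dictionary keys. The .__hash__() method below is my illustrative assumption, not part of the class shown earlier; hashing the value keeps objects that compare equal (such as Option.ESCAPE and "Esc") equal in hash as well:

```python
from enum import Enum
from typing import Any


class Option(Enum):
    ESCAPE = "Esc"
    IGNORE = "None"
    GETIN = "Enter"

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, Option):
            return self.name == other.name
        if isinstance(other, str):
            return self.value == other
        return False

    # Assumption for illustration: restore hashing, consistent with __eq__.
    def __hash__(self) -> int:
        return hash(self.value)


# Members can be used as dictionary keys and set elements again.
shortcuts = {Option.ESCAPE: "close dialog", Option.GETIN: "confirm"}
print(shortcuts[Option.ESCAPE])                # close dialog
print(len({Option.ESCAPE, Option.GETIN}))      # 2
print(hash(Option.ESCAPE) == hash("Esc"))      # True
```

Whether you need this depends on how the members are used; if they never end up in a set or as dictionary keys, the missing .__hash__() may go unnoticed for a long time.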

We've covered the basics of Python enumerations, and a little more. This should be enough for you to start using this tool. Don't get discouraged after using an enumeration class once or twice. Enumerations are quite specific, and so you may need to spend some time with them to appreciate them. This time would be well spent – once you grasp the idea of enumerations, you'll enjoy using them.

Although there's much more to Python enumerations, we'll stop here. Instead of digging deeper, I'd like to present you a use case from the data science realm. I'll show you a practical example of enumerations, but we'll also benchmark this class against some other alternatives. I am pretty sure some of you have started wondering whether enumeration types are performant or not – let's check this.

Performance of Python enums

Not all Python enums will be performant. In fact, some may be quite slow. This will depend on various factors, but the main one is how they are implemented. In this article, I want to show you that enums can help achieve very performant and readable code – but you need to know how to do this (e.g., what sort of enumeration to implement, how to implement the enum-related code, and the like).

Thus, the readability of an enum class comes with its implementation – but performance doesn't. When performance matters, it's our task to optimize an enumeration class. This section shows an example of how to achieve this.

The problem

Let's start with the description of the problem. We want to implement a class that will orchestrate running a selected forecasting model for a dataframe (df: pd.DataFrame) and a forecasting horizon, which is the number of periods for which we want to create a forecast (horizon: int); in addition, the user needs to provide additional arguments to be passed to the model (params: dict[str, Any]).

Let's assume that running such a model means running a function that implements the following interface:

from typing import Any, Callable
import pandas as pd

ModelFunc = Callable[
    [pd.DataFrame, int, dict[str, Any]],
    pd.DataFrame
]

This interface means that the function:

  • takes the following arguments: a Pandas dataframe with the training, validation and test data (to be split by the function), an integer value indicating the forecasting horizon, and a dictionary with any number of additional parameters (of any type) to be used by the model;
  • returns a Pandas dataframe with the forecast.

We will use this interface, without going into the implementation details of the models. This means whatever function that implements this interface can be used here. We will use the following mocks of functions for five forecasting models:

# Moving average
def run_moving_average(
    df: pd.DataFrame,
    horizon: int,
    params: dict[str, Any]
) -> pd.DataFrame:
    return pd.DataFrame()

# Exponential smoothing
def run_exponential_smoothing(
    df: pd.DataFrame,
    horizon: int,
    params: dict[str, Any]
) -> pd.DataFrame:
    return pd.DataFrame()

# ARIMA
def run_ARIMA(
    df: pd.DataFrame,
    horizon: int,
    params: dict[str, Any]
) -> pd.DataFrame:
    return pd.DataFrame()

# TBATS
def run_TBATS(
    df: pd.DataFrame,
    horizon: int,
    params: dict[str, Any]
) -> pd.DataFrame:
    return pd.DataFrame()

# Prophet
def run_prophet(
    df: pd.DataFrame,
    horizon: int,
    params: dict[str, Any]
) -> pd.DataFrame:
    return pd.DataFrame()

Since the functions are practically empty (they simply return an empty Pandas dataframe), during any benchmarks they will introduce only minimal overhead. That way, benchmarking different implementations of the concept, we will mainly compare the overhead used by the implementation itself.

Implementation based on strings and typing.Literal

In our first implementation, we'll use a string representation of the models (so, for instance, "ARIMA" will represent the ARIMA model) along with the typing.Literal type, which creates a type that accepts a number of distinct values. You can read about it here:

Python Type Hinting with Literal

In our case, we will use it to create the ModelChoice literal type with the model names:

from typing import Any, Callable, Literal

ModelChoice = Literal[
    "moving_average",
    "exponential_smoothing",
    "ARIMA",
    "TBATS",
    "Prophet"
]

(We will use the other objects imported from typing later on.) The user will provide one of these values in order to let the app know which model it should use. Let's create the class responsible for this:

class ModelString:
    function_map: dict[ModelChoice, ModelFunc] = {
        "moving_average": run_moving_average,
        "exponential_smoothing": run_exponential_smoothing,
        "ARIMA": run_ARIMA,
        "TBATS": run_TBATS,
        "Prophet": run_prophet,
    }

    def __init__(
        self,
        model_name: ModelChoice
    ) -> None:
        if model_name not in self.function_map:
            raise NotImplementedError(
                f"Not implemented model: {model_name}"
            )
        self.model = self.function_map[model_name]
        self.model_name = model_name

    def run(
        self,
        df: pd.DataFrame,
        horizon: int,
        params: dict
    ) -> Any:
        return self.model(df, horizon, params)

    def __str__(self) -> str:
        return f"Instance of {self.model_name} model"

WARNING: Do note that I'm using class names with suffixes (here, "String" in ModelString). We need to do this here to differentiate the two types of models (based on strings and enumerations), but you wouldn't regularly do this in production code; typically, Model would be enough.

Let's analyze this code:

  • Instantiation: The class's instance is created using only one argument: model_name. The provided model_name value is validated to check whether it's one of the expected choices, retrieved from the function_map dictionary, which is an attribute of the ModelString class. We could use get_args(ModelChoice) instead, but it would be much slower. NotImplementedError is raised when model_name is not recognized. The function corresponding to the selected model is assigned to the .model attribute.
  • The .run() method: It calls the function kept under the .model attribute. Note that because the function is stored on the instance rather than defined in the class body, it is not bound to the instance, so this attribute behaves like a static method of the ModelString class (we'll return to this in a minute).
  • The .__str__() method: It's good to define it, as otherwise the class's instances would have a non-informative string representation. This method is the reason we created the .model_name attribute, so defining it adds a small overhead to instantiation.
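To illustrate the get_args() alternative mentioned in the first bullet, here is a minimal sketch. The is_valid_model() helper is a hypothetical name used for illustration, and the relative timings of the two approaches are not measured here:

```python
from typing import Literal, get_args

ModelChoice = Literal[
    "moving_average",
    "exponential_smoothing",
    "ARIMA",
    "TBATS",
    "Prophet",
]


def is_valid_model(model_name: str) -> bool:
    """Hypothetical helper: validate a name against the Literal's values."""
    # get_args() extracts the tuple of literal values on every call.
    return model_name in get_args(ModelChoice)


print(get_args(ModelChoice))
print(is_valid_model("Prophet"))  # True
print(is_valid_model("prophet"))  # False: the check is case-sensitive
```

The advantage of this variant is a single source of truth for the allowed names; the cost is the extra call during each validation.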

As discussed above, model behaves like a static method that calls the corresponding function. If you think about this, it turns out that we don't actually need the .run() method at all! Let's redefine the class in the following way:

class ModelStringSimple:
    function_map: dict[ModelChoice, ModelFunc] = {
        "moving_average": run_moving_average,
        "exponential_smoothing": run_exponential_smoothing,
        "ARIMA": run_ARIMA,
        "TBATS": run_TBATS,
        "Prophet": run_prophet,
    }

    def __init__(
        self,
        model_name: ModelChoice
    ) -> None:
        if model_name not in self.function_map:
            raise NotImplementedError(
                f"Not implemented model: {model_name}"
            )
        self.run: ModelFunc = self.function_map[model_name]
        self.model_name = model_name

    def __str__(self) -> str:
        return f"Instance of {self.model_name} model"

These two definitions aren't equivalent, though both classes will provide the very same result. The latter will be slightly faster, as it's free of the additional overhead of calling the function through the .run() method. Instead, it directly calls the corresponding function (self.function_map[model_name]), thanks to keeping it in the attribute of the same name, .run. We'll benchmark both approaches soon.

Note that we could use __slots__ for these classes, as they can be fixed in terms of their attributes: we don't need to enable the user to add new attributes to the classes' instances.
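A minimal sketch of what the __slots__ variant could look like. The class name ModelStringSlots is hypothetical, and run_prophet here is a simplified stand-in that returns a string rather than a dataframe, so that the example runs without Pandas:

```python
from typing import Any, Callable


def run_prophet(df: Any, horizon: int, params: dict[str, Any]) -> str:
    """Stand-in for the article's mock; returns a string, not a dataframe."""
    return "forecast"


class ModelStringSlots:
    # Instances may only have these two attributes; no per-instance __dict__.
    __slots__ = ("run", "model_name")

    function_map: dict[str, Callable[..., Any]] = {"Prophet": run_prophet}

    def __init__(self, model_name: str) -> None:
        if model_name not in self.function_map:
            raise NotImplementedError(f"Not implemented model: {model_name}")
        self.run = self.function_map[model_name]
        self.model_name = model_name


model = ModelStringSlots("Prophet")
print(model.run(None, 10, {}))  # forecast

try:
    model.extra = 1  # not listed in __slots__
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

I haven't benchmarked this variant here, so treat any expected gain in memory use or speed as something to measure in your own setting.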

Implementation based on enumerations

Let's define the corresponding class using enumerations. It could be something like this:

class ModelEnum(Enum):
    MOVING_AVERAGE = 0
    EXPONENTIAL_SMOOTHING = 1
    ARIMA = 2
    TBATS = 3
    PROPHET = 4

    def run(
        self,
        df: pd.DataFrame,
        horizon: int,
        params: dict[str, Any]
    ) -> pd.DataFrame:
        return [
            run_moving_average,
            run_exponential_smoothing,
            run_ARIMA,
            run_TBATS,
            run_prophet,
        ][self.value](df, horizon, params)

Let's analyze this class:

  • Instantiation: The members are created once, when the class is defined; writing model = ModelEnum.PROPHET simply retrieves the existing member. Unlike in a regular class, we can't simply keep a mapping such as function_map as a class attribute, because a plain value assigned in an Enum class body would itself become a member; hence we keep all the required information inside the .run() method.
  • The .run() method: It's a regular instance method that calls the corresponding function. I decided to use a list and get its element by an index that corresponds to the model's value from the class definition. That's why the values here must start from 0 and be incremented by 1. You could use a dictionary instead and use the name of the model (e.g., "PROPHET"), but it turned out to be slower. Note that unlike in the case of the string-based classes above, we need to get the corresponding function every time we call the method. This means additional overhead when we're calling the model more than once – in the string-based classes, the function is looked up once, at instantiation, so they don't have this overhead.
  • No implementation of the .__str__() method: We don't need to do this because enumerations have their default implementation of .__str__() and .__repr__(), and they're enough for our purposes. You'll get 'ModelEnum.PROPHET' or '<ModelEnum.PROPHET: 4>', respectively.

In the above definitions, we followed a similar convention to that we used for our string-based classes. But enumerations are more powerful than that. Let me show how I would define such a class in my daily work.

First, notice that we used a typical approach to enumerations, that is, we used integer values for each model. The truth is, we can use any value, not only integers. In our case, we can simplify the class substantially by doing this:

class ModelEnumSimple(Enum):
    MOVING_AVERAGE = run_moving_average
    EXPONENTIAL_SMOOTHING = run_exponential_smoothing
    ARIMA = run_ARIMA
    TBATS = run_TBATS
    PROPHET = run_prophet

Won't you agree this is the simplest among the four definitions? Everything is clear, and links between each model name and its corresponding function are clear as a bell.

The classes in action

We'll use some mocked values for the required arguments:


>>> df = pd.DataFrame()
>>> h = 10
>>> params = {}

Let's see whether the four classes work the same way:

>>> type(ModelString("Prophet").run(df, h, params))
pandas.core.frame.DataFrame
>>> type(ModelStringSimple("Prophet").run(df, h, params))
pandas.core.frame.DataFrame
>>> type(ModelEnum.PROPHET.run(df, h, params))
pandas.core.frame.DataFrame
>>> type(ModelEnumSimple.PROPHET(df, h, params))
pandas.core.frame.DataFrame

This is indeed what we expected. Note that the call to the ModelEnumSimple class is the simplest and the shortest (ignoring the class's names).

There's one more thing I like in the last class: its default repr representation. Let's see what representation each of the four classes has:

>>> ModelString("Prophet")
<__main__.ModelString at 0x1cf91584890>
>>> ModelStringSimple("Prophet")
<__main__.ModelStringSimple at 0x1cf90514950>
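>>> ModelEnum.PROPHET
<ModelEnum.PROPHET: 4>
>>> ModelEnumSimple.PROPHET
<function __main__.run_prophet(df: pandas.core.frame.DataFrame, horizon: int, params: dict[str, typing.Any]) -> pandas.core.frame.DataFrame>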

In order to show this, I didn't override the .__repr__() method for any of the classes, so these are the default ones. This shows that enum.Enum offers very useful default string representations of its instances, using .__str__() and .__repr__(). You can read more about these two methods in the following article:

Python OOP, and Why repr() and str() Matter

In my opinion, the ModelEnumSimple class is definitely the simplest and the most concise among the four we've implemented – I hope you'll agree. In the next section, we'll benchmark the four models.

Benchmarks

Here, we will benchmark the four implementations, in two contexts: instantiation and running a model. If you'd like to learn more about benchmarking execution time in Python, feel invited to read these two articles:

Benchmarking Python code with timeit

Benchmarking Python Functions the Easy Way: perftester

In this article, I'll use IPython's timeit magic function for benchmarking.

During benchmarking, I will load a module that contains the definitions of the four classes, along with these mock objects to be used as arguments:

df = pd.DataFrame()
h = 10
params = {}

These values don't matter at all, as the functions, as we know, do nothing – they take three arguments (we'll use the above values of df, h and params) and return an empty dataframe. That way, we will measure the overhead of calling a model via the four classes – which is something we indeed want to benchmark. In all benchmarks, we will use the Prophet model, although it makes no difference which model we use.

I will run the benchmarks on a Windows 10 Pro machine (4 physical and 8 logical cores, 32 GB of RAM), in the IPython console run in a PowerShell terminal in Visual Studio Code, in Python 3.11.6.
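If you don't use IPython, a comparable measurement can be sketched with the standard-library timeit module. The two classes below are simplified stand-ins for the ones defined in this article, and the numbers of repeats and loops are my assumption, far lower than in the article's benchmarks to keep the sketch quick:

```python
import timeit
from enum import Enum


class ModelEnum(Enum):
    ARIMA = 0
    PROPHET = 1


class ModelString:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name


# Assumption: small values chosen only so that the sketch finishes quickly.
REPEATS, NUMBER = 5, 100_000

# Names used inside the timed statements are passed explicitly.
namespace = {"ModelEnum": ModelEnum, "ModelString": ModelString}

enum_times = timeit.repeat(
    "ModelEnum.PROPHET", globals=namespace, repeat=REPEATS, number=NUMBER
)
string_times = timeit.repeat(
    'ModelString("Prophet")', globals=namespace, repeat=REPEATS, number=NUMBER
)

# Report the best run per statement, in nanoseconds per loop.
print(f"enum member access:  {min(enum_times) / NUMBER * 1e9:.1f} ns")
print(f"string-based class:  {min(string_times) / NUMBER * 1e9:.1f} ns")
```

The absolute numbers will differ between machines and Python versions, so only the relative comparison on a single machine is meaningful.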

Let's start with comparing the time required to create an instance of each class. We'll use relatively many runs, that is, 20 repeats of an experiment, each creating 20 million instances in a row (one after another, each subsequent one overwriting the previous one). Here are the results:

Benchmarks of instantiation of the four classes. Screenshot by author

Both enumeration classes are definitely faster – about three times – to instantiate. We can assume there are no differences between ModelEnum and ModelEnumSimple, even though there must be some very small overhead due to the former including the definition of the .run() method. The difference between ModelString and ModelStringSimple is also negligible.

Now, let's see how the four models compare in terms of the overhead time of calling the corresponding function (for the model of choice, Prophet in our experiment):

Benchmarks of calling models from the four classes. Screenshot by author

This time, we can see that the ModelStringSimple model is faster than ModelString, which is something we expected. In fact, this is why we defined the simpler model class.

Interestingly, the ModelEnum class is slower than these two classes when calling its .run() method¹. While the difference between it and ModelString is rather minor, ModelStringSimple is visibly faster. If not for the ModelEnumSimple class, we would have concluded that the enumeration class is actually slower.

But this is where ModelEnumSimple shows its real power. We decided this was the simplest and the most concise class, and now we see that it's also the fastest! Calling a model's function using this class is more than twice as fast as doing the same using ModelStringSimple and over 2.6 times as fast as doing the same with the remaining two classes.

So, clearly, among the four classes it is the one with the simplest and the most concise definition, ModelEnumSimple, which is also the fastest one. It's like it has no disadvantages when compared to the other classes. This very class would be my first choice in this exercise. Of course, we defined a simple class, and sometimes you need to implement more methods in such a class. You should design your tools based on your needs, so in such an instance, a custom class may turn out to be a better choice than an enumeration.

Of course, we're talking about data science and forecasting, and running forecasting models can take a significant amount of time. In such instances, the small differences we observed, measured in nanoseconds, are so small that they can usually be ignored. Sometimes, however, we may implement an enumeration class that's used so frequently that reducing its overhead can save us quite some time – so don't assume that these small differences are universally ignorable. They often can be ignored, but this doesn't mean they always can.

Conclusion

In the above sections, I singled out the ModelEnumSimple class as the one I'd choose, thanks to its advantages and practically no disadvantages (assuming that the class is used in the right context). So, let's summarize its advantages:

  1. Simplicity and readability. The ModelEnumSimple class maps names of the class members (that is, ModelEnumSimple.ARIMA or ModelEnumSimple.PROPHET), representing forecasting models, directly to their corresponding functions. This is a very readable approach, as it clearly shows which function corresponds to which model, without additional logic or structures, which the definitions of the other three classes had to use.
  2. Ease of use. Using an enumeration class (either ModelEnum or ModelEnumSimple) for function mapping prevents possible errors due to typos in string literals, which are more common in the string-based classes (ModelString and ModelStringSimple). Enum values can be easily checked by IDEs and/or at runtime, which reduces the risk of typing (or any other, for that matter) errors.
  3. Performance. In our benchmarks, ModelEnumSimple had the smallest call overhead of the four classes. The likely reason is that it maps members directly to the corresponding functions, so calling a model requires no additional dictionary or list lookup and no extra method call. Note that enum members, just like dictionary keys, are resolved at runtime (Python doesn't resolve them at compile time), so it's best to benchmark your own implementation rather than assume that an enumeration will always be faster than a dictionary.
  4. Error reduction. ModelEnumSimple is less error-prone. Enums enforce a limited set of values that the user can choose from, which reduces runtime errors due to invalid inputs. This contrasts with the string-based approach, where mistyping a string key might lead to a KeyError or the need for additional error handling.
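Points 2 and 4 can be illustrated with a short sketch. The functions, the MODELS dictionary and the parse_model() helper are hypothetical names introduced only for this illustration, and the class body is an assumption about how a direct member-to-function mapping can be written (enum.member requires Python 3.11+, so a functools.partial fallback is included).

```python
from enum import Enum

try:
    from enum import member  # available from Python 3.11
except ImportError:
    from functools import partial as member  # assumed fallback for older versions


def run_arima(data):
    """Hypothetical stand-in for an ARIMA forecast."""
    return "arima"


def run_prophet(data):
    """Hypothetical stand-in for a Prophet forecast."""
    return "prophet"


MODELS = {"arima": run_arima, "prophet": run_prophet}


class ModelEnumSimple(Enum):
    ARIMA = member(run_arima)
    PROPHET = member(run_prophet)


# A typo in a string key surfaces only when the lookup runs, as a KeyError.
try:
    MODELS["profet"]
except KeyError as err:
    string_error = type(err).__name__

# A typo in a member name raises AttributeError at runtime; IDEs and static
# checkers can usually flag it even earlier, because the name is an attribute.
try:
    ModelEnumSimple.PROFET
except AttributeError as err:
    enum_error = type(err).__name__

# The set of valid choices is closed and can be listed.
valid = [m.name for m in ModelEnumSimple]


def parse_model(name: str) -> ModelEnumSimple:
    """Convert user input to a member once, at the boundary of the program."""
    try:
        return ModelEnumSimple[name.upper()]
    except KeyError:
        raise ValueError(f"Unknown model {name!r}; choose from {valid}") from None
```

Once user input has been converted with a helper such as parse_model(), the rest of the code can pass members around instead of raw strings.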

Overall, the ModelEnumSimple class combines the benefits of clear code with performance efficiency and error reduction. These qualities make it a robust choice, especially in scenarios where performance is crucial and errors in specifying model types need to be minimized.

Note that when you need to optimize code in terms of execution time (and memory, for that matter), you should first profile the code before implementing code optimization. However, if you can use an approach that leads to the most readable code that, at the same time, has a smaller overhead time, why even think of using a different approach? It's like all advantages being collected in one place – and this place is a class based on enum.Enum.

Remember, however, that the implementation matters. I presented both the ModelEnum and ModelEnumSimple classes for a reason: I wanted to show you that it's not enough to just use an enumeration class. If you want the class to be readable and its overhead to be minimal, you need to pay attention to the implementation. And this is not only due to runtime overhead, but also – often mainly – due to readability. Compare these two enumeration classes: ModelEnumSimple is not only much shorter, but also much more readable. The link between each model and its function is clear – which we can't say about ModelEnum, which uses a list of these functions indexed by the models' values. This is a far less intuitive approach.
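The contrast between the two designs can be sketched as follows. This is a reconstruction under assumptions, not the exact code defined earlier: the functions are hypothetical stand-ins, MODEL_FUNCTIONS is an assumed name for the list of functions, and enum.member requires Python 3.11+ (a functools.partial fallback is included for older versions).

```python
from enum import Enum

try:
    from enum import member  # available from Python 3.11
except ImportError:
    from functools import partial as member  # assumed fallback for older versions


def run_arima(data):
    """Hypothetical stand-in for an ARIMA forecast."""
    return ("arima", len(data))


def run_prophet(data):
    """Hypothetical stand-in for a Prophet forecast."""
    return ("prophet", len(data))


# Indirect design: functions sit in a list, and the member's integer value
# is the index into that list. The order of the list must match the values.
MODEL_FUNCTIONS = [run_arima, run_prophet]


class ModelEnum(Enum):
    ARIMA = 0
    PROPHET = 1

    def run(self, data):
        return MODEL_FUNCTIONS[self.value](data)


# Direct design: each member's value is the function it stands for.
class ModelEnumSimple(Enum):
    ARIMA = member(run_arima)
    PROPHET = member(run_prophet)


series = [1.0, 2.0, 3.0]
indirect = ModelEnum.PROPHET.run(series)
direct = ModelEnumSimple.PROPHET.value(series)
```

In the indirect version, reordering the list or renumbering the members silently changes which function runs; the direct version has no such coupling.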


We've covered the basics of Python enumerations. There's more to this topic, and you can read about this in the official Python documentation:

enum – Support for enumerations

There you'll find additional interesting tools, such as the StrEnum class; the IntEnum class; enum flags (enum.Flag); the auto() function; [enum.property](https://docs.python.org/3/library/enum.html#enum.property), a dedicated property decorator for enumerations; and more. Since these are more advanced enumeration tools, we'll discuss them in a dedicated article.

I hope I convinced you to use Python enumerations – or at least to try them. Their definitions can be much simpler than any other approach, and this is what makes me like enumerations so much. Once you learn how to use them – the knowledge from this article should be enough – you'll see that enumerations are intuitive, simple and readable.

Footnotes

¹ There are quite a few enumeration topics we haven't covered here. The ModelEnum class could also be significantly improved by using the enum.IntEnum class, whose members behave like integers. With it, we wouldn't have to get a member's value via self.value, because self would suffice. We'll cover IntEnum in the subsequent article on Python enumerations, which is why we didn't discuss this approach here.
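A minimal sketch of the IntEnum variant mentioned in this footnote, assuming the functions are kept in a list indexed by the members' values. ModelIntEnum, MODEL_FUNCTIONS and the two functions are hypothetical names used only for illustration.

```python
from enum import IntEnum


def run_arima(data):
    """Hypothetical stand-in for an ARIMA forecast."""
    return "arima"


def run_prophet(data):
    """Hypothetical stand-in for a Prophet forecast."""
    return "prophet"


MODEL_FUNCTIONS = [run_arima, run_prophet]


class ModelIntEnum(IntEnum):
    ARIMA = 0
    PROPHET = 1

    def run(self, data):
        # IntEnum members are integers, so `self` can index the list
        # directly; no `self.value` is needed.
        return MODEL_FUNCTIONS[self](data)


result = ModelIntEnum.PROPHET.run([1.0, 2.0, 3.0])
```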

