Data Analysis with Named Lambda Functions

Author: Murphy  |  View: 28091  |  Time: 2025-03-22 22:28:27

PYTHON PROGRAMMING

Lambda functions can be very useful in data science – and not only anonymous ones. Photo by Daniel Monteiro on Unsplash

Technically, you should not use named lambda functions, since this is like naming a function that is anonymous in nature:

Should You Define Named Python Functions Using lambda?

In actual code, especially in production, I never do things like that – and you shouldn't, either. Lambdas are reserved for specific situations – and these specific situations do not include naming anonymous functions.

This is what I wrote in the above-cited article:

And I do hope I have convinced you. Even if both types of function definition seem equally fine to you – even then I would not use named lambda definitions. This is because using them, you do not gain anything, at the same time risking that others will disagree with you. And if even that does not convince you, remember that doing so you're going against PEP8.
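One practical reason behind that PEP 8 recommendation is that a lambda assigned to a name still reports itself as `<lambda>`, which makes tracebacks and string representations less informative. A minimal sketch of the difference; the two function names are hypothetical, chosen only for illustration:

```python
# Two hypothetical functions doing the same thing, defined in two ways.
double_lambda = lambda x: 2 * x

def double_def(x):
    return 2 * x

# Both return the same result...
print(double_lambda(21), double_def(21))  # 42 42

# ...but only the def-based function knows its own name.
print(double_lambda.__name__)  # <lambda>
print(double_def.__name__)     # double_def
```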

This is all true. But…

Is data analysis code an exception?

It can be. What I mean is, analysis code that no one will ever see is an exception, as you really don't need to follow the rules of clean code there. The point is to make the code work and then forget about it.

Let's be clear about one thing. I don't mean analysis code that you save to be reused, whether in a Python script or a notebook. I mean code you use in a Python session – and then you simply close the session and forget the code. The code's gone and no one will see it ever again.

If this is the case, who cares? Why shouldn't I define a named lambda function, if it's faster to write and more readable than a regular def-based definition? I see no obstacle to doing so, and so, to be honest, I often define named lambda functions.

Until now, no one knew. Now you know.

Why?

Couldn't I just use a regular function definition? Why do I do that?

Yes, I could. But I do this for a couple of reasons:

  • Quite often, a lambda function is faster to type than the corresponding def-based function. It's several characters shorter, which can make a difference when the definition fits on one line.
  • I often find lambda one-liners more readable than the corresponding def-based function definitions. It's a subjective thing, maybe related to the huge number of lambda functions that I've implemented.
  • I love Python. I love its syntax. I love its syntactic sugar. And I love Python one-liners! Implementing an interesting lambda one-liner often gives me fun and pleasure, and I like the joy of programming in Python.
  • I like the lambda expression. I understand such lambda functions. Sometimes they even seem simpler to me than the corresponding traditional def definition.

This can look like this:
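A minimal sketch of such a definition, matching the lambda one-liner shown in the Conclusion. The small dataframe and its column names are made up here, for illustration only:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the function.
df = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "year": [2023, 2023, 2024, 2024],
})

# Named lambda: number of unique values in column `col` of dataframe `d`.
n_of_unique = lambda d, col: len(d[col].unique())

print(n_of_unique(df, "city"))  # 3
print(n_of_unique(df, "year"))  # 2
```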

Surely, I could define n_of_unique in the traditional way:

def n_of_unique(d, col):
    return len(d[col].unique())

I could even add type hints:

import pandas as pd  # needed for the pd.DataFrame annotation

def n_of_unique(d: pd.DataFrame, col: str) -> int:
    return len(d[col].unique())

In an interactive Python session, all three versions work the very same way. When I'm analyzing data interactively in a Python (or IPython) session, I often don't bother with def: I go for a lambda function. Since I assign it to a name (here, n_of_unique), it's not really an anonymous function anymore.

If I work with a df dataframe throughout the session, and I don't want to provide its name as an argument every time, sometimes I do this:

n_of_unique = lambda col: len(df[col].unique())

Note that this function uses the dataframe currently stored under the df name. If this dataframe changes, the function will work with the changed dataframe:
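A short sketch of this behavior, again with made-up data (the dataframe contents are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"]})

# The lambda looks up `df` each time it is called, not when it is defined.
n_of_unique = lambda col: len(df[col].unique())
before = n_of_unique("city")
print(before)  # 2

# Rebind `df` to a different dataframe; the function now uses the new one.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Lima", "Kyiv"]})
after = n_of_unique("city")
print(after)  # 4
```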

This works because this version of n_of_unique() does not store the dataframe; it looks up the name df in its outer scope each time it is called.

Of course, you can do the same using a regular def definition. A lambda definition does not enable you to do more; it can just be shorter.

Conclusion

I'm not going to claim that one should use named lambda functions in interactive Python sessions – rather that there's nothing truly wrong with it. It's just one-time analysis code, and such function definitions are quite handy; why not use them, then?

So, when analyzing data, don't be afraid to use named lambda functions, provided that you find such definitions as clear as I do. For me, they are both clear and useful, as they require fewer characters typed in one line. Sure, you can write a one-line def definition, but it's less readable in my eyes:

# lambda one-liner
n_of_unique = lambda d, col: len(d[col].unique())

# def one-liner
def n_of_unique(d, col): return len(d[col].unique())

If you don't like lambda function definitions or consider them unreadable, don't use them. Most of all, don't use them when you're going to share the code – even with your future self.

We're discussing a particular use case of lambda functions. Even if you don't want to use them that way, do remember that lambda definitions are very useful in quite a few other scenarios. One example is non-default sorting of data containers, like here:

>>> x = [('a', 1), ('b', 3), ('c', 2)]
>>> sorted(x)
[('a', 1), ('b', 3), ('c', 2)]
>>> sorted(x, key=lambda i: i[1])
[('a', 1), ('c', 2), ('b', 3)]

As you see, we defined how to sort the x list of tuples: by the second element. Without key, sorted() would compare the tuples element by element, starting with the first. You will find many more situations throughout Python in which it's good to know how to use lambda. Surely, even here you can use a def-defined function:

>>> def take_2nd_el(t): return t[1]
>>> sorted(x, key=take_2nd_el)
[('a', 1), ('c', 2), ('b', 3)]

or you can use an item getter:

>>> from operator import itemgetter
>>> sorted(x, key=itemgetter(1))
[('a', 1), ('c', 2), ('b', 3)]

but I still prefer the concise lambda definition above.
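A key function works the same way in other built-ins that accept one, such as max() and min(), and sorted() also takes a reverse flag. A brief sketch, reusing the x list from the example above:

```python
x = [('a', 1), ('b', 3), ('c', 2)]

# Tuple with the largest / smallest second element.
largest = max(x, key=lambda i: i[1])
smallest = min(x, key=lambda i: i[1])
print(largest)   # ('b', 3)
print(smallest)  # ('a', 1)

# Sort by the second element, in descending order.
descending = sorted(x, key=lambda i: i[1], reverse=True)
print(descending)  # [('b', 3), ('c', 2), ('a', 1)]
```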


Don't underestimate Python's anonymous functions, created using lambda expressions. It's quite likely that the more you use them, the more readable they will become to you, and the more you will want to use them.

If you find yourself in this situation and there's no one around to look over your shoulder at what you're doing, try using named lambda functions in an interactive Python session. Who knows, maybe you'll find them as useful as I do?

And if you do, you don't have to boast about it to everyone you know. Don't be like me, letting everyone know that I use named lambda functions in Python, even though I myself wrote an article claiming that you shouldn't do this… Enjoy using them, but do so in the comfort of your own computer.

Tags: Anonymous Function, Data Analysis, Data Science, Lambda Expressions, Python
