NaN Values in the Python Standard Library

Author:Murphy  |  View: 26302  |  Time: 2025-03-23 12:28:46

PYTHON PROGRAMMING

Photo by cyrus gomez on Unsplash

NaN stands for Not-a-Number. Thus, a NaN object represents what this very name conveys – something that isn't a number. It can be a missing value but also a non-numerical value in a numerical variable. As we shouldn't use a non-numerical value in purely numerical containers, we indicate such a value as not-a-number, NaN. In other words, we can say NaN represents a missing numerical value.

In this article, we will discuss NaN objects available in the Python standard library.


NaN values occur frequently in numerical data. If you're interested in details of this value, you will find them, for instance, here:

NaN – Wikipedia

In this article, we will not discuss all the details of NaN values.¹ Instead, we will discuss several examples of how to work with NaN values in Python.

Each programming language has its own approach to NaN values. In programming languages focused on computation, NaN values are fundamental. For example, in R, you have NULL (a counterpart of Python's None), NA (for not available), and NaN (for not-a-number):

Screenshot from an R session. Image by author.

In Python, you have None and a number of objects representing NaN. It's worth knowing that pandas distinguishes between NaN and NaT, a value representing a missing time (Not a Time). This article discusses NaN values in the standard library; NaN (and NaT, for that matter) in the mainstream numerical Python frameworks, such as NumPy and pandas, will be covered in a future article.

If you haven't worked with numerical data in Python, you may not have encountered NaN at all. However, NaN values are ubiquitous in numerical Python programming, so it's important to know how to work with them.

Introduction to NaN in Python

When you work with a list object, you can use both numerical and non-numerical values. So, it can be [1, 2, "three"] or [1, 2, None] and the like. You can even do this:

>>> [1, "two", list, list(), list.sort]
[1, 'two', <class 'list'>, [], <method 'sort' of 'list' objects>]

So, lists accept any objects. If you want to perform numerical calculations on such lists, you can, but you need to adapt the code:

>>> x = [1, 2, "three"]
>>> sum(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> sum(xi for xi in x if isinstance(xi, (float, int)))
3

As you can see, the non-numerical objects can stay in the list as they are, and you can still perform numerical computations. The code is neither simple nor concise, but it works.


This is not the case, however, with container types that only accept objects of a particular numerical type, such as array.array, a NumPy array (numpy.ndarray) or a pandas Series (and, equivalently, a numerical column of a pandas DataFrame). If defined as floating-point, such containers don't accept non-numerical values, with one exception: a NaN value.

NaN means a non-numerical value, but, as you will see soon, its type is numerical: float, to be precise. Why? Simply because that is what lets you store NaN in floating-point containers.
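A minimal sketch with the standard library's array module illustrates this. The typecodes are my choice for illustration: "d" (C double) for a floating-point array and "i" (C int) for an integer one. The float array stores float("nan") happily because NaN is a float, yet it rejects a string; the integer array rejects NaN as well.

```python
from array import array

# A floating-point array ("d" = C double) accepts NaN, because NaN is a float.
floats = array("d", [1.0, 2.0, float("nan")])
print(floats)  # array('d', [1.0, 2.0, nan])

# ...but it rejects a non-numerical value such as a string.
try:
    floats.append("three")
except TypeError as exc:
    print("rejected by the float array:", exc)

# An integer array ("i" = C int) rejects NaN too: NaN is a float, not an int.
try:
    array("i", [1, 2, float("nan")])
except TypeError as exc:
    print("rejected by the int array:", exc)
```

The exact wording of the error messages differs between Python versions; the exception type is TypeError in both cases.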

Why not get rid of them?

Why not just get rid of all such values? Why bother in the first place?

One common use case for NaN values is data analysis and visualization. For example, consider a dataset that includes several columns with missing values in some rows. You cannot remove a single cell from a data frame, so you can either keep all these rows and handle the missing values somehow, or remove all rows with one or more NaN values. Removing rows with missing values is a common practice, but it has a cost: it removes non-NaN values in the other columns, and it is rarely wise to discard information we already have.

Another use case for NaN values is in error handling. For example, if a function expects a numerical input but receives a string or other non-numeric value, it may return a NaN value to indicate that the input was invalid. We will see an example soon. This allows the calling code to handle the error gracefully, rather than raising an exception or returning an unexpected result. By using NaN values to represent errors or missing data, it's possible to perform calculations and processing on datasets that may include invalid or missing values. You could return None instead, but in Python None can mean a variety of things while NaN conveys a more specific piece of information, one directly related to the numerical character of values – that this is not a number.
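Here is a sketch of that error-handling pattern. The helper to_number() is hypothetical, not a standard-library function: it tries float() and returns float("nan") whenever the conversion fails, and math.isnan() then tells the caller which entries were invalid.

```python
import math


def to_number(value: object) -> float:
    """Hypothetical helper: convert value with float(), returning NaN on failure.

    NaN tells the caller "this is not a number" without raising an exception.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        # float(None) raises TypeError; float("Unclear reading") raises ValueError.
        return float("nan")


readings = ["12.5", 7, "Unclear reading", None, "3"]
numbers = [to_number(r) for r in readings]
print(numbers)                           # [12.5, 7.0, nan, nan, 3.0]
print([math.isnan(n) for n in numbers])  # [False, False, True, True, False]
```

Returning None would also work here, but the result would no longer be a list of floats, and NaN states more precisely what went wrong.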

When you work with numerical values and tools, you should know how to use NaN. However, when your application is of a general character and thus does not need a numerical framework (like NumPy or pandas), you will often find that NaN can simply be ignored or represented by None. If this makes your code simpler without any sacrifice, consider doing so.

Examples

NaN values can mean a variety of things:

  • A regular missing value: it was not provided, did not come through, and so on. In your notebook, you would mark it as "NA" or "N/A" (commonly read as not applicable or not available). The value is missing, so we need to indicate it as missing, and NaN is one way to do that.
  • A result from a function that received invalid numerical arguments. Instead of raising an error, the function returns NaN.
  • A mistake. This can be an input error; believe it or not, input errors are more frequent than most of us imagine, and they can affect subsequent analyses quite a lot. For instance, many people still think that spinach is a great source of iron. Well, it isn't, so why do so many people think so? The popular explanation is an input error: a misplaced decimal point. You can use NaN to indicate data elements with mistakes, unless you're certain you can correct the mistake.
  • A comment. It's a string value that has been mistakenly entered into a numerical variable. This can happen when a person entering data wants to explain why a particular value is missing, such as "Unclear reading" or "I overslept." Although these are still missing data, they provide more information than simply a blank value. Sometimes this information is important, but other times it is not. For numerical computation, however, the value of such a comment is usually minor or nonexistent. Therefore, if you need to use a numerical container for this variable, you can use NaN to represent the comment.

These are four examples, but other situations are also possible. Although each situation is slightly different, from a numerical computation point of view, they are all the same: the value is not a number. We need to do something with it, and using NaN is a common option.
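The second case in the list above is easy to observe with plain float arithmetic, which follows IEEE 754 rules for infinities and returns NaN instead of raising. Python is not consistent about this, though: dividing by zero and calling math.sqrt() on a negative number raise exceptions rather than returning NaN. A small sketch (the printed exception messages vary between Python versions):

```python
import math

inf = float("inf")

# IEEE 754 arithmetic on infinities produces NaN rather than raising.
undefined = [inf - inf, inf * 0, inf / inf]
print(undefined)  # [nan, nan, nan]

# Other invalid operations raise an exception instead of returning NaN.
checks = [
    ("0.0 / 0.0", lambda: 0.0 / 0.0),
    ("math.sqrt(-1)", lambda: math.sqrt(-1)),
]
for label, func in checks:
    try:
        func()
    except (ZeroDivisionError, ValueError) as exc:
        print(f"{label} raised {type(exc).__name__}: {exc}")
```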

NaN in the standard library

Python offers several types of NaN values, and we will discuss them below. In this article, we focus on the standard library, but be aware that if you use a numerical framework, it most likely has its own implementation (or rather representation) of NaN and functions / methods that work with it.²

Although the Python standard library is not the most suitable tool for numerical computation, it does offer both numerical containers and dedicated tools. An example of a numerical container is the array module with its array.array container type. While it's not the best tool to work with directly, it enables you to work with Cython efficiently without using non-standard-library tools like NumPy. An example of a dedicated numerical tool from the standard library is the math module:

math – Mathematical functions – Python 3.10.8 documentation

You have two ways to use NaN values in the Python standard library: float("nan") and math.nan. I have read many books on Python, but I don't recall seeing either of these values mentioned. My memory is not perfect, but I suspect that even if these values are mentioned in some books, they are not given much attention. As a result, I believe that many data scientists, and even Python developers outside of the Data Science realm, are unaware of float("nan") and math.nan, even though they may be familiar with np.nan, which is the standard way to represent NaN values in NumPy arrays and pandas DataFrames (see below). A possible reason is that these two are not as widely used as np.nan.

Both of these NaN objects are values of the float type:

The type of both float("nan") and math.nan is float. Image by author.
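The check from the screenshot can be reproduced in a few lines. This sketch also shows that float() parses the string "nan" regardless of letter case and that math.isnan() recognizes both NaN objects; the variable names are just for illustration.

```python
import math

nan_from_float = float("nan")
nan_from_math = math.nan

print(type(nan_from_float), type(nan_from_math))  # <class 'float'> <class 'float'>
print(repr(nan_from_float), str(nan_from_math))   # nan nan

# float() accepts "nan" in any letter case.
print(float("NaN"), float("NAN"))                 # nan nan

# math.isnan() is the reliable test for both objects.
print(math.isnan(nan_from_float), math.isnan(nan_from_math))  # True True
```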

By the way, at this point you shouldn't be surprised to learn that the type of np.nan is also float.

It's important to remember how two NaN values compare:

>>> import math
>>> float("nan") is float("nan")
False
>>> float("nan") == float("nan")
False
>>> math.nan is math.nan  # the same module attribute, hence the same object
True
>>> math.nan == math.nan
False

This is because we only know that NaN is not a number, but we have no idea what sort of value it is. In one case, it can be a string; in another case, it can be a different string; in still another, it can be a long dictionary; and in yet another, it can be a missing value, as NaN is frequently used for NA. So, we cannot assume that two NaN values are equal to each other. The IEEE 754 floating-point standard encodes exactly this rule: NaN compares unequal to every value, itself included. This can make quite a difference when working with numerical vectors and matrices:

>>> [1, 2, 3] == [1, 2, 3]
True
>>> [1, 2, float("nan")] == [1, 2, float("nan")]
False

However, if we assign a NaN object to a name and compare that name with itself, we will see this:

>>> NaN = float("nan")
>>> NaN is NaN
True
>>> NaN == NaN
False
>>> import math
>>> NaNmath = math.nan
>>> NaNmath is NaNmath
True
>>> NaNmath == NaNmath
False

Have you noticed that even though the is comparison returns True, the == comparison returns False? So, the object is itself, but it's not equal to itself…

Keep this behavior in mind when you use a newly defined sentinel like NaN or NaNmath above. I know it's tempting to define one, and I have done so myself more than once. Still, do so only if this behavior is what you want to achieve.
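There is one more twist worth knowing. Built-in containers such as lists assume that an object is equal to itself, so list comparison and the in operator check identity before calling ==. As a result, a NaN stored under a name (and math.nan, which is the same object on every access) matches itself inside a list, while two separately created float("nan") objects do not. A minimal sketch:

```python
import math

nan = float("nan")

# Built-in containers treat identical objects as equal without calling ==,
# so the same NaN object "matches" itself inside a list.
print([1, 2, nan] == [1, 2, nan])                    # True
print(nan in [1, 2, nan])                            # True
print([1, 2, math.nan] == [1, 2, math.nan])          # True: math.nan is one object

# Two separately created NaN objects are neither identical nor equal.
print([1, 2, float("nan")] == [1, 2, float("nan")])  # False
print(float("nan") in [1, 2, float("nan")])          # False

# Outside containers, == still returns False, even for the same object.
print(nan == nan)                                    # False
```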

Let's return to this example:

>>> x = [1, 2, "three"]
>>> sum(xi for xi in x if isinstance(xi, (float, int)))
3

and let's see our NaN values in action. Instead of filtering the values inside the sum() call, let's replace "three" with a NaN value. To do so, we can use the following function:³

from collections.abc import Sequence
from typing import Any

def use_nan(__x: Sequence[float | Any]) -> Sequence[float]:
    """Replace non-numerical values with float("nan").

    >>> NaN = float("nan")
    >>> use_nan([1, 2, 3])
    [1, 2, 3]
    >>> use_nan([1., 2., 3.])
    [1.0, 2.0, 3.0]
    >>> use_nan([1, 2., 3.])
    [1, 2.0, 3.0]
    >>> use_nan([1, 2, "str"])
    [1, 2, nan]
    >>> use_nan((1, 2, str))
    (1, 2, nan)
    >>> use_nan((1., 2, Any, str, (1, 2,)))
    (1.0, 2, nan, nan, nan)
    """
    return type(__x)([xi
                      if isinstance(xi, (float, int))
                      else float("nan")
                      for xi in __x])

Now, let's use the function right before using the sum() function, which, as we saw above, doesn't accept non-numerical values:

>>> x = use_nan([1, 2, "three"])
>>> sum(x)
nan
>>> import math
>>> sum([1, 2, math.nan])
nan

Huh? What's happening? We used NaN values to make sum() work, and indeed it no longer throws an error the way it did before. But it simply returns nan.

From a mathematical point of view, it makes perfect sense: Adding a number to not a number will not give a number, will it? This is the reason we got nan above. But is it what we want to achieve?

It depends. Usually, we can choose how to handle NaN values. The most typical approach is to drop them, by removing whole rows or columns from a dataframe or cells from a variable. Another approach, frequently used in statistics, is to fill in missing values with other values; this is called imputation.
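Whatever you choose, make the choice explicitly, because not every built-in propagates NaN the way sum() does. max() and min() compare values with > and <, which are always False when NaN is involved, so their result depends on where the NaN sits. The sketch below shows that pitfall, followed by the two approaches just mentioned: dropping NaN values, and a simple mean imputation with statistics.mean() (mean imputation is only one illustrative strategy, not a recommendation).

```python
import math
import statistics

# max() compares with >, which is always False when NaN is involved,
# so the result depends on where the NaN sits.
print(max([1.0, float("nan"), 3.0]))  # 3.0
print(max([float("nan"), 1.0, 3.0]))  # nan

values = [1.0, 2.0, float("nan"), 4.0]

# Option 1: drop NaN values before computing.
observed = [v for v in values if not math.isnan(v)]
print(sum(observed), max(observed))  # 7.0 4.0

# Option 2: impute; here each NaN is replaced by the mean of the observed values.
fill = statistics.mean(observed)
imputed = [fill if math.isnan(v) else v for v in values]
print(imputed)
```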

This article doesn't aim to go into detail about these methods. You can read about them in a number of statistics books, but also in various articles; the two below describe using such methods in Python:

3 Ultimate Ways to Deal With Missing Values in Python

How to Handle Missing Data in Python? [Explained in 5 Easy Steps]

As we saw above, standard-library functions such as sum() do work with NaN values, but they simply return nan, which is both the repr and the str representation of float("nan"). So, we need to remove the not-a-number values from the container manually. Unfortunately, given how comparisons of NaN values work, the following will not work:

>>> x = use_nan([1, 2, "three"])
>>> sum(xi for xi in x if xi is not float("nan"))
nan
>>> sum(xi for xi in x if xi != float("nan"))
nan

So, nan again. How come?

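We know already what's happening. Every call to float("nan") creates a new object, so xi is not float("nan") is True for every element, the NaN included. And since NaN is never equal to another NaN, xi != float("nan") is True for every element as well. Neither filter removes anything. Hence, we need a dedicated function to check for NaN values. The standard library offers such a function, in the math module: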

>>> sum(xi for xi in x if not math.isnan(xi))
3
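math.isnan() has one limitation to keep in mind: it accepts only numbers, so calling it on a string such as "three" raises TypeError. On mixed lists, a small guard helps; is_nan() below is a hypothetical helper, not part of the standard library. The sketch also shows a common idiom that relies on NaN being the only float not equal to itself: the filter yi == yi keeps every number except NaN.

```python
import math


def is_nan(value: object) -> bool:
    """Hypothetical helper: True for a float NaN, False for any other object.

    Checking the type first avoids the TypeError that math.isnan() raises
    for non-numbers such as strings.
    """
    return isinstance(value, float) and math.isnan(value)


try:
    math.isnan("three")
except TypeError as exc:
    print("math.isnan() rejects strings:", exc)

x = [1, 2, float("nan"), "three", None]
print([is_nan(xi) for xi in x])  # [False, False, True, False, False]

# NaN is the only float that is not equal to itself, so yi == yi drops it.
y = [1, 2, float("nan")]
print(sum(yi for yi in y if yi == yi))  # 3
```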

Conclusion

We discussed using NaN values in the Python standard library. This knowledge should be enough for you to work with NaN values in the standard library tools. Numerical frameworks can implement their own NaN values, however. An example is NumPy's np.nan.

As already mentioned, I don't believe that as a Python programmer, you will frequently use these two NaN sentinels from the standard library. You should, however, know how to work with them anyway, as there may be times when you need to use them, even if you are using a numerical framework like NumPy. Also, I don't think installing NumPy only to use np.nan would be a wise thing to do. I hope this article will help you handle such situations.

Footnotes

¹ It's worth noting that, like None, NaN values are sentinel values:

Sentinel value – Wikipedia

² An example is np.nan and NumPy functions that work with data containing NaN values, such as np.nansum(), np.nanmean(), np.nanmax(), np.nanmin() and np.nanstd().

³ The function's docstring contains a number of doctests; you can read about this fantastic documentation-testing tool, which can also be used for unit testing, in the following article:

Python Documentation Testing with doctest: The Easy Way


Thanks for reading. If you enjoyed this article, you may also enjoy other articles I wrote; you will see them here. And if you want to join Medium, please use my referral link below:

Join Medium with my referral link – Marcin Kozak

Tags: Data Science Hands On Tutorials Nan Python Python Programming
