NaN Values in the Python Standard Library

NaN
stands for Not-a-Number. Thus, a NaN
object represents what this very name conveys – something that isn't a number. It can be a missing value but also a non-numerical value in a numerical variable. As we shouldn't use a non-numerical value in purely numerical containers, we indicate such a value as not-a-number, NaN
. In other words, we can say NaN
represents a missing numerical value.
In this article, we will discuss NaN
objects available in the Python standard library.
NaN
values occur frequently in numerical data. If you're interested in details of this value, you will find them, for instance, here:
In this article, we will not discuss all the details of NaN
values.¹ Instead, we will discuss several examples of how to work with NaN
values in Python.
Each programming language has its own approach to NaN
values. In programming languages focused on computation, NaN
values are fundamental. For example, in R, you have NULL
(a counterpart of Python's None
), NA
(for not available), and NaN
(for not-a-number):

In Python, you have None
and a number of objects representing NaN
. It's worth knowing that Pandas differentiates between NaN
and NaT
, a value representing missing time. This article will discuss NaN
values in the standard library; NaN
(and NaT
, for that matter) in the mainstream numerical Python frameworks – such as NumPy and Pandas – will be covered in a future article.
If you haven't worked with numerical data in Python, you may not have encountered NaN
at all. However, NaN
values are ubiquitous in Python programming, so it's important to know how to work with them.
Introduction to NaN
in Python
When you work with a list
object, you can use both numerical and non-numerical values. So, it can be [1, 2, "three"]
or [1, 2, None]
and the like. You can even do this:
>>> [1, "two", list, list(), list.sort]
[1, 'two', <class 'list'>, [], <method 'sort' of 'list' objects>]
So, lists accept any objects. If you want to perform numerical calculations on such lists, you can, but you need to adapt the code:
>>> x = [1, 2, "three"]
>>> sum(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> sum(xi for xi in x if isinstance(xi, (float, int)))
3
As you can see, objects that are not numbers can stay whatever they are here, and you can still perform numerical computations – it's not simple or concise code, but it works.
NaN
– not a number – stands for a missing numerical value.
This is not the case, however, with container types that only accept objects of a particular numerical type. This can be array.array
, numpy.array
or a pandas
series (and equivalently, a numerical column of a pandas
dataframe). If defined as numerical, they don't accept non-numerical values, with one exception: a NaN
value.
NaN
means a non-numerical value, but as you will see soon, its type is numerical – float
, to be precise. Why? For the simple reason that thanks to that, you can use NaN
in numerical containers.
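As a minimal sketch of this point, the snippet below uses array.array from the standard library with the "d" (C double) type code. The variable names are only illustrative. It shows that NaN is accepted while None is rejected:

```python
from array import array
import math

# A typed array of C doubles ("d") accepts NaN, because NaN is a float.
values = array("d", [1.0, 2.0, float("nan")])
has_nan = any(math.isnan(v) for v in values)

# A non-numerical object such as None is rejected with a TypeError.
try:
    array("d", [1.0, 2.0, None])
    none_accepted = True
except TypeError:
    none_accepted = False

print(has_nan, none_accepted)
```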
Why not get rid of them?
Why not just get rid of all such values? Why bother in the first place?
One common use case for NaN
values is data analysis and visualization. For example, consider a dataset that includes several columns with missing values for some rows. You cannot remove a single cell from a data frame, so you can either keep all these rows and somehow handle the missing values, or remove all rows with one or more NaN
values. Removing rows with missing values is a common practice, but it has a cost: It removes non-NaN values for some of the columns, and it is rarely wise to discard information we already have.
Another use case for NaN
values is in error handling. For example, if a function expects a numerical input but receives a string or other non-numeric value, it may return a NaN
value to indicate that the input was invalid. We will see an example soon. This allows the calling code to handle the error gracefully, rather than raising an exception or returning an unexpected result. By using NaN
values to represent errors or missing data, it's possible to perform calculations and processing on datasets that may include invalid or missing values. You could return None
instead, but in Python None
can mean a variety of things while NaN
conveys a more specific piece of information, one directly related to the numerical character of values – that this is not a number.
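The error-handling use case could look roughly like the sketch below. The function to_float() is a hypothetical helper written for illustration, not something from the standard library. It returns NaN instead of raising when the input cannot be converted:

```python
import math


def to_float(value):
    """Convert value to float; return NaN if the conversion fails.

    Hypothetical helper, shown only to illustrate NaN as an error signal.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        # float("three") raises ValueError, float(None) raises TypeError.
        return float("nan")


readings = ["2.5", "three", None, 4]
converted = [to_float(r) for r in readings]
print(converted)
```

The calling code can then decide what to do with the NaN values, for instance by checking them with math.isnan().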
When you work with numerical values and tools, you should know how to use NaN
. However, when your application is of a general character and thus does not need a numerical framework (like NumPy or Pandas), quite often you will see that NaN
can be simply ignored, or represented by None
. If this makes code simpler without any sacrifice, consider doing this.
Examples
NaN values can mean a variety of things:
- A regular missing value – it was not provided, did not come through, things like that. In your notebook, you would indicate it as "NA" or "N/A": Not Applicable, as in, you can't apply it. It's missing, so we need to indicate it as missing. You can use NaN.
- A result from a function that received incorrect values for its numerical arguments. Instead of throwing an error, the function returns NaN.
- A mistake. This can be an input error; believe it or not, input errors are more frequent than most of us imagine, and they can affect subsequent analyses quite a lot. For instance, many people still think that spinach is a great source of iron. Well, it isn't, so why do so many think so? It came from an input error – a misplaced decimal point. You can use NaN to indicate data elements with mistakes – unless you're certain you can correct the mistake.
- A comment. It's a string value that has been mistakenly entered into a numerical variable. This can happen when a person entering data wants to explain why a particular value is missing, such as "Unclear reading" or "I overslept." Although these are still missing data, they provide more information than simply a blank value. Sometimes this information is important, but other times it is not. For numerical computation, however, the value of such a comment is usually minor or nonexistent. Therefore, if you need to use a numerical container for this variable, you can use NaN to represent the comment.
These are four examples, but other situations are also possible. Although each situation is slightly different, from a numerical computation point of view, they are all the same: the value is not a number. We need to do something with it, and using NaN
is a common option.
NaN in the standard library
Python offers several types of NaN
values, and we will discuss them below. In this article, we focus on the standard library, but be aware that if you use a numerical framework, it most likely has its own implementation (or rather representation) of NaN
and functions / methods that work with it.²
Although the Python standard library is not the most suitable tool for numerical computation, it does offer both numerical containers and dedicated tools. An example of a numerical container is the array
module with its array.array
container type. While it's not the best tool to work with directly, it enables you to work with Cython efficiently without using non-standard-library tools like NumPy. An example of a dedicated numerical tool from the standard library is the math
module:
You have two ways to use NaN
values in the Python standard library: float("nan")
and math.nan
. I have read many books on Python, but I don't recall seeing either of these values mentioned. My memory is not perfect, but I suspect that even if these values are mentioned in some books, they are not given much attention. As a result, I believe that many data scientists, and even Python developers outside of the Data Science realm, are unaware of float("nan")
and math.nan
, even though they may be familiar with np.nan
, which is the standard way to represent NaN values in NumPy arrays and pandas DataFrames (see below). A possible reason is that these two are not as widely used as np.nan
.
Both of these NaN
objects are values, both of the float
type:
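A quick check along these lines (the names a and b are only illustrative) confirms that both objects are floats:

```python
import math

a = float("nan")
b = math.nan

# Both NaN objects are instances of the built-in float type.
print(type(a), type(b))
print(isinstance(a, float), isinstance(b, float))
```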

By the way, at this point you shouldn't be surprised to learn that the type of np.nan
is also float
.
It's important to remember how two NaN
values compare:
>>> import math
>>> float("nan") is float("nan")
False
>>> float("nan") == float("nan")
False
>>> math.nan is math.nan  # one module-level object
True
>>> math.nan == math.nan
False
The == comparisons return False because we only know that NaN
is not a number, but we have no idea what sort of value it is. In one case, it can be a string; in another case, it can be a different string; still another, it can be a long dictionary; and yet in another, it can be a missing value, as NaN
is frequently used for NA
. So, we cannot assume that two NaN
values are equal to each other. This can make quite a difference when working with numerical vectors and matrices:
>>> [1, 2, 3] == [1, 2, 3]
True
>>> [1, 2, float("nan")] == [1, 2, float("nan")]
False

However, if we create a new NaN
object, we will see this:
>>> NaN = float("nan")
>>> NaN is NaN
True
>>> NaN == NaN
False
>>> NaNmath = math.nan
>>> NaNmath is NaNmath
True
>>> NaNmath == NaNmath
False
Have you noticed that even though the is
comparison returns True
, the ==
comparison returns False
? So, the object is itself, but it's not equal to itself…
Keep this behavior in mind when you define your own sentinel like NaN or NaNmath above. I know it's tempting, and I have done it myself more than once, but do so only if this behavior is what you want to achieve.
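One practical consequence of an object being identical but not equal to itself shows up in containers. Membership tests and list comparisons check identity first and equality second, so reusing one NaN object gives different results than creating a fresh one each time. A small sketch, with variable names chosen only for illustration:

```python
nan = float("nan")

# The same object is found by identity, even though nan == nan is False.
same_object_in_list = nan in [nan]

# A freshly created NaN is a different object and is not equal either.
new_object_in_list = float("nan") in [nan]

# List equality applies the same identity-then-equality rule per element.
lists_equal_same = [1, nan] == [1, nan]
lists_equal_new = [1, float("nan")] == [1, float("nan")]

print(same_object_in_list, new_object_in_list)
print(lists_equal_same, lists_equal_new)
```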

Let's return to this example:
>>> x = [1, 2, "three"]
>>> sum(xi for xi in x if isinstance(xi, (float, int)))
3
and let's see our NaN
values in action. Instead of adjusting the sum()
function, let us replace "three"
with a NaN
value. In order to do so, we can use the following function:³
from collections.abc import Sequence
from typing import Any


def use_nan(__x: Sequence[float | Any]) -> Sequence[float]:
    """Replace non-numerical values with float("nan").

    >>> NaN = float("nan")
    >>> use_nan([1, 2, 3])
    [1, 2, 3]
    >>> use_nan([1., 2., 3.])
    [1.0, 2.0, 3.0]
    >>> use_nan([1, 2., 3.])
    [1, 2.0, 3.0]
    >>> use_nan([1, 2, "str"])
    [1, 2, nan]
    >>> use_nan((1, 2, str))
    (1, 2, nan)
    >>> use_nan((1., 2, Any, str, (1, 2,)))
    (1.0, 2, nan, nan, nan)
    """
    # Rebuild the same container type (list or tuple), keeping numbers
    # and substituting NaN for everything else.
    return type(__x)([xi
                      if isinstance(xi, (float, int))
                      else float("nan")
                      for xi in __x])
Now, let's use the function right before using the sum()
function, which, as we saw above, doesn't accept non-numerical values:
>>> x = use_nan([1, 2, "three"])
>>> sum(x)
nan
>>> import math
>>> sum([1, 2, math.nan])
nan
Huh? What's happening? We used NaN
values in order to make sum()
work, and indeed it does not throw an error the way it did before. But it simply returns nan
…
From a mathematical point of view, it makes perfect sense: Adding a number to not a number will not give a number, will it? This is the reason we got nan
above. But is it what we want to achieve?
It depends. Usually, we have a choice of how we want to handle NaN
values. The most typical approach is to drop them. This is done by removing whole rows or columns from a dataframe, or individual values from a variable. Another approach, frequently used in statistics, is to fill in missing values with other values; this is called imputation.
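With only the standard library, the two approaches might be sketched as follows. The data and the choice of mean imputation are assumptions made for illustration; real analyses often call for more careful imputation methods:

```python
import math
from statistics import fmean

data = [1.0, 2.0, float("nan"), 3.0]

# Approach 1: drop the NaN values.
dropped = [v for v in data if not math.isnan(v)]

# Approach 2: impute, here by replacing each NaN with the mean
# of the values that are present.
mean_value = fmean(dropped)
imputed = [mean_value if math.isnan(v) else v for v in data]

print(dropped)
print(imputed)
```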
This article doesn't aim to go into detail about these methods. You can read about them in a number of statistics books, but also in various articles; the two below describe using such methods in Python:
3 Ultimate Ways to Deal With Missing Values in Python
How to Handle Missing Data in Python? [Explained in 5 Easy Steps]
As we saw above, methods from the standard library do work with NaN
values, but they will simply return nan
, which is both a repr
and a str
representation of float("nan")
. So, we need to remove the not-a-number values manually from the container. Unfortunately, given how comparisons of NaN
values work, the following will not work:
>>> x = use_nan([1, 2, "three"])
>>> sum(xi for xi in x if xi is not float("nan"))
nan
>>> sum(xi for xi in x if xi != float("nan"))
nan
So, nan
again. How come?
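We know already what's happening. Each call to float("nan") creates a new object, so the condition xi is not float("nan") is always True. And NaN is unequal to every value, including another NaN, so xi != float("nan") is always True as well. Neither condition filters anything out. Hence, we need a dedicated function to check for NaN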
values. The standard library offers such a function, in the math
module:
>>> sum(xi for xi in x if not math.isnan(xi))
3
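For comparison, here is a small sketch of two ways to filter out NaN values. The first uses math.isnan(), as above. The second relies on the fact that NaN is the only float value not equal to itself. The variable names are illustrative:

```python
import math

x = [1, 2, float("nan")]

# Explicit check with math.isnan().
total_isnan = sum(xi for xi in x if not math.isnan(xi))

# Self-comparison idiom: only NaN fails xi == xi.
total_self_eq = sum(xi for xi in x if xi == xi)

print(total_isnan, total_self_eq)
```

math.isnan() is usually the more readable choice, since the self-comparison can look like a mistake to someone unfamiliar with NaN semantics.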
Conclusion
We discussed using NaN
values in the Python standard library. This knowledge should be enough for you to work with NaN
values in the standard library tools. Numerical frameworks can implement their own NaN
values, however. An example is NumPy's np.nan
.
As already mentioned, I don't believe that as a Python programmer, you will frequently use these two NaN
sentinels from the standard library. You should, however, know how to work with them anyway, as there may be times when you need to use them, even if you are using a numerical framework like NumPy. Also, I don't think installing NumPy only to use np.nan
would be a wise thing to do. I hope this article will help you handle such situations.
Footnotes
¹ It's worth noting that, like None
, NaN
values are sentinel values:
² An example is np.nan
and NumPy functions that work with data with NaN
values, such as np.nansum()
, np.nanmean()
, np.nanmax()
, np.nanmin()
and np.nanstd()
.
³ The function's docstring contains a number of doctests; you can read about this fantastic documentation-testing tool, which can also be used for unit testing, in the following article: