Python Set Is Way Faster Than List, True Or False?

A few weeks ago, I wrote another article to explain the mechanisms and logic behind some popular "Python Tricks". One of them is to use Python Set but not List when we can.
After this article became popular, many readers asked me or argued that Python Set isn't always fast. That's absolutely true. Therefore, I decided to write this article to have a deep dive into the Data Structures of Python List and Set.
In this article, I'll first use practical code to compare the performance between Python List and Set in different scenarios. Then, I'll introduce the data structures they used, which are Dynamic Array and Hash Table. Based on the characteristics of these data structures, I'll explain why Python List or Set has better performance in certain scenarios.
1. Performance

Let's start with the experiments of performance. It is not right to say Python Set has a higher performance than Python List. We need to consider different scenarios, such as creation, look-up, appending and deletion.
It is much more convenient to use Jupyter Notebook to do this testing. So, we can use %timeit
magic command to easily evaluate the elapsed time.
Creation – List Win (2x faster)
To test the creation performance, we can simply use range(10000)
to generate 10,000 numbers. Please note that this is a generator, but we can create a list or a set from this generator.
# Create list for 100 times
%timeit -n 100 my_list = list(range(10000))
# Create set for 100 times
%timeit -n 100 my_set = set(range(10000))

It can be seen that creating a set takes approximately twice as long as creating a list. This is because the data structure of a set takes more time and space than a list. This will be discussed in the next section.
Look up – Set Win (1000x faster)
Before we can test the look up performance and the later test cases, we need to create a list and a set properly. Apart from that, we also need some numbers for testing purposes. For example, we need a number to be looked up in the list and set. Let's generate 1,000 of them, which should be enough.
import random
# Creat a list and a set
my_list = list(range(10000))
my_set = set(my_list)
# Random 1000 numbers for testing purpose
test_numbers = random.sample(range(10000), 1000)
Now, we can test the look-up performance using the following code in Jupyter Notebook.
%timeit -n 100 for num in test_numbers: num in my_list
%timeit -n 100 for num in test_numbers: num in my_set

Since 1ms = 1000μs, the look-up performance of Python Set is 1000x faster than Python List.
Appending – List Win (0.2x faster)
Now, let's test appending. We need another test number dataset because our set was created from numbers 1 to 10,000. Our current test numbers are randomly chosen from the same range for testing lookup performance. However, we need test numbers from outside this range. Otherwise, we make no effect on the set because we can't append duplicated items in a set.
# Random 1000 numbers from 10,000 to avoid duplicates in the set
test_numbers = random.sample(range(10000, 20000), 1000)
# Insertion into list
%timeit -n 100 for num in test_numbers: my_list_copy.append(num)
# Insertion into set
%timeit -n 100 for num in test_numbers: my_set_copy.add(num)

This time List won, but not much, only 0.2x faster. It is fair to say that the performances are about to be the same.
Deletion – Set Win (800x faster)
To test the deletion performance, we need our original test numbers back because we need to delete the numbers where they exist in the List and Set.
# Create the test numbers again
test_numbers = random.sample(range(10000), 1000)
# Deletion from list
%timeit -n 100 for num in test_numbers: my_list.remove(num) if num in my_list else None
# Deletion from set
%timeit -n 100 for num in test_numbers: my_set.discard(num)

This time Python Set won again, roughly 800x faster. This is because the deleting action is kind of similar to the look up. The number needs to be found in the list or set and then removed afterwards.
However, the performance of the set was not as fast as that of the look-up, which was 1000x faster. The reason is that removing an item from a list is a little bit easier than a set.
Now, we are clear about the performance comparison in different scenarios. Let's talk about why.

2. List v.s. Set – Data Structure Comparison

The data structures of Python List and Set are Dynamic Array and Hash Table. It is the differences between them that lead to the performance comparison results.
For the rest of the comparison, although they are between Dynamic Array and Hash Table, I'll still use Python List and Python Set to make it simple and contextual.
2.1 Structures
Python List makes use of continuous blocks of memory. The items in a Python List are stored adjacently. Therefore, the structure of it is very compact and effective.
It's worth mentioning that the index of a list doesn't need to be stored explicitly. Instead, when we use an index to access an item in a list, the index will be used as input to calculate the memory location. A simple example is that if we need to access the 5th item, the memory location will be _5 * item_length_inbytes.
Python Set will create a Hash table for all its items. There will be hashed keys for all the items, which will be calculated using a particular hash function.

Therefore, when we look up a particular item from a set, the hash function can help to convert the item into its key using the hash function and then use the key to find the item directly in O(1).
The Lookup and Deletion Performance
This is why the performance of looking up an item in a set is way faster than in a list. Similarly, to delete an item, the main action is to locate the item. So, the performance of deleting an item in a set is also much faster than a list.
The Creation Performance
Also because of the structure, building a hash table is more expensive than allocating a block of continuous memory. Therefore, the performance of creating a list is slightly faster.
2.2 Resizing
When a Python List grows, a new block of memory will need to be allocated, and some internal pointers need to be updated, but generally, it's not a very expensive action.
When a Python Set grows, all the existing items need to be re-hashed. That means the "dictionary" needs to be rebuilt.
The Appending Performance
This is why appending or inserting items into a list is slightly faster than a set. Because re-hasing usually takes more time, especially when there are a large amount of items.
Summary

In this article, we compare the performance of Python List and Set in different scenarios. We've seen that in some of the scenarios, Python List can be even faster than Python Set. However, it needs to be highlighted that when the performance of Python Set is better, it is usually thousands of times faster, while the other way around, Python List can only be slightly faster in the other scenarios.
I hope this article may satisfy some of your curiosity and facilitate your decision when coding!