NP-What? Complexity Types of Optimization Problems Explained

Complex building. Image created with Midjourney by the author.

How come the shortest path problem is easy to solve, while the traveling salesman problem is not? What are the mathematical ideas behind this? How can you tell whether a problem will take an unmanageable number of steps as its size increases? In this post you will learn the basics of this topic. And if you want to get serious about it, I've included a short note at the end on one of the millennium prize problems related to it.

Before we get started with NP-hardness, you should know the basics of time complexity. If you are already familiar with time complexity, Big O notation, and worst-case analysis, you can skip the following section.


Time Complexity

When we work with computers and write programs, we often deal with problems that can be solved in different ways. One important thing we need to consider is how efficient these solutions are. Time complexity helps us understand how fast an algorithm runs as the size of the problem it's solving gets bigger.

Big O notation is like labeling an algorithm with a simple sticker that tells us how long it takes to finish based on how many things we're dealing with. It describes how the number of steps of an algorithm grows relative to the input size of the problem.

Note: Time complexity really counts the number of steps an algorithm takes, not actual time, so the name is a bit misleading. If it measured actual time, you could improve an algorithm's complexity simply by running it on a faster computer.

Labeling boxes (algorithms) with a sticker: how fast are you? Image by author.

We usually focus on the worst-case scenario because we want to be sure that no matter what input we give the algorithm, it won't take longer than a certain amount of time. This helps us make sure our solution is reliable even when things get tough.

If you are searching for a specific page of a book and your algorithm goes from the beginning of the book to the end, the worst case is that it's the final page. Image by author.

Just like we choose the fastest route when driving somewhere, we want to pick the most efficient algorithm for our problem. We compare algorithms based on their time complexities. An algorithm that runs faster (has a lower time complexity) is like taking the quicker route to our destination. As mentioned before, a faster algorithm needs fewer steps to find the solution in the worst-case scenario.

Now, let's explore these ideas with some real-world examples to make them clearer.

Constant Time: O(1)

Imagine you're planning a trip to a nearby park. You live right next to it, so no matter how many friends you invite, the time it takes to walk there remains the same. Whether it's just you or a group of 10, the time required to get to the park is constant – it doesn't change based on the number of people.

Walking to a park takes about the same amount of time regardless of the number of people. Image by author.

A programming example of constant time is using a key to find the corresponding value in a dictionary.
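
In Python, a dictionary lookup looks like this (a minimal sketch; the dictionary contents are made up):

# Looking up a value by key takes (amortized) constant time,
# no matter how many entries the dictionary holds.
capitals = {'France': 'Paris', 'Japan': 'Tokyo', 'Kenya': 'Nairobi'}

print(capitals['Japan'])  # one step, whether there are 3 keys or 3 million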

Linear Time: O(n)

Now, think about planning a picnic and gathering all your friends. As you invite each friend, you need to call or message them individually. So, if you're inviting 10 friends, you make 10 calls; if you're inviting 50 friends, you make 50 calls. The time it takes to contact your friends grows linearly as the number of friends increases.

Inviting one or 10 friends makes a difference: it takes around 10 times as long to invite 10 friends. Image by author.

If you loop once over all items in a list, it takes linear time.
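
For example, summing a list touches every element once; doubling the list doubles the work (a minimal sketch):

# A single pass over the list: one step per element, so O(n) in total.
items = [3, 1, 4, 1, 5, 9]

total = 0
for item in items:
    total += item

print(total)  # 23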

Logarithmic Time: O(log n)

In the park you will play a game where you need to find a hidden treasure. The game gives you clues that narrow down the search area: each clue lets you eliminate half of the remaining park. As you collect more clues, the search area keeps shrinking. Another way to look at it: if the park is 100 m², it takes at most 7 steps to find the treasure, because 2⁷ = 128 ≥ 100. Adding another square meter to the park doesn't matter; it still takes 7 steps (101 ≤ 128). Only when the park grows beyond 128 m² does the worst case (the treasure hiding in the very last spot) need an extra step.

Reduce the search space by 50% every step. An extra square meter will not make much of a difference. Image by author.

This is similar to how binary search works, where the number of possible locations reduces by half with each clue, leading to a logarithmic time complexity.
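
A minimal binary search in Python could look like this (the sorted list is made up for illustration):

def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent.

    Each iteration halves the remaining search space,
    so the loop runs at most O(log n) times.
    """
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # 4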

Quadratic Time: O(n²)

You're trying to plan a road trip to visit various tourist spots in a city. To figure out the shortest routes between every pair of locations, you need to compare each location with every other location. So, if you have 5 locations, you make 5 × 4 = 20 comparisons; if you have 10 locations, you make 10 × 9 = 90 comparisons. The number of comparisons grows quadratically as the number of locations increases.

mylist = [1, 2, 3, 4, 5]

# The inner loop runs n times for each of the n outer iterations,
# so the print statement executes n * n times.
for n in mylist:
    for m in mylist:
        print(n * m)

A nested for loop is an example that takes quadratic time. For every element you loop over all elements again, so it takes n × n iterations. The code sample above will print 5 × 5 = 25 numbers.

Factorial Time: O(n!)

Last but not least: factorial time. Imagine you're organizing a grand travel adventure with your friends. You want to plan the best route that visits all the countries on your wish list. However, finding the optimal route involves considering every possible permutation of the countries. As you add more countries to your list, the number of possible routes grows factorially. For example, if you have 3 countries, there are 3! = 6 possible routes (ABC, ACB, BAC, BCA, CAB, CBA). If you have 4 countries, there are 4! = 24 routes, and so on.

In this scenario, as you increase the number of countries on your list, the time it takes to consider all possible routes increases dramatically due to the factorial growth. This mirrors the factorial time complexity, where the time required grows incredibly fast as the input size increases. Just as planning an itinerary becomes overwhelming as you add more countries, factorial time complexity becomes infeasible for large problem sizes due to its rapid growth.
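
You can watch this blow-up happen by enumerating the permutations directly (a quick sketch; the country names are placeholders):

import itertools
import math

countries = ['A', 'B', 'C', 'D']

routes = list(itertools.permutations(countries))
print(len(routes))                     # 24 possible routes
print(math.factorial(len(countries)))  # 4! = 24, the same number

# Adding one more country multiplies the number of routes by 5: 5! = 120.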

We can visualize time complexities in a graph. The green ones are fast, while the orange and red ones are hard to handle when n increases. You should try to avoid those if possible.

Number of operations related to input size for different Big O time complexities. Image by author.

Let's discover how this relates to NP-hardness.


NP-Hardness

Constant, linear, and quadratic time are all examples of polynomial time, which is of the form O(nˣ) for some constant x. If we compare polynomial time with exponential and factorial time, you can see there's a big difference for large values of n:

Look at the pink numbers: they show how the exponential and factorial time complexity functions explode as the size of the data increases. Image by author.

As you can see in the table, there is a huge difference between polynomial time on the one hand and exponential and factorial time on the other. Polynomial time stays reasonably fast for large values of n (in the table it doesn't exceed 1 second for n < 100), while exponential and factorial time quickly become unmanageable. (Although for really large values of n, polynomial time can still take quite a while.)
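
A few lines of Python reproduce the flavor of this comparison (a quick sketch):

import math

# Number of steps for each complexity class as n grows.
for n in [10, 20, 30]:
    print(f"n={n}: n^2={n**2}, 2^n={2**n}, n!={math.factorial(n)}")

# For n=30: n^2 is 900, 2^n is about 10^9, and n! is about 2.7 * 10^32.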

The concept of NP-hardness helps classify problems based on their computational complexity. There are four categories of problems, and the easiest distinction to start with is the one between P and NP problems.

P Problems

P problems, where the P stands for polynomial, are solvable in polynomial time. In other words, their solutions can be found relatively quickly, and the time it takes to solve them grows at most as a polynomial function of the problem size.

An example of a mathematical optimization problem in P is the shortest path problem: how do you get from point A to point B while minimizing the distance? Dijkstra's algorithm solves this problem in polynomial time (and an implementation with a priority queue makes it even faster).

What's the shortest path from A to B? Image by author.
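
Here is a minimal sketch of Dijkstra's algorithm with a priority queue (the example graph is made up; edge weights represent distances):

import heapq

def dijkstra(graph, source):
    """Shortest distance from source to every node.

    graph maps each node to a list of (neighbor, edge_weight) pairs.
    With a min-heap, the running time is O((V + E) log V): polynomial.
    """
    distances = {node: float('inf') for node in graph}
    distances[source] = 0
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances[node]:
            continue  # stale entry; a shorter path was already found
        for neighbor, weight in graph[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return distances

graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': [],
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}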

NP Problems

On the other hand, NP problems (or nondeterministic polynomial problems) encompass a broader range of challenges. These are problems for which a proposed solution can be verified in polynomial time. However, finding a solution in the first place might require exponential or even factorial time, which makes the hardest of them much more difficult to solve than problems in P. In other words: if you are dealing with a large instance of such a problem, trying to brute-force it is a bad idea.

An example of an NP problem is the Hamiltonian path problem: given a graph, is there a path that visits each vertex exactly once? (If the path must also return to its starting vertex, it is called a Hamiltonian cycle.) If someone claims they found a Hamiltonian path, you can verify the claim by checking that it indeed visits every vertex exactly once.

It's easy to verify that the yellow path is Hamiltonian. Image by author.
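
The verification step really is cheap; here is a minimal sketch (the graph and the claimed paths are made up):

def is_hamiltonian_path(graph, path):
    """Check a claimed Hamiltonian path in polynomial time.

    graph maps each node to the set of its neighbors. The path must
    visit every vertex exactly once, moving only along edges.
    """
    if len(path) != len(graph) or set(path) != set(graph):
        return False
    return all(b in graph[a] for a, b in zip(path, path[1:]))

graph = {
    'A': {'B'},
    'B': {'A', 'C'},
    'C': {'B', 'D'},
    'D': {'C'},
}
print(is_hamiltonian_path(graph, ['A', 'B', 'C', 'D']))  # True
print(is_hamiltonian_path(graph, ['A', 'C', 'B', 'D']))  # False: A-C is not an edge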

Besides P and NP, there are two more categories: NP-complete problems (which are in NP) and NP-hard problems (which need not be in NP).

NP-complete Problems

Among NP problems, NP-complete problems stand out as the most challenging. An NP-complete problem is one that is in NP and exhibits a special property: if you could find a polynomial-time algorithm to solve it, you could solve all problems in NP in polynomial time. In essence, NP-complete problems are the "hardest" problems in NP because they are at least as hard as any other problem in NP.

One of the most famous NP-complete problems is the traveling salesman problem (TSP), in which you need to find the shortest route that visits all given locations exactly once. The number of possible routes grows as n!, where n is the number of locations to visit. In previous posts, I coded solutions for TSP using mixed integer programming (second example) and simulated annealing (first example). Closely related is the Chinese postman problem, for which you need to visit every edge of a graph at least once (although, unlike TSP, the basic undirected version of that problem is actually solvable in polynomial time).

TSP: visit every node exactly once. Chinese postman problem: visit every edge at least once. Image by author.
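
To feel the factorial growth of TSP, here is a brute-force sketch that tries every possible round trip (the distance matrix is made up). Each extra city multiplies the number of routes to check:

import itertools

def brute_force_tsp(distances):
    """Try all round trips starting and ending at city 0; keep the shortest.

    distances[i][j] is the cost of traveling from city i to city j.
    With n cities there are (n - 1)! routes: fine for 5 cities,
    hopeless for 50.
    """
    other_cities = range(1, len(distances))
    best_route, best_cost = None, float('inf')
    for perm in itertools.permutations(other_cities):
        route = (0,) + perm + (0,)
        cost = sum(distances[a][b] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_route, best_cost = route, cost
    return best_route, best_cost

distances = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(brute_force_tsp(distances))  # ((0, 1, 3, 2, 0), 18)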

NP-hard Problems

NP-hard problems, while related, are distinct from NP-complete problems. An NP-hard problem is one that is at least as hard as the hardest problems in NP, regardless of whether it is in NP itself. In other words, NP-hard problems don't necessarily have efficient verification processes like those in NP. Instead, they serve as a benchmark for problems that are extremely difficult to solve on a computational level.

The halting problem asks whether, given a program and an input, the program will halt (stop executing) or run forever on that input. It is undecidable, which means no algorithm can solve it for all cases. The halting problem is NP-hard but not in NP, because its solutions can't be efficiently verified. The next code snippet shows two programs whose behavior is easy to determine; for arbitrary programs, it can be impossible.

# Example 1: this program will keep running forever.
while True:
    continue

# Example 2: this program halts right after printing.
print('Halt')

In summary, the landscape of computational complexity encompasses problems that range from easily solvable to incredibly challenging. While P problems are efficiently solvable, NP problems introduce a layer of complexity, with NP-complete problems representing the apex of computational difficulty within the NP class. Additionally, NP-hard problems provide insights into the boundaries of computational feasibility, even if they are not directly in the NP class.

If we assume that P ≠ NP (more on this in the last section), this is what the Euler diagram looks like for the set of problems in P, NP, NP-complete and NP-hard. Image by author.

Unfortunately, NP-hard problems are everywhere in real life. For example, optimizing routes for delivery trucks, scheduling tasks efficiently, designing electronic circuits, and even folding proteins are all instances of NP-hard problems. Their intractability makes finding optimal solutions a huge challenge for large inputs. This difficulty often leads to the development of approximation algorithms, heuristics, and specialized techniques that find solutions which might not be optimal but are acceptable within certain bounds.

Want to write history? Continue reading to find out how! Photo by Natalia Y. on Unsplash

Millennium Prize Problem: P versus NP

NP-hardness is related to one of the unsolved millennium prize problems. The problem itself is easy to state: prove that P = NP, or prove that P ≠ NP, and you have solved it! What does this mean? As you know by now, P problems take far fewer steps to solve than NP-complete problems for large n. But it has never been proven that P is truly different from NP, which means it's not certain that no polynomial time algorithm exists for NP-complete problems. How can this question be settled? There are two possible outcomes:

  1. If you can find a polynomial time algorithm that solves an NP-complete problem (e.g. the traveling salesman problem), you have proven that P = NP.
  2. If you can prove that there exists no polynomial time algorithm to solve a specific NP-complete problem, you have proven that P ≠ NP. Any single NP-complete problem will do.

Proving the first point would shake up the world, because internet security is built upon the assumption of NP-hardness. If a polynomial time algorithm could be found to crack codes, it would be disastrous. Many scientists believe the second outcome is true, and that there exists no polynomial time algorithm that solves NP-complete problems. But this has never been proven. If you want to write history, this is your chance!


Conclusion

Delving into the intricacies of computational complexity shows the challenges of problem solving in computer science. Time complexity allows us to assess algorithm efficiency as problem sizes grow. Through this lens, we've explored constant, linear, logarithmic, quadratic, and factorial complexities.

Shifting to NP-hardness, we navigated a landscape of problems with varying degrees of complexity. P problems are solvable in polynomial time, while NP problems offer quick verification but may demand exponential or factorial time to actually find a solution. NP-complete problems are the "hardest" problems in NP: a polynomial-time algorithm for any one of them would solve all problems in NP efficiently. NP-hard problems, not confined to NP, represent the outer edge of computational difficulty.

Lastly, the enigma of P versus NP – an elusive millennium prize problem – holds the potential to reshape our comprehension of computational complexity. With far-reaching implications, the quest to prove P = NP or P ≠ NP beckons pioneers to shape the course of history.

Related

Why Every Data Scientist Should Learn Mathematical Optimization

Five ways to combine Mathematical Optimization and Machine Learning

Exact Algorithm or Heuristic?
