Understanding Intersection Over Union for Object Detection (Code)

Author:Murphy | View: 28573 | Time: 2025-03-23 12:28:39

Evaluation of object detection models boils down to one thing: determining whether a detection is valid.

Determining whether a detection is valid requires understanding the Intersection over Union (IoU) metric.

This article covers the following:

Basics of IoU: what is IoU?
How to compute IoU (theoretically and in Python code) for a single pair of detection and ground truth bounding boxes
How to compute IoU for multiple sets of predicted and ground truth bounding boxes
How to interpret an IoU value

What is Intersection over Union (IoU)?

IoU is a core metric for the evaluation of object detection models. It measures the accuracy of the object detector by evaluating the degree of overlap between the detection box and the ground truth box.

A ground truth box or label is an annotated box showing where the object is (the annotation is often done by hand, and the ground truth box is considered the object's actual position).
The detection box or predicted bounding box is the prediction from the object detector.

Formally, IoU is the area of the intersection between the ground truth box (gt) and the predicted box (pd) divided by the area of their union.
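Written as a formula, with gt and pd denoting the ground truth and predicted boxes as above, the definition reads as follows. The second form uses the fact that the union area equals the sum of the two box areas minus the area they share.

```latex
\mathrm{IoU}(gt, pd)
  = \frac{\operatorname{area}(gt \cap pd)}{\operatorname{area}(gt \cup pd)}
  = \frac{\operatorname{area}(gt \cap pd)}
         {\operatorname{area}(gt) + \operatorname{area}(pd) - \operatorname{area}(gt \cap pd)}
```

The value ranges from 0 (no overlap) to 1 (the two boxes coincide exactly).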

Example 1: Computing IoU for a Single Pair of Detection and Ground Truth

Let's start with a simple example: computing IoU for one detection and one ground truth.

To do that, we will need the top-left (x1, y1) and bottom-right (x2, y2) coordinates of the two boxes.

In the Figure below (right), we have two bounding boxes:

Predicted bounding box (p-box): (px1, py1, px2, py2) = (859, 31, 1002, 176)
Ground truth bounding box (t-box): (tx1, ty1, tx2, ty2) = (860, 68, 976, 184)

Left: Image with 14 ground truths (blue boxes) and 12 predictions (red boxes). Right: a zoom-in to a single pair of ground truth and predicted box (Annotations done by author and the orchard image sourced from https://zenodo.org/record/3712808).

Important: In computer vision, the convention is that:

The x-axis is the horizontal dimension of an image, with increasing values from left to right and
The y-axis is the vertical dimension of an image with increasing values from top to bottom (this is not the case for a standard Cartesian system)
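The convention above matters as soon as boxes are used to index image arrays. The following is a minimal sketch, assuming the image is stored as a NumPy array of shape (height, width, channels); the 1024 x 1024 image size is a placeholder chosen for illustration, not a value taken from the dataset.

```python
import numpy as np

# Hypothetical 1024 x 1024 RGB image filled with zeros (rows = y, columns = x).
image = np.zeros((1024, 1024, 3), dtype=np.uint8)

# Predicted box from Example 1 in (x1, y1, x2, y2) format.
x1, y1, x2, y2 = 859, 31, 1002, 176

# Rows are indexed by y and columns by x, so y comes first when slicing.
crop = image[y1:y2, x1:x2]

box_width = x2 - x1    # horizontal extent
box_height = y2 - y1   # vertical extent; positive because y grows downward

print(crop.shape)      # (145, 143, 3): (height, width, channels)
```

Because y increases from top to bottom, y2 - y1 is positive for a box given as top-left and bottom-right corners, which is what the area calculations in the next steps rely on.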

Step 1: Calculating the area of the two boxes

This step calculates the areas of the predicted box and the ground truth box. Each area is simply the box width multiplied by the box height.

predicted_area = (1002 - 859) * (176 - 31) = 20735
ground_truth_area = (976 - 860) * (184 - 68) = 13456

Step 2: Find the intersection points

This step finds the top-left (A) and bottom-right (B) corners of the intersection rectangle.

They can be found by:
top_left = max(px1, tx1), max(py1, ty1)
bottom_right = min(px2, tx2), min(py2, ty2)

In our case,
top_left (A) = max(859, 860), max(31, 68) = (860, 68)
bottom_right (B) = min(1002, 976), min(176, 184) = (976, 176)

Showing the intersection region between two boxes (Image by Author).

Step 3: Compute Intersection Area

Since we have the intersection points, we can easily compute the area of the intersection rectangle as follows.

intersection_area = (976 - 860) * (176 - 68) = 12528

Step 4: Calculate the IoU value

IoU = intersection_area / union_area,

where union_area is the sum of the areas of the two boxes minus the intersection area. That is,

union_area = (area of ground truth + area of predicted box) - intersection_area = (20735 + 13456) - 12528 = 21663

Therefore, IoU = 12528 / 21663 ≈ 0.5783

Let's put that into Python code

The following code can be used to compute IoU for a single pair of ground truth and predicted box. After the code snippet, let's break down the ideas used.

import numpy as np

def compute_iou(box1, box2):
    """
    This function computes the intersection-over-union of two boxes.
    Both boxes are expected to be in (x1, y1, x2, y2) format,
    where (x1, y1) is the top-left corner and
    (x2, y2) is the bottom-right corner.

    Arguments:
        box1 (NumPy array of shape (4,)): The first box.
        box2 (NumPy array of shape (4,)): The second box.

    Returns:
        iou (float): The intersection-over-union value for the two boxes.
    """
    # Calculate the area of each box
    area1 = np.prod(box1[2:] - box1[:2])
    area2 = np.prod(box2[2:] - box2[:2])
    print("Area of box 1 and box2, respectively: ", area1, area2)

    # Calculate the intersection coordinates (top left and bottom right)
    top_left = np.maximum(box1[:2], box2[:2])
    bottom_right = np.minimum(box1[2:], box2[2:])
    print("Top left and bottom right of intersection rectangle: ", top_left, bottom_right)

    # Calculate the intersection area
    intersection = np.prod(np.clip(bottom_right - top_left, a_min=0, a_max=None))
    print("Intersection area: ", intersection)
    # Calculate the union area
    union = area1 + area2 - intersection
    print("Union area: ", union)

    # Calculate the IoU
    iou = intersection / union if union > 0 else 0.0

    return iou

# Calling compute_iou with overlapping boxes
detection = np.array([859, 31, 1002, 176])
label = np.array([860, 68, 976, 184])
iou_value = compute_iou(detection, label)
print("IoU:", iou_value)

Output:

Area of box 1 and box2, respectively:  20735 13456
Top left and bottom right of intersection rectangle:  [860  68] [976 176]
Intersection area:  12528
Union area:  21663
IoU: 0.5783132530120482

Let's call the compute_iou() function a second time, now with non-overlapping boxes.

# Calling compute_iou with non-intersecting boxes
detection = np.array([810, 744, 942, 865])
label = np.array([109,563,217,671])
iou_value = compute_iou(detection, label)
print("IoU:", iou_value)

Output:

Area of box 1 and box2, respectively:  15972 11664
Top left and bottom right of intersection rectangle:  [810 744] [217 671]
Intersection area:  0
Union area:  27636
IoU: 0.0

Breaking down the code:

NumPy vectorization lets us apply operations like np.prod(), np.maximum(), np.minimum(), np.clip(), addition and subtraction to whole arrays without looping over the elements or indexing a single element.
The np.clip() function limits or "clips" the values in an array to a specified range. In our case, np.clip(bottom_right - top_left, a_min=0, a_max=None) ensures that the width and height of the intersection are non-negative by setting negative values to 0.
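To see why the clipping matters, here is a small sketch that repeats the intersection step for the non-overlapping pair from the second call. The variable names raw_size, clipped_size, area_without_clip and area_with_clip are introduced here only for illustration.

```python
import numpy as np

# The non-overlapping pair used in the second call to compute_iou().
detection = np.array([810, 744, 942, 865])
label = np.array([109, 563, 217, 671])

top_left = np.maximum(detection[:2], label[:2])        # [810 744]
bottom_right = np.minimum(detection[2:], label[2:])    # [217 671]

raw_size = bottom_right - top_left                     # [-593  -73]
clipped_size = np.clip(raw_size, a_min=0, a_max=None)  # [0 0]

# Multiplying two negative extents yields a spurious positive "area".
area_without_clip = np.prod(raw_size)                  # 43289
area_with_clip = np.prod(clipped_size)                 # 0

print(area_without_clip, area_with_clip)
```

Without the clip, the two negative extents multiply to a positive number and the boxes would be reported as overlapping, so the clipping is required for correctness rather than being a cosmetic safeguard.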

Example 2: Computing IoU for Multiple Pairs of Ground Truth and Predicted Boxes

In this example, we want to calculate IoU values for all ground-truth and predicted box pairs in the Figure below (far left).

The image contains 12 predictions (red boxes) and 14 ground truths (blue boxes).

Left: An image with ground truths and detections plotted, Middle: Predictions and Right: Ground truths (Annotations done by author and the orchard image sourced from https://zenodo.org/record/3712808).

Computing IoUs for all pairs of detections and ground truths can be done easily by modifying the initial code, as shown below.

def compute_ious(boxes1, boxes2):
    """
    This function computes intersection-over-union of boxes.
    Both sets of boxes are expected to be in (x1, y1, x2, y2) format
    where (x1,y1) is the top-left coordinates and 
    (x2,y2) is the bottom right coordinates
    Arguments:
        boxes1: M by 4 NumPy array
        boxes2: N by 4 NumPy array
    Returns:
        iou MxN Numpy Matrix - containing the pairwise
            IoU values for every element in boxes1 and boxes2
    """
    # Compute the area of every box in boxes1 (M boxes) and boxes2 (N boxes)
    area1 = np.prod(boxes1[:, 2:] - boxes1[:, :2], axis=1)
    area2 = np.prod(boxes2[:, 2:] - boxes2[:, :2], axis=1)

    # Top left and bottom right of the intersection for all box pairs
    top_left = np.maximum(boxes1[:, None, :2], boxes2[:, :2])  # MxNx2 array
    bottom_right = np.minimum(boxes1[:, None, 2:], boxes2[:, 2:])  # MxNx2 array

    # Compute intersection for all box pairs
    intersection = np.prod(np.clip(bottom_right - top_left, a_min=0, a_max=None), 2)

    return intersection / (area1[:, None] + area2 - intersection)

# Define detections and ground_truths
detections = np.array([[374,627,538,792],
[330,308,501,471],
[474,14,638,181],
[810,744,942,865],
[58,844,204,993],
[905,280,1022,425],
[887,412,1018,543],
[0,871,68,1008],
[859,31,1002,176],
[698,949,808,1023],
[0,400,47,505],
[234,0,314,58]])

ground_truths = np.array([[331,303,497,469],
[385,624,543,782],
[809,743,941,875],
[883,410,1024,556],
[918,287,1024,425],
[860,68,976,184],
[109,563,217,671],
[0,401,60,515],
[51,833,207,989],
[0,867,80,1024],
[273,877,403,1007],
[701,939,821,1024],
[905,608,1021,724],
[471,17,629,175]])
# Call compute_ious() function 
ious = compute_ious(detections, ground_truths)
print(ious)

Output (formatted for better viewing):

Output of compute_ious() function (Image by Author)

The output shows that:

One detection (index 12, counting from 1) does not overlap any ground truth.
Three ground truths (indices 7, 11 and 13, counting from 1) do not overlap any detection.
Eleven detections have an IoU above 50% with a ground truth.
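Observations like these can also be read off the IoU matrix programmatically. The sketch below is one possible way to do it; summarize_ious is a hypothetical helper (not part of the original code), the 0.5 threshold and the 1-based indices are assumptions chosen to match the list above, and the small toy matrix stands in for the real output.

```python
import numpy as np

def summarize_ious(ious, threshold=0.5):
    """Summarize an M x N IoU matrix (rows: detections, columns: ground truths)."""
    best_per_detection = ious.max(axis=1)      # best match for each detection
    best_per_ground_truth = ious.max(axis=0)   # best match for each ground truth
    return {
        # 1-based indices of boxes whose IoU is zero with every box in the other set
        "detections_without_overlap": np.flatnonzero(best_per_detection == 0) + 1,
        "ground_truths_without_overlap": np.flatnonzero(best_per_ground_truth == 0) + 1,
        # number of detections whose best IoU exceeds the threshold
        "detections_above_threshold": int(np.sum(best_per_detection > threshold)),
    }

# Toy 3 x 3 matrix used only to illustrate the helper.
toy = np.array([[0.8, 0.0, 0.0],
                [0.0, 0.0, 0.3],
                [0.0, 0.0, 0.0]])
summary = summarize_ious(toy)
print(summary)
```

For the toy matrix, detection 3 and ground truth 2 have no overlap with anything, and only detection 1 exceeds the threshold. Passing the matrix returned by compute_ious(detections, ground_truths) would produce the same kind of summary for the orchard image.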

Let's also break down a piece of the code that may not be super clear:

In the code above, using None in an indexing operation is a NumPy technique for inserting a new axis of length 1 into the array. Here, boxes1[:, None, :2] has shape (M, 1, 2), which NumPy broadcasts against boxes2[:, :2] of shape (N, 2) so that the element-wise maximum and minimum are computed for every pair of boxes, giving an M x N x 2 result.
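The shapes involved can be checked directly. This is a minimal sketch that uses zero-filled placeholder arrays with the same sizes as the detections and ground truths in Example 2; the names expanded and corners2 are introduced only for illustration.

```python
import numpy as np

boxes1 = np.zeros((12, 4))  # placeholder for the 12 detections
boxes2 = np.zeros((14, 4))  # placeholder for the 14 ground truths

expanded = boxes1[:, None, :2]  # new axis inserted: shape (12, 1, 2)
corners2 = boxes2[:, :2]        # shape (14, 2)

# Broadcasting pairs every row of boxes1 with every row of boxes2.
top_left = np.maximum(expanded, corners2)

print(expanded.shape, corners2.shape, top_left.shape)
# (12, 1, 2) (14, 2) (12, 14, 2)
```

np.newaxis is an alias for None, so boxes1[:, np.newaxis, :2] is an equivalent and slightly more explicit spelling.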