Geometry Matching#
Geometry matching is the process of matching inferences to ground truths for computer vision workflows with a localization component, such as 2D and 3D object detection and instance segmentation. It is a building block for metrics like TP / FP / FN counts and any metrics derived from these, such as precision and recall.
While it may sound simple, geometry matching is surprisingly challenging and full of edge cases! In this guide, we'll focus on 2D object detection—specifically 2D bounding box matching—to learn about geometry matching algorithms.
API Reference: `match_inferences`, `match_inferences_multiclass` ↗
Algorithm Overview#
In a geometry matching algorithm, the following criteria must be met for a valid match:
- The IoU between the inference and ground truth must be greater than or equal to a threshold
- For multiclass workflows, the inference label must match the ground truth label
Pseudocode: Geometry Matching
1. Loop through all images in your dataset;
2. Loop through all labels;
3. Get inferences and ground truths with the current label;
4. Sort inferences by descending confidence score;
5. Check against all ground truths and find a ground truth that results in maximum IoU;
6. Check for the following criteria for a valid match:
    - This ground truth is not matched yet AND
    - The IoU is greater than or equal to the IoU threshold;
7. Repeat 5-6 on the next inference;
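To make the steps above concrete, here is a minimal Python sketch of the greedy matcher for a single class on a single image. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples; the names `iou` and `match_single_class` are illustrative helpers, not the `match_inferences` API itself:

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box1: Box, box2: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    iy = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    intersection = ix * iy
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0


def match_single_class(
    ground_truths: List[Box],
    inferences: List[Tuple[Box, float]],  # (box, confidence score)
    iou_threshold: float = 0.5,
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Greedy matching for one class on one image (steps 4-7 above)."""
    matched: List[Tuple[int, int]] = []  # (ground truth index, inference index)
    matched_gts: Set[int] = set()
    # Step 4: sort inferences by descending confidence score.
    for inf_idx in sorted(range(len(inferences)), key=lambda i: -inferences[i][1]):
        box, _score = inferences[inf_idx]
        # Step 5: find the ground truth with maximum IoU.
        best_gt, best_iou = None, 0.0
        for gt_idx, gt_box in enumerate(ground_truths):
            candidate = iou(box, gt_box)
            if candidate > best_iou:
                best_gt, best_iou = gt_idx, candidate
        # Step 6: a valid match needs an unmatched ground truth and IoU >= threshold.
        if best_gt is not None and best_gt not in matched_gts and best_iou >= iou_threshold:
            matched.append((best_gt, inf_idx))
            matched_gts.add(best_gt)
    unmatched_gts = [i for i in range(len(ground_truths)) if i not in matched_gts]
    matched_infs = {inf_idx for _, inf_idx in matched}
    unmatched_infs = [i for i in range(len(inferences)) if i not in matched_infs]
    return matched, unmatched_gts, unmatched_infs
```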
Examples: Matching 2D Bounding Boxes#
Let's apply the algorithm above to the following examples of 2D object detection. Bounding boxes (see: `BoundingBox`) in the diagrams below are colored according to their type and the matching result:
This example contains two ground truth and two inference bounding boxes, each with the same label. The pair \((\text{A}, \text{a})\) has high overlap (IoU of 0.9) and the pair \((\text{B}, \text{b})\) has low overlap (IoU of 0.13). Let's find out what the matched results look like in this example with an IoU threshold of 0.5:
Bounding Box | Score | IoU(\(\text{A}\)) | IoU(\(\text{B}\)) |
---|---|---|---|
\(\text{a}\) | 0.98 | 0.9 | 0.0 |
\(\text{b}\) | 0.6 | 0.0 | 0.13 |
Because inference \(\text{a}\) has a higher confidence score than inference \(\text{b}\), it gets matched first. Ground truth \(\text{A}\) clearly scores the highest IoU with inference \(\text{a}\), and that IoU is greater than the IoU threshold, so \(\text{a}\) and \(\text{A}\) are matched.
Next, inference \(\text{b}\) gets compared against all ground truth bounding boxes. Once again, ground truth \(\text{B}\) clearly scores the maximum IoU with inference \(\text{b}\), but this time the IoU is less than the IoU threshold, so \(\text{b}\) becomes an unmatched inference.
Now that we have checked all inferences, any ground truth bounding boxes that are not matched yet are marked as unmatched. In this case, ground truth \(\text{B}\) is the only unmatched ground truth.
Bounding Box(es) | Match Type |
---|---|
\((\text{A}, \text{a})\) | Matched Pair |
\(\text{B}\) | Unmatched Ground Truth |
\(\text{b}\) | Unmatched Inference |
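Using the `match_single_class` sketch from above, this example can be reproduced end to end. The box coordinates here are invented to yield the IoUs in the table:

```python
# Boxes constructed so that IoU(A, a) = 0.9 and IoU(B, b) ≈ 0.13, as in the table.
ground_truths = [(0, 0, 10, 10), (20, 0, 30, 10)]                 # A, B
inferences = [((0, 0, 10, 9), 0.98), ((27.7, 0, 37.7, 10), 0.6)]  # a, b

matched, unmatched_gts, unmatched_infs = match_single_class(
    ground_truths, inferences, iou_threshold=0.5
)
print(matched)         # [(0, 0)]: (A, a) is a matched pair
print(unmatched_gts)   # [1]: B is an unmatched ground truth
print(unmatched_infs)  # [1]: b is an unmatched inference
```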
Let's take a look at another example with multiple classes, `Apple` and `Banana`:
Bounding Box | Class | Score | IoU(\(\text{A}\)) |
---|---|---|---|
\(\text{A}\) | `Apple` | — | — |
\(\text{a}\) | `Apple` | 0.3 | 0.0 |
\(\text{b}\) | `Banana` | 0.5 | 0.8 |
Each class is evaluated independently. Starting with `Apple`, there is one ground truth \(\text{A}\) and one inference \(\text{a}\), but these two do not overlap at all (IoU of 0.0). Because the IoU is less than the IoU threshold, there is no match for class `Apple`.
For class `Banana`, there is only one inference and no ground truths. Therefore, there is also no match for class `Banana`.
Bounding Box(es) | Match Type |
---|---|
\(\text{A}\) | Unmatched Ground Truth |
\(\text{a}\) | Unmatched Inference |
\(\text{b}\) | Unmatched Inference |
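Since each class is evaluated independently, multiclass matching can be sketched as a thin per-label wrapper around `match_single_class` from earlier. This is purely illustrative, not the `match_inferences_multiclass` API:

```python
from typing import Dict, List, Tuple


def match_multiclass(
    ground_truths: List[Tuple[Box, str]],      # (box, label)
    inferences: List[Tuple[Box, str, float]],  # (box, label, score)
    iou_threshold: float = 0.5,
) -> Dict[str, tuple]:
    """Run the single-class matcher once per label; boxes never match across classes."""
    results = {}
    labels = {label for _, label in ground_truths} | {label for _, label, _ in inferences}
    for label in sorted(labels):
        gts = [box for box, gt_label in ground_truths if gt_label == label]
        infs = [(box, score) for box, inf_label, score in inferences if inf_label == label]
        results[label] = match_single_class(gts, infs, iou_threshold)
    return results
```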
Here is another example with multiple inferences overlapping with the same ground truth:
Bounding Box | Score | IoU(\(\text{A}\)) |
---|---|---|
\(\text{a}\) | 0.5 | 0.8 |
\(\text{b}\) | 0.8 | 0.5 |
Among the two inferences \(\text{a}\) and \(\text{b}\), \(\text{b}\) has a higher confidence score, so \(\text{b}\) gets matched first. The IoU between ground truth \(\text{A}\) and \(\text{b}\) is greater than the IoU threshold, so they become a match.
Inference \(\text{a}\) is then compared with ground truth \(\text{A}\), but even though the IoU is greater than the IoU threshold, they cannot become a match because \(\text{A}\) is already matched with \(\text{b}\); inference \(\text{a}\) remains unmatched.
Bounding Box(es) | Match Type |
---|---|
\((\text{A}, \text{b})\) | Matched Pair |
\(\text{a}\) | Unmatched Inference |
Finally, let's consider another scenario where there are multiple ground truths overlapping with the same inference:
Bounding Box | Score | IoU(\(\text{A}\)) | IoU(\(\text{B}\)) |
---|---|---|---|
\(\text{a}\) | 0.8 | 0.6 | 0.9 |
Inference \(\text{a}\) has a higher IoU with ground truth \(\text{B}\) than with \(\text{A}\), so \(\text{a}\) and \(\text{B}\) are matched.
Bounding Box(es) | Match Type |
---|---|
\((\text{B}, \text{a})\) | Matched Pair |
\(\text{A}\) | Unmatched Ground Truth |
Comparison of Matching Algorithms from Popular Benchmarks#
Geometry matching is a fundamental part of evaluation for workflows with localization. Metrics such as precision, recall, and average precision are built on top of these matches. The matching algorithm we've covered above is standard across various popular object detection benchmarks.
In this section, we'll examine how the matching algorithms of a few popular benchmarks differ:
PASCAL VOC 2012#
The PASCAL VOC 2012 benchmark includes a `difficult` boolean annotation for each ground truth, used to differentiate objects that are difficult to recognize from an image. Any ground truth with the `difficult` flag, as well as any inference matched with a `difficult` ground truth, is ignored in the matching process. In other words, these ground truths and the inferences matched with them are excluded from the matched results. Hence, models are neither penalized for failing to detect these `difficult` objects nor rewarded for detecting them.
Another noteworthy difference is how PASCAL VOC defines the IoU criterion for a valid match: according to the evaluation section (4.4) of the development kit documentation, the IoU must strictly exceed the IoU threshold, rather than merely meet it, for a valid match.
Pseudocode: PASCAL VOC Matching
1. Loop through all images in your dataset;
2. Loop through all labels;
3. Get inferences and ground truths with the evaluating label;
4. Sort inferences by descending confidence score;
5. Check against all ground truths and find a ground truth that results in maximum IoU;
6. Check for the following criteria for a valid match:
    - This ground truth is not matched yet AND
    - The IoU is greater than the IoU threshold;
7. If matched with a `difficult` ground truth, ignore;
8. Repeat 5-7 on the next inference;
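Here is a sketch of this variant, reusing the `iou` helper from earlier. The ignore flag and threshold comparison are parameterized so the same function can serve the COCO section below; it follows the pseudocode above rather than the official VOC development kit code:

```python
def match_with_ignored(
    ground_truths,    # list of (box, ignore_flag) pairs, e.g. VOC's `difficult`
    inferences,       # list of (box, score) pairs
    iou_threshold=0.5,
    inclusive=False,  # PASCAL VOC: IoU must be strictly greater than the threshold
):
    """Greedy matching where inferences matched to flagged ground truths are ignored."""
    matched, ignored, matched_gts = [], [], set()
    for inf_idx in sorted(range(len(inferences)), key=lambda i: -inferences[i][1]):
        box, _score = inferences[inf_idx]
        best_gt, best_iou = None, 0.0
        for gt_idx, (gt_box, _flag) in enumerate(ground_truths):
            candidate = iou(box, gt_box)
            if candidate > best_iou:
                best_gt, best_iou = gt_idx, candidate
        valid = best_iou >= iou_threshold if inclusive else best_iou > iou_threshold
        if best_gt is not None and best_gt not in matched_gts and valid:
            matched_gts.add(best_gt)
            if ground_truths[best_gt][1]:
                ignored.append(inf_idx)  # matched a flagged ground truth: ignore
            else:
                matched.append((best_gt, inf_idx))
    # Flagged ground truths are excluded from the results either way, so models are
    # neither penalized for missing them nor rewarded for finding them.
    unmatched_gts = [
        i for i, (_, flag) in enumerate(ground_truths) if i not in matched_gts and not flag
    ]
    seen = {inf_idx for _, inf_idx in matched} | set(ignored)
    unmatched_infs = [i for i in range(len(inferences)) if i not in seen]
    return matched, ignored, unmatched_gts, unmatched_infs
```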
COCO#
COCO (Common Objects in Context) labels its ground truth annotations with an `iscrowd` field to specify when a ground truth includes multiple objects. Similarly to how `difficult` ground truths are treated in PASCAL VOC, any inferences matched with these `iscrowd` ground truths are excluded from the matched results. The `iscrowd` flag is intended to avoid penalizing models for failing to detect objects in a crowded scene.
Pseudocode: COCO Matching
1. Loop through all images in your dataset;
2. Loop through all labels;
3. Get inferences and ground truths with the evaluating label;
4. Sort inferences by descending confidence score;
5. Check against all ground truths and find a ground truth that results in maximum IoU;
6. Check for the following criteria for a valid match:
    - This ground truth is not matched yet AND
    - The IoU is greater than or equal to the IoU threshold;
7. If matched with an `iscrowd` ground truth, ignore;
8. Repeat 5-7 on the next inference;
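Under the `match_with_ignored` sketch above, COCO-style matching only changes the flag semantics and the threshold comparison. A toy usage, with boxes and scores invented for illustration:

```python
# One regular ground truth and one crowd region (iscrowd=True).
ground_truths = [
    ((0, 0, 10, 10), False),  # ordinary annotation
    ((20, 0, 45, 30), True),  # iscrowd region covering several objects
]
inferences = [((21, 1, 44, 29), 0.9), ((1, 1, 9, 9), 0.8)]

matched, ignored, unmatched_gts, unmatched_infs = match_with_ignored(
    ground_truths, inferences, iou_threshold=0.5, inclusive=True  # COCO: IoU >= threshold
)
# The 0.9-score inference lands in the crowd region, so it is ignored rather than
# counted as a false positive; the 0.8-score inference matches the regular box.
```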
Open Images V7#
The Open Images V7 Challenge evaluation introduces two key differences in its matching algorithm.
The first concerns the way images in this dataset are annotated. Images are annotated with positive image-level labels, indicating that certain object classes are present, and with negative image-level labels, indicating that certain classes are absent. For fair evaluation, all unannotated classes are excluded from evaluation on that image: if an inference has a class label that is unannotated on that image, the inference is excluded from the matching results.
The second difference is the handling of `group-of` boxes, which are similar to the `iscrowd` annotations from COCO but are not simply ignored. If at least one inference is inside a `group-of` box, it is considered a match; otherwise, the `group-of` box is considered an unmatched ground truth. Note that multiple correct inferences inside the same `group-of` box still count as a single match:
Pseudocode: Open Images V7 Matching
1. Loop through all images in your dataset;
2. Loop through all positive image-level labels;
3. Get inferences and ground truths with the evaluating label;
4. Sort inferences by descending confidence score;
5. Check against all non-`group-of` ground truths and find a ground truth that results in maximum IoU;
6. Check for the following criteria for a valid match:
    - This ground truth is not matched yet AND
    - The IoU is greater than or equal to the IoU threshold;
7. If matched with a `difficult` ground truth, ignore;
8. Repeat 5-7 on the next inference;
9. Loop through all unmatched inferences;
10. Check against all `group-of` ground truths and find a ground truth that results in maximum IoU;
11. Check for the matching criteria (6);
12. Repeat 10-11 on the next unmatched inference;
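Here is a simplified sketch of this two-phase flow, again reusing the `iou` helper. It follows only the pseudocode above and omits details of the official challenge protocol; for example, the image-level label filtering is assumed to happen before this function is called:

```python
def match_open_images(
    ground_truths,   # list of (box, is_group_of) pairs
    inferences,      # list of (box, score) pairs
    iou_threshold=0.5,
):
    """Phase 1: greedy matching against non-group-of boxes (steps 4-8).
    Phase 2: leftover inferences against group-of boxes (steps 9-12)."""
    non_group = [(i, box) for i, (box, group) in enumerate(ground_truths) if not group]
    group_of = [(i, box) for i, (box, group) in enumerate(ground_truths) if group]

    def best_match(box, candidates):
        best_gt, best_iou = None, 0.0
        for gt_idx, gt_box in candidates:
            candidate = iou(box, gt_box)
            if candidate > best_iou:
                best_gt, best_iou = gt_idx, candidate
        return best_gt, best_iou

    matched, matched_gts, leftover = [], set(), []
    for inf_idx in sorted(range(len(inferences)), key=lambda i: -inferences[i][1]):
        box, _score = inferences[inf_idx]
        best_gt, best_iou = best_match(box, non_group)
        if best_gt is not None and best_gt not in matched_gts and best_iou >= iou_threshold:
            matched.append((best_gt, inf_idx))
            matched_gts.add(best_gt)
        else:
            leftover.append(inf_idx)

    unmatched_infs = []
    for inf_idx in leftover:
        box, _score = inferences[inf_idx]
        best_gt, best_iou = best_match(box, group_of)
        if best_gt is not None and best_iou >= iou_threshold:
            # A group-of box can absorb any number of inferences but still counts
            # as a single match, so we only record that it was hit.
            matched_gts.add(best_gt)
        else:
            unmatched_infs.append(inf_idx)

    unmatched_gts = [i for i in range(len(ground_truths)) if i not in matched_gts]
    return matched, unmatched_gts, unmatched_infs
```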
Limitations and Biases#
The standard matching algorithm can behave undesirably when many overlapping ground truths and inferences with high confidence scores are present, due to its greedy matching. Because it optimizes for higher confidence score and maximum IoU, it can miss valid matches by committing to a non-optimal pair, resulting in poorer matching performance.
Example: Greedy Matching
Bounding Box | Score | IoU(\(\text{A}\)) | IoU(\(\text{B}\)) |
---|---|---|---|
\(\text{a}\) | 0.7 | 0.0 | 0.6 |
\(\text{b}\) | 0.8 | 0.5 | 0.7 |
In this example there are two ground truths and two inferences: inference \(\text{b}\), with the higher score, overlaps well with both ground truths \(\text{A}\) and \(\text{B}\), while inference \(\text{a}\), with the lower score, overlaps well with only ground truth \(\text{B}\). Because the IoU between \(\text{B}\) and \(\text{b}\) is greater than the IoU between \(\text{A}\) and \(\text{b}\), inference \(\text{b}\) is matched with ground truth \(\text{B}\), causing inference \(\text{a}\) to fail to match.
This greedy matching behavior results in a higher false positive count in this type of scenario. Ideally, inference \(\text{a}\) would match with ground truth \(\text{B}\) and inference \(\text{b}\) with ground truth \(\text{A}\), resulting in no FPs.
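To see the failure concretely, here is a small Python snippet that replays the greedy logic directly on the IoU table above, as a hypothetical shortcut in place of computing IoU from box geometry:

```python
# IoU values and scores copied from the table above; keys pair an inference
# with a ground truth.
iou_table = {("a", "A"): 0.0, ("a", "B"): 0.6, ("b", "A"): 0.5, ("b", "B"): 0.7}
scores = {"a": 0.7, "b": 0.8}
iou_threshold = 0.5

matched, matched_gts = [], set()
for inf in sorted(scores, key=scores.get, reverse=True):  # b first (higher score)
    best_gt = max(("A", "B"), key=lambda gt: iou_table[(inf, gt)])
    if best_gt not in matched_gts and iou_table[(inf, best_gt)] >= iou_threshold:
        matched.append((best_gt, inf))
        matched_gts.add(best_gt)

print(matched)  # [('B', 'b')]: greedy leaves A and a unmatched, even though the
# assignment (A, b), (B, a) would have matched every box with IoU >= 0.5.
```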
Another behavior to note is that the matching results can depend on input order: on the ground truth order, when multiple ground truths overlap an inference with equal IoU, and on the inference order, when multiple inferences overlap a ground truth with equal confidence scores.
Example: Different Matching Results When Ground Truth Order Changes
Bounding Box | Score | IoU(\(\text{A}\)) | IoU(\(\text{B}\)) |
---|---|---|---|
\(\text{a}\) | 0.7 | 0.0 | 0.5 |
\(\text{b}\) | 0.7 | 0.5 | 0.5 |
All three overlapping ground truth/inference pairs have the same IoU (0.5), and both inferences have the same confidence score (0.7).
If the ground truths are ordered as \([\text{A}, \text{B}]\) and the inferences as \([\text{a}, \text{b}]\), inference \(\text{a}\) is matched with \(\text{B}\) first, so inference \(\text{b}\) gets matched with \(\text{A}\).
If the inference order changes to \([\text{b}, \text{a}]\), now inference \(\text{a}\) may or may not be matched with any ground truth. The matched result can change depending on the ground truth order. If \(\text{A}\) is evaluated before \(\text{B}\), inference \(\text{b}\) is matched with \(\text{A}\), and \(\text{a}\) can be matched with \(\text{B}\). However, if \(\text{B}\) comes before \(\text{A}\), inference \(\text{b}\) is matched with \(\text{B}\) instead, leaving inference \(\text{a}\) with no match.
As discussed earlier, the standard matching algorithm compares model inferences with annotated ground truths in two fundamental aspects: localization and classification. The comparison produces matched pairs, unmatched ground truths, and unmatched inferences; however, these results do not reveal why certain matches were unsuccessful. Many reasons can lead to a failed match, such as poor localization due to insufficient overlap (IoU), or good localization but poor classification.
Surfacing these types of errors is profoundly useful during model debugging. For instance, confused matches, where localization succeeded (i.e., IoU above the IoU threshold) but classification failed (i.e., mismatching label values), can be identified by matching unmatched inferences with unmatched ground truths once more after the initial matching. Confused matches are useful for creating a confusion matrix to focus on a detection model's classification performance.
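Here is one possible sketch of that second pass, reusing the `iou` helper from earlier. The name `find_confused_matches` and its signature are illustrative, not a documented API:

```python
def find_confused_matches(
    unmatched_gts,    # list of (box, label) left over from the initial matching
    unmatched_infs,   # list of (box, label, score) left over from the initial matching
    iou_threshold=0.5,
):
    """Re-match leftovers across class labels to surface localization hits with
    classification misses; returns (gt_label, inf_label) pairs for a confusion matrix."""
    confused, used_gts = [], set()
    for box, inf_label, _score in sorted(unmatched_infs, key=lambda x: -x[2]):
        best_gt, best_iou = None, 0.0
        for gt_idx, (gt_box, _gt_label) in enumerate(unmatched_gts):
            candidate = iou(box, gt_box)
            if candidate > best_iou:
                best_gt, best_iou = gt_idx, candidate
        # Localization succeeded but the labels differ: a confused match.
        if best_gt is not None and best_gt not in used_gts and best_iou >= iou_threshold:
            used_gts.add(best_gt)
            confused.append((unmatched_gts[best_gt][1], inf_label))
    return confused
```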