Computer Vision#

In this document we will review best practices when setting up Kolena datasets for computer vision problems.


Supported File Data Formats#

The Kolena SDK supports uploading of data in the Pandas DataFrame format.

The Kolena web app supports the following file formats.

Format Description
.csv Comma-separated values file, ideal for tabular data
.parquet Apache Parquet format, efficient for columnar storage
.jsonl JSON Lines format, suitable for handling nested data
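
As a rough illustration of why .jsonl suits nested data, the sketch below writes two datapoints to JSON Lines text using only the Python standard library. The locators come from the examples in this document; the nested `capture` field and the `to_jsonl` helper are hypothetical and used only for illustration.

```python
import json

# Two datapoints; the nested "capture" field is a made-up example of nested data.
records = [
    {
        "locator": "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000000294.jpg",
        "capture": {"camera": "front", "tags": ["indoor"]},
    },
    {
        "locator": "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg",
        "capture": {"camera": "rear", "tags": ["street", "daytime"]},
    },
]


def to_jsonl(rows):
    """Serialize each row as one JSON object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


jsonl_text = to_jsonl(records)

# Reading the text back line by line recovers the nested structure unchanged.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
```

Each line is a complete JSON object, so nested dictionaries and lists survive the round trip without any column flattening.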

Supported file types are:

Type Format
Images jpg, jpeg, png, gif, bmp and other web browser supported images
Video mov, mp4, mpeg and other web browser supported video types
Point Cloud .pcd

Using the locator#

Kolena uses references to files stored in your cloud storage to render them. Refer to "Connecting Cloud Storage" for details on how to configure this.

Computer vision data is best visualized in Studio using the Gallery mode. To enable the Gallery view, store references to images in a column named locator. The locator can also serve as the unique identifier of the datapoint, which your model results reference as well.

Kolena supports jpg, jpeg, png, gif, bmp and other web browser supported images.

Using fields#

You can add additional information about your images by adding columns to the .csv file, with the metadata name as the column header and its value in each row. Below is an example datapoint:

locator ground_truth image_brightness image_contrast
s3://kolena-public-examples/cifar10/data/horse0000.png horse 153.994 84.126
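
A minimal sketch of producing such a file with the Python standard library is shown here, using the datapoint above. The `rows_to_csv` helper is hypothetical. This approach is only meant for flat metadata values; columns holding annotations or assets should go through the Kolena-provided serializer instead.

```python
import csv
import io

FIELDNAMES = ["locator", "ground_truth", "image_brightness", "image_contrast"]

rows = [
    {
        "locator": "s3://kolena-public-examples/cifar10/data/horse0000.png",
        "ground_truth": "horse",
        "image_brightness": 153.994,
        "image_contrast": 84.126,
    },
]


def rows_to_csv(rows, fieldnames):
    """Write flat metadata rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, FIELDNAMES)
```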


Using thumbnails

In order to improve the loading performance of your image data, you can upload compressed versions of the image with the same dimensions as thumbnails. This results in an improved Studio experience due to faster image loading when filtering, sorting or using embedding sort.

Thumbnails are configured by adding a field called thumbnail_locator to the data, where the value points to a compressed version of the locator image.

To add a thumbnail to classification data like the example above, the row would look like:

locator thumbnail_locator ground_truth image_brightness image_contrast
s3://kolena-examples/data/h0.png s3://kolena-examples/data/thumbnail/h0.png horse 153.994 84.126
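
If thumbnails are stored in a subdirectory next to the originals, as in the row above, the thumbnail_locator value can be derived from the locator. The sketch below assumes that layout; the `thumbnail_locator_for` helper and its `subdir` parameter are hypothetical, and other storage layouts would need a different rule.

```python
def thumbnail_locator_for(locator: str, subdir: str = "thumbnail") -> str:
    """Insert `subdir` between the directory and the file name of a locator."""
    directory, _, filename = locator.rpartition("/")
    return f"{directory}/{subdir}/{filename}"


thumbnail = thumbnail_locator_for("s3://kolena-examples/data/h0.png")
```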

Including Assets and Annotations#

Kolena supports the inclusion of overlay annotations and asset files as fields in a dataset.

We recommend using the annotation and asset dataclasses for ease of annotation and asset manipulation:

# Creates a single-row DataFrame with an image datapoint, a `bbox` annotation field, and a `mesh` asset file.

import pandas as pd
from kolena.annotation import BoundingBox
from kolena.asset import MeshAsset

locator = "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000000294.jpg"
bbox = BoundingBox(top_left=(27.7, 69.83), bottom_right=(392.61, 427))
mesh = MeshAsset(locator="s3://kolena-public-examples/a-large-dataset-of-object-scans/data/mesh/00004.ply")
df = pd.DataFrame([dict(locator=locator, bbox=bbox, mesh=mesh)])

# DataFrame can now be directly uploaded as a dataset
from kolena.dataset import upload_dataset
upload_dataset("my-dataset", df, id_fields=["locator"])

# Or serialized to CSV and uploaded through the web UI.
# If serializing to CSV, please use the provided `dataframe_to_csv` method. The Pandas-provided `to_csv` method
# does not adhere to the JSON spec, and may serialize malformed objects.
from kolena.io import dataframe_to_csv

dataframe_to_csv(df, "my-dataset.csv", index=False)

Specific Workflows#

2D Object Detection#


You can follow the 2D object detection example.

Annotations are used to visualize overlays on top of images. To render 2D bounding boxes, you can use LabeledBoundingBox or BoundingBox annotations.

Consider a .csv file containing ground truth data in the form of bounding boxes for an Object Detection problem.

locator label min_x max_x min_y max_y
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg motorcycle 270.77 621.61 44.59 254.18
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg car 538.03 636.85 8.86 101.93
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg trunk 313.02 553.98 12.01 99.84

Expressed as Kolena annotations, these bounding boxes look like:

from kolena.annotation import LabeledBoundingBox
bboxes = [
    LabeledBoundingBox(top_left=(270.77, 44.59), bottom_right=(621.61, 254.18), label="motorcycle"),
    LabeledBoundingBox(top_left=(538.03, 8.86), bottom_right=(636.85, 101.93), label="car"),
    LabeledBoundingBox(top_left=(313.02, 12.01), bottom_right=(553.98, 99.84), label="trunk"),
]
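
The mapping from the table columns to the annotation arguments can be sketched as follows: (min_x, min_y) becomes top_left and (max_x, max_y) becomes bottom_right, with one list of boxes per locator. To keep the sketch runnable without the SDK, it builds plain keyword-argument dictionaries that could then be passed to LabeledBoundingBox; the `boxes_by_locator` helper is hypothetical.

```python
import csv
import io
from collections import defaultdict

CSV_TEXT = """locator,label,min_x,max_x,min_y,max_y
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg,motorcycle,270.77,621.61,44.59,254.18
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg,car,538.03,636.85,8.86,101.93
s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg,trunk,313.02,553.98,12.01,99.84
"""


def boxes_by_locator(csv_text):
    """Group rows by locator and convert min/max columns to corner tuples."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["locator"]].append(
            {
                "top_left": (float(row["min_x"]), float(row["min_y"])),
                "bottom_right": (float(row["max_x"]), float(row["max_y"])),
                "label": row["label"],
            }
        )
    return dict(grouped)


grouped_boxes = boxes_by_locator(CSV_TEXT)
```

Grouping by locator yields one row per image, with all of that image's boxes collected in a single list-valued field.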


Using bounding box categories

If you wish to analyze your model results based on specific characteristics of your bounding boxes, you can provide values representing those characteristics as additional key-value pairs. For example, if the location of a bounding box is important, you can construct your LabeledBoundingBox like this:

    LabeledBoundingBox(top_left=(313.02, 12.01), bottom_right=(553.98, 99.84), label="trunk", location="bottom-left")
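
Categories like this are often computed from the box geometry itself. The sketch below derives a size category from the box area, assuming the commonly used COCO thresholds of 32x32 and 96x96 pixels. The `size_category` helper and the idea of storing its result in a `size` field are illustrative assumptions, not part of the Kolena SDK.

```python
SMALL_MAX_AREA = 32 * 32   # boxes below this area are "small"
MEDIUM_MAX_AREA = 96 * 96  # boxes below this area (and not small) are "medium"


def size_category(top_left, bottom_right):
    """Classify a box as small, medium, or large based on its pixel area."""
    width = bottom_right[0] - top_left[0]
    height = bottom_right[1] - top_left[1]
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# The motorcycle box from the ground truth table above.
motorcycle_size = size_category((270.77, 44.59), (621.61, 254.18))
```

The returned string could then be passed as an extra keyword argument, in the same way as location in the example above.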


When uploading .csv files for datasets that contain annotations, assets, or nested values in a column, use the dataframe_to_csv() function provided by Kolena to save the .csv file instead of pandas.to_csv(). pandas.to_csv() does not serialize Kolena annotation objects in a way that is compatible with the platform.

Uploading Model Results#

Model results contain your model inferences as well as any custom metrics that you wish to monitor on Kolena. The data structure of model results is very similar to the structure of a dataset with minor differences.

  • Ensure your results are using the same unique ID field (the locator for instance) you have selected for your dataset.

  • Use ScoredBoundingBox or ScoredLabeledBoundingBox to pass on your model's confidence score for each inferred bounding box.

  • Use compute_object_detection_results to compute the metrics supported by Kolena's Object Detection Task Metrics.

  • OR include the following columns in your results. The value in each of these columns is a List[ScoredLabeledBoundingBox].

    Column Name Description
    matched_inference Inferences that were matched to a ground truth.
    unmatched_inference Inferences that were not matched to a ground truth.
    unmatched_ground_truth Ground truths with no matching inference.
  • Leverage task metrics by adding the following columns to your CSV: count_TP, count_FP, count_FN, count_TN.
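
To make the meaning of these columns concrete, the sketch below performs a simplified greedy matching between ground truths and inferences using intersection over union (IoU). It is not the implementation behind compute_object_detection_results; the `iou` and `match_boxes` helpers, the dictionary box representation, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1 = box_a["top_left"]
    ax2, ay2 = box_a["bottom_right"]
    bx1, by1 = box_b["top_left"]
    bx2, by2 = box_b["bottom_right"]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def match_boxes(ground_truths, inferences, iou_threshold=0.5):
    """Greedily match inferences (highest score first) to same-label ground truths."""
    remaining = list(ground_truths)
    matched, unmatched = [], []
    for inference in sorted(inferences, key=lambda box: box["score"], reverse=True):
        candidates = [gt for gt in remaining if gt["label"] == inference["label"]]
        best = max(candidates, key=lambda gt: iou(gt, inference), default=None)
        if best is not None and iou(best, inference) >= iou_threshold:
            matched.append(inference)
            remaining.remove(best)  # each ground truth is matched at most once
        else:
            unmatched.append(inference)
    return {
        "matched_inference": matched,
        "unmatched_inference": unmatched,
        "unmatched_ground_truth": remaining,
        "count_TP": len(matched),
        "count_FP": len(unmatched),
        "count_FN": len(remaining),
    }


ground_truths = [
    {"top_left": (0, 0), "bottom_right": (10, 10), "label": "car"},
    {"top_left": (20, 20), "bottom_right": (30, 30), "label": "dog"},
]
inferences = [
    {"top_left": (1, 1), "bottom_right": (10, 10), "label": "car", "score": 0.9},
    {"top_left": (50, 50), "bottom_right": (60, 60), "label": "cat", "score": 0.8},
]
result = match_boxes(ground_truths, inferences)
```

In this toy case the car inference is a true positive, the cat inference is a false positive, and the dog ground truth is a false negative.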


Once you have constructed your DataFrame, use the upload_object_detection_results wrapper function to simplify the upload process and enable the Object Detection Task metrics automatically.
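
Before uploading, it can be worth checking that every result row refers to a datapoint that exists in the dataset, since results are joined to the dataset on the shared ID field. The sketch below is one possible pre-upload check; the `missing_result_ids` helper and the second result locator are hypothetical.

```python
def missing_result_ids(dataset_rows, result_rows, id_field="locator"):
    """Return result IDs that do not appear in the dataset, sorted for stable output."""
    dataset_ids = {row[id_field] for row in dataset_rows}
    result_ids = {row[id_field] for row in result_rows}
    return sorted(result_ids - dataset_ids)


dataset_rows = [
    {"locator": "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000000294.jpg"},
    {"locator": "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg"},
]
result_rows = [
    {"locator": "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg"},
    {"locator": "s3://example-bucket/not-in-dataset.jpg"},  # hypothetical stray result
]

missing = missing_result_ids(dataset_rows, result_rows)
```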


Follow the 2D Object Detection result upload example for optimal setup.

3D Object Detection#

Annotations are used to visualize overlays on top of images. To render 3D bounding boxes, you can use BoundingBox3D or LabeledBoundingBox3D.
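
The 3D box example in this section is described by center, dimensions, and rotations values. If your source data stores axis-aligned boxes as minimum and maximum corners instead, the conversion can be sketched as below. The `center_and_dimensions` helper is hypothetical, and it assumes an unrotated box, so rotations would be (0, 0, 0).

```python
def center_and_dimensions(min_corner, max_corner):
    """Convert axis-aligned min/max corners to a center point and edge lengths."""
    center = tuple((low + high) / 2 for low, high in zip(min_corner, max_corner))
    dimensions = tuple(high - low for low, high in zip(min_corner, max_corner))
    return center, dimensions


center, dimensions = center_and_dimensions((0.0, 0.0, 0.0), (2.0, 4.0, 6.0))
```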


Using bounding box categories

If you wish to analyze your model results based on specific characteristics of your bounding boxes, you can provide values representing those characteristics as additional key-value pairs. For example, if the location of a bounding box is important, you can construct your LabeledBoundingBox3D like this:

    LabeledBoundingBox3D(
        center=(313.02, 12.01, 15.5),
        dimensions=(553.98, 99.84, 231.17),
        rotations=(12, 16, 25),
        label="trunk",
        location="bottom-left",
    )


When uploading .csv files for datasets that contain annotations, assets, or nested values in a column, use the dataframe_to_csv() function provided by Kolena to save the .csv file instead of pandas.to_csv(). pandas.to_csv() does not serialize Kolena annotation objects in a way that is compatible with the platform.

Uploading Model Results#

Model results contain your model inferences as well as any custom metrics that you wish to monitor on Kolena. The data structure of model results is very similar to the structure of a dataset with minor differences.


Once you have constructed your DataFrame, use the upload_object_detection_results wrapper function to simplify the upload process and enable the Object Detection Task metrics automatically.


Follow the 3D Object Detection result upload script to see how to set up both 3D and 2D bounding boxes in your results for multi-modal 3D object detection data.


Video#

Videos are best represented in Kolena using the Gallery view. To set up the Gallery view, add links to your video files stored in the cloud under the locator column. Kolena automatically looks for that column name and renders your video files correctly. Kolena supports mov, mp4, mpeg and other web browser supported video types.


Annotation visualization over videos only works on videos with constant frame rates. For the best experience, include a frame_rate field on your video datapoints, given in frames per second as a float or int.

Setting up bounding box annotations on videos#

To overlay bounding boxes on videos, you will need to define a new class based on the LabeledBoundingBox or BoundingBox annotations, with an added frame_id property. You can also add further properties, which can then be used for filtering and visualization.

from typing import Optional

from kolena.annotation import LabeledBoundingBox

# Video bounding box for pedestrians with a pedestrian ID,
# risk of collision, and a label indicating whether the bounding box
# is occluded.
class PedestrianBoundingBox(LabeledBoundingBox):
    frame_id: int
    ped_id: str
    occlusion: str
    risk: Optional[str] = None

    def set_risk(self, risk: str) -> None:
        object.__setattr__(self, "risk", risk)
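
The use of object.__setattr__ in set_risk suggests that the annotation classes are frozen (immutable) dataclasses, where ordinary attribute assignment is rejected. The sketch below reproduces that pattern with standard library dataclasses only. `StandInLabeledBox` and `StandInPedestrianBox` are hypothetical stand-ins, not Kolena classes, and the assumption that Kolena annotations are frozen is inferred from the snippet above.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StandInLabeledBox:
    """Stand-in for a frozen labeled bounding box annotation."""
    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]
    label: str


@dataclass(frozen=True)
class StandInPedestrianBox(StandInLabeledBox):
    frame_id: int
    ped_id: str
    occlusion: str
    risk: Optional[str] = None

    def set_risk(self, risk: str) -> None:
        # Bypass the frozen-dataclass guard to fill in a value computed later.
        object.__setattr__(self, "risk", risk)


box = StandInPedestrianBox(
    top_left=(10.0, 20.0),
    bottom_right=(50.0, 120.0),
    label="pedestrian",
    frame_id=0,
    ped_id="ped-1",
    occlusion="none",
)

try:
    box.risk = "high"  # direct assignment is blocked on frozen dataclasses
    direct_assignment_failed = False
except FrozenInstanceError:
    direct_assignment_failed = True

box.set_risk("high")
```

Keeping the class frozen while exposing one explicit setter limits mutation to the single field that is expected to change after construction.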


Kolena depends on the frame_id for rendering and it needs to be a zero-indexed integer.
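
If your annotations are keyed by timestamp rather than by frame, a zero-indexed frame_id can be derived from the constant frame rate. The sketch below shows one way to do this; the `frame_id_for` helper is hypothetical, and the small epsilon is an assumption added to absorb floating-point rounding at frame boundaries.

```python
import math


def frame_id_for(timestamp_seconds: float, frame_rate: float) -> int:
    """Return the zero-indexed frame shown at a timestamp for a constant frame rate."""
    if timestamp_seconds < 0 or frame_rate <= 0:
        raise ValueError("timestamp must be non-negative and frame_rate positive")
    # The epsilon keeps values such as 2.9999999999 from rounding down a frame.
    return math.floor(timestamp_seconds * frame_rate + 1e-9)


first_frame = frame_id_for(0.0, 30)
one_second_in = frame_id_for(1.0, 30)
```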

To overlay bounding boxes with inferences (used when uploading model results), you will need a new class based on ScoredBoundingBox or ScoredLabeledBoundingBox. The requirements for rendering on a video are similar to the previous example:

from typing import Optional

from kolena.annotation import ScoredLabeledBoundingBox

# Video bounding box with an inference (represented by score),
# frame_id, pedestrian ID, occlusion category,
# time_to_event (in this case a potential collision between
# a pedestrian and a vehicle), and failed_to_infer for capturing
# no-inference cases.
class ScoredPedestrianBoundingBox(ScoredLabeledBoundingBox):
    frame_id: int
    ped_id: str
    occlusion: str
    time_to_event: Optional[float]
    failed_to_infer: bool


Follow the Crossing Pedestrian Detection example to see how to set up a video-based dataset and model results.