Computer Vision#
In this document we will review best practices when setting up Kolena datasets for computer vision problems.
Basics#
Supported File Data Formats#
The Kolena SDK supports uploading data in the pandas `DataFrame` format.
The Kolena web app supports the following file formats.
| Format | Description |
| --- | --- |
| `.csv` | Comma-separated values file, ideal for tabular data |
| `.parquet` | Apache Parquet format, efficient for columnar storage |
| `.jsonl` | JSON Lines format, suitable for handling nested data |
Supported file types are:
| Type | Format |
| --- | --- |
| Images | `jpg`, `jpeg`, `png`, `gif`, `bmp`, and other web-browser-supported image types |
| Video | `mov`, `mp4`, `mpeg`, and other web-browser-supported video types |
| Point Cloud | `.pcd` |
Using the `locator`#
Kolena uses references to files stored in your cloud storage to render them. Refer to "Connecting Cloud Storage" for details on how to configure this.
Computer Vision data is best visualized in Studio using the Gallery mode.
To enable the Gallery view, store references to images in a column named `locator`. The `locator` can serve as the unique identifier of the datapoint, which is also referenced by your model results.
Kolena supports `jpg`, `jpeg`, `png`, `gif`, `bmp`, and other web-browser-supported image formats.
Using fields#
You can add additional information about your images by adding columns to the `.csv` file, with the metadata name as the column header and the values in each row.
Below is an example datapoint:
| locator | ground_truth | image_brightness | image_contrast |
| --- | --- | --- | --- |
| s3://kolena-public-examples/cifar10/data/horse0000.png | horse | 153.994 | 84.126 |
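A row like the one above can be produced with the standard library's `csv` module. This is just a sketch of writing such a `.csv`; the column names and values mirror the example table:

```python
import csv
import io

# Example datapoints mirroring the table above; image_brightness and
# image_contrast are precomputed metadata attached as extra columns.
rows = [
    {
        "locator": "s3://kolena-public-examples/cifar10/data/horse0000.png",
        "ground_truth": "horse",
        "image_brightness": 153.994,
        "image_contrast": 84.126,
    },
]

def to_csv(rows: list) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```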
Tip
Using thumbnails
To improve the loading performance of your image data, you can upload compressed versions of the images with the same dimensions as thumbnails. This improves the Studio experience through faster image loading when filtering, sorting, or using embedding sort.
Thumbnails are configured by adding a field called `thumbnail_locator` to the data, where the value points to a compressed version of the `locator` image.
If you wanted to add a thumbnail to the classification data shown above, it would look like:

| locator | thumbnail_locator | ground_truth | image_brightness | image_contrast |
| --- | --- | --- | --- | --- |
| s3://kolena-examples/data/h0.png | s3://kolena-examples/data/thumbnail/h0.png | horse | 153.994 | 84.126 |
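Kolena does not require any particular thumbnail path layout. One common convention (assumed here) mirrors the original path with an added `thumbnail/` segment, so the `thumbnail_locator` can be derived from the `locator`:

```python
def thumbnail_locator(locator: str) -> str:
    # Insert a "thumbnail/" path segment before the file name.
    prefix, _, name = locator.rpartition("/")
    return f"{prefix}/thumbnail/{name}"

# thumbnail_locator("s3://kolena-examples/data/h0.png")
#   -> "s3://kolena-examples/data/thumbnail/h0.png"
```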
Including Assets and Annotations#
Kolena supports the inclusion of overlay annotations and asset files as fields in a dataset.
We recommend using the annotation and asset dataclasses for ease of annotation and asset manipulation:
```python
# Creates a single-row DataFrame with an image datapoint, a `bbox` annotation field, and a `mesh` asset file.
import pandas as pd

from kolena.annotation import BoundingBox
from kolena.asset import MeshAsset

locator = "s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000000294.jpg"
bbox = BoundingBox(top_left=(27.7, 69.83), bottom_right=(392.61, 427))
mesh = MeshAsset(locator="s3://kolena-public-examples/a-large-dataset-of-object-scans/data/mesh/00004.ply")
df = pd.DataFrame([dict(locator=locator, bbox=bbox, mesh=mesh)])

# The DataFrame can now be uploaded directly as a dataset
from kolena.dataset import upload_dataset

upload_dataset("my-dataset", df, id_fields=["locator"])

# Or serialized to CSV and uploaded through the web UI.
# If serializing to CSV, use the provided `kolena.io.dataframe_to_csv` method. The pandas `to_csv` method
# does not adhere to the JSON spec and may serialize malformed objects.
from kolena.io import dataframe_to_csv

dataframe_to_csv(df, "my-dataset.csv", index=False)
```
Specific Workflows#
2D Object Detection#
Example
You can follow this 2D object detection example.

Annotations are used to visualize overlays on top of images. To render 2D bounding boxes, you can use `LabeledBoundingBox` or `BoundingBox` annotations.
Consider a `.csv` file containing ground truth data in the form of bounding boxes for an object detection problem.

| locator | label | min_x | max_x | min_y | max_y |
| --- | --- | --- | --- | --- | --- |
| s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg | motorcycle | 270.77 | 621.61 | 44.59 | 254.18 |
| s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg | car | 538.03 | 636.85 | 8.86 | 101.93 |
| s3://kolena-public-examples/coco-2014-val/data/COCO_val2014_000000369763.jpg | truck | 313.02 | 553.98 | 12.01 | 99.84 |
Expressed as Kolena annotations, this looks like:

```python
from kolena.annotation import LabeledBoundingBox

bboxes = [
    LabeledBoundingBox(top_left=(270.77, 44.59), bottom_right=(621.61, 254.18), label="motorcycle"),
    LabeledBoundingBox(top_left=(538.03, 8.86), bottom_right=(636.85, 101.93), label="car"),
    LabeledBoundingBox(top_left=(313.02, 12.01), bottom_right=(553.98, 99.84), label="truck"),
]
```
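The mapping from the CSV columns to corner coordinates is `top_left = (min_x, min_y)` and `bottom_right = (max_x, max_y)`, with the origin at the image's top-left. A minimal sketch of that conversion over a plain dict standing in for a CSV row:

```python
def row_to_corners(row: dict) -> tuple:
    # top_left is (min_x, min_y); bottom_right is (max_x, max_y),
    # with the origin at the top-left corner of the image.
    top_left = (row["min_x"], row["min_y"])
    bottom_right = (row["max_x"], row["max_y"])
    return top_left, bottom_right

row = {"label": "motorcycle", "min_x": 270.77, "max_x": 621.61, "min_y": 44.59, "max_y": 254.18}
# row_to_corners(row) -> ((270.77, 44.59), (621.61, 254.18))
```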
Tip
Using bounding box categories
If you wish to analyze your model results based on specific characteristics of your bounding boxes, you can provide values representing those characteristics as additional key-value pairs. For example, if the location of a bounding box is important, you can attach a `location` value when constructing your `LabeledBoundingBox`.
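For instance, the construction could look like the following sketch. `LabeledBoundingBox` here is a plain-dataclass stand-in for `kolena.annotation.LabeledBoundingBox` (so the sketch runs without the SDK installed), and `location` is a hypothetical characteristic:

```python
from dataclasses import dataclass
from typing import Tuple

# Plain-dataclass stand-in for kolena.annotation.LabeledBoundingBox so this
# sketch runs without the SDK; with kolena installed, subclass the real class.
@dataclass(frozen=True)
class LabeledBoundingBox:
    top_left: Tuple[float, float]
    bottom_right: Tuple[float, float]
    label: str

# `location` is a hypothetical characteristic you could later filter on.
@dataclass(frozen=True)
class LocatedBoundingBox(LabeledBoundingBox):
    location: str

bbox = LocatedBoundingBox(
    top_left=(270.77, 44.59),
    bottom_right=(621.61, 254.18),
    label="motorcycle",
    location="left",
)
```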
Note
When uploading `.csv` files for datasets that contain annotations, assets, or nested values in a column, use the `dataframe_to_csv()` function provided by Kolena to save the `.csv` file instead of `pandas.to_csv()`. `pandas.to_csv` does not serialize Kolena annotation objects in a way that is compatible with the platform.
Uploading Model Results#
Model results contain your model inferences as well as any custom metrics that you wish to monitor on Kolena. The data structure of model results is very similar to the structure of a dataset with minor differences.
- Ensure your results use the same unique ID field (the `locator`, for instance) that you selected for your dataset.
- Use `ScoredBoundingBox` or `ScoredLabeledBoundingBox` to pass your model's confidence score for each bounding box.
- Use `compute_object_detection_results` to compute metrics that are supported by Kolena's Object Detection Task Metrics.
- OR include the following columns in your results, where the value of each column is a `List[ScoredLabeledBoundingBox]`:

  | Column Name | Description |
  | --- | --- |
  | `matched_inference` | Inferences that were matched to a ground truth. |
  | `unmatched_inference` | Inferences that were not matched to a ground truth. |
  | `unmatched_ground_truth` | Ground truths with no matching inference. |

- Leverage task metrics by adding the following columns to your CSV: `count_TP`, `count_FP`, `count_FN`, `count_TN`.
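If you produce the matched/unmatched columns yourself rather than calling `compute_object_detection_results`, the partitioning is typically a greedy IoU match of inferences (highest confidence first) against ground truths. A simplified sketch using plain dicts and an assumed 0.5 IoU threshold; Kolena's helper performs the real matching for you:

```python
def iou(a, b):
    # Boxes are (min_x, min_y, max_x, max_y) tuples.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def partition(ground_truths, inferences, threshold=0.5):
    # Greedily match inferences (highest confidence first) to ground truths.
    matched, unmatched_inference = [], []
    remaining = list(range(len(ground_truths)))
    for inf in sorted(inferences, key=lambda x: -x["score"]):
        best, best_iou = None, threshold
        for i in remaining:
            v = iou(inf["box"], ground_truths[i]["box"])
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.append(inf)
            remaining.remove(best)
        else:
            unmatched_inference.append(inf)
    unmatched_ground_truth = [ground_truths[i] for i in remaining]
    return matched, unmatched_inference, unmatched_ground_truth
```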
Note
Once you have constructed your `DataFrame`, use the `upload_object_detection_results` wrapper function to simplify the upload process and enable the Object Detection Task metrics automatically.
Example
Follow the 2D Object Detection result upload example for optimal setup.
3D Object Detection#
Annotations are used to visualize overlays on top of images. To render 3D bounding boxes, you can use `BoundingBox3D` or `LabeledBoundingBox3D` annotations.
Tip
Using bounding box categories
If you wish to analyze your model results based on specific characteristics of your bounding boxes, you can provide values representing those characteristics as additional key-value pairs. For example, if the location of a bounding box is important, you can attach a `location` value when constructing your `LabeledBoundingBox3D`.
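As in the 2D case, here is a runnable sketch using a plain-dataclass stand-in for `kolena.annotation.LabeledBoundingBox3D` (the field names below are illustrative; consult the SDK reference for the exact constructor), with `location` as a hypothetical extra characteristic:

```python
from dataclasses import dataclass
from typing import Tuple

# Stand-in for kolena.annotation.LabeledBoundingBox3D so this sketch runs
# without the SDK; with kolena installed, subclass the real class instead.
@dataclass(frozen=True)
class LabeledBoundingBox3D:
    center: Tuple[float, float, float]
    dimensions: Tuple[float, float, float]
    rotations: Tuple[float, float, float]
    label: str

# `location` is a hypothetical characteristic, e.g. "near" / "far".
@dataclass(frozen=True)
class LocatedBoundingBox3D(LabeledBoundingBox3D):
    location: str

bbox3d = LocatedBoundingBox3D(
    center=(1.5, 0.2, 10.0),
    dimensions=(1.8, 1.6, 4.2),
    rotations=(0.0, 0.0, 0.4),
    label="car",
    location="near",
)
```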
Note
When uploading `.csv` files for datasets that contain annotations, assets, or nested values in a column, use the `dataframe_to_csv()` function provided by Kolena to save the `.csv` file instead of `pandas.to_csv()`. `pandas.to_csv` does not serialize Kolena annotation objects in a way that is compatible with the platform.
Uploading Model Results#
Model results contain your model inferences as well as any custom metrics that you wish to monitor on Kolena. The data structure of model results is very similar to the structure of a dataset with minor differences.
- Ensure your results use the same unique ID field (the `locator`, for instance) that you selected for your dataset.
- Use `ScoredBoundingBox3D` or `ScoredLabeledBoundingBox3D` to pass your model's confidence score for each bounding box.
- Use `compute_object_detection_results` to compute metrics that are supported by Kolena's Object Detection Task Metrics.
Note
Once you have constructed your `DataFrame`, use the `upload_object_detection_results` wrapper function to simplify the upload process and enable the Object Detection Task metrics automatically.
Example
Follow the 3D Object Detection result upload script to see how to set up both 3D and 2D bounding boxes in your results for multi-modal 3D object detection data.
Video#
Videos are best represented in Kolena using the Gallery view. To set up the Gallery view, add links to your video files stored in the cloud under the `locator` column. Kolena automatically looks for that column name and renders your video files correctly.
Kolena supports `mov`, `mp4`, `mpeg`, and other web-browser-supported video types.
Note
Annotation visualization over videos only works on videos with constant frame rates. For the best experience, include a `frame_rate` field on your video datapoints, in frames per second, as a `float` or `int` number.
Setting up bounding box annotations on videos#
To overlay bounding boxes on videos, you will need to define a new class based on `LabeledBoundingBox` or `BoundingBox` annotations with an added `frame_id` property. You can add further properties if you wish, which can be used for filtering and visualizations.
```python
from dataclasses import dataclass
from typing import Optional

from kolena.annotation import LabeledBoundingBox

# Video bounding box for pedestrians, with pedestrian id, risk of
# collision, and a label indicating whether the bounding box is occluded.
@dataclass(frozen=True)
class PedestrianBoundingBox(LabeledBoundingBox):
    frame_id: int
    ped_id: str
    occlusion: str
    risk: Optional[str] = None

    def set_risk(self, risk: str) -> None:
        object.__setattr__(self, "risk", risk)
```
Note
Kolena depends on the `frame_id` for rendering; it must be a zero-indexed integer.
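If your annotations are keyed by timestamp rather than frame number, converting to a zero-indexed `frame_id` is straightforward for constant-frame-rate video. A sketch (function and parameter names are illustrative):

```python
def to_frame_id(timestamp_s: float, frame_rate: float) -> int:
    # Frame ids must be zero-indexed integers; a timestamp of 0.0s is frame 0.
    return int(round(timestamp_s * frame_rate))

# to_frame_id(0.0, 29.97) -> 0
# to_frame_id(2.0, 29.97) -> 60
```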
To overlay bounding boxes with inferences (used when uploading model results), you will need a new class based on `ScoredBoundingBox` or `ScoredLabeledBoundingBox`. The requirements for rendering on a video are similar to the previous example:
```python
from dataclasses import dataclass
from typing import Optional

from kolena.annotation import ScoredLabeledBoundingBox

# Video bounding box with inference (represented by `score`), frame_id,
# pedestrian id, occlusion category, time_to_event (here, a potential
# collision between a pedestrian and a vehicle), and failed_to_infer
# for capturing no-inference cases.
@dataclass(frozen=True)
class ScoredPedestrianBoundingBox(ScoredLabeledBoundingBox):
    frame_id: int
    ped_id: str
    occlusion: str
    time_to_event: Optional[float]
    failed_to_infer: bool
```
Example
Follow the Crossing Pedestrian Detection example to see how to set up a video-based dataset and model results.