Experimental Features

kolena._experimental.quality_standard

download_quality_standard_result(dataset, models, metric_groups=None, intersect_results=True)

Download the quality standard result for a dataset and a list of models.

Parameters:

Name Type Description Default
dataset str

The name of the dataset.

required
models List[str]

The names of the models.

required
metric_groups Union[List[str], None]

The names of the metric groups to include in the result. All metric groups are included when this value is None.

None
intersect_results bool

If True, only datapoints common to all models are included in the metrics calculation.

True

Returns:

Type Description
DataFrame

A DataFrame containing the quality standard result.
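A minimal usage sketch, assuming the function is importable from kolena._experimental.quality_standard as documented above and that the kolena package and valid Kolena credentials are available. The wrapper fetch_quality_standard and the helper common_datapoints are hypothetical: the helper is not part of Kolena and only illustrates the documented intersect_results=True rule that metrics are computed over the datapoints common to all models.

```python
from typing import Dict, List, Optional, Set


def common_datapoints(datapoints_by_model: Dict[str, Set[str]]) -> Set[str]:
    """Hypothetical helper: the datapoint IDs present for every model.

    Mirrors the documented intersect_results=True behavior, where only
    datapoints common to all models enter the metrics calculation.
    """
    id_sets = list(datapoints_by_model.values())
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


def fetch_quality_standard(
    dataset: str,
    models: List[str],
    metric_groups: Optional[List[str]] = None,
):
    # Imported lazily so this module loads without the kolena package installed.
    from kolena._experimental.quality_standard import download_quality_standard_result

    # metric_groups=None includes all metric groups in the result.
    return download_quality_standard_result(
        dataset,
        models,
        metric_groups=metric_groups,
        intersect_results=True,
    )
```

For example, fetch_quality_standard("my-dataset", ["model-a", "model-b"]) would return the DataFrame described above; the dataset and model names here are placeholders.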

kolena._experimental.search

upload_dataset_embeddings(dataset_name, key, df_embedding)

Upload a list of search embeddings for a dataset.

Parameters:

Name Type Description Default
dataset_name str

String value indicating the name of the dataset for which the embeddings will be uploaded.

required
key str

String value uniquely corresponding to the model used to extract the embedding vectors. This is typically a locator.

required
df_embedding DataFrame

DataFrame containing the ID fields that identify datapoints in the dataset, along with the associated embeddings as numpy.typing.ArrayLike of numeric values.

required

Raises:

Type Description
NotFoundError

The given dataset does not exist.

InputValidationError

The provided input is not valid.
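The sketch below assembles df_embedding rows before upload. It assumes the dataset's ID field is named locator and the embedding column is named embedding; both names, and the helpers build_embedding_rows and upload, are hypothetical, since the reference above only states that df_embedding holds the ID fields plus an array-like of numeric values per datapoint. The dimension check is a local sanity check, not a description of the rules behind InputValidationError.

```python
from typing import Dict, List, Sequence

ID_FIELD = "locator"           # assumption: the dataset's ID field
EMBEDDING_FIELD = "embedding"  # assumption: name of the embedding column


def build_embedding_rows(embeddings: Dict[str, Sequence[float]]) -> List[dict]:
    """Hypothetical helper: turn {datapoint ID: vector} into one row per datapoint."""
    rows = []
    dims = set()
    for datapoint_id, vector in embeddings.items():
        values = [float(v) for v in vector]  # raises on non-numeric entries
        dims.add(len(values))
        rows.append({ID_FIELD: datapoint_id, EMBEDDING_FIELD: values})
    if len(dims) > 1:
        raise ValueError(f"embeddings have inconsistent dimensions: {sorted(dims)}")
    return rows


def upload(dataset_name: str, key: str, rows: List[dict]) -> None:
    # Lazy imports: pandas and kolena are only needed for the actual upload.
    import pandas as pd
    from kolena._experimental.search import upload_dataset_embeddings

    upload_dataset_embeddings(dataset_name, key, pd.DataFrame(rows))
```

Python lists of floats already satisfy numpy.typing.ArrayLike, so no explicit conversion to NumPy arrays is needed here. As documented above, key identifies the model that produced the vectors and is typically a locator.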

kolena._experimental.object_detection

compute_object_detection_results(dataset_name, df, *, ground_truths_field='ground_truths', raw_inferences_field='raw_inferences', gt_ignore_property=None, iou_threshold=0.5, threshold_strategy='F1-Optimal', min_confidence_score=0.01, batch_size=10000)

Compute the model's metrics for the dataset.

The DataFrame df should include a locator column that matches the locator of the corresponding datapoint, and an inference column containing a list of scored BoundingBoxes.

Parameters:

Name Type Description Default
dataset_name str

Dataset name.

required
df DataFrame

DataFrame of model results.

required
ground_truths_field str

Field name in datapoint with ground truth bounding boxes, defaulting to "ground_truths".

'ground_truths'
raw_inferences_field str

Column in model result DataFrame with raw inference bounding boxes, defaulting to "raw_inferences".

'raw_inferences'
gt_ignore_property Optional[str]

Field on the ground truth bounding boxes used to determine if the bounding box should be ignored. Bounding boxes will be ignored if this field exists and is equal to True.

None
iou_threshold float

The IoU (intersection over union) threshold, defaulting to 0.5.

0.5
threshold_strategy Union[Literal['F1-Optimal'], float, Dict[str, float]]

The confidence threshold strategy. It can either be a fixed confidence threshold such as 0.5 or 0.75, or "F1-Optimal" to find the threshold maximizing F1 score.

'F1-Optimal'
min_confidence_score float

The minimum confidence score to consider for the evaluation. This is usually set to reduce noise by excluding low-confidence inferences.

0.01
batch_size int

Number of results to process per iteration.

10000

Returns:

Type Description
DataFrame

A DataFrame of the computed results.
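To make iou_threshold, min_confidence_score, and gt_ignore_property concrete, here is a simplified, hypothetical matcher for a single image. Boxes are plain (x1, y1, x2, y2) tuples, inferences below the minimum score are dropped, and each remaining inference, in descending score order, greedily claims the unmatched ground truth with the highest IoU at or above the threshold. This is a common textbook scheme and not necessarily the exact matching Kolena performs; in particular, counting an inference that matches an ignored ground truth as neither a true nor a false positive is an assumption of this sketch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_image(
    gts: List[Tuple[Box, bool]],    # (box, ignore flag)
    infs: List[Tuple[Box, float]],  # (box, confidence score)
    iou_threshold: float = 0.5,
    min_confidence_score: float = 0.01,
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one image."""
    candidates = sorted(
        (item for item in infs if item[1] >= min_confidence_score),
        key=lambda item: item[1],
        reverse=True,
    )
    matched = [False] * len(gts)
    tp = fp = 0
    for box, _score in candidates:
        best_idx, best_iou = -1, iou_threshold
        for idx, (gt_box, _ignored) in enumerate(gts):
            overlap = iou(box, gt_box)
            if not matched[idx] and overlap >= best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx < 0:
            fp += 1  # no unmatched ground truth overlaps enough
        else:
            matched[best_idx] = True
            if not gts[best_idx][1]:  # matches to ignored boxes count as neither TP nor FP
                tp += 1
    # Unmatched, non-ignored ground truths are misses.
    fn = sum(1 for idx, (_, ignored) in enumerate(gts) if not matched[idx] and not ignored)
    return tp, fp, fn
```

The tuples keep the sketch dependency-free; the df passed to compute_object_detection_results instead carries scored BoundingBoxes in its raw_inferences column, as described above.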

upload_object_detection_results(dataset_name, model_name, df, *, ground_truths_field='ground_truths', raw_inferences_field='raw_inferences', gt_ignore_property=None, iou_threshold=0.5, threshold_strategy='F1-Optimal', min_confidence_score=0.01, batch_size=10000, required_match_fields=None)

Compute metrics with compute_object_detection_results and upload the model's results for the dataset.

The DataFrame df should include a locator column that matches the locator of the corresponding datapoint, and an inference column containing a list of scored BoundingBoxes.

Parameters:

Name Type Description Default
dataset_name str

Dataset name.

required
model_name str

Model name.

required
df DataFrame

DataFrame of model results.

required
ground_truths_field str

Field name in datapoint with ground truth bounding boxes, defaulting to "ground_truths".

'ground_truths'
raw_inferences_field str

Column in model result DataFrame with raw inference bounding boxes, defaulting to "raw_inferences".

'raw_inferences'
gt_ignore_property Optional[str]

Name of a property on the ground truth bounding boxes used to determine if the bounding box should be ignored. Bounding boxes will be ignored if this property exists and is equal to True.

None
iou_threshold float

The IoU (intersection over union) threshold, defaulting to 0.5.

0.5
threshold_strategy Union[Literal['F1-Optimal'], float, Dict[str, float]]

The confidence threshold strategy. It can either be a fixed confidence threshold such as 0.5 or 0.75, or "F1-Optimal" to find the threshold maximizing F1 score.

'F1-Optimal'
min_confidence_score float

The minimum confidence score to consider for the evaluation. This is usually set to reduce noise by excluding low-confidence inferences.

0.01
batch_size int

Number of results to process per iteration.

10000
required_match_fields Optional[List[str]]

Optionally specify a list of fields that must match between the inference and ground truth for them to be considered a match.

None

Returns:

Type Description
None
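The "F1-Optimal" strategy picks the confidence threshold that maximizes F1. The sketch below shows one straightforward way to find it, assuming every inference has already been matched by a score-ordered greedy matcher, so each carries a confidence score and a true-positive flag, and that the total number of non-ignored ground truths is known. It illustrates the idea only and is not Kolena's implementation; the function name f1_optimal_threshold is hypothetical.

```python
from typing import List, Tuple


def f1_optimal_threshold(
    scored_matches: List[Tuple[float, bool]],  # (confidence score, is true positive)
    num_ground_truths: int,
) -> Tuple[float, float]:
    """Return (threshold, f1) maximizing F1 over the observed confidence scores.

    Keeping inferences with score >= t gives TP(t) true positives out of
    kept(t) inferences, so FP = kept - TP, FN = G - TP, and
    F1 = 2TP / (2TP + FP + FN) = 2TP / (kept + G).
    """
    best_threshold, best_f1 = 1.0, 0.0
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)
    tp = 0
    for kept, (score, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        # Evaluate only at the last occurrence of a tied score: a threshold
        # cannot keep some tied inferences and drop the others.
        if kept < len(ranked) and ranked[kept][0] == score:
            continue
        denom = kept + num_ground_truths
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = score, f1
    return best_threshold, best_f1
```

Because greedy matching in descending score order never lets a lower-scored inference change the matches of higher-scored ones, raising the threshold only truncates the ranked list, which is why a single pass suffices. Passing a fixed float such as 0.5 as threshold_strategy skips this search entirely.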

kolena._experimental.instance_segmentation

upload_instance_segmentation_results(dataset_name, model_name, df, *, ground_truths_field='ground_truths', raw_inferences_field='raw_inferences', iou_threshold=0.5, threshold_strategy='F1-Optimal', min_confidence_score=0.01, batch_size=10000)

Compute metrics and upload the model's results for the dataset.

The DataFrame df should include a locator column that matches the locator of the corresponding datapoint, and an inference column containing a list of scored Polygons.

Parameters:

Name Type Description Default
dataset_name str

Dataset name.

required
model_name str

Model name.

required
df DataFrame

DataFrame of model results.

required
ground_truths_field str

Field name in datapoint with ground truth polygons, defaulting to "ground_truths".

'ground_truths'
raw_inferences_field str

Column in model result DataFrame with raw inference polygons, defaulting to "raw_inferences".

'raw_inferences'
iou_threshold float

The IoU (intersection over union) threshold, defaulting to 0.5.

0.5
threshold_strategy Union[Literal['F1-Optimal'], float, Dict[str, float]]

The confidence threshold strategy. It can either be a fixed confidence threshold such as 0.5 or 0.75, or "F1-Optimal" to find the threshold maximizing F1 score.

'F1-Optimal'
min_confidence_score float

The minimum confidence score to consider for the evaluation. This is usually set to reduce noise by excluding low-confidence inferences.

0.01
batch_size int

Number of results to process per iteration.

10000

Returns:

Type Description
None
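The sketch below prepares the df described above from plain model output: one row per datapoint with a locator and a raw_inferences list of scored polygons. The helpers to_result_rows and upload are hypothetical, the default column names follow the documented defaults, and the kolena.annotation.ScoredPolygon(points=..., score=...) constructor used inside upload is an assumption about the SDK rather than something this page documents.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def to_result_rows(
    predictions: Dict[str, List[Tuple[Sequence[Point], float]]],
) -> List[dict]:
    """Hypothetical helper: {locator: [(polygon points, score), ...]} -> result rows.

    Each row carries the datapoint's locator and a "raw_inferences" list of
    plain dicts, which upload() converts to Kolena polygons.
    """
    rows = []
    for locator, polygons in predictions.items():
        inferences = []
        for points, score in polygons:
            pts = [(float(x), float(y)) for x, y in points]
            if len(pts) < 3:
                raise ValueError(f"{locator}: a polygon needs at least 3 points")
            inferences.append({"points": pts, "score": float(score)})
        rows.append({"locator": locator, "raw_inferences": inferences})
    return rows


def upload(dataset_name: str, model_name: str, rows: List[dict]) -> None:
    # Lazy imports; ScoredPolygon(points=..., score=...) is an assumed constructor.
    import pandas as pd
    from kolena.annotation import ScoredPolygon
    from kolena._experimental.instance_segmentation import upload_instance_segmentation_results

    df = pd.DataFrame(
        {
            "locator": [row["locator"] for row in rows],
            "raw_inferences": [
                [ScoredPolygon(points=inf["points"], score=inf["score"]) for inf in row["raw_inferences"]]
                for row in rows
            ],
        }
    )
    upload_instance_segmentation_results(dataset_name, model_name, df)
```

Keeping the conversion in a pure function makes the row layout easy to check locally before any upload; low-scoring polygons can be left in, since min_confidence_score filters them during evaluation.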

kolena._experimental.trace

kolena_trace(func=None, *, dataset_name=None, model_name=None, model_name_field=None, sync_interval=THIRTY_SECONDS, id_fields=None, record_timestamp=True)

Use this decorator to trace a function with Kolena: the function's inputs are sent as datapoints and its outputs as results. For example:

from kolena._experimental.trace import kolena_trace

@kolena_trace(dataset_name="test_trace", id_fields=["request_id"], record_timestamp=False)
def predict(data, request_id):
    pass  # model inference goes here; the return value is recorded as the result
OR

@kolena_trace
def predict(data, request_id):
    pass

Parameters:

Name Type Description Default
func Optional[Callable]

The function to be traced. This is populated automatically when kolena_trace is used as a decorator.

None
dataset_name Optional[str]

The name of the dataset to be created. If not provided, the function name is used.

None
model_name Optional[str]

The name of the model to be created. If not provided, the function name suffixed with _model is used.

None
model_name_field Optional[str]

The input field whose value should be used as the model name. If provided, this overrides the model name.

None
sync_interval int

The interval at which data is synced to the server. Defaults to 30 seconds.

THIRTY_SECONDS
id_fields Optional[List[str]]

The input fields to use as ID fields. If not provided, a default ID field is used.

None
record_timestamp bool

If True, the timestamps of the input and output and the elapsed time are recorded. Defaults to True.

True
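To illustrate what a tracing decorator of this shape captures, here is a hypothetical, local stand-in named trace_locally. It is not Kolena's implementation and sends nothing anywhere: each call's bound arguments become a datapoint record, the return value becomes the result, id_fields pick the identifying inputs, and record_timestamp controls whether start time, end time, and elapsed time are attached. The record field names and the in-memory TRACE_BUFFER are assumptions; the real kolena_trace additionally creates the dataset and model and syncs data every sync_interval. Like kolena_trace, it supports both the bare and the parameterized decorator forms via the func=None pattern.

```python
import functools
import inspect
import time
from typing import Any, Callable, Dict, List, Optional

# In-memory stand-in for records awaiting a sync; nothing is sent anywhere.
TRACE_BUFFER: List[Dict[str, Any]] = []


def trace_locally(
    func: Optional[Callable] = None,
    *,
    id_fields: Optional[List[str]] = None,
    record_timestamp: bool = True,
):
    """Hypothetical local tracer supporting both @trace_locally and @trace_locally(...)."""

    def decorator(fn: Callable) -> Callable:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Map positional and keyword arguments onto parameter names: the "datapoint".
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            datapoint = dict(bound.arguments)

            start_time = time.time()
            t0 = time.perf_counter()
            output = fn(*args, **kwargs)
            elapsed = time.perf_counter() - t0

            record: Dict[str, Any] = {"datapoint": datapoint, "result": output}
            if id_fields:
                # Identify the datapoint by the requested input fields.
                record["id"] = {field: datapoint[field] for field in id_fields}
            if record_timestamp:
                record.update(start_time=start_time, end_time=start_time + elapsed, time_elapsed=elapsed)
            TRACE_BUFFER.append(record)
            return output

        return wrapper

    # Bare @trace_locally passes the function in directly; @trace_locally(...) does not.
    return decorator(func) if func is not None else decorator
```

Elapsed time is measured with time.perf_counter, which is monotonic, while the wall-clock start time comes from time.time.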