Programmatically Compare Models#
Experimental Feature
Experimental features are under active development and may occasionally undergo API-breaking changes.
Download Quality Standard Result#
The Quality Standard should contain the key performance metrics a team uses to evaluate a model's performance on a dataset.
The SDK provides a function, `download_quality_standard_result`, to download a dataset's quality standard result. This enables users to automate processes surrounding a Quality Standard's result.
The return value is a multi-index DataFrame with indices `(stratification, test_case)` and columns `(model, eval_config, metric_group, metric)`.
Use Case: Continuous Integration#
To automate deployment decisions with Kolena, a team could:
- Define the metric requirements a model must meet to be considered for deployment.
- Upload model results as part of a CI/CD pipeline.
- Download the dataset's quality standard results and programmatically compare against the defined criteria.
- Proceed to the next stage of the CI/CD pipeline based on the outcome of the assessment. For instance:
- if a model surpasses all the defined thresholds, it is promoted.
- if a model partially surpasses the defined thresholds, it is a promotion candidate.
- if a model surpasses none of the defined thresholds, it is not promoted.
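The decision step above can be sketched as follows. The threshold and result values here are hypothetical; in a real pipeline the results would be read from the DataFrame returned by `download_quality_standard_result` rather than hard-coded.

```python
# Hypothetical metric requirements defined by the team.
thresholds = {"accuracy": 0.90, "f1": 0.85, "recall": 0.88}

# Hypothetical metrics for the candidate model, as downloaded from Kolena.
results = {"accuracy": 0.95, "f1": 0.83, "recall": 0.91}

# Count which requirements the model satisfies.
passed = [metric for metric, threshold in thresholds.items() if results[metric] >= threshold]

if len(passed) == len(thresholds):
    decision = "promote"            # surpasses all defined thresholds
elif passed:
    decision = "promotion-candidate"  # surpasses some, but not all, thresholds
else:
    decision = "do-not-promote"     # surpasses none of the thresholds

print(decision)  # promotion-candidate
```

A CI/CD job can map each outcome to an exit code or pipeline stage, e.g. failing the build on `do-not-promote` and flagging `promotion-candidate` for manual review.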