Programmatically Compare Models#
Experimental Feature
Experimental features are under active development and may occasionally undergo API-breaking changes.
Download Quality Standard Result#
The Quality Standard should contain the key performance metrics a team uses to evaluate a model's performance on a dataset.
The SDK provides a function, `download_quality_standard_result`, to download a dataset's quality standard result. This enables users to automate processes surrounding a Quality Standard's result.
The return value is a multi-index DataFrame with indices `(stratification, test_case)` and columns `(model, eval_config, metric_group, metric)`.
Use Case: Continuous Integration#
To automate deployment decisions with Kolena, a team could:
- Define the metric requirements a model must meet to be considered for deployment.
- Upload model results as part of a CI/CD pipeline.
- Download the dataset's quality standard results and programmatically compare against the defined criteria.
- Proceed to the next stage of the CI/CD pipeline based on the outcome of the assessment. For instance:
- if a model surpasses all the defined thresholds, it is promoted.
- if a model partially surpasses the defined thresholds, it is a promotion candidate.
- if a model surpasses none of the defined thresholds, it is not promoted.
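The decision step above can be sketched as follows. The threshold and result values here are hypothetical; in a real pipeline the results would be read from the DataFrame returned by `download_quality_standard_result` rather than hard-coded.

```python
# Hypothetical metric requirements defined by the team.
thresholds = {"accuracy": 0.90, "f1": 0.85, "recall": 0.88}

# Hypothetical metrics for the candidate model, as downloaded from Kolena.
results = {"accuracy": 0.95, "f1": 0.83, "recall": 0.91}

# Count which requirements the model satisfies.
passed = [metric for metric, threshold in thresholds.items() if results[metric] >= threshold]

if len(passed) == len(thresholds):
    decision = "promote"            # surpasses all defined thresholds
elif passed:
    decision = "promotion-candidate"  # surpasses some, but not all, thresholds
else:
    decision = "do-not-promote"     # surpasses none of the thresholds

print(decision)  # promotion-candidate
```

A CI/CD job can map each outcome to an exit code or pipeline stage, e.g. failing the build on `do-not-promote` and flagging `promotion-candidate` for manual review.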