upload_dataset(name, df, *, id_fields=None, commit_tags=None, dataset_tags=None, append_only=False, description=None)

Create or update a dataset with the contents of the provided DataFrame df.

Updating id_fields

ID fields are used to associate model results (uploaded via upload_results) with datapoints in this dataset. When updating an existing dataset, update id_fields with caution.
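To make the role of ID fields concrete, the following is a conceptual sketch in plain Python of how result rows could be matched to datapoints by their ID values. It is not Kolena's implementation; the `locator` field and the `id_key` and `link_results` helpers are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def id_key(row: Row, id_fields: List[str]) -> Tuple[Any, ...]:
    """Build a hashable key from the ID field values of one row."""
    return tuple(row[field] for field in id_fields)


def link_results(datapoints: List[Row], results: List[Row], id_fields: List[str]) -> List[Row]:
    """Merge each result row into the datapoint that shares its ID values."""
    by_id = {id_key(dp, id_fields): dp for dp in datapoints}
    linked = []
    for result in results:
        key = id_key(result, id_fields)
        if key in by_id:  # results without a matching datapoint are skipped
            linked.append({**by_id[key], **result})
    return linked


datapoints = [
    {"locator": "s3://bucket/a.png", "label": "cat"},
    {"locator": "s3://bucket/b.png", "label": "dog"},
]
results = [{"locator": "s3://bucket/b.png", "prediction": "dog"}]

linked = link_results(datapoints, results, ["locator"])
```

In this model, changing `id_fields` changes the key used for matching, so rows that used to match may no longer do so. That is one way to read the advice to update id_fields with caution.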


Name Type Description Default
name str

The name of the dataset.

df Union[DataFrame, Iterator[DataFrame]]

A DataFrame or iterator of DataFrames. Provide an iterator to perform batch upload (example: csv_reader = pd.read_csv("PathToDataset.csv", chunksize=10)).

id_fields Optional[List[str]]

Optionally specify a list of ID fields used to link model results with the datapoints within a dataset. When unspecified, a suitable value is inferred from the columns of the provided df. Note that the values in the id_fields columns must be hashable.

commit_tags Optional[List[str]]

Optionally specify a list of tags to associate with the dataset commit.

dataset_tags Optional[List[str]]

Optionally specify a list of tags to associate with the dataset.

append_only bool

If False, all datapoints in the dataset are replaced by those in the input DataFrame, and existing datapoints absent from the input DataFrame are removed from the dataset. If True, new datapoints from the input DataFrame are added and existing datapoints that also appear in the input DataFrame are modified, but no datapoints are deleted from the dataset; this mode behaves like an UPSERT operation.

description Optional[str]

Optionally specify the description of the dataset.
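The difference between the two append_only modes can be modeled in a few lines of plain Python. This is only a sketch of the semantics described above, using lists of dictionaries in place of DataFrames; the `apply_upload` helper and the `id` and `text` fields are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def id_key(row: Row, id_fields: List[str]) -> Tuple[Any, ...]:
    """Build a hashable key from the ID field values of one row."""
    return tuple(row[field] for field in id_fields)


def apply_upload(
    existing: List[Row],
    incoming: List[Row],
    id_fields: List[str],
    append_only: bool = False,
) -> List[Row]:
    """Return the dataset contents after an upload, under the modeled semantics."""
    incoming_by_id = {id_key(row, id_fields): row for row in incoming}
    if not append_only:
        # Replace: only the incoming datapoints remain.
        return list(incoming_by_id.values())
    # Upsert: keep existing datapoints, overwrite matches, add new ones.
    merged = {id_key(row, id_fields): row for row in existing}
    merged.update(incoming_by_id)
    return list(merged.values())


existing = [{"id": 1, "text": "old one"}, {"id": 2, "text": "old two"}]
incoming = [{"id": 2, "text": "new two"}, {"id": 3, "text": "new three"}]

replaced = apply_upload(existing, incoming, ["id"])
upserted = apply_upload(existing, incoming, ["id"], append_only=True)
```

Here `replaced` no longer contains the datapoint with `id` 1, while `upserted` keeps it, updates `id` 2, and adds `id` 3.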


list_datasets()

List the names of all uploaded datasets.


Type Description

A list of the names of all uploaded datasets.

download_dataset(name, *, commit=None, include_extracted_properties=False, filters=None)

Download an entire dataset given its name.


Name Type Description Default
name str

The name of the dataset.

commit Optional[str]

The commit hash for version control. When None, the latest commit is used.

include_extracted_properties bool

If True, include Kolena extracted properties from automated extractions in the dataset as separate columns.

filters Optional[Filters]

[Experimental] Optional filter to specify which datapoints should be downloaded.



Type Description

A DataFrame containing the specified dataset.

EvalConfig = Optional[Dict[str, Any]] module-attribute

User-defined configuration for evaluating results, for example {"threshold": 7}.

DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]] module-attribute

A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.

EvalConfigResults

Bases: NamedTuple

Named tuple where the first element (the eval_config field) is an evaluation configuration, and the second element (the results field) is the corresponding DataFrame of results.
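Because this is a named tuple, each entry can be read by field name or unpacked by position. The sketch below uses a stand-in class, `EvalConfigResultsSketch`, defined locally so that it runs without the Kolena client; a list of dictionaries takes the place of the results DataFrame, and the `id` and `score` columns are hypothetical.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class EvalConfigResultsSketch(NamedTuple):
    """Local stand-in mirroring the two fields described above."""

    eval_config: Optional[Dict[str, Any]]
    results: List[Dict[str, Any]]  # stands in for a DataFrame of results


entry = EvalConfigResultsSketch(
    eval_config={"threshold": 0.5},
    results=[{"id": 1, "score": 0.9}],
)

# Access by field name.
config_by_name = entry.eval_config

# Positional unpacking, as used when looping over a list of entries.
eval_config, rows = entry
```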

ModelEntity

The descriptor of a model tested on Kolena.

name: str instance-attribute

Unique name of the model.

tags: List[str] instance-attribute

Tags associated with the model.

metadata: Optional[Dict[str, Union[StrictInt, StrictFloat, StrictStr, None]]] = None class-attribute instance-attribute

Metadata associated with the model.

download_results(dataset, model, commit=None, include_extracted_properties=False)

Download results given dataset name and model name.

Concatenate the dataset with its results:

import pandas as pd
from kolena.dataset import download_results

df_dp, results = download_results("dataset name", "model name")
for eval_config, df_result in results:
    df_combined = pd.concat([df_dp, df_result], axis=1)


Name Type Description Default
dataset str

The name of the dataset.

model str

The name of the model.

commit Optional[str]

The commit hash for version control. When None, the latest commit is used.

include_extracted_properties bool

If True, include Kolena extracted properties from automated extractions in the datapoints and results as separate columns.



Type Description
Tuple[DataFrame, List[EvalConfigResults]]

Tuple of DataFrame of datapoints and list of EvalConfigResults.

upload_results(dataset, model, results, thresholded_fields=None, tags=[], metadata=None)

Upload the results of a specified model on a given dataset.


Name Type Description Default
dataset str

The name of the dataset.

model str

The name of the model.

results Union[DataFrame, List[EvalConfigResults]]

Either a DataFrame or a list of EvalConfigResults.

thresholded_fields Optional[List[str]]

Optional list of columns in the results DataFrame that contain data associated with different thresholds.

tags List[str]

Optional list of tags to be associated with the model.

metadata Optional[Dict[str, Union[StrictInt, StrictFloat, StrictStr, None]]]

Optional dictionary of string keys to values to be associated with the model.
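When results depend on an evaluation configuration, the results argument can be a list that pairs each configuration with its own results. The sketch below builds such a list for two thresholds using plain Python rows in place of DataFrames. The `locator`, `score`, and `is_positive` columns and the `results_for_threshold` helper are hypothetical; in real use each list of rows would be a DataFrame.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical raw model outputs, keyed by an assumed "locator" ID field.
raw_scores: List[Row] = [
    {"locator": "a.png", "score": 0.91},
    {"locator": "b.png", "score": 0.42},
]


def results_for_threshold(rows: List[Row], threshold: float) -> List[Row]:
    """Derive per-datapoint results for one threshold."""
    return [{**row, "is_positive": row["score"] >= threshold} for row in rows]


# One (eval_config, results) pair per threshold.
results: List[Tuple[Dict[str, Any], List[Row]]] = [
    ({"threshold": threshold}, results_for_threshold(raw_scores, threshold))
    for threshold in (0.3, 0.5)
]
```

Each pair keeps the configuration next to the results it produced, so results for different thresholds stay distinguishable.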





get_models(dataset)

Get all models with results on a given dataset.


Name Type Description Default
dataset str

The name of the dataset.



Type Description

A list of models tested on the given dataset.

upload_dataset_embeddings(dataset_name, key, df_embedding)

Upload a list of search embeddings for a dataset.


Name Type Description Default
dataset_name str

String value indicating the name of the dataset for which the embeddings will be uploaded.

key str

String value uniquely corresponding to the embedding vectors. For example, this can be the name of the embedding model along with the column with which the embedding was extracted, such as resnet50-image_locator.

df_embedding DataFrame

DataFrame containing the ID fields that identify datapoints in the dataset, together with the associated embeddings as numpy.typing.ArrayLike of numeric values.



Type Description

The given dataset does not exist.


The provided input is not valid.
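As a rough illustration of the inputs described above, the sketch below composes a key in the suggested style and checks that each embedding is a non-empty sequence of numbers. It uses plain Python rows rather than a DataFrame. The `embedding` column name, the `image_locator` ID field, and both helper functions are assumptions made for this example.

```python
from numbers import Real
from typing import Any, Dict, List, Sequence


def embedding_key(model_name: str, column: str) -> str:
    """Compose a key from the embedding model name and the source column."""
    return f"{model_name}-{column}"


def is_numeric_vector(values: Sequence[Any]) -> bool:
    """Check that a vector is non-empty and holds only real numbers."""
    return len(values) > 0 and all(
        isinstance(value, Real) and not isinstance(value, bool) for value in values
    )


# Hypothetical rows: one ID field plus the embedding vector.
rows: List[Dict[str, Any]] = [
    {"image_locator": "s3://bucket/a.png", "embedding": [0.12, -0.40, 0.88]},
    {"image_locator": "s3://bucket/b.png", "embedding": [0.05, 0.31, -0.27]},
]

key = embedding_key("resnet50", "image_locator")
valid = all(is_numeric_vector(row["embedding"]) for row in rows)
```

Checking vectors locally before uploading is one way to catch malformed rows early; the validation Kolena itself performs may differ.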

Filters

Filters to be applied on the dataset during the operation. Currently only used as an optional argument in download_dataset.

datapoint: Dict[str, GeneralFieldFilter] = field(default_factory=dict) class-attribute instance-attribute

Dictionary of a field name of the datapoint to the GeneralFieldFilter to be applied on the field. In case of nested objects, use . as the delimiter to separate the keys. For example, if you have a ground_truth column of Label type, you can use ground_truth.label as the key to query for the class label.

GeneralFieldFilter

Generic representation of a filter on Kolena.

value_in: Optional[List[Union[StrictStr, StrictBool]]] = None class-attribute instance-attribute

A list of desired categorical values.

null_value: Optional[Literal[True]] = None class-attribute instance-attribute

Whether to filter for cases where the field has a null value or the field name does not exist.