# kolena.dataset

- Examples: kolena/examples/dataset ↗
## upload_dataset(name, df, *, id_fields=None, commit_tags=None, dataset_tags=None, append_only=False, description=None)

Create or update a dataset with the contents of the provided DataFrame `df`.

**Updating id_fields:** ID fields are used to associate model results (uploaded via `upload_results`) with datapoints in this dataset. When updating an existing dataset, update `id_fields` with caution.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the dataset. | required |
`df` | `Union[DataFrame, Iterator[DataFrame]]` | A DataFrame or iterator of DataFrames. Provide an iterator to perform a batch upload. | required |
`id_fields` | `Optional[List[str]]` | Optionally specify a list of ID fields that will be used to link model results with the datapoints within a dataset. When unspecified, a suitable value is inferred from the columns of the provided DataFrame. | `None` |
`commit_tags` | `Optional[List[str]]` | Optionally specify a list of tags to associate with the dataset commit. | `None` |
`dataset_tags` | `Optional[List[str]]` | Optionally specify a list of tags to associate with the dataset. | `None` |
`append_only` | `bool` | If `True`, datapoints are only added or updated; existing datapoints absent from `df` are not removed. | `False` |
`description` | `Optional[str]` | Optionally specify the description of the dataset. | `None` |
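A minimal usage sketch: the import path follows the module name above, and the column names (`locator`, `ground_truth`) are illustrative assumptions rather than required fields.

```python
import pandas as pd
from kolena.dataset import upload_dataset

# Illustrative datapoints; "locator" and "ground_truth" are hypothetical column names.
df = pd.DataFrame(
    {
        "locator": ["s3://bucket/images/0001.png", "s3://bucket/images/0002.png"],
        "ground_truth": ["cat", "dog"],
    }
)

# "locator" becomes the ID field used later to link model results to these datapoints.
upload_dataset("example-dataset", df, id_fields=["locator"])
```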
## list_datasets()

List the names of all uploaded datasets.

Returns:

Type | Description |
---|---|
`List[str]` | A list of the names of all uploaded datasets. |
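A short sketch, assuming the same import path:

```python
from kolena.dataset import list_datasets

# Print the name of every uploaded dataset.
for name in list_datasets():
    print(name)
```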
## download_dataset(name, *, commit=None, include_extracted_properties=False)

Download an entire dataset given its name.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the dataset. | required |
`commit` | `Optional[str]` | The commit hash for version control. The latest commit is used when this value is `None`. | `None` |
`include_extracted_properties` | `bool` | If `True`, include Kolena-extracted properties from automated extractions in the dataset as separate columns. | `False` |

Returns:

Type | Description |
---|---|
`DataFrame` | A DataFrame containing the specified dataset. |
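A minimal sketch, reusing the dataset name from the earlier example; the commit hash shown is a placeholder, not a real value.

```python
from kolena.dataset import download_dataset

# Download the latest version of the dataset (commit=None).
df = download_dataset("example-dataset")

# Pin to a specific commit hash for reproducibility ("<commit-hash>" is a placeholder).
df_pinned = download_dataset("example-dataset", commit="<commit-hash>")
```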
## EvalConfig (module attribute)

`EvalConfig = Optional[Dict[str, Any]]`

User-defined configuration for evaluating results, for example `{"threshold": 7}`.

## DataFrame (module attribute)

`DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]`

A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.
## EvalConfigResults

Bases: `NamedTuple`

Named tuple where the first element (the `eval_config` field) is an evaluation configuration, and the second element (the `results` field) is the corresponding DataFrame of results.
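A small sketch of constructing and reading the named tuple, assuming `EvalConfigResults` is importable from `kolena.dataset` as listed here:

```python
import pandas as pd
from kolena.dataset import EvalConfigResults  # import path assumed from this module listing

pair = EvalConfigResults(eval_config={"threshold": 7}, results=pd.DataFrame({"score": [0.9, 0.4]}))
print(pair.eval_config)  # {'threshold': 7}
print(pair.results)      # the corresponding results DataFrame
```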
## ModelEntity
## download_results(dataset, model, commit=None, include_extracted_properties=False)

Download results given dataset name and model name.

Concatenate the dataset with results:

```python
import pandas as pd
from kolena.dataset import download_results

df_dp, results = download_results("dataset name", "model name")
for eval_config, df_result in results:
    df_combined = pd.concat([df_dp, df_result], axis=1)
```
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`dataset` | `str` | The name of the dataset. | required |
`model` | `str` | The name of the model. | required |
`commit` | `Optional[str]` | The commit hash for version control. The latest commit is used when this value is `None`. | `None` |
`include_extracted_properties` | `bool` | If `True`, include Kolena-extracted properties from automated extractions in the datapoints and results as separate columns. | `False` |

Returns:

Type | Description |
---|---|
`Tuple[DataFrame, List[EvalConfigResults]]` | Tuple of a DataFrame of datapoints and a list of `EvalConfigResults`. |
## upload_results(dataset, model, results, thresholded_fields=None, tags=[])

Upload the results from a specified model on a given dataset.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`dataset` | `str` | The name of the dataset. | required |
`model` | `str` | The name of the model. | required |
`results` | `Union[DataFrame, List[EvalConfigResults]]` | Either a DataFrame or a list of `EvalConfigResults`. | required |
`thresholded_fields` | `Optional[List[str]]` | Optional columns in the result DataFrame containing data associated with different thresholds. | `None` |
`tags` | `List[str]` | Optional list of tags to be associated with the model. | `[]` |

Returns:

Type | Description |
---|---|
`None` | None |
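A minimal sketch, reusing the hypothetical `locator` ID field from the `upload_dataset` example; column names and values are illustrative. Per the signature above, a list of `EvalConfigResults` can be passed instead of a single DataFrame to upload results under multiple evaluation configurations.

```python
import pandas as pd
from kolena.dataset import upload_results

# Results are joined to datapoints via the dataset's ID field(s) ("locator" here).
df_results = pd.DataFrame(
    {
        "locator": ["s3://bucket/images/0001.png", "s3://bucket/images/0002.png"],
        "prediction": ["cat", "cat"],
        "confidence": [0.97, 0.42],
    }
)

upload_results("example-dataset", "example-model", df_results)
```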
## get_models(dataset)

Get all models with results on a given dataset.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`dataset` | `str` | The name of the dataset. | required |

Returns:

Type | Description |
---|---|
`List[ModelEntity]` | A list of models tested on the given dataset. |
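A short sketch listing the models with results on a dataset:

```python
from kolena.dataset import get_models

# Each entry is a ModelEntity describing a model tested on the dataset.
for model in get_models("example-dataset"):
    print(model)
```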
## upload_dataset_embeddings(dataset_name, key, df_embedding)

Upload a list of search embeddings for a dataset.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`dataset_name` | `str` | The name of the dataset for which the embeddings will be uploaded. | required |
`key` | `str` | String value uniquely identifying the embedding vectors; for example, the name of the embedding model along with the column from which the embedding was extracted. | required |
`df_embedding` | `DataFrame` | DataFrame containing ID fields for identifying datapoints in the dataset and the associated embedding vectors. | required |

Raises:

Type | Description |
---|---|
`NotFoundError` | The given dataset does not exist. |
`InputValidationError` | The provided input is not valid. |
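A minimal sketch under stated assumptions: the ID column, the embedding column name, the vector format, and the key value are all illustrative; the parameters documented above only require that `df_embedding` carry the dataset's ID fields plus the embeddings and that `key` uniquely identify them.

```python
import numpy as np
import pandas as pd
from kolena.dataset import upload_dataset_embeddings

# "locator" matches the hypothetical ID field from the upload_dataset example;
# "embedding" is an assumed column name holding one vector per datapoint.
df_embedding = pd.DataFrame(
    {
        "locator": ["s3://bucket/images/0001.png", "s3://bucket/images/0002.png"],
        "embedding": [np.random.rand(512), np.random.rand(512)],
    }
)

upload_dataset_embeddings(
    dataset_name="example-dataset",
    key="clip-vit-b32-locator",  # hypothetical key: embedding model name + source column
    df_embedding=df_embedding,
)
```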