kolena.dataset
Examples: kolena/examples/dataset
upload_dataset(name, df, *, id_fields=None, commit_tags=None, append_only=False)
Create or update a dataset with the contents of the provided DataFrame `df`.
Updating id_fields
ID fields are used to associate model results (uploaded via `upload_results`) with datapoints in this dataset. When updating an existing dataset, update `id_fields` with caution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
df | Union[DataFrame, Iterator[DataFrame]] | A DataFrame or iterator of DataFrames. Provide an iterator to perform batch upload (example: …). | required |
id_fields | Optional[List[str]] | Optionally specify a list of ID fields that will be used to link model results with the datapoints within a dataset. When unspecified, a suitable value is inferred from the columns of the provided `df`. | None |
commit_tags | Optional[List[str]] | Optionally specify a list of tags to associate with the dataset commit. | None |
append_only | bool | If … | False |
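The batch-upload form of `df` can be sketched as follows. This is a minimal illustration, not the library's own example: the helper name `upload_csv_in_batches`, the CSV source, and the default `chunksize` are assumptions. It relies on `pd.read_csv(..., chunksize=...)` returning an iterator of DataFrames, which matches the `Iterator[DataFrame]` form accepted by `df`.

```python
from typing import List, Optional


def upload_csv_in_batches(
    name: str,
    csv_path: str,
    id_fields: Optional[List[str]] = None,
    chunksize: int = 1000,
) -> None:
    """Hypothetical helper: stream a CSV file to a dataset in chunks."""
    # Deferred imports: pandas and kolena are only needed when the helper runs
    import pandas as pd
    from kolena.dataset import upload_dataset

    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows
    batches = pd.read_csv(csv_path, chunksize=chunksize)
    upload_dataset(name, batches, id_fields=id_fields)
```

A call such as `upload_csv_in_batches("my-dataset", "datapoints.csv", id_fields=["locator"])` uses placeholder names throughout and assumes the Kolena client is already configured.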
list_datasets()
List the names of all uploaded datasets.
Returns:
Type | Description |
---|---|
List[str] | A list of the names of all uploaded datasets. |
download_dataset(name, *, commit=None, include_extracted_properties=False)
Download an entire dataset given its name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the dataset. | required |
commit | Optional[str] | The commit hash for version control. Get the latest commit when this value is `None`. | None |
include_extracted_properties | bool | If True, include Kolena extracted properties from automated extractions in the dataset as separate columns. | False |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the specified dataset. |
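As a rough sketch of how `list_datasets` and `download_dataset` can be combined, the hypothetical helper below checks that a dataset name exists before downloading it. The helper name `download_if_present` and its return convention are assumptions made for illustration.

```python
from typing import Any, Optional


def download_if_present(name: str, commit: Optional[str] = None) -> Optional[Any]:
    """Hypothetical helper: return the dataset, or None when the name is unknown."""
    # Deferred import: kolena is only needed when the helper runs
    from kolena.dataset import download_dataset, list_datasets

    if name not in list_datasets():
        return None
    # Passing commit=None requests the latest commit
    return download_dataset(name, commit=commit)
```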
EvalConfig = Optional[Dict[str, Any]]
module-attribute
User-defined configuration for evaluating results, for example `{"threshold": 7}`.
DataFrame = Union[pd.DataFrame, Iterator[pd.DataFrame]]
module-attribute
A type alias representing a DataFrame, which can be either a pandas DataFrame or an iterator of pandas DataFrames.
EvalConfigResults
Bases: NamedTuple
Named tuple where the first element (the `eval_config` field) is an evaluation configuration, and the second element (the `results` field) is the corresponding DataFrame of results.
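The two ways of reading such a named tuple can be illustrated with a stand-in type. `EvalConfigResultsSketch` below is a hypothetical class that only mirrors the two documented fields; it is not the library's class, and a plain list stands in for the DataFrame of results.

```python
from typing import Any, Dict, NamedTuple, Optional


class EvalConfigResultsSketch(NamedTuple):
    """Hypothetical stand-in mirroring the two documented fields."""

    eval_config: Optional[Dict[str, Any]]
    results: Any  # a pandas DataFrame in the documented type


entry = EvalConfigResultsSketch(
    eval_config={"threshold": 7},
    results=[{"id": 1, "score": 0.9}],  # placeholder rows instead of a DataFrame
)

# Positional unpacking: the first element is the config, the second the results
config, results = entry

# Access by field name refers to the same objects as positional access
same_config = entry.eval_config is config
same_results = entry.results is results
```

Positional unpacking is what makes a loop of the form `for eval_config, df_result in results` possible over a list of these tuples.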
ModelEntity
download_results(dataset, model, commit=None, include_extracted_properties=False)
Download results given dataset name and model name.
Concatenate the dataset with the results:
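    import pandas as pd
    from kolena.dataset import download_results
    df_dp, results = download_results("dataset name", "model name")
    for eval_config, df_result in results:
        # axis=1 places the result columns beside the datapoints, aligned on the index
        df_combined = pd.concat([df_dp, df_result], axis=1)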
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | The name of the dataset. | required |
model | str | The name of the model. | required |
commit | Optional[str] | The commit hash for version control. Get the latest commit when this value is `None`. | None |
include_extracted_properties | bool | If True, include Kolena extracted properties from automated extractions in the datapoints and results as separate columns. | False |
Returns:
Type | Description |
---|---|
Tuple[DataFrame, List[EvalConfigResults]] | Tuple of a DataFrame of datapoints and a list of EvalConfigResults. |
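When only one evaluation configuration is of interest, the returned list can be searched for it. The sketch below is an illustration under assumptions: `results_for_config` is a hypothetical helper, and plain strings stand in for the result DataFrames so the example runs without Kolena or pandas.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def results_for_config(
    results: Iterable[Tuple[Optional[Dict[str, Any]], Any]],
    eval_config: Optional[Dict[str, Any]],
) -> Optional[Any]:
    """Return the results paired with eval_config, or None when nothing matches."""
    for config, df_result in results:
        # Eval configs are plain dicts (or None), so equality comparison suffices
        if config == eval_config:
            return df_result
    return None


# Placeholder data shaped like the second element returned by download_results
sample = [
    ({"threshold": 5}, "results@5"),
    ({"threshold": 7}, "results@7"),
    (None, "results-default"),
]
picked = results_for_config(sample, {"threshold": 7})
```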
upload_results(dataset, model, results, thresholded_fields=None, tags=[])
Upload the results of a specified model on a given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | The name of the dataset. | required |
model | str | The name of the model. | required |
results | Union[DataFrame, List[EvalConfigResults]] | Either a DataFrame or a list of EvalConfigResults. | required |
thresholded_fields | Optional[List[str]] | Optional columns in result DataFrame containing data associated with different thresholds. | None |
tags | List[str] | Optional list of tags to be associated with the model. | [] |
Returns:
Type | Description |
---|---|
None | None |
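Uploading one set of results per evaluation configuration can be sketched as below. Several things here are assumptions rather than documented behavior: the helper names, the `id`, `score` and `is_positive` columns, a dataset whose ID field is `id`, and the import of `EvalConfigResults` from `kolena.dataset` (taken from this page's module heading).

```python
from typing import Any, Dict, List, Tuple


def rows_per_threshold(
    scores: Dict[str, float], thresholds: List[float]
) -> List[Tuple[Dict[str, Any], List[Dict[str, Any]]]]:
    """Pair each eval config with one result row per datapoint ID."""
    paired = []
    for threshold in thresholds:
        rows = [
            {"id": dp_id, "score": score, "is_positive": score >= threshold}
            for dp_id, score in scores.items()
        ]
        paired.append(({"threshold": threshold}, rows))
    return paired


def upload_per_config(
    dataset: str, model: str, scores: Dict[str, float], thresholds: List[float]
) -> None:
    """Hypothetical helper: upload one result DataFrame per eval config."""
    # Deferred imports: pandas and kolena are only needed for the actual upload
    import pandas as pd
    from kolena.dataset import EvalConfigResults, upload_results

    results = [
        EvalConfigResults(eval_config=config, results=pd.DataFrame(rows))
        for config, rows in rows_per_threshold(scores, thresholds)
    ]
    upload_results(dataset, model, results)
```

Keeping the row construction separate from the upload call makes the per-configuration logic easy to check without network access.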
get_models(dataset)
Get all models with results on a given dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | str | The name of the dataset. | required |
Returns:
Type | Description |
---|---|
List[ModelEntity] | A list of models tested on the given dataset. |