Density Score#
The density_score
quantifies the local data density surrounding each data point within the 2D UMAP embedding space.
It is calculated by applying a Kernel Density Estimation (KDE) using a Gaussian kernel on the two-dimensional embeddings
of the dataset. The bandwidth parameter of the KDE (defaulted to 0.1 but tunable) controls the smoothness of
the density estimation, influencing how local or global the density estimation is. The resulting density_score
reflects
the density distribution of data points in the 2D space.
Note
To further assist users with data curation tasks, Kolena automatically calculated a number of metrics based on the embedding space details. Enable automatic embedding extractions or upload your own embeddings to utilize these scores.
Interpretation#
A higher density_score
indicates that a data point is located in a region with many neighboring points,
suggesting that it
represents a common pattern or feature within the dataset. Conversely, a lower density_score
suggests that the data point
resides in a sparsely populated region, potentially highlighting outliers or rare patterns. This metric is particularly
useful for visual data exploration, enabling you to identify clusters, anomalies, and understand the overall structure of
your dataset in the embeddings visualizer.