Density-Based Spatial Clustering of Applications with Noise
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96].
It is a density-based clustering non-parametric algorithm: given a set of observations in some space,
it groups together observations that are closely packed together (observations with many nearby neighbors),
marking as outliers observations that lie alone in low-density regions (whose nearest neighbors are too far away).
Details
Given the set
of
n
p
-dimensional feature vectors (further referred as observations),
a positive floating-point number epsilon
and a positive integer minObservations
,
the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:- core observation
- An observationxis called core observation if at leastminObservationsinput observations (includingx) are within distanceepsilonfrom observationx;
- directly reachable
- An observationyis directly reachable fromxifyis within distanceepsilonfrom core observationx. Observations are only said to be directly reachable from core observations.
- reachable
- An observationyis reachable from an observationxif there is a path
with
and
, where each
is directly reachable from
. This implies that all observations on the path must be core observations, with the possible exception of
y. - noise observation
- Noise observations are observations that are not reachable from any other observation.
- cluster
- Two observationsxandyare considered to be in the same cluster if there is a core observationz, andxandyare both reachable fromz.
Each cluster gets a unique identifier, an integer number from
.
Each observation is assigned an identifier of the cluster it belongs to,
or
if the observation considered to be a noise observation.
0
to
Examples
C++ (CPU)
Java*
There is no support for Java on GPU.
Batch Processing:
Distributed Processing:
Python* with DPC++ support
Batch Processing:
Python*