Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content
Density-Based Spatial Clustering of Applications with Noise

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a non-parametric, density-based clustering algorithm: given a set of observations in some space, it groups together observations that are closely packed (observations with many nearby neighbors) and marks as outliers the observations that lie alone in low-density regions (those whose nearest neighbors are too far away).

Details

Given the set $X = \{x_1, \ldots, x_n\}$ of $n$ $p$-dimensional feature vectors (hereafter referred to as observations), a positive floating-point number epsilon, and a positive integer minObservations, the problem is to get a clustering assignment for each input observation, based on the definitions below [Ester96]:
core observation
An observation $x$ is called a core observation if at least minObservations input observations (including $x$ itself) are within distance epsilon of $x$.
directly reachable
An observation $y$ is directly reachable from $x$ if $y$ is within distance epsilon of the core observation $x$. Observations are only said to be directly reachable from core observations.
reachable
An observation $y$ is reachable from an observation $x$ if there is a path $x_1, \ldots, x_m$ with $x_1 = x$ and $x_m = y$, where each $x_{i+1}$ is directly reachable from $x_i$. This implies that all observations on the path must be core observations, with the possible exception of $y$.
noise observation
Noise observations are observations that are not reachable from any other observation.
cluster
Two observations $x$ and $y$ are considered to be in the same cluster if there is a core observation $z$ such that $x$ and $y$ are both reachable from $z$.
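The definitions above can be transcribed almost literally into code. The following is a minimal sketch, not the library's implementation: it assumes Euclidean distance, reads "within distance epsilon" as "less than or equal to epsilon", and uses hypothetical helper names (`neighbors`, `is_core`, `directly_reachable`, `reachable`) that are not part of any API described in this guide.

```python
from collections import deque
from math import dist  # Euclidean distance between two points (Python 3.8+)


def neighbors(data, i, epsilon):
    """Indices of all observations within distance epsilon of data[i], including i."""
    return [j for j in range(len(data)) if dist(data[i], data[j]) <= epsilon]


def is_core(data, i, epsilon, min_observations):
    """Core observation: at least min_observations observations (itself included) are nearby."""
    return len(neighbors(data, i, epsilon)) >= min_observations


def directly_reachable(data, y, x, epsilon, min_observations):
    """y is directly reachable from x only if x is core and y lies within epsilon of x."""
    return is_core(data, x, epsilon, min_observations) and dist(data[x], data[y]) <= epsilon


def reachable(data, y, x, epsilon, min_observations):
    """Breadth-first search over chains of directly-reachable steps starting at x."""
    seen = {x}
    queue = deque([x])
    while queue:
        current = queue.popleft()
        if not is_core(data, current, epsilon, min_observations):
            continue  # only core observations can extend a path
        for j in neighbors(data, current, epsilon):
            if j == y:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False


# Illustrative data (an assumption for this sketch): a dense group, one border point, one isolated point.
data = [(0, 0), (0, 1), (1, 0), (2, 0), (5, 5)]
epsilon, min_observations = 1.5, 3

core_flags = [is_core(data, i, epsilon, min_observations) for i in range(len(data))]
print(core_flags)  # [True, True, True, False, False]
```

Here observation 3 is not a core observation but is reachable from observation 0 through the core observation 2, while observation 4 is not reachable from any other observation and is therefore a noise observation.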
Each cluster gets a unique identifier, an integer from $0$ to $\text{number of clusters} - 1$. Each observation is assigned the identifier of the cluster it belongs to, or $-1$ if the observation is considered to be a noise observation.
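As an illustration of how such assignments can be produced, the sketch below grows one cluster per unvisited core observation and leaves everything unreachable at the noise identifier. It is a simplified, quadratic-time example under stated assumptions (Euclidean distance, "within" meaning less than or equal to epsilon), and `dbscan_assignments` is a hypothetical name, not a function of the library. A non-core observation reachable from two clusters is given to whichever cluster reaches it first.

```python
from collections import deque
from math import dist  # Euclidean distance between two points (Python 3.8+)

NOISE = -1  # identifier used for noise observations


def dbscan_assignments(data, epsilon, min_observations):
    """Return (assignments, number_of_clusters) for a list of points."""
    n = len(data)
    # Precompute the epsilon-neighborhood of every observation (each includes itself).
    neighborhoods = [
        [j for j in range(n) if dist(data[i], data[j]) <= epsilon] for i in range(n)
    ]
    core = [len(nb) >= min_observations for nb in neighborhoods]

    assignments = [NOISE] * n
    cluster_id = 0
    for seed in range(n):
        # A new cluster can only start from a core observation not yet assigned.
        if not core[seed] or assignments[seed] != NOISE:
            continue
        assignments[seed] = cluster_id
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            if not core[current]:
                continue  # non-core observations join a cluster but do not extend it
            for j in neighborhoods[current]:
                if assignments[j] == NOISE:
                    assignments[j] = cluster_id
                    queue.append(j)
        cluster_id += 1
    return assignments, cluster_id


# Illustrative data (an assumption for this sketch): two dense groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (2, 0), (5, 5), (10, 10), (10, 11), (11, 10)]
assignments, n_clusters = dbscan_assignments(data, epsilon=1.5, min_observations=3)
print(assignments)  # [0, 0, 0, 0, -1, 1, 1, 1]
print(n_clusters)   # 2
```

Identifiers run from 0 to the number of clusters minus 1 in the order in which clusters are discovered, so a different input order may permute the identifiers without changing the grouping itself.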

Computation

The following computation modes are available:

Examples

C++ (CPU)
Batch Processing:
Distributed Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Distributed Processing:
Python* with DPC++ support
Batch Processing:
Python*
Batch Processing:
Distributed Processing:
