Developer Guide


Density-Based Spatial Clustering of Applications with Noise

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a non-parametric, density-based clustering algorithm: given a set of observations in some space, it groups together observations that are closely packed (observations with many nearby neighbors) and marks as outliers the observations that lie alone in low-density regions (whose nearest neighbors are too far away).

Details

Given the set $X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$ of $n$ $p$-dimensional feature vectors (hereafter referred to as observations), a positive floating-point number epsilon and a positive integer minObservations, the problem is to get clustering assignments for each input observation, based on the definitions below [Ester96]:
  • An observation $x$ is called a core observation if at least minObservations input observations (including $x$) are within distance epsilon from $x$.
  • An observation $y$ is directly reachable from $x$ if $y$ is within distance epsilon from a core observation $x$. Observations are only said to be directly reachable from core observations.
  • An observation $y$ is reachable from observation $x$ if there is a path $x_1, \ldots, x_m$ with $x_1 = x$ and $x_m = y$, where each $x_{i+1}$ is directly reachable from $x_i$. This implies that all observations on the path must be core observations, with the possible exception of $y$.
  • All observations that are not reachable from any core observation are noise observations.
  • Two observations $x$ and $y$ are considered to be in the same cluster if there is a core observation $z$ such that both $x$ and $y$ are reachable from $z$.
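The definitions above can be sketched directly in Python. This is a brute-force illustration, not the library's implementation: the function names are hypothetical, and Euclidean distance with an inclusive (<=) epsilon comparison is an assumption, since the definitions do not fix a metric. Here `reachable(X, j, i, ...)` asks whether X[j] is reachable from X[i].

```python
import math
from collections import deque
from typing import List, Sequence

Observations = List[Sequence[float]]


def neighbors(X: Observations, i: int, epsilon: float) -> List[int]:
    # Indices of every observation within distance epsilon of X[i], X[i] itself included.
    # Assumption: Euclidean distance with an inclusive (<=) comparison.
    return [j for j, x in enumerate(X) if math.dist(X[i], x) <= epsilon]


def is_core(X: Observations, i: int, epsilon: float, min_observations: int) -> bool:
    # Core observation: at least min_observations observations (counting X[i]) within epsilon.
    return len(neighbors(X, i, epsilon)) >= min_observations


def directly_reachable(X: Observations, j: int, i: int, epsilon: float, min_observations: int) -> bool:
    # X[j] is directly reachable from X[i] only if X[i] is core and X[j] lies within epsilon of it.
    return is_core(X, i, epsilon, min_observations) and math.dist(X[i], X[j]) <= epsilon


def reachable(X: Observations, j: int, i: int, epsilon: float, min_observations: int) -> bool:
    # Breadth-first search over chains of direct reachability starting at X[i].
    # Only core observations are expanded, so every observation on a path except the last is core.
    seen = {i}
    queue = deque([i])
    while queue:
        k = queue.popleft()
        if k == j:
            return True
        if not is_core(X, k, epsilon, min_observations):
            continue
        for m in neighbors(X, k, epsilon):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return False


def same_cluster(X: Observations, a: int, b: int, epsilon: float, min_observations: int) -> bool:
    # X[a] and X[b] share a cluster if some core observation X[z] reaches both of them.
    return any(
        is_core(X, z, epsilon, min_observations)
        and reachable(X, a, z, epsilon, min_observations)
        and reachable(X, b, z, epsilon, min_observations)
        for z in range(len(X))
    )
```

For example, with X = [(0, 0), (0, 1), (1, 0), (10, 10)], epsilon = 1.0 and minObservations = 3, only X[0] is a core observation. X[1] and X[2] are directly reachable from X[0], so they are in the same cluster even though they are not within epsilon of each other, while X[3] is reachable from no core observation.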
Each cluster gets a unique identifier, an integer from 0 to (total number of clusters - 1). The assignment of each observation is the identifier of the cluster to which it belongs, or -1 if the observation is considered a noise observation.
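A minimal sketch of how such assignments could be produced, assuming Euclidean distance, an inclusive (<=) epsilon comparison and a brute-force O(n^2) neighborhood search. The `dbscan` function is hypothetical and is not the library's API. Clusters are numbered consecutively from 0 in the order their first core observation is found, and any observation never reached from a core observation keeps the label -1.

```python
import math
from collections import deque
from typing import List, Sequence


def dbscan(X: List[Sequence[float]], epsilon: float, min_observations: int) -> List[int]:
    """Return one label per observation: 0..k-1 for clusters, -1 for noise."""
    n = len(X)
    # Epsilon-neighborhood of each observation (each neighborhood includes the observation itself).
    nbrs = [[j for j in range(n) if math.dist(X[i], X[j]) <= epsilon] for i in range(n)]
    core = [len(nbrs[i]) >= min_observations for i in range(n)]

    labels = [-1] * n  # every observation starts unassigned (noise)
    cluster_id = 0
    for i in range(n):
        # A new cluster is started only from a core observation not yet assigned.
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            k = queue.popleft()
            if not core[k]:
                continue  # border observation: joins the cluster but is not expanded
            for m in nbrs[k]:
                if labels[m] == -1:
                    labels[m] = cluster_id
                    queue.append(m)
        cluster_id += 1
    return labels
```

Applied to X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)] with epsilon = 1.0 and minObservations = 3, this sketch returns [0, 0, 0, 1, 1, 1, -1]. A non-core observation reachable from two different clusters is assigned to whichever cluster is expanded first; the definitions above do not specify a tie-breaking rule, so other implementations may differ on such points.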

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804