Density-Based Spatial Clustering of Applications with Noise

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed in [Ester96]. It is a non-parametric, density-based clustering algorithm: given a set of observations in some space, it groups together observations that are closely packed (observations with many nearby neighbors) and marks as outliers the observations that lie alone in low-density regions (those whose nearest neighbors are too far away).

Details

Given the set X = {x1 = (x11, ..., x1p), ..., xn = (xn1, ..., xnp)} of n p-dimensional feature vectors (referred to below as observations), a positive floating-point number epsilon, and a positive integer minObservations, the problem is to compute a cluster assignment for each input observation, based on the following definitions [Ester96]:

  • An observation x is called a core observation if at least minObservations input observations (including x) are within distance epsilon of x.

  • An observation y is directly reachable from x if y is within distance epsilon from core observation x. Observations are only said to be directly reachable from core observations.

  • An observation y is reachable from observation x if there is a path x1, ..., xm with x1 = x and xm = y, where each xi+1 is directly reachable from xi. This implies that all observations on the path must be core observations, with the possible exception of y.

  • All observations not reachable from any other observation are noise observations.

  • Two observations x and y are considered to be in the same cluster if there is a core observation z such that both x and y are reachable from z.
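
The first two definitions can be illustrated with a short Python sketch. It is not the library's implementation: the function names core_mask and directly_reachable are hypothetical, and the sketch assumes Euclidean distance and treats "within distance epsilon" as "less than or equal to epsilon", neither of which is stated in the definitions above.

```python
import math

def core_mask(X, epsilon, min_observations):
    """Return a list of booleans, True where the observation is a core observation."""
    mask = []
    for x in X:
        # Count observations within epsilon of x; x itself is included
        # because its distance to itself is 0.
        # Assumptions: Euclidean distance, inclusive comparison (<=).
        count = sum(1 for y in X if math.dist(x, y) <= epsilon)
        mask.append(count >= min_observations)
    return mask

def directly_reachable(X, i, j, epsilon, min_observations):
    """Return True if X[j] is directly reachable from X[i]."""
    # Only a core observation can directly reach another observation.
    is_core = core_mask(X, epsilon, min_observations)[i]
    return is_core and math.dist(X[i], X[j]) <= epsilon

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(core_mask(X, 1.5, 3))  # [True, True, True, False]
```

In this example the first three observations each have three observations within distance 1.5 (counting themselves), so they are core observations, while the fourth has only itself in its neighborhood.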

Each cluster gets a unique identifier, an integer from 0 to (total number of clusters – 1). The assignment of each observation is the identifier of the cluster to which it belongs, or -1 if the observation is considered to be a noise observation.
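
As an illustration of how such assignments could be produced, here is a minimal, unoptimized Python sketch. It is not the library's implementation: the name dbscan_labels is hypothetical, Euclidean distance and an inclusive comparison with epsilon are assumptions, and an observation reachable from core observations of more than one cluster is simply given to the cluster that reaches it first.

```python
import math
from collections import deque

NOISE = -1  # assignment used for noise observations

def dbscan_labels(X, epsilon, min_observations):
    """Return a cluster identifier (0, 1, ...) or -1 (noise) for each observation."""
    n = len(X)
    # Neighborhood of each observation; it includes the observation itself.
    # Assumptions: Euclidean distance, inclusive comparison (<=).
    neighbors = [
        [j for j in range(n) if math.dist(X[i], X[j]) <= epsilon]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_observations for nb in neighbors]

    labels = [NOISE] * n
    cluster_id = 0
    for i in range(n):
        # A new cluster can only start from an unassigned core observation.
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster_id
        queue = deque([i])  # holds core observations only
        while queue:
            k = queue.popleft()
            # Every neighbor of a core observation is directly reachable from it.
            for j in neighbors[k]:
                if labels[j] == NOISE:
                    labels[j] = cluster_id
                    # Only core observations extend the path further.
                    if is_core[j]:
                        queue.append(j)
        cluster_id += 1
    # Observations never reached from a core observation keep the label -1.
    return labels

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (20, 20)]
print(dbscan_labels(X, epsilon=1.5, min_observations=3))
# [0, 0, 0, 1, 1, 1, -1]
```

The sketch compares every pair of observations, so its cost grows quadratically with n; it is meant to make the definitions concrete, not to be efficient.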
