Multivariate BACON Outlier Detection
- Identify an initial basic subset of $m > p$ feature vectors that can be assumed not to contain outliers. The constant $m$ is set to …. The library supports two approaches to selecting the initial subset:
  - Based on the distances from the medians $\lVert x_i - \text{med} \rVert$, where:
    - $\text{med}$ is the vector of coordinate-wise medians
    - $\lVert \cdot \rVert$ is the vector norm
    - $i = 1, \ldots, n$
  - Based on the Mahalanobis distance $d_i(\text{mean}, S) = \sqrt{(x_i - \text{mean})^T S^{-1} (x_i - \text{mean})}$, where:
    - $\text{mean}$ and $S$ are the mean and the covariance matrix, respectively, of the $n$ feature vectors
    - $i = 1, \ldots, n$

  Each method chooses the $m$ feature vectors with the smallest distances.
- Compute the discrepancies $d_i(\text{mean}, S)$, $i = 1, \ldots, n$, for all feature vectors using the Mahalanobis distance defined above, where $\text{mean}$ and $S$ are now the mean and the covariance matrix computed from the feature vectors in the current basic subset.
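- Set the new basic subset to all feature vectors with a discrepancy less than $c_{npr}\,\chi_{p,\alpha/n}$, where:
  - $\chi^2_{p,\alpha/n}$ is the $(1 - \alpha/n)$ percentile of the chi-square distribution with $p$ degrees of freedom, and $\chi_{p,\alpha/n}$ is its square root
  - $c_{npr} = c_{hr} + c_{np}$, where:
    - $r$ is the size of the current basic subset
    - $c_{hr} = \max\{0, (h - r)/(h + r)\}$, where $h = [(n + p + 1)/2]$ and $[\,\cdot\,]$ is the integer part of a number
    - $c_{np} = 1 + (p + 1)/(n - p) + 1/(n - h - p)$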
- Iterate steps 2 and 3 until the size of the basic subset no longer changes.
- Nominate the feature vectors that are not part of the final basic subset as outliers.
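To show how the correction factor $c_{npr}$ behaves, here is the arithmetic for illustrative values $n = 100$, $p = 3$, and a current basic subset of $r = 20$ vectors, using $c_{hr} = \max\{0, (h - r)/(h + r)\}$ and $c_{np} = 1 + (p + 1)/(n - p) + 1/(n - h - p)$:

```latex
\begin{aligned}
h &= \left[\tfrac{100 + 3 + 1}{2}\right] = 52,\\
c_{hr} &= \max\left\{0, \tfrac{52 - 20}{52 + 20}\right\} = \tfrac{32}{72} \approx 0.444,\\
c_{np} &= 1 + \tfrac{3 + 1}{100 - 3} + \tfrac{1}{100 - 52 - 3} = 1 + \tfrac{4}{97} + \tfrac{1}{45} \approx 1.063,\\
c_{npr} &= c_{hr} + c_{np} \approx 1.508.
\end{aligned}
```

Once the basic subset reaches $r \ge h = 52$ vectors, $c_{hr} = 0$ and the factor settles at $c_{np} \approx 1.063$, so small early subsets get a more permissive threshold.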
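The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation, and it rests on several assumptions: the function names (`bacon`, `chi2_quantile`, `mahalanobis`, `threshold`) are hypothetical; the median-based start uses the Euclidean norm; $m$ defaults to $5p$ because the value is not stated here; `alpha` is the $\alpha$ of the threshold step and the sketch divides it by $n$ itself; the loop stops when the size of the basic subset is unchanged rather than using a tolerance; and the chi-square quantile is computed by bisection on a series expansion of the CDF in place of a statistics-library routine.

```python
import math

import numpy as np


def chi2_quantile(q, df):
    """Return x with P(chi2_df <= x) = q, by bisection on the CDF.

    The CDF is the regularized lower incomplete gamma P(df/2, x/2), evaluated
    with its power series; adequate for moderate df in a sketch.
    """
    a = df / 2.0

    def cdf(x):
        if x <= 0.0:
            return 0.0
        z = x / 2.0
        term = total = 1.0 / a
        k = 0
        while term > 1e-16 * total and k < 10_000:
            k += 1
            term *= z / (a + k)
            total += term
        return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

    lo, hi = 0.0, max(1.0, float(df))
    while cdf(hi) < q:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if cdf(mid) < q else (lo, mid)
    return 0.5 * (lo + hi)


def mahalanobis(X, mean, cov):
    """d_i = sqrt((x_i - mean)^T S^{-1} (x_i - mean)) for every row of X."""
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T)  # S^{-1} (x_i - mean) without forming S^{-1}
    return np.sqrt(np.maximum(np.einsum("ij,ji->i", diff, sol), 0.0))


def covariance(rows):
    return np.atleast_2d(np.cov(rows, rowvar=False))


def threshold(n, p, r, alpha):
    """Cutoff c_npr * chi_{p, alpha/n} for the discrepancies (requires n > h + p)."""
    h = (n + p + 1) // 2
    c_hr = max(0.0, (h - r) / (h + r))
    c_np = 1.0 + (p + 1) / (n - p) + 1.0 / (n - h - p)
    return (c_hr + c_np) * math.sqrt(chi2_quantile(1.0 - alpha / n, p))


def bacon(X, alpha=0.05, init="median", m=None, max_iter=100):
    """Return 0/1 weights per feature vector; 0 marks an outlier."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    m = 5 * p if m is None else m  # assumed default; needs p < m <= n

    # Step 1: the m vectors closest to the medians or, alternatively, to the mean.
    if init == "median":
        dist = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    else:
        dist = mahalanobis(X, X.mean(axis=0), covariance(X))
    basic = np.zeros(n, dtype=bool)
    basic[np.argsort(dist)[:m]] = True

    for _ in range(max_iter):
        # Step 2: discrepancies w.r.t. the mean and covariance of the basic subset.
        d = mahalanobis(X, X[basic].mean(axis=0), covariance(X[basic]))
        # Step 3: the new basic subset is every vector below the cutoff.
        new_basic = d < threshold(n, p, int(basic.sum()), alpha)
        converged = new_basic.sum() == basic.sum()  # Step 4: size unchanged
        basic = new_basic
        if converged:
            break

    # Step 5: vectors outside the final basic subset are nominated as outliers.
    return basic.astype(int)
```

Solving $S y = x_i - \text{mean}$ avoids forming $S^{-1}$ explicitly. In practice a library quantile function such as SciPy's `scipy.stats.chi2.ppf` would replace the hand-written `chi2_quantile`.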
Pointer to the numeric table with the data for outlier detection.
The input can be an object of any class derived from the
The floating-point type that the algorithm uses for intermediate computations. Can be
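The initialization method, that is, how the initial basic subset is selected: from the distances to the coordinate-wise medians or from the Mahalanobis distances.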
One-tailed probability that defines the quantile of the chi-square distribution with $p$ degrees of freedom used in the discrepancy threshold.
Recommended value: $\alpha/n$, where $n$ is the number of observations.
The stopping criterion: the algorithm terminates when the size of the basic subset changes by less than this threshold.
Pointer to the numeric table of zeros and ones. A zero in the $i$-th position indicates that the $i$-th feature vector is an outlier.
By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from
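Because a zero marks an outlier, turning the result into a list of outlier positions is a one-line filter. The sketch below assumes the 0/1 result has already been copied into a NumPy array of shape $n \times 1$; reading it out of the numeric table is not shown, and `outlier_indices` is a hypothetical helper.

```python
import numpy as np


def outlier_indices(weights):
    """Indices of feature vectors whose weight is 0 (flagged as outliers)."""
    w = np.asarray(weights).ravel()  # accepts an n x 1 column or a flat vector
    return np.flatnonzero(w == 0)


weights = np.array([[1], [0], [1], [1], [0]])  # hypothetical n x 1 result
print(outlier_indices(weights))  # [1 4]
```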