Details

Given a set $X$ of $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of dimension $p$, the problem is to identify the vectors that do not belong to the underlying distribution using the BACON method (see [Billor2000]).
The iterative method consists of the following steps:
  1. Identify an initial basic subset of $m > p$ feature vectors that can be assumed not to contain outliers. The constant $m$ is set to $5p$. The library supports two approaches to selecting the initial subset:
    1. Based on distances from the medians $\|x_i - med\|$, where:
      • $med$ is the vector of coordinate-wise medians
      • $\|\cdot\|$ is the vector norm
      • $i = 1, \ldots, n$
    2. Based on the Mahalanobis distance $d_i(mean, S) = \sqrt{(x_i - mean)^T S^{-1} (x_i - mean)}$, where:
      • $mean$ and $S$ are the mean and the covariance matrix, respectively, of the $n$ feature vectors
      • $i = 1, \ldots, n$
    Each method chooses the $m$ feature vectors with the smallest distances.
  2. Compute the discrepancies using the Mahalanobis distance above, where $mean$ and $S$ are the mean and the covariance matrix, respectively, computed for the feature vectors contained in the basic subset.
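  3. Set the new basic subset to all feature vectors with the discrepancy less than $c_{npr} \chi_{p, \alpha/n}$, where:
    1. $\chi^2_{p, \alpha}$ is the $(1 - \alpha)$ percentile of the Chi2 distribution with $p$ degrees of freedom, so $\chi_{p, \alpha/n}$ is the square root of $\chi^2_{p, \alpha/n}$
    2. $c_{npr} = c_{np} + c_{hr}$, where
      • $r$ is the size of the current basic subset
      • $c_{hr} = \max\{0, (h - r)/(h + r)\}$, where $h = [(n + p + 1)/2]$ and [ ] is the integer part of a number
      • $c_{np} = 1 + \frac{p + 1}{n - p} + \frac{2}{n - 1 - 3p}$ (see [Billor2000])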
  4. Iterate steps 2 and 3 until the size of the basic subset no longer changes.
  5. Nominate the feature vectors that are not part of the final basic subset as outliers.
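
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's API or implementation: all function names (`mean_and_cov`, `solve`, `mahalanobis`, `correction`, `bacon_outliers`) and the `init` and `max_iter` parameters are hypothetical; the covariance is assumed to use the $n - 1$ denominator and to be non-singular for every basic subset; the term $c_{np}$ follows the formula given in [Billor2000]; and the Chi2 percentile is supplied by the caller as a function, because the Python standard library has no Chi2 quantile. The usage example relies on the closed form $-2 \ln q$ for the $(1 - q)$ percentile, which holds only for $p = 2$.

```python
import math
from statistics import median


def mean_and_cov(rows):
    """Mean vector and sample covariance matrix (n - 1 denominator, an assumption)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T S^{-1} (x - mean)), computed without forming S^{-1}."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return math.sqrt(max(0.0, sum(di * si for di, si in zip(d, solve(cov, d)))))


def correction(n, p, r):
    """Correction factor c_npr = c_np + c_hr (c_np as given in Billor et al.)."""
    h = (n + p + 1) // 2                      # integer part of (n + p + 1) / 2
    c_np = 1 + (p + 1) / (n - p) + 2 / (n - 1 - 3 * p)
    c_hr = max(0.0, (h - r) / (h + r))
    return c_np + c_hr


def bacon_outliers(rows, alpha, chi2_upper_quantile, init="median", max_iter=100):
    """Return indices of nominated outliers.

    chi2_upper_quantile(q) must return the (1 - q) percentile of the
    Chi2 distribution with p degrees of freedom (caller-supplied).
    """
    n, p = len(rows), len(rows[0])
    # Step 1: initial basic subset of m = 5p vectors with the smallest distances.
    if init == "median":
        med = [median(r[j] for r in rows) for j in range(p)]
        dist = [math.sqrt(sum((a - b) ** 2 for a, b in zip(r, med))) for r in rows]
    else:  # Mahalanobis distance computed from all n vectors
        mean, cov = mean_and_cov(rows)
        dist = [mahalanobis(r, mean, cov) for r in rows]
    subset = sorted(range(n), key=dist.__getitem__)[:5 * p]
    chi = math.sqrt(chi2_upper_quantile(alpha / n))
    for _ in range(max_iter):                 # safety cap, not part of the method
        # Step 2: discrepancies relative to the current basic subset.
        mean, cov = mean_and_cov([rows[i] for i in subset])
        # Step 3: keep every vector whose discrepancy is below the threshold.
        limit = correction(n, p, len(subset)) * chi
        new_subset = [i for i in range(n)
                      if mahalanobis(rows[i], mean, cov) < limit]
        # Step 4: stop once the subset size no longer changes.
        converged = len(new_subset) == len(subset)
        subset = new_subset
        if converged:
            break
    kept = set(subset)
    # Step 5: everything outside the final basic subset is an outlier.
    return [i for i in range(n) if i not in kept]


# Usage: a 4 x 3 grid of inliers plus two distant points, p = 2.
rows = [(x, y) for x in range(4) for y in range(3)] + [(30, 30), (-25, 40)]
outliers = bacon_outliers(rows, alpha=0.05,
                          chi2_upper_quantile=lambda q: -2.0 * math.log(q))
print(outliers)  # [12, 13]
```

For $p \neq 2$ the caller would need a real Chi2 quantile function from a statistics package. Passing it in as an argument keeps the sketch free of third-party dependencies.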
