Batch Processing

Algorithm Input

The multivariate outlier detection algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
Input ID - Input
  • data - Pointer to the n x p numeric table with the data for outlier detection. The input can be an object of any class derived from the NumericTable class.
  • location - Pointer to the 1 x p numeric table with the vector of means. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.
  • scatter - Pointer to the p x p numeric table that contains the variance-covariance matrix. The input can be an object of any class derived from NumericTable except PackedTriangularMatrix.
  • threshold - Pointer to the 1 x 1 numeric table with the non-negative number that defines the outlier region. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.
If you omit any of the location, scatter, or threshold inputs, the library initializes all three with the following default values:
  • location - Vector with all elements equal to 0.0
  • scatter - Numeric table with diagonal elements equal to 1.0 and non-diagonal elements equal to 0.0
  • threshold - 3.0
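This section does not spell out how location, scatter, and threshold combine. A common reading, assumed here and not confirmed by the text above, is that an observation is flagged when its Mahalanobis distance from location under scatter exceeds threshold. The sketch below is self-contained standard C++ and does not call the library; `sketchWeights` and `sketchWeightsWithDefaults` are hypothetical helper names used only for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper, not a library call. Returns one weight per row of
// data: 1.0 for an inlier, 0.0 for an outlier. ASSUMPTION: the outlier
// region is "Mahalanobis distance greater than threshold".
std::vector<double> sketchWeights(const Matrix& data,                   // n x p
                                  const std::vector<double>& location,  // 1 x p
                                  const Matrix& scatter,                // p x p, symmetric positive definite
                                  double threshold)                     // 1 x 1
{
    const std::size_t p = location.size();

    // Cholesky factorization: scatter = L * L^T with lower-triangular L.
    Matrix L(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = scatter[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0) throw std::invalid_argument("scatter must be positive definite");
                L[i][i] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }

    std::vector<double> weights;
    weights.reserve(data.size());
    for (const auto& row : data) {
        // Forward substitution solves L * y = (row - location);
        // the squared Mahalanobis distance is then y^T * y.
        std::vector<double> y(p, 0.0);
        double squaredDistance = 0.0;
        for (std::size_t i = 0; i < p; ++i) {
            double sum = row[i] - location[i];
            for (std::size_t k = 0; k < i; ++k) sum -= L[i][k] * y[k];
            y[i] = sum / L[i][i];
            squaredDistance += y[i] * y[i];
        }
        weights.push_back(std::sqrt(squaredDistance) > threshold ? 0.0 : 1.0);
    }
    return weights;
}

// Applies the documented defaults: location of 0.0, identity scatter,
// threshold of 3.0.
std::vector<double> sketchWeightsWithDefaults(const Matrix& data)
{
    const std::size_t p = data.empty() ? 0 : data.front().size();
    std::vector<double> location(p, 0.0);
    Matrix scatter(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) scatter[i][i] = 1.0;
    return sketchWeights(data, location, scatter, 3.0);
}
```

Under this assumption the defaults reduce the test to "Euclidean norm greater than 3.0": for the rows (0.5, 0.5), (4, 0), and (2, 2), `sketchWeightsWithDefaults` returns 1, 0, 1, because only the second row lies more than 3.0 from the origin.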

Algorithm Parameters

The multivariate outlier detection algorithm has the following parameters, which depend on the computation method parameter method:
  • algorithmFPType
    - method: defaultDense or baconDense
    - Default Value: float
    - Description: The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • method
    - method: Not applicable
    - Default Value: defaultDense
    - Description: Available methods for multivariate outlier detection:
        • defaultDense - Performance-oriented computation method.
        • DEPRECATED: baconDense - Blocked Adaptive Computationally-efficient Outlier Nominators (BACON) method. This method is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.
  • DEPRECATED: initializationProcedure
    - method: defaultDense
    - Default Value: Not applicable
    - Description: This parameter is deprecated and will be removed in a future release. To initialize the algorithm, use tables in the input class.
      The procedure for setting initial parameters of the algorithm. It is your responsibility to define the procedure.
      Input objects for the initialization procedure are:
        • data - numeric table of size n x p that contains input data of the multivariate outlier detection algorithm
      Results of the initialization procedure are:
        • location - numeric table of size 1 x p that contains the vector of means
        • scatter - numeric table of size p x p that contains the variance-covariance matrix
        • threshold - numeric table of size 1 x 1 with the non-negative number that defines the outlier region
      If you do not set this parameter, the library uses the default initialization, which sets:
        • location to 0.0
        • scatter to the numeric table with diagonal elements equal to 1.0 and non-diagonal elements equal to 0.0
        • threshold to 3.0
    - method: baconDense
    - Default Value: baconMedian
    - Description: The initialization method. Can be:
        • baconMedian - Median-based method.
        • defaultDense - Mahalanobis distance-based method.
  • DEPRECATED: alpha
    - method: baconDense
    - Default Value: 0.05
    - Description: This parameter is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.
      One-tailed probability that defines the (1 - α) quantile of the χ² distribution with p degrees of freedom.
      Recommended value: α/n, where n is the number of observations.
  • DEPRECATED: accuracyThreshold
    - method: baconDense
    - Default Value: 0.005
    - Description: This parameter is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.
      The stopping criterion. The algorithm terminates when the size of the basic subset changes by less than the threshold.
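To give a feel for what the alpha parameter controls, the sketch below evaluates the (1 - α) quantile of the χ² distribution in the one case where it has a simple closed form. For p = 2 degrees of freedom the χ² cumulative distribution function is 1 - exp(-x/2), so the quantile is -2 ln α. The function name `chiSquareQuantileTwoDof` is hypothetical, the restriction to p = 2 is a simplification made for this example, and the code does not reproduce how the library uses the quantile internally.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical helper for illustration only: the (1 - alpha) quantile of the
// chi-square distribution with exactly 2 degrees of freedom.
// For p = 2 the CDF is F(x) = 1 - exp(-x / 2), so F(x) = 1 - alpha
// gives x = -2 * ln(alpha). Other values of p need a numerical routine.
double chiSquareQuantileTwoDof(double alpha)
{
    if (alpha <= 0.0 || alpha >= 1.0) {
        throw std::invalid_argument("alpha must lie strictly between 0 and 1");
    }
    return -2.0 * std::log(alpha);
}
```

With the default alpha of 0.05 this gives a quantile of about 5.99. Applying the recommended α/n scaling with n = 100 observations (alpha = 0.0005) raises it to about 15.20, so a smaller alpha moves the cutoff further into the tail.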

Algorithm Output

The multivariate outlier detection algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.
Result ID - Result
  • weights - Pointer to the n x 1 numeric table of zeros and ones. Zero in the i-th position indicates that the i-th feature vector is an outlier. By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
1
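Because a zero marks an outlier and a one marks a regular observation, reading the result amounts to collecting the positions that hold a zero. The sketch below assumes the n x 1 weights table has already been copied into a `std::vector<double>`; `outlierRows` is a hypothetical helper, not part of the library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: weights holds the n x 1 result, one value per
// feature vector. Returns the zero-based row indices flagged as outliers.
std::vector<std::size_t> outlierRows(const std::vector<double>& weights)
{
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] == 0.0) {  // zero in position i marks row i as an outlier
            rows.push_back(i);
        }
    }
    return rows;
}
```

For weights of 1, 0, 1, 0 the function returns the indices 1 and 3.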

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804