Batch Processing

Algorithm Input

The multivariate outlier detection algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID

Input

data

Pointer to the n x p numeric table with the data for outlier detection. The input can be an object of any class derived from the NumericTable class.

location

Pointer to the 1 x p numeric table with the vector of means. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.

scatter

Pointer to the p x p numeric table that contains the variance-covariance matrix. The input can be an object of any class derived from NumericTable except PackedTriangularMatrix.

threshold

Pointer to the 1 x 1 numeric table with the non-negative number that defines the outlier region. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix.
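
The following standalone sketch illustrates how the four inputs above could combine into an outlier decision. It is not the library API. It assumes that an observation is flagged when its Mahalanobis distance from location, measured with scatter, exceeds threshold, and that a distance exactly equal to threshold is not an outlier. The helper names solve and outlier_weights are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def outlier_weights(data, location, scatter, threshold):
    """Return n weights: 1 for a regular observation, 0 for an outlier."""
    p = len(location)
    if len(scatter) != p or any(len(row) != p for row in scatter):
        raise ValueError("scatter must be p x p")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    weights = []
    for row in data:
        if len(row) != p:
            raise ValueError("each observation must have p features")
        diff = [x - mean for x, mean in zip(row, location)]
        # Squared distance = diff^T * scatter^{-1} * diff (assumed rule).
        z = solve(scatter, diff)
        dist = math.sqrt(sum(d * v for d, v in zip(diff, z)))
        # Assumption: strict inequality at the boundary.
        weights.append(0 if dist > threshold else 1)
    return weights

data = [[0.0, 0.0], [1.0, 1.0], [10.0, 0.0]]   # n = 3, p = 2
location = [0.0, 0.0]                          # 1 x p vector of means
scatter = [[4.0, 0.0], [0.0, 1.0]]             # p x p variance-covariance
weights = outlier_weights(data, location, scatter, 3.0)
```

In this toy data set only the third observation lies farther than 3.0 from the means, so it is the only one that receives weight 0.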

Note

If any of the location, scatter, or threshold inputs is not provided, the library initializes all three with the following default values:

  • location - Numeric table with all elements equal to 0.0
  • scatter - Numeric table with diagonal elements equal to 1.0 and non-diagonal elements equal to 0.0
  • threshold - 3.0
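
The fallback rule in the note can be sketched as follows. This is a plain-Python illustration and not library code. The function name resolve_inputs is hypothetical, and None stands in for an input that was not provided.

```python
def default_inputs(p):
    """Defaults listed in the note, for p features."""
    location = [0.0] * p
    # Identity matrix: 1.0 on the diagonal, 0.0 elsewhere.
    scatter = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    threshold = 3.0
    return location, scatter, threshold

def resolve_inputs(p, location=None, scatter=None, threshold=None):
    """If any optional input is missing, all three fall back to defaults."""
    if location is None or scatter is None or threshold is None:
        return default_inputs(p)
    return location, scatter, threshold

# Only location is supplied, so it is replaced together with the others.
resolved = resolve_inputs(2, location=[5.0, 5.0])
```

The practical consequence is that a custom location is discarded unless scatter and threshold are supplied with it.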

Algorithm Parameters

The multivariate outlier detection algorithm has the following parameters. Some of them apply only to a specific value of the computation method parameter method:

Parameter

method

Default Value

Description

algorithmFPType

defaultDense or baconDense

float

The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

method

Not applicable

defaultDense

Available methods for multivariate outlier detection:

  • defaultDense - Performance-oriented computation method
  • DEPRECATED: baconDense - Blocked Adaptive Computationally-efficient Outlier Nominators (BACON) method.

    Note

    This method is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.

DEPRECATED: initializationProcedure

defaultDense

Not applicable

Note

This parameter is deprecated and will be removed in a future release. To initialize the algorithm, use tables in the input class.

The procedure for setting initial parameters of the algorithm. It is your responsibility to define the procedure.

Input objects for the initialization procedure are:

  • data - numeric table of size n x p that contains input data of the multivariate outlier detection algorithm

Results of the initialization procedure are:

  • location - numeric table of size 1 x p that contains the vector of means
  • scatter - numeric table of size p x p that contains the variance-covariance matrix
  • threshold - numeric table of size 1 x 1 with the non-negative number that defines the outlier region

If you do not set this parameter, the library uses the default initialization, which sets:

  • location to 0.0
  • scatter to the numeric table with diagonal elements equal to 1.0 and non-diagonal elements equal to 0.0
  • threshold to 3.0

baconDense

baconMedian

The initialization method used with the baconDense computation method. Can be:

  • baconMedian - Median-based method.
  • defaultDense - Mahalanobis distance-based method.

DEPRECATED: alpha

baconDense

0.05

Note

This parameter is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.

One-tailed probability that defines the (1 - α) quantile of the χ² distribution with p degrees of freedom.

Recommended value: α/n, where n is the number of observations.

DEPRECATED: accuracyThreshold

baconDense

0.005

Note

This parameter is deprecated and will be removed in a future release. Use the bacon_outlier_detection::Batch algorithm instead.

The stopping criterion. The algorithm terminates when the size of the basic subset changes by less than this threshold.
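
To make the deprecated alpha parameter above more concrete, the sketch below evaluates the (1 - α) quantile of the χ² distribution for the special case p = 2, where the distribution function is 1 - exp(-x/2) and the quantile has the closed form -2 ln(α). It also applies the recommended α/n adjustment. The function names are hypothetical, the closed form holds only for p = 2, and the code does not reproduce how the library computes the quantile.

```python
import math

def chi2_quantile_2dof(alpha):
    """(1 - alpha) quantile of the chi-square distribution with 2 degrees of freedom."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Solve 1 - exp(-x / 2) = 1 - alpha for x.
    return -2.0 * math.log(alpha)

def adjusted_alpha(alpha, n):
    """Recommended value from the table: alpha divided by the number of observations."""
    return alpha / n

q_plain = chi2_quantile_2dof(0.05)                           # default alpha
q_adjusted = chi2_quantile_2dof(adjusted_alpha(0.05, 1000))  # alpha / n with n = 1000
```

Dividing α by n moves the quantile further into the tail, so fewer observations fall beyond it.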

Algorithm Output

The multivariate outlier detection algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID

Result

weights

Pointer to the n x 1 numeric table of zeros and ones. A zero in the i-th position indicates that the i-th feature vector is an outlier; a one indicates that it is not. By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
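
Because a zero marks an outlier, it is easy to read the weights column the wrong way round. The short sketch below post-processes a weights column that has already been copied into a plain Python list. The helper name outlier_indices and the sample values are hypothetical.

```python
def outlier_indices(weights):
    """Return the positions whose weight is 0, that is, the outliers."""
    return [i for i, w in enumerate(weights) if w == 0]

weights = [1, 0, 1, 1, 0]          # n x 1 column flattened to a list
outliers = outlier_indices(weights)
regular_count = sum(1 for w in weights if w == 1)
```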

Examples

C++:

  • out_detect_mult_dense_batch.cpp

Java*:

  • OutDetectMultDenseBatch.java

Python*:

  • out_detect_mult_dense_batch.py
