# Multivariate Outlier Detection

In multivariate outlier detection methods, the observation point is the entire feature vector.

## Details

Given a set $X$ of $n$ feature vectors of dimension $p$, the problem is to identify the vectors that do not belong to the underlying distribution (see [Ben2005] for exact definitions of an outlier).

The multivariate outlier detection method takes into account dependencies between features.
This method can be parametric: it assumes a known underlying distribution for the data set and defines an outlier region
such that any observation that belongs to the region is marked as an outlier.
The definition of the outlier region depends on the assumed underlying data distribution.

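The following is an example of an outlier region for multivariate outlier detection:

$$\mathrm{Outlier}(\alpha_n, M_n, \Sigma_n) = \left\{ x : \sqrt{(x - M_n)^T \Sigma_n^{-1} (x - M_n)} > g(n, \alpha_n) \right\}$$

where $M_n$ and $\Sigma_n$ are (robust) estimates of the vector of means and variance-covariance matrix computed for a given data set,
$\alpha_n$ is the confidence coefficient, and
$g(n, \alpha_n)$ defines the limit of the region.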
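A minimal sketch of such a region in plain Python may help make the definition concrete. It assumes the region is defined by the Mahalanobis distance of each feature vector from the vector of means, and that the limit is compared with the distance itself rather than its square. The helper names (`solve`, `mahalanobis`, `outlier_weights`) and the sample values are hypothetical and are not part of the library.

```python
import math

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis(x, location, scatter):
    """Distance sqrt((x - M)^T Sigma^{-1} (x - M)) for one feature vector."""
    d = [xi - mi for xi, mi in zip(x, location)]
    z = solve(scatter, d)  # z = Sigma^{-1} (x - M), without forming the inverse
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

def outlier_weights(data, location, scatter, threshold):
    """Return 0 for vectors inside the outlier region and 1 otherwise."""
    return [0 if mahalanobis(row, location, scatter) > threshold else 1
            for row in data]

# Hypothetical sample: three 2-dimensional feature vectors.
data = [[0.5, 0.2], [1.0, -1.0], [8.0, 8.0]]
location = [0.0, 0.0]
scatter = [[1.0, 0.5], [0.5, 1.0]]
weights = outlier_weights(data, location, scatter, threshold=3.0)
print(weights)  # [1, 1, 0]
```

Solving the linear system instead of inverting the matrix keeps the sketch short and avoids an explicit inverse. Because the off-diagonal entries of the scatter matrix enter the distance, this approach accounts for dependencies between features.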

## Batch Processing

### Algorithm Input

The multivariate outlier detection algorithm accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.

| Input ID | Input |
|---|---|
| `data` | Pointer to the $n \times p$ numeric table with the data for outlier detection. The input can be an object of any class derived from the `NumericTable` class. |
| `location` | Pointer to the $1 \times p$ numeric table with the vector of means. The input can be an object of any class derived from `NumericTable` except `PackedSymmetricMatrix` and `PackedTriangularMatrix`. |
| `scatter` | Pointer to the $p \times p$ numeric table that contains the variance-covariance matrix. The input can be an object of any class derived from `NumericTable` except `PackedTriangularMatrix`. |
| `threshold` | Pointer to the $1 \times 1$ numeric table with the non-negative number that defines the outlier region. The input can be an object of any class derived from `NumericTable` except `PackedSymmetricMatrix` and `PackedTriangularMatrix`. |

If you do not provide at least one of the `location`, `scatter`, or `threshold` inputs,
the library initializes all of them with the following default values:

| Input ID | Default Value |
|---|---|
| `location` | A set of 0.0 |
| `scatter` | A numeric table with diagonal elements equal to 1.0 and non-diagonal elements equal to 0.0 |
| `threshold` | 3.0 |
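As a quick check of what these defaults imply, assume the outlier region takes the Mahalanobis-distance form $\sqrt{(x - M)^T \Sigma^{-1} (x - M)} > g$, with `location` as $M$, `scatter` as $\Sigma$, and `threshold` as $g$. This mapping is an assumption made for illustration and is not stated in the table above. Substituting the default values gives:

$$\sqrt{(x - 0)^T I^{-1} (x - 0)} = \sqrt{x^T x} = \lVert x \rVert_2 > 3.0$$

Under that assumption, the defaults flag any feature vector whose Euclidean norm exceeds 3.0, so they are most meaningful for data that is already centered and scaled to roughly unit variance.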

### Algorithm Parameters

The multivariate outlier detection algorithm has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
| `algorithmFPType` | `float` | The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`. |
| `method` | `defaultDense` | Performance-oriented computation method. |

### Algorithm Output

The multivariate outlier detection algorithm calculates the result described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.

| Result ID | Result |
|---|---|
| `weights` | Pointer to the $n \times 1$ numeric table of zeros and ones. Zero in the $i$-th position indicates that the $i$-th feature vector is an outlier. By default, the result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except `PackedSymmetricMatrix`, `PackedTriangularMatrix`, and `CSRNumericTable`. |
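The following sketch illustrates how a result of this shape might be consumed once it has been copied out of the numeric table. It assumes the weights are available as a flat Python list of zeros and ones, one entry per feature vector. The helper names and sample values are hypothetical and are not part of the library.

```python
def outlier_indices(weights):
    """Return the positions whose weight is zero, i.e. the flagged outliers."""
    return [i for i, w in enumerate(weights) if w == 0]

def drop_outliers(data, weights):
    """Keep only the feature vectors whose weight is non-zero."""
    return [row for row, w in zip(data, weights) if w != 0]

# Hypothetical result for five one-dimensional feature vectors.
data = [[0.1], [0.2], [9.0], [0.3], [-7.5]]
weights = [1, 1, 0, 1, 0]

print(outlier_indices(weights))      # [2, 4]
print(drop_outliers(data, weights))  # [[0.1], [0.2], [0.3]]
```

Note the convention: a weight of zero marks an outlier, so the weights can be used directly as a keep-mask over the input rows.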

## Performance Considerations

To get the best overall performance of multivariate outlier detection:

- If input data is homogeneous, provide input data and store results in homogeneous numeric tables of the same type as specified in the `algorithmFPType` class template parameter.
- If input data is non-homogeneous, use the AOS (array of structures) layout rather than the SOA (structure of arrays) layout.
- For the default outlier detection method (defaultDense), you can benefit from splitting the input data set into blocks for parallel processing.
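The block-splitting advice in the last item can be sketched as follows. The sketch assumes that, once `location`, `scatter`, and `threshold` are fixed, each feature vector is classified independently of the others, so blocks of rows can be processed separately and their weights concatenated. For brevity it uses the default inputs (zero means, identity scatter, threshold 3.0), under which the distance reduces to the Euclidean norm. The function names and block size are hypothetical, and Python threads only illustrate the decomposition here, not a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 3.0  # default threshold value from the input description

def block_weights(block):
    """Weights for one block, assuming zero location and identity scatter."""
    return [0 if math.sqrt(sum(v * v for v in row)) > THRESHOLD else 1
            for row in block]

def split_into_blocks(data, block_size):
    """Cut the data set into consecutive blocks of at most block_size rows."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def parallel_weights(data, block_size=2, workers=2):
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves block order, so concatenation restores row order.
        results = list(pool.map(block_weights, blocks))
    return [w for block in results for w in block]

# Hypothetical data set of five 2-dimensional feature vectors.
data = [[0.0, 1.0], [2.0, 2.0], [3.0, 3.0], [-0.5, 0.5], [0.0, -4.0]]
weights = parallel_weights(data)
print(weights)  # [1, 1, 0, 1, 0]
```

The result matches what a single pass over the whole data set would produce, which is why the split is safe for this kind of per-row computation.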
