Developer Guide and Reference

  • 2021.3
  • 06/28/2021

Univariate Outlier Detection

A univariate outlier is an occurrence of an abnormal value within a single observation point.

Details

Given a set $X$ of $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of dimension $p$, the problem is to identify the vectors that do not belong to the underlying distribution (see [Ben2005] for exact definitions of an outlier).
The algorithm for univariate outlier detection considers each feature independently. The method can be parametric: it assumes a known underlying distribution for the data set and defines an outlier region such that any observation that falls in the region is marked as an outlier. The definition of the outlier region depends on the assumed underlying distribution.
The following is an example of an outlier region for univariate outlier detection:

$$\mathrm{Outlier}(\alpha_n, m_n, \sigma_n) = \left\{ x : \frac{|x - m_n|}{\sigma_n} > g(n, \alpha_n) \right\}$$

where $m_n$ and $\sigma_n$ are (robust) estimates of the mean and standard deviation computed for a given data set, $\alpha_n$ is the confidence coefficient, and $g(n, \alpha_n)$ defines the limits of the region and should be adjusted to the number of observations.
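The region test for a single value of one feature can be written as a minimal Python sketch. It assumes plain floats, uses a hypothetical function name `in_outlier_region`, and takes the region limit g(n, α_n) as an already computed number, because how that limit is derived from n and α_n is not specified here. It illustrates the definition (with a strict comparison), not the library's implementation.

```python
def in_outlier_region(x: float, m: float, sigma: float, g: float) -> bool:
    """Return True if x lies in the outlier region |x - m| / sigma > g.

    m:     (robust) estimate of the mean
    sigma: (robust) estimate of the standard deviation, must be positive
    g:     precomputed limit of the region, g(n, alpha_n) (hypothetical input)
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Standardize the distance from the mean and compare it with the limit.
    return abs(x - m) / sigma > g


# A value 4 standard deviations from the mean lies outside a limit of 3.
print(in_outlier_region(14.0, m=10.0, sigma=1.0, g=3.0))  # True
print(in_outlier_region(12.5, m=10.0, sigma=1.0, g=3.0))  # False
```

Because the comparison is strict, a value exactly at the limit (here 7.0 or 13.0) is not flagged.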

Batch Processing

Algorithm Input
The univariate outlier detection algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
| Input ID | Input |
|---|---|
| data | Pointer to the $n \times p$ numeric table with the data for outlier detection. The input can be an object of any class derived from the NumericTable class. |
| location | Pointer to the $1 \times p$ numeric table with the vector of means. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |
| scatter | Pointer to the $1 \times p$ numeric table with the vector of standard deviations. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |
| threshold | Pointer to the $1 \times p$ numeric table with non-negative numbers that define the outlier region. The input can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |
If you do not provide at least one of the location, scatter, or threshold inputs, the library initializes all of them with the following default values:

| Input | Default value |
|---|---|
| location | A set of 0.0 |
| scatter | A set of 1.0 |
| threshold | A set of 3.0 |
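The default-initialization rule can be sketched in Python as follows. The helper name `resolve_inputs` is hypothetical, the three inputs are modeled as plain lists of length p rather than numeric tables, and the sketch assumes one reading of the rule: if any of the three inputs is missing, all three are replaced by the defaults, including the ones that were supplied.

```python
from typing import List, Optional, Tuple

# Default values from the table of defaults for location, scatter and threshold.
DEFAULT_LOCATION = 0.0
DEFAULT_SCATTER = 1.0
DEFAULT_THRESHOLD = 3.0


def resolve_inputs(
    p: int,
    location: Optional[List[float]] = None,
    scatter: Optional[List[float]] = None,
    threshold: Optional[List[float]] = None,
) -> Tuple[List[float], List[float], List[float]]:
    """Return 1 x p location, scatter and threshold rows (hypothetical helper).

    Assumption: a single missing input causes all three to be reset to
    the defaults, even if the other two were supplied.
    """
    if location is None or scatter is None or threshold is None:
        return ([DEFAULT_LOCATION] * p,
                [DEFAULT_SCATTER] * p,
                [DEFAULT_THRESHOLD] * p)
    return location, scatter, threshold


# Only location is supplied, so all three fall back to the defaults.
loc, sc, thr = resolve_inputs(2, location=[5.0, 5.0])
print(loc, sc, thr)  # [0.0, 0.0] [1.0, 1.0] [3.0, 3.0]
```

With these defaults, the outlier region for every feature reduces to |x| > 3, that is, values more than three units away from zero.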
Algorithm Parameters
The univariate outlier detection algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | Performance-oriented computation method, the only method supported by the algorithm. |
Algorithm Output
The univariate outlier detection algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.
| Result ID | Result |
|---|---|
| weights | Pointer to the $n \times p$ numeric table of zeros and ones. Zero in the position $(i, j)$ indicates an outlier in the $i$-th observation of the $j$-th feature. By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable. |
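To make the relationship between the inputs and the weights result concrete, here is a plain-Python reference sketch. It is not the library's implementation: the function name `univariate_weights` is hypothetical, numeric tables are modeled as nested lists, and it assumes that location, scatter, and threshold play the roles of the mean estimate, the standard-deviation estimate, and the region limit in the outlier-region definition, applied to each feature separately.

```python
from typing import List

Matrix = List[List[float]]


def univariate_weights(data: Matrix, location: List[float],
                       scatter: List[float], threshold: List[float]) -> List[List[int]]:
    """Return an n x p table of weights: 0 marks an outlier, 1 a regular value.

    data:      n x p observations
    location:  1 x p per-feature mean estimates
    scatter:   1 x p per-feature standard-deviation estimates, all positive
    threshold: 1 x p non-negative limits of the outlier region
    """
    p = len(location)
    if len(scatter) != p or len(threshold) != p:
        raise ValueError("location, scatter and threshold must all have p entries")
    if any(s <= 0 for s in scatter):
        raise ValueError("scatter entries must be positive")
    weights = []
    for i, row in enumerate(data):
        if len(row) != p:
            raise ValueError(f"row {i} has {len(row)} features, expected {p}")
        # Each feature j is tested independently of the other features.
        weights.append([
            0 if abs(row[j] - location[j]) / scatter[j] > threshold[j] else 1
            for j in range(p)
        ])
    return weights


data = [[1.0, 10.0],
        [2.0, 11.0],
        [9.0, 10.5],
        [1.5, 30.0]]
print(univariate_weights(data, [1.5, 10.5], [1.0, 2.0], [3.0, 3.0]))
# [[1, 1], [1, 1], [0, 1], [1, 0]]
```

Because the features are tested separately, column j of the result depends only on column j of the data and on the j-th entries of the three 1 × p inputs.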

Examples

C++ (CPU)
Batch Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Python*
Batch Processing:

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.