Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content

Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features to a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions along which the data is most spread out.

Mathematical formulation

Training
Given the training set $X = \{ x_1, \ldots, x_n \}$ of $p$-dimensional feature vectors and the number of principal components $r$, the problem is to compute $r$ principal directions ($p$-dimensional eigenvectors [Lang87]) for the training set. The eigenvectors can be grouped into the $r \times p$ matrix $T$ that contains one eigenvector in each row.
Training method: Covariance
This method uses eigenvalue decomposition of the covariance matrix to compute the principal components of the dataset. The method relies on the following steps:
  1. Computation of the covariance matrix
  2. Computation of the eigenvectors and eigenvalues
  3. Formation of the matrices storing the results
Covariance matrix computation is performed in the following way:
  1. Compute the vector-column of sums $s = \sum_{i=1}^{n} x_i$.
  2. Compute the cross-product $P = X^T X - \frac{1}{n} s s^T$, where the rows of $X$ are the feature vectors.
  3. Compute the covariance matrix $\Sigma = \frac{1}{n - 1} P$.
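The three steps above can be sketched in C++ as follows. This is a minimal, standard-library-only illustration and not the oneDAL implementation; the function name `covariance_from_sums` and the `matrix` alias are hypothetical, and the cross-product is assumed to be centered as $X^T X - \frac{1}{n} s s^T$ before it is scaled by $1/(n-1)$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix: m[i][j] is row i, column j (hypothetical helper type).
using matrix = std::vector<std::vector<double>>;

// Computes the p x p sample covariance matrix of an n x p dataset `x`
// (one feature vector per row) from the sums and the cross-product.
matrix covariance_from_sums(const matrix& x) {
    const std::size_t n = x.size();
    assert(n > 1 && "at least two feature vectors are required");
    const std::size_t p = x[0].size();

    // Step 1: vector of per-feature sums s.
    std::vector<double> s(p, 0.0);
    for (const auto& row : x) {
        assert(row.size() == p);
        for (std::size_t j = 0; j < p; ++j) s[j] += row[j];
    }

    // Step 2: cross-product X^T X, then subtract s s^T / n to center it.
    matrix cov(p, std::vector<double>(p, 0.0));
    for (const auto& row : x)
        for (std::size_t a = 0; a < p; ++a)
            for (std::size_t b = 0; b < p; ++b)
                cov[a][b] += row[a] * row[b];
    for (std::size_t a = 0; a < p; ++a)
        for (std::size_t b = 0; b < p; ++b)
            cov[a][b] -= s[a] * s[b] / static_cast<double>(n);

    // Step 3: scale by 1 / (n - 1) to obtain the covariance matrix.
    for (auto& row : cov)
        for (double& v : row) v /= static_cast<double>(n - 1);
    return cov;
}
```

Accumulating the sums and the cross-product in a single pass over the rows is what makes this formulation convenient for large datasets: the data never has to be centered explicitly.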
To compute eigenvalues $\lambda_i$ and eigenvectors $\upsilon_i$, the implementer can choose any suitable method, such as [Ping14].
The final step is to sort the set of pairs $(\lambda_i, \upsilon_i)$ in descending order by $\lambda_i$ and form the resulting matrix $T = (\upsilon_1, \ldots, \upsilon_r)^T$ from the eigenvectors that correspond to the $r$ largest eigenvalues. Additionally, the means and variances of the initial dataset are returned.
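The sorting and selection step can be sketched as follows. This is an illustrative, standard-library-only example; the function name `form_components` and the `matrix` alias are hypothetical and not part of oneDAL, and the eigenpairs are assumed to have been computed already by some solver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Row-major dense matrix: m[i][j] is row i, column j (hypothetical helper type).
using matrix = std::vector<std::vector<double>>;

// `eigenvalues[i]` is paired with `eigenvectors[i]` (one p-dimensional vector).
// Returns the r x p matrix whose rows are the eigenvectors that belong to the
// r largest eigenvalues, ordered by descending eigenvalue.
matrix form_components(const std::vector<double>& eigenvalues,
                       const matrix& eigenvectors,
                       std::size_t r) {
    assert(eigenvalues.size() == eigenvectors.size());
    assert(r <= eigenvalues.size());

    // Sort indices instead of the pairs themselves to avoid copying vectors.
    std::vector<std::size_t> order(eigenvalues.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return eigenvalues[a] > eigenvalues[b];
                     });

    matrix t;
    t.reserve(r);
    for (std::size_t i = 0; i < r; ++i) t.push_back(eigenvectors[order[i]]);
    return t;
}
```

A stable sort keeps the solver's original order for equal eigenvalues, which helps keep the output reproducible between runs.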
Training method: SVD
This method uses singular value decomposition of the dataset to compute its principal components. The method relies on the following steps:
  1. Computation of the singular values and singular vectors
  2. Formation of the matrices storing the results
To compute singular values $\sigma_i$ and singular vectors $u_i$ and $v_i$, the implementer can choose any suitable method, such as [Demmel90].
The final step is to sort the set of pairs $(\sigma_i, v_i)$ in descending order by $\sigma_i$ and form the resulting matrix $T = (v_1, \ldots, v_r)^T$ from the right singular vectors that correspond to the $r$ largest singular values. Additionally, the means and variances of the initial dataset are returned.
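The relation between the SVD and Covariance methods can be written out explicitly. The derivation below is a sketch that assumes the dataset is centered before decomposition; the symbol $X_c$ (the $n \times p$ data matrix with column means subtracted) is introduced here for illustration only.

```latex
X_c = U S V^T
\quad\Longrightarrow\quad
\Sigma = \frac{1}{n - 1} X_c^T X_c
       = \frac{1}{n - 1} V S^T U^T U S V^T
       = V \, \frac{S^T S}{n - 1} \, V^T .
```

Under that assumption, the right singular vectors of $X_c$ are eigenvectors of the covariance matrix, and each eigenvalue equals the squared singular value divided by $n - 1$, so both training methods describe the same principal directions.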
Sign-flip technique
Eigenvectors computed by some eigenvalue solvers are not uniquely defined due to sign ambiguity. To get a deterministic result, a sign-flip technique should be applied. One of the sign-flip techniques proposed in [Bro07] requires the following modification of matrix $T$:
$$T_i \leftarrow T_i \cdot \mathrm{sgn}(T_{i j^*}), \quad j^* = \arg\max_{1 \leq j \leq p} |T_{ij}|, \quad 1 \leq i \leq r,$$
where $T_i$ is the $i$-th row, $T_{ij}$ is the element in the $i$-th row and $j$-th column, and $\mathrm{sgn}(\cdot)$ is the signum function,
$$\mathrm{sgn}(x) = \begin{cases} -1, & x < 0, \\ 0, & x = 0, \\ 1, & x > 0. \end{cases}$$
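A sign-flip of this kind can be sketched in C++ as follows. The example assumes the rule is read as "multiply each row of $T$ by the sign of its largest-magnitude element", so that this element becomes non-negative; the names `sgn`, `sign_flip`, and `matrix` are hypothetical and are not oneDAL APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense matrix: m[i][j] is row i, column j (hypothetical helper type).
using matrix = std::vector<std::vector<double>>;

// Signum function: -1 for negative, 0 for zero, 1 for positive arguments.
int sgn(double x) { return (x > 0.0) - (x < 0.0); }

// Flips the sign of every row whose largest-magnitude element is negative.
void sign_flip(matrix& t) {
    for (auto& row : t) {
        assert(!row.empty());
        // Find the index of the element with the largest absolute value.
        std::size_t j_max = 0;
        for (std::size_t j = 1; j < row.size(); ++j)
            if (std::fabs(row[j]) > std::fabs(row[j_max])) j_max = j;
        // Multiplying by sgn() only changes the row when the sign is -1;
        // a sign of 0 means the whole row is zero, so nothing needs to change.
        if (sgn(row[j_max]) < 0)
            for (double& v : row) v = -v;
    }
}
```

Because an eigenvector and its negation describe the same direction, the flip does not change the principal components; it only fixes which of the two representations is reported.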
Inference
Given the inference set $X' = \{ x_1', \ldots, x_m' \}$ of $p$-dimensional feature vectors and the $r \times p$ matrix $T$ produced at the training stage, the problem is to transform $X'$ to the set $X'' = \{ x_1'', \ldots, x_m'' \}$, where $x_j''$ is an $r$-dimensional feature vector, $1 \leq j \leq m$.
The feature vector $x_j''$ is computed by applying the linear transformation [Lang87] defined by the matrix $T$ to the feature vector $x_j'$,
$$x_j'' = T x_j', \quad 1 \leq j \leq m. \quad (1)$$
Inference methods: Covariance and SVD
Covariance and SVD inference methods compute $x_j''$ according to (1).
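The transformation in (1) is a matrix-vector product, which can be sketched as follows. This is a standard-library-only illustration; the function name `project` and the `matrix` alias are hypothetical and not part of oneDAL.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Row-major dense matrix: m[i][j] is row i, column j (hypothetical helper type).
using matrix = std::vector<std::vector<double>>;

// Applies the r x p matrix `t` (one eigenvector per row) to the
// p-dimensional feature vector `x` and returns the r-dimensional result.
std::vector<double> project(const matrix& t, const std::vector<double>& x) {
    std::vector<double> y;
    y.reserve(t.size());
    for (const auto& row : t) {
        assert(row.size() == x.size());
        // Each output coordinate is the dot product of one eigenvector with x.
        y.push_back(std::inner_product(row.begin(), row.end(), x.begin(), 0.0));
    }
    return y;
}
```

Calling `project` once per feature vector of the inference set yields one $r$-dimensional row of the transformed data for each input row.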

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Descriptor
template<typename Float = detail::descriptor_base<>::float_t, typename Method = detail::descriptor_base<>::method_t, typename Task = detail::descriptor_base<>::task_t>
class descriptor
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
  • Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
descriptor(std::int64_t component_count = 0)
Creates a new instance of the class with the given component_count property value.
Public Methods
auto & set_component_count(int64_t value)
auto & set_deterministic(bool value)
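The `auto &` return type of the setters suggests that they return a reference to the descriptor itself, which would allow calls to be chained. The sketch below illustrates that pattern with a hypothetical stand-in class, `sketch_descriptor`; it is not the oneDAL class, and its getters, the non-negativity check, and the `false` default for the deterministic flag are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that mimics the setter signatures listed above.
// It is for illustration only and does not reproduce oneDAL behavior.
class sketch_descriptor {
public:
    explicit sketch_descriptor(std::int64_t component_count = 0)
            : component_count_(component_count) {}

    // Each setter returns *this, so several setters can be chained.
    auto& set_component_count(std::int64_t value) {
        assert(value >= 0); // assumed precondition for this sketch
        component_count_ = value;
        return *this;
    }

    auto& set_deterministic(bool value) {
        deterministic_ = value;
        return *this;
    }

    // Getters added for the sketch so the stored values can be inspected.
    std::int64_t get_component_count() const { return component_count_; }
    bool get_deterministic() const { return deterministic_; }

private:
    std::int64_t component_count_;
    bool deterministic_ = false; // assumed default
};
```

With such a class, a configuration can be written as one expression, for example `sketch_descriptor{}.set_component_count(2).set_deterministic(true)`.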
Method tags
struct cov
Tag-type that denotes the Covariance computational method.
struct svd
Tag-type that denotes the SVD computational method.
using by_default = cov
Alias tag-type for the Covariance computational method.
Task tags
struct dim_reduction
Tag-type that parameterizes entities used for solving the dimensionality reduction problem.
using by_default = dim_reduction
Alias tag-type for the dimensionality reduction task.
Model
template<typename Task = task::by_default>
class model
Template Parameters
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
const table & eigenvectors = table{}
An $r \times p$ table with the eigenvectors. Each row contains one eigenvector.
Getter & Setter


const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)

Training
train(...)
Input
template<typename Task = task::by_default>
class train_input
Template Parameters
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_input(const table & data)
Creates a new instance of the class with the given data property value.
Properties
const table & data = table{}
An $n \times p$ table with the training data, where each row stores one feature vector.
Getter & Setter


const table & get_data() const
auto & set_data(const table &data)

Result
template<typename Task = task::by_default>
class train_result
Template Parameters
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const table & eigenvectors = table{}
An $r \times p$ table with the eigenvectors. Each row contains one eigenvector.
Getter & Setter


const table & get_eigenvectors() const

Invariants


eigenvectors == model.eigenvectors

const model<Task> & model = model<Task>{}
The trained PCA model.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & eigenvalues = table{}
A $1 \times r$ table that contains the eigenvalues for the first $r$ features.
Getter & Setter


const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)

const table & variances = table{}
A $1 \times r$ table that contains the variances for the first $r$ features.
Getter & Setter


const table & get_variances() const
auto & set_variances(const table &value)

const table & means = table{}
A $1 \times r$ table that contains the mean values for the first $r$ features.
Getter & Setter


const table & get_means() const
auto & set_means(const table &value)

Inference
infer(...)
Input
template<typename Task = task::by_default>
class infer_input
Template Parameters
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_input(const model<Task> & trained_model, const table & data)
Creates a new instance of the class with the given model and data property values.
Properties
const model<Task> & model = model<Task>{}
The trained PCA model.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & data = table{}
The dataset for inference $X'$.
Getter & Setter


const table & get_data() const
auto & set_data(const table &value)

Result
template<typename Task = task::by_default>
class infer_result
Template Parameters
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table & transformed_data = table{}
An $m \times r$ table that contains the data projected to the $r$ principal components, where $m$ is the number of feature vectors in the inference set.
Getter & Setter


const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
