# Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors of possibly correlated features to a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions where the data is mostly spread out.
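
To make the idea concrete, the following is a minimal sketch of the covariance approach for the special case of two features, where the eigendecomposition has a closed form. It is not the oneDAL implementation; the names `pca2d_result` and `pca_2d` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch; it is not part of oneDAL.
struct pca2d_result {
    std::array<double, 2> eigenvalues;                 // sorted in descending order
    std::array<std::array<double, 2>, 2> eigenvectors; // one unit-length eigenvector per row
};

// Covariance-method PCA for exactly two features, using the closed-form
// eigendecomposition of a symmetric 2x2 matrix.
inline pca2d_result pca_2d(const std::vector<std::array<double, 2>>& data) {
    assert(data.size() >= 2 && "need at least two observations");
    const double n = static_cast<double>(data.size());

    // Step 1: mean of each feature.
    double mean_x = 0.0, mean_y = 0.0;
    for (const auto& row : data) {
        mean_x += row[0];
        mean_y += row[1];
    }
    mean_x /= n;
    mean_y /= n;

    // Step 2: sample covariance matrix [[a, b], [b, c]].
    double a = 0.0, b = 0.0, c = 0.0;
    for (const auto& row : data) {
        const double dx = row[0] - mean_x;
        const double dy = row[1] - mean_y;
        a += dx * dx;
        b += dx * dy;
        c += dy * dy;
    }
    a /= n - 1.0;
    b /= n - 1.0;
    c /= n - 1.0;

    // Step 3: eigenvalues are the roots of the characteristic polynomial.
    const double half_trace = 0.5 * (a + c);
    const double radius = std::hypot(0.5 * (a - c), b);
    const double l1 = half_trace + radius; // largest variance
    const double l2 = half_trace - radius;

    // Step 4: eigenvector of the largest eigenvalue, normalized to unit length.
    double vx = 0.0, vy = 0.0;
    if (b != 0.0) {
        vx = l1 - c;
        vy = b;
    } else if (a >= c) {
        vx = 1.0; // features already uncorrelated, x has the larger variance
    } else {
        vy = 1.0; // features already uncorrelated, y has the larger variance
    }
    const double norm = std::hypot(vx, vy);
    vx /= norm;
    vy /= norm;

    pca2d_result result{};
    result.eigenvalues = {l1, l2};
    result.eigenvectors[0] = {vx, vy};
    result.eigenvectors[1] = {-vy, vx}; // orthogonal to the first component
    return result;
}
```

For points that lie exactly on the line y = x, this sketch returns a first principal component along (1/√2, 1/√2) and a second eigenvalue of zero, because the data has no spread in the orthogonal direction.
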
## Programming Interface

All types and functions in this section are declared in the `oneapi::dal::pca` namespace and are available via inclusion of the `oneapi/dal/algo/pca.hpp` header file.

### Descriptor

`template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor`

Template Parameters
• `Float` – The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
• `Method` – Tag-type that specifies an implementation of the algorithm. Can be `method::cov` or `method::svd`.
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`descriptor(std::int64_t component_count = 0)`
Creates a new instance of the class with the given `component_count` property value.

Properties

`std::int64_t component_count`
The number of principal components $r$. If it is zero, the algorithm computes the eigenvectors for all features, $r = p$, where $p$ is the number of features.
Default value: 0.
Getter & Setter

std::int64_t get_component_count() const
auto & set_component_count(int64_t value)

Invariants

component_count >= 0

`bool deterministic`
Specifies whether the algorithm applies the sign-flip technique. If it is `true`, the directions of the eigenvectors must be deterministic.
Default value: true.
Getter & Setter

bool get_deterministic() const
auto & set_deterministic(bool value)
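
Eigenvectors are only defined up to sign, so two runs can legitimately return `v` and `-v` for the same component. The sketch below shows one possible sign-flip convention: each eigenvector is negated, if necessary, so that its largest-magnitude component is positive. This convention and the helper name `sign_flip` are assumptions made for illustration; this section does not state which rule the library applies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of oneDAL. Each row of `eigenvectors` holds one
// eigenvector. A row is negated when its largest-magnitude component is
// negative, which makes the returned direction independent of the solver.
inline void sign_flip(std::vector<std::vector<double>>& eigenvectors) {
    for (auto& v : eigenvectors) {
        assert(!v.empty() && "an eigenvector needs at least one component");

        // Find the component with the largest absolute value.
        std::size_t pivot = 0;
        for (std::size_t j = 1; j < v.size(); ++j) {
            if (std::abs(v[j]) > std::abs(v[pivot])) {
                pivot = j;
            }
        }

        // Flip the whole vector so that this component becomes positive.
        if (v[pivot] < 0.0) {
            for (double& x : v) {
                x = -x;
            }
        }
    }
}
```

Under this convention, `{0.6, -0.8}` becomes `{-0.6, 0.8}`, while `{0.8, 0.6}` is left unchanged.
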

Method tags

`struct cov`
Tag-type that denotes the Covariance computational method.

`struct svd`
Tag-type that denotes the SVD computational method.

`using by_default = cov`
Alias tag-type for the Covariance computational method.

Task tags

`struct dim_reduction`
Tag-type that parameterizes entities used for solving the dimensionality reduction problem.

`using by_default = dim_reduction`
Alias tag-type for the dimensionality reduction task.

### Model

`template <typename Task = task::by_default> class model`

Template Parameters
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`model()`
Creates a new instance of the class with the default property values.

Properties

`const table & eigenvectors`
An $r \times p$ table with the eigenvectors. Each row contains one eigenvector.
Default value: table{}.
Getter & Setter

const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)

### Training

`train(...)`

#### Input

`template <typename Task = task::by_default> class train_input`

Template Parameters
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`train_input(const table &data)`
Creates a new instance of the class with the given `data` property value.

Properties

`const table & data`
An $n \times p$ table with the training data, where each row stores one feature vector.
Default value: table{}.
Getter & Setter

const table & get_data() const
auto & set_data(const table &data)

#### Result

`template <typename Task = task::by_default> class train_result`

Template Parameters
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`train_result()`
Creates a new instance of the class with the default property values.

Public Methods

`const table & get_eigenvectors() const`
An $r \times p$ table with the eigenvectors. Each row contains one eigenvector.

Properties

`const model<Task> & model`
The trained PCA model.
Default value: model<Task>{}.
Getter & Setter

const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

`const table & means`
A $1 \times r$ table that contains the mean values for the first $r$ features.
Default value: table{}.
Getter & Setter

const table & get_means() const
auto & set_means(const table &value)

`const table & eigenvalues`
A $1 \times r$ table that contains the eigenvalues for the first $r$ features.
Default value: table{}.
Getter & Setter

const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)

`const table & variances`
A $1 \times r$ table that contains the variances for the first $r$ features.
Default value: table{}.
Getter & Setter

const table & get_variances() const
auto & set_variances(const table &value)

#### Operation

`template <typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)`

Parameters
• `desc` – PCA algorithm descriptor
• `input` – Input data for the training operation
Preconditions

input.data.has_data == true
input.data.column_count >= desc.component_count

Postconditions

result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count
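
The training preconditions, together with the rule that a zero `component_count` means "all features", can be checked before calling `train`. The sketch below is a hypothetical helper: `table_shape` and `resolve_component_count` are not part of oneDAL, and treating "has data" as "both dimensions are positive" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the shape of a oneDAL table.
struct table_shape {
    std::int64_t row_count;
    std::int64_t column_count;
};

// Validates the training preconditions and returns the effective number of
// principal components r: the requested count, or the number of features p
// when the requested count is zero.
inline std::int64_t resolve_component_count(std::int64_t component_count,
                                            const table_shape& data) {
    // Descriptor invariant: component_count >= 0.
    assert(component_count >= 0);

    // Precondition: input.data.has_data == true (assumed to mean non-empty).
    if (data.row_count <= 0 || data.column_count <= 0) {
        throw std::invalid_argument("input data is empty");
    }

    // Precondition: input.data.column_count >= desc.component_count.
    if (data.column_count < component_count) {
        throw std::invalid_argument("component_count exceeds the number of features");
    }

    return component_count == 0 ? data.column_count : component_count;
}
```

For a table with 100 rows and 10 columns, a request for 0 components resolves to 10, a request for 5 stays 5, and a request for 11 is rejected.
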

### Inference

`infer(...)`

#### Input

`template <typename Task = task::by_default> class infer_input`

Template Parameters
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`infer_input(const model<Task> &trained_model, const table &data)`
Creates a new instance of the class with the given `model` and `data` property values.

Properties

`const model<Task> & model`
The trained PCA model.
Default value: model<Task>{}.
Getter & Setter

const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

`const table & data`
The dataset for inference.
Default value: table{}.
Getter & Setter

const table & get_data() const
auto & set_data(const table &value)

#### Result

`template <typename Task = task::by_default> class infer_result`

Template Parameters
• `Task` – Tag-type that specifies the type of the problem to solve. Can be `task::dim_reduction`.

Constructors

`infer_result()`
Creates a new instance of the class with the default property values.

Properties

`const table & transformed_data`
An $n \times r$ table that contains data projected to the $r$ principal components.
Default value: table{}.
Getter & Setter

const table & get_transformed_data() const
auto & set_transformed_data(const table &value)

#### Operation

`template <typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)`

Parameters
• `desc` – PCA algorithm descriptor
• `input` – Input data for the inference operation
Preconditions

input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count

Postconditions

result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count
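
The shapes in these conditions follow from a matrix product: an n × p dataset multiplied by the transpose of an r × p eigenvector table gives an n × r result. The sketch below illustrates only that shape relationship with a plain projection. The `matrix` alias and the `project` function are hypothetical, and whether the library also centers or scales the data before projecting is not stated in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major stand-in for a oneDAL table.
using matrix = std::vector<std::vector<double>>;

// Projects each row of `data` (n x p) onto each row of `eigenvectors` (r x p),
// producing an n x r matrix of dot products.
inline matrix project(const matrix& data, const matrix& eigenvectors) {
    matrix transformed(data.size(), std::vector<double>(eigenvectors.size(), 0.0));
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t k = 0; k < eigenvectors.size(); ++k) {
            // Mirrors the precondition:
            // input.model.eigenvectors.column_count == input.data.column_count
            assert(eigenvectors[k].size() == data[i].size());
            for (std::size_t j = 0; j < data[i].size(); ++j) {
                transformed[i][k] += data[i][j] * eigenvectors[k][j];
            }
        }
    }
    return transformed;
}
```

With two observations of two features and a single eigenvector, the output has two rows and one column, matching the postconditions above.
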

## Usage example

Training

```cpp
// print_table is assumed to be a user-supplied helper that prints a titled table.
pca::model<> run_training(const table& data) {
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(5)
        .set_deterministic(true);

    const auto result = train(pca_desc, data);

    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_eigenvectors());

    return result.get_model();
}
```

Inference

```cpp
table run_inference(const pca::model<>& model,
                    const table& new_data) {
    // The model stores one eigenvector per row, so the row count of the
    // eigenvector table is the number of components.
    const auto pca_desc = pca::descriptor<float>{}
        .set_component_count(model.get_eigenvectors().get_row_count());

    const auto result = infer(pca_desc, model, new_data);

    print_table("transformed data", result.get_transformed_data());

    return result.get_transformed_data();
}
```

## Examples

• oneAPI DPC++ – Batch Processing
• oneAPI C++ – Batch Processing
• Python* with DPC++ support – Batch Processing

