Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content


Machine learning terms

Categorical feature
A feature with a discrete domain. Can be nominal or ordinal.
discrete feature, qualitative feature
predict what type of object is in a picture (a dog or a cat?), predict whether or not an email is spam
Clustering
An unsupervised machine learning problem of grouping feature vectors into bunches, which are usually encoded as nominal values.
find big star clusters in space images
Continuous feature
A feature with values in a domain of real numbers. Can be interval or ratio.
quantitative feature, numerical feature
a person’s height, the price of the house
CSV file
A comma-separated values (CSV) file is a type of text file. Each line in a CSV file is a record containing fields that are separated by a delimiter. Fields can be in a numerical or a text format. Text usually refers to categorical values. By default, the delimiter is a comma, but, in general, it can be any character.
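As a minimal sketch of the record structure described above, the following C++ function splits one line into fields on a configurable delimiter. The name split_record is hypothetical and not part of oneDAL, and quoted fields or escaped delimiters are deliberately not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not a oneDAL API): splits one CSV record into fields.
// Assumption: the record is a single line and contains no quoted fields.
std::vector<std::string> split_record(const std::string& line, char delimiter = ',') {
    // A record is exactly one line, so it must not contain a line break.
    assert(line.find('\n') == std::string::npos);

    std::vector<std::string> fields;
    std::string current;
    for (char c : line) {
        if (c == delimiter) {
            fields.push_back(current); // the delimiter closes the current field
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    fields.push_back(current); // the last field has no trailing delimiter
    return fields;
}
```

Calling split_record("1.5;red", ';') would yield the two fields "1.5" and "red"; deciding whether a field is numerical or text is left to the caller.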
Dataset
A collection of observations.
Dimensionality reduction
A problem of transforming a set of feature vectors from a high-dimensional space into a low-dimensional space while retaining meaningful properties of the original feature vectors.
Feature
A particular property or quality of a real object or an event. Has a defined type and domain. In machine learning problems, features are treated as input variables that are independent of each other.
attribute, variable, input variable
Feature vector
A vector that encodes information about a real object, an event, or a group of objects or events. Contains at least one feature.
A rectangle can be described by two features: its width and height
Inference
A process of applying a trained model to a dataset in order to predict response values based on input feature vectors.
Inference set
A dataset used at the inference stage. Usually without responses.
Interval feature
A continuous feature with values that can be compared, added or subtracted, but cannot be multiplied or divided.
a time frame scale, a temperature in Celsius or Fahrenheit
Label
A response with categorical or ordinal values. This is an output in classification and clustering problems.
the spam-detection problem has a binary label indicating whether the email is spam or not
Model
An entity that stores the information necessary to run inference on a new dataset. Typically the result of a training process.
in the linear regression algorithm, the model contains weight values for each input feature and a single bias value
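The linear regression example can be sketched in C++ as follows. The names linear_model and infer are hypothetical illustrations and do not reflect the oneDAL API; the sketch only shows that the model stores one weight per input feature plus a single bias, and that inference combines them with a feature vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model type (not a oneDAL API): the result of some training process.
struct linear_model {
    std::vector<double> weights; // one weight per input feature
    double bias;                 // a single bias value
};

// Inference: predicts a response for one feature vector.
double infer(const linear_model& model, const std::vector<double>& features) {
    // The feature vector must have exactly one value per weight.
    assert(features.size() == model.weights.size());

    double response = model.bias;
    for (std::size_t i = 0; i < features.size(); ++i) {
        response += model.weights[i] * features[i];
    }
    return response;
}
```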
Nominal feature
A categorical feature without ordering between values. Only the equality operation is defined for nominal features.
a person’s gender, color of a car
Observation
A feature vector and zero or more responses.
instance, sample
Ordinal feature
A categorical feature with defined operations of equality and ordering between values.
a student’s grade
Outlier
An observation that is significantly different from the other observations.
Ratio feature
A continuous feature with defined operations of equality, comparison, addition, subtraction, multiplication, and division. Zero value element means the absence of any value.
the height of a tower
predict temperature based on weather conditions
Response
A property of a real object or event whose dependency on the feature vector needs to be determined in a supervised learning problem. While a feature is an input in a machine learning problem, the response is one of the outputs that the model can produce at the inference stage.
dependent variable
Supervised learning
A training process that uses a dataset with information about dependencies between features and responses. The goal is to obtain a model of the dependencies between input feature vectors and responses.
Training
A process of creating a model based on information extracted from a training set. The resulting model is selected according to some quality criteria.
Training set
A dataset used at the training stage to create a model.
Unsupervised learning
A training process that uses a training set with no responses. The goal is to find hidden patterns inside feature vectors and dependencies between them.

oneDAL terms

Accessor
A oneDAL concept for an object that provides access to the data of another object in a special data format. It abstracts data access from the interface of an object and provides uniform access to the data stored in objects of different types.
Batch mode
The computation mode for an algorithm in oneDAL, where all the data needed for computation is available at the start and fits in the memory of the device on which the computations are performed.
Builder
A oneDAL concept for an object that encapsulates the creation process of another object and enables its iterative creation.
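To illustrate the idea of iterative creation, here is a minimal C++ sketch. The classes row_table and row_table_builder are hypothetical names chosen for this example and are not oneDAL types; the builder collects rows one at a time and only then creates the final object, which exposes no way to change its state.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type (not a oneDAL API): no setters, so its state
// cannot be changed through its interface after creation.
class row_table {
public:
    row_table(std::vector<double> values, std::size_t column_count)
        : values_(std::move(values)), column_count_(column_count) {}

    std::size_t get_column_count() const { return column_count_; }
    std::size_t get_row_count() const {
        return column_count_ == 0 ? 0 : values_.size() / column_count_;
    }

private:
    std::vector<double> values_; // rows stored one after another
    std::size_t column_count_;
};

// Hypothetical builder: encapsulates how a row_table is assembled.
class row_table_builder {
public:
    explicit row_table_builder(std::size_t column_count)
        : column_count_(column_count) {}

    // Adds one row; returns *this so that calls can be chained.
    row_table_builder& add_row(const std::vector<double>& row) {
        assert(row.size() == column_count_);
        values_.insert(values_.end(), row.begin(), row.end());
        return *this;
    }

    // Creates the final object from everything added so far.
    row_table build() const { return row_table(values_, column_count_); }

private:
    std::vector<double> values_;
    std::size_t column_count_;
};
```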
Contiguous data
Data that are stored as one contiguous memory block. One of the characteristics of a data format.
Data format
Representation of the internal structure of the data.
data can be stored in array-of-structures or compressed-sparse-row format
Data layout
A characteristic of data format which describes the order of elements in a contiguous data block.
row-major format, where elements are stored row by row
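As a small sketch of what a data layout determines, the C++ functions below compute the position of element (row, column) in a contiguous block for the row-major layout mentioned above and, for contrast, for a column-major layout. The function names are hypothetical and are not part of oneDAL.

```cpp
#include <cassert>
#include <cstdint>

// Row-major: elements are stored row by row, so a full row of
// column_count elements precedes each subsequent row.
std::int64_t row_major_offset(std::int64_t row, std::int64_t column,
                              std::int64_t row_count, std::int64_t column_count) {
    assert(row >= 0 && row < row_count);
    assert(column >= 0 && column < column_count);
    return row * column_count + column;
}

// Column-major: elements are stored column by column, so a full column of
// row_count elements precedes each subsequent column.
std::int64_t column_major_offset(std::int64_t row, std::int64_t column,
                                 std::int64_t row_count, std::int64_t column_count) {
    assert(row >= 0 && row < row_count);
    assert(column >= 0 && column < column_count);
    return column * row_count + row;
}
```

For a block with 3 rows and 4 columns, element (1, 2) sits at offset 6 in the row-major layout and at offset 7 in the column-major layout.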
Data type
An attribute of data used by a compiler to store and access them. Includes size in bytes, encoding principles, and available operations (in terms of a programming language).
Flat data
A block of contiguous homogeneous data.
Getter
A method that returns the value of a private member variable.
std::int64_t get_row_count() const;
Heterogeneous data
Data that contain values of different data types or with different sets of operations defined on them. One of the characteristics of a data format.
A dataset with 100 observations of three interval features. The first two features are of float32 data type, while the third one is of float64 data type.
Homogeneous data
Data with values of a single data type and the same set of available operations defined on them. One of the characteristics of a data format.
A dataset with 100 observations of three interval features, each of type float32
Immutability
An object is immutable if it is not possible to change its state after creation.
Metadata
Information about the logical and physical structure of an object. All possible combinations of metadata values represent the full set of possible objects of a given type. Metadata do not expose information that is not part of a type definition, e.g. implementation details.
A table object can contain three nominal features with 100 observations (the logical part of metadata). This object can store data as a sparse CSR array and provide direct access to them (the physical part).
Online mode
The computation mode for an algorithm in oneDAL, where the data needed for computation becomes available in parts over time.
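The difference from batch mode can be sketched with a running mean that is updated as each part of the data arrives. This is an illustration only: the class online_mean and its methods partial_update and finalize are hypothetical names, not oneDAL interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical online computation (not a oneDAL API): keeps only a small
// partial result, so the full dataset never has to be in memory at once.
class online_mean {
public:
    // Consumes the next available part of the data.
    void partial_update(const std::vector<double>& block) {
        for (double value : block) {
            sum_ += value;
        }
        count_ += block.size();
    }

    // Produces the final result from the accumulated partial result.
    double finalize() const {
        assert(count_ > 0); // at least one value must have been seen
        return sum_ / static_cast<double>(count_);
    }

private:
    double sum_ = 0.0;
    std::size_t count_ = 0;
};
```

In batch mode the same mean would be computed in a single pass over one block that holds all the data.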
Reference-counted object
A copy-constructible and copy-assignable oneDAL object which stores the number of references to the unique implementation. Both copy operations defined for this object are lightweight, which means that each time a new object is created, only the number of references is increased. An implementation is automatically freed when the number of references becomes equal to zero.
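The behavior described above can be approximated in standard C++ with std::shared_ptr, as in the sketch below. The class shared_values is a hypothetical example and says nothing about how oneDAL implements its objects; it only shows that copies share one implementation and that copying increases a reference count instead of duplicating the data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reference-counted object (not a oneDAL type).
// The compiler-generated copy operations copy only the shared pointer.
class shared_values {
public:
    explicit shared_values(std::vector<double> values)
        : impl_(std::make_shared<std::vector<double>>(std::move(values))) {}

    // Number of objects currently referring to the same implementation.
    long reference_count() const { return impl_.use_count(); }

    // Address of the shared data; equal for all copies of one object.
    const double* data() const { return impl_->data(); }

    double at(std::size_t index) const {
        assert(index < impl_->size());
        return (*impl_)[index];
    }

private:
    // The implementation is freed when the last reference goes away.
    std::shared_ptr<std::vector<double>> impl_;
};
```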
Setter
A method that accepts a single parameter and assigns its value to a private member variable.
void set_row_count(std::int64_t row_count);
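The getter and setter declarations shown in this glossary could belong to a class like the one sketched below. The class name table_info and the member row_count_ are assumptions made for illustration; only the two method signatures come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class (not a oneDAL type) pairing the two methods
// with the private member variable they read and write.
class table_info {
public:
    // Getter: returns the value of the private member variable.
    std::int64_t get_row_count() const { return row_count_; }

    // Setter: accepts a single parameter and assigns it to the member.
    void set_row_count(std::int64_t row_count) {
        assert(row_count >= 0); // a row count cannot be negative
        row_count_ = row_count;
    }

private:
    std::int64_t row_count_ = 0;
};
```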
Table
A oneDAL concept for a dataset that contains only numerical data, categorical or continuous. Serves as a means of transferring data between the user’s application and computations inside oneDAL. Hides details of the data format and generalizes access to the data.
Workload
A problem of applying a oneDAL algorithm to a dataset.

Common oneAPI terms

API
Application Programming Interface
DPC++
Data Parallel C++ (DPC++) is a high-level language designed for data parallel programming productivity. DPC++ is based on SYCL* from the Khronos* Group to support data parallelism and heterogeneous programming.
Host/Device
OpenCL [OpenCLSpec] uses the term host for the CPU that controls the connected GPU, and the term device for the GPU that executes kernels.
JIT
Just-in-Time Compilation: compilation during execution of a program.
Kernel
Code written in OpenCL [OpenCLSpec] or SYCL and executed on a GPU device.
SPIR-V
Standard Portable Intermediate Representation - V is a language for the intermediate representation of compute kernels.
SYCL
SYCL(TM) [SYCLSpec] is a high-level programming model for OpenCL(TM) that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++.
