Logistic Regression
Logistic regression is a method for modeling the relationships
between one or more explanatory variables and a categorical variable
by expressing the posterior statistical distribution of the
categorical variable via linear functions on observed data. If the
categorical variable is binary, taking only two values, “0” and “1”,
the logistic regression is simple; otherwise, it is multinomial.
Details
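Given n p-dimensional feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ and a vector
of class labels $y = (y_1, \ldots, y_n)$, where $y_i \in \{0, 1, \ldots, K-1\}$ describes the
class to which the feature vector $x_i$ belongs and $K$ is the number of classes, the problem is to
train a logistic regression model.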
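The logistic regression model is the set of vectors $\beta = \{\beta_0 = (\beta_{00}, \ldots, \beta_{0p}), \ldots, \beta_{K-1} = (\beta_{K-1,0}, \ldots, \beta_{K-1,p})\}$
that gives the posterior probability
$$P\{y = k \mid x\} = p_k(x, \beta) = \frac{e^{f_k(x, \beta)}}{\sum_{i=0}^{K-1} e^{f_i(x, \beta)}}, \quad f_k(x, \beta) = \beta_{k0} + \sum_{j=1}^{p} \beta_{kj} x_j \qquad (1)$$
for a given feature vector $x = (x_1, \ldots, x_p)$ and each class label $k = 0, \ldots, K-1$.
If the categorical variable is binary, the model is defined as a single vector $\beta_0 = (\beta_{00}, \ldots, \beta_{0p})$
that determines the posterior probability
$$P\{y = 1 \mid x\} = \sigma(x, \beta) = \frac{1}{1 + e^{-f(x, \beta)}}, \quad P\{y = 0 \mid x\} = 1 - P\{y = 1 \mid x\} \qquad (2)$$
where $f(x, \beta) = \beta_{00} + \sum_{j=1}^{p} \beta_{0j} x_j$.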
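The two posterior probabilities above can be illustrated with a minimal sketch in plain Python. It assumes the standard softmax form for the multinomial case and the standard sigmoid form for the binary case. The function names are hypothetical and are not part of the library API.

```python
import math

def linear_scores(x, beta):
    """f_k(x, beta) = beta_k0 + sum_j beta_kj * x_j for every class k."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in beta]

def softmax_probabilities(x, beta):
    """Multinomial posterior P{y = k | x} for k = 0..K-1."""
    scores = linear_scores(x, beta)
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_probability(x, beta0):
    """Binary posterior P{y = 1 | x} from a single coefficient vector."""
    f = beta0[0] + sum(bj * xj for bj, xj in zip(beta0[1:], x))
    return 1.0 / (1.0 + math.exp(-f))

x = [1.0, 2.0]                               # p = 2 features
beta = [[0.0, 0.0, 0.0], [0.5, 1.0, -0.5]]   # K = 2 coefficient vectors
probs = softmax_probabilities(x, beta)
p1 = sigmoid_probability(x, beta[1])
```

With the first coefficient vector fixed at zero, the two-class softmax reduces to the sigmoid, so `probs[1]` and `p1` agree. This is why a single vector is enough in the binary case.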
Training Stage
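The training procedure is an iterative algorithm that minimizes the
objective function
$$L(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \log p_{y_i}(x_i, \beta) + \lambda_1 \sum_{k=0}^{K-1} \|\beta_k\|_{L_1} + \lambda_2 \sum_{k=0}^{K-1} \|\beta_k\|_{L_2}$$
where $p_{y_i}(x_i, \beta)$ is the posterior probability of the observed class $y_i$. The first term is the negative log-likelihood of conditional
Y given X, and the latter terms are regularization ones that
penalize the complexity of the model (large $\beta$ values).
$\lambda_1$ and $\lambda_2$ are non-negative regularization
parameters applied to the L1 and L2 norms of the vectors in $\beta$.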
For more details, see [Hastie2009], [Bishop2006].
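As a rough illustration of the objective described above, the sketch below evaluates the mean negative log-likelihood plus the two penalty terms in plain Python. It makes several assumptions that the text does not confirm: the penalties use the plain (unsquared) L1 and L2 norms of each coefficient vector, the intercept is penalized along with the other coefficients, and the names `objective`, `lambda1`, and `lambda2` are hypothetical. Many implementations use the squared L2 norm instead.

```python
import math

def class_probabilities(x, beta):
    """Softmax posterior over K classes for one feature vector."""
    scores = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in beta]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def objective(X, y, beta, lambda1=0.0, lambda2=0.0):
    """Mean negative log-likelihood plus L1 and L2 penalties (assumed form)."""
    n = len(X)
    nll = -sum(math.log(class_probabilities(x, beta)[yi])
               for x, yi in zip(X, y)) / n
    l1 = sum(abs(c) for b in beta for c in b)                  # sum of L1 norms
    l2 = sum(math.sqrt(sum(c * c for c in b)) for b in beta)   # sum of L2 norms
    return nll + lambda1 * l1 + lambda2 * l2

X = [[0.0, 0.0]]
y = [0]
beta = [[0.0, 3.0, 4.0], [0.0, 0.0, 0.0]]
value = objective(X, y, beta, lambda1=0.1, lambda2=0.2)
```

In this toy input both classes get equal scores, so the likelihood term is log 2, and the penalties contribute 0.1 * 7 and 0.2 * 5.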
For the objective function minimization the library supports the
iterative algorithms defined by the interface of
daal::algorithms::iterative_solver. See Iterative Solver.
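To give a feel for what an iterative minimization of this kind does, here is a minimal sketch of plain batch gradient descent for the binary case with both regularization parameters set to zero. It is not one of the library's solvers and does not use the iterative_solver interface. The function name, step size, and iteration count are assumptions chosen for a tiny toy data set.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_binary(X, y, step=0.5, n_iter=500):
    """Fit (beta_0, beta_1, ..., beta_p) by gradient descent on the
    mean negative log-likelihood, without regularization."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, yi in zip(X, y):
            f = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            err = sigmoid(f) - yi          # derivative of the loss w.r.t. f
            grad[0] += err / n
            for j, v in enumerate(x):
                grad[j + 1] += err * v / n
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
beta = train_binary(X, y)
p_low = sigmoid(beta[0] + beta[1] * 0.0)   # probability of class 1 at x = 0
p_high = sigmoid(beta[0] + beta[1] * 3.0)  # probability of class 1 at x = 3
```

A production solver would add a stopping criterion and handle the regularization terms. The sketch only shows the repeated gradient step.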
Prediction Stage
Given a logistic regression model and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those
vectors, and their probabilities and logarithms of probabilities
if required. The computation is based on formula (1) in the
multinomial case and on formula (2) in the binary case.
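The prediction step can be sketched in plain Python as follows. The sketch assumes the multinomial (softmax) form of the posterior and takes the response to be the class with the highest probability. The function name `predict` and its return layout are hypothetical.

```python
import math

def predict(X, beta):
    """Return class labels, probabilities, and log-probabilities
    for each feature vector in X."""
    labels, probabilities, log_probabilities = [], [], []
    for x in X:
        scores = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in beta]
        shift = max(scores)
        # log of the softmax denominator, computed stably
        log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
        log_p = [s - log_norm for s in scores]
        log_probabilities.append(log_p)
        probabilities.append([math.exp(v) for v in log_p])
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels, probabilities, log_probabilities

beta = [[0.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # K = 3 classes, p = 1 feature
labels, probs, log_probs = predict([[2.0], [-2.0]], beta)
```

Computing the log-probabilities first and exponentiating afterwards keeps both outputs consistent and avoids taking the logarithm of a value that has underflowed to zero.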
Usage of Training Alternative
To build a Logistic Regression model using methods of the Model Builder class of Logistic Regression,
complete the following steps:
- Create a Logistic Regression model builder using a constructor with the required number of responses and features.
- Use the setBeta method to add the set of pre-calculated coefficients to the model. Specify random access iterators to the first and the last element of the set of coefficients [ISO/IEC 14882:2011 §24.2.7]. If your set of coefficients does not contain an intercept, interceptFlag is automatically set to False; otherwise, it is set to True.
- Use the getModel method to get the trained Logistic Regression model.
- Use the getStatus method to check the status of the model building process. If the DAAL_NOTHROW_EXCEPTIONS macro is defined, the status report contains the list of errors that describe the problems the API encountered (in case of an API runtime failure).
If, after calling the getModel method, you use the setBeta method to update coefficients,
the initial model will be automatically updated with the new coefficients.
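The workflow above can be mimicked with a toy stand-in written in plain Python. This is not the library's Model Builder class: `ToyModelBuilder`, `ToyModel`, and their methods are hypothetical. The rule used to detect an intercept (comparing the number of coefficients with the expected count) is an assumption made for illustration, and so is storing a single coefficient vector for the binary case.

```python
class ToyModel:
    """Hypothetical container for coefficients and the intercept flag."""
    def __init__(self):
        self.beta = []
        self.intercept_flag = True

class ToyModelBuilder:
    """Hypothetical stand-in that imitates the builder workflow."""
    def __init__(self, n_features, n_classes):
        self.n_features = n_features
        # Assumption: one coefficient vector in the binary case, K otherwise.
        self.n_rows = 1 if n_classes == 2 else n_classes
        self._model = ToyModel()
        self._errors = []

    def set_beta(self, coefficients):
        coefficients = list(coefficients)
        if len(coefficients) == self.n_rows * (self.n_features + 1):
            self._model.intercept_flag = True
        elif len(coefficients) == self.n_rows * self.n_features:
            self._model.intercept_flag = False
        else:
            self._errors.append("unexpected number of coefficients")
            return
        width = len(coefficients) // self.n_rows
        self._model.beta = [coefficients[i * width:(i + 1) * width]
                            for i in range(self.n_rows)]

    def get_model(self):
        return self._model           # same object that set_beta updates

    def get_status(self):
        return list(self._errors)    # empty list means no errors

builder = ToyModelBuilder(n_features=2, n_classes=3)
builder.set_beta(range(9))           # 3 vectors of 1 intercept + 2 weights
model = builder.get_model()
builder.set_beta([1.0] * 6)          # no intercepts; model is updated in place
```

Because `get_model` returns the object that `set_beta` mutates, the second `set_beta` call changes the model obtained earlier, which mirrors the automatic update described above.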
Examples
C++ (CPU)
Java*
There is no support for Java on GPU.
Python*
Batch Processing
The logistic regression algorithm follows the general workflow described
in Classification Usage Model.
Training
For a description of the input and output, refer to Classification Usage Model.
In addition to the classifier parameters described in Classification Usage Model,
the logistic regression batch training algorithm has the following parameters:
Parameter | Default Value | Description |
---|---|---|
algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
method | defaultDense | The computation method used by the logistic regression. The only training method supported so far is the default dense method. |
nClasses | Not applicable | The number of classes. A required parameter. |
interceptFlag | True | A flag that indicates a need to compute the intercept term. |
penaltyL1 | 0 | L1 regularization coefficient. L1 regularization is not supported on GPU. |
penaltyL2 | 0 | L2 regularization coefficient. |
optimizationSolver | | All iterative solvers are available as optimization procedures to use at the training stage. See Iterative Solver. |
Prediction
For a description of the input, refer to Classification Usage Model.
At the prediction stage, the logistic regression batch algorithm has the following parameters:
Parameter | Default Value | Description |
---|---|---|
algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
method | defaultDense | The computation method used by logistic regression. The only prediction method supported so far is the default dense method. |
nClasses | Not applicable | The number of classes. A required parameter. |
resultsToCompute | computeClassesLabels | The 64-bit integer flag that specifies which extra characteristics of the logistic regression to compute. Provide one of the following values to request a single characteristic or use bitwise OR to request a combination of the characteristics: computeClassesLabels for prediction, computeClassesProbabilities for probabilities, computeClassesLogProbabilities for logProbabilities. |
Output
In addition to classifier output, logistic regression prediction calculates the result described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
Result ID | Result |
---|---|
probabilities | A numeric table of size $n \times \text{nClasses}$ containing the probabilities of classes, computed when the computeClassesProbabilities option is enabled. |
logProbabilities | A numeric table of size $n \times \text{nClasses}$ containing the logarithms of the probabilities of classes, computed when the computeClassesLogProbabilities option is enabled. |
Note that:
- If resultsToCompute does not contain computeClassesLabels, the prediction table is NULL.
- If resultsToCompute does not contain computeClassesProbabilities, the probabilities table is NULL.
- If resultsToCompute does not contain computeClassesLogProbabilities, the logProbabilities table is NULL.
- By default, each numeric table of this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedSymmetricMatrix and PackedTriangularMatrix.
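The way resultsToCompute combines options with bitwise OR, and how a missing option leads to an empty result table, can be sketched with Python's standard `enum.IntFlag`. The bit values below are made up for illustration and are not the library's actual constants. `ResultToCompute` and `result_tables` are hypothetical names, and `None` stands in for a NULL table.

```python
from enum import IntFlag

class ResultToCompute(IntFlag):
    # Hypothetical bit values, chosen only so that the flags do not overlap.
    computeClassesLabels = 1
    computeClassesProbabilities = 2
    computeClassesLogProbabilities = 4

def result_tables(flags):
    """Map each result ID to a placeholder table, or None when not requested."""
    return {
        "prediction": "table" if flags & ResultToCompute.computeClassesLabels else None,
        "probabilities": "table" if flags & ResultToCompute.computeClassesProbabilities else None,
        "logProbabilities": "table" if flags & ResultToCompute.computeClassesLogProbabilities else None,
    }

# Request labels and log-probabilities, but not probabilities.
requested = (ResultToCompute.computeClassesLabels
             | ResultToCompute.computeClassesLogProbabilities)
tables = result_tables(requested)
```

Here `tables["probabilities"]` is `None` because computeClassesProbabilities was not included in the combined flag.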
Examples
C++ (CPU)
Batch Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Python* with DPC++ support
Batch Processing:
Python*
Batch Processing: