Quality Metrics for Linear Regression
Given a data set $X = (x_i)$ that contains vectors of input variables $x_i = (x_{i1}, \ldots, x_{ip})$, respective responses $z_i = (z_{i1}, \ldots, z_{ik})$ computed at the prediction stage of the linear regression model defined by its coefficients $\beta_{ht}$, h = 1, …, k, t = 1, …, p, and expected responses $y_i = (y_{i1}, \ldots, y_{ik})$, i = 1, …, n, the problem is to evaluate the linear regression model by computing the root mean square error, the variance-covariance matrix of beta coefficients, various statistics functions, and so on.
See Linear Regression for additional details and notations.
For linear regressions, the library computes the statistics listed in the tables below for testing insignificance of beta coefficients. The set of statistics depends on which of the following values of QualityMetricsId is used:
 singleBeta for a single coefficient
 groupOfBetas for a group of coefficients
For more details, see [Hastie2009].
Details
The statistics are computed given the following assumptions about the data distribution:
 Responses $y_{ij}$, i = 1, …, n, are independent and have a constant variance $\sigma_j^2$, j = 1, …, k
 Conditional expectation of responses $y_{\cdot j}$, j = 1, …, k, is linear in input variables $x_{\cdot} = (x_{\cdot 1}, \ldots, x_{\cdot p})$
 Deviations of $y_{ij}$, i = 1, …, n, around the mean of expected responses $ERM_j$, j = 1, …, k, are additive and Gaussian.
Testing Insignificance of a Single Beta
The library uses the following quality metrics:
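Quality Metric  Definition

Root Mean Square (RMS) Error  $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{ij} - z_{ij})^2}$, j = 1, …, k
Vector of variances $\sigma^2 = (\sigma_1^2, \ldots, \sigma_k^2)$  $\sigma_j^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} (y_{ij} - z_{ij})^2$, j = 1, …, k
A set of variance-covariance matrices $C = C_1, \ldots, C_k$ for vectors of betas $\beta_{jt}$, j = 1, …, k  $C_j = (X^T X)^{-1} \sigma_j^2$, j = 1, …, k
Z-score statistics used in testing of insignificance of a single coefficient $\beta_{jt}$  $\text{zscore}_{jt} = \frac{\beta_{jt}}{\sigma_j \sqrt{v_t}}$, j = 1, …, k, where $\sigma_j^2$ is the j-th element of the vector of variances $\sigma^2$ and $v_t$ is the t-th diagonal element of the matrix $(X^T X)^{-1}$
Confidence interval for $\beta_{jt}$  $(\beta_{jt} - pc_{1-\alpha} \sqrt{v_t} \sigma_j,\ \beta_{jt} + pc_{1-\alpha} \sqrt{v_t} \sigma_j)$, j = 1, …, k, where $pc_{1-\alpha}$ is the $(1 - \alpha)$ percentile of the Gaussian distribution, $\sigma_j^2$ is the j-th element of the vector of variances $\sigma^2$, and $v_t$ is the t-th diagonal element of the matrix $(X^T X)^{-1}$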
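The single-beta metrics can be illustrated with a minimal plain-Python sketch for one dependent variable. It does not use the library API. The names `invert` and `single_beta_metrics` are hypothetical. The sketch assumes the textbook definitions from [Hastie2009]: variance equal to the residual sum of squares divided by n - p - 1, a Z-score equal to the coefficient divided by $\sigma \sqrt{v_t}$, and a confidence interval built from the $(1 - \alpha)$ quantile of the standard normal distribution. It also assumes that the model has an intercept, so the number of betas is m = p + 1.

```python
import math
from statistics import NormalDist


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [float(i == j) for j in range(m)] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def single_beta_metrics(x_rows, y, z, beta, alpha=0.05):
    """Hypothetical helper, not a library call.

    x_rows: design-matrix rows, each starting with 1.0 for the intercept
    y, z:   expected and predicted responses for one dependent variable
    beta:   coefficients in the same column order as x_rows
    """
    n, m = len(x_rows), len(beta)
    res_ss = sum((yi - zi) ** 2 for yi, zi in zip(y, z))
    rms = math.sqrt(res_ss / n)
    variance = res_ss / (n - m)  # n - p - 1, because m = p + 1 is assumed
    xtx = [[sum(r[a] * r[b] for r in x_rows) for b in range(m)] for a in range(m)]
    xtx_inv = invert(xtx)
    sigma = math.sqrt(variance)
    std_err = [sigma * math.sqrt(xtx_inv[t][t]) for t in range(m)]
    z_scores = [b / se for b, se in zip(beta, std_err)]
    pc = NormalDist().inv_cdf(1.0 - alpha)  # assumed (1 - alpha) quantile
    intervals = [(b - pc * se, b + pc * se) for b, se in zip(beta, std_err)]
    return rms, variance, xtx_inv, z_scores, intervals


# Example: least-squares fit of y on a single feature plus an intercept.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 4.0]
beta = [1.3, 0.8]
z = [sum(b * v for b, v in zip(beta, row)) for row in x_rows]
print(single_beta_metrics(x_rows, y, z, beta))
```

A coefficient whose confidence interval contains zero, or whose Z-score is small in absolute value, is a candidate for being insignificant under the stated assumptions.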

Testing Insignificance of a Group of Betas
The library uses the following quality metrics:
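Quality Metric  Definition

Mean of expected responses, $ERM = (ERM_1, \ldots, ERM_k)$  $ERM_j = \frac{1}{n} \sum_{i=1}^{n} y_{ij}$, j = 1, …, k
Variance of expected responses, $ERV = (ERV_1, \ldots, ERV_k)$  $ERV_j = \frac{1}{n - 1} \sum_{i=1}^{n} (y_{ij} - ERM_j)^2$, j = 1, …, k
Regression Sum of Squares $RegSS = (RegSS_1, \ldots, RegSS_k)$  $RegSS_j = \frac{1}{n} \sum_{i=1}^{n} (z_{ij} - ERM_j)^2$, j = 1, …, k
Sum of Squares of Residuals $ResSS = (ResSS_1, \ldots, ResSS_k)$  $ResSS_j = \sum_{i=1}^{n} (y_{ij} - z_{ij})^2$, j = 1, …, k
Total Sum of Squares $TSS = (TSS_1, \ldots, TSS_k)$  $TSS_j = \sum_{i=1}^{n} (y_{ij} - ERM_j)^2$, j = 1, …, k
Determination Coefficient $R^2 = (R_1^2, \ldots, R_k^2)$  $R_j^2 = 1 - \frac{ResSS_j}{TSS_j}$, j = 1, …, k
F-statistics used in testing insignificance of a group of betas $F = (F_1, \ldots, F_k)$  $F_j = \frac{(ResSS_{0j} - ResSS_j) / (p - p_0)}{ResSS_j / (n - p - 1)}$, j = 1, …, k, where $ResSS_j$ are computed for a model with $p + 1$ betas and $ResSS_{0j}$ are computed for a reduced model with $p_0 + 1$ betas ($p - p_0$ betas are set to zero)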
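As an illustration of the group-of-betas metrics, the following plain-Python sketch computes them for one dependent variable. It is not the library API. The function name `group_of_betas_metrics` and its arguments are hypothetical. It assumes the standard definitions: sample variance with n - 1 in the denominator, $R^2 = 1 - ResSS / TSS$, and an F-statistic that compares the residual sums of squares of the full and reduced models. The regression sum of squares is omitted.

```python
def group_of_betas_metrics(y, z, z_reduced, num_beta, num_beta_reduced):
    """Hypothetical helper, not a library call.

    y:                expected responses for one dependent variable
    z:                responses predicted by the full model
    z_reduced:        responses predicted by the reduced model
    num_beta:         number of betas in the full model (p + 1 with an intercept)
    num_beta_reduced: number of betas in the reduced model (p0 + 1 with an intercept)
    """
    n = len(y)
    mean = sum(y) / n
    variance = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    res_ss = sum((yi - zi) ** 2 for yi, zi in zip(y, z))
    res_ss_reduced = sum((yi - zi) ** 2 for yi, zi in zip(y, z_reduced))
    tss = sum((yi - mean) ** 2 for yi in y)
    r2 = 1.0 - res_ss / tss
    # Numerator: drop in residual error per removed beta.
    # Denominator: residual variance estimate of the full model.
    f_stat = ((res_ss_reduced - res_ss) / (num_beta - num_beta_reduced)) / (
        res_ss / (n - num_beta)
    )
    return {
        "expectedMeans": mean,
        "expectedVariance": variance,
        "resSS": res_ss,
        "tSS": tss,
        "determinationCoeff": r2,
        "fStatistics": f_stat,
    }


# Example: full model with an intercept and one slope; reduced model with the intercept only.
y = [1.0, 3.0, 2.0, 4.0]
z = [1.3, 2.1, 2.9, 3.7]
z_reduced = [2.5, 2.5, 2.5, 2.5]
print(group_of_betas_metrics(y, z, z_reduced, num_beta=2, num_beta_reduced=1))
```

A large F-statistic indicates that the betas removed from the reduced model explain a substantial part of the variation, so the group is unlikely to be insignificant.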
Batch Processing
Testing Insignificance of a Single Beta
Algorithm Input
The quality metric algorithm for linear regression accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID  Input

expectedResponses  Pointer to the $n \times k$ numeric table with responses (k dependent variables) used for training the linear regression model. This table can be an object of any class derived from NumericTable.
model  Pointer to the model computed at the training stage of the linear regression algorithm. The model can only be an object of the linear_regression::Model class.
predictedResponses  Pointer to the $n \times k$ numeric table with responses (k dependent variables) computed at the prediction stage of the linear regression algorithm. This table can be an object of any class derived from NumericTable.
Algorithm Parameters
The quality metric algorithm for linear regression has the following parameters:
Parameter  Default Value  Description 

algorithmFPType  float  The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method  defaultDense  Performance-oriented computation method, the only method supported by the algorithm.
alpha  0.05  Significance level used in the computation of confidence intervals for coefficients of the linear regression model. 
accuracyThreshold  0.001  Values below this threshold are considered equal to it. 
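The effect of the alpha and accuracyThreshold parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the library implementation. It assumes that alpha is turned into the $(1 - \alpha)$ quantile of the standard normal distribution. It reads the accuracyThreshold description literally as a lower clamp. Which intermediate values the library clamps is not stated here. The function names are hypothetical.

```python
from statistics import NormalDist


def gaussian_percentile(alpha=0.05):
    """Quantile of the standard normal distribution at level 1 - alpha (assumed convention)."""
    return NormalDist().inv_cdf(1.0 - alpha)


def apply_accuracy_threshold(value, accuracy_threshold=0.001):
    """Treat any value below the threshold as equal to the threshold."""
    return max(value, accuracy_threshold)


print(gaussian_percentile(0.05))
print(apply_accuracy_threshold(0.0))
print(apply_accuracy_threshold(0.25))
```

A smaller alpha gives a larger percentile and therefore wider confidence intervals.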
Algorithm Output
The quality metric algorithm for linear regression calculates the result described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID  Result

rms  Pointer to the $1 \times k$ numeric table that contains root mean square errors computed for each response (dependent variable). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
variance  Pointer to the $1 \times k$ numeric table that contains variances $\sigma_j^2$, j = 1, …, k, computed for each response (dependent variable). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
betaCovariances  Pointer to the DataCollection object that contains k numeric tables, each with the $m \times m$ variance-covariance matrix for betas of the j-th response (dependent variable), where m is the number of betas in the model (m is equal to p when interceptFlag is set to false at the training stage of the linear regression algorithm; otherwise, m is equal to p + 1). The collection can contain objects of any class derived from NumericTable.
zScore  Pointer to the $k \times m$ numeric table that contains the Z-score statistics used in the testing of insignificance of individual linear regression coefficients, where m is the number of betas in the model (m is equal to p when interceptFlag is set to false at the training stage of the linear regression algorithm; otherwise, m is equal to p + 1). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
confidenceIntervals  Pointer to the $k \times 2m$ numeric table that contains limits of the confidence intervals for linear regression coefficients, where m is the number of betas in the model (m is equal to p when interceptFlag is set to false at the training stage of the linear regression algorithm; otherwise, m is equal to p + 1). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
inverseOfXtX  Pointer to the $m \times m$ numeric table that contains the $(X^T X)^{-1}$ matrix, where m is the number of betas in the model (m is equal to p when interceptFlag is set to false at the training stage of the linear regression algorithm; otherwise, m is equal to p + 1).
Testing Insignificance of a Group of Betas
Algorithm Input
The quality metric algorithm for linear regression accepts the input described below.
Pass the Input ID as a parameter to the methods that provide input for your algorithm.
For more details, see Algorithms.
Input ID  Input

expectedResponses  Pointer to the $n \times k$ numeric table with responses (k dependent variables) used for training the linear regression model. This table can be an object of any class derived from NumericTable.
predictedResponses  Pointer to the $n \times k$ numeric table with responses (k dependent variables) computed at the prediction stage of the linear regression algorithm. This table can be an object of any class derived from NumericTable.
predictedReducedModelResponses  Pointer to the $n \times k$ numeric table with responses (k dependent variables) computed at the prediction stage of the linear regression algorithm using the reduced linear regression model, where $p - p_0$ out of p beta coefficients are set to zero. This table can be an object of any class derived from NumericTable.
Algorithm Parameters
The quality metric algorithm for linear regression has the following parameters:
Parameter  Default Value  Description 

algorithmFPType  float  The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method  defaultDense  Performance-oriented computation method, the only method supported by the algorithm.
numBeta  0  Number of beta coefficients used for prediction.
numBetaReducedModel  0  Number of beta coefficients ($p_0$) used for prediction with the reduced linear regression model, where $p - p_0$ out of p beta coefficients are set to zero.
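The relationship between the full and reduced models can be sketched in plain Python as follows. This is a hypothetical illustration, not the library API. It takes the description literally and obtains the reduced model by setting selected betas of a trained model to zero, without refitting the remaining coefficients. The names `reduce_model` and `predict` are assumptions made for this example.

```python
def reduce_model(beta, zeroed_indices):
    """Return a copy of beta with the coefficients at zeroed_indices set to zero."""
    return [0.0 if t in zeroed_indices else b for t, b in enumerate(beta)]


def predict(x_rows, beta):
    """Linear prediction: dot product of each design-matrix row with beta."""
    return [sum(b * v for b, v in zip(beta, row)) for row in x_rows]


# Design matrix with a leading 1.0 column for the intercept.
x_rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [1.3, 0.8]

zeroed = {1}  # drop the slope, keep the intercept
beta_reduced = reduce_model(beta, zeroed)

predicted = predict(x_rows, beta)                  # analogue of predictedResponses
predicted_reduced = predict(x_rows, beta_reduced)  # analogue of predictedReducedModelResponses

num_beta = len(beta)                        # betas used by the full model
num_beta_reduced = num_beta - len(zeroed)   # betas left in the reduced model
print(predicted, predicted_reduced, num_beta, num_beta_reduced)
```

In this sketch, the two prediction lists and the two counts play the roles of the predictedResponses and predictedReducedModelResponses inputs and the numBeta and numBetaReducedModel parameters.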
Algorithm Output
The quality metric algorithm for linear regression calculates the result described below.
Pass the Result ID as a parameter to the methods that access the results of your algorithm.
For more details, see Algorithms.
Result ID  Result

expectedMeans  Pointer to the $1 \times k$ numeric table that contains the mean of expected responses computed for each dependent variable.
expectedVariance  Pointer to the $1 \times k$ numeric table that contains the variance of expected responses computed for each dependent variable.
regSS  Pointer to the $1 \times k$ numeric table that contains the regression sum of squares computed for each dependent variable.
resSS  Pointer to the $1 \times k$ numeric table that contains the sum of squares of residuals computed for each dependent variable.
tSS  Pointer to the $1 \times k$ numeric table that contains the total sum of squares computed for each dependent variable.
determinationCoeff  Pointer to the $1 \times k$ numeric table that contains the determination coefficient computed for each dependent variable.
fStatistics  Pointer to the $1 \times k$ numeric table that contains the F-statistics computed for each dependent variable.
By default, these results are objects of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
Examples
C++ (CPU)
Batch Processing:
Java*