Distributed Processing

You can use linear or ridge regression in the distributed processing mode only at the training stage.

This computation mode assumes that the data set is split into nblocks blocks across computation nodes.

Training

Algorithm Parameters

The following table lists parameters of linear and ridge regressions at the training stage in the distributed processing mode. Some of these parameters or their values are specific to a linear or ridge regression algorithm.

Parameter: computeStep
Algorithm: any
Default Value: Not applicable
Description: The parameter required to initialize the algorithm. Can be:

  • step1Local - the first step, performed on local nodes
  • step2Master - the second step, performed on a master node

Parameter: algorithmFPType
Algorithm: any
Default Value: float
Description: The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

Parameter: method
Algorithm: linear regression
Default Value: defaultDense
Description: Available methods for linear regression training:

  • defaultDense - the normal equations method
  • qrDense - the method based on QR decomposition

Parameter: method
Algorithm: ridge regression
Default Value: defaultDense
Description: Default computation method used by ridge regression. The only method supported at the training stage is the normal equations method.

Parameter: ridgeParameters
Algorithm: ridge regression
Default Value: Numeric table of size 1 x 1 that contains the default ridge parameter equal to 1.
Description: Numeric table of size 1 x k (where k is the number of dependent variables) or 1 x 1. The contents of the table depend on its size:

  • size = 1 x k: the values of the ridge parameters λj for j = 1, …, k.
  • size = 1 x 1: the value of the ridge parameter shared by all dependent variables, λ1 = … = λk.

This parameter can be an object of any class derived from NumericTable, except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Parameter: interceptFlag
Algorithm: any
Default Value: true
Description: A flag that indicates whether to compute the intercept terms β0j.
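The interplay of the normal equations method, ridgeParameters, and interceptFlag can be illustrated with the underlying math. The sketch below is pure Python, not the library API; all names are illustrative, and leaving the intercept unpenalized is an assumption of this sketch, not something this page specifies.

```python
# Illustrative sketch (NOT the oneDAL API): solve the ridge-penalized
# normal equations (X^T X + lambda*I) beta = X^T y for one dependent
# variable. interceptFlag=true is modeled by appending a constant-1
# column to X; that intercept column is left unpenalized here.

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ridge_train(X, y, ridge=1.0, intercept_flag=True):
    if intercept_flag:
        X = [row + [1.0] for row in X]       # column for beta0
    p = len(X[0])
    XtX = matmul(transpose(X), X)
    Xty = [row[0] for row in matmul(transpose(X), [[v] for v in y])]
    for j in range(p):
        if not (intercept_flag and j == p - 1):  # do not penalize beta0
            XtX[j][j] += ridge
    return solve(XtX, Xty)

# y = 2*x1 + 3*x2 + 1 on a small synthetic block
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [3.0, 4.0, 6.0, 8.0]
beta = ridge_train(X, y, ridge=0.0)  # lambda = 0 reduces to plain least squares
# beta ≈ [2.0, 3.0, 1.0]
```

With ridge greater than 0, the recovered coefficients shrink toward zero while the unpenalized intercept compensates.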

Use the two-step computation schema for linear or ridge regression training in the distributed processing mode, as illustrated below:

Step 1 - on Local Nodes


[Figure: Linear Regression Training, Distributed Processing, Workflow Step 1]

In this step, linear or ridge regression training accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID: data
Input: Pointer to the n_i x p numeric table that represents the i-th data block on the local node. This table can be an object of any class derived from NumericTable.

Input ID: dependentVariables
Input: Pointer to the n_i x k numeric table with responses associated with the i-th data block. This table can be an object of any class derived from NumericTable.

In this step, linear or ridge regression training calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID: partialModel
Result: Pointer to the partial linear or ridge regression model that corresponds to the i-th data block. The result can only be an object of the Model class.
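The partialModel object is opaque, but for the normal equations method the information each local node must contribute is, conceptually, the cross-product sums X_iᵀX_i and X_iᵀy_i for its block. The pure-Python sketch below illustrates that reduction for a single dependent variable; it is an assumption about the method's math, not the library's actual Model layout.

```python
# Step 1 sketch: each local node reduces its n_i x p block to fixed-size
# partial sums, independent of n_i. Conceptual view of what a partial
# model must carry for the normal equations method (k = 1 here).

def partial_sums(X_block, y_block):
    p = len(X_block[0])
    xtx = [[0.0] * p for _ in range(p)]   # running X_i^T X_i
    xty = [0.0] * p                       # running X_i^T y_i
    for row, y in zip(X_block, y_block):
        for a in range(p):
            xty[a] += row[a] * y
            for b in range(p):
                xtx[a][b] += row[a] * row[b]
    return xtx, xty

# two data blocks that together form one data set
xtx1, xty1 = partial_sums([[1.0, 2.0], [3.0, 4.0]], [1.0, 2.0])
xtx2, xty2 = partial_sums([[5.0, 6.0]], [3.0])
```

A useful property of this scheme is that each partial result has fixed size (p x p plus p) regardless of how many rows the block holds, which keeps communication to the master node small.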

Step 2 - on Master Node


[Figure: Linear Regression Training, Distributed Processing, Workflow Step 2]

In this step, linear or ridge regression training accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID: partialModels
Input: A collection of partial models computed on local nodes in Step 1. The collection contains objects of the Model class.

In this step, linear or ridge regression training calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID: model
Result: Pointer to the linear or ridge regression model being trained. The result can only be an object of the Model class.
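For the normal equations method, merging the partial models conceptually amounts to adding the per-block cross-product sums and solving the (optionally ridge-penalized) system once on the master. The pure-Python sketch below works under that assumption, with illustrative names and a single dependent variable; it is not the library's actual merge logic.

```python
# Step 2 sketch: the master sums the per-block partials from Step 1,
# adds the ridge term to the diagonal, and solves for beta.

def solve(A, b):
    # Gaussian elimination with partial pivoting on the augmented matrix.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def merge_and_solve(partials, ridge=0.0):
    # partials: list of (xtx_i, xty_i) pairs produced on local nodes
    p = len(partials[0][1])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for xtx_i, xty_i in partials:
        for a in range(p):
            xty[a] += xty_i[a]
            for b in range(p):
                xtx[a][b] += xtx_i[a][b]
    for j in range(p):
        xtx[j][j] += ridge                 # lambda = 0 gives plain least squares
    return solve(xtx, xty)

# partial sums from two hypothetical local nodes for y = 2*x (no intercept):
# node 1 saw rows x=[1], x=[2]; node 2 saw row x=[3]
node1 = ([[5.0]], [10.0])     # x^T x = 1 + 4, x^T y = 1*2 + 2*4
node2 = ([[9.0]], [18.0])     # x^T x = 9,     x^T y = 3*6
beta = merge_and_solve([node1, node2])
# beta ≈ [2.0]
```

Because addition of the partial sums is associative and commutative, the master can merge partial models in any order, which is what makes the two-step schema insensitive to node scheduling.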
