Distributed Processing
This mode assumes that the data set is split into nblocks blocks across computation nodes.

Algorithm Parameters
The K-Means clustering algorithm in the distributed processing mode has the following parameters:
Parameter  Default Value  Description 

computeStep  Not applicable  The parameter required to initialize the algorithm. Can be:

algorithmFPType  float  The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
method  defaultDense  Available computation methods for K-Means clustering:

nClusters  Not applicable  The number of clusters. Required to initialize the algorithm. 
gamma  1.0  The weight to be used in distance calculation for binary categorical features. 
distanceType  euclidean  The measure of closeness between points (observations) being clustered. The only distance type supported so far is the Euclidean distance.
assignFlag  false  A flag that enables computation of assignments, that is, assigning cluster indices to respective observations. 
To compute K-Means clustering in the distributed processing mode, use the general schema described in Algorithms as follows:
Step 1 - on Local Nodes
In this step, the K-Means clustering algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Input ID  Input

data  Pointer to the numeric table that represents the i-th data block on the local node. The input can be an object of any class derived from NumericTable.
inputCentroids  Pointer to the numeric table with the initial cluster centroids. This input can be an object of any class derived from NumericTable.
In this step, the K-Means clustering algorithm calculates the partial results and results described below. Pass the Partial Result ID or Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Partial Result ID  Result

nObservations  Pointer to the numeric table that contains the number of observations assigned to the clusters on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except CSRNumericTable.
partialSums  Pointer to the numeric table with partial sums of observations assigned to the clusters on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
partialObjectiveFunction  Pointer to the numeric table that contains the value of the partial objective function for observations processed on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except CSRNumericTable.
partialCandidatesDistances  Pointer to the numeric table that contains the nClusters largest objective function values for the observations processed on the local node, stored in descending order. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
partialCandidatesCentroids  Pointer to the numeric table that contains the nClusters observations with the largest objective function values processed on the local node, stored in descending order of the objective function. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
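One possible reading of the two candidate results is sketched below in plain Python. It is an illustration only, not the library API: the function name local_candidates is hypothetical, and the sketch assumes that the per-observation objective function value is the squared Euclidean distance from the observation to its nearest input centroid.

```python
import heapq


def local_candidates(data, input_centroids):
    """Hypothetical helper: pick the nClusters observations that lie
    farthest from their nearest centroid (assumed meaning of the
    partialCandidatesDistances / partialCandidatesCentroids results)."""
    n_clusters = len(input_centroids)
    scored = []
    for point in data:
        # Assumption: objective value = squared distance to nearest centroid.
        nearest = min(
            sum((x - c) ** 2 for x, c in zip(point, centroid))
            for centroid in input_centroids
        )
        scored.append((nearest, point))
    # nlargest returns its result sorted in descending order.
    top = heapq.nlargest(n_clusters, scored, key=lambda pair: pair[0])
    return [dist for dist, _ in top], [point for _, point in top]


block = [[0.0, 0.0], [0.0, 3.0], [10.0, 10.0], [4.0, 4.0]]
centroids = [[0.0, 1.0], [9.0, 9.0]]
distances, observations = local_candidates(block, centroids)
print(distances)      # [25.0, 4.0]
print(observations)   # [[4.0, 4.0], [0.0, 3.0]]
```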
Result ID  Result 

assignments  Use when assignFlag = true. Pointer to the numeric table with 32-bit integer assignments of cluster indices to feature vectors in the input data on the local node. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
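To make the Step 1 results concrete, here is a minimal plain-Python sketch of what a local node could compute from its data block and the input centroids. It does not call the library: the function step1_local and its dictionary return value are hypothetical, the objective function is assumed to be the sum of squared Euclidean distances to the nearest centroid, and the candidate results and the gamma parameter are left out.

```python
def step1_local(data, input_centroids, assign_flag=False):
    """Hypothetical stand-in for Step 1 on one local node."""
    n_clusters = len(input_centroids)
    n_features = len(input_centroids[0])
    n_observations = [0] * n_clusters
    partial_sums = [[0.0] * n_features for _ in range(n_clusters)]
    partial_objective = 0.0
    assignments = [] if assign_flag else None

    for point in data:
        # Squared Euclidean distance from the point to every input centroid.
        dists = [
            sum((x - c) ** 2 for x, c in zip(point, centroid))
            for centroid in input_centroids
        ]
        k = dists.index(min(dists))  # index of the nearest centroid
        n_observations[k] += 1
        for j, x in enumerate(point):
            partial_sums[k][j] += x
        partial_objective += dists[k]
        if assign_flag:
            assignments.append(k)

    return {
        "nObservations": n_observations,
        "partialSums": partial_sums,
        "partialObjectiveFunction": partial_objective,
        "assignments": assignments,  # None unless assign_flag is set
    }


block = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]]
centroids = [[0.0, 1.0], [9.0, 9.0]]
partial = step1_local(block, centroids, assign_flag=True)
print(partial["nObservations"])             # [2, 1]
print(partial["partialSums"])               # [[0.0, 2.0], [10.0, 10.0]]
print(partial["partialObjectiveFunction"])  # 4.0
print(partial["assignments"])               # [0, 0, 1]
```

The assignments here are made against the input centroids, not against any centroids computed later from these partial results.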
Step 2 - on Master Node
In this step, the K-Means clustering algorithm accepts the input from each local node described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

In this step, the K-Means clustering algorithm calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Result ID  Result

centroids  Pointer to the numeric table with centroids. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
objectiveFunction  Pointer to the numeric table that contains the value of the objective function. By default, this result is an object of the HomogenNumericTable class, but you can define this result as an object of any class derived from NumericTable except CSRNumericTable.
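The way a master node could combine partial results into these two results can be sketched in plain Python as follows. This is an assumption-laden illustration, not the library implementation: step2_master and the dictionary layout are hypothetical, the candidate results are ignored, and an empty cluster simply keeps its previous centroid, which is a simplification chosen for the sketch.

```python
def step2_master(partials, previous_centroids):
    """Hypothetical stand-in for Step 2: merge per-node partial results."""
    n_clusters = len(previous_centroids)
    n_features = len(previous_centroids[0])
    counts = [0] * n_clusters
    sums = [[0.0] * n_features for _ in range(n_clusters)]
    objective = 0.0

    for p in partials:
        for k in range(n_clusters):
            counts[k] += p["nObservations"][k]
            for j in range(n_features):
                sums[k][j] += p["partialSums"][k][j]
        objective += p["partialObjectiveFunction"]

    centroids = []
    for k in range(n_clusters):
        if counts[k] > 0:
            # New centroid = mean of all observations assigned to cluster k.
            centroids.append([s / counts[k] for s in sums[k]])
        else:
            # Simplification for this sketch: keep the previous centroid.
            centroids.append(list(previous_centroids[k]))
    return {"centroids": centroids, "objectiveFunction": objective}


node_a = {"nObservations": [2, 1],
          "partialSums": [[0.0, 2.0], [10.0, 10.0]],
          "partialObjectiveFunction": 4.0}
node_b = {"nObservations": [1, 3],
          "partialSums": [[3.0, 1.0], [26.0, 30.0]],
          "partialObjectiveFunction": 6.5}
result = step2_master([node_a, node_b], [[0.0, 1.0], [9.0, 9.0]])
print(result["centroids"])          # [[1.0, 1.0], [9.0, 10.0]]
print(result["objectiveFunction"])  # 10.5
```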
The algorithm computes assignments using input centroids. Therefore, to compute assignments using the final computed centroids, after the last call to the Step2compute() method on the master node, set assignFlag to true on each local node and make one additional call to the Step1compute() and finalizeCompute() methods. Always set assignFlag to true and call finalizeCompute() to obtain assignments in each step.

To compute assignments using the original inputCentroids on a given node, you can use the K-Means clustering algorithm in the batch processing mode with the subset of the data available on this node. See Batch Processing for more details.
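The ordering issue described above can be seen in a small plain-Python simulation of the whole loop. The functions step1 and step2 are hypothetical stand-ins for the local and master steps (they are not library calls), two in-memory lists play the role of two local nodes, and the fixed iteration count of 5 is an arbitrary choice for the sketch.

```python
def step1(block, centroids, assign_flag=False):
    """Hypothetical local step: counts, sums, and optional assignments."""
    n_clusters, dim = len(centroids), len(centroids[0])
    counts = [0] * n_clusters
    sums = [[0.0] * dim for _ in range(n_clusters)]
    assignments = []
    for point in block:
        dists = [sum((x - c) ** 2 for x, c in zip(point, cen))
                 for cen in centroids]
        k = dists.index(min(dists))
        counts[k] += 1
        sums[k] = [s + x for s, x in zip(sums[k], point)]
        assignments.append(k)
    return counts, sums, (assignments if assign_flag else None)


def step2(partials, centroids):
    """Hypothetical master step: merge partial results into new centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        n = sum(counts[k] for counts, _, _ in partials)
        total = [sum(col) for col in zip(*(sums[k] for _, sums, _ in partials))]
        # An empty cluster keeps its old centroid (sketch simplification).
        new_centroids.append([t / n for t in total] if n else list(old))
    return new_centroids


# Two data blocks, one per simulated local node.
blocks = [
    [[0.0, 0.0], [0.0, 2.0], [9.0, 9.0]],
    [[2.0, 0.0], [2.0, 2.0], [11.0, 11.0]],
]
centroids = [[0.0, 0.0], [5.0, 5.0]]

for _ in range(5):
    partials = [step1(block, centroids) for block in blocks]
    centroids = step2(partials, centroids)

# One extra local pass with the flag set, so that the assignments are
# computed against the final centroids rather than the previous ones.
final_assignments = [step1(block, centroids, assign_flag=True)[2]
                     for block in blocks]
print(centroids)          # [[1.0, 1.0], [10.0, 10.0]]
print(final_assignments)  # [[0, 0, 1], [0, 0, 1]]
```

Without the extra pass, any assignments returned by the last regular local step would correspond to the centroids that were passed in before the final master update.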