Developer Guide

Batch Processing

Input

Centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.
Input ID: data
Input: Pointer to the n x p numeric table with the data to be clustered, where n is the number of observations and p is the number of features. The input can be an object of any class derived from NumericTable.

Parameters

The parameters of centroid initialization for K-Means clustering are listed below. Some parameters apply only to certain values of the initialization method parameter, method.
Parameter: algorithmFPType
Method: any
Default Value: float
Description: The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

Parameter: method
Method: not applicable
Default Value: defaultDense
Description: Available initialization methods for K-Means clustering:
  • defaultDense
    - uses the first nClusters points as initial centroids
  • deterministicCSR
    - uses the first nClusters points as initial centroids for data in a CSR numeric table
  • randomDense
    - uses nClusters random points as initial centroids
  • randomCSR
    - uses nClusters random points as initial centroids for data in a CSR numeric table
  • plusPlusDense
    - uses the K-Means++ algorithm [Arthur2007]
  • plusPlusCSR
    - uses the K-Means++ algorithm for data in a CSR numeric table
  • parallelPlusDense
    - uses the parallel K-Means++ algorithm [Bahmani2012]
  • parallelPlusCSR
    - uses the parallel K-Means++ algorithm for data in a CSR numeric table
For more details, see the algorithm description.

Parameter: nClusters
Method: any
Default Value: not applicable
Description: The number of clusters. Required.

Parameter: nTrials
Method: parallelPlusDense, parallelPlusCSR
Default Value: 1
Description: The number of trials used to generate each initial cluster except the first one. For details, see [Arthur2007], section 5.

Parameter: oversamplingFactor
Method: parallelPlusDense, parallelPlusCSR
Default Value: 0.5
Description: The fraction of nClusters sampled in each of the nRounds rounds of parallel K-Means++: L = nClusters * oversamplingFactor points are sampled per round. For details, see [Bahmani2012], section 3.3.

Parameter: nRounds
Method: parallelPlusDense, parallelPlusCSR
Default Value: 5
Description: The number of rounds of parallel K-Means++. L * nRounds must be greater than nClusters. For details, see [Bahmani2012], section 3.3.

Parameter: engine
Method: any
Default Value: SharedPtr<engines::mt19937::Batch>()
Description: Pointer to the random number generator engine that is used internally for random number generation.
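The K-Means++ seeding used by the plusPlus methods, together with the arithmetic stated for oversamplingFactor and nRounds, can be illustrated with a minimal pure-Python sketch. It assumes the data is a list of n points (each a list of p numbers) rather than a NumericTable. The names kmeans_plus_plus and parallel_plus_round_size are hypothetical and are not part of the library API. The sketch applies nTrials to basic K-Means++ following the greedy idea in [Arthur2007], section 5 (draw several D^2-weighted candidates and keep the one that lowers the potential most). Whether the library uses nTrials exactly this way inside the parallel methods is an assumption. The second function only checks the stated parameter constraint; it does not implement parallel K-Means++.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def potential(data, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in data)


def kmeans_plus_plus(data, n_clusters, n_trials=1, rng=None):
    """Hypothetical K-Means++ seeding sketch (not the library API).

    data: list of n points, each a list of p numbers.
    Returns an n_clusters x p list of initial centroids.
    n_trials > 1 keeps, at each step, the best of several D^2-weighted
    candidates (assumed greedy variant, [Arthur2007] section 5).
    """
    if n_clusters < 1 or n_trials < 1:
        raise ValueError("n_clusters and n_trials must be positive")
    rng = rng or random.Random()
    # First center: a point chosen uniformly at random.
    centers = [list(rng.choice(data))]
    while len(centers) < n_clusters:
        # D^2 weighting: squared distance from each point to its nearest center.
        weights = [min(squared_distance(x, c) for c in centers) for x in data]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to uniform sampling.
            weights = [1.0] * len(data)
        best, best_pot = None, None
        for _ in range(n_trials):
            candidate = rng.choices(data, weights=weights, k=1)[0]
            pot = potential(data, centers + [candidate])
            if best_pot is None or pot < best_pot:
                best, best_pot = candidate, pot
        centers.append(list(best))
    return centers


def parallel_plus_round_size(n_clusters, oversampling_factor=0.5, n_rounds=5):
    """Hypothetical check of the parallel K-Means++ parameter arithmetic.

    Returns L = n_clusters * oversampling_factor, the number of points
    sampled per round, and raises ValueError unless L * n_rounds > n_clusters.
    """
    sample_size = n_clusters * oversampling_factor
    if not sample_size * n_rounds > n_clusters:
        raise ValueError("L * nRounds must be greater than nClusters")
    return sample_size


# Usage: two well-separated pairs of 2-D points, two clusters.
points = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
print(kmeans_plus_plus(points, 2, n_trials=3, rng=random.Random(42)))
print(parallel_plus_round_size(10))  # 10 * 0.5 -> 5.0
```

With the default values oversamplingFactor = 0.5 and nRounds = 5, L * nRounds = 2.5 * nClusters, so the constraint holds for any positive nClusters; lowering either value can violate it.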

Output

Centroid initialization for K-Means clustering calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.
Result ID: centroids
Result: Pointer to the nClusters x p numeric table with the cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
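The centroids result has one row per cluster and the same p feature columns as the input data. The two simplest initialization methods, defaultDense and randomDense, can be sketched in pure Python to show how an n x p input becomes an nClusters x p result. The sketch uses Python lists of lists in place of NumericTable objects. The function names init_first, init_random, and shape are hypothetical. Sampling rows without replacement in init_random is an assumption about randomDense, not documented behavior.

```python
import random


def init_first(data, n_clusters):
    """Hypothetical defaultDense-style sketch: the first n_clusters rows."""
    if not 0 < n_clusters <= len(data):
        raise ValueError("need 0 < n_clusters <= number of rows")
    return [list(row) for row in data[:n_clusters]]


def init_random(data, n_clusters, rng=None):
    """Hypothetical randomDense-style sketch: rows drawn without replacement (assumed)."""
    if not 0 < n_clusters <= len(data):
        raise ValueError("need 0 < n_clusters <= number of rows")
    rng = rng or random.Random()
    rows = rng.sample(range(len(data)), n_clusters)
    return [list(data[i]) for i in rows]


def shape(table):
    """(rows, columns) of a list-of-lists table."""
    return len(table), len(table[0]) if table else 0


# n = 5 observations, p = 3 features; nClusters = 2.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
print(shape(init_first(data, 2)))  # (2, 3)
print(shape(init_random(data, 2, random.Random(7))))  # (2, 3)
```

Whatever the method, the result keeps the input's p columns and has exactly nClusters rows.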

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804