Developer Guide and Reference

  • 2021.2
  • 03/26/2021

Batch Processing

Input

Centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm.

Input ID: data
Input: Pointer to the n x p numeric table (n observations, p features) with the data to be clustered. The input can be an object of any class derived from NumericTable.

Parameters

The parameters of centroid initialization for K-Means clustering are listed below. Some parameters apply only to particular values of the initialization method parameter, method. Each entry gives the parameter name, the methods it applies to, its default value, and its description.

Parameter: algorithmFPType
Applies to: any method
Default value: float
Description: The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

Parameter: method
Applies to: not applicable
Default value: defaultDense
Description: The initialization method for K-Means clustering. Available methods:
For CPU:
  • defaultDense
    - uses the first nClusters points as initial centroids
  • deterministicCSR
    - uses the first nClusters points as initial centroids for data in a CSR numeric table
  • randomDense
    - uses nClusters randomly selected points as initial centroids
  • randomCSR
    - uses nClusters randomly selected points as initial centroids for data in a CSR numeric table
  • plusPlusDense
    - uses the K-Means++ algorithm [Arthur2007]
  • plusPlusCSR
    - uses the K-Means++ algorithm for data in a CSR numeric table
  • parallelPlusDense
    - uses the parallel K-Means++ algorithm [Bahmani2012]
  • parallelPlusCSR
    - uses the parallel K-Means++ algorithm for data in a CSR numeric table
For GPU:
  • defaultDense
    - uses the first nClusters points as initial centroids
  • randomDense
    - uses nClusters randomly selected points as initial centroids

Parameter: nClusters
Applies to: any method
Default value: not applicable
Description: The number of clusters. Required.

Parameter: nTrials
Applies to: parallelPlusDense, parallelPlusCSR
Default value: 1
Description: The number of trials used to generate each initial cluster except the first. For details, see [Arthur2007], section 5.

Parameter: oversamplingFactor
Applies to: parallelPlusDense, parallelPlusCSR
Default value: 0.5
Description: The fraction of nClusters sampled in each of the nRounds rounds of parallel K-Means++: L = nClusters * oversamplingFactor points are sampled in each round. For details, see [Bahmani2012], section 3.3.

Parameter: nRounds
Applies to: parallelPlusDense, parallelPlusCSR
Default value: 5
Description: The number of rounds of parallel K-Means++. L * nRounds must be greater than nClusters, where L = nClusters * oversamplingFactor. For details, see [Bahmani2012], section 3.3.

Parameter: engine
Applies to: any method
Default value: SharedPtr<engines::mt19937::Batch>()
Description: Pointer to the random number generator engine that the algorithm uses internally to generate random numbers.
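
The plusPlusDense method and the trial scheme that nTrials refers to can be illustrated with the following minimal C++ sketch of K-Means++ seeding [Arthur2007]. It uses only the standard library and is not the oneDAL implementation: the helper names sqDist and kmeansPlusPlus, the row-major std::vector<double> layout for the n x p data, and the use of std::mt19937 as a stand-in for the engine parameter are assumptions made for illustration. After a uniformly random first centroid, each data point is chosen as the next centroid with probability proportional to its squared distance to the nearest centroid chosen so far. With nTrials = 1 this is plain K-Means++; with nTrials > 1 the sketch samples nTrials candidates per step and greedily keeps the one that lowers the total squared distance most. The parameter list above ties nTrials to the parallel methods; how oneDAL applies it internally is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Squared Euclidean distance between two p-dimensional points.
inline double sqDist(const double* a, const double* b, std::size_t p) {
    double s = 0.0;
    for (std::size_t j = 0; j < p; ++j) {
        const double d = a[j] - b[j];
        s += d * d;
    }
    return s;
}

// Hypothetical sketch of K-Means++ seeding (not the oneDAL implementation).
// data: n x p points, row-major. Returns nClusters x p centroids, row-major.
// nTrials > 1 enables the greedy variant: sample nTrials candidates per step
// and keep the one that minimizes the total squared distance.
std::vector<double> kmeansPlusPlus(const std::vector<double>& data, std::size_t n,
                                   std::size_t p, std::size_t nClusters,
                                   std::size_t nTrials, std::mt19937& rng) {
    assert(data.size() == n * p && nClusters >= 1 && nClusters <= n && nTrials >= 1);
    const double* x = data.data();

    // First centroid: a point chosen uniformly at random.
    const std::size_t first = std::uniform_int_distribution<std::size_t>(0, n - 1)(rng);
    std::vector<double> centroids(x + first * p, x + (first + 1) * p);

    // minDist[i]: squared distance from point i to its nearest chosen centroid.
    std::vector<double> minDist(n);
    for (std::size_t i = 0; i < n; ++i) minDist[i] = sqDist(x + i * p, x + first * p, p);

    for (std::size_t c = 1; c < nClusters; ++c) {
        double total = 0.0;
        for (double d : minDist) total += d;
        // D^2 weighting; if every point already coincides with a centroid,
        // fall back to uniform weights so the distribution stays valid.
        const std::vector<double> weights = total > 0.0 ? minDist : std::vector<double>(n, 1.0);
        std::discrete_distribution<std::size_t> sample(weights.begin(), weights.end());

        std::size_t best = 0;
        double bestCost = std::numeric_limits<double>::infinity();
        for (std::size_t t = 0; t < nTrials; ++t) {
            const std::size_t cand = sample(rng);
            double cost = 0.0;  // total squared distance if cand were added
            for (std::size_t i = 0; i < n; ++i)
                cost += std::min(minDist[i], sqDist(x + i * p, x + cand * p, p));
            if (cost < bestCost) { bestCost = cost; best = cand; }
        }

        centroids.insert(centroids.end(), x + best * p, x + (best + 1) * p);
        for (std::size_t i = 0; i < n; ++i)
            minDist[i] = std::min(minDist[i], sqDist(x + i * p, x + best * p, p));
    }
    return centroids;
}
```

In this sketch each trial costs one extra pass over the data, so raising nTrials multiplies the seeding work roughly by nTrials in exchange for a lower-cost starting point.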
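
The parallelPlusDense method and the oversamplingFactor and nRounds parameters follow [Bahmani2012]. With the defaults (oversamplingFactor = 0.5, nRounds = 5), L = 0.5 * nClusters, so L * nRounds = 2.5 * nClusters, which satisfies the constraint L * nRounds > nClusters for any nClusters. The sequential C++ sketch below shows the overall shape of the algorithm: starting from one random point, each round samples every point independently with probability min(1, L * d^2 / cost), where d^2 is its squared distance to the nearest current candidate and cost is the sum of those distances; the candidates are then weighted by how many points they are closest to and reclustered down to nClusters. It is not the oneDAL implementation. The names kmeansParallel and sqDist, the row-major data layout, std::mt19937 as a stand-in for engine, the top-up fallback, and the use of weighted K-Means++ seeding (without further Lloyd iterations) as the reclustering step are all illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Squared Euclidean distance between two p-dimensional points.
inline double sqDist(const double* a, const double* b, std::size_t p) {
    double s = 0.0;
    for (std::size_t j = 0; j < p; ++j) { const double d = a[j] - b[j]; s += d * d; }
    return s;
}

// Hypothetical sequential sketch of parallel K-Means++ (not oneDAL).
// data: n x p, row-major. Returns row indices of the nClusters initial centroids.
std::vector<std::size_t> kmeansParallel(const std::vector<double>& data, std::size_t n,
                                        std::size_t p, std::size_t nClusters,
                                        double oversamplingFactor, std::size_t nRounds,
                                        std::mt19937& rng) {
    const double L = nClusters * oversamplingFactor;  // expected samples per round
    assert(data.size() == n * p && nClusters >= 1 && nClusters <= n);
    assert(L * nRounds > nClusters);  // the constraint stated for nRounds
    const double* x = data.data();

    // Start from one point chosen uniformly at random.
    std::vector<std::size_t> cand{std::uniform_int_distribution<std::size_t>(0, n - 1)(rng)};
    std::vector<bool> chosen(n, false);
    chosen[cand[0]] = true;
    std::vector<double> minDist(n);  // squared distance to the nearest candidate
    for (std::size_t i = 0; i < n; ++i) minDist[i] = sqDist(x + i * p, x + cand[0] * p, p);

    // Oversampling rounds: about L new candidates per round.
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    for (std::size_t r = 0; r < nRounds; ++r) {
        double cost = 0.0;
        for (double d : minDist) cost += d;
        if (cost == 0.0) break;  // every point coincides with a candidate
        const std::size_t before = cand.size();
        for (std::size_t i = 0; i < n; ++i)
            if (!chosen[i] && coin(rng) < L * minDist[i] / cost) { chosen[i] = true; cand.push_back(i); }
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t c = before; c < cand.size(); ++c)
                minDist[i] = std::min(minDist[i], sqDist(x + i * p, x + cand[c] * p, p));
    }
    // Sketch-only fallback: top up with unchosen points if too few were sampled.
    for (std::size_t i = 0; i < n && cand.size() < nClusters; ++i)
        if (!chosen[i]) { chosen[i] = true; cand.push_back(i); }

    // Weight each candidate by the number of points closest to it.
    std::vector<double> w(cand.size(), 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t best = 0;
        double bestD = sqDist(x + i * p, x + cand[0] * p, p);
        for (std::size_t c = 1; c < cand.size(); ++c) {
            const double d = sqDist(x + i * p, x + cand[c] * p, p);
            if (d < bestD) { bestD = d; best = c; }
        }
        w[best] += 1.0;
    }

    // Recluster the weighted candidates with weighted K-Means++ seeding.
    std::vector<std::size_t> result;
    std::vector<double> dc(cand.size(), std::numeric_limits<double>::infinity());
    std::vector<bool> picked(cand.size(), false);
    while (result.size() < nClusters) {
        std::vector<double> weight(cand.size(), 0.0);
        double total = 0.0;
        for (std::size_t c = 0; c < cand.size(); ++c) {
            if (!picked[c]) weight[c] = result.empty() ? w[c] : w[c] * dc[c];
            total += weight[c];
        }
        if (total == 0.0)  // degenerate case (duplicates): any unpicked candidate
            for (std::size_t c = 0; c < cand.size(); ++c) weight[c] = picked[c] ? 0.0 : 1.0;
        std::discrete_distribution<std::size_t> sample(weight.begin(), weight.end());
        const std::size_t s = sample(rng);
        picked[s] = true;
        result.push_back(cand[s]);
        for (std::size_t c = 0; c < cand.size(); ++c)
            dc[c] = std::min(dc[c], sqDist(x + cand[c] * p, x + cand[s] * p, p));
    }
    return result;
}
```

In a genuinely parallel implementation, the per-point sampling inside each round is the part that can be distributed; here it is a plain loop.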

Output

Centroid initialization for K-Means clustering calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm.

Result ID: centroids
Result: Pointer to the nClusters x p numeric table with the cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except for PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.
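
The result holds one centroid per row. As a minimal sketch of how the two simplest methods listed under Parameters, defaultDense and randomDense, turn n x p data into an nClusters x p result, the following standard-library C++ code assumes the data is stored as a row-major std::vector<double> and uses a seeded std::mt19937 as a stand-in for the engine parameter. The function names firstPointsInit and randomPointsInit are hypothetical, and sampling the random rows without replacement is an assumption of this sketch rather than documented oneDAL behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Like defaultDense: the first nClusters rows become the centroids.
// data is n x p, row-major; the result is nClusters x p, row-major.
std::vector<double> firstPointsInit(const std::vector<double>& data, std::size_t n,
                                    std::size_t p, std::size_t nClusters) {
    assert(data.size() == n * p && nClusters <= n);
    return std::vector<double>(data.begin(), data.begin() + nClusters * p);
}

// Like randomDense: nClusters distinct rows chosen uniformly at random
// (sampling without replacement is an assumption of this sketch).
std::vector<double> randomPointsInit(const std::vector<double>& data, std::size_t n,
                                     std::size_t p, std::size_t nClusters,
                                     std::mt19937& rng) {
    assert(data.size() == n * p && nClusters <= n);
    std::vector<std::size_t> rows(n);
    std::iota(rows.begin(), rows.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::shuffle(rows.begin(), rows.end(), rng);           // uniform random permutation
    std::vector<double> centroids;
    centroids.reserve(nClusters * p);
    for (std::size_t k = 0; k < nClusters; ++k) {
        const auto row = data.begin() + rows[k] * p;
        centroids.insert(centroids.end(), row, row + p);
    }
    return centroids;
}
```

Seeding the generator with a fixed value makes randomPointsInit return the same centroids on every run with the same standard library, which helps when comparing clustering results across experiments.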

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.