Batch Processing

Input

Centroid initialization for K-Means clustering accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

| Input ID | Input |
|---|---|
| data | Pointer to the n x p numeric table with the data to be clustered. The input can be an object of any class derived from NumericTable. |

Parameters

The following table lists the parameters of centroid initialization for K-Means clustering. Some parameters apply only to certain initialization methods; the method column shows which values of the method parameter each one applies to.

| Parameter | method | Default Value | Description |
|---|---|---|---|
| algorithmFPType | any | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | Not applicable | defaultDense | Available initialization methods for K-Means clustering: defaultDense (uses the first nClusters points as initial centroids); deterministicCSR (uses the first nClusters points as initial centroids for data in a CSR numeric table); randomDense (uses random nClusters points as initial centroids); randomCSR (uses random nClusters points as initial centroids for data in a CSR numeric table); plusPlusDense (uses the K-Means++ algorithm [Arthur2007]); plusPlusCSR (uses the K-Means++ algorithm for data in a CSR numeric table); parallelPlusDense (uses the parallel K-Means++ algorithm [Bahmani2012]); parallelPlusCSR (uses the parallel K-Means++ algorithm for data in a CSR numeric table). For more details, see the algorithm description. |
| nClusters | any | Not applicable | The number of clusters. Required. |
| nTrials | parallelPlusDense, parallelPlusCSR | 1 | The number of trials to generate all clusters but the first initial cluster. For details, see [Arthur2007], section 5. |
| oversamplingFactor | parallelPlusDense, parallelPlusCSR | 0.5 | The fraction of nClusters sampled in each of the nRounds rounds of parallel K-Means++: L = nClusters * oversamplingFactor points are sampled in a round. For details, see [Bahmani2012], section 3.3. |
| nRounds | parallelPlusDense, parallelPlusCSR | 5 | The number of rounds for parallel K-Means++. L * nRounds must be greater than nClusters. For details, see [Bahmani2012], section 3.3. |
| engine | any | SharedPtr<engines::mt19937::Batch>() | Pointer to the random number generator engine that is used internally for random number generation. |
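
The requirement that L * nRounds exceed nClusters can be checked before the algorithm runs. The following is a minimal standalone sketch, not library code: ParallelPlusParams, pointsPerRound, and roundsAreSufficient are hypothetical names, and the sketch assumes L is used as a real value without rounding, which the parameter descriptions do not specify.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for the parallel K-Means++ parameters described above.
// The defaults mirror the documented ones: oversamplingFactor = 0.5, nRounds = 5.
struct ParallelPlusParams {
    std::size_t nClusters = 0;       // required; must be set by the caller
    double oversamplingFactor = 0.5;
    std::size_t nRounds = 5;
};

// L = nClusters * oversamplingFactor, the number of points sampled in a round.
// Assumption: L is treated as a real number, with no rounding applied.
double pointsPerRound(const ParallelPlusParams& p) {
    // nClusters is required, so zero is treated as a programming error here.
    assert(p.nClusters > 0);
    return static_cast<double>(p.nClusters) * p.oversamplingFactor;
}

// Returns true when L * nRounds > nClusters, the condition stated for nRounds.
bool roundsAreSufficient(const ParallelPlusParams& p) {
    return pointsPerRound(p) * static_cast<double>(p.nRounds)
           > static_cast<double>(p.nClusters);
}
```

With the default values, L * nRounds equals 2.5 * nClusters, so the condition holds for any positive nClusters. It can fail once oversamplingFactor or nRounds is lowered, for example with oversamplingFactor = 0.5 and nRounds = 2.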

Output

Centroid initialization for K-Means clustering calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

| Result ID | Result |
|---|---|
| centroids | Pointer to the nClusters x p numeric table with the cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable. |
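
To illustrate the shape of the result, the sketch below builds an nClusters x p centroid table from an n x p input in two ways: by copying the first nClusters rows, and by copying randomly chosen rows. It is not the library's implementation. Table, firstRowsAsCentroids, and randomRowsAsCentroids are hypothetical names, std::mt19937 from the C++ standard library stands in for the default engine, and drawing distinct rows is an assumption, since the text does not say whether random rows may repeat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Row-major stand-in for an n x p numeric table (not the library's NumericTable).
using Table = std::vector<std::vector<double>>;

// Deterministic initialization: copy the first nClusters rows of the data.
Table firstRowsAsCentroids(const Table& data, std::size_t nClusters) {
    if (nClusters == 0 || nClusters > data.size())
        throw std::invalid_argument("nClusters must be in [1, n]");
    return Table(data.begin(),
                 data.begin() + static_cast<std::ptrdiff_t>(nClusters));
}

// Random initialization: copy nClusters rows picked with the given engine.
// Assumption: rows are drawn without repetition.
Table randomRowsAsCentroids(const Table& data, std::size_t nClusters,
                            std::mt19937& engine) {
    if (nClusters == 0 || nClusters > data.size())
        throw std::invalid_argument("nClusters must be in [1, n]");

    // Shuffle the row indices and keep the first nClusters of them.
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), engine);

    Table centroids;
    centroids.reserve(nClusters);
    for (std::size_t i = 0; i < nClusters; ++i)
        centroids.push_back(data[idx[i]]);

    assert(centroids.size() == nClusters);  // result has nClusters rows
    return centroids;
}
```

Both functions return a table with nClusters rows and the same number of columns p as the input. Passing the engine by reference lets the caller control the seed, so repeated runs with the same seed select the same rows.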
