Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 11/07/2023
Public

vslSSEditMissingValues

Modifies pointers to arrays associated with the method of supporting missing values in a dataset.

Syntax

status = vslsSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);

status = vsldSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);

Include Files

  • mkl.h

Input Parameters

| Name | Type | Description |
|---|---|---|
| task | VSLSSTaskPtr | Descriptor of the task |
| nparams | const MKL_INT* | Pointer to the number of method parameters |
| params | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array of method parameters |
| init_estimates_n | const MKL_INT* | Pointer to the number of initial estimates for the mean and the variance-covariance matrix |
| init_estimates | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array that holds initial estimates for the mean and the variance-covariance matrix |
| prior_n | const MKL_INT* | Pointer to the number of prior parameters |
| prior | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array of prior parameters |
| simul_missing_vals_n | const MKL_INT* | Pointer to the size of the array that holds the output of the Multiple Imputation method |
| simul_missing_vals | float* for vslsSSEditMissingValues; double* for vsldSSEditMissingValues | Pointer to the array of size k*m, where k is the total number of missing values and m is the number of copies of missing values. The array holds m sets of simulated missing values for the matrix of observations. |
| estimates_n | const MKL_INT* | Pointer to the number of estimates to be returned by the routine |
| estimates | float* for vslsSSEditMissingValues; double* for vsldSSEditMissingValues | Pointer to the array that holds estimates of the mean and the variance-covariance matrix |

Output Parameters

| Name | Type | Description |
|---|---|---|
| status | int | Current status of the task |

Description

The vslSSEditMissingValues routine uses the values passed as its parameters to replace the following pointers in the task descriptor: the pointers to the number and the array of method parameters, to the number and the array of initial mean/variance-covariance estimates, to the number and the array of prior parameters, to the number and the array of simulated missing values, and to the number and the array of intermediate mean/variance-covariance estimates. If you pass NULL for a specific input parameter, the corresponding pointer in the task descriptor is left unchanged.

Before you call the Summary Statistics routines to process missing values, preprocess the dataset and denote missing observations with one of the following predefined constants:

  • VSL_SS_SNAN, if the dataset is stored in single-precision floating-point format

  • VSL_SS_DNAN, if the dataset is stored in double-precision floating-point format

Intel® oneAPI Math Kernel Library (oneMKL) provides the VSL_SS_METHOD_MI method to support missing values in the dataset. The method is based on the Multiple Imputation (MI) approach described in [Schafer97]. The following components support Multiple Imputation:

  • Expectation Maximization (EM) algorithm to compute the start point for the Data Augmentation (DA) procedure

  • DA function

NOTE:

The DA component of the MI procedure is simulation-based. It uses the VSL_BRNG_MCG59 basic random number generator with a predefined seed of 250 and the Gaussian distribution generator (ICDF method) available in Intel® oneAPI Math Kernel Library (oneMKL) [Gaussian].

Pack the parameters of the MI algorithm into the params array. Table "Structure of the Array of MI Parameters" describes the params structure.

Structure of the Array of MI Parameters

| Array Position | Algorithm Parameter | Description |
|---|---|---|
| 0 | em_iter_num | Maximal number of iterations for the EM algorithm. By default, this value is 50. |
| 1 | da_iter_num | Maximal number of iterations for the DA algorithm. By default, this value is 30. |
| 2 | ε | Stopping criterion for the EM algorithm. The algorithm terminates if the maximum absolute value of the element-wise difference between the previous and current parameter values is less than ε. By default, this value is 0.001. |
| 3 | m | Number of sets to impute |
| 4 | missing_vals_num | Total number of missing values in the dataset |
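The packing described by the table can be sketched as follows. This is a minimal illustration for the double-precision case, not library code: the helper name pack_mi_params and the constant MI_NPARAMS are hypothetical, and the counts are stored as floating-point values only because params has a floating-point type.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the MI parameter array (positions 0 through 4). */
enum { MI_NPARAMS = 5 };

/* Hypothetical helper, not a oneMKL routine: stores the MI parameters
   in the positions listed in the table above. */
void pack_mi_params(double params[MI_NPARAMS],
                    int em_iter_num, int da_iter_num, double eps,
                    int m, int missing_vals_num)
{
    assert(params != NULL);
    assert(em_iter_num > 0 && da_iter_num > 0);
    assert(eps > 0.0 && m > 0 && missing_vals_num >= 0);

    params[0] = (double)em_iter_num;      /* maximal number of EM iterations */
    params[1] = (double)da_iter_num;      /* maximal number of DA iterations */
    params[2] = eps;                      /* EM stopping criterion */
    params[3] = (double)m;                /* number of sets to impute */
    params[4] = (double)missing_vals_num; /* total number of missing values */
}
```

With the documented defaults, two sets to impute, and three missing values, the call would be pack_mi_params(params, 50, 30, 0.001, 2, 3), and nparams would then point to the value 5.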

You can also pass initial estimates into the EM algorithm by packing both the vector of means and the variance-covariance matrix into a one-dimensional array init_estimates. The size of the array should be at least p + p(p + 1)/2, where p is the dimension of the task. For i = 0, ..., p-1, init_estimates[i] contains the initial estimate of the i-th mean. The remaining positions of the array are occupied by the upper triangular part of the variance-covariance matrix.
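A minimal sketch of this layout is shown below. The helper names are hypothetical, and the sketch assumes that the upper triangle is packed row by row from a full row-major matrix; the text above does not state the packing order, so check it against the library documentation before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of init_estimates for a task of dimension p. */
size_t init_estimates_size(size_t p)
{
    return p + p * (p + 1) / 2;
}

/* Hypothetical helper, not a oneMKL routine. Copies the p means first,
   then the upper triangle of the p-by-p matrix cov (full row-major
   storage). ASSUMPTION: the triangle is packed row by row. */
void pack_init_estimates(size_t p, const double *mean, const double *cov,
                         double *init_estimates)
{
    size_t k = 0;

    for (size_t i = 0; i < p; ++i) {
        init_estimates[k++] = mean[i];
    }
    for (size_t i = 0; i < p; ++i) {
        for (size_t j = i; j < p; ++j) {
            init_estimates[k++] = cov[i * p + j];
        }
    }
    assert(k == init_estimates_size(p));
}
```

For a task of dimension 4, init_estimates_size returns 14: four means followed by ten covariance entries.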

If you provide no initial estimates for the EM algorithm, the editor uses the default values, that is, the vector of zero means and the identity matrix as the variance-covariance matrix. You can also pass prior parameters for μ and Σ into the library: μ0, τ, m, and Λ⁻¹. Pack these parameters as a one-dimensional array prior with a size of at least

(p² + 3p + 4)/2.

The storage format is as follows:

  • prior[0], ..., prior[p-1] contain the elements of the vector μ0.

  • prior[p] contains the parameter τ.

  • prior[p+1] contains the prior parameter m (distinct from the number of sets to impute stored in params[3]).

  • The remaining positions are occupied by the upper triangular part of the inverted matrix Λ⁻¹.

If you provide no prior parameters, the editor uses their default values:

  • The array of p zeros is used as μ0.

  • τ is set to 0.

  • m is set to p.

  • The zero matrix is used as an initial approximation of Λ⁻¹.
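The storage format and the default values listed above can be combined into a short sketch. The helper names are hypothetical and not part of oneMKL. Because the default Λ⁻¹ is the zero matrix, the result does not depend on how the triangle is ordered.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of the prior array for a task of dimension p:
   p entries for mu0, one for tau, one for m, p(p+1)/2 for the triangle. */
size_t prior_size(size_t p)
{
    return (p * p + 3 * p + 4) / 2;
}

/* Hypothetical helper, not a oneMKL routine: fills prior with the
   documented default values. */
void fill_default_prior(size_t p, double *prior)
{
    size_t n = prior_size(p);

    assert(prior != NULL);
    for (size_t i = 0; i < n; ++i) {
        prior[i] = 0.0;        /* mu0 = 0, tau = 0, inverted Lambda = 0 */
    }
    prior[p + 1] = (double)p;  /* prior parameter m defaults to p */
}
```

Such a helper may be a convenient starting point when only one prior parameter, for example τ in prior[p], needs a non-default value.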

The EditMissingValues editor returns m sets of imputed values and/or a sequence of parameter estimates drawn during the DA procedure.

The editor returns the imputed values in the simul_missing_vals array. The array must be large enough to hold m sets of missing_vals_num values each, that is, at least m*missing_vals_num elements in total. The editor packs the imputed values one by one in the order of their appearance in the matrix of observations.

For example, consider a task of dimension 4. The total number of observations n is 10. The second observation vector misses variables 1 and 2, and the seventh observation vector lacks variable 1. The number of sets to impute is m=2. Then simul_missing_vals[0] and simul_missing_vals[1] contain the first and the second points for the second observation vector, and simul_missing_vals[2] holds the first point for the seventh observation. Positions 3, 4, and 5 are formed similarly for the second set.
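The index arithmetic behind this example can be sketched as below. The helper names are hypothetical. The sketch reads the example as storing one complete set after another, which matches positions 0 to 2 belonging to the first set and positions 3 to 5 to the second.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of simul_missing_vals for m imputed sets. */
size_t simul_missing_vals_size(size_t m, size_t missing_vals_num)
{
    return m * missing_vals_num;
}

/* Position of the k-th missing value (counted from 0 in order of
   appearance) within imputed set number `set` (counted from 0). */
size_t simul_missing_vals_index(size_t set, size_t k,
                                size_t missing_vals_num)
{
    assert(k < missing_vals_num);
    return set * missing_vals_num + k;
}
```

In the example, missing_vals_num is 3 and m is 2, so the array needs at least 6 elements, and the value imputed for the seventh observation in the second set is at index 5.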

To estimate convergence of the DA algorithm and choose a proper value of the number of DA iterations, request the sequence of parameter estimates that are produced during the DA procedure. The editor returns the sequence of parameters as a single array. The size of the array is

m*da_iter_num*(p + (p² + p)/2)

where

  • m is the number of sets of values to impute.

  • da_iter_num is the number of DA iterations.

  • The value p + (p² + p)/2 determines the size of the memory to hold one set of the parameter estimates.

In each set of the parameters, the vector of means occupies the first p positions and the remaining (p² + p)/2 positions are intended for the upper triangular part of the variance-covariance matrix.
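The size formula above can be sketched in code as follows. The helper names are hypothetical. The offset function additionally assumes that the parameter sets are stored imputation by imputation and, within each imputation, in DA iteration order; the text above gives only the total size, so treat that ordering as an assumption to verify.

```c
#include <assert.h>
#include <stddef.h>

/* Number of values in one set of parameter estimates:
   p means plus the upper triangle of the p-by-p covariance matrix. */
size_t estimates_block_size(size_t p)
{
    return p + (p * p + p) / 2;
}

/* Total length of the estimates array. */
size_t estimates_size(size_t m, size_t da_iter_num, size_t p)
{
    return m * da_iter_num * estimates_block_size(p);
}

/* Offset of the set produced at DA iteration `iter` of imputation `set`
   (both counted from 0). ASSUMPTION: imputation-major ordering. */
size_t estimates_offset(size_t set, size_t iter,
                        size_t da_iter_num, size_t p)
{
    assert(iter < da_iter_num);
    return (set * da_iter_num + iter) * estimates_block_size(p);
}
```

For p = 4, m = 2, and the default of 30 DA iterations, one set occupies 14 values and the whole array needs 840 elements.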

Upon successful generation of m sets of imputed values, you can place them in cells of the data matrix with missing values and use the Summary Statistics routines to analyze and get estimates for each of the m complete datasets.

NOTE:

The Intel® oneAPI Math Kernel Library (oneMKL) implementation of the MI algorithm rewrites cells of the dataset that contain the VSL_SS_SNAN/VSL_SS_DNAN values. If you want to use the Summary Statistics routines to process the data with missing values again, mask the positions of the empty cells.
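One way to follow the note above is to record the positions of the missing cells before the computation and to reuse that mask when placing each imputed set into the data matrix. The sketch below is an illustration under stated assumptions, not library code: the helper names are hypothetical, the dataset is assumed to be stored observation by observation as x[i*p + j], the missing cells are assumed to test true under isnan, and cells are visited in that storage order.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Records which cells are missing before the MI computation runs.
   ASSUMPTION: x holds n observations of p variables as x[i*p + j],
   and missing cells test true under isnan(). Returns their count. */
size_t record_missing_mask(size_t n, size_t p, const double *x,
                           unsigned char *is_missing)
{
    size_t count = 0;

    for (size_t c = 0; c < n * p; ++c) {
        is_missing[c] = (unsigned char)(isnan(x[c]) ? 1 : 0);
        count += is_missing[c];
    }
    return count;
}

/* Writes imputed set number `set` (counted from 0) into the masked
   cells of x, visiting cells in storage order. */
void fill_imputed_set(size_t n, size_t p, double *x,
                      const unsigned char *is_missing,
                      const double *simul_missing_vals,
                      size_t missing_vals_num, size_t set)
{
    const double *values = simul_missing_vals + set * missing_vals_num;
    size_t k = 0;

    for (size_t c = 0; c < n * p; ++c) {
        if (is_missing[c]) {
            assert(k < missing_vals_num);
            x[c] = values[k++];
        }
    }
    assert(k == missing_vals_num);
}
```

Calling fill_imputed_set once per set, each time on the same matrix and mask, yields the m complete datasets one after another for further analysis.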

See additional details of the algorithm usage model in the Intel® oneAPI Math Kernel Library (oneMKL) Summary Statistics Application Notes document [SS Notes].