vslSSEditMissingValues

Modifies pointers to arrays associated with the method of supporting missing values in a dataset.

Syntax

status = vslsSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);
status = vsldSSEditMissingValues(task, nparams, params, init_estimates_n, init_estimates, prior_n, prior, simul_missing_vals_n, simul_missing_vals, estimates_n, estimates);
Include Files
  • mkl.h
Input Parameters
| Name | Type | Description |
|---|---|---|
| task | VSLSSTaskPtr | Descriptor of the task |
| nparams | const MKL_INT* | Pointer to the number of method parameters |
| params | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array of method parameters |
| init_estimates_n | const MKL_INT* | Pointer to the number of initial estimates for the mean and the variance-covariance matrix |
| init_estimates | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array that holds initial estimates for the mean and the variance-covariance matrix |
| prior_n | const MKL_INT* | Pointer to the number of prior parameters |
| prior | const float* for vslsSSEditMissingValues; const double* for vsldSSEditMissingValues | Pointer to the array of prior parameters |
| simul_missing_vals_n | const MKL_INT* | Pointer to the size of the array that holds the output of the Multiple Imputation method |
| simul_missing_vals | float* for vslsSSEditMissingValues; double* for vsldSSEditMissingValues | Pointer to the array of size k*m, where k is the total number of missing values and m is the number of copies of missing values. The array holds m sets of simulated missing values for the matrix of observations. |
| estimates_n | const MKL_INT* | Pointer to the number of estimates to be returned by the routine |
| estimates | float* for vslsSSEditMissingValues; double* for vsldSSEditMissingValues | Pointer to the array that holds estimates of the mean and the variance-covariance matrix |
Output Parameters
| Name | Type | Description |
|---|---|---|
| status | int | Current status of the task |
Description
The vslSSEditMissingValues routine replaces pointers stored in the task descriptor with the values you pass as arguments. It replaces the pointers to the number and the array of method parameters, to the number and the array of initial mean/variance-covariance estimates, to the number and the array of prior parameters, to the number and the array of simulated missing values, and to the number and the array of intermediate mean/covariance estimates. If you pass NULL for a specific input parameter, the value of that parameter in the task descriptor is unchanged.
Before you call the Summary Statistics routines to process missing values, preprocess the dataset and denote missing observations with one of the following predefined constants:
  • VSL_SS_SNAN, if the dataset is stored in single precision floating-point arithmetic
  • VSL_SS_DNAN, if the dataset is stored in double precision floating-point arithmetic
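The preprocessing step above can be sketched as follows for a double precision dataset. This is a minimal illustration, not library code: the helper names, the sentinel convention (for example, -999.0 marking an absent observation in the raw data), and the `EXAMPLE_DNAN_BITS` stand-in are all assumptions. The stand-in exists only so the sketch builds without mkl.h; real code should take the marker from `VSL_SS_DNAN`.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-in bit pattern so the sketch builds without mkl.h.
   Real code should use the VSL_SS_DNAN constant from the library. */
#define EXAMPLE_DNAN_BITS UINT64_C(0xFFF8000000000000)

/* Build a double whose bit pattern is the missing-value marker. */
static double example_missing_marker(void)
{
    const uint64_t bits = EXAMPLE_DNAN_BITS;
    double value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);   /* avoids pointer-cast aliasing */
    return value;
}

/* Replace every cell equal to `sentinel` with the marker.
   Returns the number of cells marked, which is the value to pass
   later as missing_vals_num. */
static size_t example_mark_missing(double *x, size_t count, double sentinel)
{
    const double marker = example_missing_marker();
    size_t marked = 0;
    for (size_t i = 0; i < count; ++i) {
        if (x[i] == sentinel) {
            x[i] = marker;
            ++marked;
        }
    }
    return marked;
}
```

The count returned by `example_mark_missing` is convenient to keep, because the MI method needs the total number of missing values as one of its parameters.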
Intel® MKL provides the VSL_SS_METHOD_MI method to support missing values in the dataset based on the Multiple Imputation (MI) approach described in [Schafer97]. The following components support Multiple Imputation:
  • Expectation Maximization (EM) algorithm to compute the start point for the Data Augmentation (DA) procedure
  • DA function
The DA component of the MI procedure is simulation-based and uses the VSL_BRNG_MCG59 basic random number generator with the predefined seed = $2^{50}$ and the Gaussian distribution generator (ICDF method) available in Intel® MKL [Gaussian].
Pack the parameters of the MI algorithm into the params array. The table "Structure of the Array of MI Parameters" describes the params structure.
Structure of the Array of MI Parameters

| Array Position | Algorithm Parameter | Description |
|---|---|---|
| 0 | em_iter_num | Maximum number of iterations for the EM algorithm. By default, this value is 50. |
| 1 | da_iter_num | Maximum number of iterations for the DA algorithm. By default, this value is 30. |
| 2 | ε | Stopping criterion for the EM algorithm. The algorithm terminates if the maximum absolute value of the element-wise difference between the previous and current parameter values is less than ε. By default, this value is 0.001. |
| 3 | m | Number of sets to impute |
| 4 | missing_vals_num | Total number of missing values in the dataset |
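As a rough sketch of the packing described in the table, the helper below fills a five-element double precision params array in the listed order. The helper name and the `EXAMPLE_MI_NPARAMS` constant are hypothetical; only the positions and defaults come from the table. Integer-valued parameters are stored as floating-point numbers because params has a single element type.

```c
#include <assert.h>

enum { EXAMPLE_MI_NPARAMS = 5 };  /* hypothetical name for the array length */

/* Pack MI parameters in the order given by the table:
   [0] em_iter_num, [1] da_iter_num, [2] epsilon, [3] m, [4] missing_vals_num */
static void example_pack_mi_params(double params[EXAMPLE_MI_NPARAMS],
                                   int em_iter_num, int da_iter_num,
                                   double epsilon, int m,
                                   int missing_vals_num)
{
    assert(em_iter_num > 0 && da_iter_num > 0);
    assert(epsilon > 0.0 && m > 0 && missing_vals_num >= 0);
    params[0] = (double)em_iter_num;
    params[1] = (double)da_iter_num;
    params[2] = epsilon;
    params[3] = (double)m;
    params[4] = (double)missing_vals_num;
}
```

For the documented defaults with two imputed sets and three missing values, the call would be `example_pack_mi_params(params, 50, 30, 0.001, 2, 3)`, with nparams pointing to the value 5.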
You can also pass initial estimates into the EM algorithm by packing the vector of means and the variance-covariance matrix into a one-dimensional array, init_estimates. The array must hold at least $p + p(p + 1)/2$ elements, where $p$ is the dimension of the task. For $i = 0, \ldots, p - 1$, init_estimates[i] contains the initial estimate of the i-th mean. The remaining positions hold the upper triangular part of the variance-covariance matrix.
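The layout of init_estimates can be sketched as below. The helper names are hypothetical, the input covariance matrix is taken to be a full row-major $p \times p$ array, and the row-by-row ordering of the packed upper triangle is an assumption, since the text does not state the order in which the triangle is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of init_estimates: p means + p(p+1)/2 triangle entries. */
static size_t example_init_estimates_len(size_t p)
{
    assert(p > 0);
    return p + p * (p + 1) / 2;
}

/* Offset of element (i, j), i <= j, inside the packed upper triangle.
   ASSUMPTION: rows of the triangle are stored one after another. */
static size_t example_upper_index(size_t p, size_t i, size_t j)
{
    assert(i <= j && j < p);
    return i * (2 * p - i + 1) / 2 + (j - i);
}

/* Copy the means, then the upper triangle of a full row-major matrix. */
static void example_fill_init_estimates(double *init_estimates, size_t p,
                                        const double *mean,
                                        const double *cov)
{
    assert(init_estimates && mean && cov);
    for (size_t i = 0; i < p; ++i)
        init_estimates[i] = mean[i];
    for (size_t i = 0; i < p; ++i)
        for (size_t j = i; j < p; ++j)
            init_estimates[p + example_upper_index(p, i, j)] = cov[i * p + j];
}
```

For a task of dimension 4 the array needs 4 + 10 = 14 elements, and init_estimates_n would point to that value.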
If you provide no initial estimates for the EM algorithm, the editor uses the default values, that is, the vector of zero means and the identity matrix as the variance-covariance matrix. You can also pass prior parameters for $\mu$ and $\Sigma$ into the library: $\mu_0$, $\tau$, $m$, and $\Lambda^{-1}$. Pack these parameters into a one-dimensional array, prior, with a size of at least $(p^2 + 3p + 4)/2$.
The storage format is as follows:
  • prior[0], ..., prior[p-1] contain the elements of the vector $\mu_0$.
  • prior[p] contains the parameter $\tau$.
  • prior[p+1] contains the parameter $m$.
  • The remaining positions are occupied by the upper triangular part of the inverted matrix $\Lambda^{-1}$.
If you provide no prior parameters, the editor uses their default values:
  • The array of $p$ zeros is used as $\mu_0$.
  • $\tau$ is set to 0.
  • $m$ is set to $p$.
  • The zero matrix is used as an initial approximation of $\Lambda^{-1}$.
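To make the prior layout concrete, the sketch below computes the minimum array length and fills an array with the default values listed above. The helper names are hypothetical. The length is $p$ entries for $\mu_0$, one each for $\tau$ and $m$, and $p(p + 1)/2$ for the triangle of $\Lambda^{-1}$, which sums to $(p^2 + 3p + 4)/2$.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum length of the prior array: (p^2 + 3p + 4) / 2.
   p^2 + 3p = p(p + 3) is always even, so the division is exact. */
static size_t example_prior_len(size_t p)
{
    assert(p > 0);
    return (p * p + 3 * p + 4) / 2;
}

/* Fill prior with the documented defaults. */
static void example_fill_default_prior(double *prior, size_t p)
{
    const size_t len = example_prior_len(p);
    assert(prior != NULL);
    for (size_t i = 0; i < p; ++i)
        prior[i] = 0.0;               /* mu_0: vector of p zeros */
    prior[p] = 0.0;                   /* tau */
    prior[p + 1] = (double)p;         /* m */
    for (size_t i = p + 2; i < len; ++i)
        prior[i] = 0.0;               /* upper triangle of Lambda^{-1} */
}
```

Passing such an array explicitly should be equivalent to leaving the prior pointer unset, under the assumption that the library applies exactly these defaults.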
The EditMissingValues editor returns $m$ sets of imputed values and/or a sequence of parameter estimates drawn during the DA procedure.
The editor returns the imputed values in the simul_missing_vals array. The array must be large enough to hold $m$ sets of missing_vals_num values each, that is, at least m * missing_vals_num elements in total. The editor packs the imputed values one by one in the order of their appearance in the matrix of observations.
For example, consider a task of dimension 4. The total number of observations $n$ is 10. The second observation vector lacks variables 1 and 2, and the seventh observation vector lacks variable 1. The number of sets to impute is $m = 2$. Then simul_missing_vals[0] and simul_missing_vals[1] contain the first and the second points for the second observation vector, and simul_missing_vals[2] holds the first point for the seventh observation. Positions 3, 4, and 5 are formed similarly for the second set.
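The indexing in this example can be written as a one-line helper. The function name is hypothetical, and the layout (each set occupying a contiguous run of missing_vals_num values) is read off the example rather than taken from a formal specification.

```c
#include <assert.h>
#include <stddef.h>

/* Position in simul_missing_vals of the j-th missing value (counted in
   order of appearance in the matrix of observations) within imputed
   set `set`. Sets are assumed to be stored back to back. */
static size_t example_imputed_index(size_t set, size_t j,
                                    size_t missing_vals_num)
{
    assert(j < missing_vals_num);
    return set * missing_vals_num + j;
}
```

With missing_vals_num = 3, the value imputed for the seventh observation sits at position 2 in the first set and at position 5 in the second.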
To assess convergence of the DA algorithm and choose a proper number of DA iterations, request the sequence of parameter estimates produced during the DA procedure. The editor returns the sequence of parameters as a single array. The size of the array is $m \cdot \text{da\_iter\_num} \cdot (p + (p^2 + p)/2)$
where
  • $m$ is the number of sets of values to impute.
  • da_iter_num is the number of DA iterations.
  • The value $p + (p^2 + p)/2$ determines the size of the memory needed to hold one set of the parameter estimates.
In each set of the parameters, the vector of means occupies the first $p$ positions, and the remaining $(p^2 + p)/2$ positions hold the upper triangular part of the variance-covariance matrix.
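A short sketch of the size arithmetic for the estimates array follows. The helper names are hypothetical. The code deliberately stops at the layout of a single parameter set, because the text does not say how the sets are ordered across imputations and DA iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Length of one set of parameter estimates: p means + (p^2 + p)/2
   entries for the upper triangle of the variance-covariance matrix. */
static size_t example_param_set_len(size_t p)
{
    assert(p > 0);
    return p + (p * p + p) / 2;
}

/* Total length of the estimates array. */
static size_t example_estimates_len(size_t m, size_t da_iter_num, size_t p)
{
    assert(m > 0 && da_iter_num > 0);
    return m * da_iter_num * example_param_set_len(p);
}

/* Within one set, the means come first, then the packed triangle. */
static const double *example_set_means(const double *set)
{
    return set;
}

static const double *example_set_cov(const double *set, size_t p)
{
    return set + p;
}
```

For a task of dimension 4 with $m = 2$ and 30 DA iterations, the array needs 2 * 30 * 14 = 840 elements.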
Upon successful generation of $m$ sets of imputed values, you can place them in the cells of the data matrix with missing values and use the Summary Statistics routines to analyze and get estimates for each of the $m$ complete datasets.
The Intel® MKL implementation of the MI algorithm overwrites the cells of the dataset that contain the VSL_SS_SNAN/VSL_SS_DNAN values. If you want to use the Summary Statistics routines to process the data with missing values again, mask the positions of the empty cells.
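One way to handle both points is to record where the missing cells are before running the MI method, and later use that record either to build a completed dataset from one imputed set or to restore the markers. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the dataset is treated as a flat array, and scanning that array front to back is assumed to match the order in which the library packs imputed values.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Record the flat indices of NaN cells in scan order. Returns the number
   of missing cells found; at most max_positions indices are stored. */
static size_t example_record_missing(const double *x, size_t count,
                                     size_t *positions, size_t max_positions)
{
    size_t found = 0;
    assert(x != NULL && positions != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (isnan(x[i])) {
            if (found < max_positions)
                positions[found] = i;
            ++found;
        }
    }
    return found;
}

/* Write imputed set number `set` into the recorded cells of x.
   ASSUMPTION: each set is a contiguous run of missing_vals_num values. */
static void example_apply_imputed_set(double *x, const size_t *positions,
                                      size_t missing_vals_num,
                                      const double *simul_missing_vals,
                                      size_t set)
{
    const double *values = simul_missing_vals + set * missing_vals_num;
    assert(x != NULL && positions != NULL);
    for (size_t j = 0; j < missing_vals_num; ++j)
        x[positions[j]] = values[j];
}
```

Calling `example_apply_imputed_set` once per set on a fresh copy of the data yields the $m$ complete datasets. The same positions array can be used to write the missing-value marker back into the cells before processing the data again.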
For additional details of the algorithm usage model, see the Intel® MKL Summary Statistics Application Notes document [SS Notes].
Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804