# Dealing with Missing Observations

The multiple imputation (MI) method generates `m` sets of simulated missing points that can be imputed into the dataset, producing `m` complete data copies. For each dataset, you can compute a specific statistical estimate. The final estimate is a combination of these `m` estimates. For details on the computational aspects and usage model of the algorithm, see Support of Missing Values in Matrices of Observations.

- The EM algorithm iterates `em_iter_num` times to compute the initial estimate for the mean and variance-covariance used as the start point of the DA algorithm. The EM algorithm can terminate earlier if it achieves the given accuracy `em_accuracy`.
- The DA algorithm iterates `da_iter_num` times. This algorithm uses Gaussian random numbers underneath. For this reason, the EMDA algorithm uses the `VSL_BRNG_MCG59` basic random number generator with the pre-defined `seed = 250` and the Gaussian distribution generator (ICDF method) available in Intel® MKL.

Pack the algorithm parameters into the `params` array and pass them to the task:

    /* Algorithm settings */
    em_iter_num    = 10;
    da_iter_num    = 5;
    em_accuracy    = 0.001;
    copy_num       = m;
    miss_value_num = miss_num;

    /* Pack the settings into the parameter array */
    params[0] = em_iter_num;
    params[1] = da_iter_num;
    params[2] = em_accuracy;
    params[3] = copy_num;
    params[4] = miss_value_num;

    errcode = vsldSSEditMissingValues( task, &nparams, params,
                                       &init_estimates_n, init_estimates,
                                       &prior_n, prior,
                                       &simul_missing_vals_n, simul_missing_vals,
                                       &estimates_n, estimates );

The mean occupies the first `p` positions of the array. The upper-triangular part of the variance-covariance matrix occupies the remaining `p*(p+1)/2` entries, where `p` is the dimension of the task. The … `m * … * (p + 0.5 * (p^2 + p))` in total. In each set of the estimates, the first `p` entries hold the mean, and the remaining `0.5 * (p^2 + p)` entries hold the upper-triangular part of the variance-covariance matrix.
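The layout described above (`p` mean entries followed by the packed upper triangle of the variance-covariance matrix) can be sketched with a few plain C helpers. This is a minimal illustration, not part of the library: the function names `estimate_len`, `cov_offset`, and `fill_zero_mean_identity` are hypothetical, and the row-by-row ordering of the upper triangle is an assumption that should be checked against the library reference before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries in one packed estimate: p means followed by the
   p*(p+1)/2 entries of the upper triangle of the covariance matrix. */
size_t estimate_len(size_t p)
{
    return p + p * (p + 1) / 2;
}

/* Offset of covariance entry (i, j), with i <= j, inside one packed
   estimate. ASSUMPTION: the upper triangle is stored row by row. */
size_t cov_offset(size_t p, size_t i, size_t j)
{
    assert(i <= j && j < p);
    /* Rows 0..i-1 hold p + (p-1) + ... + (p-i+1) = i*(2p-i+1)/2 entries. */
    return p + i * (2 * p - i + 1) / 2 + (j - i);
}

/* Fill one packed estimate with zero means and an identity
   variance-covariance matrix (a common EM start point). */
void fill_zero_mean_identity(size_t p, double *est)
{
    size_t len = estimate_len(p);
    size_t k, i;

    assert(est != NULL);

    for (k = 0; k < len; ++k)
        est[k] = 0.0;                     /* means and off-diagonals */
    for (i = 0; i < p; ++i)
        est[cov_offset(p, i, i)] = 1.0;   /* unit variances */
}
```

For a task of dimension `p = 10`, one packed estimate in this sketch has 10 + 55 = 65 entries.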

    errcode = vsldSSCompute( task, VSL_SS_MISSING_VALS, VSL_SS_METHOD_MI );

**Example:**

The dimension of the task is `p = 10` and the number of observations is `n = 10,000`. The dataset is generated from a multivariate Gaussian distribution with zero mean and a variance-covariance matrix that holds 1 on the main diagonal and 0.05 in all other entries. The ratio of missing values in the dataset is 10%. Each observation may have one missing point in any position. The goal is to generate `m = 100` sets of lost observations. The start point for the EM algorithm is the vector of zero means and the identity variance-covariance matrix. The pointer to the …

- A trial run of the algorithm with `da_iter_num = 10` is performed. Analysis of the estimates in the `estimates` array shows that five iterations are sufficient for the DA algorithm.
- 100 sets of missing values are simulated and imputed into the dataset, producing 100 complete data arrays.
- For each complete dataset, the means and variances are computed using Summary Statistics algorithms:

| Set | Mean |
| --- | --- |
| 1 | 0.013687 0.005529 0.004011 ... 0.008066 |
| 2 | 0.012054 0.003741 0.006907 ... 0.003721 |
| 3 | 0.013236 0.008314 0.008033 ... 0.011987 |
| ... | ... |
| 99 | 0.013350 0.012816 0.012942 ... 0.004076 |
| 100 | 0.014677 0.011909 0.005399 ... 0.006457 |
| Average | 0.012353 0.005676 0.007586 ... 0.006004 |

| Set | Variance |
| --- | --- |
| 1 | 0.989609 0.993073 1.007031 ... 1.000655 |
| 2 | 0.994033 0.986132 0.997705 ... 1.003134 |
| 3 | 1.003835 0.991947 0.997933 ... 0.997069 |
| ... | ... |
| 99 | 0.991922 0.988661 1.012045 ... 1.005406 |
| 100 | 0.987327 0.989517 1.009951 ... 0.998941 |
| Average | 0.99241 0.992136 1.007225 ... 1.000804 |

| Statistic | Values |
| --- | --- |
| Between-imputation variance | 0.000007 0.000008 0.000008 ... 0.000007 |
| Within-imputation variance | 0.000099 0.000099 0.000101 ... 0.000100 |
| Total variance | 0.000106 0.000107 0.000108 ... 0.000108 |

95% confidence interval:

| Boundary | Values |
| --- | --- |
| Left boundary of interval | -0.008234 -0.015020 -0.013233 ... -0.014736 |
| Right boundary of interval | +0.032939 +0.026372 +0.028406 ... +0.026744 |
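The between-imputation, within-imputation, and total variances reported above are the quantities used by the standard combining rules for multiple imputation (Rubin's rules). The text does not spell out the formulas, so the following is a sketch under that assumption rather than the exact computation behind the example. The type `mi_summary` and the functions `mi_total_variance` and `mi_combine` are hypothetical names introduced for illustration, and the code handles one scalar parameter at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Combined multiple-imputation summary for one scalar parameter. */
typedef struct {
    double estimate; /* average of the m per-copy estimates */
    double between;  /* between-imputation variance B */
    double within;   /* within-imputation variance W */
    double total;    /* total variance T = W + (1 + 1/m) * B */
} mi_summary;

/* Total variance from already-averaged W and B (Rubin's rules). */
double mi_total_variance(size_t m, double within, double between)
{
    assert(m >= 1);
    return within + (1.0 + 1.0 / (double)m) * between;
}

/* Combine m per-copy estimates q[k] and their variances u[k].
   ASSUMPTION: u[k] is the variance of the estimate itself, for example
   the sample variance divided by n when the estimate is a sample mean. */
mi_summary mi_combine(size_t m, const double *q, const double *u)
{
    mi_summary s = {0.0, 0.0, 0.0, 0.0};
    size_t k;

    assert(m >= 2 && q != NULL && u != NULL);

    /* Averages of the estimates and of their variances */
    for (k = 0; k < m; ++k) {
        s.estimate += q[k];
        s.within += u[k];
    }
    s.estimate /= (double)m;
    s.within /= (double)m;

    /* Sample variance of the estimates across the m copies */
    for (k = 0; k < m; ++k) {
        double d = q[k] - s.estimate;
        s.between += d * d;
    }
    s.between /= (double)(m - 1);

    s.total = mi_total_variance(m, s.within, s.between);
    return s;
}
```

With the rounded figures from the first column of the output (within-imputation variance 0.000099, between-imputation variance 0.000007, `m = 100`), `mi_total_variance` gives about 0.000106, which agrees with the printed total variance.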