Developer Reference

Processing Data in Blocks

Summary Statistics enables block-based data analysis that can help you:
  1. compute statistical estimates for out-of-memory datasets, splitting them into blocks
  2. analyze in-memory data arrays that become available block by block
  3. tune your applications for out-of-memory data support
To compute statistical estimates for out-of-memory datasets, do the following:
  1. Set the estimates you are interested in to zero, or to any other meaningful initial value:
    for( i = 0; i < p; i++ )
    {
        Xmean[i] = 0.0;
        Raw2Mom[i] = 0.0;
        Central2Mom[i] = 0.0;
        for( j = 0; j < p; j++ )
        {
            Cov[i][j] = 0.0;
        }
    }
  2. Initialize an array of size 2 with zero values.
    This array holds the accumulated weights that are required for correct computation of the estimates:
    W[0] = 0.0; W[1] = 0.0;
  3. Read the first portion of the dataset and the corresponding weights into their arrays:
    GetNextDataChunk( X, weights );
  4. Follow the common usage model of the Summary Statistics algorithms:
    /* Create a task */
    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    errcode = vsldSSNewTask( &task, &p, &nblock, &xstorage, X, weights, indices );

    /* Edit the task parameters */
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
    errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );

    /* Cov is assumed to be a contiguous p x p array */
    covstorage = VSL_SS_MATRIX_STORAGE_FULL;
    errcode = vsldSSEditCovCor( task, Xmean, (double*)Cov, &covstorage, 0, 0 );

    /* Compute the estimates for the dataset split into chunks */
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
        if ( nchunk >= N ) break;
        /* user-supplied routine that reads the next block and its weights */
        GetNextDataChunk( X, weights );
    }

    /* Deallocate task resources */
    errcode = vslSSDeleteTask( &task );
The Summary Statistics domain also lets you read the next data block into a different array. The computation scheme stays the same; you only need to pass the address of the new data block to the library:
    double* NextXChunk[N];    /* addresses of the separately stored data blocks */
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
        if ( nchunk >= N ) break;
        /* read the next block into its own array, then point the task at it */
        GetNextDataChunk( NextXChunk[nchunk], weights );
        errcode = vsldSSEditTask( task, VSL_SS_ED_OBSERV, NextXChunk[nchunk] );
    }
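The loop above relies on the task holding the address of the current block, so moving to another block means changing a pointer rather than copying data. The self-contained C sketch below imitates that idea for a single-variable weighted mean. ChunkTask, task_set_observations, task_compute, and mean_over_chunks are hypothetical names, not MKL functions, and the same weights array is reused for every block to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: it stores the address of the current
   block, not a copy of the data. */
typedef struct {
    const double *obs; /* current block of observations (one variable) */
    size_t n;          /* number of observations per block */
    double wsum;       /* accumulated sum of weights */
    double mean;       /* running weighted mean */
} ChunkTask;

/* Point the task at another block; plays the role of editing
   VSL_SS_ED_OBSERV in the loop above. Only the address changes. */
static void task_set_observations(ChunkTask *t, const double *obs)
{
    t->obs = obs;
}

/* Fold the block currently registered with the task into the mean. */
static void task_compute(ChunkTask *t, const double *w)
{
    for (size_t j = 0; j < t->n; j++) {
        if (w[j] == 0.0)
            continue; /* zero-weight observations do not change the mean */
        t->wsum += w[j];
        t->mean += (w[j] / t->wsum) * (t->obs[j] - t->mean);
    }
}

/* Process nchunks blocks that live in separate arrays, following the
   compute / break / fetch-next pattern of the loop above. */
static double mean_over_chunks(const double *const *chunks, size_t nchunks,
                               size_t n, const double *w)
{
    assert(nchunks > 0);
    ChunkTask t = { chunks[0], n, 0.0, 0.0 };
    for (size_t k = 0; ; k++) {
        task_compute(&t, w);
        if (k + 1 >= nchunks)
            break;
        task_set_observations(&t, chunks[k + 1]);
    }
    return t.mean;
}
```

In this sketch ChunkTask keeps only the address, so each chunk array must remain valid until task_compute has consumed it; the accumulated weight and mean carry over unchanged when the pointer is switched.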
For the list of estimators that support processing datasets in blocks, see the table "VS Summary Statistics Estimates Obtained with Compute Routine" in the Summary Statistics section of [MKLMan].

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804