Processing Data in Blocks

Summary Statistics enables block-based data analysis that can help you:

  1. compute statistical estimates for out-of-memory datasets by splitting them into blocks

  2. analyze in-memory data arrays that become available block by block

  3. tune your applications for out-of-memory data support

To compute statistical estimates for out-of-memory datasets, do the following:

  1. Initialize the estimates you need to zero, or to any other meaningful starting value:

    for( i = 0; i < p; i++ )
    {
         Xmean[i] = 0.0;
         Raw2Mom[i] = 0.0;
         Central2Mom[i] = 0.0;
         for(j = 0; j < p; j++)
         {
             Cov[i][j] = 0.0;
         }
    }
    
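  2. Initialize array W of size 2 with zero values.

    This array holds the accumulated weights: W[0] holds the sum of the weights and W[1] the sum of the squared weights of all observations processed so far. The library needs these running totals to update the estimates correctly from one block to the next: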

    W[0] = 0.0; W[1] = 0.0;
    
  3. Read the first block of the dataset into array X and the corresponding weights into array weights. GetNextDataChunk stands for your own data-reading routine; it is not a library function:

    GetNextDataChunk( X, weights );
    
  4. Follow the common usage model of the Summary Statistics algorithms:

    /* Create a task */
    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    errcode = vsldSSNewTask( &task, &p, &nblock,
                             &xstorage, X, weights, indices );

    /* Edit the task parameters */
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
    errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );

    /* Cov is the p x p array initialized in step 1, stored contiguously */
    covstorage = VSL_SS_MATRIX_STORAGE_FULL;
    errcode = vsldSSEditCovCor( task, Xmean, &Cov[0][0], &covstorage, 0, 0 );

    /* Compute the estimates for the dataset split into chunks:
       process the current chunk, stop after the last one,
       otherwise read the next chunk into the same arrays X and weights */
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
         errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
         if ( nchunk >= N ) break;
         GetNextDataChunk( X, weights );
    }

    /* Deallocate task resources */
    errcode = vslSSDeleteTask( &task );
    

The Summary Statistics domain also lets you read the next data block into a different array. The computation scheme stays the same; you only need to pass the address of the new block to the library by calling vsldSSEditTask with the VSL_SS_ED_OBSERV parameter:

double* NextXChunk[N];   /* each element points to storage for one chunk */
estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
for( nchunk = 0; ; nchunk++ )
{
     errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
     if ( nchunk >= N ) break;
     /* Read the next chunk into its own array and register that array with the task */
     GetNextDataChunk( NextXChunk[nchunk], weights );
     errcode = vsldSSEditTask( task, VSL_SS_ED_OBSERV, NextXChunk[nchunk] );
}

For the list of estimators that support processing datasets in blocks, see Table VSL Summary Statistics Estimates Obtained with Compute Routine in the Summary Statistics section of [MKLMan].

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

For more complete information about compiler optimizations, see our Optimization Notice.