Working with Streamed Data in Intel Atom Processor Based Intelligent Systems



One common use case across embedded workloads is processing streamed data: you quickly acquire large vectors of data and need to process them. You might call a transcendental function on each individual data element, or you might generate summary statistics. These two kinds of operations are covered by Vector Math and Vector Statistics, respectively, two components of the Intel Math Kernel Library (Intel MKL) that can be used with Intel Atom™ Processor Based Intelligent Systems.

You use the Vector Math and Vector Statistics Libraries when you are working with large blocks of data and need those operations done on every individual element. So instead of setting up a 'for' loop that does something to each element, you can use the library to work at a whole vector level.
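
As a point of comparison, the element-by-element loop that a vector-level call replaces looks roughly like the sketch below. The helper name `square_loop` is hypothetical and written in plain C so that it runs without Intel MKL; with the library linked, the equivalent operation would be the single call `vdSqr( n, a, y )`.

```c
#include <stddef.h>

/* Hypothetical scalar baseline: squares each element of a[] into y[].
   With Intel MKL, a single call vdSqr( n, a, y ) would replace this loop. */
void square_loop(size_t n, const double a[], double y[])
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a[i] * a[i];
    }
}
```

The argument order (length, input array, output array) deliberately mirrors the vector math calls, so swapping the loop for the library call is a one-line change.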



Here is a list of the Vector Math Library real and complex functions. You can choose single or double precision, along with the accuracy mode that suits your workload. The default is High Accuracy, which is still much faster than writing the functions yourself, since Intel MKL takes advantage of both vectorization and threading for you. For around 30 to 50 percent greater performance, you can choose Low Accuracy or Enhanced Performance mode.



Here is a good taxonomy of various “sweet spots” for vector math precision and accuracy. Double precision, low accuracy is sufficient for the majority of streamed data applications out there, but if you are looking at benchmarking or acceptance tests, you can always switch to high accuracy. For mobile devices, single precision enhanced performance is a great fit.

So how do you call the Vector Math Library?

v{s,d}Exp( n, input_array, output_array )

You start with v, then the precision letter (s for single, d for double), then the name of the real or complex function you would like to apply to your vector.

vmlSetMode( VML_EP )

If you would like to override the default, you can change the accuracy mode with this function call, and the setting applies to all vector math functions you call afterward until you change it again. We have samples included for generating your arguments and passing them properly. All you have to do is switch out the data you want to work with.

Our best recommendation is to start with Enhanced Performance mode and check whether your application's results change qualitatively. Enhanced Performance is the fastest mode and is the most common choice for applications. Also, when working with small vectors, keep in mind that while each function is threaded, threading may not be effective if the vector is too small. So we recommend threading at a higher level in your application first and relying on the good vectorization that these VML functions bring.

Now let's move on to the Vector Statistics Library, which includes Random Number Generators and Summary Statistics. A lot of people don't realize that you can simply feed in a vector and get a wide variety of useful statistical information from it using our library. Writing those statistical functions yourself can be cumbersome: it is hard to get the math operations working correctly on your data structure, and harder still to get them vectorized and threaded. We also have convolution and correlation functions that you can call. By calling them, you can focus on the result of your transformations rather than on how to optimize all those for loops. It saves a lot of time.



Here is a list of our random number generation capabilities. We have already implemented the major and most recent generators, and we constantly draw on the latest techniques published in research papers when coding them. You can also choose from a wide variety of distributions. If something isn't covered, let us know and we will look into putting an optimized version in the product.

#include "mkl_vsl.h"
#define N      1000                      /* Vector size       */
#define SEED    777                      /* Seed for BRNG     */
#define BRNG   VSL_BRNG_MT19937          /* VSL BRNG          */
#define METHOD VSL_METHOD_DGAUSSIAN_ICDF /* Generation method */
int main(void)
{
    double r[N], a = 0.0, sigma = 1.0;   /* Output buffer, mean, standard deviation */
    VSLStreamStatePtr stream;
    int errcode;
    errcode = vslNewStream( &stream, BRNG,  SEED );            /* Initialize random stream    */
    errcode = vdRngGaussian( METHOD, stream, N, r, a, sigma ); /* Call Gaussian Generator     */
    errcode = vslDeleteStream( &stream );                      /* De-initialize random stream */
    …
}


In the code above, we create the stream, call the Gaussian generator with the chosen method, then delete the stream. There are many other things you can do for advanced features and more control, but we will leave that for another discussion.

The other part of the Vector Statistics Library is Summary Statistics. You feed in a vector and can get back a wide variety of estimates. Of course these routines are threaded and vectorized, so all you need to focus on is your end result analysis.



The easiest way to work with VSL is to first understand how your data is being streamed in and how you will store it. You need to figure out how many elements you are working with, known as the observations, and how many dimensions each one has. You place everything inside a task object, modify the task parameters as you see fit, then decide which statistical estimates you would like computed on the data. Each estimate is stored in a separate array that you can use later.

    #include "mkl.h"
    #define DIM      3       /* Task dimension */
    #define N        10000   /* Number of observations */
    
    int main()
    {
       VSLSSTaskPtr task;
       MKL_INT dim, n, x_storage, cov_storage, cor_storage;
       double x[N*DIM], cov[DIM*DIM], cor[DIM*DIM], mean[DIM];
       …  
       vsldSSNewTask( &task, &dim, &n, &x_storage, x, 0, 0 );                /* Create a task */    
       vsldSSEditCovCor( task, mean, cov, &cov_storage, cor, &cor_storage ); /* Modify the task parameters */
       vsldSSCompute( task, VSL_SS_COV|VSL_SS_COR, VSL_SS_METHOD_FAST );     /* Compute statistical estimates */
       vslSSDeleteTask( &task );                                             /* Destroy the task */
    }
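
To make the estimates concrete, here is a minimal plain-C sketch of what a mean and covariance computation over observations and dimensions amounts to. It is a reference loop, not the library's implementation: the name `mean_and_cov` is hypothetical, and the layout (component j of observation i stored at `x[i*dim + j]`) is an assumption chosen for this example rather than a statement about the library's storage constants.

```c
#include <stddef.h>

/* Reference computation in plain C (not the MKL implementation).
   Assumed layout for this sketch: x[i*dim + j] holds component j
   of observation i. cov is a dim x dim matrix stored row by row. */
void mean_and_cov(size_t n, size_t dim, const double x[],
                  double mean[], double cov[])
{
    /* Per-dimension mean over all n observations */
    for (size_t j = 0; j < dim; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            sum += x[i * dim + j];
        }
        mean[j] = sum / (double)n;
    }

    /* Sample covariance with the usual n - 1 divisor (requires n >= 2) */
    for (size_t j = 0; j < dim; j++) {
        for (size_t k = 0; k < dim; k++) {
            double acc = 0.0;
            for (size_t i = 0; i < n; i++) {
                acc += (x[i * dim + j] - mean[j]) * (x[i * dim + k] - mean[k]);
            }
            cov[j * dim + k] = acc / (double)(n - 1);
        }
    }
}
```

Even this small version needs three nested loops and two passes over the data, which is the bookkeeping the task-based interface takes off your hands.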


The same streamed data can also take advantage of our convolution and correlation functions. Naturally, you can work with real and complex data with different precision levels. We offer dimensions up to seven and the most common algorithms are well covered.

The convolution and correlation functions work very much like the summary statistics library functions. You create your task, set its parameters, then execute it on the data by calling the convolution or correlation function. After you delete the task, you are left with your desired output vector.
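
For reference, the output vector of a one-dimensional linear convolution can be described with a direct double loop. The sketch below is plain C with a hypothetical helper name, `conv_direct`; it only illustrates the result you should expect and says nothing about which algorithm the library actually runs.

```c
#include <stddef.h>

/* Direct linear convolution in plain C, for reference only.
   Requires nx >= 1 and ny >= 1; z must have room for
   nx + ny - 1 elements. */
void conv_direct(const double x[], size_t nx,
                 const double y[], size_t ny, double z[])
{
    size_t nz = nx + ny - 1;

    /* Clear the output before accumulating */
    for (size_t k = 0; k < nz; k++) {
        z[k] = 0.0;
    }

    /* Each pair (i, j) contributes to output element i + j */
    for (size_t i = 0; i < nx; i++) {
        for (size_t j = 0; j < ny; j++) {
            z[i + j] += x[i] * y[j];
        }
    }
}
```

The cost of this loop grows with nx times ny, which is one reason to hand long inputs to an optimized routine instead.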

Here is an example that ties the vector math and statistics library together. Suppose you have a constant flow of data, and you want to investigate any particular dependencies on that data for a given time block, so you are incrementally filtering the data.



This looks a bit complicated, but it actually isn't. The data comes in, and you break it up into chunks. The key kernel of this whole workflow is the filter that separates the signal from the noise component. You update a correlation matrix using the latest chunk, then split the analysis into those two groups. You could spend a great deal of time figuring out the math for your particular situation, and even more time figuring out how to get it to run vectorized and threaded. With this library you can focus on the end result and the functions used to get there. Then you can move on to the further analysis and be done with it.

#define P 450  /* dimension 1 */
#define M 1000 /* dimension 2 */ 
…
VSLSSTaskPtr task;
double x[P*M], cor[P*P], mean[P], W[2];
MKL_INT p, m, x_storage, cor_storage;

/* Initialize VSL Summary Stats task */
p = P; m = M;
x_storage = VSL_SS_MATRIX_STORAGE_COLS;
vsldSSNewTask( &task, &p, &m, &x_storage, x, 0, 0 );

/* Set-up parameters of the task */
/* Specify memory for correlation estimate in task */
cor_storage = VSL_SS_MATRIX_STORAGE_FULL;
vsldSSEditCovCor( task, mean, 0, 0, cor, &cor_storage );

/* Specify the parameter for progressive estimation of
   correlation */
W[0] = W[1] = 0.0;
vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
…

/* Set thresholds that define the noise component;
   the cast avoids integer division of p by m */
l1 = ( 1.0 - sqrt ( (double)p / m ) );
l2 = ( 1.0 + sqrt ( (double)p / m ) );
/* loop over data blocks */
for ( nblock = 0; ; nblock ++ )
{
    /* Get the next chunk of size p x m into x */
    GetNextChunk( p, m, x );

    /* Update correlation estimate in cor  */
    vsldSSCompute( task, VSL_SS_COR,
                          VSL_SS_METHOD_FAST );
    /* Apply PCA and compute eigen-values that
       belong to (l1, l2) and define noise */
    dsyevr(…,l1, l2, …);

    /* Assemble correlation matrix of noise */
    ...
    dsyrk( evect_n, ..., cor_n,... );
    /* Compute correlation matrix of signal
       by subtracting cor_n from cor */
    …
}

/* Release the task once all blocks are processed */
vslSSDeleteTask( &task );
MKL_Free_Buffers();

The sample code above is for demonstration. Notice that you first set up the environment from which to acquire and store your data. Next, you set up the task that you want to do. In this situation, we initialize the statistics task and specify the memory for the correlation estimate. Since this is streamed data, we want to grab chunks as we get them and build up an evolving estimate. You can set your thresholds as you see fit through l1 and l2 and begin looping over the blocks. Notice that you can make these blocks as large or as small as you would like. So the workflow is as follows: grab a chunk, update the estimate, store it, and from there break up your signal and noise as you see fit.
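
The "grab a chunk, update the estimate" pattern can be sketched in plain C with running sums that persist across chunks. This is only an illustration of the idea, not the library's progressive algorithm: the names `StreamStats`, `stream_init`, `stream_update`, `stream_cov` and the constant `SDIM` are all hypothetical, the layout `x[i*SDIM + j]` is an assumption, and the sketch stops at the covariance (a correlation estimate would additionally normalize by the standard deviations).

```c
#include <stddef.h>

#define SDIM 2  /* Hypothetical number of variables per observation */

/* Running sums that survive from one chunk to the next */
typedef struct {
    size_t count;              /* Observations seen so far   */
    double sum[SDIM];          /* Sum of each variable       */
    double cross[SDIM][SDIM];  /* Sum of products x_j * x_k  */
} StreamStats;

/* Reset all accumulators before the first chunk */
void stream_init(StreamStats *s)
{
    s->count = 0;
    for (size_t j = 0; j < SDIM; j++) {
        s->sum[j] = 0.0;
        for (size_t k = 0; k < SDIM; k++) {
            s->cross[j][k] = 0.0;
        }
    }
}

/* Fold one chunk of n observations (x[i*SDIM + j]) into the sums */
void stream_update(StreamStats *s, size_t n, const double x[])
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < SDIM; j++) {
            s->sum[j] += x[i * SDIM + j];
            for (size_t k = 0; k < SDIM; k++) {
                s->cross[j][k] += x[i * SDIM + j] * x[i * SDIM + k];
            }
        }
    }
    s->count += n;
}

/* Covariance of everything seen so far (requires count >= 2) */
void stream_cov(const StreamStats *s, double cov[SDIM][SDIM])
{
    double n = (double)s->count;
    for (size_t j = 0; j < SDIM; j++) {
        for (size_t k = 0; k < SDIM; k++) {
            cov[j][k] = (s->cross[j][k] - s->sum[j] * s->sum[k] / n) / (n - 1.0);
        }
    }
}
```

Because only the sums are kept, the chunk size can vary freely between updates. The sum-of-products formula is simple but can lose precision when the values are large relative to their spread, which is one more reason to prefer a tuned library routine in production code.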

So in summary, if you are working in an Atom-based environment, you can use the Intel Math Kernel Library for both Vector Math and Vector Statistics, particularly when you are working with a constant stream of data that needs constant updating. You can focus more on your workflow and less on how to write SSSE3 and threaded code for the Atom processor. Our routines will do all that for you.

For more complete information about compiler optimizations, see our Optimization Notice.