Real-life datasets often have missing values. Sociological surveys and measurements of complex biological systems are two examples in which the researcher reaches a point where something has to be done about missing observations. Outliers in a dataset can also be treated as lost samples. Intel® Summary Statistics Library already contains functionality to detect outliers or to get robust estimates in the presence of “suspicious” observations.
Algorithm for parameterization of a correlation matrix. The algorithm transforms an input matrix that lacks positive semidefiniteness into an output matrix that meets the properties of a correlation matrix. It is based on the spectral decomposition method and can be used in financial computations.
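To make the idea concrete, here is a minimal sketch of a spectral decomposition repair in Python with NumPy. It is an illustration under assumptions, not the library's implementation: the function name `spectral_correlation_repair` is hypothetical, and the steps (clip negative eigenvalues to zero, then rescale rows so the diagonal is 1 again) follow the commonly described spectral decomposition approach, which may differ in detail from what the library does.

```python
import numpy as np


def spectral_correlation_repair(c):
    """Return a valid correlation matrix built from a symmetric, unit-diagonal input.

    Hypothetical helper for illustration; not part of any Intel library API.
    """
    c = np.asarray(c, dtype=float)

    # Spectral decomposition of the symmetric input: c = S * diag(lambda) * S^T.
    eigvals, eigvecs = np.linalg.eigh(c)

    # Negative eigenvalues are what break positive semidefiniteness; clip them to zero.
    clipped = np.clip(eigvals, 0.0, None)

    # B' = S * sqrt(diag(lambda')): scale each eigenvector column by sqrt(lambda').
    b = eigvecs * np.sqrt(clipped)

    # Rescale every row of B' to unit length so that the diagonal of B * B^T is exactly 1.
    row_norms = np.sqrt((b ** 2).sum(axis=1))
    b = b / row_norms[:, None]

    # B * B^T is positive semidefinite by construction and has a unit diagonal.
    return b @ b.T


if __name__ == "__main__":
    # A symmetric matrix with unit diagonal that is NOT positive semidefinite.
    broken = [[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]]
    print(spectral_correlation_repair(broken))
```

The row rescaling is safe here because clipping negative eigenvalues can only increase each row's squared norm, which already equals 1 for a unit-diagonal input.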
In my previous posts I described some features and the usage model of Intel® Summary Statistics Library. However, there are many statistical packages available that provide similar functionality. Does Intel® Summary Statistics Library offer anything different, something new and specific? The answer is yes.
Today I needed to compute statistical estimates for a dataset. The observations were weighted, and only a few components of the random vector had to be analyzed. How often do we face such tasks, and how do we solve them in everyday life? If such problems come up rarely or their size is small, then using a popular statistical package or writing a data processing program is a reasonable way to address them. But what if I need to process huge data arrays regularly, for example when analyzing gene expression levels?
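For a small dataset, the task described above can be sketched in a few lines of plain Python. This is only an illustration of the problem, not the library's API: the function name `weighted_mean_and_variance` and its parameters are hypothetical, and the variance uses the simple normalization by the sum of weights, which may differ from the estimator a given package applies.

```python
def weighted_mean_and_variance(observations, weights, components):
    """Weighted mean and variance for selected components of a random vector.

    observations: list of rows, one row per observation
    weights:      one non-negative weight per observation
    components:   indices of the vector components to analyze
    Hypothetical helper for illustration only.
    """
    if len(observations) != len(weights):
        raise ValueError("need exactly one weight per observation")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")

    means, variances = [], []
    for j in components:
        # Weighted mean of component j.
        mean = sum(w * row[j] for row, w in zip(observations, weights)) / total_weight
        # Weighted variance of component j, normalized by the sum of weights.
        var = sum(w * (row[j] - mean) ** 2
                  for row, w in zip(observations, weights)) / total_weight
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    data = [[1.0, 10.0, 100.0],
            [2.0, 20.0, 200.0],
            [3.0, 30.0, 300.0]]
    # Analyze only the first and third components; the last observation counts double.
    print(weighted_mean_and_variance(data, [1.0, 1.0, 2.0], [0, 2]))
```

A loop like this is fine for small inputs; the question raised above is what to do when the same computation has to run regularly on very large arrays.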
Welcome to Intel® Summary Statistics Library, a solution for parallel statistical processing of multi-dimensional datasets. It contains functions for initial statistical analysis of raw data, which let you investigate the structure of datasets and obtain their basic characteristics, estimates, and internal dependencies.
The library provides a rich set of tools for computing various statistical estimates for datasets: