How do I use this library with R?

Hi DAAL team,

Could you show me how to use this library with R?

--Gennady

 


Here are instructions for extending R: write a wrapper for an Intel® Data Analytics Acceleration Library (Intel® DAAL) function and call it from your R script.

The wrapper

For this exercise we'll choose the covariance matrix computation algorithm that takes matrix X and computes the covariance matrix Cov(X).

Here is the code of the wrapper:

/* file: daal_cov.cpp */
#include "R.h"
#include "Rdefines.h"
#include "Rinternals.h"

#include "daal.h"

using namespace daal;
using namespace daal::data_management;
using namespace daal::algorithms;
using namespace daal::services;

extern "C"
{
/*
//  Compute covariance matrix using Intel DAAL
*/
SEXP daal_cov(SEXP X)
{
    SEXP COV;
    SEXP DIM;

    int *dim;
    int nFeatures, nObservations;
    double *x, *mean, *cov;

    /* Check input argument */
    if (!isMatrix(X))
    {
        error("x is not a matrix");
        return (R_NilValue);
    }

    /* Convert input arguments to C types */
    PROTECT(X = AS_NUMERIC(X));
    x = NUMERIC_POINTER(X);
    
    /* Get input matrix size */
    DIM = getAttrib(X, R_DimSymbol);
    PROTECT(DIM = AS_INTEGER(DIM));
    dim = INTEGER_POINTER(DIM);

    nObservations = dim[0];
    nFeatures     = dim[1];

    /* Create structure-of-arrays (SOA) numeric table to store input data */
    SharedPtr<SOANumericTable> dataTable(new SOANumericTable(nFeatures, nObservations));
    for (int i = 0; i < nFeatures; i++)
    {
        dataTable->setArray(x + i*nObservations, i);
    }

    /* Allocate memory to store results */
    PROTECT(COV = allocMatrix(REALSXP, nFeatures, nFeatures));
    cov = NUMERIC_POINTER(COV);

    mean = new double[nFeatures];

    /* Create homogeneous numeric tables to store results */
    SharedPtr<HomogenNumericTable<> > covarianceTable (new HomogenNumericTable<>(cov, nFeatures, nFeatures));
    SharedPtr<HomogenNumericTable<> > meanTable       (new HomogenNumericTable<>(mean, nFeatures, 1));

    /* Create algorithm to compute covariance matrix using default method */
    covariance::Batch<> algorithm;
    algorithm.input.set(covariance::data, dataTable);

    /* Create object to store the results of DAAL computations */
    SharedPtr<covariance::Result> result(new covariance::Result());

    /* Provide memory for storing the results of DAAL computations */
    result->set(covariance::covariance, covarianceTable);
    result->set(covariance::mean,       meanTable);

    /* Register the object for storing results in DAAL algorithm */
    algorithm.setResult(result);

    /* Compute covariance matrix */
    algorithm.compute();

    delete [] mean;
    UNPROTECT(3); 

    /* Return covariance matrix */
    return COV;
}

} // extern "C"
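
The loop that calls setArray(x + i*nObservations, i) works because R stores a matrix column by column, so each feature occupies one contiguous run of nObservations doubles. Below is a minimal sketch of that indexing in plain C++, with no R or DAAL involved. The helper names columnPointer and elementAt are made up for illustration and are not part of either library.

```cpp
#include <cstddef>

// Hypothetical helper: pointer to the start of column j (zero-based) in a
// column-major matrix with nRows rows. This mirrors `x + i*nObservations`
// in the wrapper above.
inline const double *columnPointer(const double *x, std::size_t nRows, std::size_t j)
{
    return x + j * nRows;
}

// Hypothetical helper: element (i, j), zero-based, of a column-major matrix.
inline double elementAt(const double *x, std::size_t nRows, std::size_t i, std::size_t j)
{
    return columnPointer(x, nRows, j)[i];
}
```

For a 3-by-2 matrix stored as {1, 2, 3, 4, 5, 6}, columnPointer(x, 3, 1) points at the value 4, the first element of the second column. This is why the SOA table can reference R's memory directly, without a copy or a transpose.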

Building the wrapper

On Linux, first install R from your distribution's packages; the build needs the R headers they provide.
Command for Red Hat / Fedora:

sudo yum install R

Commands for Ubuntu:

sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev

Then build a shared library containing the wrapper function with the following commands (adjust R_HOME and DAAL_ROOT to match your installation):

export R_HOME=/usr/local/lib64/R
export DAAL_ROOT=/opt/intel/compilers_and_libraries_2016.x.xxx/linux

icc -c -fPIC -I${R_HOME}/include -I${DAAL_ROOT}/daal/include daal_cov.cpp -odaal_cov.o
icc -shared -Wl,-soname,daal_cov.so -odaal_cov.so daal_cov.o  \
    -L${DAAL_ROOT}/tbb/lib/intel64_lin/gcc4.4                 \
    ${DAAL_ROOT}/daal/lib/intel64_lin/libdaal_thread.so       \
    ${DAAL_ROOT}/daal/lib/intel64_lin/libdaal_core.so         \
    -ltbb -liomp5

 

Calling the function from R

Here is a simple R script that calls the function we've just built:

# Load shared library with the wrapper of the DAAL covariance function
dyn.load("~/daal_cov.so")
# Create R function that calls the wrapper function from the shared library
daal_cov <- function(x) .Call("daal_cov", x)
p <- 3
n <- 5
x <- matrix(runif(p*n, min=0, max=10), nrow=n, ncol=p)
# Compute covariance matrix using DAAL
covariance_daal <- daal_cov(x)
# Compute covariance using built-in R function
covariance_r    <- cov(x)

print(x)
print(covariance_daal)
print(covariance_r)

Here is the output of the script:

> print(x)
         [,1]     [,2]     [,3]
[1,] 8.301148 5.468456 1.919926
[2,] 7.521658 6.787216 9.090895
[3,] 1.863535 6.558717 8.866953
[4,] 6.110760 1.765668 1.395358
[5,] 2.891390 6.273947 9.276443
> print(covariance_daal)
          [,1]      [,2]      [,3]
[1,]  8.050892 -1.435147 -6.718956
[2,] -1.435147  4.309892  6.736175
[3,] -6.718956  6.736175 16.574363
> print(covariance_r)
          [,1]      [,2]      [,3]
[1,]  8.050892 -1.435147 -6.718956
[2,] -1.435147  4.309892  6.736175
[3,] -6.718956  6.736175 16.574363

 

Here are instructions for extending R with a wrapper for the K-means algorithm from Intel® Data Analytics Acceleration Library (Intel® DAAL).
 

The wrapper:
 

#include "R.h"
#include "Rdefines.h"
#include "Rinternals.h"
#include "daal.h"

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace daal::services;

extern "C"
{
/*
//  Compute K-Means clustering using Intel DAAL
//
//  Input:
//  X     - input matrix
//  K     - number of clusters, the k parameter of the K-means algorithm
//  NITER - number of iterations of Lloyd's algorithm
*/
SEXP daal_kmeans(SEXP X, SEXP K, SEXP NITER)
{
    /* Check input argument */
    if (!isMatrix(X))
    {
        error("x is not a matrix");
        return (R_NilValue);
    }

    /* Convert input arguments to C types */
    PROTECT(X = AS_NUMERIC(X));
    PROTECT(K = AS_INTEGER(K));
    PROTECT(NITER = AS_INTEGER(NITER));
    double *x = NUMERIC_POINTER(X);
    int nCentroids = *INTEGER_POINTER(K);
    int nIterations = *INTEGER_POINTER(NITER);

    /* Get input matrix dimensions */
    SEXP DIM = getAttrib(X, R_DimSymbol);
    PROTECT(DIM = AS_INTEGER(DIM));
    int *dim = INTEGER_POINTER(DIM);

    int nObservations = dim[0];
    int nFeatures     = dim[1];

    /* Create structure-of-arrays (SOA) numeric table to store input data */
    SharedPtr<SOANumericTable> dataTable(new SOANumericTable(nFeatures, nObservations));
    for (int i = 0; i < nFeatures; i++)
    {
        dataTable->setArray(x + i*nObservations, i);
    }

    /* Get initial centroids for the K-Means algorithm */
    kmeans::init::Batch<double, kmeans::init::randomDense> init(nCentroids);

    init.input.set(kmeans::init::data, dataTable);
    init.compute();

    NumericTablePtr inputCentroidsTable = init.getResult()->get(kmeans::init::centroids);

    /* Allocate memory to store the results */
    SEXP CENTROIDS;
    PROTECT(CENTROIDS = allocMatrix(REALSXP, nCentroids, nFeatures));
    double *centroids = NUMERIC_POINTER(CENTROIDS);
    SEXP ASSIGNMENTS;
    PROTECT(ASSIGNMENTS = allocVector(INTSXP, nObservations));
    int *assignments = INTEGER_POINTER(ASSIGNMENTS);
    SEXP GOAL;
    PROTECT(GOAL = NEW_NUMERIC(1));
    double *goal = NUMERIC_POINTER(GOAL);
    SEXP RESULTING_N_ITERATIONS;
    PROTECT(RESULTING_N_ITERATIONS = NEW_INTEGER(1));
    int *resNIterations = INTEGER_POINTER(RESULTING_N_ITERATIONS);

    SEXP RES;
    PROTECT(RES = allocVector(VECSXP, 4));
    SET_VECTOR_ELT(RES, 0, CENTROIDS);
    SET_VECTOR_ELT(RES, 1, ASSIGNMENTS);
    SET_VECTOR_ELT(RES, 2, GOAL);
    SET_VECTOR_ELT(RES, 3, RESULTING_N_ITERATIONS);

    /* Create SOA numeric table to store resulting centroids */
    SharedPtr<SOANumericTable> centroidsTable(new SOANumericTable(nFeatures, nCentroids));
    for (int i = 0; i < nFeatures; i++)
    {
        centroidsTable->setArray(centroids + i*nCentroids, i);
    }

    /* Create homogeneous numeric tables to store resulting assignments, goal function value
       and the number of iterations performed by the algorithm */
    NumericTablePtr assignmentsTable(new HomogenNumericTable<int>(assignments,    1, nObservations));
    NumericTablePtr goalTable       (new HomogenNumericTable<>   (goal,           1, 1));
    NumericTablePtr nIterationsTable(new HomogenNumericTable<int>(resNIterations, 1, 1));

    /* Create algorithm to compute K-means clustering results using default method */
    kmeans::Batch<> algorithm(nCentroids, nIterations);
    algorithm.input.set(kmeans::data,           dataTable);
    algorithm.input.set(kmeans::inputCentroids, inputCentroidsTable);

    /* Create object to store results of DAAL computations */
    SharedPtr<kmeans::Result> result(new kmeans::Result());

    /* Provide memory for storing results of DAAL computations */
    result->set(kmeans::centroids,    centroidsTable);
    result->set(kmeans::assignments,  assignmentsTable);
    result->set(kmeans::goalFunction, goalTable);
    result->set(kmeans::nIterations,  nIterationsTable);

    /* Register the object for storing results in DAAL algorithm */
    algorithm.setResult(result);

    /* Compute K-Means clustering */
    algorithm.compute();

    UNPROTECT(9);
    return RES;
}

} // extern "C"
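
For readers unfamiliar with what each of the NITER iterations does, here is a plain C++ sketch of a single textbook Lloyd iteration: assign every observation to its nearest centroid, then move each centroid to the mean of its observations. It illustrates the general algorithm only and makes no claim about how DAAL implements it internally. The function name lloydStep and the choice to leave empty clusters unchanged are assumptions made for this example.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One iteration of Lloyd's algorithm on column-major data.
// data:      n-by-p matrix, column-major
// centroids: k-by-p matrix, column-major, updated in place
// labels:    resized to n and filled with zero-based cluster indices
void lloydStep(const std::vector<double> &data, std::size_t n, std::size_t p,
               std::vector<double> &centroids, std::size_t k,
               std::vector<int> &labels)
{
    labels.assign(n, 0);

    // Assignment step: nearest centroid by squared Euclidean distance
    for (std::size_t i = 0; i < n; ++i)
    {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < k; ++c)
        {
            double d = 0.0;
            for (std::size_t j = 0; j < p; ++j)
            {
                const double diff = data[j * n + i] - centroids[j * k + c];
                d += diff * diff;
            }
            if (d < best) { best = d; labels[i] = static_cast<int>(c); }
        }
    }

    // Update step: each centroid becomes the mean of its assigned points
    std::vector<double> sums(k * p, 0.0);
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t c = static_cast<std::size_t>(labels[i]);
        ++counts[c];
        for (std::size_t j = 0; j < p; ++j) { sums[j * k + c] += data[j * n + i]; }
    }
    for (std::size_t c = 0; c < k; ++c)
    {
        if (counts[c] == 0) { continue; } // assumption: empty cluster stays put
        for (std::size_t j = 0; j < p; ++j)
        {
            centroids[j * k + c] = sums[j * k + c] / static_cast<double>(counts[c]);
        }
    }
}
```

Calling lloydStep repeatedly with the same centroids vector corresponds to running several iterations; the layout matches the column-major matrices the wrapper exchanges with R.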

Build the shared library from the wrapper code by executing the same commands as in the covariance example above, replacing daal_cov with daal_kmeans in the file names so that the result is daal_kmeans.so.

Calling the function from R

Here is a simple R script that calls the function we've just built:

# Load shared library with the wrapper of the DAAL K-Means function
dyn.load("~/daal_kmeans.so")
# Create R function that calls the wrapper function
daal_kmeans <- function(x, k, nIter) .Call("daal_kmeans", x, k, nIter)

data(iris)
nClusters <- 3
nIterations <- 10

cl <- daal_kmeans(as.matrix(iris[,1:4]), nClusters, nIterations)
print(cl)

Here is the output of the script:

> print(cl)
[[1]]
         [,1]     [,2]     [,3]     [,4]
[1,] 5.883607 2.740984 4.388525 1.434426
[2,] 5.006000 3.428000 1.462000 0.246000
[3,] 6.853846 3.076923 5.715385 2.053846

[[2]]
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [75] 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 2 2 2 2 0 2 2 2 2
[112] 2 2 0 0 2 2 2 2 0 2 0 2 0 2 2 0 0 2 2 2 2 2 0 2 2 2 2 0 2 2 2 0 2 2 2 0 2
[149] 2 0

[[3]]
[1] 78.85567

[[4]]
[1] 10

The first element of the output list is the matrix of cluster centers found by the algorithm: the first row is the center of the first cluster, the second row is the center of the second cluster, and so on.

The second element is the vector of cluster assignments, one per observation in the input data set. The labels are zero-based, as the values 0, 1 and 2 in the output show, so label i refers to row i + 1 of the centers matrix.

The third element is the achieved value of the goal function, the within-cluster sum of squares.

The fourth element is the number of iterations performed by the algorithm.
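
To make the goal function concrete, here is a plain C++ sketch that recomputes a within-cluster sum of squares from the other outputs. It assumes the goal function is the sum of squared Euclidean distances from each observation to its assigned center, and that the labels are zero-based as in the printed assignments. The function name withinClusterSumOfSquares is illustrative, not a DAAL API.

```cpp
#include <cstddef>
#include <vector>

// Within-cluster sum of squares for column-major inputs.
// data:      n-by-p matrix, column-major
// centroids: k-by-p matrix, column-major
// labels:    n zero-based cluster indices, each in [0, k)
double withinClusterSumOfSquares(const std::vector<double> &data,
                                 std::size_t n, std::size_t p,
                                 const std::vector<double> &centroids,
                                 std::size_t k,
                                 const std::vector<int> &labels)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        // Zero-based label selects the centroid row directly
        const std::size_t c = static_cast<std::size_t>(labels[i]);
        for (std::size_t j = 0; j < p; ++j)
        {
            const double diff = data[j * n + i] - centroids[j * k + c];
            total += diff * diff;
        }
    }
    return total;
}
```

For the one-dimensional points 0, 1, 10, 11 with centers 0.5 and 10.5, each point lies 0.5 from its center, so the function returns 4 × 0.25 = 1.0.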

Is this example based on the 2016 version of DAAL in the Parallel Studio package? I am having trouble locating NumericTablePtr. The K-means examples in the DAAL directory do not seem to use this storage object. Thanks.

Hello Steena,

This example is based on Intel® DAAL 2017 Update 1.

If you use Intel® DAAL 2016 please add the following line of code into the example:

typedef SharedPtr<NumericTable> NumericTablePtr;

Best regards,

Victoriya
