Thread Safety on Intel DAAL functions


I would like to use Intel DAAL in a shared-memory application. I don't want to use Intel DAAL's internal parallelism; instead, I want to run the Intel DAAL algorithms in parallel in my own pthreads. For example, I want to compute a Cholesky kernel. To do so, I would "manually" create n threads that all share the same FileDataSource object, from which each thread obtains a different block of data from the same file. Each thread would then compute a partial Cholesky result on its block of data (algorithm.compute()), and once all threads are done, the main thread would finish with algorithm.finalizeCompute().

I have not been able to find any specific notes on thread safety for Intel DAAL functions. Is this possible? Could you please point me to some best practices?

Thank you.


The current version of the Intel DAAL documentation does not provide many details on the thread safety of the library. We are considering adding them in future releases. Thank you.

Answering your question:

- The library provides distributed versions of algorithms such as PCA, SVD, k-means, Linear Regression, … that can be used on a shared-memory computer in the way you describe.

- You can use the File Data Source to access different blocks of data in parallel and feed them into the distributed algorithms.

Here is a code sample, based on PCA and a CSV data source, that demonstrates these ideas:


// Preparation for master thread

pca::Distributed<step1Local, algorithmFPType, pca::svdDense> localAlgorithms[nThread];
pca::PartialResultPtr partialResults[nThread];

// Local code for each thread

// Reading the local data
FileDataSource<CSVFeatureManager> dataSource(datasetName);
dataSource.loadDataBlock(nRowsInBlock, iThread*nRowsInBlock, nRowsInBlock);
localAlgorithms[iThread].input.set(pca::data, dataSource.getNumericTable());

// Compute PCA decomposition
localAlgorithms[iThread].compute();
partialResults[iThread] = localAlgorithms[iThread].getPartialResult();

// Master thread code

// Set local partial results as input for the master-node algorithm from each thread
pca::Distributed<step2Master, algorithmFPType, pca::svdDense> masterAlgorithm;
for(size_t i=0; i<nThread; ++i)
    masterAlgorithm.input.add(pca::partialResults, partialResults[i]);

// Merge and finalize PCA decomposition on the master thread
masterAlgorithm.compute();
masterAlgorithm.finalizeCompute();





You can find additional examples at

Also, the current version of the Intel DAAL Cholesky algorithm does not yet support distributed computations. Please clarify whether distributed Cholesky is important for your applications.

Let us know whether this answers your questions or whether you need additional details on the library and its components; we will gladly help.

Hello Egor,

Thank you very much for your example. No, Cholesky is not particularly relevant for me; PCA is perfectly fine.

I have successfully built and run a PCA application using your guidelines. However, I have encountered the following issues with your code snippet:

  1. "pca::PartialResultPtr" does not exist. I replaced it with "services::SharedPtr<pca::PartialResult<pca::svdDense> > partialResults[nThread];"
  2. "dataSource.loadDataBlock(nRowsInBlock, iThread*nRowsInBlock, nRowsInBlock);" complains for iThread > 0. I changed it to "dataSource.loadDataBlock(nRowsInBlock, iThread*nRowsInBlock, nThread*nRowsInBlock);"

Are the proposed changes correct?

Thank you!
