Get Started with the Intel® oneAPI Data Analytics Library

ID 772405
Date 4/11/2022
Public



Intel® oneAPI Data Analytics Library (oneDAL) is a library that helps speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation.

For general information about oneDAL, visit the official oneDAL page.

Before You Begin

oneDAL is located in the <install_dir>/dal directory, where <install_dir> is the directory in which the Intel® oneAPI Base Toolkit was installed.

The current version of oneDAL with SYCL support is available for Linux* and Windows* 64-bit operating systems. The prebuilt oneDAL libraries can be found in the <install_dir>/dal/<version>/redist directory.

To learn about the system requirements and the dependencies needed to build examples, refer to the System Requirements page.

End-to-end Example

Below is a typical workflow for running a oneDAL algorithm on a GPU. The example uses the Principal Component Analysis (PCA) algorithm.

The following steps depict how to:

  • Read the data from a CSV file

  • Run the training and inference operations for PCA

  • Access intermediate results obtained at the training stage

  1. Include the following header, which makes all oneDAL declarations available:

    #include "oneapi/dal.hpp"
    
    /* Standard library headers required by this example */
    #include <cassert>
    #include <cstdint>
    #include <iostream>
  2. Create a SYCL* queue with the desired device selector. In this case, the GPU selector is used:

    const auto queue = sycl::queue{ sycl::gpu_selector{} };
  3. All oneDAL declarations are in the oneapi::dal namespace. For brevity, import the oneapi namespace so that you can write dal instead of oneapi::dal:

    using namespace oneapi;
  4. Use the CSV data source to read the data from the CSV file into a table:

    const auto data = dal::read<dal::table>(queue, dal::csv::data_source{"data.csv"});
  5. Create a PCA descriptor, configure its parameters, and run the training algorithm on the data loaded from the CSV file:

    const auto pca_desc = dal::pca::descriptor<float>{}
       .set_component_count(3)
       .set_deterministic(true);
    
    const dal::pca::train_result train_res = dal::train(queue, pca_desc, data);
  6. Print the learned eigenvectors:

    const dal::table eigenvectors = train_res.get_eigenvectors();
    
    const auto acc = dal::row_accessor<const float>{eigenvectors};
    for (std::int64_t i = 0; i < eigenvectors.get_row_count(); i++) {
    
       /* Get the i-th row from the table; the resulting array points to USM */
       const dal::array<float> eigenvector = acc.pull(queue, {i, i + 1});
       assert(eigenvector.get_count() == eigenvectors.get_column_count());
    
       std::cout << i << "-th eigenvector: ";
       for (std::int64_t j = 0; j < eigenvector.get_count(); j++) {
          std::cout << eigenvector[j] << " ";
       }
       std::cout << std::endl;
    }
  7. Use the trained model for inference to reduce the dimensionality of the data:

    const dal::pca::model model = train_res.get_model();
    
    const dal::table data_transformed =
       dal::infer(queue, pca_desc, model, data).get_transformed_data();
    
    assert(data_transformed.get_column_count() == 3);
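The training and inference steps above can be illustrated without oneDAL. The following minimal sketch, in standard C++ only, implements textbook covariance-based PCA: training centers the columns, builds the sample covariance matrix, diagonalizes it with cyclic Jacobi rotations, and keeps the eigenvectors with the largest eigenvalues; inference projects the centered rows onto them. It is not oneDAL's implementation: the names Matrix, PcaModel, covariance, jacobi_eigen, pca_train, and pca_infer are hypothetical, the sign convention (largest-magnitude entry made positive) is an assumption that only loosely mirrors set_deterministic(true), and oneDAL may normalize the data or use different numerical methods, so its eigenvectors and transformed data can differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical row-major type, one inner vector per row

// Hypothetical PCA model: column means, retained unit eigenvectors (one per row), eigenvalues.
struct PcaModel {
    std::vector<double> means;
    Matrix components;
    std::vector<double> eigenvalues;
};

// Sample covariance matrix of the column-centered data (divides by n - 1).
Matrix covariance(const Matrix& x, std::vector<double>& means) {
    const std::size_t n = x.size(), p = x[0].size();
    means.assign(p, 0.0);
    for (const auto& row : x)
        for (std::size_t j = 0; j < p; ++j) means[j] += row[j] / static_cast<double>(n);
    Matrix c(p, std::vector<double>(p, 0.0));
    for (const auto& row : x)
        for (std::size_t a = 0; a < p; ++a)
            for (std::size_t b = 0; b < p; ++b)
                c[a][b] += (row[a] - means[a]) * (row[b] - means[b]) / static_cast<double>(n - 1);
    return c;
}

// Cyclic Jacobi method: afterwards a[i][i] are the eigenvalues and column i of v their unit eigenvectors.
void jacobi_eigen(Matrix& a, Matrix& v) {
    const std::size_t p = a.size();
    v.assign(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) v[i][i] = 1.0;
    for (int sweep = 0; sweep < 100; ++sweep) {
        double off = 0.0;  // squared norm of the strictly upper triangle
        for (std::size_t i = 0; i < p; ++i)
            for (std::size_t j = i + 1; j < p; ++j) off += a[i][j] * a[i][j];
        if (off < 1e-24) break;
        for (std::size_t k = 0; k < p; ++k)
            for (std::size_t l = k + 1; l < p; ++l) {
                if (a[k][l] == 0.0) continue;
                // Plane rotation J chosen so that (J^T A J)[k][l] becomes zero.
                const double theta = (a[l][l] - a[k][k]) / (2.0 * a[k][l]);
                const double t = (theta >= 0.0 ? 1.0 : -1.0) /
                                 (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
                const double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
                for (std::size_t i = 0; i < p; ++i) {  // A <- A J and V <- V J (columns k, l)
                    const double aik = a[i][k], ail = a[i][l], vik = v[i][k], vil = v[i][l];
                    a[i][k] = c * aik - s * ail;
                    a[i][l] = s * aik + c * ail;
                    v[i][k] = c * vik - s * vil;
                    v[i][l] = s * vik + c * vil;
                }
                for (std::size_t i = 0; i < p; ++i) {  // A <- J^T A (rows k, l)
                    const double aki = a[k][i], ali = a[l][i];
                    a[k][i] = c * aki - s * ali;
                    a[l][i] = s * aki + c * ali;
                }
            }
    }
}

// Training: keep the k eigenvectors of the covariance matrix with the largest eigenvalues.
PcaModel pca_train(const Matrix& x, std::size_t k) {
    assert(x.size() >= 2 && k <= x[0].size());
    PcaModel model;
    Matrix a = covariance(x, model.means);
    Matrix v;
    jacobi_eigen(a, v);
    std::vector<std::size_t> order(a.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&a](std::size_t i, std::size_t j) { return a[i][i] > a[j][j]; });
    for (std::size_t r = 0; r < k; ++r) {
        std::vector<double> e(a.size());
        for (std::size_t i = 0; i < e.size(); ++i) e[i] = v[i][order[r]];
        // Assumed sign convention: make the largest-magnitude entry positive.
        const auto big = std::max_element(e.begin(), e.end(),
            [](double lhs, double rhs) { return std::fabs(lhs) < std::fabs(rhs); });
        if (*big < 0.0) for (double& value : e) value = -value;
        model.components.push_back(e);
        model.eigenvalues.push_back(a[order[r]][order[r]]);
    }
    return model;
}

// Inference: project each centered row onto the retained components.
Matrix pca_infer(const PcaModel& model, const Matrix& x) {
    Matrix out(x.size(), std::vector<double>(model.components.size(), 0.0));
    for (std::size_t r = 0; r < x.size(); ++r)
        for (std::size_t c = 0; c < model.components.size(); ++c)
            for (std::size_t j = 0; j < x[r].size(); ++j)
                out[r][c] += (x[r][j] - model.means[j]) * model.components[c][j];
    return out;
}
```

For example, for the four points (0, 0), (1, 2), (2, 4), and (3, 6), which lie on the line y = 2x, pca_train(x, 1) keeps the single component (1, 2)/sqrt(5) with eigenvalue 25/3, and pca_infer maps (0, 0) to -7.5/sqrt(5). The Jacobi method was chosen here because it is short and robust for small symmetric matrices; it is not meant for large feature counts.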

Build and Run Examples

Perform the following steps to build and run the examples that demonstrate basic usage scenarios of oneDAL with SYCL support. Go to <install_dir>/dal/<version> and then set up the environment as described below:

NOTE:
In the commands below, any text that starts with # is a comment and should not be run.
  1. Set up the required environment for oneDAL (variables such as CPATH, LIBRARY_PATH, and LD_LIBRARY_PATH):

    • On Linux, you can set up the required environment in two ways: with the vars.sh script or with modulefiles.

      • Setting up the oneDAL environment with the vars.sh script

        Run the following command:

        source ./env/vars.sh
      • Setting up the oneDAL environment with modulefiles

        1. Initialize modules:

          source $MODULESHOME/init/bash
          NOTE:
          Refer to Environment Modules documentation for details.
        2. Provide modules with a path to the modulefiles directory:

          module use ./modulefiles
        3. Load the module:

          module load dal
    • On Windows, run the following command:

      .\env\vars.bat
  2. Copy ./examples/oneapi/dpc to a writable directory if necessary, because building the examples creates temporary files:

    cp -r ./examples/oneapi/dpc ${WRITABLE_DIR}
  3. Set up the compiler environment for Intel® oneAPI DPC++/C++ Compiler. See Get Started with Intel® oneAPI DPC++/C++ Compiler for details.

  4. Build and run the examples that show how to use oneDAL with SYCL support:

    NOTE:
    You need write permissions to the examples folder to build the examples and execute permissions to run them. Otherwise, copy the examples/oneapi/dpc and examples/oneapi/data folders to a directory where you have the required permissions. Both folders must stay at the same directory level relative to each other.
    • On Linux:

      # Navigate to the directory containing examples and then build them:
      cd ./examples/oneapi/dpc
      make so example=svm_two_class_thunder_dense_batch # This compiles and runs the two-class SVM example using Intel(R) oneAPI DPC++/C++ Compiler
      make so mode=build                         # This compiles all examples in the current directory
    • On Windows:

      # Navigate to the directory containing examples and then build them:
      cd .\examples\oneapi\dpc
      nmake dll example=svm_two_class_thunder_dense_batch # This compiles and runs the two-class SVM example using Intel(R) oneAPI DPC++/C++ Compiler
      nmake dll mode=build                         # This compiles all examples in the current directory

    To see all available parameters of the build procedure, type make on Linux* or nmake on Windows*.

  5. The resulting example binaries and log files are written into the _results directory.

    NOTE:
    Run the examples from the examples/oneapi/dpc folder, not from the _results folder. Most examples expect their data in the examples/oneapi/data folder and locate it through a relative path that starts from the examples/oneapi/dpc folder.

    You can build the traditional C++ examples located in the examples/oneapi/cpp folder in a similar way.

Compile and build applications with pkg-config

pkg-config is a widely used tool that supplies the compiler and linker flags needed to build software with dependencies. Intel® oneAPI Data Analytics Library provides files with pkg-config metadata for compiling and linking an application to the library.

Set up the environment

To use pkg-config, build the library and then set up the environment with the vars.sh or vars.bat script:

  • On Linux: source ./env/vars.sh

  • On Windows: .\env\vars.bat

Choose a metadata file

The metadata files provided by oneDAL cover only the host device configuration for C++ on 64-bit Linux, macOS, and Windows operating systems.

Choose the metadata file based on the oneDAL threading mode and the linking method you use:

oneDAL pkg-config metadata files

                     Single-threaded (non-threaded)    Multi-threaded (internally threaded)
  Static linking     dal-static-sequential-host        dal-static-threading-host
  Dynamic linking    dal-dynamic-sequential-host       dal-dynamic-threading-host
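The four file names in the table follow one pattern, dal-<static|dynamic>-<sequential|threading>-host. As a small illustration, the hypothetical C++ helper that follows maps the two choices to the file name and to the pkg-config query used in the compile commands. The names Linking, Threading, dal_pkgconfig_name, and pkgconfig_query are assumptions for this sketch and are not part of oneDAL; the helper only encodes the naming pattern of the four listed host configurations and does not check which files are actually installed.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums for the two choices in the table.
enum class Linking { static_linking, dynamic_linking };
enum class Threading { sequential, threaded };

// Returns the metadata file name: dal-<static|dynamic>-<sequential|threading>-host.
std::string dal_pkgconfig_name(Linking linking, Threading threading) {
    std::string name = "dal-";
    name += (linking == Linking::static_linking) ? "static" : "dynamic";
    name += (threading == Threading::sequential) ? "-sequential" : "-threading";
    return name + "-host";
}

// Builds the pkg-config query whose output is passed to the compiler.
std::string pkgconfig_query(Linking linking, Threading threading) {
    return "pkg-config --cflags --libs " + dal_pkgconfig_name(linking, threading);
}
```

For example, pkgconfig_query(Linking::dynamic_linking, Threading::threaded) yields the same query as the compile commands in this section, "pkg-config --cflags --libs dal-dynamic-threading-host".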

Compile a program using pkg-config

To compile a test.cpp program with oneDAL and pkg-config, provide the name of the oneDAL pkg-config metadata file as an input parameter. For example:

  • On Linux or macOS:

    icc test.cpp $(pkg-config --cflags --libs dal-dynamic-threading-host)
  • On Windows:

    for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl test.cpp %i

For example, to build the svm_two_class_thunder_dense_batch example, run the following from the examples/oneapi/cpp directory:

  • On Linux or macOS:

    icc -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp $(pkg-config --cflags --libs dal-dynamic-threading-host)
  • On Windows:

    for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp %i

Find More

  • Developer Guide and Reference: Refer to the oneDAL Developer Guide and Reference for detailed information about the implemented algorithms.

  • System Requirements: Check the system requirements before you install Intel® oneAPI Data Analytics Library.

  • Release Notes: Refer to the release notes for Intel® oneAPI Data Analytics Library to learn about updates in the latest release.

  • Code Samples: Learn how to use oneDAL with daal4py, a Python* API.

  • oneDAL Specification: Learn about the requirements for implementations of the oneAPI Data Analytics Library.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.