Get Started with Intel® oneAPI Collective Communications Library

ID 772605
Date 12/16/2022
Public

Intel® oneAPI Collective Communications Library (oneCCL) is a scalable, high-performance communication library for Deep Learning (DL) and Machine Learning (ML) workloads. It builds on ideas that originated in the Intel® Machine Learning Scaling Library and extends the design and API to cover new features and use cases.

Before You Begin

Before you start using oneCCL, make sure to set up the library environment. There are two ways to set up the environment:

  • Using standalone oneCCL package installed into <ccl_install_dir>:

    source <ccl_install_dir>/env/setvars.sh
  • Using oneCCL from Intel® oneAPI Base Toolkit installed into <toolkit_install_dir> (/opt/intel/oneapi by default):

    source <toolkit_install_dir>/setvars.sh

System Requirements

Refer to the oneCCL System Requirements page.

Sample Application

The sample code below shows how to use the oneCCL API to perform an allreduce operation on buffers allocated as SYCL unified shared memory (USM).

#include <cstdlib>   /* atexit */
#include <iostream>
#include <vector>
#include <mpi.h>
#include <sycl/sycl.hpp>
#include "oneapi/ccl.hpp"

void mpi_finalize() {
    int is_finalized = 0;
    MPI_Finalized(&is_finalized);

    if (!is_finalized) {
        MPI_Finalize();
    }
}

int main(int argc, char* argv[]) {
    constexpr size_t count = 10 * 1024 * 1024;

    int size = 0;
    int rank = 0;

    ccl::init();

    MPI_Init(nullptr, nullptr);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    atexit(mpi_finalize);

    /* default_selector_v replaces the sycl::default_selector class deprecated in SYCL 2020 */
    sycl::queue q(sycl::default_selector_v);
    std::cout << "Running on " << q.get_device().get_info<sycl::info::device::name>() << "\n";

    /* create kvs */
    ccl::shared_ptr_class<ccl::kvs> kvs;
    ccl::kvs::address_type main_addr;
    if (rank == 0) {
        kvs = ccl::create_main_kvs();
        main_addr = kvs->get_address();
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
    }
    else {
        MPI_Bcast((void*)main_addr.data(), main_addr.size(), MPI_BYTE, 0, MPI_COMM_WORLD);
        kvs = ccl::create_kvs(main_addr);
    }

    /* create communicator */
    auto dev = ccl::create_device(q.get_device());
    auto ctx = ccl::create_context(q.get_context());
    auto comm = ccl::create_communicator(size, rank, dev, ctx, kvs);

    /* create stream */
    auto stream = ccl::create_stream(q);

    /* create buffers */
    auto send_buf = sycl::malloc_device<int>(count, q);
    auto recv_buf = sycl::malloc_device<int>(count, q);

    /* open buffers and modify them on the device side */
    auto e = q.submit([&](auto& h) {
        h.parallel_for(count, [=](auto id) {
            send_buf[id] = rank + id + 1;
            recv_buf[id] = -1;
        });
    });

    int check_sum = 0;
    for (int i = 1; i <= size; ++i) {
        check_sum += i;
    }

    /* do not wait for the kernel to complete; pass its event as a dependency of the operation */
    std::vector<ccl::event> deps;
    deps.push_back(ccl::create_event(e));

    /* invoke allreduce */
    auto attr = ccl::create_operation_attr<ccl::allreduce_attr>();
    ccl::allreduce(send_buf, recv_buf, count, ccl::reduction::sum, comm, stream, attr, deps).wait();

    /* open recv_buf and check its correctness on the device side */
    sycl::buffer<int> check_buf(count);
    q.submit([&](auto& h) {
        sycl::accessor check_buf_acc(check_buf, h, sycl::write_only);
        h.parallel_for(count, [=](auto id) {
            /* write every element so the host never reads uninitialized data */
            const bool ok = recv_buf[id] == static_cast<int>(check_sum + size * id);
            check_buf_acc[id] = ok ? 0 : -1;
        });
    });

    q.wait_and_throw();

    /* print out the result of the test on the host side */
    {
        sycl::host_accessor check_buf_acc(check_buf, sycl::read_only);
        size_t i;
        for (i = 0; i < count; i++) {
            if (check_buf_acc[i] == -1) {
                std::cout << "FAILED\n";
                break;
            }
        }
        if (i == count) {
            std::cout << "PASSED\n";
        }
    }

    sycl::free(send_buf, q);
    sycl::free(recv_buf, q);
}

Prerequisites

  • oneCCL with SYCL support is installed and oneCCL environment is set up (see installation instructions)

  • Intel® MPI Library is installed and MPI environment is set up

Run the sample

  1. Use the C++ driver with the -fsycl option to build the sample:

    • Linux* OS

    icpx -fsycl -o sample sample.cpp -lccl -lmpi
    • Windows* OS

    icx-cl -fsycl -o sample sample.cpp -lccl -lmpi
  2. Run the sample:

    mpiexec <parameters> ./sample

where <parameters> represents optional mpiexec parameters such as node count, processes per node, hosts, and so on.

Compile and build applications with pkg-config

The pkg-config tool is widely used to simplify building software with library dependencies. It provides command line options for compiling and linking applications to a library. Intel® oneAPI Collective Communications Library provides pkg-config metadata files for this tool starting with the oneCCL 2021.4 release.

The oneCCL pkg-config metadata files cover both configurations of oneCCL: with and without SYCL support.

Set up the environment

Set up the environment before using the pkg-config tool. To do this, use one of the following options (commands are given for a Linux install to the standard /opt/intel/oneapi location):

  • Intel® oneAPI Base Toolkit setvars.sh script:

    source /opt/intel/oneapi/setvars.sh
  • oneCCL setvars.sh script (the prerequisites for this option are listed below):

    source /opt/intel/oneapi/ccl/latest/env/setvars.sh

Prerequisites for the setup with oneCCL setvars.sh

To set up the environment with the oneCCL setvars.sh script, the following dependencies must also be installed and available in the environment:

  • Intel® MPI Library (for both configurations of oneCCL: with and without SYCL support)

  • Intel® oneAPI DPC++/C++ Compiler for oneCCL with SYCL support

Compile a program using pkg-config

To compile a test sample.cpp program with oneCCL, run:

icpx -o sample sample.cpp $(pkg-config --libs --cflags ccl-cpu_gpu_icpx)

--cflags provides the include path to the API directory:

pkg-config --cflags ccl-cpu_gpu_icpx

The output:

-I/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//include/ -I/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//include/cpu_gpu_icpx

--libs provides the oneCCL library name, its other dependencies (such as SYCL and MPI), and the search paths needed to find them:

pkg-config --libs ccl-cpu_gpu_icpx

The output:

-L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/ -L/opt/intel/oneapi/mpi/latest/lib/pkgconfig/../..//lib/release/ -L/opt/intel/oneapi/ccl/latest/lib/pkgconfig/../..//lib/cpu_gpu_icpx -lccl -lsycl -lmpi -lmpicxx -lmpifort

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.