Intel® oneAPI Collective Communications Library (Beta)

Implement Multi-Node Communication Patterns

The Intel® oneAPI Collective Communications Library helps developers and researchers train newer and deeper models more quickly. It does this by using optimized communication patterns to distribute model training across multiple nodes.

The library is designed for easy integration into deep learning (DL) frameworks, whether you are implementing them from scratch or customizing existing ones.

  • Built on top of lower-level communication middleware: MPI and libfabric, which transparently support many interconnects, such as Intel® Omni-Path Architecture, InfiniBand*, and Ethernet.
  • Optimized for high performance on Intel® CPUs and GPUs.
  • Lets you trade compute for communication performance to improve the scalability of communication patterns.
  • Enables efficient implementations of collectives that are heavily used for neural network training, including all-gather, all-reduce, and reduce-scatter.
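To make the three collectives named above concrete, here is a minimal single-process sketch of what each one computes. It simulates the ranks as plain Python lists and does not use the library's API; the function names and the `rank_data` layout (one vector per rank) are assumptions made for illustration only.

```python
from typing import List

Vector = List[float]


def all_reduce(rank_data: List[Vector]) -> List[Vector]:
    """Every rank receives the element-wise sum of all ranks' vectors."""
    total = [sum(column) for column in zip(*rank_data)]
    return [list(total) for _ in rank_data]


def all_gather(rank_data: List[Vector]) -> List[Vector]:
    """Every rank receives all ranks' vectors concatenated in rank order."""
    gathered = [value for vector in rank_data for value in vector]
    return [list(gathered) for _ in rank_data]


def reduce_scatter(rank_data: List[Vector]) -> List[Vector]:
    """Element-wise sum, after which rank i keeps only the i-th chunk."""
    num_ranks = len(rank_data)
    total = [sum(column) for column in zip(*rank_data)]
    if len(total) % num_ranks != 0:
        raise ValueError("vector length must be divisible by the rank count")
    chunk = len(total) // num_ranks
    return [total[i * chunk:(i + 1) * chunk] for i in range(num_ranks)]
```

In this sketch, a reduce-scatter followed by an all-gather gives the same result as an all-reduce, which is one common way to decompose gradient synchronization.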

Develop, Test, and Run Your oneAPI Code in the Cloud

Get what you need to build and optimize your oneAPI projects for free. With an Intel® DevCloud account, you get 120 days of access to the latest Intel® hardware (CPUs, GPUs, and FPGAs) and Intel® oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.

Get Access

Download Intel® oneAPI Collective Communications Library as Part of the Intel® oneAPI Base Toolkit

Get It Now


Common APIs to Support DL Frameworks

This library exposes a collective API that supports:

  • Commonly used collective operations found in deep learning and machine learning workloads
  • Interoperability with OpenCL™ APIs and SYCL* from The Khronos Group*

Unique DL Optimizations

The runtime implementation enables several optimizations that are unavailable in MPI and other communication libraries, including:

  • Asynchronous progress for compute-communication overlap
  • Dedication of one or more cores to ensure optimal network use
  • Message prioritization, persistence, and out-of-order execution
  • Collectives in low-precision data types
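The ideas of asynchronous progress and message prioritization can be sketched with a background thread and a priority queue. This is an illustration only, not the library's runtime or API: `ProgressEngine`, `submit`, and `overlapped_step` are hypothetical names, and the "communication" is a plain Python callable.

```python
import queue
import threading
from typing import Callable, List, Tuple


class ProgressEngine:
    """Hypothetical stand-in for a runtime that owns a progress thread."""

    def __init__(self) -> None:
        self._pending: "queue.PriorityQueue" = queue.PriorityQueue()
        self._seq = 0  # tie-breaker so equal priorities stay first-in, first-out
        self.completed: List[str] = []
        self._worker = threading.Thread(target=self._progress, daemon=True)

    def submit(self, priority: int, name: str, op: Callable[[], None]) -> None:
        """Queue an operation; a lower number means more urgent."""
        self._pending.put((priority, self._seq, name, op))
        self._seq += 1

    def start(self) -> None:
        """Start the dedicated progress thread."""
        self._worker.start()

    def wait(self) -> None:
        """Block until every queued operation has completed."""
        self._pending.join()

    def _progress(self) -> None:
        # Runs on the progress thread: always take the most urgent request.
        while True:
            _, _, name, op = self._pending.get()
            op()
            self.completed.append(name)
            self._pending.task_done()


def overlapped_step() -> Tuple[List[str], int]:
    engine = ProgressEngine()
    reduced: List[int] = []
    # Backpropagation yields gradients from the last layer to the first, but
    # the next forward pass needs the first layer soonest, so it gets the
    # most urgent priority and completes out of submission order.
    for layer in (3, 2, 1):
        engine.submit(priority=layer, name=f"layer{layer}",
                      op=lambda l=layer: reduced.append(l))
    # All requests are queued before the thread starts, which keeps the
    # completion order deterministic in this sketch.
    engine.start()
    local_work = sum(i * i for i in range(10_000))  # compute overlaps with progress
    engine.wait()
    return engine.completed, local_work
```

Calling `overlapped_step()` returns the completion order alongside the result of the local compute, showing that requests submitted last can finish first when they carry a more urgent priority.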

Key Specifications


Processors:

  • Intel® Xeon® processors

GPUs:

  • Intel® Processor Graphics Gen9

Memory:

  • Dynamic RAM
  • Intel® Optane™ DC persistent memory

Host operating systems:

  • Linux*

Target operating systems:

  • Linux


Languages:

  • Data Parallel C++ (DPC++)
  • C and C++

Compilers:

  • GNU Compiler Collection (GCC)*

Distributed environments:

  • MPI (MPICH-based, Open MPI)
  • libfabric

For more information, see the system requirements.

Ready to Get Started?

Get the Intel® oneAPI Base Toolkit  |  Try Your Code in the Intel® DevCloud