oneAPI Collective Communications Library Release Notes

By Hung-Ju Tsai, Jennifer L Jiang

Published: 12/05/2020   Last Updated: 03/11/2021

Overview

The Intel® oneAPI Collective Communications Library (oneCCL) helps developers and researchers train newer and deeper models more quickly. It does this by using optimized communication patterns to distribute model training across multiple nodes.

The library is designed for easy integration into deep learning (DL) frameworks, whether you are implementing them from scratch or customizing existing ones.

  • Built on top of lower-level communication middleware, MPI and OFI (libfabric), which transparently supports many interconnects, such as Intel® Omni-Path Architecture, InfiniBand*, and Ethernet.
  • Optimized for high performance on Intel® CPUs and GPUs.
  • Lets you trade compute for communication performance to improve the scalability of communication patterns.
  • Enables efficient implementations of collectives that are heavily used for neural network training, including allreduce and allgather.
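
As a rough illustration of what an allreduce computes, the sketch below models the operation inside a single process in plain C++. It is not the oneCCL API: the function name simulate_allreduce_sum and the vector-of-vectors layout (one inner vector per rank) are assumptions made for this example, and a real implementation would exchange data over MPI or OFI rather than loop over in-memory buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneCCL): models allreduce with a sum
// reduction. buffers[r] is the buffer contributed by rank r; on return,
// every rank's buffer holds the element-wise sum over all ranks.
inline void simulate_allreduce_sum(std::vector<std::vector<float>>& buffers) {
    if (buffers.empty()) return;
    const std::size_t count = buffers.front().size();

    // Reduce step: accumulate every rank's contribution.
    std::vector<float> total(count, 0.0f);
    for (const auto& buf : buffers) {
        assert(buf.size() == count && "all ranks must contribute equal counts");
        for (std::size_t i = 0; i < count; ++i) total[i] += buf[i];
    }

    // Broadcast step: every rank receives the same reduced result.
    for (auto& buf : buffers) buf = total;
}
```

For example, three ranks contributing {1, 2}, {3, 4}, and {5, 6} would each end up holding {9, 12}.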

Version History

Date      Version  Major Change Summary
Mar 2021  2021.2   Bug fixes and improvements
Dec 2020  2021.1   Initial release


Major Features Supported

Table 1
Functionality                         Subitems               CPU                 GPU
Collective operations                 Allgatherv             X                   X
                                      Allreduce              X                   X
                                      Alltoall               X                   X
                                      Alltoallv              X                   X
                                      Barrier                X                   X
                                      Bcast                  X                   X
                                      Reduce                 X                   X
                                      ReduceScatter          X                   X
Data types                            [u]int[8, 16, 32, 64]  X                   X
                                      fp[16, 32, 64], bf16   X                   X
Persistency                                                  X                   -
Prioritization                        LIFO                   X                   -
                                      User-defined           X                   -
Batched collectives                   Allreduce              X                   -
Custom reduction                                             X                   -
Scaling                               Scale-up               X                   1 GPU per process
                                      Scale-out              X                   1 GPU per process
Tracking dependencies and completion  Test/Wait calls        X                   X
Programming model                     Rank = device          1 rank per process  1 rank per process
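
The "v" variants in the table (Allgatherv, Alltoallv) let each rank contribute a different number of elements. The sketch below models the result of an allgatherv inside a single process. It is an illustration only, not the oneCCL API: the function name simulate_allgatherv and its parameters are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of oneCCL): models allgatherv.
// send[r] is the contribution of rank r, and recv_counts[r] is the number
// of elements every rank expects from rank r. The return value is the
// buffer each rank ends up holding: all contributions concatenated in
// rank order.
inline std::vector<int> simulate_allgatherv(
        const std::vector<std::vector<int>>& send,
        const std::vector<std::size_t>& recv_counts) {
    assert(send.size() == recv_counts.size());

    std::vector<int> gathered;
    for (std::size_t r = 0; r < send.size(); ++r) {
        // The counts announced to receivers must match what is actually sent.
        assert(send[r].size() == recv_counts[r] && "count mismatch for rank");
        gathered.insert(gathered.end(), send[r].begin(), send[r].end());
    }
    return gathered;
}
```

With four ranks contributing {1}, {2, 3}, {} and {4, 5, 6} and counts {1, 2, 0, 3}, every rank would receive {1, 2, 3, 4, 5, 6}.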

Service functionality

  • Interoperability with SYCL*:
    • Construction of oneCCL communicator object based on SYCL context and SYCL devices
    • Construction of oneCCL stream object based on SYCL queue
    • Passing SYCL buffer as source/destination parameter of oneCCL collective operation

What's New

  • Added float16 datatype support.
  • Added ip-port hint for customization of KVS creation.
  • Optimized communicator creation phase.
  • Optimized multi-GPU collectives for single-node case.
  • Fixed bugs.

Known issues and limitations

  • Limitations imposed by the Intel® DPC++ compiler:
    • SYCL buffers cannot be used from different queues.
  • The 'using namespace oneapi;' directive is not recommended, as it may result in compilation errors
    when oneCCL is used with other oneAPI libraries. Instead, you can create a namespace alias for oneCCL, for example:

    namespace oneccl = ::oneapi::ccl;
    oneccl::allreduce(...);
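
The sketch below shows the alias pattern in a form that compiles on its own. The namespaces and the library_id functions are stand-ins invented for this example, not the real oneCCL headers, and the global ccl namespace is only one hypothetical way a name clash could arise: with 'using namespace oneapi;' in effect, an unqualified ccl:: would match both ::ccl and ::oneapi::ccl and fail to compile, while the alias stays unambiguous.

```cpp
#include <cassert>

// Stand-in for the library namespace (illustration only, not real headers).
namespace oneapi {
namespace ccl {
inline int library_id() { return 1; }  // hypothetical function
}  // namespace ccl
}  // namespace oneapi

// Hypothetical unrelated code that also happens to use the name 'ccl'.
namespace ccl {
inline int library_id() { return 2; }  // hypothetical function
}  // namespace ccl

// Alias only the namespace you need instead of 'using namespace oneapi;'.
namespace oneccl = ::oneapi::ccl;

inline void check_alias_is_unambiguous() {
    assert(oneccl::library_id() == 1);  // resolves to ::oneapi::ccl
    assert(::ccl::library_id() == 2);   // the other namespace is untouched
}
```

Because the alias names exactly one namespace, calls through oneccl:: keep resolving the same way even if other libraries later introduce overlapping names.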

System Requirements

What's New

  • Added [u]int16 support.
  • Added initial support for external launch mechanism.
  • Fixed bugs.

Known issues and limitations

  • Limitations imposed by the Intel® DPC++ compiler:
    • SYCL buffers cannot be used from different queues.
  • The 'using namespace oneapi;' directive is not recommended, as it may result in compilation errors
    when oneCCL is used with other oneAPI libraries. Instead, you can create a namespace alias for oneCCL, for example:

    namespace oneccl = ::oneapi::ccl;
    oneccl::allreduce(...);

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.