Intel® MPI Library

Deliver flexible, efficient, and scalable cluster messaging.

One Library with Multiple Fabric Support

Intel® MPI Library is a multifabric message-passing library that implements the MPI standard and is based on the open source MPICH implementation. Use the library to create, maintain, and test advanced, complex applications that perform better on HPC clusters based on Intel® and compatible processors.

Develop applications that can run on multiple cluster interconnects that you choose at runtime.
Quickly deliver maximum end-user performance without having to change the software or operating environment.
Achieve the best latency, bandwidth, and scalability through automatic tuning.
Reduce the time to market by linking to one library and deploying on the latest optimized fabrics.

Download as Part of the Toolkit

Intel MPI Library is included in the Intel® HPC Toolkit. Get the toolkit to analyze, optimize, and deliver applications that scale.

Download the Stand-Alone Version

A stand-alone download of Intel MPI Library is available. You can download binaries from Intel or choose your preferred repository.

Develop in the Cloud

Build and optimize oneAPI multiarchitecture applications using the latest Intel-optimized oneAPI and AI tools, and test your workloads across Intel® CPUs and GPUs. No hardware installations, software downloads, or configuration necessary.

Features

OpenFabrics Interface* (OFI) Support

This optimized framework exposes and exports communication services to HPC applications. Key components include APIs, provider libraries, kernel services, daemons, and test applications.

Intel MPI Library uses OFI to handle all communications.

Enables a more streamlined path that starts at the application code and ends with data communications
Allows tuning for the underlying fabric to happen at runtime through simple environment settings, including network-level features like multirail for increased bandwidth
Helps you deliver optimal performance on extreme scale solutions based on Mellanox InfiniBand* and Cornelis Networks*

As a result, you gain increased communication throughput, reduced latency, simplified program design, and a common communication infrastructure.
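
As a rough illustration of runtime fabric tuning through environment settings, the Python sketch below prepares the environment for a launch. I_MPI_FABRICS (Intel MPI Library) and FI_PROVIDER (libfabric) are the variables commonly used to select the fabric and the OFI provider. The helper name fabric_env and the example values are assumptions made for this sketch, and the provider names that actually work depend on what is installed on your cluster.

```python
import os


def fabric_env(provider, use_shm=True, base=None):
    """Return a copy of the environment with fabric selection variables set.

    provider: libfabric provider name (for example "tcp" or "verbs");
              valid names depend on the providers installed on the system.
    use_shm:  use shared memory inside a node and OFI between nodes.
    base:     environment to start from; defaults to the current process
              environment.
    """
    env = dict(os.environ if base is None else base)
    env["I_MPI_FABRICS"] = "shm:ofi" if use_shm else "ofi"
    env["FI_PROVIDER"] = provider
    return env


if __name__ == "__main__":
    env = fabric_env("tcp")
    print(env["I_MPI_FABRICS"], env["FI_PROVIDER"])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the MPI launcher, so the application itself does not change when the fabric does.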

Scalability

This library implements the high-performance MPI 3.1 and 4.0 standards on multiple fabrics. This lets you quickly deliver maximum application performance (even if you change or upgrade to new interconnects) without requiring major modifications to the software or operating systems.

Thread safety allows you to trace hybrid multithreaded MPI applications for optimal performance on multicore and manycore Intel architectures.
Improved startup scalability comes from the mpiexec.hydra process manager, which is:
- a process management system for starting parallel jobs
- designed to work natively with multiple launchers and resource managers such as ssh, rsh, pbs, slurm, and sge
Built-in cloud support for Amazon Web Services*, Microsoft Azure*, and Google* Cloud Platform
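
A minimal sketch of how a job might be started through mpiexec.hydra with a chosen launcher follows. It only assembles the command line. The options -bootstrap, -n, and -ppn are Hydra options; the helper name hydra_command and the program path ./my_app are hypothetical, and the bootstrap values accepted by your installation should be checked against its documentation.

```python
import shlex


def hydra_command(program, nprocs, ppn=None, bootstrap=None, args=()):
    """Build an mpiexec.hydra command line as a list of arguments.

    program:   path to the MPI executable (hypothetical in this sketch).
    nprocs:    total number of MPI processes (-n).
    ppn:       processes per node (-ppn), omitted when None.
    bootstrap: launcher used to start remote processes, such as "ssh"
               or "slurm" (-bootstrap), omitted when None.
    """
    cmd = ["mpiexec.hydra"]
    if bootstrap is not None:
        cmd += ["-bootstrap", bootstrap]
    cmd += ["-n", str(nprocs)]
    if ppn is not None:
        cmd += ["-ppn", str(ppn)]
    cmd.append(program)
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # on a machine without an MPI installation.
    print(shlex.join(hydra_command("./my_app", 8, ppn=4, bootstrap="slurm")))
```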

Performance and Tuning Utilities

Two additional utilities, Intel® MPI Benchmarks and the mpitune tuning utility, help you achieve top performance from your applications.

Interconnect Independence

The library provides an accelerated, universal, multifabric layer for fast interconnects via OFI, including for these configurations:

Transmission Control Protocol (TCP) sockets
Shared memory
Interconnects based on Remote Direct Memory Access (RDMA), including Ethernet and InfiniBand

It accomplishes this by dynamically establishing connections only when they are needed, which reduces the memory footprint. It also automatically chooses the fastest transport available.

Develop MPI code independent of the fabric, knowing it will run efficiently on whatever network you choose at runtime.
Use a two-phase communication buffer-enlargement capability to allocate only the memory space required.
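
The idea of establishing connections only when needed can be pictured with a small toy model. This is not the library's internal implementation; the class LazyConnections and its connect callback are hypothetical and only show why on-demand setup keeps the memory footprint proportional to the peers a process actually talks to.

```python
class LazyConnections:
    """Toy model: open a connection to a peer rank only on first use."""

    def __init__(self, world_size, connect):
        self.world_size = world_size
        self._connect = connect  # callable(rank) -> connection object
        self._open = {}          # rank -> connection, filled on demand

    def get(self, rank):
        """Return the connection to rank, creating it on first request."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} out of range")
        if rank not in self._open:
            self._open[rank] = self._connect(rank)
        return self._open[rank]

    def open_count(self):
        """Number of connections that have actually been established."""
        return len(self._open)


if __name__ == "__main__":
    conns = LazyConnections(world_size=1024, connect=lambda r: f"conn-to-{r}")
    conns.get(1)
    conns.get(1)  # reuses the existing connection
    conns.get(7)
    print(conns.open_count())  # 2 connections opened, not 1024
```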

Application Binary Interface Compatibility

An application binary interface (ABI) is the low-level nexus between two program modules. It determines how functions are called and also the size, layout, and alignment of data types. With ABI compatibility, applications conform to the same set of runtime naming conventions.

Intel MPI Library offers ABI compatibility with existing MPI-1.x and MPI-2.x applications. So even if you are not ready to move to the new 3.1 and 4.0 standards, you can take advantage of the library’s performance improvements by using its runtimes, without recompiling.
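
To make the size, layout, and alignment part of an ABI concrete, the sketch below uses Python's ctypes module to inspect how the platform lays out a small record. The Header structure is a hypothetical example and is unrelated to any MPI data type; the point is only that two modules can exchange such a record safely when they agree on these numbers.

```python
import ctypes


class Header(ctypes.Structure):
    # Hypothetical record: a 1-byte tag followed by a 4-byte length.
    _fields_ = [("tag", ctypes.c_uint8), ("length", ctypes.c_uint32)]


def layout(struct_type):
    """Return (size, alignment, {field: offset}) as the platform ABI defines them."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), ctypes.alignment(struct_type), offsets


if __name__ == "__main__":
    size, alignment, offsets = layout(Header)
    print("size:", size, "alignment:", alignment, "offsets:", offsets)
```

The printed values depend on the platform. Padding inserted between tag and length is exactly the kind of detail that both sides of a binary interface must agree on.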

Intel® MPI Benchmarks is a set of MPI performance measurements for point-to-point and global communication operations across a range of message sizes. Run all of the supported benchmarks or specify a single executable file in the command line to get results for a particular subset.

The generated benchmark data fully characterizes:

Performance of a cluster system, including node performance, network latency, and throughput
Efficiency of the MPI implementation
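
A short sketch of how a benchmark run might be assembled follows. IMB-MPI1 is the benchmark executable for MPI-1 operations, and PingPong and Allreduce are benchmark names it accepts; the helper name imb_command is hypothetical, and the sketch assumes the launcher and the benchmark executable are on your PATH.

```python
def imb_command(nprocs, benchmarks=(), launcher="mpiexec", suite="IMB-MPI1"):
    """Build a command line for an Intel MPI Benchmarks run.

    nprocs:     number of MPI processes to start.
    benchmarks: names of individual benchmarks to run; when empty, the
                executable runs every benchmark it contains.
    suite:      benchmark executable, which selects the subset of tests.
    """
    return [launcher, "-n", str(nprocs), suite, *benchmarks]


if __name__ == "__main__":
    print(" ".join(imb_command(2, ["PingPong"])))  # a single benchmark
    print(" ".join(imb_command(4)))                # everything in IMB-MPI1
```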

The library has a robust set of default parameters that you can use as is or refine to ensure the highest performance. To tune beyond the defaults, use mpitune to adjust your cluster or application parameters, and then iterate until you achieve the best performance.
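
The iterative part of tuning can be sketched as a simple measure-and-compare loop. This is not how mpitune works internally; it is a generic illustration, and the names pick_best and timed_run are hypothetical. I_MPI_ADJUST_ALLREDUCE, mentioned in the trailing comment, is an Intel MPI Library variable that selects the Allreduce algorithm; the range of valid values depends on the library version.

```python
import os
import subprocess
import time


def timed_run(cmd, var, value):
    """Run cmd with environment variable var set to value; return seconds elapsed."""
    env = dict(os.environ, **{var: str(value)})
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def pick_best(candidates, measure):
    """Return (best_value, best_time) where measure(value) gives a time in seconds."""
    best_value, best_time = None, float("inf")
    for value in candidates:
        elapsed = measure(value)
        if elapsed < best_time:
            best_value, best_time = value, elapsed
    return best_value, best_time


# Possible use, assuming a hypothetical ./my_app and that values 1 to 3 are valid:
#   cmd = ["mpiexec", "-n", "4", "./my_app"]
#   best, seconds = pick_best(
#       range(1, 4), lambda v: timed_run(cmd, "I_MPI_ADJUST_ALLREDUCE", v))
```

A single timing per candidate is noisy, so in practice each setting would be measured several times before comparing.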

Documentation

Tutorials

Use the MPI Tuner for Intel MPI Library: Linux (PDF) | Windows (PDF)
Analyze an OpenMP and MPI Application on Linux

Specifications

Processors:

Intel® Xeon® processors and CPUs with compatible Intel® 64 architecture
Intel® Data Center GPU Max Series

Development environments:

Windows: Microsoft Visual Studio*
Linux: Eclipse* and Eclipse C/C++ Development Tooling (CDT)*

Languages:

Natively supports C, C++, and Fortran development

Interconnect fabric support:

Shared memory
Sockets such as TCP/IP over Ethernet and Gigabit Ethernet Extender*

Operating systems:

Windows
Linux

Related Tools

Intel® Trace Analyzer and Collector: Achieve high performance for parallel cluster applications.
Intel® VTune™ Profiler: Locate performance bottlenecks fast.
Intel® Advisor: Optimize HPC code for modern hardware.
Intel® Inspector: Find root-cause errors early before you release.
Intel® oneAPI Collective Communications Library: Scalable and efficient distributed training for deep neural networks. It is built on the Intel MPI Library.
