Intel® Parallel Computing Centers

Universities, institutions, and labs that work to optimize open-source applications.



As high-performance computing (HPC) evolves and becomes more accessible, use this page as a starting point for optimizing code and gaining better compute performance. While many applications already use features of modern hardware, many more do not exploit the parallelism in their algorithms, nor do they leverage other new capabilities, including larger caches, Single Instruction Multiple Data (SIMD), threading, fabric technology, new file architecture, and nonvolatile memory technology.

Invited Talk Series

Presented by global partners who use Intel® architecture for scientific breakthroughs, these talks share optimization techniques, best practices, and results. This series is for students, educators, developers, scientists, data analysts, system administrators, and more, who work to maximize software efficiency using Intel® technology.

A Hybrid MPI-plus-Threads Approach to Group Finding Using Union-Find

James Willis, Institute for Computational Cosmology (ICC), University of Durham

Learn about a novel implementation of a structure-finding algorithm in the context of a cosmological simulation. The Friends-of-Friends (FoF) algorithm is the standard technique for identifying structures in cosmological simulations. Current implementations have problems with high memory usage and low parallel efficiency. Hear about a new method that uses the common Union-Find data structure and a hybrid MPI-plus-threads approach to scale up to thousands of cores. The threaded part of the algorithm can also be expressed elegantly in a task-based formalism if the rest of the application uses this framework.

Thursday, January 31, 2019

8:00 a.m. - 8:30 a.m. Pacific Standard Time
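
As background for this talk, the idea of grouping particles with Union-Find can be sketched in a few lines of Python. This is a minimal serial illustration, not the speaker's implementation: it has no MPI or threading, it uses a brute-force pair search, and the names `find`, `union`, `friends_of_friends`, and `linking_length` are assumptions chosen for this example.

```python
import math


def find(parent, i):
    """Return the root of particle i, shortening the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


def union(parent, size, a, b):
    """Merge the groups containing a and b (union by size)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]


def friends_of_friends(positions, linking_length):
    """Label each particle with the root index of its group.

    Two particles are "friends" if they lie within linking_length of
    each other; a group is the transitive closure of that relation.
    """
    n = len(positions)
    parent = list(range(n))
    size = [1] * n
    # Brute-force O(n^2) pair search, for illustration only.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= linking_length:
                union(parent, size, i, j)
    return [find(parent, i) for i in range(n)]
```

Calling `friends_of_friends(points, linking_length=0.6)` on a short chain of points spaced 0.5 apart plus one distant point puts the chain in a single group and leaves the distant point alone. A production code would replace the pair search with a spatial grid or tree, which is where memory use and parallel efficiency become the main concerns.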



Hybrid Quantum-Classical Computing Architectures

Martin Suchara is a computational scientist at Argonne National Laboratory. Martin's research focuses on quantum communication and networking, quantum error correction, quantum simulations, and distributed quantum computing. 

Classical supercomputing can help unreliable, intermediate-sized quantum processors solve large problems reliably. Martin describes the benefits of using a hybrid quantum-classical architecture. Larger quantum circuits are broken into smaller subcircuits that are evaluated separately, using either a quantum processor or a quantum simulator running on a classical supercomputer. Circuit compilation techniques that determine which qubits are simulated classically greatly affect system performance and provide a trade-off between circuit reliability and run time.

Thursday, February 28, 2019

8:00 a.m. - 8:30 a.m. Pacific Standard Time


Get Started 

This collection of self-paced training and reference materials provides an overview of parallel programming on Intel® architecture.

Note: Select videos or training are provided by third parties and may require registration on their sites.

Intel® Xeon® Processor Family & Intel® Xeon Phi™ Product Family

Learn how to modernize code for the Intel® Xeon Phi™ processors. Gain insight into OpenMP*, Intel® MPI Library, and Intel® software so you can write code that makes better use of vectorization and parallelism on the hardware.

   Why Use Code Modernization?
   The Purpose of Intel® Many Integrated Core Architecture
   Think Parallel: Modern Applications for Modern Hardware
   Six Steps to Getting Ready for the Intel Xeon Phi Processor
   Parallel Programming Models and Optimization Strategies
   Deep Dive with Code Modernization Experts


   Code for Speed with High-Bandwidth Memory on Intel Xeon Phi Processors
   Optimize for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with or without Intel AVX-512 Hardware


   A Crash Course on Multithreading with OpenMP
   An Overview of Programming Options
   Vectorization: The "Other" Parallelism You Need
   Beyond Shared Memory Parallel Programming


   Debugging and Profiling Tools for Intel Xeon Phi Coprocessors
   Intel® HPC Orchestrator
   What's New in Intel® Parallel Studio XE?


   Leverage Open-Source Software Defined Visualization

Solutions for Lustre* Training

Use these materials to further your knowledge of the Lustre* file system, gain deeper insight into solutions from Intel, and explore fundamental concepts and advanced implementation and configuration details.

   A New Generation of Lustre Software Expands HPC
   Get the Most from Your Data
   High-Performance Parallel Storage for the Enterprise

Intel® Omni-Path Architecture (Intel® OPA) Training

The next generation of HPC switch technology, Intel® Omni-Path Fabric (Intel® OP Fabric), is designed for improving system-level packaging and network efficiency. It enables a broad class of computations requiring scalable, tightly coupled processor, memory, and storage resources. These training materials help you become familiar with Intel® OPA.

   Webinar Series
   Design Fabrics with Intel OPA
   Next-Generation Fabric: Details on the Intel OPA
   Advanced Features of the Intel OPA Network Layers
   Democratize Best-in-Class Interconnect Performance
   The Intel OPA Launch
   Maximize HPC Storage Performance

Intermediate & Advanced

Access hands-on workshops, code samples, case studies, and domain-focused training to get the most out of your code on Intel architecture. We also encourage you to check out the Intel® Software Innovator and Intel® Black Belt Software Developer programs.

Intel Xeon Processor Family & Intel Xeon Phi Product Family

Get continued training for OpenMP*, Intel MPI Library, Intel® Parallel Studio, Intel Xeon Phi processor and coprocessor, expressing parallelism, and performance optimization methods.


   Program and Optimize with Parallel Architectures from Intel


   Multi-Channel DRAM (MCDRAM) on Intel Xeon Phi Products – Analysis Methods and Tools
   How to Detect Intel AVX-512 Support (Intel Xeon Phi Processor)
   Scale Your Application Across Shared and Distributed Memory
   Squashing Races, Deadlocks, and Memory Bugs


   Software Defined Visualization: Data Analysis for Current and Future Cyber Infrastructure
   Benefits of Leveraging Software Defined Visualization (OSPRay)
   From Correct to Correct and Efficient with Molecular Dynamics Benchmarks
   From Correct to Correct and Efficient with Hydro2D


   Optimization of Vector Arithmetics in Intel Architecture
   Optimization of Multithreading in Intel Architecture
   Gain Performance through Vectorization Using Fortran
   Exploit Multilevel Parallelism in HPC Applications
   Roofline Analysis: Visualize Impact of Compute Versus Memory Optimizations

   Data Layout

   SIMD Parallelism and Intrinsics


   Analyze Python* App Performance with Intel® VTune™ Amplifier
   How Non-Uniform Memory Access (NUMA) Affects Your Workloads for Intel VTune Amplifier

Solutions for Lustre Training

This advanced training is for anyone who wishes to further their knowledge of the Lustre file system and gain deeper insight into solutions from Intel. The training covers many implementation concepts and configuration details.

   Analyze Whole Human Genomes for as Little as $22
   Lustre Delivers Performance, Scale, and Platform Stability for The Sanger Institute
   DownUnder GeoSolutions* Deploys Lustre for Seismic Data Processing in the Oil and Gas Industry