Optimizing your application for multi-core technology can result in big performance improvements, but it requires a plan of action that is well suited to your application. This article gives an overview of key steps to follow as you optimize your code.
Since that brief aside on terminology is out of the way, let us continue with the kitchen analogy.
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1 to consider parallelism, both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
LAMMPS is an open-source software package that simulates classical molecular dynamics. It supports many energy models and simulation options, and this versatility has made it a popular choice. It was first developed at Sandia National Laboratories to use large-scale parallel computation.
Applications often use files to store data from one run to the next, but high-capacity, non-volatile memory devices make it possible to store data more effectively than a disk-based file system can. This article describes how to design your application to take advantage of these memory devices, thereby avoiding the need for files to serve as persistent memory.
Intel’s non-uniform memory access (NUMA) strategy is based on several new memory technologies that promise significant improvements in both capability and performance. This article provides information on Multi-Channel DRAM (MCDRAM) and High-Bandwidth Memory (HBM), non-volatile dual in-line memory modules (NVDIMMs), and Intel® Omni-Path Fabric (Intel® OP Fabric).
Do you have a problem that Intel non-uniform memory access (NUMA) hardware and the related tools and strategies can solve? The answer depends on the problem you are facing and on whether you can choose or change your hardware, your software, or both. This article walks you through the decision.
Learn how to build an application that runs effectively on non-uniform memory access (NUMA) hardware. This article walks you through the process, from choosing the algorithm to measuring your application's performance.
Modern Memory Subsystems Benefits for Data Base Codes, Linear Algebra Codes, Big Data, and Enterprise Storage: This article describes and contrasts the advantages of different types of memory, including Multi-Channel DRAM (MCDRAM) and High-Bandwidth Memory (HBM), the future 3D XPoint™ memory devices, and Intel® Omni-Path Fabric (Intel® OP Fabric).
If printf or fprintf functions cause transaction aborts, use Intel® Processor Trace as a work-around.