- Overview
Speaker: Mike Voss, Intel
Due to energy constraints, hardware designers can no longer provide performance gains solely through increased processor frequencies or by simply including more general-purpose cores per node. As a result, computing systems have become increasingly heterogeneous, achieving greater performance per watt through hardware that is tuned for specific computational kernels or application domains. However, to use these heterogeneous resources, developers must:
- Identify critical kernels in their applications and decide if they can be accelerated
- Express and optimize kernels for the hardware
- Manage the offload to and the communication between those resources
In this presentation, we focus on an alternative approach that uses nodes containing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. The programming models and development tools are identical for these resources, which greatly simplifies development. We discuss how the same models for vectorization and threading can be used across these compute resources to create software that performs well on both. We further propose an extension to the Intel® Threading Building Blocks (Intel® TBB) flow graph interface that enables intra-node distributed memory programming, simplifying communication and load balancing between the processors and coprocessors. Finally, we validate this approach by presenting a benchmark of a risk analysis implementation that achieves record-setting performance.