Server
Enabling Connectionless DAPL UD in the Intel® MPI Library
What is DAPL UD?
Traditional InfiniBand* support involves MPI message transfer over the Reliable Connection (RC) protocol. While RC is long-standing and rich in functionality, it has a drawback: each pair of processes must set up a one-to-one connection at the start of execution, so memory consumption can (in the worst case) grow linearly as more MPI ranks are added and the number of pair connections grows.
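To make the growth concrete, here is a small arithmetic sketch. It assumes a fully connected job, in which every rank holds a connection to every other rank. The function names and the `bytes_per_connection` parameter are hypothetical and chosen for illustration; they are not part of the Intel® MPI Library, and no real per-connection memory figure is implied.

```cpp
#include <cstdint>

// Connections a single rank holds when it is connected to every other rank.
std::uint64_t connections_per_rank(std::uint64_t ranks) {
    return ranks == 0 ? 0 : ranks - 1;
}

// Distinct rank pairs in a fully connected job: N * (N - 1) / 2.
std::uint64_t total_pair_connections(std::uint64_t ranks) {
    return ranks < 2 ? 0 : ranks * (ranks - 1) / 2;
}

// Hypothetical per-rank memory estimate. bytes_per_connection is an
// illustrative input, not a value taken from any library.
std::uint64_t per_rank_connection_bytes(std::uint64_t ranks,
                                        std::uint64_t bytes_per_connection) {
    return connections_per_rank(ranks) * bytes_per_connection;
}
```

Under this assumption, the per-rank count grows linearly with the number of ranks, while the job-wide pair count grows quadratically.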
A case study comparing AoS (Arrays of Structures) and SoA (Structures of Arrays) data layouts for a compute-intensive loop run on Intel® Xeon® processors and Intel® Xeon Phi™ product family coprocessors
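As a rough illustration of the two layouts (not the code from the case study), the sketch below stores the same 3D points both ways and sums one field. The type and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// AoS: one struct per point, so x, y, z of a point sit next to each other.
struct PointAoS {
    float x, y, z;
};

// SoA: one array per field, so all x values are contiguous in memory.
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Convert an AoS container into the equivalent SoA container.
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    out.z.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    return out;
}

// AoS traversal: consecutive x values are sizeof(PointAoS) bytes apart.
float sum_x_aos(const std::vector<PointAoS>& pts) {
    float sum = 0.0f;
    for (const PointAoS& p : pts) sum += p.x;
    return sum;
}

// SoA traversal: consecutive x values are adjacent (unit stride).
float sum_x_soa(const PointsSoA& pts) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < pts.x.size(); ++i) sum += pts.x[i];
    return sum;
}
```

Both functions compute the same result. The difference is the memory access pattern, and unit-stride loops like the SoA one are often easier for a compiler to vectorize.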
Win an invitation to TDC Florianópolis!

Are you from the Florianópolis area and would you like to attend The Developers Conference on May 24, 25, and 26?
IDC White Paper: Running Mission-Critical Workloads on Enterprise Linux x86 Servers
This IDC white paper, sponsored by Intel, examines the growth of mission-critical workloads being hosted on x86 servers based on the Intel Xeon E7 series of processors running enterprise Linux operating systems. It looks at the way in which x86 servers are taking on more demanding workloads, including databases and enterprise applications. It also discusses IDC Workloads data that shows the growth of mission-critical business processing workloads on enterprise Linux platforms.
Intel® Transactional Synchronization Extensions (Intel® TSX) profiling with Linux perf
Intel TSX exposes a speculative execution mode to the programmer to improve locking performance. Tuning speculation relies heavily on a PMU profiler. This document describes TSX profiling using the Linux perf (or “perf events”) profiler, which comes integrated with newer Linux systems.
Intel® Xeon® & Xeon® Phi™ Webinar
This two-day webinar series introduces you to the world of multicore and manycore computing with Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
This page contains replays of multiple sessions covering a variety of topics as listed below:
Modern locking
Most multi-threaded software uses locking. Lock optimization has traditionally aimed to reduce lock contention, that is, to make the critical regions smaller. In optimized software, this often results in many very small critical regions, protected by many locks. Each critical region does only a little work before releasing the lock and potentially letting another CPU access the same data.
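A minimal sketch of this style of fine-grained locking, using only the C++ standard library: instead of one lock around a shared table, each stripe of counters has its own mutex, and each critical region is a single increment. The class `StripedCounters`, the stripe count, and the helper `run_workers` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

class StripedCounters {
    // Each stripe pairs a small piece of data with its own lock.
    struct Stripe {
        std::mutex lock;
        long value = 0;
    };

public:
    static constexpr std::size_t kStripes = 8;

    // Very small critical region: lock one stripe, bump one counter.
    void increment(std::size_t key) {
        Stripe& s = stripes_[key % kStripes];
        std::lock_guard<std::mutex> guard(s.lock);
        ++s.value;
    }

    // Sum all stripes, taking each stripe's lock in turn.
    long total() {
        long sum = 0;
        for (Stripe& s : stripes_) {
            std::lock_guard<std::mutex> guard(s.lock);
            sum += s.value;
        }
        return sum;
    }

private:
    std::array<Stripe, kStripes> stripes_;
};

// Start num_threads workers that each perform increments_per_thread
// increments on varying keys, then return the combined total.
long run_workers(StripedCounters& counters, int num_threads,
                 int increments_per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counters.increment(static_cast<std::size_t>(t + i));
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return counters.total();
}
```

Threads that touch different stripes do not contend with each other, but every increment still pays for acquiring and releasing a lock, which is the cost pattern the paragraph above describes.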
Measuring Load Imbalance using the Intel® VTune™ Amplifier XE
OpenMP on the Intel® Xeon Phi™ coprocessor performs as well as on Intel® Xeon processors. However, the slower clock on the Intel Xeon Phi coprocessor and the sheer number of threads accentuate OpenMP overhead. In most cases, the problem is either load imbalance or a significant amount of serial execution and is rarely the overhead itself.
Let’s take a look at the following Intel VTune screenshot.
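One simple way to put a number on load imbalance, given per-thread busy times for a parallel region, is sketched below. The metric, (longest - mean) / longest, is an assumption chosen for illustration and is not necessarily what Intel® VTune™ Amplifier XE reports. The function name is hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Returns a value in [0, 1): 0 means every thread was busy equally long,
// and values near 1 mean most threads sat idle waiting for the slowest one.
// Assumes the input holds non-negative busy times, one per thread.
double load_imbalance(const std::vector<double>& busy_seconds) {
    if (busy_seconds.empty()) return 0.0;
    const double longest =
        *std::max_element(busy_seconds.begin(), busy_seconds.end());
    if (longest <= 0.0) return 0.0;
    const double mean =
        std::accumulate(busy_seconds.begin(), busy_seconds.end(), 0.0) /
        static_cast<double>(busy_seconds.size());
    return (longest - mean) / longest;
}
```

For example, four threads that are busy for 4, 2, 2, and 0 seconds give a mean of 2 and a longest time of 4, so the function returns 0.5: on average, half of each thread's time in the region is spent waiting.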

Check out the Intel® Advisor XE 2013 Update 3
Intel® Advisor XE 2013 Update 3 guides developers in adding parallelism to their existing C/C++ programs. Using this tool, you can identify where most of the time is spent in your code, which of those locations can actually scale to multiple cores, and what correctness issues are lurking in those locations. This information can help you decide more judiciously where to thread your code. You can learn more about this tool at the Intel® Advisor XE 2013 home page.

