Server

Enabling Connectionless DAPL UD in the Intel® MPI Library

What is DAPL UD?

Traditional InfiniBand* support involves MPI message transfer over the Reliable Connection (RC) protocol. While RC is long-standing and rich in functionality, it has certain drawbacks: because it requires each pair of processes to set up a one-to-one connection at the start of execution, per-process memory consumption can, in the worst case, grow linearly with the number of MPI ranks as the number of pairwise connections grows.
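To make the scaling argument concrete: with RC, a rank that talks to every other rank in an N-rank job needs N - 1 connections, so per-process connection state grows linearly with N while the job-wide number of connections grows as N(N - 1)/2. The sketch below assumes a fully connected, all-to-all pattern (the worst case the text mentions); `BYTES_PER_CONNECTION` is a hypothetical placeholder, not a measured Intel MPI or InfiniBand figure.

```python
# Worst-case RC connection counts for a fully connected MPI job.
# BYTES_PER_CONNECTION is a hypothetical placeholder, not a measured value.
BYTES_PER_CONNECTION = 64 * 1024


def rc_connections_per_rank(n_ranks: int) -> int:
    """Each rank keeps one RC connection to every other rank."""
    return n_ranks - 1


def rc_connections_total(n_ranks: int) -> int:
    """Number of distinct rank pairs: n * (n - 1) / 2."""
    return n_ranks * (n_ranks - 1) // 2


if __name__ == "__main__":
    for n in (16, 256, 4096):
        per_rank = rc_connections_per_rank(n)
        mem_mib = per_rank * BYTES_PER_CONNECTION / 2**20
        print(f"{n:5d} ranks: {per_rank:5d} connections/rank "
              f"(~{mem_mib:.1f} MiB/rank), {rc_connections_total(n)} pairs total")
```

Whatever the real per-connection cost is, the shape of the curve is what matters: doubling the rank count roughly doubles each process's connection state, which is the pressure a connectionless transport such as UD is meant to relieve.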

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • C/C++
  • Fortran
  • Intermediate
  • Intel® MPI Library
  • user datagrams
  • ud
  • dapl ud
  • IB
  • InfiniBand
  • scalability
  • Message Passing Interface
  • Cluster Computing

    IDC White Paper: Running Mission-Critical Workloads on Enterprise Linux x86 Servers

    This IDC white paper, sponsored by Intel, examines the growth of mission-critical workloads hosted on x86 servers based on the Intel Xeon E7 series of processors running enterprise Linux operating systems. It looks at how x86 servers are taking on more demanding workloads, including databases and enterprise applications. It also discusses IDC Workloads data showing the growth of mission-critical business-processing workloads on enterprise Linux platforms.

  • Developers
  • Partners
  • Linux*
  • Server
  • Intermediate
  • Mission-critical
  • Xeon
  • Linux x86
  • Cloud Computing
  • Cluster Computing
  • Corporations
  • Open Source
  • Security

    Intel® Transactional Synchronization Extensions (Intel® TSX) Profiling with Linux perf

    Intel TSX exposes a speculative execution mode to the programmer to improve locking performance. Tuning speculation relies heavily on a PMU (performance monitoring unit) profiler. This document describes TSX profiling using the Linux perf (or “perf events”) profiler, which comes integrated with newer Linux systems.

    Modern locking

    Most multi-threaded software uses locking. Lock optimization has traditionally aimed to reduce lock contention, that is, to make critical regions smaller. In optimized software, this often results in many very small critical regions protected by many locks. Each critical region does only a little work before releasing the lock and potentially letting another CPU access the same data.
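A minimal Python sketch of the fine-grained style described above: instead of one global lock, a counter table is split into stripes, each guarded by its own lock, so every critical region touches a single stripe and holds its lock only briefly. The class name `StripedCounter`, the stripe count, and the hash-based stripe selection are illustrative assumptions. The sketch shows the locking structure only, not TSX, and under CPython's global interpreter lock it will not demonstrate real parallel speedup.

```python
import threading
from collections import Counter


class StripedCounter:
    """Word counter whose keys are spread over independently locked stripes."""

    def __init__(self, n_stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._tables = [Counter() for _ in range(n_stripes)]

    def _stripe(self, key: str) -> int:
        # Map a key to the stripe (lock + table) that owns it.
        return hash(key) % len(self._locks)

    def add(self, key: str, amount: int = 1) -> None:
        i = self._stripe(key)
        # Small critical region: only the stripe that owns `key` is locked.
        with self._locks[i]:
            self._tables[i][key] += amount

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            # Counter returns 0 for missing keys without inserting them.
            return self._tables[i][key]


def worker(counter: StripedCounter, words: list, repeats: int) -> None:
    for _ in range(repeats):
        for w in words:
            counter.add(w)


if __name__ == "__main__":
    counter = StripedCounter()
    words = ["alpha", "beta", "gamma", "delta"]
    threads = [threading.Thread(target=worker, args=(counter, words, 1000))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print({w: counter.get(w) for w in words})  # each word counted 4 * 1000 times
```

Threads updating different words usually contend on different locks, which is exactly the "many small critical regions, many locks" pattern; the price is more locks to acquire and more places for lock overhead to hide, which is the cost that lock elision with TSX targets.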

    Measuring Load Imbalance Using Intel® VTune™ Amplifier XE

    OpenMP on the Intel® Xeon Phi™ coprocessor performs as well as it does on Intel® Xeon® processors. However, the slower clock of the Intel Xeon Phi coprocessor and its sheer number of threads accentuate OpenMP overhead. In most cases, though, the real problem is either load imbalance or a significant amount of serial execution; the overhead itself is rarely the culprit.

    Let’s take a look at the following Intel VTune screenshot.
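Load imbalance can be quantified from per-thread busy times: a parallel region ends only when its slowest thread finishes, so every other thread idles for the difference. The sketch below uses one common definition, the fraction of thread time spent waiting, (max - mean) / max. This metric and the sample timings are assumptions for illustration, not VTune's exact formula or output.

```python
def imbalance_stats(busy_seconds: list) -> dict:
    """Summarize one parallel region given each thread's busy time."""
    if not busy_seconds:
        raise ValueError("need at least one thread")
    wall = max(busy_seconds)                     # region ends with the slowest thread
    mean = sum(busy_seconds) / len(busy_seconds)
    idle = sum(wall - b for b in busy_seconds)   # thread-seconds spent waiting
    return {
        "wall": wall,
        "idle_thread_seconds": idle,
        "imbalance": (wall - mean) / wall if wall > 0 else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical timings: one straggler among four threads.
    print(imbalance_stats([1.0, 1.0, 1.0, 2.0]))
```

For the sample input, the region takes 2.0 s, three threads each wait 1.0 s (3.0 idle thread-seconds), and the imbalance is 0.375. The two views agree: 3.0 / (4 threads x 2.0 s) is also 0.375. On a coprocessor with many more threads, a single straggler wastes proportionally more hardware, which is why imbalance tends to dominate over the runtime's own overhead.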

  • Developers
  • Professors
  • Students
  • Server
  • Intel® VTune™ Amplifier XE
  • MIC
  • Knights Corner
  • Intel Xeon Phi
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
  • Thread

    Check out Intel® Advisor XE 2013 Update 3

    Intel® Advisor XE 2013 Update 3 guides developers in adding parallelism to existing C/C++ programs. With it, you can identify where most of the time is spent in your code, which of those locations can actually scale on multi-core processors, and what correctness issues are lurking there. This information can help you decide more judiciously where to thread your code. You can learn more at the Intel® Advisor XE 2013 home page.
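One standard way to reason about whether a hotspot is worth threading is Amdahl's law: if a code site accounts for a fraction p of total run time and is parallelized perfectly across n cores, the whole-program speedup is at most 1 / ((1 - p) + p / n). The sketch below applies that textbook bound; it is not the model Intel Advisor XE uses internally, and the hotspot fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on whole-program speedup when only `parallel_fraction`
    of the run time is parallelized (perfectly) across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical hotspots: fraction of total run time each one accounts for.
    for p in (0.95, 0.50, 0.10):
        print(f"p={p:.2f}: 8 cores -> {amdahl_speedup(p, 8):.2f}x, "
              f"limit -> {1 / (1 - p):.1f}x")
```

The bound shows why profiling comes first: a site that takes only 10% of the run time cannot yield more than about a 1.1x overall gain however well it scales, so threading effort pays off mainly on the largest hotspots.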
