NUMA

Evaluating the Power Efficiency and Performance of Multi-core Platforms Using HEP Workloads

As Moore’s Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. The areas of development include core count, execution ports, vector units, uncore architecture, and instruction sets. This increasing complexity leads to a situation where access to shared memory becomes the major limiting factor, making it a real challenge to feed the cores with data. At the same time, the strong focus on power efficiency is bringing power-aware computing and less complex architectures to data centers. In this paper we examine these trends and present the results of our experiments with the Intel® Xeon® E5 v3 (code-named Haswell-EP) processor family and highly scalable High-Energy Physics (HEP) workloads.
  • Developers
  • Linux*
  • Server
  • Haswell
  • CERN
  • NUMA
  • Intel® Advanced Vector Extensions
  • Code Modernization
  • Data Center
  • Parallel Computing
  • Power Efficiency
  • Threading
  • Vectorization
  • A Mission-Critical Big Data Platform for the Real-Time Enterprise

    As the volume and velocity of enterprise data continue to grow, extracting high-value insight is becoming more challenging and more important. Businesses that can analyze fresh operational data instantly—without the delays of traditional data warehouses and data marts—can make the right decisions faster to deliver better outcomes.
  • Developers
  • Server
  • Java*
  • JVM
  • Intel® Xeon® processor E7 v3
  • NRI
  • Intel® AVX2
  • OLTP
  • NUMA
  • Intel® Advanced Vector Extensions
  • Big Data
  • Vectorization
  • Getting to Know NUMA

    During the Acceler8 contest there were discussions of vectorization, loop unrolling, and various micro-optimizations. But many people, like me, probably ran into the same problem: no matter what optimizations they applied, the program sped up on a machine with a small number of cores, while on MTL its speed only fell relentlessly. It then became clear that the load on RAM had to be reduced somehow, otherwise no progress was possible.

    Optimizing Software Applications for NUMA: Part 5 (of 7)

    3.2. Data Placement Using Implicit Memory Allocation Policies

    In the simple case, many operating systems transparently provide support for NUMA-friendly data placement. When a single-threaded application allocates memory, the operating system simply assigns memory pages to the physical memory attached to the requesting thread’s node (CPU package), thus ensuring that the data is local to the thread and that access performance is optimal.
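    The implicit policy described above is commonly implemented as "first touch": virtual pages are only bound to physical memory on a node when a thread first writes to them. A minimal sketch of exploiting this, assuming a Linux-style first-touch default and using only POSIX threads (no libnuma), is to allocate and initialize a buffer from the same thread that will later process it, so its pages land on that thread's node:

    ```c
    /* Sketch: NUMA-friendly placement via the first-touch policy.
     * Assumes the OS binds physical pages to the node of the thread
     * that first writes them (the common Linux default). */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define N (1u << 20)  /* 1M doubles, ~8 MB */

    static void *worker(void *arg)
    {
        (void)arg;
        /* malloc reserves virtual address space; physical pages are
         * assigned to THIS thread's node only on first write below. */
        double *buf = malloc(N * sizeof *buf);
        if (!buf)
            return NULL;

        memset(buf, 0, N * sizeof *buf);  /* first touch: pages become local */

        /* Subsequent accesses from this thread hit local memory. */
        double sum = 0.0;
        for (size_t i = 0; i < N; i++) {
            buf[i] = (double)i;
            sum += buf[i];
        }
        printf("sum = %.0f\n", sum);
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            return 1;
        pthread_join(t, NULL);
        return 0;
    }
    ```

    The key design point is that allocation alone does not decide placement; the pattern to avoid is having a main thread initialize all data (placing every page on its own node) and then handing the data to worker threads on other nodes.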
