Multi-thread apps for Multi-Core

How Special Silicon Facilitates Parallel Arithmetic

SIMD (single instruction, multiple data) lets one arithmetic instruction operate on multiple data items simultaneously. This article briefly covers the advantages of SIMD for parallel applications.
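To make the idea concrete, here is a minimal sketch of element-wise addition in which one SSE instruction adds four floats at a time. It assumes an x86 compiler that provides `<xmmintrin.h>` and falls back to a scalar loop elsewhere; `add_arrays` and `SKETCH_HAS_SSE` are hypothetical names chosen for illustration, not taken from the article.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics (assumed available on x86 targets)
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    // One _mm_add_ps performs four single-precision additions at once.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop handles the leftover elements (or everything without SSE).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

With SSE enabled, the vector loop issues roughly a quarter as many add instructions as the scalar loop does for the same data.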
  • Developers
  • Intel® C++-Compiler
  • Intel® Streaming SIMD Extensions
  • Multi-thread apps for Multi-Core
  • physics
  • visual computing
  • Game development
  • Graphics
  • Parallel Computing

Parallel reduce


    Parallel version of the Sieve of Eratosthenes

    Copyright 2005-2006 Intel Corporation. All Rights Reserved.

    Example program that computes the number of primes up to n, where n is a command-line argument. The algorithm is a fairly efficient version of the Sieve of Eratosthenes. The parallel version demonstrates how to use parallel_reduce and, in particular, how to exploit lazy splitting.
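The split-then-combine shape of such a prime count can be sketched in standard C++. This is not the example's own code: it swaps `std::thread` in for TBB's parallel_reduce, uses fixed chunks, and does not reproduce lazy splitting. The names `count_segment` and `count_primes` and the default of four threads are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Counts primes in [lo, hi) by crossing off multiples of the base primes.
// Precondition: base_primes holds every prime p with p * p < hi, and lo >= 2.
static std::uint64_t count_segment(std::uint64_t lo, std::uint64_t hi,
                                   const std::vector<std::uint64_t>& base_primes) {
    assert(lo >= 2 && lo <= hi);
    std::vector<char> composite(hi - lo, 0);
    for (std::uint64_t p : base_primes) {
        if (p * p >= hi) break;
        // First multiple of p inside the segment, never p itself.
        std::uint64_t start = std::max(p * p, (lo + p - 1) / p * p);
        for (std::uint64_t m = start; m < hi; m += p) composite[m - lo] = 1;
    }
    return static_cast<std::uint64_t>(
        std::count(composite.begin(), composite.end(), 0));
}

// Hypothetical driver: counts the primes in [2, n].
std::uint64_t count_primes(std::uint64_t n, unsigned num_threads = 4) {
    if (n < 2) return 0;
    // Serial step: find the base primes p with p * p <= n.
    std::uint64_t root = 1;
    while ((root + 1) * (root + 1) <= n) ++root;
    std::vector<char> small(root + 1, 1);
    std::vector<std::uint64_t> base_primes;
    for (std::uint64_t p = 2; p <= root; ++p) {
        if (!small[p]) continue;
        base_primes.push_back(p);
        for (std::uint64_t m = p * p; m <= root; m += p) small[m] = 0;
    }
    // Parallel step: each thread counts one contiguous chunk of [2, n].
    num_threads = std::max(1u, num_threads);
    std::uint64_t chunk = (n - 1 + num_threads - 1) / num_threads;
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::uint64_t lo = 2 + t * chunk;
        std::uint64_t hi = std::min(lo + chunk, n + 1);
        if (lo >= hi) break;
        workers.emplace_back([&partial, &base_primes, t, lo, hi] {
            partial[t] = count_segment(lo, hi, base_primes);
        });
    }
    for (auto& w : workers) w.join();
    // Reduction step: combine the per-thread counts.
    std::uint64_t sum = 0;
    for (std::uint64_t c : partial) sum += c;
    return sum;
}
```

Each worker writes only its own slot of `partial`, so the final summation needs no locking. A production version would sieve in cache-sized blocks instead of allocating one flag per number in the chunk.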

  • Multi-thread apps for Multi-Core
  • Threaded Code examples
  • Parallel Computing
  • Threading

Parallel while


    Introduction

    Example that uses parallel_while to do parallel preorder traversal of a sparse graph.
    Each vertex in the graph is called a "cell". Each cell has a value, which is a matrix. Some cells have operators that compute the cell's value using other cells' values as input. A cell that uses the value of cell x is called a successor of x.

    The algorithm works as follows.

     

  • Intel® Threading Building Blocks
  • Multi-thread apps for Multi-Core
  • Threaded Code examples
  • Parallel Computing

Substring finder


    Introduction

    A simple example that uses the parallel_for template in a substring matching program. For each position in a string, the program displays the length of the largest matching substring elsewhere in the string. The program also displays the location of a largest match for each position. Consider the string "babba" as an example. Starting at position 0, "ba" is the largest substring with a match elsewhere in the string (position 3).
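The per-position search described above can be sketched as follows. This is an illustrative stand-in that splits the positions across `std::thread` workers instead of using the parallel_for template; `MatchResult`, `find_range`, and `find_matches` are hypothetical names, and matches are allowed to overlap the position being examined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct MatchResult {
    std::vector<std::size_t> max_len;  // longest match length per position
    std::vector<std::size_t> pos;      // start of the first longest match
};

// Fills the results for positions [begin, end) of s.
static void find_range(const std::string& s, std::size_t begin,
                       std::size_t end, MatchResult& r) {
    assert(end <= s.size());
    for (std::size_t i = begin; i < end; ++i) {
        std::size_t best_len = 0, best_pos = 0;
        for (std::size_t j = 0; j < s.size(); ++j) {
            if (j == i) continue;
            // Compare only while both substrings stay inside the string.
            std::size_t limit = s.size() - std::max(i, j);
            std::size_t k = 0;
            while (k < limit && s[i + k] == s[j + k]) ++k;
            if (k > best_len) { best_len = k; best_pos = j; }
        }
        r.max_len[i] = best_len;
        r.pos[i] = best_pos;
    }
}

// Hypothetical driver: each thread handles a contiguous block of positions.
MatchResult find_matches(const std::string& s, unsigned num_threads = 4) {
    MatchResult r{std::vector<std::size_t>(s.size(), 0),
                  std::vector<std::size_t>(s.size(), 0)};
    num_threads = std::max(1u, num_threads);
    std::size_t chunk = (s.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < s.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, s.size());
        workers.emplace_back([&s, &r, begin, end] { find_range(s, begin, end, r); });
    }
    for (auto& w : workers) w.join();
    return r;
}
```

For "babba", position 0 yields length 2 with the match at position 3, in line with the example in the text. Every position is independent of the others, which is why the outer loop parallelizes without synchronization.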


  • Intel® Threading Building Blocks
  • Multi-thread apps for Multi-Core
  • Threaded Code examples
  • Parallel Computing

Task - tree sum


    Introduction

    This is a simple example that sums values in a tree. The example exhibits some speedup, but not a lot, because it quickly saturates the system bus on a multiprocessor. For good speedup, there need to be more computation cycles per memory reference. The point of the example is to teach how to use the raw task interface, so the computation is deliberately trivial.
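As a rough illustration of the recursion involved, the sketch below sums a binary tree serially and in parallel. It uses `std::async` as a stand-in for the raw task interface, so it shows the shape of the computation and not the TBB API. `TreeNode`, `serial_sum`, `parallel_sum`, `make_tree`, and the depth cutoff of 3 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <future>
#include <memory>

struct TreeNode {
    long value = 0;
    std::unique_ptr<TreeNode> left, right;
};

// Plain recursive sum over the subtree rooted at node.
long serial_sum(const TreeNode* node) {
    if (!node) return 0;
    return node->value + serial_sum(node->left.get()) +
           serial_sum(node->right.get());
}

// Spawns the left subtree as an asynchronous job and handles the right
// subtree in the current thread. Below the cutoff depth it runs serially.
long parallel_sum(const TreeNode* node, int depth = 3) {
    if (!node) return 0;
    if (depth <= 0) return serial_sum(node);
    std::future<long> left = std::async(std::launch::async, parallel_sum,
                                        node->left.get(), depth - 1);
    long right = parallel_sum(node->right.get(), depth - 1);
    return node->value + left.get() + right;
}

// Hypothetical test helper: a complete tree in which every node holds 1.
std::unique_ptr<TreeNode> make_tree(int levels) {
    assert(levels >= 0);
    if (levels == 0) return nullptr;
    std::unique_ptr<TreeNode> node(new TreeNode);
    node->value = 1;
    node->left = make_tree(levels - 1);
    node->right = make_tree(levels - 1);
    return node;
}
```

The work per node is a single addition, so memory traffic dominates, which matches the limited speedup the text describes. The cutoff keeps the number of spawned jobs small.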


    SerialSumTree.cpp


  • Intel® Threading Building Blocks
  • Multi-thread apps for Multi-Core
  • Threaded Code examples
  • Parallel Computing

Resolve 64K Alias Conflicts on Hyper-Threading Technology-Enabled Systems


    Challenge

    Avoid performance degradation due to 64K alias conflicts for cache resources. Intel® processors with Hyper-Threading Technology share the first-level data cache among logical processors. Two virtual data addresses whose cache lines lie a multiple of 64 KB apart conflict for the same cache line in the first-level data cache. This can degrade first-level data cache performance and also affect the branch-prediction unit.
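The aliasing condition can be expressed as a small address check. The sketch below assumes a 64-byte cache line, which the text does not state; `may_alias_64k`, `alias_example`, and the constants are hypothetical names used only to illustrate the modulo-64 KB rule.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kAliasSpan = 64 * 1024;  // 64 KB, from the text
constexpr std::uintptr_t kLineSize = 64;          // assumed cache-line size

// Returns true when a and b fall on different cache lines whose line
// indices are equal modulo 64 KB, the conflict condition described above.
bool may_alias_64k(std::uintptr_t a, std::uintptr_t b) {
    std::uintptr_t line_a = a / kLineSize;
    std::uintptr_t line_b = b / kLineSize;
    if (line_a == line_b) return false;  // same line, nothing to conflict
    std::uintptr_t lines_per_span = kAliasSpan / kLineSize;
    return line_a % lines_per_span == line_b % lines_per_span;
}

// Example with arbitrary illustrative addresses.
void alias_example() {
    assert(may_alias_64k(0x10000, 0x20000));   // exactly 64 KB apart
    assert(!may_alias_64k(0x10000, 0x10040));  // neighbouring lines
}
```

A check like this could be applied to the base addresses of per-thread buffers or stacks to spot layouts that are likely to collide.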

  • Hyper-Threading
  • Multi-thread apps for Multi-Core
  • Parallel Computing

Performance Degradation Due to Spin-Wait Loops on Hyper-Threading Technology-Enabled Systems


    Challenge

    Prevent negative performance impacts in application execution due to spin-wait loops on systems that support Hyper-Threading Technology.

    A spin-wait loop is a technique used in multithreaded applications whereby one thread waits for other threads. The wait can be required for protection of a critical section, for barriers, or for other necessary synchronizations. Typically, the structure of a spin-wait loop consists of a loop that compares a synchronization variable with a predefined value as shown in the following sample code:
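The sample itself is not reproduced here. As a stand-in, the following minimal sketch shows the loop structure described above, assuming C++11 `std::atomic` and `std::thread`. The names `sync_var`, `kReady`, `spin_wait`, and `run_example` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> sync_var{0};  // synchronization variable
constexpr int kReady = 1;      // predefined value the waiter looks for

// Spins until another thread stores kReady into sync_var.
void spin_wait() {
    while (sync_var.load(std::memory_order_acquire) != kReady) {
        // Busy-wait: the loop body is empty, so the thread keeps
        // re-reading sync_var and occupies execution resources.
    }
}

// One thread publishes a value, the calling thread spin-waits for it.
int run_example() {
    int payload = 0;
    sync_var.store(0, std::memory_order_relaxed);
    std::thread producer([&payload] {
        payload = 42;                                      // write the data
        sync_var.store(kReady, std::memory_order_release); // then signal
    });
    spin_wait();
    assert(sync_var.load(std::memory_order_acquire) == kReady);
    int seen = payload;  // safe: the acquire load pairs with the release store
    producer.join();
    return seen;
}
```

The waiting thread does no useful work while it spins, which is the behaviour that matters when two logical processors share one physical core.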

  • Multi-thread apps for Multi-Core
  • How to thread?
  • Design
  • Parallel Computing