This is the second article in a series of articles about High Performance Computing with the Intel Xeon Phi. The Intel Xeon Phi is the first commercial product of Intel to incorporate the Many Integrated Core architecture. In this article I will present various frameworks for unleashing the power of multiple threads on the Xeon Phi. We will also have a look at interesting properties and advantages / disadvantages of each framework.
The included source code implements a parallel search algorithm on an input sorted sequence of strings, as described in the included problem description text file. The included write-up of the parallel algorithm sets for the pros and the cons for using either OpenMP or Pthreads to implement the parallelism. The implemented Pthreads code attempts to overlap I/O with the parallel searches to boost performance. The code was intended for Linux OS and includes a makefile to build the application.
The included source code implements a parallel search algorithm on an input sorted sequence of strings, as described in the included problem description text file. The parallel algorithm divides the input data into a number of non-intersecting segments, one per thread, and does parallel binary searches on the assigned segment by each thread for each of the multiple search strings. POSIX Threads is used for implementing the parallelism. The code was intended for Linux OS and includes a makefile to build the application.
The included source code implements a parallel Straight Radix Sort algorithm, as described in the included problem description text file. The code uses Pthreads to implement the parallelism. The submission includes Python scripts to check output sorting results and to generate random input test files. The code was intended for Linux OS.
Recommended Audience Programmers experienced in C/C++ (Linux), and POSIX threads, especially those interested/studying sorting algorithms
The included source code implements Strassen’s Algorithm for matrix-matrix multiplication in parallel, as described in the included problem description text file. The parallel algorithm uses Pthreads to implement a thread pool and the standard recursive algorithm to create smaller instances of the matrix multiplication. The smaller instances become tasks that can be assigned to threads within the thread pool. The code was intended for Linux OS and includes a makefile to build the application.
Betweenness is a metric applied to a vertex within a weighted graph. For the purposes of this problem we will define betweenness of vertex T as the number of shortest paths between two vertices in the graph that includes the vertex T, but does not start or end with vertex T.
The included code and white paper provides a parallel solution for the betweenness computation utilizing a modified version of Dijkstra's algorithm for single source shortest paths. The parallel algorithm uses calls to Dijkstra’s algorithm from each possible source node in the graph.
The included source code implements a variation of the Graph Coloring decision problem, as described in the included problem description text file. The application uses a simple brute force test of coloring the input graph with a given number of colors. If no such coloring is possible, the number of colors is incremented and the brute force search is executed again with the new number of possible colors. The actual attempts to assign colors to nodes are done in parallel through a task pool mechanism implemented with POSIX threads.