In Part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed on a Dual Xeon 5570 system with 2 sockets and two L3 caches, each shared by four cores (8 threads), and with each processor having four L2 and four L1 caches, each shared by one core (2 threads). We find:
In the last installment (Part 3) we saw the effects of the QuickThread Parallel Tag Team method of Matrix Multiplication performed on two single-processor systems:
By Jim Dempsey
In the previous article (Part 2) we saw that reorganizing the loops and using a temporary array yields a performance gain from SSE small-vector optimizations (which the compiler performs), but that a larger gain comes from better cache utilization due to the layout change and array access order. The improvements pushed us into a memory bandwidth limitation, whereby the Serial method now outperforms the Parallel version of that same method.
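As a rough illustration of the loop and layout change described above, here is a minimal sketch. It assumes the temporary array holds a transposed copy of the second matrix, which may differ from the article's actual code. The names `Matrix`, `multiply_naive`, and `multiply_transposed` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Textbook triple loop. The inner loop reads b with stride n (down a column),
// which touches a different cache line on almost every iteration.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Assumed variant: copy b into a transposed temporary so that the inner loop
// reads both operands with unit stride. Contiguous access is friendlier to
// the cache and easier for a compiler to vectorize.
Matrix multiply_transposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);  // temporary array: bt is the transpose of b
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j * n + i] = b[i * n + j];
        }
    }
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

Both functions add the products in the same order, so their results should match. The transposed variant pays for one extra pass over `b` and one extra n x n buffer in exchange for contiguous reads in the innermost loop.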
October 2013: This WhatIf project has been retired, but this page remains for historical/archival purposes.
Warning! You are about to download an old, unsupported version of this software. For information about the current version of the compiler and matching runtime, please visit the Intel® Cilk™ Plus page. For tools supporting that compiler, please visit the SDK page.
Tachyon is a ray-tracer application that renders objects described in data files. The Tachyon program is located in the product samples directory: <install-dir>/composerxe/Samples/<locale>/C++/tachyon_compiler.tar.gz. By default we use balls.dat as the input file. Data files are stored in the directory tachyon/dat. Originally, Tachyon implemented its parallelism with explicit threads created by pthread_create(): one thread does the rendering, and the other performs calculations.