SSCA2 Graph Sample

SSCA2, the second Scalable Synthetic Compact Application, is a collection of kernels applied to a weighted, directed graph. The output of each kernel is the input to the next.

  • Kernel 1: generate the graph randomly
  • Kernel 2: using the generated graph, find the edges (vertex pairs) with the largest integer weight and place them in an edge list S
  • Kernel 3: create a list of sub-graphs made up of all the paths of length SubGraphPathLength that start with an edge in S
  • Kernel 4: find the set of vertices in the graph with the highest betweenness centrality score, a centrality metric based on enumerating shortest paths
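
To make kernel 2 concrete, here is a minimal sketch of selecting the maximum-weight edges. The Edge struct and the function name find_max_weight_edges are hypothetical and are not taken from the sample's source; the sample's own graph layout may differ.

```cpp
#include <vector>

// Hypothetical edge record; the sample's own data layout may differ.
struct Edge {
    int src;
    int dst;
    int weight;
};

// Return every edge whose weight equals the largest weight in the graph.
std::vector<Edge> find_max_weight_edges(const std::vector<Edge>& edges) {
    std::vector<Edge> S;
    if (edges.empty()) return S;

    // First pass: find the largest weight.
    int max_weight = edges.front().weight;
    for (const Edge& e : edges) {
        if (e.weight > max_weight) max_weight = e.weight;
    }
    // Second pass: collect all edges that carry that weight.
    for (const Edge& e : edges) {
        if (e.weight == max_weight) S.push_back(e);
    }
    return S;
}
```

The two-pass form keeps the logic obvious; a single pass that clears S whenever a larger weight appears would work equally well.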

This sample focuses on kernel 4. The betweenness centrality is calculated for each vertex in the graph, using the number of shortest paths between two vertices s and t and the number of those paths that pass through the vertex of interest v. Specifically, each pair (s, t) contributes to v the ratio of the number of shortest paths from s to t that pass through v to the total number of shortest paths from s to t, and the betweenness centrality of v is the sum of these ratios over all pairs. The results from kernel 3 serve as the inputs for s and t.
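
The calculation described above can be sketched with Brandes' algorithm, which counts shortest paths with one breadth-first search per source vertex and then accumulates the path ratios in reverse search order. This is an illustrative serial version for an unweighted, directed graph, not the sample's code: the function name betweenness_centrality and the adjacency-list input are assumptions, and it uses every vertex as a source rather than the kernel 3 results.

```cpp
#include <cstddef>
#include <queue>
#include <stack>
#include <vector>

// Betweenness centrality for an unweighted, directed graph given as
// adjacency lists (adj[u] holds the targets of the edges leaving u).
std::vector<double> betweenness_centrality(const std::vector<std::vector<int>>& adj) {
    const std::size_t n = adj.size();
    std::vector<double> bc(n, 0.0);

    for (std::size_t s = 0; s < n; ++s) {
        std::vector<std::vector<int>> pred(n);  // predecessors on shortest paths
        std::vector<double> sigma(n, 0.0);      // number of shortest paths from s
        std::vector<int> dist(n, -1);           // BFS distance from s (-1 = unseen)
        std::vector<double> delta(n, 0.0);      // accumulated path ratios
        std::stack<int> order;                  // visit order, popped farthest first
        std::queue<int> frontier;

        sigma[s] = 1.0;
        dist[s] = 0;
        frontier.push(static_cast<int>(s));

        // Forward phase: BFS that counts shortest paths.
        while (!frontier.empty()) {
            int v = frontier.front();
            frontier.pop();
            order.push(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) {              // first time w is reached
                    dist[w] = dist[v] + 1;
                    frontier.push(w);
                }
                if (dist[w] == dist[v] + 1) {   // v lies on a shortest path to w
                    sigma[w] += sigma[v];
                    pred[w].push_back(v);
                }
            }
        }

        // Backward phase: accumulate ratios from the farthest vertices inward.
        while (!order.empty()) {
            int w = order.top();
            order.pop();
            for (int v : pred[w]) {
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w]);
            }
            if (w != static_cast<int>(s)) bc[w] += delta[w];
        }
    }
    return bc;
}
```

For the three-vertex path 0 -> 1 -> 2, only vertex 1 lies between another pair, so it is the only vertex with a nonzero score. The outer loop over s is independent per source apart from the updates to bc, which is what makes kernel 4 a natural candidate for parallelization.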

The implementations of kernels 1-3 are interesting in their own right. To learn more about them, and to see OpenMP* implementations of all kernels, visit the HPC Graph Analysis website, where you can download a research paper on the sample and the sample's unmodified source code.


System Requirements:

  • Hardware:
    • Any Intel processor with AVX support, such as 2nd Generation Intel® Core™ i3, i5, or i7 processors and the Intel® Xeon® E3 or E5 processor family, or newer
  • For Microsoft Windows*:
    • Microsoft Visual Studio 2010* or 2012* standard edition or above
    • Intel® C++ Composer XE 2013 SP1 for Windows*
  • For Linux*:
    • GNU* GCC 4.5 or newer
    • Intel® C++ Composer XE 2013 SP1 for Linux*

Code Change Highlights:

  • cilk_for

    linear version:

    for (int y = 0; y<height; ++y) {
        render_x(width, height, y, ambient_occlusion_scalar, fimg);
    }

    cilk_for version:

    cilk_for (int y = 0; y<height; ++y) {
        render_x(width, height, y, ambient_occlusion_scalar, fimg);
    }

    This simple change produced code that ran _x faster on our machine.
  • pragma simd

    The auto-vectorizer does a reasonable job of vectorizing the inner loop of ambient_occlusion_scalar, but because there are so many different kinds of objects, it uses a vector length of only 2 even though 4 floats will fit. pragma simd lets the programmer give the compiler details it cannot infer on its own, such as the vector length (the vectorlength clause).

    scalar version:

    int ntheta = c_num_ao_samples;
    int nphi = c_num_ao_samples;
    float occlusion = 0.0f;
    for (int j = 0; j<ntheta; ++j) {
        for (int i = 0; i<nphi; ++i) {
            float theta = sqrtf(random_table[2*(nphi*j+i)]);
            float phi = 2.0f * static_cast<float>(M_PI) * random_table[2*(nphi*j+i)+1];
            ...
        }
    }
    

    pragma simd version:

    int ntheta = c_num_ao_samples;
    int nphi = c_num_ao_samples;
    float occlusion = 0.0f;
    for (int j = 0; j<ntheta; ++j) {
    #pragma simd vectorlength(4)
        for (int i = 0; i<nphi; ++i) {
            float theta = sqrtf(random_table[2*(nphi*j+i)]);
            float phi = 2.0f * static_cast<float>(M_PI) * random_table[2*(nphi*j+i)+1];
            ...
        }
    }
    
  • cilk_for + pragma simd

    To combine cilk_for and pragma simd, use cilk_for to process the image rows in parallel, and pragma simd to vectorize the ambient occlusion calculation for each pixel.
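
The vectorization hint described in the pragma simd item relies on an Intel-specific pragma. As a rough, portable illustration, the sketch below swaps in the standard OpenMP form, #pragma omp simd with a simdlen clause, which is the closest standard counterpart of vectorlength. The function name sum_samples and the loop body (accumulating theta * phi) are hypothetical stand-ins, because the sample's actual loop body is elided above.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the sample's inner loop: it reads the same
// interleaved random_table layout but simply sums theta * phi.
float sum_samples(const std::vector<float>& random_table, int ntheta, int nphi) {
    const float kTwoPi = 6.2831853f;
    float total = 0.0f;
    for (int j = 0; j < ntheta; ++j) {
        // Ask for 4 lanes per vector iteration. The pragma is ignored unless
        // the compiler has OpenMP SIMD support enabled (e.g. -fopenmp-simd).
        #pragma omp simd simdlen(4) reduction(+:total)
        for (int i = 0; i < nphi; ++i) {
            float theta = std::sqrt(random_table[2 * (nphi * j + i)]);
            float phi = kTwoPi * random_table[2 * (nphi * j + i) + 1];
            total += theta * phi;
        }
    }
    return total;
}
```

The reduction clause is needed because every iteration updates total; without it the loop carries a dependence that prevents safe vectorization.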

Performance Data:

  • Windows* system:
    • Serial Speedup: 1x
    • Modified Speedup: AN: 1.0x; cilk_for: 4.6x; Both: 5.3x
    • Compiler (Intel-64): Intel® C++ Compiler 14.0 for Windows
    • Compiler options: /O2 /Oi /Ot /fp:fast /QxHost /Qip /MD
    • System specifications: Windows Server* 2012; 2nd Generation Intel® Xeon® E3 1280 CPU @ 3.50GHz; 8GB memory
  • Linux* system:
    • Serial Speedup: 3068ms
    • Modified Speedup: AN: 497ms; cilk_for: 894ms; Both: 133ms
    • Compiler (Intel-64): Intel® C++ Compiler 14.0 for Linux
    • Compiler options: -O2 -fast -fp-model fast -xHost -ip
    • System specifications: Ubuntu* 10.04; 3rd Generation Intel® Core™ i7-2600K CPU @ 3.40GHz; 8GB memory

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
    • Requirements: Microsoft* Visual Studio* 2010/2012 standard edition or above; Intel® C++ Composer XE 2013 for Windows
    • Open the solution .sln file
    • [Optional] To collect performance numbers (will run example 5 times and take average time):
      • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    • Choose a configuration (for best performance, choose a release configuration):
      • Intel-debug and Intel-release: uses Intel® C++ compiler
      • VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
  • For Windows* Command Line users:
    • Enable your particular compiler environment
    • for Intel® C++ Compiler:
      • open the appropriate Intel C++ compiler command prompt
      • navigate to project folder
      • to compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (will run example 5 times and take average time)
      • to run: Build.bat run [help|0|1|2|3|4]
    • for Visual C++ Compiler (only linear/scalar will run):
      • open the appropriate Microsoft Visual Studio* 2010 or 2012 command prompt
      • navigate to project folder
      • to compile: Build.bat [perf_num]
        • perf_num: collect performance numbers (will run example 5 times and take average time)
  • For Linux* users:

    • to check if SDL, SDL_ttf, and freetype are installed: make check_sdl
    • if any are missing, install via native package manager or from source
    • to install from source:
      • SDL 1.2: Download source code and extract folder to sdl_source. Version in this code: 1.2.15
      • SDL_ttf 2.0: Download source code and extract folder to sdl_source. Version in this code: 2.0.11
      • Freetype2: Download source code and extract folder to sdl_source. Version in this code: 2.5.0
      • to build and install missing libraries: make build_sdl
    • set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    • navigate to project folder
    • for Intel® C++ compiler:
      • to compile: make [icpc] [perf_num=1]
        • perf_num=1: collect performance numbers (will run example 5 times and take average time)
      • to run: make run [option=help|0|1|2|3|4]
    • for gcc (only linear/scalar will run):
      • to compile: make gcc [perf_num=1]
        • perf_num=1: collect performance numbers (will run example 5 times and take average time)
      • to run: make run
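
The PERF_NUM option in the instructions above makes the sample run five times and report the average time. A minimal sketch of how such a compile-time switch could be wired up is shown here; the helper name time_average_ms is hypothetical and the sample's own timing code may be organized differently.

```cpp
#include <chrono>

// Hypothetical helper: run `work` once, or five times when PERF_NUM is
// defined at compile time, and return the average wall-clock time in ms.
template <typename Work>
double time_average_ms(Work work) {
#ifdef PERF_NUM
    const int runs = 5;
#else
    const int runs = 1;
#endif
    double total_ms = 0.0;
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
    }
    return total_ms / runs;
}
```

Passing the kernel as a callable, for example time_average_ms([&] { run_kernel4(); }) with a hypothetical run_kernel4, keeps the timing logic separate from the code being measured.
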
For more complete information about compiler optimizations, see our Optimization Notice.