Life Science: Longest Common Subsequence

This sample finds the longest subsequence common to all sequences in two sequences. It demonstrates how to improve the performance of finding longest common subsequence using Intel® Cilk™ Plus. To demonstrate the performance increase, you will find two sequences to be stored in a single file separated by white-space. Let it as input, and assign an output file, you will get the longest common subsequence of two input sequences.

 
  • System Requirements
  • Hardware:
    • Any Intel processor with Intel® Advanced Vector Extensions (Intel® AVX) support like 2nd Generation Intel® Core™ i3, i5, or i7 processors and Intel® Xeon® E3 or E5 processor family, or newer
    For Microsoft* Windows*:
    • Microsoft* Visual Studio 2010* or 2012* standard edition or above
    • Intel® C++ Composer XE 2013 SP1 for Windows
    For Linux*:
    • GNU* GCC* 4.5 or newer
    • Intel® C++ Composer XE 2013 SP1 for Linux*

Code Change Highlights:

This code is optimized on finding length of all subsequences and the direction of arrows for trace back, which utilizes the cilk_for Keyword.
  • cilk_for
  • linear version:
    for(int i=1; i<=n; ++i) {
        for(int j=1; j<=m; ++j) {
            if (par_sequence_1[i-1] == par_sequence_2[j-1]) {
                subsequence_length[i][j] = subsequence_length[i-1][j-1] + 1;
                arrow_direction[i][j] = c_up_and_left;
            }
            else{
                subsequence_length[i][j] = subsequence_length[i-1][j-1] + 0;
                arrow_direction[i][j] = c_neither;
            }
            if (subsequence_length[i-1][j] >= subsequence_length[i][j]) {
                subsequence_length[i][j] = subsequence_length[i-1][j];
                arrow_direction[i][j] = c_up;
            }
            if (subsequence_length[i][j-1] >= subsequence_length[i][j]) {
                subsequence_length[i][j] = subsequence_length[i][j-1];
                arrow_direction[i][j] = c_left;
            }
        }
    }
    cilk_for version:
    for(int i=1; i<=n; ++i) {
        cilk_for(int j=1; j<=m; ++j) {
            if (par_sequence_1[i-1] == par_sequence_2[j-1]) {
                subsequence_length[i][j] = subsequence_length[i-1][j-1] + 1;
                arrow_direction[i][j] = c_up_and_left;
            }
            else{
                subsequence_length[i][j] = subsequence_length[i-1][j-1] + 0;
                arrow_direction[i][j] = c_neither;
            }
            if (subsequence_length[i-1][j] >= subsequence_length[i][j]) {
                subsequence_length[i][j] = subsequence_length[i-1][j];
                arrow_direction[i][j] = c_up;
            }
        }
        for(int j=1; j<=m; ++j) {
            if (subsequence_length[i][j-1] >= subsequence_length[i][j]) {
                subsequence_length[i][j] = subsequence_length[i][j-1];
                arrow_direction[i][j] = c_left;
            }
        }
    }
    This simple change creates code that ran almost 1.5x faster on our machine.

Performance Data:

Serial Speedup Modified Speedup Compiler
(Intel® 64)
Compiler options System specifications
1x cilk_for: 1.23x Intel C++ Compiler 14.0 for Windows /W3 /QxSSE4.2 /Zi /O2 /Qvec-report1 /Qipo /Oi /MD /EHsc /DPERF_NUM Windows 7​ (x64)
Intel(R) Core(TM) i5 CPU M 560 @ 2.67GHz
4GB memory
1x cilk_for: 1.30x Intel C++ Compiler 14.0 for Linux -W3 -xSSE4.2 -O2 -vec-report1 -ipo -Oi -MD -EHsc Linux 2.6.32-71.el6.x86_64
3rd Generation Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
6GB memory

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
  • Open the solution .sln file
    [Optional] To collect performance numbers (will run example 5 times and take average time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: uses Intel® C++ compiler
    • VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)
  • For Windows* Command Line users:
  • Enable your particular compiler environment
    for Intel® C++ Compiler:
    • open the appropriate Intel C++ compiler command prompt
    • navigate to project folder
    • compile with Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • to run: Build.bat run
    for Visual C++ Compiler (only linear/scalar will run):
    • open the appropriate MicrosoftVisual Studio* 2010 or 2012 command prompt
    • navigate to project folder
    • to compile: Build.bat [perf_num]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
    • to run: Build.bat run>
  • For Linux* or OS X* users:
  • set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    navigate to project folder
    for Intel® C++ compiler:
    • to compile: make [icpc] [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • to run: make run
    for gcc (only linear/scalar will run):
    • compile with make gcc [perf_num=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
    • to run: make run
For more complete information about compiler optimizations, see our Optimization Notice.