Finance: Binomial Lattice

Option pricing is the problem of computing the expected present value of a financial instrument (most usually stocks, but also interest rates, foreign exchange rates, bonds, etc). This is based on a forecast of cashflows over a specific time horizon. The expected present value is used to determine the fair premium of an option on that instrument. The Binomial (Tree or Lattice) option pricing model is a confluent discretization of a Brownian motion process. Note that for a large number of steps the binomial distribution approximates a Gaussian distribution. The discretization takes the form of a recombinant tree, where each level of the tree represents the set of values the underlying can take at specific points in time in the lifetime of the option. Intel® Cilk™ Plus cilk_for is used to distribute the calculations of pricing batches among the multi-cores and the "#pragma simd" of the Intel® Cilk™ Plus also further makes the computation to be data-parallel among the SIMD registers in one single core.

 

Code Change Highlights:

cilk_for is used to calculate pricing batches in parallel, while pragma simd is used internally to each iteration.
  • cilk_for
  • linear version:
    for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; k++) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... } }
    cilk_for version:
    cilk_for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { cilk_for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... } }
  • cilk_for + pragma simd
  • The autovectorizer can vectorize the inner for loops in the serial version, but require explicit vectorization for the cilk_for version.
    serial version:
    for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... for (int i = 0; i <= OPT_TIMESTEPS; ++i) { opt_price[i] = std::max((exe_price - upow_tbl[OPT_TIMESTEPS - i] * dpow_tbl[i]), ZERO); } for (int i = OPT_TIMESTEPS - 1; i >= 0; --i) { for (int j = 0; j <= i; ++j) { tmp_val[j] = (p * opt_price[j] + (ONE - p) * opt_price[j+1]) * multiplier; } for (int j = 0; j <= i; ++j) { fptype t1 = upow_tbl[i - j] * dpow_tbl[j]; fptype early_exe_value = std::max(exe_price - t1, ZERO); opt_price[j] = std::max(early_exe_value, tmp_val[j]); } } ... } }
    Looking closer at the second pragma simd, you can see that the compiler can correctly vectorize a calculation involving a forward reference, namely using opt_price[j] in conjunction with opt_price[j+1]
    cilk_for + pragma simd version:
    cilk_for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { cilk_for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... #pragma simd for (int i = 0; i <= OPT_TIMESTEPS; ++i) { opt_price[i] = std::max((exe_price - upow_tbl[OPT_TIMESTEPS - i] * dpow_tbl[i]), ZERO); } for (int i = OPT_TIMESTEPS - 1; i >= 0; --i) { // Vectorization is possible even with forward reference of the array opt_price (j and j+1) #pragma simd for (int j = 0; j <= i; ++j) { tmp_val[j] = (p * opt_price[j] + (ONE - p) * opt_price[j+1]) * multiplier; } #pragma simd for (int j = 0; j <= i; ++j) { fptype t1 = upow_tbl[i - j] * dpow_tbl[j]; fptype early_exe_value = std::max(exe_price - t1, ZERO); opt_price[j] = std::max(early_exe_value, tmp_val[j]); } } ... } }

Performance Data:

Note: Modified Speedup shows performance speedup with respect to serial implementation.

Modified Speedup Compiler (Intel® 64) Compiler options System specifications
cilk_for: 3.63x
Both: 3.7x
Intel C++ Compiler 15.0 for Windows /O2 /QxAVX /Qipo /fp:fast /EHsc Microsoft Windows 7 Enterprise
Intel® Core™ i5-4670T CPU @2.30GHz
16GB memory
cilk_for: 1.31x
Both: 1.4x
Intel C++ Compiler 15.0 for Linux -O2 -fp-model fast -xAVX -ipo RHEL 7 (x64)
4rd Generation Intel Core™ i7-4790 CPU @ 3.60GHz
32GB memory

Build Instructions:

  • For Microsoft Visual Studio* 2010 and 2012 users:
  • Open the solution .sln file
    [Optional] To collect performance numbers (will run example 5 times and take average time):
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add PERF_NUM
    [Optional] To use double-precision floats:
    • Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add DOUBLE_PRECISION
    Choose a configuration (for best performance, choose a release configuration):
    • Intel-debug and Intel-release: uses Intel® C++ compiler
    • VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)

    Open the results price_results_*.txt in folder OptionPricing

  • For Windows* Command Line users:
  • Enable your particular compiler environment
    For Intel® C++ Compiler:
    • Open the appropriate Intel C++ compiler command prompt
    • Navigate to project folder
    • Compile with Build.bat [perf_num] [double]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
      • double: use double-precision floats
    • To run: Build.bat run
    For Visual C++ Compiler (only linear/scalar will run):
    • Open the appropriate MicrosoftVisual Studio* 2010 or 2012 command prompt
    • Navigate to project folder
    • To compile: Build.bat [perf_num] [double]
      • perf_num: collect performance numbers (will run example 5 times and take average time)
      • double: use double-precision floats
    • To run: Build.bat run

    Open the results price_results_*.txt at root project folder

  • For Linux* or OS X* users:
  • Set the icc environment: source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}
    Navigate to project folder
    For Intel® C++ compiler:
    • To compile: make [icpc] [perf_num=1] [double=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
      • double=1: use double-precision floats
    • To run: make run
    For gcc (only linear/scalar will run):
    • Compile with make gcc [perf_num=1] [double=1]
      • perf_num=1: collect performance numbers (will run example 5 times and take average time)
      • double=1: use double-precision floats
    • To run: make run

    Open the results price_results_*.txt at root project folder

For more complete information about compiler optimizations, see our Optimization Notice.