# Finance: Binomial Lattice

Option pricing is the problem of computing the expected present value of a financial instrument (most usually stocks, but also interest rates, foreign exchange rates, bonds, etc). This is based on a forecast of cashflows over a specific time horizon. The expected present value is used to determine the fair premium of an option on that instrument. The Binomial (Tree or Lattice) option pricing model is a confluent discretization of a Brownian motion process. Note that for a large number of steps the binomial distribution approximates a Gaussian distribution. The discretization takes the form of a recombinant tree, where each level of the tree represents the set of values the underlying can take at specific points in time in the lifetime of the option. Intel® Cilk™ Plus `cilk_for` is used to distribute the calculations of pricing batches among the multi-cores and the `"#pragma simd"` of the Intel® Cilk™ Plus also further makes the computation to be data-parallel among the SIMD registers in one single core.

#### Code Change Highlights:

`cilk_for` is used to calculate pricing batches in parallel, while `pragma simd` is used internally to each iteration.
• `cilk_for`
• linear version:
`for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; k++) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... } }`
`cilk_for` version:
`cilk_for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { cilk_for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... } }`
• `cilk_for + pragma simd`
• The autovectorizer can vectorize the inner `for` loops in the serial version, but require explicit vectorization for the `cilk_for` version.
serial version:
`for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... for (int i = 0; i <= OPT_TIMESTEPS; ++i) { opt_price[i] = std::max((exe_price - upow_tbl[OPT_TIMESTEPS - i] * dpow_tbl[i]), ZERO); } for (int i = OPT_TIMESTEPS - 1; i >= 0; --i) { for (int j = 0; j <= i; ++j) { tmp_val[j] = (p * opt_price[j] + (ONE - p) * opt_price[j+1]) * multiplier; } for (int j = 0; j <= i; ++j) { fptype t1 = upow_tbl[i - j] * dpow_tbl[j]; fptype early_exe_value = std::max(exe_price - t1, ZERO); opt_price[j] = std::max(early_exe_value, tmp_val[j]); } } ... } }`
Looking closer at the second `pragma simd`, you can see that the compiler can correctly vectorize a calculation involving a forward reference, namely using `opt_price[j]` in conjunction with `opt_price[j+1]`
`cilk_for` + `pragma simd` version:
`cilk_for (int idx = 0; idx < NUM_OPTIONS; idx += NUM_OPTIONS_PER_BATCH) { cilk_for (int k = idx; k < idx + NUM_OPTIONS_PER_BATCH; ++k) { fptype* opt_price = new fptype[MAX_NTIMESTEPS]; fptype* upow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* dpow_tbl = new fptype[MAX_NTIMESTEPS]; fptype* tmp_val = new fptype[MAX_NTIMESTEPS]; ... #pragma simd for (int i = 0; i <= OPT_TIMESTEPS; ++i) { opt_price[i] = std::max((exe_price - upow_tbl[OPT_TIMESTEPS - i] * dpow_tbl[i]), ZERO); } for (int i = OPT_TIMESTEPS - 1; i >= 0; --i) { // Vectorization is possible even with forward reference of the array opt_price (j and j+1) #pragma simd for (int j = 0; j <= i; ++j) { tmp_val[j] = (p * opt_price[j] + (ONE - p) * opt_price[j+1]) * multiplier; } #pragma simd for (int j = 0; j <= i; ++j) { fptype t1 = upow_tbl[i - j] * dpow_tbl[j]; fptype early_exe_value = std::max(exe_price - t1, ZERO); opt_price[j] = std::max(early_exe_value, tmp_val[j]); } } ... } }`

#### Performance Data:

Note: Modified Speedup shows performance speedup with respect to serial implementation.

Modified Speedup Compiler (Intel® 64) Compiler options System specifications
cilk_for: 3.63x
Both: 3.7x
Intel C++ Compiler 15.0 for Windows /O2 /QxAVX /Qipo /fp:fast /EHsc Microsoft Windows 7 Enterprise
Intel® Core™ i5-4670T CPU @2.30GHz
16GB memory
cilk_for: 1.31x
Both: 1.4x
Intel C++ Compiler 15.0 for Linux -O2 -fp-model fast -xAVX -ipo RHEL 7 (x64)
4rd Generation Intel Core™ i7-4790 CPU @ 3.60GHz
32GB memory

### Build Instructions:

• For Microsoft Visual Studio* 2010 and 2012 users:
• Open the solution `.sln` file
[Optional] To collect performance numbers (will run example 5 times and take average time):
• Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add `PERF_NUM`
[Optional] To use double-precision floats:
• Project Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions: add `DOUBLE_PRECISION`
Choose a configuration (for best performance, choose a release configuration):
• Intel-debug and Intel-release: uses Intel® C++ compiler
• VSC-debug and VSC-release: uses Visual C++ compiler (only linear/scalar will run)

Open the results `price_results_*.txt` in folder `OptionPricing`

• For Windows* Command Line users:
• Enable your particular compiler environment
For Intel® C++ Compiler:
• Open the appropriate Intel C++ compiler command prompt
• Navigate to project folder
• Compile with `Build.bat [perf_num] [double]`
• `perf_num`: collect performance numbers (will run example 5 times and take average time)
• `double`: use double-precision floats
• To run: `Build.bat run`
For Visual C++ Compiler (only linear/scalar will run):
• Open the appropriate MicrosoftVisual Studio* 2010 or 2012 command prompt
• Navigate to project folder
• To compile: `Build.bat [perf_num] [double]`
• `perf_num`: collect performance numbers (will run example 5 times and take average time)
• `double`: use double-precision floats
• To run: `Build.bat run`

Open the results `price_results_*.txt` at root project folder

• For Linux* or OS X* users:
• Set the icc environment: `source <icc-install-dir>/bin/compilervars.sh {ia32|intel64}`
Navigate to project folder
For Intel® C++ compiler:
• To compile: `make [icpc] [perf_num=1] [double=1]`
• `perf_num=1:` collect performance numbers (will run example 5 times and take average time)
• `double=1`: use double-precision floats
• To run: `make run`
For gcc (only linear/scalar will run):
• Compile with `make gcc [perf_num=1] [double=1]`
• `perf_num=1:` collect performance numbers (will run example 5 times and take average time)
• `double=1`: use double-precision floats
• To run: `make run`

Open the results `price_results_*.txt` at root project folder

For more complete information about compiler optimizations, see our Optimization Notice.