Many authors have discussed how parallel programming practice can be applied to the Black-Scholes model and to the Black-Scholes formula, which prices European options analytically. These parallelization methods have achieved high performance on European-style option pricing algorithms. However, nearly all options written on a wide variety of underlying financial products, including stocks, commodities, and foreign exchange, are American-style, with an early exercise clause embedded in the option contract. Unlike the European-style case, there is no closed-form solution to the American-style option pricing problem. The pricing of American options has mainly relied on the finite difference methods of Brennan and Schwartz, the binomial method of Cox, Ross, and Rubinstein, and the trinomial method of Tian. While these numerical methods can produce accurate solutions to American option pricing problems, they are also difficult to use and consume at least two orders of magnitude more computational resources than a closed-form formula. As a result, for the past 40 years, many talented financial mathematicians have been searching for newer and better numerical methods that make more efficient use of computational resources. In this paper, we look at one of these successful efforts, pioneered by Barone-Adesi and Whaley, and apply the high-performance parallel computing capabilities of modern microprocessors to create a program that delivers high performance with suitably accurate numerical results.
Consider an option on a stock providing a dividend yield equal to q. We will denote the difference between the American and European option price by v. Because both the American and the European option prices satisfy the Black–Scholes differential equation, v also does so.
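For convenience, we define
τ = T − t
h(τ) = 1 − e^(−rτ)
α = 2r/σ²
β = 2(r − q)/σ²
where T is the expiration date of the option, r is the risk-free interest rate, and σ is the volatility of the stock price.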
Without loss of generality, we can also write:
v = h(τ)g(S, h)
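Changing variables and substituting into the Black–Scholes differential equation gives
S² ∂²g/∂S² + βS ∂g/∂S − (α/h)g − (1 − h)α ∂g/∂h = 0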
The approximation involves assuming that the final term on the left-hand side is zero, so that
S² ∂²g/∂S² + βS ∂g/∂S − (α/h)g = 0    (1)
The ignored term is generally fairly small. When τ is large, 1-h is close to zero; when τ is small, ∂g/∂h is close to zero.
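The substitution step can be checked by hand. The sketch below assumes the usual Black-Scholes symbols, which this passage does not spell out: r for the risk-free rate, σ for the volatility, and T for the expiration date, with τ = T − t, h(τ) = 1 − e^(−rτ), α = 2r/σ², and β = 2(r − q)/σ². It starts from the Black–Scholes equation for v, inserts v = h(τ)g(S, h), and divides by σ²h/2.

```latex
\begin{align*}
&\frac{\partial v}{\partial t} + (r-q)S\frac{\partial v}{\partial S}
  + \tfrac{1}{2}\sigma^2 S^2\frac{\partial^2 v}{\partial S^2} = rv,
  \qquad v = h(\tau)\,g(S,h), \\
&\frac{dh}{d\tau} = r e^{-r\tau} = r(1-h), \qquad
  \frac{\partial v}{\partial t}
  = -\frac{dh}{d\tau}\left(g + h\frac{\partial g}{\partial h}\right)
  = -r(1-h)\left(g + h\frac{\partial g}{\partial h}\right), \\
&\tfrac{1}{2}\sigma^2 S^2 h\frac{\partial^2 g}{\partial S^2}
  + (r-q)S h\frac{\partial g}{\partial S} - rg
  - r(1-h)h\frac{\partial g}{\partial h} = 0, \\
&S^2\frac{\partial^2 g}{\partial S^2} + \beta S\frac{\partial g}{\partial S}
  - \frac{\alpha}{h}\,g - (1-h)\,\alpha\frac{\partial g}{\partial h} = 0 .
\end{align*}
```

Once the last term is dropped, what remains is an ordinary differential equation in S. Trying g proportional to S^γ gives the quadratic γ² + (β − 1)γ − α/h = 0, which is where the name "quadratic approximation" comes from.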
The American call and put prices at time t will be denoted by C(S, t) and P(S, t), where S is the stock price, and the corresponding European call and put prices will be denoted by c(S, t) and p(S, t). Equation (1) can be solved using standard techniques. After boundary conditions have been applied, it is found that
C(S, t) = c(S, t) + A₂(S/S*)^γ₂    when S < S*
C(S, t) = S − K    when S ≥ S*
where K is the strike price.
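The variable S* is the critical price of the stock above which the option should be exercised. It is estimated by solving the equation
S* − K = c(S*, t) + {1 − e^(−q(T − t)) N[d₁(S*)]} S*/γ₂
iteratively. For a put option, the valuation formula is
P(S, t) = p(S, t) + A₁(S/S**)^γ₁    when S > S**
P(S, t) = K − S    when S ≤ S**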
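The variable S** is the critical price of the stock below which the option should be exercised. It is estimated by solving the equation
K − S** = p(S**, t) − {1 − e^(−q(T − t)) N[−d₁(S**)]} S**/γ₁
iteratively. The other variables that have been used here are
γ₁ = [−(β − 1) − √((β − 1)² + 4α/h)] / 2
γ₂ = [−(β − 1) + √((β − 1)² + 4α/h)] / 2
A₁ = −(S**/γ₁) {1 − e^(−q(T − t)) N[−d₁(S**)]}
A₂ = (S*/γ₂) {1 − e^(−q(T − t)) N[d₁(S*)]}
d₁(S) = [ln(S/K) + (r − q + σ²/2)(T − t)] / (σ√(T − t))
where N(·) is the cumulative standard normal distribution function.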
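To make the call-side recipe concrete, here is a minimal Python sketch of the quadratic approximation. It is not the optimized program described in this article. It assumes the standard Barone-Adesi and Whaley expressions (α = 2r/σ², β = 2(r − q)/σ², h = 1 − e^(−rτ), γ₂ the positive root of γ² + (β − 1)γ − α/h = 0), requires r > 0 and τ > 0, and uses plain bisection for the critical price S*. The text only says the equation is solved "iteratively", so bisection is an assumption, as are all function names.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def d1(S, K, r, q, sigma, tau):
    return (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))


def european_call(S, K, r, q, sigma, tau):
    """Black-Scholes-Merton price of a European call with dividend yield q."""
    x1 = d1(S, K, r, q, sigma, tau)
    x2 = x1 - sigma * math.sqrt(tau)
    return S * math.exp(-q * tau) * norm_cdf(x1) - K * math.exp(-r * tau) * norm_cdf(x2)


def baw_call(S, K, r, q, sigma, tau, tol=1e-10, max_iter=200):
    """Quadratic approximation of an American call (sketch; assumes r > 0, tau > 0)."""
    if q <= 0.0:
        # Without a positive dividend yield, early exercise of a call is never optimal.
        return european_call(S, K, r, q, sigma, tau)

    alpha = 2.0 * r / sigma ** 2
    beta = 2.0 * (r - q) / sigma ** 2
    h = 1.0 - math.exp(-r * tau)
    gamma2 = 0.5 * (-(beta - 1.0) + math.sqrt((beta - 1.0) ** 2 + 4.0 * alpha / h))

    def premium_weight(x):
        # {1 - e^(-q tau) N[d1(x)]} x / gamma2; equals A2 when x = S*.
        return (1.0 - math.exp(-q * tau) * norm_cdf(d1(x, K, r, q, sigma, tau))) * x / gamma2

    def residual(x):
        # Zero at the critical price: c(x) + weight(x) - (x - K).
        return european_call(x, K, r, q, sigma, tau) + premium_weight(x) - (x - K)

    # Bracket the root: residual(K) > 0, and it turns negative for large x when q > 0.
    lo, hi = K, 2.0 * K
    while residual(hi) > 0.0:
        lo, hi = hi, 2.0 * hi

    # Plain bisection, an assumed stand-in for the unspecified iteration.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * K:
            break
    s_star = 0.5 * (lo + hi)

    if S >= s_star:
        return S - K  # immediate exercise is optimal
    a2 = premium_weight(s_star)
    return european_call(S, K, r, q, sigma, tau) + a2 * (S / s_star) ** gamma2


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    print(baw_call(S=100.0, K=100.0, r=0.08, q=0.12, sigma=0.20, tau=0.25))
```

Bisection is slower than a Newton iteration but cannot overshoot, which keeps the sketch short. A production kernel would more likely use Newton steps with a good starting guess so that every option takes a similar number of iterations.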
Options on stock indices, currencies, and futures contracts are analogous to options on a stock providing a constant dividend yield. Hence the quadratic approximation approach can easily be applied to all of these types of options.
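The analogy amounts to choosing which rate plays the role of q. The short sketch below encodes the usual textbook conventions, which are an assumption here rather than something stated in this article: an index uses its dividend yield, a currency option uses the foreign risk-free rate, and a futures option uses q = r. The function name and the `kind` labels are hypothetical.

```python
def equivalent_dividend_yield(kind, r, dividend_yield=0.0, foreign_rate=0.0):
    """Return the q to plug into a dividend-yield pricing formula.

    kind: "stock", "index", "currency", or "futures" (illustrative labels).
    r: domestic risk-free rate.
    """
    if kind in ("stock", "index"):
        return dividend_yield   # continuous dividend yield of the stock or index
    if kind == "currency":
        return foreign_rate     # foreign risk-free rate acts like a yield
    if kind == "futures":
        return r                # zero cost of carry: q equals r
    raise ValueError(f"unknown underlying kind: {kind!r}")


if __name__ == "__main__":
    print(equivalent_dividend_yield("currency", r=0.05, foreign_rate=0.03))
```

With q chosen this way, the same pricing routine can serve all four kinds of underlying without any change to the formulas.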
The source code for the Barone-Adesi and Whaley approximation is maintained by Shuo Li and is available under the BSD 3-Clause Licensing Agreement. The program runs natively on Intel® Xeon Phi™ processors in a single-node environment.
To get access to the code and test workloads, go to the source location and download the BAWAmericanOptions.tar file.
Here are the steps for rebuilding the program:
[sli@ortce-knl7 ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 272
On-line CPU(s) list: 0-271
Thread(s) per core: 4
Core(s) per socket: 68
Socket(s): 1
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Stepping: 1
CPU MHz: 1400.273
BogoMIPS: 2793.61
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 5,6,11,12,17,18,23-25,73-86,135-148,197-210,259-271
NUMA node1 CPU(s): 1,2,7,8,13,14,19,20,43-58,105-120,167-182,229-244
NUMA node2 CPU(s): 3,4,9,10,15,16,21,22,59-72,121-134,183-196,245-258
NUMA node3 CPU(s): 0,26-42,87-104,149-166,211-228
For Double Precision processing:
Run am_call_sp.knl and am_call_dp.knl:
[sli@wsl-knl-02 test_baw]$ ./am_call_sp.knl
Call price using Barone-Adesi Whaley approximation Optimized = 5.743389
cycles consumed is 99246
Pricing American Options using BAW Approximation in single precision.
Compiler Version = 16
Release Update = 3
Build Time = May 27 2016 20:27:47
Input Dataset = 142606336
Worker Threads = 272
Completed pricing 142.60634 million options in 0.56671 seconds:
Parallel version runs at 251.64027 million options per second.
[sli@wsl-knl-02 test_baw]$ ./am_call_dp.knl
Call price using Barone-Adesi Whaley approximation Optimized = 5.743386
cycles consumed is 122640
Pricing American Options using BAW Approximation in double precision.
Compiler Version = 16
Release Update = 3
Build Time = May 27 2016 20:34:32
Input Dataset = 142606336
Worker Threads = 272
Completed pricing 142.60634 million options in 2.10704 seconds:
Parallel version runs at 67.68101 million options per second.
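The reported rates follow directly from the option count and the elapsed time. As a quick cross-check, the sketch below recomputes them from the figures printed in the run logs and derives the single-to-double precision ratio and a per-thread rate. The variable names are illustrative, and because the printed times are rounded to five decimals, the recomputed rates can differ from the logged ones in the last digits.

```python
# Figures copied from the run logs.
runs = {
    "single": {"options": 142_606_336, "seconds": 0.56671},
    "double": {"options": 142_606_336, "seconds": 2.10704},
}
threads = 272  # "Worker Threads" reported by both runs


def throughput_millions(options, seconds):
    """Options priced per second, in millions."""
    return options / seconds / 1e6


rates = {name: throughput_millions(**run) for name, run in runs.items()}
speedup = rates["single"] / rates["double"]

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.2f} M options/s, {rate / threads:.3f} M options/s per thread")
    print(f"single/double throughput ratio: {speedup:.2f}")
```

On the same input set, the single precision run is a little under four times faster than the double precision run.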
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804