In a scientific application, I need to avoid the cost of writing data to main memory. I want to prevent an array of double-precision numbers from being written back to memory, so that it stays resident in the L2 cache for as long as possible. The array is about 64 kilobytes in size and may be read or written by other threads. At the end of execution, it can be written to memory. Is this achievable? Are there any pragmas or functions that enforce this constraint?